一键在 Manus 中运行任何 Skill

$pwd:

mcp-e2e-validate

Name: Mcp E2e Validate
Author: DataRecce

// Use when the user asks to "validate MCP", "run MCP E2E", "run E2E validation", "benchmark MCP performance", "test the plugin flow", "test MCP integration", "compare MCP versions", "show benchmark history", "驗證 MCP", "跑 E2E", "看歷史紀錄", or wants to verify the recce plugin's full event chain works end-to-end and measure agent performance metrics.

在 Manus 中运行

$ git log --oneline --stat

stars:4

forks:1

updated:2026年3月24日 06:33

文件资源管理器

6 个文件

SKILL.md

readonly

related-skills.json

同仓库

recce-verify.md

from "DataRecce/recce-claude-plugin"

Lightweight pre-commit verification for dbt model changes in the single-environment dev loop — when the user has a warehouse-connected dbt project but no `target-base/` artifacts. Triggers when: user asks to verify a model change, check whether an edit is safe to commit, sanity-check a filter/aggregation/join change without setting up a base environment, or asks for a quick risk read before running /recce-review. Uses Tier-1 evidence only — column lineage, AST analysis, and targeted current-env SQL probes. Routes to /recce-review when `target-base/` is fresh.

2026-05-224

recce-guide.md

from "DataRecce/recce-claude-plugin"

Automatically provide Recce guidance in dbt projects. Triggers when: working in dbt project directory, discussing PRs or data changes, after dbt command execution, or when user asks about data validation.

2026-05-084

recce-review.md

from "DataRecce/recce-claude-plugin"

Review dbt model data changes using Recce. Triggers when: user asks to review data changes, check data impact, run recce review, validate model changes before committing, review a Recce Cloud PR session, connect MCP to a cloud session, pastes a GitHub PR / GitLab MR URL, or pastes a Recce Cloud session/launch URL for cloud-mode review.

2026-05-054

recce-eval.md

from "DataRecce/recce-claude-plugin"

Use when the user asks to "run eval", "recce eval", "evaluate plugin", "benchmark recce", "compare with plugin", "compare without plugin", "eval case", "score eval", "eval report", "eval history", "list eval scenarios", "list eval cases", "show eval history", "run eval case", or wants to measure the Recce Review Agent's effectiveness compared to pure Claude Code without the plugin.

2026-04-024

readme-refresh.md

from "DataRecce/recce-claude-plugin"

This skill should be used when the user asks to "update readme", "review readme", "refresh readme", "audit readme", "fix the main readme", "改 readme", "檢查 readme", or when the root README.md needs to reflect new plugin changes, version bumps, or feature additions.

2026-03-144

dbt-ci.md

from "DataRecce/recce-claude-plugin"

Use when generating GitHub Actions workflows for dbt, adding dbt build/test/docs steps, or configuring state-based selection with --select state:modified+. Provides templates for both full builds (CD) and incremental builds (CI).

2026-02-064

package.json

"author": "DataRecce"

"repository": "DataRecce/recce-claude-plugin"

打开 GitHub 仓库查看创作者相关仓库

$ install --global

$ download --local

在 Manus 中运行

$ useful --forSOC

软件质量保证分析师与测试员计算机与数学类职业15-1253L4

name	mcp-e2e-validate
description	Use when the user asks to "validate MCP", "run MCP E2E", "run E2E validation", "benchmark MCP performance", "test the plugin flow", "test MCP integration", "compare MCP versions", "show benchmark history", "驗證 MCP", "跑 E2E", "看歷史紀錄", or wants to verify the recce plugin's full event chain works end-to-end and measure agent performance metrics.
version	0.3.0
flow_version	1.0.0

/mcp-e2e-validate — MCP Integration E2E Validation & Benchmark

Validate the recce plugin's full event chain against a real dbt project and produce a performance benchmark report. Automatically saves results for historical comparison and loads the previous run as baseline.

Flow version: The flow_version field in frontmatter tracks the test flow structure. Only compare benchmarks with the same MAJOR.MINOR flow version — different flow versions produce incomparable metrics.

Flow version change	When
MAJOR bump	Test steps added/removed, agent dispatch logic changed
MINOR bump	Selector strategy, parameter defaults, or validation criteria changed
PATCH bump	Report format, error messages, documentation only

Setup: Read learned patterns before starting:

Read → ${CLAUDE_PLUGIN_ROOT}/reference/learned-patterns.md

Dependencies: This skill relies on the sibling recce plugin's scripts (start-mcp.sh, stop-mcp.sh, check-mcp.sh) and hooks (track-changes.sh, suggest-review.sh). It also dispatches the recce-reviewer agent.

Cross-plugin path: The recce plugin is located via resolve-recce-root.sh, which auto-detects both monorepo and cache layouts:

eval "$(bash ${CLAUDE_PLUGIN_ROOT}/scripts/resolve-recce-root.sh)"
# Sets RECCE_PLUGIN_ROOT and LAYOUT (monorepo|cache)

If the script is unavailable or fails, abort with an error — do not hardcode paths.

Shell variable note: Each Bash tool invocation runs in a fresh shell. Steps that reference sibling plugin scripts must re-evaluate resolve-recce-root.sh to set RECCE_PLUGIN_ROOT.

Inputs

Parse user input for optional parameters:

--baseline: Override auto-baseline. Accepts a timestamp (e.g., 2026-03-13T041957) to compare against a specific historical run, or none to skip comparison entirely. Without this flag, the skill automatically loads latest.json as baseline.
--model: Model to edit for testing (default: first .sql file under models/staging/)
--marker: Comment marker to inject (default: -- recce-e2e-validation)
--skip-dbt: Skip the dbt run step if models were already built
--history: Show benchmark history table and exit (no E2E run). Accepts optional --limit N and --flow-version X.Y.Z filters.

If no parameters provided, use defaults and run the full flow.

--history short-circuit: If --history is present, skip Steps 1–6 entirely. Run:

bash ${CLAUDE_PLUGIN_ROOT}/skills/mcp-e2e-validate/scripts/show-history.sh [--limit N] [--flow-version X.Y.Z]

Parse the output:

If ERROR= → show the error verbatim (e.g., "jq is required"). STOP.
If NO_HISTORY=true → tell user "No benchmark history found. Run a full E2E validation first." STOP here — do not proceed to Step 1.
If HAS_HISTORY=true → display the markdown table between ---TABLE_START--- and ---TABLE_END---, then show TOTAL_RUNS count. STOP here — do not proceed to Step 1.

Step 1: Pre-flight

Run the pre-flight check script:

bash ${CLAUDE_PLUGIN_ROOT}/skills/mcp-e2e-validate/scripts/preflight.sh

Parse KEY=VALUE output. Abort if any BLOCK= line appears — show the message verbatim.

Handle warnings:

SSE_SUPPORT=false → Inform user: editable install may need rm -rf site-packages/recce/ then pip install -e ".[mcp]". See memory for details.
PORT_STATUS=occupied_by_other → Suggest changing port in .claude/recce/settings.json
STALE_FILES=found → Auto-clean: rm -f /tmp/recce-mcp-*.pid /tmp/recce-changed-*.txt

Record RECCE_VERSION, PORT, and DBT_ADAPTER for the report.

Step 2: Start MCP Server

Resolve the recce plugin root, then start:

eval "$(bash ${CLAUDE_PLUGIN_ROOT}/scripts/resolve-recce-root.sh)"
bash "${RECCE_PLUGIN_ROOT}/scripts/start-mcp.sh"

If STATUS=STARTED or STATUS=ALREADY_RUNNING → record PORT and PID, proceed.
If ERROR= → abort with error details.

Verify with health check:

bash "${RECCE_PLUGIN_ROOT}/scripts/check-mcp.sh"

Confirm RUNNING=true before proceeding.

Step 3: Inject Test Edit (Tier 1 Trigger)

Select the target model file (from --model or default staging model).
Read the file and record its original content.
Append the marker comment (-- recce-e2e-validation) on a new line at the end.
Use the Edit tool (this triggers track-changes.sh PostToolUse hook).
Wait 2 seconds for async hook execution, then verify tracking:

PROJECT_HASH=$(printf '%s' "$PWD" | md5 2>/dev/null | cut -c1-8 || printf '%s' "$PWD" | md5sum | cut -c1-8)
cat /tmp/recce-changed-${PROJECT_HASH}.txt

Evaluate result:
- File exists and contains the edited model path → Tier 1 PASS
- File missing → Tier 1 FAIL (hook) — apply fallback below, then continue

Tier 1 Fallback: If the tracking file does not exist (common after mid-session plugin install — hooks require a fresh session to activate), manually create it so downstream steps can proceed:

PROJECT_HASH=$(printf '%s' "$PWD" | md5 2>/dev/null | cut -c1-8 || printf '%s' "$PWD" | md5sum | cut -c1-8)
echo "<ABSOLUTE_PATH_TO_EDITED_MODEL>" > /tmp/recce-changed-${PROJECT_HASH}.txt

Record in the report: "Tier 1 FAIL (hook not fired — manually simulated)".

Step 4: dbt Run (Tier 2 Trigger)

Skip if --skip-dbt was specified.

Run dbt on the modified model and downstream:

dbt run -s {model_name}+

dbt completes with PASS → record model count. The suggest-review.sh hook should inject a review suggestion into context. Tier 2 PASS.
dbt fails → Tier 2 FAIL (record error, continue to cleanup)

Step 5: Dispatch Review Agent

Use the Agent tool to dispatch the recce-reviewer agent (subagent_type: recce:recce-reviewer) with the tracked model context:

"Changed models (from tracked file): {model_name}. Focus review on these models using selector: {model_name}+"

Capture the full agent result, including the <usage> block. Extract:

tool_uses — number of MCP tool calls
total_tokens — total token consumption
duration_ms — wall-clock time

Check agent output for ## Data Review Summary. Validate against pass criteria in references/pass-criteria.md:

Concrete row count numbers (non-zero integers)
Risk level present (LOW/MEDIUM/HIGH)
Model names in summary
No MCP tool errors

Step 6: Cleanup

Execute in order:

Revert model edit — restore the file to its original content (remove marker comment).

Stop MCP server:

eval "$(bash ${CLAUDE_PLUGIN_ROOT}/scripts/resolve-recce-root.sh)"
bash "${RECCE_PLUGIN_ROOT}/scripts/stop-mcp.sh"

Clean tracked files:

PROJECT_HASH=$(printf '%s' "$PWD" | md5 2>/dev/null | cut -c1-8 || printf '%s' "$PWD" | md5sum | cut -c1-8)
rm -f "/tmp/recce-changed-${PROJECT_HASH}.txt"

Stale state check — verify no /tmp/recce-mcp-*.pid or /tmp/recce-changed-*.txt remain.

Step 7: Save & Report

This step has three sub-phases: load baseline → save current → produce report.

Ordering matters: Load baseline BEFORE saving current, because save-benchmark.sh overwrites latest.json. If you save first, load-baseline.sh would load the run you just saved — comparing against itself (all-zero deltas).

7a: Load Baseline

If --baseline none: Skip comparison entirely.

If --baseline <timestamp>: Load a specific historical run:

bash ${CLAUDE_PLUGIN_ROOT}/skills/mcp-e2e-validate/scripts/load-baseline.sh --timestamp <timestamp>

If no --baseline flag (default): Auto-load the most recent run:

bash ${CLAUDE_PLUGIN_ROOT}/skills/mcp-e2e-validate/scripts/load-baseline.sh

Parse the KEY=VALUE output. If BASELINE_FOUND=true, record the baseline metrics. If BASELINE_FOUND=false, this is the first run — no comparison available.

Comparability check: Compare the baseline's FLOW_VERSION with the current skill's flow_version from frontmatter. If the MAJOR or MINOR version differs, add a warning to the report: "⚠️ Flow version changed ({baseline_fv} → {current_fv}), delta may not be comparable."

7b: Save Current Benchmark

Save the current run's results. Build the JSON arguments from data collected during Steps 1–6 (RECCE_VERSION, DBT_ADAPTER, DBT_PROJECT_NAME from Step 1 preflight; performance metrics from Step 5 agent dispatch):

bash ${CLAUDE_PLUGIN_ROOT}/skills/mcp-e2e-validate/scripts/save-benchmark.sh \
  --flow-version {flow_version_from_frontmatter} \
  --recce-version {recce_version_from_step1} \
  --adapter {DBT_ADAPTER_from_step1} \
  --project {dbt_project_name} \
  --model {test_model_name} \
  --results '{"preflight":"{PASS|FAIL}","mcp_startup":"{PASS|FAIL}","tier1_tracking":"{PASS|FAIL}","tier2_suggestion":"{PASS|FAIL}","review_agent":"{PASS|FAIL}","cleanup":"{PASS|FAIL}"}' \
  --performance '{"tool_uses":{N},"total_tokens":{N},"duration_s":{N}}' \
  --risk-level {HIGH|MEDIUM|LOW} \
  --verdict {PASS|FAIL}

Confirm SAVED=true in the output. Record BENCHMARK_FILE for the report.

If save fails (non-zero exit, ERROR= in output, or SAVED=true absent): report the error in the benchmark report's Persistence section as "Benchmark save failed: {error}". Do NOT change the overall verdict — persistence failure is informational, not a test failure. The E2E validation results (Steps 1–6) are still valid.

7c: Produce Report

Generate the report using the template in references/pass-criteria.md.

If baseline was loaded (Step 7a), compute deltas:

delta = current - baseline
delta_pct = (delta / baseline) * 100

Present negative deltas (improvements) with emphasis.

Include at the bottom of the report:

Benchmark saved: {BENCHMARK_FILE} — so user knows where it was persisted
Baseline: {BASELINE_FILE} (or "first run — no baseline") — so user knows what was compared

Output the full report to the user. If all pass criteria are met, end with Verdict: PASS. Otherwise list failures.

Common Mistakes

Forgetting eval: Running bash resolve-recce-root.sh without eval "$(...)" does not set RECCE_PLUGIN_ROOT in the current shell.
Shell variables do not persist: Each Bash tool invocation starts a fresh shell. Re-run resolve-recce-root.sh in every step that needs RECCE_PLUGIN_ROOT.
Platform-specific md5: macOS uses md5, Linux uses md5sum. The snippets in this skill handle both — do not simplify to one.
Not waiting after Edit: The PostToolUse hook fires asynchronously. Wait 2 seconds before checking the tracking file.
Skipping cleanup on failure: If any step fails, still execute Step 6 (cleanup). Never leave the MCP server running or model edits in place.
Agent dispatch: Use the Agent tool with subagent_type: recce:recce-reviewer. Do not attempt to bash an agent markdown file.
Step 7 ordering: Always load baseline BEFORE saving current. Reversing the order produces self-comparison (all-zero deltas) because save overwrites latest.json.
Benchmark scripts use cwd: save-benchmark.sh, load-baseline.sh, and show-history.sh write/read .claude/recce/benchmarks/ relative to the current working directory (the dbt project root). They are not affected by CLAUDE_PLUGIN_ROOT.
Benchmark artifacts are local-only: .claude/recce/benchmarks/ should be in the dbt project's .gitignore to avoid accidentally committing benchmark JSON files. The plugin's own .gitignore does not cover this — it must be added to the user's dbt project.

Additional Resources

Reference Files

references/pass-criteria.md — Detailed pass/fail criteria per section, performance metrics extraction guide, and the benchmark report template.

Scripts

scripts/preflight.sh — Pre-flight environment checks (dbt project, recce version, SSE support, port availability, stale files). Outputs KEY=VALUE lines.
scripts/save-benchmark.sh — Persists benchmark result as JSON to .claude/recce/benchmarks/. Updates latest.json for auto-baseline.
scripts/load-baseline.sh — Loads baseline from latest.json, a specific timestamp, or a flow-version filter. Outputs KEY=VALUE lines.
scripts/show-history.sh — Displays benchmark history as a markdown table with optional --limit and --flow-version filters.
scripts/resolve-recce-root.sh (plugin-level) — Locates sibling recce plugin across monorepo and cache layouts. Outputs RECCE_PLUGIN_ROOT and LAYOUT.

Learning

After completing the main workflow:

Detection: Any unexpected failure, workaround, or pattern not in learned-patterns.md?
- Yes → check ${CLAUDE_PLUGIN_ROOT}/reference/learned-patterns.md, if not covered → append D1 entry
- No → one question: "What was most unexpected?" → three-question test → capture or done
"Nothing novel" is a valid and encouraged outcome.

mcp-e2e-validate

同仓库更多 Skills

同仓库更多 Skills

/mcp-e2e-validate — MCP Integration E2E Validation & Benchmark

Inputs

Step 1: Pre-flight

Step 2: Start MCP Server

Step 3: Inject Test Edit (Tier 1 Trigger)

Step 4: dbt Run (Tier 2 Trigger)

Step 5: Dispatch Review Agent

Step 6: Cleanup

Step 7: Save & Report

7a: Load Baseline

7b: Save Current Benchmark

7c: Produce Report

Common Mistakes

Additional Resources

Reference Files

Scripts

Learning

/mcp-e2e-validate — MCP Integration E2E Validation & Benchmark

Inputs

Step 1: Pre-flight

Step 2: Start MCP Server

Step 3: Inject Test Edit (Tier 1 Trigger)

Step 4: dbt Run (Tier 2 Trigger)

Step 5: Dispatch Review Agent

Step 6: Cleanup

Step 7: Save & Report

7a: Load Baseline

7b: Save Current Benchmark

7c: Produce Report

Common Mistakes

Additional Resources

Reference Files

Scripts

Learning