| name | new-sep |
| description | Scaffold a sep-NNNN.yaml requirement-traceability file for the MCP conformance repo from a SEP PR's spec diff. Runs the new-sep CLI, then parses the modelcontextprotocol/modelcontextprotocol spec diff to populate `requirements[]` with the RFC 2119 sentences and proposed check IDs. |
| argument-hint | <sep-number> |
new-sep: SEP traceability YAML scaffolding
You are bootstrapping a sep-NNNN.yaml file for a new SEP in the MCP conformance repo. The output is the requirement-traceability file specified by SEP-2484: a YAML that maps each normative sentence from the SEP's spec diff to a check: ID (testable) or an excluded: reason (not testable). The CLI gets the skeleton; you fill in the rows by reading the spec diff.
Step 0: Pre-flight checks
Before doing anything else, verify GitHub CLI authentication:
gh auth status 2>&1
If this fails, stop immediately and tell the user:
GitHub authentication is required for this skill. Please run gh auth login first, then re-run.
Verify you're running inside the conformance repo:
test -f package.json && jq -r '.name' package.json
The name should be @modelcontextprotocol/conformance. If not, stop and ask the user to cd into the conformance repo first.
Step 1: Parse arguments
Extract from the user's input:
- sep-number (required): the SEP number, e.g.
2164. This is also the PR number in modelcontextprotocol/modelcontextprotocol by convention.
Step 2: Generate the skeleton
Run the CLI:
npm run --silent build
node dist/index.js new-sep <NNNN>
(For development against a non-built source tree: npx tsx src/index.ts new-sep ....)
The CLI writes src/seps/sep-<NNNN>.yaml with sep, spec_url, and two TODO requirements[] rows. Capture the output path from the CLI's Wrote … line and remember it as $YAML.
If the CLI errors with "does not change any docs/specification/draft/*.mdx", the SEP's spec changes landed in a separate PR — ask the user for the spec file path and rerun with --spec-path docs/specification/draft/<path>. Do not guess.
Step 3: Fetch the spec diff
AGENTS.md (lines 64–72) is explicit that severity must come from the spec text itself, not the SEP markdown or the conformance PR description:
gh api "repos/modelcontextprotocol/modelcontextprotocol/pulls/<NNNN>/files" \
--jq '.[] | select(.filename | test("^docs/specification/draft/.*\\.mdx$")) | {filename, patch}'
For each file, pull the added (+-prefixed) lines from patch. If patch is truncated for a large file, fall back to fetching the whole file at the PR's head ref:
gh api "repos/modelcontextprotocol/modelcontextprotocol/contents/<path>?ref=<sep-branch>" \
--jq '.content' | base64 -d
Step 4: Extract RFC 2119 requirements
Walk the added lines and identify sentences containing the keywords: MUST, MUST NOT, SHOULD, SHOULD NOT, REQUIRED, SHALL, SHALL NOT, MAY, OPTIONAL.
Quote the whole sentence, not just the matched line. The matched word may sit inside a bullet point whose lead-in sentence supplies the keyword by inheritance — e.g.:
Servers SHOULD return standard JSON-RPC errors for common failure cases:
- Resource not found: -32602 (Invalid Params)
The bullet inherits SHOULD. The yaml row should quote the combined obligation: 'Servers SHOULD return standard JSON-RPC errors for common failure cases: Resource not found: -32602 (Invalid Params)' — see src/seps/sep-2164.yaml for the canonical example.
Regex alone is insufficient (this is called out in Issue #243). Read for context: pronouns, "the server", and "such cases" all refer back to the lead-in.
Step 5: Map severity → check vs. excluded
From AGENTS.md:50-56:
| Keyword | Severity | YAML field |
|---|
| MUST / MUST NOT / SHALL / SHALL NOT / REQUIRED | FAILURE | check: sep-<NNNN>-<slug> |
| SHOULD / SHOULD NOT | WARNING | check: sep-<NNNN>-<slug> |
| MAY / OPTIONAL | — | no row — skip entirely |
MAY / OPTIONAL sentences are noted in Step 4 only so you consciously skip them — they never produce a yaml row.
A row is excluded: when a MUST/SHOULD requirement can't be protocol-observed by the harness. Do not write any excluded: row on your own authority — every exclusion goes through Step 6.
While classifying, sort each MUST/SHOULD row into one of three buckets:
check: — observably testable on the wire.
- clearly-excluded — you're confident it can't be observed (e.g. "clients SHOULD also accept -32002" when the harness only drives servers).
- borderline — you'd default to
check: but observability is questionable. Markers:
- Internal state — verbs like record, store, associate, track, cache. The harness sees wire traffic, not memory; usually only observable via a downstream row already in your list.
- UI / human-facing — display, show, render, prompt the user.
- Precondition phrasing — "Before doing X, the implementation MUST Y" where X is itself another row.
Slug convention: lowercase-kebab, derived from the verb phrase. Examples from sep-2164.yaml: no-empty-contents, error-code. Same id is used for SUCCESS and FAILURE (AGENTS.md:52).
Step 6: Confirm exclusions with the user
Nothing becomes excluded: without sign-off. Two rounds:
Round 1 — clearly-excluded, single batch question. One AskUserQuestion listing all clearly-excluded rows in the question body (slug + one-line reason each). Options:
Exclude all as listed (Recommended)
Flip all to check:
Let me adjust per-row — if chosen, append these rows to round 2.
Skip this round if the bucket is empty.
Round 2 — borderline, one question per row. One AskUserQuestion call with a question per borderline row (loop in batches of 4 if needed). For each:
- header: the proposed slug
- question: quote the requirement sentence + your one-line observability concern
- options (list
check: first — it's the default for borderline):
check: — keep as a testable check
excluded: <reason> — drop to excluded with your stated reason
merge into <other-slug> — offer when the row is a precondition for another row already in the list
Apply the answers before writing. For any excluded: outcome, write the reason verbatim into the yaml and add an issue: URL if the user supplies one. A merge outcome means: drop this row, and append its text: to the surviving row's text: separated by / so the traceability isn't lost.
Step 7: Rewrite the YAML
Replace the two TODO rows the CLI generated with one row per extracted requirement. Preserve the CLI's quoting style (single quotes, two-space indent — see src/seps/sep-2164.yaml).
Key order within each row — for check: rows the check: key comes first, then text:, then any optional url:. Scanning the left margin should reveal every check ID without reading the quoted sentences. For excluded: rows the order is text: first, then excluded:, then optional issue: — there's no ID to scan for, so lead with the requirement.
Row order in the file — all check: rows first (in spec-diff order), then all excluded: rows grouped at the bottom, separated from the checks by one blank line. Do not interleave.
requirements:
- check: sep-NNNN-first-slug
text: '...'
- check: sep-NNNN-second-slug
text: '...'
- text: '...'
excluded: 'reason'
issue: https://github.com/modelcontextprotocol/conformance/issues/<NNNN>
If a requirement is ambiguous or you're not confident, leave it as a TODO: row rather than guessing — humans review this yaml before scenarios get written.
Also fix the spec_url: the CLI emits the page URL with no anchor. If the requirements you extracted live under a specific spec subsection (e.g. #error-handling), append it.
If a requirement comes from a different spec page than spec_url (the SEP touched multiple .mdx files — the CLI prints these as "PR also changes N other spec file(s)"), give that row a full url: override:
- check: sep-NNNN-slug
text: '...'
url: https://modelcontextprotocol.io/specification/draft/other/page#anchor
A row's effective spec reference is row.url ?? file.spec_url.
Write the result back to $YAML.
Step 8: Suggest a host scenario
AGENTS.md prefers fewer scenarios with more checks over one-scenario-per-check. Before telling the user to write a new scenario, look for an existing one the new checks could be folded into.
Determine the suite directory from the requirement subjects ("MCP clients MUST…" → client/, "Servers MUST…" → server/, "authorization servers MUST…" → authorization-server/; a SEP may map to more than one). Then search that directory for scenarios touching the same spec area:
rg -l -i '<domain-term>|<domain-term-2>' src/scenarios/<suite>/ --type ts
Pick 2–3 domain terms from the SEP's subject matter (for a discovery SEP: metadata, well-known; for an auth-response SEP: redirect, callback, pkce). For each hit, pull the scenario's name/description to confirm relevance:
rg -A1 'name:|description:' <hit.ts>
If you find a plausible host, recommend it by path. If nothing fits, say so explicitly — a new scenario file is then the right call.
Step 9: Hand-off
Report to the user, in this order:
- Path to the generated yaml.
- Row counts: "
N check: rows, M excluded: rows" — and note which exclusions the user signed off in Step 6.
- Any requirements you left as
TODO: and why.
- Host-scenario recommendation from Step 8 — either "consider adding these checks to
src/scenarios/<suite>/<file>.ts (it already exercises X)" or "no existing scenario covers this area; a new file is appropriate".
- Remaining next steps the user owns:
- add the checks to the host scenario (or create one) under
src/scenarios/{client,server,authorization-server}/,
- register any new scenario in
src/scenarios/index.ts (AGENTS.md:48),
- add a passing example to the everything-client/server and a negative test, per
AGENTS.md:74-81.
Do not generate or edit scenario .ts files or touch src/scenarios/index.ts. The skill's scope ends at the yaml plus the recommendation.