| name | mapping-builder |
| description | Build a reviewed crosswalk (RequirementMappingSet YAML library + review xlsx/csv) between two CISO Assistant framework YAML files using Claude itself as the reasoning engine. Zero infrastructure — stdlib + pyyaml only, no embedders, no LM Studio, no Qdrant. Use when the user asks to map / crosswalk / generate a mapping between two frameworks (e.g. ccb-cff-2023-03-01.yaml ↔ cyfun2025.yaml), wants to contribute a community mapping to backend/library/libraries/, or says things like "build a mapping between framework X and Y", "create a crosswalk YAML", "generate requirement_mapping_set". Output matches the schema in backend/library/libraries/mapping-*.yaml exactly so the result is PR-able. |
Mapping Builder
What this skill does
Generate a CISO Assistant requirement_mapping_set library YAML — plus a human-review xlsx/csv — by reasoning over two framework YAMLs entirely in this conversation. No embedders, no local LLMs, no vector DBs.
The output YAML drops directly into backend/library/libraries/ and is loadable by the platform's library loader.
When NOT to use this skill:
- For mapping huge frameworks (>500 assessable items per side). The conversation context fills up. Use tools/mapping_builder/map_v2.py instead; it has hybrid retrieval and runs locally.
- For batch/automated runs that need to be deterministic across sessions. Use the local CLI.
How to use the helpers
The helpers are stdlib + pyyaml scripts under .claude/skills/mapping-builder/scripts/. All invocations in this doc assume you run them from the repo root with the project's venv.
SKILL=.claude/skills/mapping-builder/scripts
.venv/bin/python $SKILL/parse_framework.py path/to/framework.yaml > parsed.json
.venv/bin/python $SKILL/cat_slice.py parsed.json
.venv/bin/python $SKILL/cat_slice.py parsed.json ID.AM,GV.OC
.venv/bin/python $SKILL/write_mapping_yaml.py spec.json output.yaml
.venv/bin/python $SKILL/write_review.py spec.json source_parsed.json target_parsed.json review.xlsx
.venv/bin/python $SKILL/audit_verdicts.py parsed_src.json verdicts.jsonl --threshold 6
.venv/bin/python $SKILL/diff_mappings.py A.yaml B.yaml --sample 5
.venv/bin/python $SKILL/render_html.py spec.json src_parsed.json tgt_parsed.json out.html
The spec.json for write_mapping_yaml.py and write_review.py is built up across the workflow below — see the docstrings of the scripts for the exact field shape.
Workflow
Run the steps in order. Do NOT skip the section-affinity step — it is the single most important quality lever and prevents wasting context on irrelevant pairs.
Step 1 — confirm the request and check for duplicates
Before doing any work, confirm with the user:
- Source framework YAML path
- Target framework YAML path
- Direction matters: source maps INTO target
Then Glob for backend/library/libraries/mapping-*.yaml and check if a published mapping for this exact pair already exists. If yes, surface it to the user — they may want to extend rather than recreate.
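The duplicate check can also be scripted. The following is a minimal sketch, not part of the skill's scripts: find_existing_mappings is a hypothetical helper, and it assumes the two framework URNs are already known (for example, read from the framework files). It uses a plain substring search rather than a YAML parse, so treat any hit as a candidate to inspect, not as proof.

```python
from pathlib import Path


def find_existing_mappings(lib_dir, source_framework_urn, target_framework_urn):
    """Return names of mapping-*.yaml files that mention both framework URNs.

    Hypothetical helper: a substring search over the raw file text. It also
    catches files where the pair appears in the opposite direction.
    """
    hits = []
    for path in sorted(Path(lib_dir).glob("mapping-*.yaml")):
        text = path.read_text(encoding="utf-8")
        if source_framework_urn in text and target_framework_urn in text:
            hits.append(path.name)
    return hits
```

Because a URN can be a prefix of a longer URN, open each reported file and confirm the source_framework_urn and target_framework_urn fields before telling the user a mapping already exists.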
Step 2 — parse both frameworks
.venv/bin/python .claude/skills/mapping-builder/scripts/parse_framework.py \
backend/library/libraries/SOURCE.yaml > /tmp/src.json
.venv/bin/python .claude/skills/mapping-builder/scripts/parse_framework.py \
backend/library/libraries/TARGET.yaml > /tmp/tgt.json
Read both JSONs (or summarize via the CLI if too large) and confirm the framework sizes with the user. Note n_assessable and n_sections for each side.
Hard limit check: if either side has > 500 assessable items, stop and recommend the local CLI (tools/mapping_builder/map_v2.py). Don't try to power through — context will fail mid-mapping.
Check for parse warnings: parse_framework.py writes warnings to stderr (and a warnings array in the JSON). Two kinds you'll see:
- Empty-description items: assessable items in the source YAML with no name and no description. They can't be mapped; surface them to the user as known gaps.
- Too-few-sections hint: fires when there are ≤8 top-level sections but many items (NIST CSF-style frameworks: 5 Functions in v1.1, 6 in v2.0). In this case the sections list is useless for affinity; use Category-level slicing via cat_slice.py instead (see Step 3).
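The size and warning checks in this step could be sketched as below. The field names n_assessable, n_sections and warnings come from the text above; size_report and load_parsed are hypothetical helpers, and the sketch assumes warnings is a plain JSON array.

```python
import json

ASSESSABLE_LIMIT = 500  # the hard limit stated in this step


def load_parsed(path):
    """Load the JSON that parse_framework.py wrote to stdout."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def size_report(parsed, limit=ASSESSABLE_LIMIT):
    """Summarize one parsed framework and flag it when it exceeds the limit."""
    n_assessable = parsed["n_assessable"]
    return {
        "n_assessable": n_assessable,
        "n_sections": parsed["n_sections"],
        "too_large": n_assessable > limit,
        # Assumption: "warnings" may be absent when the parser found nothing.
        "warnings": list(parsed.get("warnings", [])),
    }
```

Running size_report(load_parsed("/tmp/src.json")) and the same for /tmp/tgt.json gives the numbers to confirm with the user; if too_large is true on either side, stop and recommend the local CLI.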
Step 3 — section-affinity narrowing (DO NOT SKIP)
Read the sections array from each parsed JSON (typically 5-30 entries per side, so cheap to consider in full).
If parse warned about coarse sections, use cat_slice.py to list categories on both sides instead:
.venv/bin/python .claude/skills/mapping-builder/scripts/cat_slice.py /tmp/src.json
.venv/bin/python .claude/skills/mapping-builder/scripts/cat_slice.py /tmp/tgt.json
Then build affinity at the Category level (e.g. ID.AM → ID.AM + GV.OC + GV.RR) rather than at the Function level. NIST CSF 2.0 restructuring moves many CSF 1.1 Categories across Functions (e.g. ID.GV → GV.PO/GV.RR, PR.IP → PR.PS/PR.IR/ID.IM), so affinity must operate at that granularity.
Build a section affinity table by reasoning over section names alone — for each source section, identify the 1-3 target sections most likely to contain real mappings:
SOURCE section ID.AM (Asset Management)
→ likely targets: ID.AM (direct), GV.OC (governance overlap)
SOURCE section PR.AC (Access Control)
→ likely targets: PR.AA (Access Authentication)
... etc
Print this affinity table to the user and ask them to sanity-check it. If they correct anything, update the table before proceeding. This is where most recall is won or lost: embedders systematically over-match on shared jargon, and reading section names is more discriminating.
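One convenient way to hold the affinity table while working is a plain dict from source section to a list of target sections. This is only an illustrative sketch: format_affinity and out_of_range_rows are hypothetical names, and the 1-3 bound is the guideline from this step, not a platform rule.

```python
def format_affinity(affinity):
    """Render the affinity dict in the layout used when showing it to the user."""
    lines = []
    for source, targets in affinity.items():
        lines.append(f"SOURCE section {source}")
        lines.append("  -> likely targets: " + ", ".join(targets))
    return "\n".join(lines)


def out_of_range_rows(affinity, max_targets=3):
    """Source sections whose target list is empty or longer than max_targets."""
    return [s for s, targets in affinity.items()
            if not 1 <= len(targets) <= max_targets]


affinity = {
    "ID.AM": ["ID.AM", "GV.OC"],
    "PR.AC": ["PR.AA"],
}
print(format_affinity(affinity))
```

Rows flagged by out_of_range_rows are worth a second look before Step 4: an empty list usually means a section was forgotten, and a long list defeats the purpose of narrowing.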
Step 4 — per-section deep dive
For each (source_section, [target_sections]) entry from Step 3, do the following in order:
- Pull the source items that belong to source_section and the target items that belong to any of the target_sections. With section narrowing, this is typically 5-20 sources × 5-30 targets = 25-600 candidate pairs per section.
- Read each source item's full_sentence, then read all candidate targets' full_sentence.
- For each source, decide which targets are real mappings. Output a JSON list of verdicts:
{
"source_requirement_urn": "...",
"target_requirement_urn": "...",
"relationship": "equal | intersect | subset | superset",
"strength_of_relationship": 0-10,
"rationale": "one-sentence why"
}
Affinity is a starting heuristic, not a fence. Expect to expand target categories mid-section as you encounter obvious cross-matches. For example, while mapping ID.AM in CCB CFF you'll find ID.AM-6.1 (roles/responsibilities) clearly belongs with GV.RR in CSF 2.0, even if you didn't list GV.RR as a Step-3 target for ID.AM. When this happens, pull the extra target category via cat_slice.py and continue. Don't force a bad match just because affinity said only one target. Mention the expansion to the user so they can sanity-check.
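The candidate pool for one affinity entry can be sketched as a filtered cross product. This is an illustration under assumptions: only full_sentence is named in this document, so the per-item keys "section" and "urn" used here are hypothetical and should be checked against the actual output of parse_framework.py.

```python
def candidate_pairs(src_items, tgt_items, source_section, target_sections):
    """All (source, target) item pairs for one affinity entry.

    Assumption: each item is a dict with a "section" key naming its section.
    """
    wanted = set(target_sections)
    sources = [i for i in src_items if i["section"] == source_section]
    targets = [i for i in tgt_items if i["section"] in wanted]
    return [(s, t) for s in sources for t in targets]


src_items = [
    {"urn": "urn:example:src:1", "section": "ID.AM", "full_sentence": "..."},
    {"urn": "urn:example:src:2", "section": "PR.AC", "full_sentence": "..."},
]
tgt_items = [
    {"urn": "urn:example:tgt:1", "section": "ID.AM", "full_sentence": "..."},
    {"urn": "urn:example:tgt:2", "section": "GV.OC", "full_sentence": "..."},
    {"urn": "urn:example:tgt:3", "section": "PR.AA", "full_sentence": "..."},
]
pairs = candidate_pairs(src_items, tgt_items, "ID.AM", ["ID.AM", "GV.OC"])
print(len(pairs))  # 1 source x 2 targets
```

Expanding affinity mid-section then amounts to calling the function again with the extra target section appended to the list.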
Relationship rubric (matches the platform enum):
- equal: same obligation, interchangeable for audit. Reverse maps to equal.
- intersect: meaningful overlap, neither covers the other. Reverse maps to intersect.
- subset: source is contained within target's scope. Reverse maps to superset.
- superset: source is broader than target. Reverse maps to subset.
- (not_related is never emitted; just omit the pair from the verdicts.)
strength_of_relationship is a 0-10 confidence in the existence of the mapping (NOT in the equal/intersect label).
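A quick sanity check on each verdict before it joins the running list could look like the sketch below. verdict_problems is a hypothetical helper, and it assumes strength_of_relationship is an integer, which the 0-10 notation suggests but does not state outright.

```python
RELATIONSHIPS = {"equal", "intersect", "subset", "superset"}
REQUIRED_FIELDS = (
    "source_requirement_urn",
    "target_requirement_urn",
    "relationship",
    "strength_of_relationship",
    "rationale",
)


def verdict_problems(verdict):
    """Return a list of human-readable problems; empty means the verdict looks fine."""
    problems = [f"missing field: {k}" for k in REQUIRED_FIELDS if k not in verdict]
    rel = verdict.get("relationship")
    if rel is not None and rel not in RELATIONSHIPS:
        problems.append(f"invalid relationship: {rel}")
    strength = verdict.get("strength_of_relationship")
    if strength is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(strength, int) and not isinstance(strength, bool)
        if not (is_int and 0 <= strength <= 10):
            problems.append(f"strength out of range: {strength}")
    return problems
```

A verdict carrying not_related is reported as an invalid relationship, which matches the rule that such pairs are simply omitted.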
Append verdicts to a running verdicts list. Persist the running list to /tmp/<mapping_name>_verdicts.jsonl after each section so progress is recoverable.
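Persisting after each section can be as simple as appending one JSON object per line. The sketch below uses only the standard library; append_verdicts and load_verdicts are hypothetical helper names, and the one-object-per-line layout is an assumption implied by the .jsonl extension.

```python
import json


def append_verdicts(path, verdicts):
    """Append verdicts to a JSON Lines file, one object per line."""
    with open(path, "a", encoding="utf-8") as fh:
        for verdict in verdicts:
            fh.write(json.dumps(verdict, ensure_ascii=False) + "\n")


def load_verdicts(path):
    """Read the running list back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending rather than rewriting means an interrupted session loses at most the section in progress; load_verdicts restores everything written so far.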
Step 5 — borderline review with the user
Before emitting the YAML, run the audit script on the running verdicts jsonl:
.venv/bin/python .claude/skills/mapping-builder/scripts/audit_verdicts.py /tmp/src.json /tmp/<name>_verdicts.jsonl --threshold 6
It reports coverage %, unmapped source items (split into empty-description YAML bugs vs. real gaps), relationship/strength distributions, and every verdict ≤ threshold.
Surface ambiguous calls to the user. Specifically:
- All verdicts with strength_of_relationship ≤ 6 (listed by the audit)
- Pairs where the equal vs intersect call could go either way
- Source items that ended up with zero verdicts (might indicate a missed section pairing) — the audit separates these from YAML-bug unmapped items
Show source/target text side-by-side and your rationale. Let the user override relationship, override strength_of_relationship, or drop the pair entirely. Apply edits to the running verdicts list.
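The review loop over the running verdicts list can be sketched as follows. These helpers are hypothetical and do not replace audit_verdicts.py; they only show how the borderline set, the zero-verdict sources and a user override relate to the verdict fields defined in Step 4.

```python
def borderline(verdicts, threshold=6):
    """Verdicts whose strength is at or below the review threshold."""
    return [v for v in verdicts if v["strength_of_relationship"] <= threshold]


def sources_without_verdicts(source_urns, verdicts):
    """Source requirement URNs that never appear as a verdict source."""
    mapped = {v["source_requirement_urn"] for v in verdicts}
    return [urn for urn in source_urns if urn not in mapped]


def apply_review(verdicts, source_urn, target_urn,
                 relationship=None, strength=None, drop=False):
    """Return a new list with one pair overridden or dropped."""
    result = []
    for v in verdicts:
        key = (v["source_requirement_urn"], v["target_requirement_urn"])
        if key != (source_urn, target_urn):
            result.append(v)
        elif not drop:
            updated = dict(v)  # copy so the original verdict is untouched
            if relationship is not None:
                updated["relationship"] = relationship
            if strength is not None:
                updated["strength_of_relationship"] = strength
            result.append(updated)
    return result
```

Returning a new list keeps the pre-review verdicts available, which helps if the user changes their mind about an override.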
Step 6 — emit YAML + review file
Build the spec.json:
{
"ref_id": f"mapping-{source_ref_id_lower}-and-{target_ref_id_lower}",
"name": f"{source_framework_name} <-> {target_framework_name}",
"description": "...",
"version": 1,
"publication_date": today_iso,
"copyright": "...",
"provider": "...",
"packager": "...",
"source_library_urn": <from src.json["library_urn"]>,
"source_framework_urn": <from src.json["framework_urn"]>,
"target_library_urn": <from tgt.json["library_urn"]>,
"target_framework_urn": <from tgt.json["framework_urn"]>,
"verdicts": [...]
}
Pull the URNs straight from the parsed JSON — do NOT construct them by hand. The library URN slug is not always the same as the framework ref_id (e.g. cyfun2025 → urn:intuitem:risk:library:ccb-cyfun2025).
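The spec template can be assembled in Python along these lines. This is a sketch, not the scripts' own code: build_spec is a hypothetical helper, the ref_ids and display names are passed in explicitly because this document does not say where the parsed JSON stores them, and the exact field shape should still be checked against the script docstrings.

```python
import json
from datetime import date


def build_spec(src, tgt, source_ref_id, target_ref_id,
               source_name, target_name, verdicts, **meta):
    """Assemble the spec dict; URNs are copied from the parsed JSONs, never built."""
    return {
        "ref_id": f"mapping-{source_ref_id.lower()}-and-{target_ref_id.lower()}",
        "name": f"{source_name} <-> {target_name}",
        "description": meta.get("description", ""),
        "version": 1,
        "publication_date": date.today().isoformat(),
        "copyright": meta.get("copyright", ""),
        "provider": meta.get("provider", ""),
        "packager": meta.get("packager", ""),
        "source_library_urn": src["library_urn"],
        "source_framework_urn": src["framework_urn"],
        "target_library_urn": tgt["library_urn"],
        "target_framework_urn": tgt["framework_urn"],
        "verdicts": list(verdicts),
    }


def write_spec(spec, path):
    """Write the spec as JSON for the writer scripts to consume."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(spec, fh, ensure_ascii=False, indent=2)
```

Indexing src and tgt directly (rather than with .get) makes a missing URN fail loudly instead of producing a spec with an empty dependency.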
Then run:
.venv/bin/python .claude/skills/mapping-builder/scripts/write_mapping_yaml.py \
/tmp/spec.json \
backend/library/libraries/mapping-<source>-and-<target>.yaml
.venv/bin/python .claude/skills/mapping-builder/scripts/write_review.py \
/tmp/spec.json /tmp/src.json /tmp/tgt.json \
/tmp/mapping-<source>-and-<target>_review.xlsx
.venv/bin/python .claude/skills/mapping-builder/scripts/render_html.py \
/tmp/spec.json /tmp/src.json /tmp/tgt.json \
/tmp/mapping-<source>-and-<target>_preview.html
The mapping YAML automatically contains both the forward AND reverse mapping sets (subset/superset are auto-flipped on the reverse).
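The reverse-set rule from the relationship rubric can be illustrated with a small sketch. It shows the flip only and is not the code of write_mapping_yaml.py; reverse_verdict is a hypothetical name.

```python
# Reverse label for each forward label, per the relationship rubric.
REVERSE_RELATIONSHIP = {
    "equal": "equal",
    "intersect": "intersect",
    "subset": "superset",
    "superset": "subset",
}


def reverse_verdict(verdict):
    """Swap source and target and flip the relationship label."""
    flipped = dict(verdict)
    flipped["source_requirement_urn"] = verdict["target_requirement_urn"]
    flipped["target_requirement_urn"] = verdict["source_requirement_urn"]
    flipped["relationship"] = REVERSE_RELATIONSHIP[verdict["relationship"]]
    return flipped
```

Applying reverse_verdict twice returns the original verdict, which is a handy property to check if the reverse set is ever produced by other means.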
The HTML preview is self-contained (embedded JSON, vanilla JS, no network) — open it in a browser to inspect the mapping. It shows a Category-level heatmap at the top (source categories × target categories with per-cell counts, click to filter) and a filterable pair list below. Categories are bucketed via the same regex as cat_slice.py (NIST CSF, ISO Annex A, numeric prefix), with section names shown on hover.
Step 7 — final summary, no auto-PR
Report to the user:
- Total mappings produced (forward; the reverse is auto-derived)
- Distribution by relationship type
- Path to the YAML library file
- Path to the review xlsx
- Path to the HTML preview
- Suggested commit message
If a prior published mapping existed (flagged in Step 1), run diff_mappings.py against it and report the shared/only-in-A/only-in-B breakdown plus label disagreements. This is the most useful evaluation signal: pairs only in the published version may point to secondary intersects the rebuild missed; pairs only in the rebuild may point to real gains worth retaining.
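The breakdown described here boils down to set operations over (source, target) pairs. The sketch below illustrates that idea on two verdict lists; diff_pairs is a hypothetical function and makes no claim about how diff_mappings.py is implemented.

```python
def diff_pairs(a, b):
    """Compare two verdict lists keyed by (source URN, target URN)."""
    index_a = {(v["source_requirement_urn"], v["target_requirement_urn"]): v["relationship"]
               for v in a}
    index_b = {(v["source_requirement_urn"], v["target_requirement_urn"]): v["relationship"]
               for v in b}
    shared = index_a.keys() & index_b.keys()
    return {
        "shared": sorted(shared),
        "only_a": sorted(index_a.keys() - index_b.keys()),
        "only_b": sorted(index_b.keys() - index_a.keys()),
        # Same pair in both, but a different relationship label.
        "label_disagreements": sorted(k for k in shared if index_a[k] != index_b[k]),
    }
```

With the published mapping as A and the rebuild as B, only_a lists pairs the rebuild may have missed and only_b lists potential gains.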
Do NOT auto-commit or auto-push. The user is putting their name on a community contribution; they confirm before pushing. If they ask, run git checkout -b <branch> and stage the new YAML.
Output schema reference
The emitted YAML must match this top-level shape exactly (the write_mapping_yaml.py script enforces it):
urn: urn:intuitem:risk:library:<ref_id>
locale: en
ref_id: <ref_id>
name: <name>
description: <description>
copyright: ...
version: 1
publication_date: YYYY-MM-DD
provider: ...
packager: ...
dependencies:
- <source library URN>
- <target library URN>
objects:
requirement_mapping_sets:
- urn: urn:intuitem:risk:req_mapping_set:<ref_id>
ref_id: <ref_id>
name: <name>
description: ...
source_framework_urn: <source framework URN>
target_framework_urn: <target framework URN>
requirement_mappings:
- source_requirement_urn: ...
target_requirement_urn: ...
relationship: equal | intersect | subset | superset
strength_of_relationship: 0-10
rationale: ...
- ...
Look at any existing backend/library/libraries/mapping-*.yaml for a real example.
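A lightweight shape check on an already-loaded library (for example, a dict obtained from yaml.safe_load) might look like the sketch below. shape_problems is a hypothetical helper covering only the top-level keys, the library URN convention and the two-set expectation described in this document; it is not a substitute for the platform's loader.

```python
TOP_LEVEL_KEYS = (
    "urn", "locale", "ref_id", "name", "description", "copyright",
    "version", "publication_date", "provider", "packager",
    "dependencies", "objects",
)


def shape_problems(library):
    """Return a list of shape problems found in a loaded mapping library dict."""
    problems = [f"missing top-level key: {k}" for k in TOP_LEVEL_KEYS if k not in library]
    expected_urn = f"urn:intuitem:risk:library:{library.get('ref_id')}"
    if "urn" in library and library["urn"] != expected_urn:
        problems.append("urn does not match ref_id")
    objects = library.get("objects")
    sets = objects.get("requirement_mapping_sets", []) if isinstance(objects, dict) else []
    if len(sets) != 2:  # forward + reverse
        problems.append(f"expected 2 requirement_mapping_sets, found {len(sets)}")
    return problems
```

An empty result only means the outline is right; the individual requirement_mappings entries still need the per-verdict checks from Step 4.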
Common mistakes to avoid
- Skipping section affinity and trying to consider source × target pairs blindly — context fills up and recall drops.
- Constructing URNs by string concatenation instead of pulling them from the parsed framework JSONs. Library and framework URN slugs are not always equal to ref_id.
- Only emitting the forward mapping set. The platform expects two sets in the library (forward + reverse). The write_mapping_yaml.py script generates the reverse automatically, so don't bypass it.
- Forgetting the subset ↔ superset flip on the reverse set. The script handles this; if you write the YAML by hand, you are likely to get it wrong.
- Auto-committing the result. This is a community contribution; the user always confirms.
- Using not_related as a relationship value. It's not in the published schema. Just omit pairs that aren't real mappings.
Quick reference
| File | Purpose |
|---|---|
| scripts/parse_framework.py | Framework YAML → JSON with items grouped by section; emits warnings for empty-description items and coarse-section frameworks |
| scripts/cat_slice.py | Slice parsed items by Category (e.g. ID.AM, PR.AC); summary mode or filter mode |
| scripts/write_mapping_yaml.py | Verdicts spec → published-format library YAML (forward + reverse) |
| scripts/write_review.py | Verdicts spec + parsed frameworks → xlsx (or csv fallback) for human audit |
| scripts/audit_verdicts.py | Verdicts jsonl (or mapping YAML) + parsed source → coverage %, unmapped items, low-strength verdicts |
| scripts/diff_mappings.py | Compare two mapping YAMLs (or YAML vs jsonl): shared / only-A / only-B / label disagreements |
| scripts/render_html.py | Standalone HTML preview: category heatmap + filterable pair list (no network deps) |