---
name: context-budget-audit
description: Audit prompt sizes across build orchestration dirs. Surfaces sections that dominate the prompt (>40%), tracks per-phase budget against the model's context window, and checks whether directives in the prompt were actually covered in the response.
argument-hint: <track>/<slug> | <track> <slug> | <track> <range> | <track>
---
# Context Budget Audit: $ARGUMENTS
## Why
When a build produces low scores, we usually discover only after the fact that the wiki packet made up 62.5% of the write prompt and drowned out everything else. This skill catches that pattern BEFORE it wastes a build, or explains a bad result AFTER the fact when you're debugging.
The v6 write prompt has historically grown without anyone noticing. The first real data point: #1070 surfaced a 33K-char write prompt that was causing Gemini to ignore MCP tool instructions. The Gemini session analyzer (#1174, `scripts/build/session_analysis.py`) now emits a `session-analysis.yaml` next to every dispatch meta file. This skill is the human-readable interpretation layer on top of those raw files.
## Parse Arguments
The user provides one of these patterns:
- Full path: `curriculum/l2-uk-en/a2/orchestration/a2-bridge/`
- Track + slug: `a2 a2-bridge`
- Track + range: `a2 1-10` (audit a range of modules in sequence)
- Level only: `a2` (audit every module with a non-empty orchestration dir)
Resolve the path(s) and process each in turn.
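A minimal sketch of this resolution step, assuming the repo layout from the full-path example above. `resolve_targets` is a hypothetical helper, and mapping a numeric range onto sorted orchestration dirs is an assumption, not an existing convention:

```python
from pathlib import Path

REPO = Path("curriculum/l2-uk-en")  # assumed repo-relative root, per the full-path example

def resolve_targets(args: str) -> list[Path]:
    """Map the argument string to one or more orchestration dirs (hypothetical helper)."""
    parts = args.split()
    if len(parts) == 1 and "/" in parts[0]:
        return [Path(parts[0])]                                    # full path
    if len(parts) == 1:
        track_dir = REPO / parts[0] / "orchestration"              # level only
        return sorted(p for p in track_dir.iterdir() if p.is_dir() and any(p.iterdir()))
    track, rest = parts
    if "-" in rest and rest.replace("-", "").isdigit():
        lo, hi = map(int, rest.split("-"))                         # track + range, e.g. "a2 1-10"
        dirs = sorted(d for d in (REPO / track / "orchestration").iterdir() if d.is_dir())
        return dirs[lo - 1:hi]                                     # assumes position == module number
    return [REPO / track / "orchestration" / rest]                 # track + slug
```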
## Execute
For each orchestration dir, walk `dispatch/` and load every `*-session-analysis.yaml` file. Each file has this shape:
```yaml
session_path: /Users/.../session-2026-04-10T21-23-efd64bd5.json
phase: write-chunk-02
prompt_chars: 41214
response_chars: 8695
prompt_words: 5683
response_words: 1442
sections:
  - name: header
    length: 489
    fraction: 0.0119
  - name: skeleton
    length: 1535
    fraction: 0.0372
  - name: plan
    length: 9991
    fraction: 0.2424
  - name: wiki
    length: 25780
    fraction: 0.6254
  - name: output_spec
    length: 128
    fraction: 0.0031
large_sections:
  - wiki
directives_total: 10
directives_covered: 6
directives_missed:
  - kind: must
    text: Each section MUST contain at least one callout.
    covered: false
    coverage_note: "1/4 keywords (25%)"
```
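A minimal sketch of the loading step, assuming PyYAML is available and the files match the shape above; `load_session_analyses` is an illustrative name, not an existing function in `scripts/build/`:

```python
from pathlib import Path

import yaml  # PyYAML

def load_session_analyses(orch_dir: Path) -> list[dict]:
    """Load every *-session-analysis.yaml under dispatch/ in a stable order (sketch)."""
    analyses = []
    for path in sorted((orch_dir / "dispatch").glob("*-session-analysis.yaml")):
        data = yaml.safe_load(path.read_text())
        data["_source"] = str(path)  # keep provenance so report rows can link back to the file
        analyses.append(data)
    return analyses
```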
Aggregate this data across phases and produce a report with four core sections, plus an optional fifth for trend analysis (a sketch of the aggregation logic follows the list):
1. Budget overview per phase
A table: phase → prompt_chars → response_chars → compression ratio (prompt_chars / response_chars) → flagged (yes if the phase has any large section).
2. Oversized sections (> 40% threshold)
List every section whose fraction exceeds 0.40; these are the candidates for drowning out the rest of the prompt. Group by phase and section name, e.g.:
- `write-chunk-02`: `wiki` at 62.5% (25.7K of 41.2K). Recommendation: chunk the wiki packet or reduce the `_SECTION_MARKERS` trim window in `scripts/build/session_analysis.py`.
- `write`: `plan` at 48% (19K of 40K). Recommendation: trim the plan `content_outline` in the writer prompt template (the plan belongs in the check phase, not the write phase).
3. Directive coverage
Cross-phase average of `directives_covered / directives_total`. List the top 5 most-frequently-missed directives (normalize by text prefix) so the user can decide if the directive is worth keeping or rephrasing.
4. Context window headroom
For each phase, compute `prompt_chars / model_context_chars`. Model context windows (as of 2026-04):

| Model | Context (chars, approx. 4 chars/token) |
|---|---|
| gemini-3.1-pro-preview | 4,000,000 (1M tokens) |
| gemini-3-flash | 4,000,000 |
| claude-opus-4-6 | 800,000 (200K tokens) |
| gpt-5.5 (Codex) | 1,000,000 |
Flag any phase where prompt_chars exceeds 50% of the model's window; that is a hard rule for this skill. (For current pipelines that threshold is roughly 400K–2M chars depending on the model, and it should NEVER be hit by a single v6 phase. If it is, something is badly wrong.)
5. Trend analysis (optional)
If there are multiple orchestration dirs for the same slug (from repeated builds / retries), compare prompt sizes across attempts. If the prompt is GROWING across retries (e.g. due to correction directives being appended), flag it.
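A minimal sketch of the aggregation behind sections 1–4, reusing the loader above. The 0.40 and 0.50 thresholds come from this skill; the helper names and the prefix length used to normalize directives are illustrative assumptions:

```python
MODEL_CONTEXT_CHARS = {  # from the context-window table above (approx. 4 chars per token)
    "gemini-3.1-pro-preview": 4_000_000,
    "gemini-3-flash": 4_000_000,
    "claude-opus-4-6": 800_000,
    "gpt-5.5": 1_000_000,
}

def audit_phase(analysis: dict, model: str) -> dict:
    """Per-phase numbers for the budget overview, oversized sections, and headroom check."""
    oversized = [s for s in analysis.get("sections", []) if s["fraction"] > 0.40]
    window = MODEL_CONTEXT_CHARS[model]
    return {
        "phase": analysis["phase"],
        "prompt_chars": analysis["prompt_chars"],
        "response_chars": analysis["response_chars"],
        "compression": analysis["prompt_chars"] / max(analysis["response_chars"], 1),
        "oversized": oversized,                                    # drowning-out candidates
        "window_used": analysis["prompt_chars"] / window,
        "window_flag": analysis["prompt_chars"] > 0.50 * window,   # hard rule: never exceed 50%
    }

def directive_coverage(analyses: list[dict]) -> tuple[float, list[tuple[str, int]]]:
    """Cross-phase coverage ratio plus the top 5 missed directives, normalized by text prefix."""
    covered = sum(a.get("directives_covered", 0) for a in analyses)
    total = sum(a.get("directives_total", 0) for a in analyses)
    misses: dict[str, int] = {}
    for a in analyses:
        for d in a.get("directives_missed", []):
            key = d["text"][:60]                                   # assumed prefix length
            misses[key] = misses.get(key, 0) + 1
    top5 = sorted(misses.items(), key=lambda kv: kv[1], reverse=True)[:5]
    return (covered / total if total else 1.0), top5
```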
## Output
Save the report to `curriculum/l2-uk-en/{track}/audit/{slug}-context-budget.md`.
Format:
# Context Budget Audit: {slug}
**Track**: {track}
**Audited at**: {ISO timestamp}
**Orchestration dir**: {absolute path}
## Phase overview
| Phase | Prompt (chars) | Response (chars) | Compression | Flagged |
|---|---:|---:|---:|:-:|
| check | 1234 | 456 | 2.7x | |
| research | ... |
| skeleton | ... |
| write-chunk-01 | ... |
| write-chunk-02 | 41214 | 8695 | 4.7x | ⚠️ wiki 62.5% |
| review | ... |
## Oversized sections (> 40% of prompt)
- **write-chunk-02**: `wiki` at 62.5% (25,780 / 41,214 chars)
  - Recommendation: cap the wiki packet at 20K chars in `_build_chunk_prompt`, or move wiki to a separate enrich phase after write.
## Directive coverage
Average: 6 / 10 (60%) — this is low. Target is ≥85%.
Top 5 missed directives (across all phases):
1. `Each section MUST contain at least one callout.` — missed in 3 of 4 chunks
2. `<!-- INJECT_ACTIVITY: quiz, Case Drill, 8 items -->` — missed in write-chunk-01
...
## Context window headroom
| Phase | Prompt | Model | Window | Used |
|---|---:|---|---:|---:|
| write-chunk-02 | 41K | gemini-3.1-pro-preview | 4M | 1.0% |
No phase exceeds the 50% threshold. Good.
## Recommendations
1. Cap the wiki packet at 20K chars in `_build_chunk_prompt` (drops the prompt by roughly 14%).
2. Investigate why the callout directive is being ignored — it's explicit in the prompt but gets 25% keyword coverage.
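A minimal sketch of the save step, assuming the per-phase rows produced by `audit_phase` above; only the phase-overview table is rendered here, and the path follows the convention stated in this section:

```python
from datetime import datetime, timezone
from pathlib import Path

def write_report(track: str, slug: str, orch_dir: Path, rows: list[dict]) -> Path:
    """Render the phase overview and save the audit report (sketch of the format above)."""
    out = Path(f"curriculum/l2-uk-en/{track}/audit/{slug}-context-budget.md")
    out.parent.mkdir(parents=True, exist_ok=True)
    lines = [
        f"# Context Budget Audit: {slug}",
        f"**Track**: {track}",
        f"**Audited at**: {datetime.now(timezone.utc).isoformat()}",
        f"**Orchestration dir**: {orch_dir.resolve()}",
        "",
        "## Phase overview",
        "| Phase | Prompt (chars) | Response (chars) | Compression | Flagged |",
        "|---|---:|---:|---:|:-:|",
    ]
    for r in rows:
        flag = ", ".join(f"⚠️ {s['name']} {s['fraction']:.1%}" for s in r["oversized"])
        lines.append(
            f"| {r['phase']} | {r['prompt_chars']} | {r['response_chars']} "
            f"| {r['compression']:.1f}x | {flag} |"
        )
    out.write_text("\n".join(lines) + "\n")
    return out
```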
## Batch mode
When given a range or a whole level, also produce `curriculum/l2-uk-en/{track}/audit/context-budget-summary.md`:
- Per-phase averages (mean prompt_chars, mean directive coverage)
- Top 10 modules with the largest prompts
- Top 10 modules with the lowest directive coverage
- Cross-module patterns: directives missed in >50% of modules (these need prompt engineering, not per-module fixes)
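A minimal sketch of the cross-module pattern check from the last bullet, assuming per-module analyses keyed by slug; the 50% cut-off mirrors the rule stated here, and the prefix normalization matches the directive-coverage sketch above:

```python
def cross_module_missed(per_module: dict[str, list[dict]]) -> list[tuple[str, float]]:
    """Directives missed in more than 50% of modules, most widespread first (sketch)."""
    module_count = len(per_module)
    seen_in: dict[str, set[str]] = {}
    for slug, analyses in per_module.items():
        for a in analyses:
            for d in a.get("directives_missed", []):
                seen_in.setdefault(d["text"][:60], set()).add(slug)
    flagged = [
        (text, len(slugs) / module_count)
        for text, slugs in seen_in.items()
        if len(slugs) / module_count > 0.50
    ]
    return sorted(flagged, key=lambda kv: kv[1], reverse=True)
```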
## Source of truth
All data comes from `*-session-analysis.yaml` files emitted by `_save_dispatch_log` in `scripts/build/dispatch.py`. The analysis logic lives in `scripts/build/session_analysis.py` and the parser in `scripts/build/gemini_session.py`. If a phase has no `*-session-analysis.yaml` file, skip it silently; that phase either pre-dates #1174 or used a non-Gemini agent (Codex/Claude session parsers are follow-up work).
Reference issues: #1076 (this skill), #1174 (the underlying analyzer), #1070 (the original 33K-prompt incident that motivated both).