| name | mark-student-work-multi-agent-v2 |
| description | Orchestrates a multi-agent workflow to mark a student's completion PDF (either against an answer key or from teacher annotations). Uses the `Task` tool to spawn isolated subagents for structural mapping, parallel transcription/grading, and taxonomy tagging. Outputs a canonical JSON marking artifact and a derived markdown learning report. Use when the user asks to mark, grade, or diagnose a student's work using the multi-agent architecture. |
Multi-Agent Student Work Marking Orchestrator
This skill acts as the Orchestrator for a Hierarchical Multi-Agent System. Do not attempt to read the attempt images, grade the questions, or assign skill tags yourself. Your job is to resolve the context, spawn specialized subagents using the Task tool, assemble their outputs, and write the final artifacts.
Canonical Contract Policy (mandatory)
The final artifact must conform to the currently supported marking_result schema contract enforced by ai_study_buddy.marking:
- The authoritative "latest supported" pointer is code, not a magic string. Use `ai_study_buddy.marking.core.artifact_schema`:
  - `SCHEMA_VERSION` (its string value is the only supported marking_result version at runtime today)
  - `DEFAULT_MARKING_RESULT_VERSION` (currently identical to `SCHEMA_VERSION`; exists for callers that want an explicit knob)
  - `SUPPORTED_SCHEMA_VERSIONS` / `SCHEMA_PATHS_BY_VERSION` (explicit allowlist plus on-disk schema path wiring)
- Prefer reading these constants directly from source instead of pinning a numbered version literal in prose that will drift.
- `"latest"` is intentionally unsupported for persisted JSON and for `write_marking_artifact(..., schema_version=...)` payloads (pass explicit `None`/`DEFAULT_MARKING_RESULT_VERSION`, not `"latest"`).
- Persisted JSON must keep `schema_version` exactly equal to whatever `SCHEMA_VERSION` resolves to at write time (`validate_marking_artifact_dict` rejects anything outside `SUPPORTED_SCHEMA_VERSIONS`).
- The contract is closed (`additionalProperties: false` on the top level and key nested objects). Do not add ad-hoc fields.
Before writing final JSON:
- Ensure all assembled fields are schema-valid.
- Run `validate_marking_artifact_dict(payload)` and treat any failure as a hard stop.
- If any field is not part of the schema, remove it or convert it to an approved schema field before finalizing.
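The pre-write gate above can be sketched as follows. This is an illustrative stand-in, not the real implementation: `ALLOWED_FIELDS`, `prune_to_schema`, and `finalize_or_fail` are hypothetical names, and the actual check is `validate_marking_artifact_dict` from `ai_study_buddy.marking`.

```python
# Hypothetical sketch of the pre-write gate: strip unknown top-level fields
# (closed contract), then hard-stop on validation failure. The validator is
# injected here so the sketch stays self-contained.
ALLOWED_FIELDS = {"schema_version", "context", "question_results", "summary", "generation"}

def prune_to_schema(payload: dict) -> dict:
    """Drop any top-level key the closed contract does not allow."""
    return {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}

def finalize_or_fail(payload: dict, validate) -> dict:
    """Run the validator and treat any failure as a hard stop."""
    pruned = prune_to_schema(payload)
    errors = validate(pruned)  # in this sketch: a list of error strings
    if errors:
        raise RuntimeError(f"marking artifact failed validation: {errors}")
    return pruned
```

In the real flow the orchestrator would pass `validate_marking_artifact_dict` (adapted to this signature) rather than a stub.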
Persistence Boundary Policy (mandatory)
Treat marking/review package APIs as the only supported write boundary:
- Persist final marking JSON via `write_marking_artifact(...)`; do not write directly into `context/marking_results/...` with ad-hoc file I/O.
- Persist review state/amendment JSON via `StudentReviewRepository.save_review_state(...)` and `StudentReviewRepository.save_amendment(...)`.
- Do not treat filesystem glob scans as authoritative lookup for latest attempts or review records during orchestration.
- For source-of-truth lookups, use resolver/repository utilities (`resolve_marking_context(...)`, `find_marking_artifacts_for_attempt(...)`, and repository read helpers) instead of path heuristics.
Rationale: dual-write and operation-log coverage are attached to these APIs; bypassing them weakens auditability and rollback guarantees.
Language Policy (mandatory)
Language consistency is a hard quality gate for all phases that emit free-text fields:
- For `subject_context` in {singapore_primary_math, singapore_primary_science, singapore_primary_english}:
  - Require English-only free text in agent outputs (`student_answer`, `correct_answer`, `diagnosis.reasoning`, `human_note`, and other narrative fields).
- For `subject_context` in {singapore_primary_chinese, singapore_primary_higher_chinese}:
  - Allow Chinese in `diagnosis.reasoning` (and optional explanatory notes), but keep taxonomy keys (`mistake_type`, `error_tags`) in English enums.
- Always pass explicit language instructions in every Phase 2 and Phase 3 Task prompt:
  - `subject_context=<...>`
  - `required_output_language=english|chinese`
  - `language_policy=<one-line rule>`
If a subagent output violates language policy, treat it as malformed output and retry that subagent with explicit correction instructions. Do not proceed with mixed-language payloads.
Human Note Policy (mandatory)
human_note is a reserved transcription field, not a free-form AI commentary field.
- Populate `human_note` only when there is clear human-written annotation distinct from the student's workings for that question (typically red/purple teacher/parent/tutor writing).
- `human_note` content must be a strict transcription (verbatim, or as close as legibility allows) of that human annotation.
- Do not write AI summaries, inferred mistake explanations, or paraphrased diagnosis into `human_note`.
- If there is no clear human annotation for that question, set `human_note = null`.
- If an annotation exists but is not fully legible, use a minimal transcription with an explicit uncertainty marker (for example `[illegible annotation]`) rather than fabricating text.
- Put student-focused pedagogical interpretation (why the answer is wrong or partial, or what misconception applies) in `diagnosis.reasoning`, never in `human_note`. Do not use `diagnosis.reasoning` for grading provenance, teacher-mark narration, or orchestration/meta text ("teacher tick", "margin shows", "in teacher-annotated mode...").
CRITICAL ORCHESTRATOR BOUNDARY:
You are the Orchestrator. You manage the workflow, but you MUST NEVER perform the grading, transcription, or tagging tasks yourself.
If a subagent fails, times out, or returns malformed data, you MUST either:
- Retry launching the subagent for that specific task.
- Stop the workflow and report the error to the user.
Under NO circumstances should you attempt to "fill in the blanks" or grade the remaining questions yourself. If you do, you will hallucinate and corrupt the final artifact.
1. Resolve the Marking Context
Before spawning any subagents, you must resolve the context using PdfFileManager (or ai_study_buddy.marking.resolve_marking_context(...)).
Resolver-only contract: do not manually assemble canonical context fields for persisted artifacts. Use resolver-produced context (including context_resolution provenance) as the source of truth, then pass it through to write_marking_artifact(...).
Determine:
- The student attempt PDF path.
- The template PDF path (if applicable).
- The answer PDF path and page range (if applicable).
- The workflow mode:
- Mode A (Standard): An answer key is available (either mapped or embedded).
- Mode B (Teacher-Annotated): No answer key is available; the attempt is annotated by a teacher.
Render the required pages into the standard Marking Asset Bundle (context.marking_asset) under attempt/page-{nn}.png and answers/page-{nn}.png (if applicable).
Preferred package entrypoints:
- `ai_study_buddy.marking.render_attempt_pdf_to_bundle(...)`
- `ai_study_buddy.marking.render_answers_pdf_pages_to_bundle(...)`

Example:

```python
from ai_study_buddy.marking import render_answers_pdf_pages_to_bundle, render_attempt_pdf_to_bundle

# Render all student attempt pages into attempt/page-{nn}.png
render_attempt_pdf_to_bundle(input_attempt_pdf, bundle_root, dpi_scale=2.0)
# Render only the relevant answer-key pages into answers/page-{nn}.png
render_answers_pdf_pages_to_bundle(input_answer_pdf, bundle_root, pages_1_based=[11, 12, 13], dpi_scale=2.0)
```
MAB directory (bundle root): Do not render into ad-hoc paths such as ai_study_buddy/context/marking_asset_bundles/.... The bundle root must match the directory that write_marking_artifact will record as context.marking_asset: under ai_study_buddy/context/marking_assets/<student_slug>/<subject_context>/<attempt_basename>/, where <attempt_basename> follows build_attempt_basename(...) in ai_study_buddy.marking.core.artifact_paths.
Single-timestamp contract (required):
- Capture `run_marked_at = now_marking_iso()` exactly once at orchestration start.
- Use that same `run_marked_at` for both:
  - bundle-path derivation before rendering images
  - the final artifact's `created_at` and `updated_at`
- Prefer the package helper `build_marking_run_paths(...)` to compute `(artifact_json_path, marking_asset_rel, bundle_root)` from one timestamp.
- Do not call `now_marking_iso()` again for artifact write-time fields; reuse the captured `run_marked_at`.
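The call pattern for the single-timestamp contract can be sketched as below. The `now_marking_iso` body here is a stand-in so the sketch runs standalone (the real helper lives in the `ai_study_buddy.marking` package); only the capture-once shape matters.

```python
# Illustrative sketch of the single-timestamp contract: capture the marking
# timestamp exactly once, then derive every timestamp-dependent value from
# that single capture instead of calling the clock again.
from datetime import datetime, timedelta, timezone

SGT = timezone(timedelta(hours=8))

def now_marking_iso() -> str:  # stand-in for the package helper
    return datetime.now(SGT).isoformat(timespec="seconds")

# Capture once at orchestration start...
run_marked_at = now_marking_iso()

# ...then reuse it for bundle-path derivation and artifact write-time fields.
artifact_created_at = run_marked_at
artifact_updated_at = run_marked_at
```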
2. Phase 1: Scope, Mapper & Key Verifier Subagent
Use the Task tool to launch the project subagent marking-phase1-mapper (by name), and omit explicit model pinning so frontmatter (model: inherit) is respected.
Pass bundle paths and context (mode, attempt pages, answer pages) to this subagent. It returns the JSON mapping of question_id → attempt_pages.
Task template (Phase 1):
```python
phase1 = Task(
    subagent_type="marking-phase1-mapper",
    prompt=(
        "Read attempt/answer bundle images and return ONLY JSON array "
        "of {question_id, attempt_pages}.\n"
        f"mode={workflow_mode}\n"
        f"bundle_root={bundle_root}\n"
        f"attempt_glob={bundle_root / 'attempt/page-*.png'}\n"
        f"answers_glob={bundle_root / 'answers/page-*.png'}"
    ),
)
```
Wait for this subagent to return the JSON array. Save this array to context.marking_asset/debug/phase1_mapping.json.
Phase 1 Quality Check Gate (required)
Before proceeding to Phase 2, validate the Phase 1 mapping output:
- Duplicate question_id check (hard fail):
  - Build the set/count of all `question_id` values.
  - If any `question_id` appears more than once, do not continue.
  - Retry Phase 1 with an explicit correction instruction to disambiguate/relabel duplicated IDs.
- Section collision check (hard fail for sectioned papers):
  - If the paper is sectioned (e.g., Section A / Section B), `question_id` values MUST encode section context (for example `A1`, `A2b`, `B1`, `B1a`) instead of bare labels like `Q1`, `Q1(a)`.
  - Treat this as a hard fail if section context is missing from IDs in a sectioned paper, even when exact string duplicates do not exist.
  - Also treat it as a hard fail when IDs share the same parent stem across sections without section prefixes (for example `Q1` in one section and `Q1(a)` in another), because users will still read these as conflicting.
  - On fail, retry Phase 1 and explicitly require section-prefixed IDs for every row.
- Persist QC evidence:
  - Write a small QC artifact to `context.marking_asset/debug/phase1_qc.json` containing:
    - `total_rows`
    - `unique_question_ids`
    - `duplicate_question_ids` (array)
    - `section_collision_detected` (boolean)
    - `qc_passed` (boolean)
  - Only allow Phase 2 when `qc_passed` is true.
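The duplicate-ID check and the `phase1_qc.json` shape can be sketched as follows. The field names come from the list above; `phase1_qc` is a hypothetical helper, and the bare-label section heuristic is an illustrative assumption (the real check should use the paper's actual section structure).

```python
# Minimal sketch of the Phase 1 QC gate: count question_id duplicates and
# flag missing section prefixes, producing the phase1_qc.json payload.
from collections import Counter

def phase1_qc(rows: list[dict], sectioned: bool) -> dict:
    ids = [r["question_id"] for r in rows]
    counts = Counter(ids)
    duplicates = sorted(q for q, n in counts.items() if n > 1)
    # Heuristic (assumption): in a sectioned paper, IDs must carry a section
    # letter prefix, so bare labels like "Q1" or "12" are collisions.
    collision = sectioned and any(not q[:1].isalpha() or q[:1] == "Q" for q in ids)
    return {
        "total_rows": len(rows),
        "unique_question_ids": len(counts),
        "duplicate_question_ids": duplicates,
        "section_collision_detected": collision,
        "qc_passed": not duplicates and not collision,
    }
```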
3. Phase 2: Optimistic Fast-Pass Grader Subagent
Use the Task tool to launch marking-phase2-fast-pass-grader subagents to do a fast pass over ALL questions.
Phase 2 Faithful-Transcription Rule (hard requirement)
Phase 2 must transcribe what is actually written, not what the model thinks the student "probably meant."
- Never fabricate student answers. Do not invent words, options, numbers, or rewritten responses that are not clearly present on the attempt page.
- If handwriting/markings are ambiguous, partially visible, overwritten, or unreadable:
  - set `student_answer` to a blank/no-response placeholder (for example `""` or `"[illegible]"`),
  - set `confidence.transcription = "low"`,
  - and allow Phase 3 routing to handle close inspection.
- Do not mark uncertain transcriptions as high confidence.
- In teacher-annotated mode, do not assume a student answer from surrounding context or from likely grammar/vocabulary fit; only capture visible student writing.
Phase 2 Teacher-Mark Capture Rule (hard requirement in Mode B)
In teacher-annotated mode, Phase 2 must explicitly read teacher grading signals for each row:
- detect whether teacher marks are visible (tick, cross, numeric mark, `x/y`);
- if visible, treat them as primary evidence for `outcome` and `earned_marks`;
- if unclear or conflicting, set `confidence.grading = "low"` so the row is routed to Phase 3;
- never assign high-confidence grading when teacher marks are not clearly localized.
CRITICAL PERFORMANCE OPTIMIZATION: Do not pass all questions to a single subagent if the paper has more than 15 questions. The subagent will hit output token limits and truncate the JSON. Instead, split the attempt_pages_map into chunks of 10-15 questions each. Launch a separate marking-phase2-fast-pass-grader subagent IN PARALLEL for each chunk.
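The chunking rule above can be sketched as a small helper. The 15-question threshold and 10-15 chunk size come from the rule itself; `chunk_question_map` is a hypothetical name and the default chunk size of 12 is an arbitrary pick within that range.

```python
# Illustrative chunking helper for the parallel Phase 2 fan-out: split the
# question map so that no single subagent's JSON output risks truncation.
def chunk_question_map(attempt_pages_map: dict[str, list[int]],
                       chunk_size: int = 12) -> list[dict[str, list[int]]]:
    items = list(attempt_pages_map.items())
    if len(items) <= 15:
        return [attempt_pages_map]  # small paper: one subagent is fine
    return [dict(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]
```

Each returned chunk would then be passed to its own parallel `marking-phase2-fast-pass-grader` Task invocation.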
Phase 2 MCQ Bracket Safeguards (required)
Before finalizing any MCQ as unanswered:
- Perform a focused read of the answer bracket region (where the final choice is written).
- Treat faint, thin, or single-stroke digit-like marks (for example a lightly written `1`) as a potential response, not an immediate blank.
- If the mark is ambiguous (could be a valid digit or noise), do not emit `no_response` at high confidence. Emit low transcription confidence so it is forced into Phase 3 review.
- If the student indicates a choice indirectly (for example by ticking statements that map to one option, as in "A and B only"), treat that as a valid response signal and describe it in `student_answer` / `human_note`.
- Localization QC gate (minimal, required): before emitting a high-confidence blank/`no_response` for any MCQ, save:
  - one full-page overlay image with the bracket box drawn (`debug/mcq_box_checks/<question_id>_overlay.png`)
  - one tight bracket crop (`debug/mcq_box_checks/<question_id>_tight.png`)

If the overlay does not clearly land on the intended question row, retry localization once. If still uncertain, do not emit a confident blank; downgrade confidence and route to Phase 3.
For each chunked invocation, pass:
- the `attempt_pages_map` chunk
- standard vs teacher-annotated mode
- attempt and answer bundle paths
- a reminder that output must be a strict JSON array
Do not pin a model in Task calls; allow agent frontmatter (model: inherit) to decide.
Task template (Phase 2, one chunk):
```python
phase2_job = Task(
    subagent_type="marking-phase2-fast-pass-grader",
    prompt=(
        "Fast-pass grade this chunk. Return ONLY JSON array.\n"
        f"mode={workflow_mode}\n"
        f"subject_context={subject_context}\n"
        f"required_output_language={required_output_language}\n"
        f"language_policy={language_policy}\n"
        "human_note_policy=Populate human_note ONLY by transcribing distinct human-written annotations (red/purple teacher-parent notes). If absent, set null. Never write AI summary in human_note.\n"
        "human_note_output_contract=For each row also return {human_note_source, human_note_is_verbatim, human_note_evidence_page} where source in {none,teacher_annotation,parent_or_tutor_annotation,other_human_annotation}.\n"
        f"attempt_pages_map_chunk={attempt_pages_map_chunk}\n"
        f"bundle_root={bundle_root}\n"
        f"attempt_glob={bundle_root / 'attempt/page-*.png'}\n"
        f"answers_glob={bundle_root / 'answers/page-*.png'}"
    ),
)
```
Wait for ALL parallel subagents to return their JSON arrays. Combine them into a single array. Save this array to context.marking_asset/debug/phase2_fast_pass.json.
Phase 2 Language QC Gate (required)
Before Phase 3 routing:
- Validate language compliance for each row's free-text fields (`student_answer`, `correct_answer`, `human_note`, `diagnosis.reasoning`).
- For English-required subjects, detect CJK Han characters and treat any occurrence as a hard QC failure.
- Persist `context.marking_asset/debug/phase2_language_qc.json` with:
  - `total_rows`
  - `english_required` (boolean)
  - `violating_question_ids` (array)
  - `qc_passed` (boolean)
- If `qc_passed` is false:
  - retry only the violating Phase 2 chunks with explicit "English-only output" correction instructions;
  - replace only the violating rows;
  - re-run Phase 2 language QC before continuing.
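The Han-character detection used by this gate (and the Phase 3 and final gates, which apply the same rule) can be sketched as below. `HAN_RE` covers the main CJK Unified Ideographs blocks; `row_violates_english_policy` is a hypothetical helper name.

```python
# Sketch of the language-QC row check: flag any row whose free-text fields
# contain Han characters (hard fail for English-required subjects).
import re

# CJK Unified Ideographs plus Extension A.
HAN_RE = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf]")

FREE_TEXT_FIELDS = ("student_answer", "correct_answer", "human_note")

def row_violates_english_policy(row: dict) -> bool:
    texts = [row.get(f) or "" for f in FREE_TEXT_FIELDS]
    texts.append((row.get("diagnosis") or {}).get("reasoning") or "")
    return any(HAN_RE.search(t) for t in texts)
```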
Phase 2 Human-Note QC Gate (required)
Before Phase 3 routing:
- Validate each Phase 2 row with these hard rules:
  - if `human_note` is non-null/non-empty, `human_note_source` must not be `none`;
  - if `human_note` is non-null/non-empty, `human_note_is_verbatim` must be true;
  - if `human_note_source` is `none`, then `human_note` must be null/empty.
- Persist `context.marking_asset/debug/phase2_human_note_qc.json` with:
  - `total_rows`
  - `rows_with_human_note` (array of question_id)
  - `policy_violations` (array of {question_id, reason})
  - `qc_passed` (boolean)
- If QC fails, retry only the violating Phase 2 chunks with an explicit correction: keep `human_note` null unless there is visible, distinct human annotation to transcribe verbatim.
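The three hard rules above can be expressed as one row check. `check_human_note_row` is a hypothetical helper name; the field names follow the human-note output contract passed to the Phase 2 prompts.

```python
# Sketch of the human-note policy check: returns the list of
# policy-violation reasons for one row (empty list means compliant).
def check_human_note_row(row: dict) -> list[str]:
    reasons = []
    note = row.get("human_note")
    source = row.get("human_note_source", "none")
    if note:  # non-null and non-empty
        if source == "none":
            reasons.append("human_note present but human_note_source is none")
        if not row.get("human_note_is_verbatim", False):
            reasons.append("human_note present but not flagged verbatim")
    return reasons
```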
4. Phase 3: Deep-Dive Remediation Subagents
Filter the JSON array from Phase 2. For ANY question where outcome != "correct" OR any value in the confidence object is "low", use the Task tool to launch marking-phase3-deep-dive subagents IN PARALLEL.
Phase 3 MCQ Adjudication Safeguards (required)
For any MCQ flagged as wrong/partial/low-confidence, the deep-dive agent must explicitly adjudicate bracket evidence:
- Is there any intentional pen stroke inside the answer bracket?
- If yes, what is the most likely digit / option?
- If ambiguous, what are plausible alternatives, and how does that affect confidence?
Do not allow Phase 3 to simply repeat a fast-pass no_response claim without this explicit bracket adjudication.
Phase 3 mark allocation (mandatory)
Phase 2 is the only source of truth for max_marks per question_id (read from the printed key / rubric on the answer pages).
- Pass `fast_pass_max_marks=<n>` (and optionally the full Phase 2 row snapshot) in every Phase 3 Task prompt so the subagent knows the mark ceiling.
- The deep-dive subagent must not emit `max_marks`; if legacy prompts still ask for it, the orchestrator must discard any Phase 3 `max_marks` and always use the Phase 2 value for that question_id.
- After the merge, `earned_marks` must never exceed the Phase 2 `max_marks` for that row; if Phase 3 violates this, treat its output as malformed and retry Phase 3 for that question_id.
Phase 3 Teacher-Authority Gate (hard requirement in Mode B)
Before accepting a Phase 3 override in teacher-annotated mode:
- if teacher grading marks are clearly visible for the row, Phase 3 must not change the teacher-awarded `earned_marks`;
- allow Phase 3 to improve transcription/diagnosis/page localization while preserving the teacher's score;
- only permit score changes when teacher grading is unclear/contradictory, and require low grading confidence plus an explicit uncertainty note.
For each invocation, pass:
- one `question_id`
- `fast_pass_max_marks=<n>` copied from the Phase 2 row for that question_id
- the candidate `attempt_pages`
- attempt/answer bundle paths
- a strict JSON-object output requirement
Do not pin a model in Task calls; allow agent frontmatter (model: inherit) to decide.
Task template (Phase 3, one question):
```python
phase3_job = Task(
    subagent_type="marking-phase3-deep-dive",
    prompt=(
        "Deep-dive ONLY this question. Return ONLY one JSON object.\n"
        f"question_id={question_id}\n"
        f"subject_context={subject_context}\n"
        f"required_output_language={required_output_language}\n"
        f"language_policy={language_policy}\n"
        "human_note_policy=Populate human_note ONLY by transcribing distinct human-written annotations (red/purple teacher-parent notes). If absent, set null. Never write AI summary in human_note.\n"
        "human_note_output_contract=Return {human_note_source, human_note_is_verbatim, human_note_evidence_page}.\n"
        f"attempt_pages_hint={attempt_pages_hint}\n"
        f"fast_pass_max_marks={fast_pass_max_marks}\n"
        f"bundle_root={bundle_root}\n"
        f"attempt_glob={bundle_root / 'attempt/page-*.png'}\n"
        f"answers_glob={bundle_root / 'answers/page-*.png'}"
    ),
)
```
Wait for ALL parallel subagents to complete. Save their combined outputs to context.marking_asset/debug/phase3_deep_dive.json.
Phase 3 Language QC Gate (required)
Before merging Phase 3 rows into final results:
- Validate language policy on each deep-dive object (`student_answer`, `correct_answer`, `human_note`, `diagnosis.reasoning`).
- For English-required subjects, any Han character is a hard fail.
- Persist `context.marking_asset/debug/phase3_language_qc.json` with:
  - `total_rows`
  - `english_required` (boolean)
  - `violating_question_ids` (array)
  - `qc_passed` (boolean)
- If QC fails, retry only the violating question_ids with explicit language-correction instructions and re-run QC.
Phase 3 Human-Note QC Gate (required)
Before merging Phase 3 rows into final results:
- Apply the same hard policy checks as Phase 2:
  - non-null `human_note` requires a non-none source and `human_note_is_verbatim=true`;
  - `human_note_source=none` requires `human_note` null/empty.
- Persist `context.marking_asset/debug/phase3_human_note_qc.json` with:
  - `total_rows`
  - `rows_with_human_note` (array of question_id)
  - `policy_violations` (array of {question_id, reason})
  - `qc_passed` (boolean)
- If QC fails, retry only the violating question_ids with explicit correction instructions.
5. Phase 4: Taxonomy Tagger Subagent
Merge the final results (using Phase 3 results to overwrite Phase 2 results where applicable). Use the Task tool to launch ONE marking-phase4-taxonomy-tagger subagent to tag them.
CRITICAL PERFORMANCE OPTIMIZATION: Do not pass the entire transcribed JSON array (which contains verbose student_answer, correct_answer, and diagnosis fields) to the Taxonomy Tagger. This wastes massive amounts of tokens and slows down the subagent.
Instead, pass ONLY a simplified array containing question_id and the question text/stem (if available) or just the question_id if the topic can be inferred from the section.
Pass:
- `subject_context`
- the simplified question list
- the syllabus markdown path/content
- a strict JSON-array output requirement
Do not pin a model in Task calls; allow agent frontmatter (model: inherit) to decide.
Task template (Phase 4):
```python
phase4 = Task(
    subagent_type="marking-phase4-taxonomy-tagger",
    prompt=(
        "Map question_id to syllabus skill_tags. Return ONLY JSON array.\n"
        f"subject_context={subject_context}\n"
        f"simplified_questions={simplified_questions}\n"
        f"syllabus_path={syllabus_path}"
    ),
)
```
Wait for this subagent to return the tags. Save this array to context.marking_asset/debug/phase4_tags.json.
Phase 4 Tag QC Gate (required)
Before proceeding to assembly, validate the Phase 4 tags:
- For `subject_context = singapore_primary_science`, each skill_tags entry MUST use exactly:
  - `<theme> > <chapter> > <topic>`
  - OR the approved experiment-design exception tag: `Experiments > Fair-test`
- Treat the following as QC failures:
  - chapter-number-prefixed chapter labels (for example `15. The Digestive System`)
  - malformed path shapes (missing or extra `>` segments), except the approved `Experiments > Fair-test` tag
  - a placeholder topic when a concrete topic is clearly inferable from the paper/question set
- If QC fails, retry Phase 4 once with an explicit correction instruction. Do not continue to assembly with failed tags.
- Persist QC evidence to `context.marking_asset/debug/phase4_qc.json` with:
  - `total_rows`
  - `invalid_format_rows` (array of question_id)
  - `placeholder_topic_rows` (array of question_id)
  - `qc_passed` (boolean)
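The tag-shape rules above can be sketched as a single validator. `skill_tag_is_valid` is a hypothetical helper; the chapter-number check is a simple regex heuristic, and only the `Experiments > Fair-test` exception string comes verbatim from the rules.

```python
# Sketch of the science skill-tag format check: exactly three "A > B > C"
# segments, no chapter-number prefixes, with one approved exception tag.
import re

APPROVED_EXCEPTION = "Experiments > Fair-test"
CHAPTER_NUMBER_RE = re.compile(r"^\d+\.\s")

def skill_tag_is_valid(tag: str) -> bool:
    if tag == APPROVED_EXCEPTION:
        return True
    parts = [p.strip() for p in tag.split(">")]
    if len(parts) != 3 or not all(parts):
        return False  # must be exactly theme > chapter > topic
    _theme, chapter, _topic = parts
    if CHAPTER_NUMBER_RE.match(chapter):
        return False  # chapter-number-prefixed labels are QC failures
    return True
```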
6. Phase 5: Assembly and Finalization
As the Orchestrator, you must now assemble the final artifacts:
- Merge Data: Combine the final grades/diagnoses and the skill tags into the `question_results` array. Ensure `question_id` is mapped to the `result_id` field in the final schema.
- Preserve `max_marks` from Phase 2 for every row. When applying Phase 3 objects, merge fields such as `student_answer`, `correct_answer`, `outcome`, `earned_marks`, `diagnosis`, `human_note`, and `corrected_attempt_pages` from Phase 3, but never overwrite the Phase 2 `max_marks` with a Phase 3 value (Phase 3 must not be able to inflate totals).
- If Phase 3 omits `earned_marks`/`outcome`, keep the Phase 2 values; if Phase 3 includes them, they must satisfy `0 <= earned_marks <= max_marks` from Phase 2.
- Persist `context.marking_asset/debug/phase5_merge_qc.json` with `{ "max_marks_source": "phase2_only", "phase3_max_marks_discarded": <bool>, "qc_passed": true }` after verifying every row.
- Human-note merge policy:
  - Keep `human_note` only when it passes the human-note policy QC/provenance checks.
  - If provenance checks fail, force `human_note = null` for that row and record the dropped row in `phase5_merge_qc.json` under `human_note_rows_dropped_by_policy`.
  - Do not auto-convert dropped `human_note` text into `diagnosis.reasoning`; rerun the subagent instead when the content is needed.
- Build Page Map: Use the `corrected_attempt_pages` (from Phase 3) or `attempt_pages` (from Phase 1) to build `context.question_page_map`. Ensure `attempt_page_start` is set to the first page in the array.
- Calculate Totals: Compute `summary.earned_marks` and `summary.total_marks` by summing over the `question_results`.
- Determine Scope: Set `context.is_partial` based on whether the graded questions represent the full expected paper.
- Write JSON: Write via `write_marking_artifact` (`ai_study_buddy.marking.core.artifact_writer.write_marking_artifact`) using the default (`schema_version=None`) so timestamps normalize to marking-time SGT semantics and `schema_version` is stamped as `artifact_schema.SCHEMA_VERSION`, persisting the canonical layout: `context/marking_results/<student_slug>/<subject_context>/<attempt_basename>.json`.
  - Required: `artifact.created_at == artifact.updated_at == run_marked_at` (the single timestamp captured at orchestration start).
  - Required: the `context.marking_asset` stem must match the artifact stem for that same `run_marked_at`.
  - Do not strip or rewrite resolver provenance fields under `context.context_resolution`.
  - If write-time contract validation fails, treat it as a producer bug and fix the producer path instead of force-writing manual context.
- Render Markdown: Run the `report_renderer` to generate the Markdown report in `context/learning_reports/`.
- Write Profiling Log: Create a `context.marking_asset/debug/profiling_log.md` file. Record the start and end times (in SGT) for Phase 1, Phase 2, Phase 3, and Phase 4. Calculate the total duration of the marking run.
- Write Telemetry Data: In the `generation` block of the final JSON, include a telemetry object: `{"fast_pass_count": X, "deep_dive_count": Y, "total_duration_seconds": Z}`. This lets you track the efficiency of the Optimistic Fast-Pass architecture over time.
Final JSON assembly field constraints (required)
Apply these strict mappings when translating subagent outputs into canonical rows:
- `result_id`: from `question_id`
- `outcome`: must be one of `correct | partial | wrong | disqualified` (normalize `incorrect` -> `wrong` if a subagent emits legacy wording)
- `scoring_status`: must be `counted` or `excluded_disqualified`
- `diagnosis`: object with only `mistake_type`, `reasoning`, `confidence`
- `error_tags`: array using allowed taxonomy enums only
- `question_page_map[]` entries: only `result_id`, `attempt_page_start`, `confidence`, `source`, optional `evidence_image`, optional `note`
- `generation`: only `produced_by`, `mode`, `notes`, optional `telemetry`
- `generation.telemetry` (if present): only
  - `fast_pass_count` (int >= 0)
  - `deep_dive_count` (int >= 0)
  - `total_duration_seconds` (number or null, >= 0 when numeric)
  - optional `manual_corrections` (int >= 0)
  - optional `phase2_task_subagents` (boolean)
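The legacy-outcome normalization mentioned above can be sketched as a small mapper. `normalize_outcome` is a hypothetical helper; the allowed enum and the `incorrect -> wrong` alias come from the field constraints.

```python
# Sketch of outcome normalization: map legacy wording onto the canonical
# enum, and hard-fail on anything that cannot be mapped.
ALLOWED_OUTCOMES = {"correct", "partial", "wrong", "disqualified"}
LEGACY_ALIASES = {"incorrect": "wrong"}

def normalize_outcome(raw: str) -> str:
    value = raw.strip().lower()
    value = LEGACY_ALIASES.get(value, value)
    if value not in ALLOWED_OUTCOMES:
        raise ValueError(f"unmappable outcome: {raw!r}")
    return value
```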
Pre-Finalization Teacher Tally Reconciliation (required for Mode B)
Before final JSON/report write in teacher-annotated mode:
- Extract the page-01 teacher tally (overall, and per-section when available).
- Compute section/booklet totals from the merged `question_results`.
- Persist `context.marking_asset/debug/teacher_tally_qc.json` with:
  - `teacher_total_marks`
  - `teacher_earned_marks`
  - `computed_total_marks`
  - `computed_earned_marks`
  - `section_deltas` (if a section tally is visible)
  - `qc_passed`
- If the totals do not match and no explicit user override is provided, do not finalize; route mismatched sections/questions back for remediation.
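The overall-tally portion of this reconciliation can be sketched as follows. `reconcile_tally` is a hypothetical helper; the output field names follow `teacher_tally_qc.json` above (section deltas omitted for brevity).

```python
# Sketch of the teacher-tally reconciliation gate: compare the teacher's
# page-01 tally against totals computed from the merged question_results.
def reconcile_tally(question_results: list[dict],
                    teacher_total: int, teacher_earned: int) -> dict:
    computed_total = sum(r["max_marks"] for r in question_results)
    computed_earned = sum(r["earned_marks"] for r in question_results)
    return {
        "teacher_total_marks": teacher_total,
        "teacher_earned_marks": teacher_earned,
        "computed_total_marks": computed_total,
        "computed_earned_marks": computed_earned,
        "qc_passed": (computed_total == teacher_total
                      and computed_earned == teacher_earned),
    }
```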
Pre-Finalization MCQ No-Response Validator (required)
Before writing the final JSON/report:
- Collect all MCQ rows currently labeled as `no_response` / blank-answer equivalents.
- Re-check each of those rows against attempt-page evidence with a dedicated bracket-focused pass.
- If any bracket shows plausible intentional ink (including faint single-stroke numerals), do not finalize as blank without adjudication.
- If ambiguity remains unresolved, downgrade confidence and annotate `human_note` rather than silently committing a definite blank response.
Pre-Finalization Language Validator (required)
Before writing final JSON/report:
- Re-scan the merged `question_results` free-text fields:
  - `student_answer`
  - `correct_answer`
  - `human_note`
  - `diagnosis.reasoning`
- Enforce the `subject_context` language policy (English-only for non-Chinese contexts).
- Persist `context.marking_asset/debug/final_language_qc.json` with:
  - `total_rows`
  - `english_required` (boolean)
  - `violating_result_ids` (array)
  - `qc_passed` (boolean)
- Do not finalize JSON/report until `qc_passed` is true.
Pre-Finalization Human-Note Validator (required)
Before writing final JSON/report:
- Re-scan merged rows for strict `human_note` policy conformance:
  - `human_note` non-null -> provenance/source present and verbatim flag true;
  - no AI-summary patterns in `human_note` (for example generic analytic phrasing without direct annotation cues).
- Persist `context.marking_asset/debug/final_human_note_qc.json` with:
  - `total_rows`
  - `rows_with_human_note`
  - `policy_violations`
  - `qc_passed`
- Do not finalize JSON/report until `qc_passed` is true.
Quality Bar:
- Do not hallucinate data if a subagent fails. If a Phase 3 subagent fails or returns malformed JSON, you may retry launching a subagent for that specific question_id.
- Ensure the final JSON passes `validate_marking_artifact_dict(...)` under the repo's enforced `artifact_schema.SCHEMA_VERSION`/`SUPPORTED_SCHEMA_VERSIONS` policy (this is stronger than matching a prose-named version literal).
7. Error Handling and Cleanup
If the marking run fails to complete (e.g., a subagent repeatedly fails, or you encounter an unrecoverable error during orchestration), you MUST clean up any temporary files or folders created during the run to avoid polluting the filesystem.
- Delete the entire Marking Asset Bundle directory (`context.marking_asset`) that was created for this run.
- Do not leave orphaned PNGs or intermediate JSON files in the workspace.
- Inform the user that the run failed and the temporary assets were cleaned up.
- Inform the user that the run failed and the temporary assets were cleaned up.
For pruning artifacts from completed/older runs (not just failed-run temporary cleanup), use the dedicated skill: `../prune-marking-run-artifacts/SKILL.md`