summarize-assessment

Name: Summarize Assessment
Author: PlanExeOrg

Description: Use after the napkin_math pipeline has produced parameters/bounds/scenarios/montecarlo JSON to generate a plan assessment (assessment.md), a thin interpretation layer over the intermediary artifacts. Emits a JSON manifest, a provenance map, gate verdicts (Critical / Fragile / Marginal / Robust), failure drivers, confidence and trust boundaries, scenario sanity check, and suggested next actions. The artifact is a navigation/judgment file, not a copy of the raw simulation data.

Updated: 18 May 2026, 23:11

Repository: PlanExeOrg/PlanExe

Summarize napkin_math outputs into a plan assessment

Overview

A thin wrapper around experiments/napkin_math/summarize_assessment.py. The script reads the pipeline artifacts and emits assessment.md next to them. The output is an interpretation layer: it tells the next reader of this directory what the simulation tested, which gates fail or pass, which inputs drive the result, which assumptions remain unvalidated, and what to inspect next. The raw distributions live in montecarlo.json; assessment.md references them via the provenance map rather than reproducing them.

When to Use

The Monte Carlo stage has just produced montecarlo.json and the user wants a verdict
The user asks "is this plan in trouble?" or "what does the simulation say?"
After any iteration on bounds or calculations, to see how the gate signals moved

Not for: producing the simulation itself (monte-carlo), running scenarios (run-scenarios), or extracting parameters from a report.

Workflow

Locate the inputs. Required: parameters.json. Optional but recommended: bounds.json, scenarios.json, montecarlo.json. validation.json and montecarlo_settings.json are picked up automatically from the same directory if present. The script degrades gracefully if any optional file is missing — it just omits that section. If parameters.json is missing, ask.

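Invoke the script. Requires Python 3.11+ (no extra dependencies). The interpreter path below is the Homebrew location on Apple Silicon macOS; substitute any Python 3.11+ interpreter: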

/opt/homebrew/bin/python3.11 experiments/napkin_math/summarize_assessment.py \
  --parameters   <path>/parameters.json \
  --bounds       <path>/bounds.json \
  --scenarios    <path>/scenarios.json \
  --montecarlo   <path>/montecarlo.json \
  [--output      <path>/assessment.md]

Default output: <dir-of-parameters>/assessment.md. The script prints the output path on stdout.

Report back. Tell the user the output path. If the user asks for a verdict in-conversation, read assessment.md and quote the gate-verdict rows and critical-findings bullets verbatim — don't paraphrase.
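The first two workflow steps can be sketched as a small helper that checks for the required file and builds the command line. This is a minimal illustration, not part of the skill: `build_command` is a hypothetical name, `sys.executable` stands in for the Homebrew interpreter path and is assumed to be Python 3.11+, and the sketch assumes that optional inputs can simply be left off the command line, which the usage line above does not confirm.

```python
import sys
from pathlib import Path

SCRIPT = "experiments/napkin_math/summarize_assessment.py"

# Optional inputs and the flag each one is passed with (names taken from the
# usage line above; the mapping itself is illustrative).
OPTIONAL_FLAGS = {
    "bounds.json": "--bounds",
    "scenarios.json": "--scenarios",
    "montecarlo.json": "--montecarlo",
}


def build_command(directory, output=None):
    """Return the argv list for the script, using the files present in `directory`."""
    base = Path(directory)
    parameters = base / "parameters.json"
    if not parameters.is_file():
        # The skill says to ask the user at this point rather than guess a path.
        raise FileNotFoundError(f"required input not found: {parameters}")

    # Assumption: sys.executable is a Python 3.11+ interpreter.
    cmd = [sys.executable, SCRIPT, "--parameters", str(parameters)]
    for name, flag in OPTIONAL_FLAGS.items():
        path = base / name
        if path.is_file():
            cmd += [flag, str(path)]
    if output is not None:
        cmd += ["--output", str(output)]
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True, capture_output=True, text=True)` and reading `.stdout.strip()` would give the output path, since the script prints it on stdout.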

How verdicts are decided

Verdicts come from the user's own threshold definitions in the Monte Carlo settings. The script does not interpret identifier strings or unit strings, so the verdict logic carries no domain bias.

Each threshold has an operator (>=, <=, etc.) and a value. The user wrote the threshold because they want it to pass, so the verdict is assigned from the pass probability, the share of simulation runs in which the threshold condition holds:

Pass probability	Band	Verdict	Note
≥ 80%	strong majority	Robust	passes in the strong majority of runs
50–80%	uncomfortable	Marginal	passes more often than not but uncomfortably close
20–50%	minority pass	Fragile	fails in the majority of runs
< 20%	rarely passes	Critical	rarely passes under current bounds

Any output classified Critical or Fragile also gets a "bottom line" callout at the top of the report.
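The band table can be read as a simple lookup. The sketch below is an illustration rather than the script's code: the function names are hypothetical, and it assumes each band includes its lower edge (exactly 50% is Marginal, exactly 20% is Fragile), which the table does not state.

```python
def verdict_for(pass_probability):
    """Map a pass probability in [0, 1] to the verdict label from the band table."""
    if not 0.0 <= pass_probability <= 1.0:
        raise ValueError("pass_probability must be between 0 and 1")
    # Assumption: each band includes its lower edge.
    if pass_probability >= 0.80:
        return "Robust"
    if pass_probability >= 0.50:
        return "Marginal"
    if pass_probability >= 0.20:
        return "Fragile"
    return "Critical"


def needs_bottom_line_callout(verdict):
    """Critical and Fragile outputs get the callout at the top of the report."""
    return verdict in {"Critical", "Fragile"}
```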

The script does not invent thresholds for outputs the user did not declare. To get a verdict on an output, declare a threshold on it in the Monte Carlo settings file.
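As a rough sketch of how a pass probability could be computed from simulated values, the code below applies a threshold's operator to every run and reports the passing share. The function names and the `"operator"` / `"value"` dictionary keys are hypothetical; the real settings schema is not shown in this document, and only `>=` and `<=` are named in the text (the other operators are assumptions). Outputs without a declared threshold get no entry, mirroring the rule above.

```python
import operator

# The source names ">=" and "<=" ("etc."); ">" and "<" are assumptions.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def pass_probability(samples, op, value):
    """Share of simulated values that satisfy `sample <op> value`."""
    if not samples:
        raise ValueError("no samples to evaluate")
    compare = OPERATORS[op]
    passed = sum(1 for s in samples if compare(s, value))
    return passed / len(samples)


def gate_pass_rates(outputs, thresholds):
    """Return pass rates only for outputs that have a declared threshold."""
    return {
        name: pass_probability(outputs[name], t["operator"], t["value"])
        for name, t in thresholds.items()
        if name in outputs
    }
```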

Audience and tone

assessment.md is written to be consumed by the next program or process that touches this directory — a downstream pipeline stage, a planning loop, a follow-on extractor, a future invocation of this same workflow. A human can also read it, but the writing optimises for token-density of useful signal over engagement hooks. The output describes what the file is (an interpretation layer over the simulation artifacts); it does not label its audience.

What that means concretely:

The file leads with a machine-readable JSON manifest. Every prose section after it is a structured form of those same signals.
Stable retrievable section names (## Critical findings, ## Gate verdicts, ## Failure drivers, ## Suggested next actions). No colorful or narrative-flavoured headings.
No reader-engagement prefixes ("If you read nothing else, read this", "Stop and pay attention", "Important:"). The structural markers and verdict labels carry that weight.
No filler sentences whose only job is to motivate the next sentence. Lead with the substantive claim.
Keep substantive explanations (what a verdict label means, what a column shows, what makes an item belong in a section). Those are signal, not filler.
Don't apologise for or hedge the bad news. State it.
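Stable section names make heading-based retrieval straightforward for a downstream consumer. The following is a minimal sketch of such a consumer, assuming the generated file uses `## ` for its top-level sections; `split_sections` is a hypothetical helper, not something the script provides.

```python
FENCE = "`" * 3  # code-fence marker, built this way to keep the example fence-free


def split_sections(markdown_text):
    """Return {heading text: body} for every level-2 (`## `) heading."""
    sections = {}
    current = None
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Headings inside a fenced block are content, not section breaks.
        if not in_fence and line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}
```

Because the Critical findings section is omitted when nothing qualifies, a consumer should treat a missing `Critical findings` key as a valid result rather than a parse error.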

Writing rules — apply to the script's output AND to anything you say back to the user about the assessment

These are not stylistic preferences. They are how this skill is meant to communicate.

Critical findings first. After the artifact contract, machine summary, provenance map, modelling frame, and simulation settings, the first interpretation section is ## Critical findings. It consolidates every signal that the plan does not survive its own assumptions: Critical and Fragile thresholds, scenario warnings, numbers the model could not compute, and inputs the plan does not supply at all. If nothing qualifies, the section is omitted entirely — silence is the only acceptable form of good news.
No sugar-coating. A 5% pass probability is "rarely passes under current bounds", not "shows some challenges". A base-scenario value on the wrong side of a declared threshold is "the gate fails at the plan's own central assumptions", not "may warrant further attention". Use the strongest accurate language without overclaiming epistemic certainty — never "the math says it certainly will" — but never soften the result either; if the script's wording softens a result, fix the script.
No sycophancy. Never start a paragraph with "Great plan, but..." or "The team has done strong work; one concern is...". The downstream consumer has the plan available. It does not need praise from the report.
No hedging phrases. Banned in both the script's emitted text and in conversational reporting:
- the honest read is, frankly, to be fair, in fairness, candidly, let's be real, look, the truth is
- rhetorical "I'll be honest with you" / "to put it bluntly" / "if I may"
These imply the default mode is dishonest or evasive. State the claim directly.
Hedges about data vs hedges about the speaker. Hedges that point at the underlying simulation are fine: "the simulation shows", "based on the bounds we have", "within the assumed ranges". Hedges that point at the writer's posture are not.
Quote the verdicts; don't paraphrase. When the script emits **Critical** — rarely passes under current bounds, report it as Critical — rarely passes under current bounds. Don't summarise it as "this one is concerning". The verdict bands are precise and load-bearing.
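One way to check a draft against the banned list is a simple phrase scan. The sketch below is illustrative only: `find_banned_phrases` is a hypothetical helper, and "look" is deliberately left out because a plain word match would also flag ordinary uses such as "look at bounds.json".

```python
import re

# Phrases copied from the banned list above; "look" is omitted on purpose
# (see the note before this block).
BANNED_PHRASES = (
    "the honest read is",
    "frankly",
    "to be fair",
    "in fairness",
    "candidly",
    "let's be real",
    "the truth is",
    "i'll be honest with you",
    "to put it bluntly",
    "if i may",
)

# Match whole phrases only, ignoring case.
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(p) for p in BANNED_PHRASES) + r")(?!\w)",
    re.IGNORECASE,
)


def find_banned_phrases(text):
    """Return the banned phrases found in `text`, lower-cased, in order of appearance."""
    # Normalise curly apostrophes so a typographic "let's" still matches.
    normalised = text.replace("\u2019", "'")
    return [m.group(1).lower() for m in _PATTERN.finditer(normalised)]
```

Data-pointing hedges such as "the simulation shows" pass through untouched, which matches the distinction drawn in the rules above.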

Sections in the generated assessment.md

Order is deliberate. Stable section names — programmatic consumers retrieve by heading text, so the headings stay the same regardless of plan domain:

# Assessment: <plan name> — title plus a 2-line frontmatter (type, primary goal).
## Artifact contract — declares what this file is (an interpretation layer over the simulation artifacts) and what it is not (a copy of the raw simulation data, an external feasibility proof, a probability calibration).
## Machine summary: a JSON code block with the compact manifest. JSON, not YAML; that is intentional. Fields:
- assessment_schema_version (currently 6)
- artifact_type
- plan_name
- artifact_set (version / plan_slug / relative_dir; the portable identifier)
- source_plan_dir (absolute path; local-only)
- primary_model_result, a structured object: overall_risk_band (one of doom/fragile/marginal/viable/unknown), basis (a one-line disclaimer that the band reflects the worst declared gate's pass-rate band and is not a calibrated whole-plan probability), reason, worst_gate, worst_gate_pass_rate
- validation_status
- simulation (n_runs/seed/distribution_default)
- primary_failed_gates
- primary_uncertainty_drivers
- known_unmodelled_existential_gates (flat list of ids the extractor flagged as existential but unmodelable; empty when none)
- assessment_scope_warning (string when the unmodelled list is non-empty, null when empty; a one-line warning to programmatic consumers that the simulation is partial)
- do_not_treat_as
- schema_notes (allowed enums for overall_risk_band, verdict, basis, threshold_basis, plus the primary_model_result_semantics disclaimer)
The basis_enum is intentionally wider than what the current pipeline emits (report_derived, model_assumption): it reserves report_explicit, report_inferred, external_reference, manual_override, unknown for future provenance types.
## Provenance map — table listing every intermediary file with its role and "open when" guidance. The first row points at extract_parameters_input.md, then parameters/bounds/calculations/scenarios/scenario_outputs/montecarlo_settings/montecarlo/validation.
## Modelling frame — the source plan's own statement of what the model is testing, lifted verbatim from parameters.plan_summary.modelling_frame. When parameters.unmodelled_gates is non-empty, this section gains a bold Note caveat naming the count of unmodelled existential gates and telling the reader the gate-verdict pass rates are conditional on those gates holding.
## Known unmodelled existential gates — table of gates the extractor flagged as existential to the plan but unmodelable by deterministic Python (legal authorization, political reversal, AML/banking compliance, external-actor commitments). Columns: Gate, Why it matters, Source anchor (which source section names it), Consequence if false. Section omitted entirely when parameters.unmodelled_gates is empty or absent.
## Simulation settings — n_runs, seed, distribution_default, validation status.
## Critical findings — bullets in severity order. When parameters.unmodelled_gates is non-empty, a SCOPE WARNING bullet leads the section naming the unmodelled gate labels and pointing at the dedicated section. Then: Critical gates, Fragile gates, scenario warnings, numbers the model could not compute (≥5% blank runs), still-missing inputs. Section omitted entirely when nothing qualifies.
## Gate verdicts — every declared threshold, worst-first, with the min marker on aggregate gates. Columns: marker, output, condition, threshold basis (report_explicit / report_inferred / model_defined / unknown — derived from the corresponding key_value's value_type), pass rate, verdict, meaning. Includes an ### Aggregation warning sub-section when the thresholds use incompatible units and the plan declares no min() aggregate.
## Decision implications — one row per gate with verdict in Critical/Fragile/Marginal. Five columns: Gate, Verdict, Planning consequence (templated by verdict), Structural lever (the top driver from quartile_analysis with the direction implied by its sign of Δ-pp), Gate meaning (the gate's own rationale lifted from parameters.recommended_first_calculations[].why_first or derived_questions[].why_it_matters, plus the threshold parameter the formula tests against). The Gate-meaning column surfaces plan-specific framing without inventing tactical advice; concrete revisions should be derived by reading the source report and the relevant intermediary artifacts.
## Failure drivers — one row per failing gate (Critical or Fragile): top driver from quartile_analysis (max abs Δ-pp) and the conditional input restriction from required_input_thresholds that would lift the gate to 80%. Binding-gate frequencies for aggregates appear as bullets below the table.
## Missing inputs ranked by impact — the missing_value_priority table. The Basis column translates the bounds.json source label (data → report_derived, assumption → model_assumption) so it isn't mistaken for empirically observed real-world data.
## Confidence and trust boundaries — Validated (a one-line list of validation.json checks_performed), Not validated (a canonical list: real-world accuracy of bounds, independence assumptions, external feasibility, factual truth of source claims), Per-output confidence (HIGH/MEDIUM/LOW grade table from model_confidence). The grade-table column is Declared-source inputs — the share of input bounds anchored in the source report's narrative; the rest are modelling assumptions. Neither is empirical real-world data.
## Scenario sanity check — short low/base/high deterministic comparison table. Columns: Low inputs / Base inputs / High inputs, matching the keys in scenarios.json.
## Suggested next actions — five imperatives for whatever consumes this file next. Phrased as "To answer X, lead with Y; to audit Z, open W".
## Open questions for next analysis pass — five standing audit questions the simulation can't answer on its own (bound width/bias, gate independence, hard vs soft gates, missing-input remediation, unmodelled gates).
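A consumer that only needs the manifest can parse the JSON block under the Machine summary heading. This is a sketch under assumptions: it presumes the block is a fenced code block directly inside the `## Machine summary` section, and `read_machine_summary` is a hypothetical name, not part of the script.

```python
import json

FENCE = "`" * 3  # code-fence marker, built this way to keep the example fence-free


def read_machine_summary(markdown_text):
    """Parse the first fenced block that follows the `## Machine summary` heading."""
    lines = [line.rstrip() for line in markdown_text.splitlines()]
    try:
        start = lines.index("## Machine summary")
    except ValueError:
        raise ValueError("no '## Machine summary' heading found") from None

    block = []
    in_block = False
    for line in lines[start + 1:]:
        if line.startswith(FENCE):
            if in_block:
                return json.loads("\n".join(block))  # closing fence reached
            in_block = True
        elif in_block:
            block.append(line)
        elif line.startswith("## "):
            break  # next section reached without finding a block
    raise ValueError("no complete JSON block under '## Machine summary'")
```

With the parsed manifest in hand, `manifest["primary_model_result"]["overall_risk_band"]` and `manifest["primary_failed_gates"]` are the fields named above that a caller would most likely branch on. The manifest remains a compact pointer; the prose sections carry the context needed to interpret it.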

Common Mistakes

Mistake	Fix
Running before `montecarlo.json` exists	Gate verdicts, failure drivers, and the machine summary's `primary_model_result` all depend on simulation output. Run the Monte Carlo stage first.
Reading the markdown and paraphrasing the gate verdicts	Quote them. The cutoff bands and phrasing are deliberate.
Treating the machine summary as authoritative without reading the prose	The JSON manifest is a compact pointer, not a proof. The aggregation warning, trust boundaries, and failure-driver rows are load-bearing context.
Treating a Marginal verdict as good news	Marginal means "passes in 50–80% of runs", which also means the gate fails in 20–50% of runs.
Inventing a threshold to make a number look good	Thresholds reflect the user's success criteria. Don't fabricate them after the fact.

Reference

Script (authoritative): experiments/napkin_math/summarize_assessment.py
Companion skills: ../monte-carlo/SKILL.md, ../run-scenarios/SKILL.md, ../generate-bounds/SKILL.md, ../extract-parameters-from-full/SKILL.md, ../extract-parameters-from-digest/SKILL.md
Example output: any assessment.md under experiments/napkin_math/output/<version>/<plan>/
