| name | team-shinchan:fierce-option-panel |
| description | Default-on interview option-quality panel — N diverse generators produce structure-free options, a SelfCheckGPT majority-vote consensus filters hallucinations, a SteerConf cautious-confidence judge scores survivors, and a deterministic top-K is returned. Workflow tier; the single fierce-* skill that is ON by default. |
| user-invocable | false |
EXECUTE IMMEDIATELY
fierce-option-panel is a main-loop Workflow that hardens the quality of the interview
recommendation options generated by Misae's DESIGN_NEXT_QUESTION. It runs in three stages.
First, N diverse generators each produce a structure-free option (no A/B/C schema).
Second, a SelfCheckGPT majority-vote consensus filters them: an option survives only if
at least ceil(N/2+1) generators back it, so a single generator's hallucination is never
promoted on an any-pass basis (HR-2). Third, a SteerConf cautious-confidence judge rubric
scores the survivors, and the Workflow returns a deterministic top-K.
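The consensus threshold can be worked through numerically. The sketch below reads the formula literally as ceil(N/2 + 1). The function names, the `support` field, and the sample options are hypothetical and are not taken from fierce-option-panel.workflow.js.

```javascript
// Hypothetical helper: reads the documented threshold literally as ceil(N/2 + 1).
function consensusThreshold(n) {
  return Math.ceil(n / 2 + 1);
}

// Keep only options backed by at least `threshold` generators.
// `support` (how many generators back the option) is an assumed field name.
function filterByConsensus(options, n) {
  const threshold = consensusThreshold(n);
  return options.filter((option) => option.support >= threshold);
}

// Made-up candidates for illustration only.
const candidates = [
  { text: "Cache at the edge", support: 3 },
  { text: "Rewrite in a new framework", support: 1 },
];
const survivors = filterByConsensus(candidates, 3);
```

Under this literal reading, the default N = 3 gives a threshold of 3, so an option survives only when all three generators back it, and N = 5 gives a threshold of 4.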
Default-ON: explicit exception to the fierce-* opt-in convention
Every other fierce-* skill (fierce-debate, fierce-compete, fierce-ralph, fierce-review) is
opt-in — the user invokes it explicitly because it is expensive. fierce-option-panel is
the single intentional exception: it is ON by default under the quality-over-cost
principle. Option quality at the interview stage compounds across the entire downstream
workflow, so the extra per-turn cost is worth it. This exception is documented identically
in docs/fierce-option-panel.md and agents/misae.md (FR-10.2).
Step 0: Validate + opt-out
- This skill calls the Workflow tool (main-loop only). Never delegate to a subagent —
workflow() throws inside a Task child (R-5).
- Opt-out / escape hatch (FR-6.3): read .shinchan-config.yaml. If
interview.fierce_option_panel: false, do NOT run the panel. Instead, run the basic B-path
(structure-free gen → verbalized sampling → missing-alternative critic → DINCO calibration,
see agents/misae.md) and record current.interview.option_source: basic.
- Read interview.fierce_panel_generators (default 3), interview.fierce_panel_k_max
(default 6), and interview.fierce_panel_token_budget_per_turn (default 60000). If the
estimated panel cost for this turn exceeds the budget (HR-3), skip the panel, run the basic
B-path, and record option_source: basic_fallback.
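The Step 0 decisions can be sketched as a small pure function. This is an illustration under assumptions: `choosePath` and `estimatedTokens` are hypothetical names, the config is assumed to be already parsed from .shinchan-config.yaml, and how the per-turn cost is estimated is not specified here.

```javascript
// Defaults as stated in Step 0.
const PANEL_DEFAULTS = {
  fierce_option_panel: true,
  fierce_panel_generators: 3,
  fierce_panel_k_max: 6,
  fierce_panel_token_budget_per_turn: 60000,
};

// `config` is the parsed .shinchan-config.yaml (parsing not shown).
// `estimatedTokens` is a hypothetical cost estimate for this turn.
function choosePath(config, estimatedTokens) {
  const interview = {
    ...PANEL_DEFAULTS,
    ...((config && config.interview) || {}),
  };
  // Explicit opt-out (FR-6.3): basic B-path, recorded as "basic".
  if (interview.fierce_option_panel === false) {
    return { run: "basic", option_source: "basic" };
  }
  // Budget guard (HR-3): basic B-path, recorded as "basic_fallback".
  if (estimatedTokens > interview.fierce_panel_token_budget_per_turn) {
    return { run: "basic", option_source: "basic_fallback" };
  }
  return {
    run: "panel",
    option_source: "fierce_panel",
    generators: interview.fierce_panel_generators,
    kMax: interview.fierce_panel_k_max,
  };
}
```

For example, `choosePath({}, 70000)` falls back because 70000 exceeds the default budget of 60000, while `choosePath({ interview: { fierce_option_panel: false } }, 1000)` takes the opt-out branch regardless of cost.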
Step 1: Resolve question + personas (main loop — the script can't read files)
Step 2: Run the fierce-option-panel Workflow
Workflow({
  scriptPath: "${CLAUDE_PLUGIN_ROOT}/skills/fierce-option-panel/fierce-option-panel.workflow.js",
  args: {
    question: "<the interview question>",
    generators: 3,             // interview.fierce_panel_generators
    kMax: 6,                   // interview.fierce_panel_k_max
    files: ["<path>", "..."],  // context the generators should read
    generatorPersona: "<workflow-personas.js misae>",
    generatorLearnings: "<workflow-personas.js --learnings misae>",
    judgePersona: "<workflow-personas.js actionkamen>"
  }
})
Generate (N generators each emit one structure-free option + evidence + weight) →
Consensus (SelfCheckGPT majority-vote, ≥ ceil(N/2+1); diversity floor bypass if < 2
survivors, R-3) → Judge (SteerConf cautious-confidence rubric, deterministic top-K).
Returns { question, generators, k_max, consensus_threshold, options, scores, winner, rationale, dissent, option_source }.
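How the judge stage makes its top-K deterministic is not spelled out above. One possible approach, shown here as a sketch, is to sort by score and break ties on the option text so the result does not depend on generator order. The `score` and `text` fields and the `topK` helper are hypothetical.

```javascript
// Hypothetical sketch: deterministic top-K over judged options.
// Ties on score are broken by plain code-unit comparison of the text,
// so the same inputs yield the same ranking in any input order.
function topK(scored, kMax) {
  const byScoreThenText = (a, b) => {
    if (b.score !== a.score) return b.score - a.score; // higher score first
    if (a.text < b.text) return -1;
    if (a.text > b.text) return 1;
    return 0;
  };
  // Copy before sorting so the caller's array is not mutated.
  return [...scored].sort(byScoreThenText).slice(0, kMax);
}

// Made-up scores for illustration only.
const ranked = topK(
  [
    { text: "b", score: 0.8 },
    { text: "a", score: 0.8 },
    { text: "c", score: 0.9 },
  ],
  2
);
```

Here `ranked` holds "c" followed by "a": "c" has the highest score, and "a" wins the tie against "b" on text order.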
Step 3: Graceful degradation (NFR-3)
If the Workflow returns an error (any generator threw, judge returned no verdict, budget
overrun), the result carries option_source: 'basic_fallback'. Run the basic B-path to
produce the options for this turn instead. No interview turn is ever blocked by panel
failure.
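The fallback rule can be sketched as follows. `optionsForTurn` and `runBasicPath` are hypothetical names; the sketch treats a missing result the same as one that carries option_source: 'basic_fallback', which is an added defensive assumption.

```javascript
// Hypothetical sketch of Step 3: never block the turn on panel failure.
// `panelResult` is whatever the Workflow returned (possibly undefined).
// `runBasicPath` is a caller-supplied function that produces basic B-path options.
function optionsForTurn(panelResult, runBasicPath) {
  const failed =
    !panelResult || panelResult.option_source === "basic_fallback";
  if (failed) {
    return { options: runBasicPath(), option_source: "basic_fallback" };
  }
  return {
    options: panelResult.options,
    option_source: panelResult.option_source,
  };
}
```

Either branch returns options plus the option_source value to record, so the caller always has something to present for the turn.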
Step 4: Record + format
- Record current.interview.option_source (fierce_panel | basic | basic_fallback) in
WORKFLOW_STATE.yaml (FR-6.4) so retrospectives can audit which path each turn used.
- Apply A/B/C labels to the returned options only now (separate formatting step, FR-1).
- Raw confidence (FR-4 / HR-1): never write raw/uncalibrated self-confidence to
WORKFLOW_STATE, logs, or debug output — only DINCO-normalized values.
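Keeping labels out of generation means they can be attached in one small pass at the very end. A minimal sketch, assuming the returned options arrive as an ordered array (`labelOptions` is a hypothetical helper, and the real option shape may differ):

```javascript
// Hypothetical formatting step: attach A/B/C... labels by final rank only.
// Generators and the judge never see these labels (FR-1).
function labelOptions(options) {
  return options.map((option, index) => ({
    label: String.fromCharCode(65 + index), // 65 is "A"
    option,
  }));
}

const labelled = labelOptions(["first choice", "second choice"]);
```

With the default k_max of 6, the labels run from A to F.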
Limitations / transferability gap (NFR-5)
The ECE/AUROC calibration metrics (in src/option-metrics.js) transfer from the factual
QA literature, where "correct" is objectively defined. Here, "correct" is a proxy: the
user's eventual option selection is treated as ground truth. This transfer is unvalidated
for the design-option domain. The gating bars (ECE < 0.10, AUROC >= 0.70,
Distinct-2 >= 0.55, self-BLEU <= 0.40) are pragmatic targets under the proxy, not
universally validated thresholds. The same caveat appears in the option-metrics.js module
docstring.
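To make the proxy concrete, the sketch below computes a standard equal-width-bin ECE where "correct" means the user eventually chose the option, then checks the four gating bars. It is an illustration only and is not the code in src/option-metrics.js; the function names, the sample shape, and the choice of 10 bins are assumptions.

```javascript
// Hypothetical sketch: ECE with the user's selection as proxy ground truth.
// samples: [{ confidence: number in [0, 1], chosen: boolean }]
function expectedCalibrationError(samples, bins = 10) {
  const buckets = Array.from({ length: bins }, () => ({ n: 0, conf: 0, hits: 0 }));
  for (const { confidence, chosen } of samples) {
    // Clamp so that confidence === 1 lands in the last bin.
    const i = Math.min(bins - 1, Math.floor(confidence * bins));
    buckets[i].n += 1;
    buckets[i].conf += confidence;
    buckets[i].hits += chosen ? 1 : 0;
  }
  // Weighted average of |accuracy - mean confidence| over non-empty bins.
  return buckets.reduce((ece, b) => {
    if (b.n === 0) return ece;
    const gap = Math.abs(b.hits / b.n - b.conf / b.n);
    return ece + (b.n / samples.length) * gap;
  }, 0);
}

// Gating bars as listed above; the metric field names are assumed.
function passesGates(m) {
  return m.ece < 0.10 && m.auroc >= 0.70 && m.distinct2 >= 0.55 && m.selfBleu <= 0.40;
}
```

For instance, two options both scored at 0.9 confidence, of which the user picked one, give an accuracy of 0.5 against a mean confidence of 0.9 in that bin, so the ECE is about 0.4 and the first gate fails.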