| name | consensus-interpret |
| description | LLM-grounded biological interpretation of a verified typed consensus run. Reads the typed run dir + the original adata, runs inline per-cluster DE, looks up markers in a bundled tissue-keyed marker DB, and asks the chair LLM to (γ) name each cluster's likely cell type with mandatory marker citations and (β) recommend top-3 next-step skills with mandatory evidence_refs. Output banner [A+I: Interpreted on verified consensus]. Failure-mode contract per ADR 0012. |
| version | 0.1.0 |
| author | OmicsClaw |
| license | Apache-2.0 |
| tags | ["spatial","consensus","interpreted-layer","biology-annotation","marker-grounded","backward-proof-driven-recommendation"] |
| requires | ["anndata","scanpy","numpy","pandas","scikit-learn"] |
consensus-interpret
When to use
The user has just finished a verified typed consensus run
(consensus-domains or sc-consensus-clustering) and wants the usual manual
follow-up (read cross_method_nmi.csv → run spatial-de →
cross-reference markers → name cell types → decide downstream skill) done
automatically, with every LLM claim bound to falsifiable evidence.
This skill does NOT replace the typed run. It is a strictly downstream
consumer: it reads <typed_run_dir>/{plan.json, consensus_labels.tsv, member_scores.csv, cross_method_nmi.csv} plus the original adata, and
writes its output to a different directory under
analysis://interpreted/<typed_run_id>. The verified-vs-exploratory
boundary established by ADR 0010 is preserved.
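As a rough illustration of the read side of this contract, the sketch below checks that a typed run directory contains the four files listed above before anything else is attempted. The helper name `missing_typed_run_files` is hypothetical and not part of the skill; only the file names come from this document.

```python
from pathlib import Path

# File names are taken from the list above; everything else here is illustrative.
REQUIRED_FILES = (
    "plan.json",
    "consensus_labels.tsv",
    "member_scores.csv",
    "cross_method_nmi.csv",
)


def missing_typed_run_files(typed_run_dir):
    """Return the required typed-run files that are absent from typed_run_dir."""
    root = Path(typed_run_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty return value means the directory has the expected shape; it says nothing about whether the contents are valid, which is what the preflight step in the flow is for.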
Skip when:
- The typed run did not produce consensus_labels.tsv (i.e. consensus-domains exited non-zero; fix the typed run first).
- You are looking for a generic query → skill dispatcher; use orchestrator for that (forward direction).
- You want to refine the consensus itself based on LLM judgment; that is explicitly forbidden by §11.4 "LLM never participates in statistical merging" and would be rejected by this skill's T3 invariants.
Inputs & Outputs
| Input | Format | Required |
|---|---|---|
| Typed run directory | --input <typed_run_dir> (must contain plan.json written by consensus-domains / sc-consensus-clustering) | yes |
| Output directory | --output <interpreted_dir> | yes |
| Tissue hint | --tissue {brain, immune, kidney, liver} (selects bundled marker DB) | yes (unless --markers provided) |
| Original AnnData path | --adata <preprocessed.h5ad> | no — defaults to plan.json input_path |
| User-provided marker DB | --markers <file.tsv> (overrides bundled DB; same schema) | no |
| Disable LLM (structural-only) | --no-llm | no — default fails-fast on LLM unavailability |
| LLM seed (for self-consistency runs) | --seed 0 | no |
| Top-K markers reported per cluster | --top-k-markers 20 | no |
| Top-K next-step recommendations | --top-k-next-steps 3 | no (capped at 3 by ADR 0012) |
| Coverage floor for T2 escalation | --coverage-floor 0.5 | no |
| Output | Path | Notes |
|---|---|---|
| Interpreted report | interpreted_report.md | First line: [A+I: Interpreted on verified consensus] (or [I-noLLM: ...] if --no-llm). Non-configurable per ADR 0012. |
| Structured cell-type assignments | interpreted_assignments.json | machine-readable; schema below |
| Per-cluster DE table | de_per_cluster.csv | inline scanpy.tl.rank_genes_groups output; consensus-interpret computes this |
| Contradiction regions | contradiction_regions.csv | rows where cross_method_nmi indicates disagreement; LLM-narrated in markdown |
| Audit | audit.json | typed_run_id, adata checksum, marker DB used, LLM model/seed, interpreted_namespace, evidence_base_namespace |
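A downstream consumer may want to confirm which banner a report carries before trusting it. The following is a minimal sketch, assuming the banner is the literal first line of interpreted_report.md with no extra markup. The function name `banner_kind` is hypothetical, and because the full no-LLM banner text is elided in this document, only its `[I-noLLM:` prefix is matched.

```python
ANNOTATED_BANNER = "[A+I: Interpreted on verified consensus]"
# The remainder of the no-LLM banner is elided in this document,
# so only the prefix and the closing bracket are checked.
NO_LLM_PREFIX = "[I-noLLM:"


def banner_kind(report_text):
    """Classify a report by its first line: 'A+I', 'I-noLLM', or None."""
    lines = report_text.splitlines()
    first_line = lines[0].strip() if lines else ""
    if first_line == ANNOTATED_BANNER:
        return "A+I"
    if first_line.startswith(NO_LLM_PREFIX) and first_line.endswith("]"):
        return "I-noLLM"
    return None
```

Returning `None` rather than raising leaves the decision about how to treat an unbannered file to the caller.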
interpreted_assignments.json schema
{
  "schema_version": "0.1",
  "typed_run_id": "<plan.json:run_id>",
  "evidence_base_namespace": "analysis://typed/<run_id>",
  "interpreted_namespace": "analysis://interpreted/<run_id>",
  "banner": "[A+I: Interpreted on verified consensus]",
  "operator": "<from typed plan.json>",
  "clusters": [
    {
      "id": 5,
      "n_cells": 778,
      "interpretation_status": "interpreted | low_confidence | failed",
      "cell_type": "CA1 pyramidal",
      "confidence": 0.84,
      "evidence": {
        "markers": [
          {"gene": "Pvrl3", "de_rank": 1, "db_source": "panglaodb_brain", "db_celltype": "CA1 pyramidal neuron", "weight": 0.9},
          {"gene": "Wfs1", "de_rank": 3, "db_source": "panglaodb_brain", "db_celltype": "CA1 pyramidal neuron", "weight": 0.85}
        ],
        "mean_local_purity": 0.617,
        "member_agreement": [
          {"member": "leiden_resolution-0.5", "label_overlap": 0.92},
          {"member": "leiden_resolution-1.0", "label_overlap": 0.78}
        ]
      },
      "narrative_md_anchor": "#cluster-5"
    }
  ],
  "next_steps": [
    {
      "skill": "spatial-de",
      "args_hint": "--groupby consensus_kmode --comparisons cluster_3_vs_5",
      "priority": 1,
      "evidence_refs": [
        "cross_method_nmi.csv:row=leiden_resolution-0.5,col=leiden_resolution-1.5,value=0.597"
      ],
      "reason": "Lowest pair-wise NMI in matrix; marker disambiguation between clusters 3 and 5 will resolve whether these are sub-types of CA1 or a transition zone."
    }
  ]
}
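The `evidence_refs` entry in the schema above appears to follow a `<file>:key=value,key=value` shape. That grammar is inferred from the single example shown and is not stated anywhere in this document, so the parser below is only a sketch under that assumption. The name `parse_evidence_ref` is hypothetical.

```python
def parse_evidence_ref(ref):
    """Split '<file>:k1=v1,k2=v2,...' into (file_name, {key: value}).

    Assumes the format inferred from the schema example; values that
    contain ',' or '=' are not supported by this sketch.
    """
    file_name, sep, locator = ref.partition(":")
    if not sep or not file_name or not locator:
        raise ValueError(f"malformed evidence ref: {ref!r}")
    fields = {}
    for part in locator.split(","):
        key, eq, value = part.partition("=")
        if not eq or not key:
            raise ValueError(f"malformed locator field {part!r} in {ref!r}")
        fields[key] = value  # values stay strings; callers convert as needed
    return file_name, fields
```

With the example from the schema, a caller could then look up the cited cell in cross_method_nmi.csv and compare it against `float(fields["value"])` to confirm the reference is genuine.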
Flow
1. Preflight (T1 — fail-fast if any fail)
├─ Load plan.json from --input; assert schema_version + typed run integrity
├─ Locate adata at plan.json.input_path (or --adata override); check exists
├─ Load consensus_labels.tsv; assert observation column ⊆ adata.obs.index
├─ Resolve marker DB:
│ --markers <path> if given;
│ else bundled `data/markers/panglaodb_<tissue>.tsv` for --tissue;
│ else exit 5 (MarkerDBUnavailable)
└─ If LLM required and unreachable AND --no-llm not set → exit 6 (LLMUnavailable)
2. Per-cluster differential expression (deterministic, scanpy)
└─ scanpy.tl.rank_genes_groups(adata, groupby=consensus_<operator>, method='wilcoxon')
→ de_per_cluster.csv with top-K markers per cluster (K=20 default)
3. Marker → cell-type lookup (deterministic, pre-LLM)
└─ For each cluster, compute candidate cell types by ranking DB entries
whose gene appears in the cluster's top-K markers (weighted by db.weight × 1/de_rank).
4. LLM grounded interpretation (γ + β; one call per cluster + one synthesis call)
├─ Prompt template embeds (per cluster):
│ cluster_id, n_cells, top-K DE markers,
│ DB candidate cell types (ranked),
│ member_agreement summary, cross_method_nmi neighbors
├─ LLM must return JSON conforming to interpreted_assignments.json
│ schema; mandatory evidence.markers[] with non-empty
│ {gene, db_source, db_celltype}
└─ After all clusters: one synthesis call to produce next_steps[]
with mandatory evidence_refs[] (capped at top-3 by priority)
5. Invariant enforcement (T3 — fail-fast if violated)
├─ Every cluster.evidence.markers != [] → else exit 7
├─ Every next_steps[*].evidence_refs != [] → else exit 7
└─ Banner present and matches one of two allowed values → else exit 7
6. Coverage check (T2 — escalate to T1 if floor breached)
└─ interpretable_cluster_frac < --coverage-floor → exit 8
7. Artifact writes
├─ interpreted_report.md (banner enforced in format_interpreted_report)
├─ interpreted_assignments.json
├─ de_per_cluster.csv
├─ contradiction_regions.csv
└─ audit.json
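Step 3 of the flow above can be sketched as follows. The flow only states the per-entry weight (db.weight × 1/de_rank); summing those weights per cell type is an assumption made here for illustration, as are the function name `rank_candidate_cell_types` and the in-memory `(gene, celltype, weight)` row layout.

```python
from collections import defaultdict


def rank_candidate_cell_types(top_markers, marker_db):
    """Rank cell types for one cluster.

    top_markers: gene symbols ordered by DE rank (index 0 is rank 1).
    marker_db:   iterable of (gene, celltype, weight) rows (assumed layout).
    Returns a list of (celltype, score), best first.
    """
    de_rank = {gene: i + 1 for i, gene in enumerate(top_markers)}
    scores = defaultdict(float)
    for gene, celltype, weight in marker_db:
        rank = de_rank.get(gene)
        if rank is not None:
            # Per-entry weight from the flow: db.weight x 1/de_rank.
            # Accumulating per cell type is this sketch's assumption.
            scores[celltype] += weight / rank
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For a toy cluster whose top markers are `["Pvrl3", "Gad1", "Wfs1"]` and a DB that maps Pvrl3 (0.9) and Wfs1 (0.85) to "CA1 pyramidal neuron", that cell type scores 0.9/1 + 0.85/3 and ranks ahead of any type supported only by the rank-2 gene.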
Failure modes (per ADR 0012)
| Exit | Name | Meaning |
|---|---|---|
| 0 | success | All clusters interpreted (or low_confidence), invariants intact, no degradation triggered |
| 2 | argparse | CLI error |
| 3 | TypedRunInvalid | plan.json missing / malformed / not from a typed run |
| 4 | AdataMismatch | consensus_labels.tsv observations are not all present in the adata obs index |
| 5 | MarkerDBUnavailable | --tissue not in bundled DBs and --markers not provided |
| 6 | LLMUnavailable | LLM endpoint unreachable and --no-llm not given |
| 7 | InvariantViolation | LLM violated marker-grounding or evidence-ref contract (T3) |
| 8 | CoverageBelowThreshold | Fraction of interpretable clusters is below --coverage-floor (default 0.5) after T2 degradation |
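To make the ordering of exit codes 7 and 8 concrete, here is a minimal sketch that maps a parsed interpreted_assignments.json dict to an exit code. It covers only the marker and evidence-ref invariants plus the coverage floor; the banner check is left out. Which statuses count as "interpretable" is not specified in this document, so the set used below is an assumption, as is the function name `exit_code_for`.

```python
# Assumption: both of these statuses count toward coverage.
INTERPRETABLE = {"interpreted", "low_confidence"}


def exit_code_for(assignments, coverage_floor=0.5):
    """Return 7 on an invariant violation, 8 on low coverage, else 0."""
    clusters = assignments.get("clusters", [])
    next_steps = assignments.get("next_steps", [])

    # T3: every cluster must cite at least one marker.
    if any(not c.get("evidence", {}).get("markers") for c in clusters):
        return 7
    # T3: every recommendation must cite at least one evidence ref.
    if any(not s.get("evidence_refs") for s in next_steps):
        return 7

    # Coverage: fraction of clusters with an interpretable status.
    if not clusters:
        return 8  # assumption: an empty run is treated as zero coverage
    n_ok = sum(c.get("interpretation_status") in INTERPRETABLE for c in clusters)
    if n_ok / len(clusters) < coverage_floor:
        return 8
    return 0
```

Checking invariants before coverage mirrors the order of steps 5 and 6 in the flow, so a run that breaks grounding is reported as exit 7 even when its coverage is also low.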
Examples
Default usage (after a typed run completes)
oc run consensus-domains --input preprocessed.h5ad --output run1/ \
--members banksy,graphst,leiden:resolution=0.5,leiden:resolution=1.0 \
--non-interactive --operator kmode --seed 0
oc run consensus-interpret --input run1/ --output run1_interpreted/ \
--tissue brain
CI / offline (structural-only)
oc run consensus-interpret --input run1/ --output run1_struct/ \
--tissue brain --no-llm
User-provided marker DB (non-bundled tissue)
oc run consensus-interpret --input run1/ --output run1_interp/ \
--markers ~/markers/mouse_intestine.tsv
Related skills
| Direction | Skill | Relationship |
|---|---|---|
| upstream (required) | consensus-domains / sc-consensus-clustering | produces the typed run this skill interprets |
| upstream (auto-chained, internal) | none | DE is computed inline with scanpy.tl.rank_genes_groups; spatial-de is not invoked as a subprocess (see ADR 0012 rejected alternatives §4) |
| sibling (semantically distinct) | orchestrator | forward query → skill; this skill does backward result → (skill, evidence) |
| downstream (β suggests these) | spatial-de, spatial-deconv, spatial-communication, spatial-trajectory, etc. | next-step skills the LLM may recommend; each recommendation MUST cite specific typed-run evidence |
References
- ADR 0010: typed-vs-narrative consensus runtime (boundary integrity)
- ADR 0011: typed consensus evaluation protocol (composite member score; DLPFC hero)
- ADR 0012: this skill's evaluation protocol (4-axis panel + T3 invariants)
- docs/CONTEXT.md "Cross-reference: Consensus runtime": canonical vocabulary