| name | priority-review |
| description | Refresh priority signals, manage active epics, and refresh Near Term pool |
| disable-model-invocation | true |
Priority Review
Refresh the priority signals table for the groomed pool (Backlog + Near Term),
manage active epics, compute the Near Term baseline, and display the comparison
table. The table and Near Term composition are the artifacts; the skill is the
refresh workflow that keeps them current.
Modes
- Default (no flag): Full refresh — all card types, Impact + Severity scoring.
- --bugs: Bug cards only. Filters to story_type: bug at classification,
display, scoring, and ranking. Uses a bug-tuned classification prompt that
adds blast_radius and workaround_exists fields. Only Severity scoring
(no Impact). Typically fits in a single delegate batch.
Constraints
- Classification model: Use claude-sonnet-4-6 for agent classification.
Bake-off showed haiku gets counts right but misclassifies evidence_type;
sonnet and opus are equivalent on judgment, sonnet is cheaper.
- Subagents are classifiers, not summarizers. They return structured JSON.
Review evidence_type assignments if any look wrong — sonnet is good but not
perfect.
- Spot-check at least 3 classifications before importing. Verify
intercom_conversations counts against the actual classification inputs:
card descriptions plus any linked-context summaries included in the batch
file. If any error is found, double the sample size before merging. Repeat
until a full sample passes clean.
- Spot-check verification must be grounded in tool calls. Grep or Read the
source file to confirm specific values — do not verify from memory of a
prior read, even if the file is still in context. Proved 2026-04-23:
checkmark verification from 6-turn-old memory, monitor flagged twice.
- Multi-batch spot-checks need one tool call per batch. When N delegates
return, run N Bash reads (one per batch file). Do not construct verification
table entries from delegate rationale — that is the delegate verifying itself.
If a batch wasn't read, report it as unverified. Proved 2026-04-24: 7 scope
batches, 5 verified by tool calls, 2 table rows fabricated from delegate
output; monitor caught at high severity.
- Incremental by default. Only re-classify cards whose updated_at has
changed since last classification. Use --force flags when a full refresh
is needed.
- Mutations. This skill reads Shortcut cards and writes to the local
PostgreSQL priority_signals table. Step 6 sets the Shortcut Severity custom
field on any bug that receives a severity score (routed through
execute_approved). The Near Term refresh step (Step 7) moves cards between
Backlog and Near Term (routed through execute_approved).
- Batch files go in the project directory (.tmp-classify/), not /tmp.
Subagents cannot access /tmp.
- Data scope: The skill refreshes BOTH Backlog and Near Term states, since
the Near Term sort needs DIC scores for the full groomed pool.
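The doubling rule in the spot-check constraint can be sketched as a loop. This is a minimal sketch and not part of the skill's tooling: `spot_check_until_clean` and the `verify` callback are hypothetical names, and `verify` stands in for a tool-call-grounded check (Grep or Read) of a single card. It assumes a failed sample is redrawn at double the size, capped at the number of cards.

```python
import random
from typing import Callable, Sequence


def spot_check_until_clean(
    card_ids: Sequence[int],
    verify: Callable[[int], bool],
    start: int = 3,
    seed: int = 0,
) -> bool:
    """Sample cards, doubling the sample size after any failed sample.

    Returns True once a full sample passes clean. Returns False if a sample
    covering every card still contains an error, in which case the results
    should not be imported as-is.
    """
    rng = random.Random(seed)
    size = min(start, len(card_ids))
    while True:
        sample = rng.sample(list(card_ids), size)
        if all(verify(card_id) for card_id in sample):
            return True
        if size == len(card_ids):
            return False  # every card was checked and at least one failed
        size = min(size * 2, len(card_ids))
```

In practice any error found along the way would be corrected before the next sample is drawn; the sketch only models the sample-size schedule.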
Steps
0. Active Epics Review
Runs first, before any classification or scoring, since epic membership is the
primary Near Term criterion and changes affect everything downstream.
- Display current active epics:
python3 box/active-epics.py list
- Fetch all epics from the Shortcut API and show any not in the active set that
have cards in progress or were created in the last 30 days:
SHORTCUT_API_TOKEN=$(grep '^SHORTCUT_API_TOKEN=' /Users/paulyokota/Dev/FeedForward/.env | cut -d= -f2-)
curl -s -H "Shortcut-Token: $SHORTCUT_API_TOKEN" \
  "https://api.app.shortcut.com/api/v3/epics" | python3 -c "
import json, sys
from datetime import datetime, timedelta, timezone
cutoff = datetime.now(timezone.utc) - timedelta(days=30)
for e in json.load(sys.stdin):
    # Skip archived epics
    if e.get('archived'):
        continue
    created = datetime.fromisoformat(e['created_at'].replace('Z', '+00:00'))
    # Keep epics that are in progress or were created after the cutoff
    if e.get('state') == 'in progress' or created > cutoff:
        print(f'ID: {e[\"id\"]} | {e[\"name\"]} | State: {e[\"state\"]} | Created: {e[\"created_at\"][:10]}')
"
- Ask: "Any epics to add or remove?" Wait for explicit confirmation before
proceeding.
- Apply changes:
python3 box/active-epics.py add <ID> "Epic Name"
python3 box/active-epics.py remove <ID>
0a. Preflight: reconcile live board with DB
Before refreshing anything, check for cards in the DB that have moved out of
the groomed states since the last run. Run for BOTH states:
- Fetch live card IDs from Shortcut for each groomed state:
python3 box/shortcut-cards.py --state "Backlog" --summary
python3 box/shortcut-cards.py --state "Near Term" --summary
- Query the DB for card IDs currently stored with groomed states:
psql postgresql://localhost:5432/feedforward -t -A \
  -c "SELECT card_id, state FROM priority_signals WHERE state IN ('Backlog', 'Near Term')"
- Diff the two sets. Cards in the DB but not on the live board have moved.
For each moved card, re-extract to update its state in the DB:
python3 box/priority-signals.py extract --id <comma-separated-moved-ids>
Report what moved and where (the extract updates the state field).
- Cards on the live board but not in the DB are new; they'll be picked up
by the extract in Step 1.
This prevents stale cards from appearing in the signals table or framework
ranking. Skip this step only if running with --force on a fresh DB.
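The diff in this preflight is a pair of set differences. A minimal sketch, assuming the card IDs have already been parsed into integer sets from the `--summary` output and the psql result (neither output format is specified here); `diff_board_vs_db` and `format_id_arg` are hypothetical helper names, not functions from the box/ scripts.

```python
def diff_board_vs_db(live_ids: set[int], db_ids: set[int]) -> tuple[list[int], list[int]]:
    """Return (moved, new) card IDs.

    moved: stored in the DB with a groomed state but no longer on the live board.
    new:   on the live board but not yet in the DB (picked up by Step 1).
    """
    moved = sorted(db_ids - live_ids)
    new = sorted(live_ids - db_ids)
    return moved, new


def format_id_arg(card_ids: list[int]) -> str:
    """Build the comma-separated value passed to `extract --id`."""
    return ",".join(str(card_id) for card_id in card_ids)
```

With `live_ids={1, 2, 3}` and `db_ids={2, 3, 4}`, card 4 has moved and card 1 is new.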
1. Refresh metadata from Shortcut API
Run for BOTH groomed states (extract upserts by card_id, no conflicts):
python3 box/priority-signals.py extract --state "Backlog"
python3 box/priority-signals.py extract --state "Near Term"
This updates board position, state, product area, feature area users, and
dates. Incremental: skips cards whose updated_at hasn't changed.
2. Check what needs classification
python3 box/priority-signals.py classify --state "Backlog" > .tmp-classify/classify-backlog.json
python3 box/priority-signals.py classify --state "Near Term" > .tmp-classify/classify-nearterm.json
Merge the two outputs into a single classify file. Each classify run outputs
classification inputs as JSON for cards that either:
- Have never been classified
- Have been updated in Shortcut since last classification
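The two conditions above reduce to one timestamp comparison. A minimal sketch, assuming the stored row carries a last-classification timestamp; `classified_at` is a hypothetical name for it, and this is not the selection code inside priority-signals.py.

```python
from datetime import datetime
from typing import Optional


def needs_classification(updated_at: datetime, classified_at: Optional[datetime]) -> bool:
    """True if the card was never classified or changed since its last classification."""
    if classified_at is None:
        return True
    return updated_at > classified_at
```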
Each card includes:
- description
- raw external_links for provenance
- bounded linked_context summaries for the first 2 resolvable Slack
permalinks in Shortcut-provided order
Cards with only non-Slack external links remain provenance-only in this
tranche: no linked-context fetch is attempted for them.
If Slack-linked context cannot be resolved for a card, the card is omitted
from cards and surfaced under unresolved_cards. Do not treat unresolved
cards as freshly classified; they need a rerun once enrichment succeeds.
If the count is 0, skip to Step 5.
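Merging the Backlog and Near Term outputs can be sketched as a concatenation of the two top-level lists. This assumes each classify output is a JSON object with `cards` and `unresolved_cards` arrays, as the description above implies; `merge_classify_outputs` is a hypothetical helper, and the exact file the merged result should be written to is not stated in this step.

```python
def merge_classify_outputs(outputs: list[dict]) -> dict:
    """Concatenate `cards` and `unresolved_cards` across classify outputs.

    Unresolved cards stay separate so they are never treated as classified.
    """
    merged: dict = {"cards": [], "unresolved_cards": []}
    for data in outputs:
        merged["cards"].extend(data.get("cards", []))
        merged["unresolved_cards"].extend(data.get("unresolved_cards", []))
    return merged
```

Each input would be loaded with `json.load` from the two files written above. If `len(merged["cards"])` is 0, there is nothing to delegate.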
--bugs mode: Filter the classify output to bug cards only before
delegating. Linked-context enrichment must happen after this filter so
non-bug cards do not trigger Slack reads in bugs mode:
bugs = [c for c in data['cards'] if c.get('story_type') == 'bug']
3. Delegate classification to sonnet agents
Split the classify output into batches of ~16 cards. Delegate each batch
to agenterminal.delegate with model: "claude-sonnet-4-6".
Don't set timeout_ms — the default is the maximum (30 min).
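Splitting into batches of about 16 can be sketched as fixed-size slicing. `split_into_batches` is a hypothetical helper; the last batch may be smaller than the rest, and each batch would then be written to its own file under .tmp-classify/ (the file naming is left to the operator).

```python
def split_into_batches(cards: list[dict], batch_size: int = 16) -> list[list[dict]]:
    """Slice the card list into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [cards[i:i + batch_size] for i in range(0, len(cards), batch_size)]
```

For 40 cards this yields three batches of 16, 16, and 8 cards, which means three delegates and, per the Constraints, three spot-check reads.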
The classification prompt for each batch:
You are classifying Shortcut cards for priority signal extraction.
Read each card's description plus any linked-context summaries and return
structured classification data.
The cards are in {BATCH_FILE}. Read this file, then for each card extract:
- Treat `description` as the primary source.
- Use `linked_context` only when it adds evidence relevant to the existing
classification fields.
- Do not mechanically count external links as demand.
- `external_links` without `linked_context` are provenance only in this
tranche.
1. intercom_conversations: Total count of Intercom conversations referenced.
Count BOTH linked conversations and stated totals (e.g. "31 conversations")
when they appear in the description or linked context. Use the stated total
when available. Return 0 if no Intercom evidence.
2. failure_volume_weekly: For bug cards, the headline weekly failure volume.
If sub-categories are broken down, report the headline total, NOT the sum
of sub-breakdowns. Return null for non-bug cards.
3. has_revenue_signal: true if evidence mentions cancellations, refunds, users
leaving, declining to subscribe. Polite requests without leaving signals = false.
4. evidence_type: One of:
- "direct_customer_pain" — Intercom conversations show real users affected
- "internal_metric" — evidence from internal dashboards, not customer reports
- "speculative" — no evidence of current customer impact
- "mixed" — combination of customer and internal evidence
- "implementation_step" — sub-task of a larger initiative
5. sentiment: "high" | "medium" | "low" | "none"
6. linked_context_used: "yes" | "no"
Return "yes" only if the proposed classification relied on linked-context
summaries in a way that changed or supported the judgment. Return "no" if
the description alone was sufficient.
7. notes: One sentence on the most important thing the numbers don't convey.
Return ONLY valid JSON:
{"classifications": [{"card_id": N, ...}, ...]}
Collect all batches in parallel.
--bugs mode classification prompt (replaces the above):
You are classifying Shortcut bug cards for priority signal extraction.
Read each card's description plus any linked-context summaries from
{BATCH_FILE}, then for each card extract:
- Treat `description` as the primary source.
- Use `linked_context` only when it adds evidence relevant to the existing
classification fields.
- Do not mechanically count external links as demand.
- `external_links` without `linked_context` are provenance only in this
tranche.
1. intercom_conversations: Total count of Intercom conversations referenced.
Count BOTH linked conversations and stated totals (e.g. "31 conversations")
when they appear in the description or linked context. Use the stated total
when available. Return 0 if no Intercom evidence.
2. failure_volume_weekly: The headline weekly failure volume from the card.
If sub-categories are broken down, report the headline total, NOT the sum
of sub-breakdowns. Return null if no failure volume is stated.
3. has_revenue_signal: true if evidence mentions cancellations, refunds, users
leaving, declining to subscribe. Polite requests without leaving signals = false.
4. evidence_type: One of:
- "direct_customer_pain" — Intercom conversations show real users affected
- "internal_metric" — evidence from internal dashboards, not customer reports
- "speculative" — no evidence of current customer impact
- "mixed" — combination of customer and internal evidence
5. sentiment: "high" | "medium" | "low" | "none"
6. blast_radius: One sentence. Who is affected and how broadly? Include any
stated user counts, percentages, or frequency data from the card.
7. workaround_exists: true | false | "partial". Is there a user-accessible
workaround described or implied?
8. linked_context_used: "yes" | "no"
Return "yes" only if the proposed classification relied on linked-context
summaries in a way that changed or supported the judgment. Return "no" if
the description alone was sufficient.
9. notes: One sentence on the most important thing the numbers don't convey.
Return ONLY valid JSON:
{"classifications": [{"card_id": N, "intercom_conversations": N, "failure_volume_weekly": N|null, "has_revenue_signal": bool, "evidence_type": "...", "sentiment": "...", "blast_radius": "...", "workaround_exists": "...", "linked_context_used": "yes|no", "notes": "..."}, ...]}
Bug cards are typically fewer than 20 — use a single delegate unless the
count exceeds 16.
4. Review and import classification results
⚠ APPROVAL GATE. Before importing, present all proposed classifications to
the user in a summary table (card ID, intercom_conversations, evidence_type,
has_revenue_signal, linked_context_used, changed_fields, notes). This is the
same gate applied at Step 6 for scoring.
The spot-check (see Constraints) validates delegate accuracy; the approval
gate gives the user a chance to review the full set and catch judgment errors
the spot-check didn't cover. Merge all batch results into a single JSON file,
then run:
python3 box/priority-signals.py review-classifications .tmp-classify/classify-all.json .tmp-classify/results.json
Present that review output to the user. Do not run import-classifications
until the user approves. Proved 2026-04-22: imported 9 classifications
without presenting full set; Monitor flagged asymmetry with scoring gate.
Two columns in the review output need explanation:
- linked_context_used: yes if the proposed classification relied on any
linked-context summary, else no
- changed_fields: JSON array of classification fields that differ from the
currently stored priority_signals row for that card
After approval, import:
python3 box/priority-signals.py import-classifications .tmp-classify/results.json
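The merge that precedes review-classifications can be sketched as below. It assumes each delegate returned the `{"classifications": [...]}` shape requested in the prompt; `merge_batch_results` is a hypothetical helper, and rejecting duplicate card IDs and unknown evidence_type values is an added safety assumption rather than documented behavior of the box/ scripts.

```python
ALLOWED_EVIDENCE_TYPES = {
    "direct_customer_pain",
    "internal_metric",
    "speculative",
    "mixed",
    "implementation_step",  # not offered by the --bugs prompt
}


def merge_batch_results(batches: list[dict]) -> dict:
    """Combine delegate outputs into one results object, failing loudly on bad rows."""
    merged = []
    seen: set[int] = set()
    for batch in batches:
        for row in batch["classifications"]:
            card_id = row["card_id"]
            if card_id in seen:
                raise ValueError(f"card {card_id} appears in more than one batch")
            if row.get("evidence_type") not in ALLOWED_EVIDENCE_TYPES:
                raise ValueError(f"card {card_id} has an unknown evidence_type")
            seen.add(card_id)
            merged.append(row)
    return {"classifications": merged}
```

The merged object would be written to .tmp-classify/results.json, the path both commands above read.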
5. Display the table
python3 box/priority-signals.py show --state "Near Term"
python3 box/priority-signals.py show --state "Backlog"
Near Term table is the primary deliverable. Backlog table shown for context.
Sorted by board position, showing all priority signals in a scannable
comparison format.
--bugs mode: Filter the output to bug rows only:
python3 box/priority-signals.py show --state "Near Term" 2>&1 | grep -E "bug|Rank|----"
6. Score unscored cards (default, skippable)
Run gaps to find cards missing judgment scores:
python3 box/framework-rank.py gaps --state Backlog --state "Near Term"
Always use --state to scope to the groomed pool. Without it, gaps
reports across all states (Build, Test, Archived), inflating the count.
Proved 2026-04-27: unfiltered gaps reported 125 needing scope; groomed
pool had 1. Additionally, scope_score was missing from the JOIN query,
causing every card to appear unscored.
If the gap count seems high relative to last session's scoring work —
especially if 100% of cards appear unscored — investigate the tool output
before delegating. Universal failure is more likely a broken query than
universal truth.
If there are gaps, propose scores and apply them. If the user says to skip
this step, go directly to Step 7 — mispriority will run on scored cards only.
Scoring rules:
- Features need an Impact score (1-10). "How much would this move the
needle if shipped?" Informed by revenue potential, retention effect,
strategic value. This is product judgment, not mechanical.
- Bugs need a Severity score (1-5):
- 5 = Financial harm / data loss / security
- 4 = Feature broken / workflow blocked
- 3 = Degraded experience / workaround exists
- 2 = Cosmetic / misleading display
- 1 = Edge case / negligible
- All cards need a Scope score (1-5). This drives the "lowest scope
code task" Near Term rule. Non-code chores (vendor config, manual
processes) get null — they are ineligible for the lowest-scope slot.
- 1 = Trivial: single-file change, config update, copy/prompt tweak (<1hr)
- 2 = Small: few files, well-understood pattern (<1 day)
- 3 = Medium: multiple files/systems, some design decisions (1-3 days)
- 4 = Large: significant feature, multiple systems (1-2 weeks)
- 5 = XL: major initiative, new system/architecture (multi-week)
Process:
- Read the classification data (D*C partial scores, evidence_type, notes)
from the gaps output.
- For each unscored card, propose an Impact or Severity score with a
one-line rationale. Use the card description and classification notes —
don't infer from titles alone.
- For cards needing Scope scores: delegate to sonnet agents in batches
of ~16 with card descriptions. The delegate prompt should include the
1-5 scale definitions and instruct returning null for non-code chores.
Spot-check extremes (scope 1 and scope 5) against batch files before
importing.
- Present the full batch of proposed scores to the user for review.
The user may adjust individual scores or approve the batch.
- Apply approved scores:
python3 box/framework-rank.py score SC-NNN --impact N
python3 box/framework-rank.py score SC-NNN --severity N
python3 box/framework-rank.py score SC-NNN --scope N
If the batch is large (>10 cards), split into groups by product area for
easier review. Don't score infra-track cards (chores, implementation_step)
for Impact/Severity — they are capacity-allocated, not DIC-ranked. They DO
get Scope scores (unless non-code).
--bugs mode: Skip Impact scoring entirely. Only propose Severity scores
for unscored bugs:
python3 box/framework-rank.py gaps --state Backlog --state "Near Term" 2>&1 | grep -A 100 "Bugs needing Severity"
Shortcut Severity field sync (all modes):
After applying DIC severity scores, sync the Shortcut Severity custom field
for any bugs that were just scored. The mapping is Shortcut Sev = 5 - DIC Sev
(see reference/severity-framework.md for level definitions and field IDs).
- For each bug that was just scored, include the proposed Shortcut Severity
value in the same approval table (e.g., "DIC 4 -> Shortcut Sev 1 (Blocked)").
- After approval, set the Shortcut Severity field via
box/update-severity-field.py SC-NNN <sev-level>. Route through
execute_approved (production mutation). The script reads existing
custom_fields, merges severity, PUTs back the full array, and verifies
via independent GET.
- Verify ALL updated cards via the script's built-in verification (exit
code 0 = verified, 1 = failed). If multiple cards, run one at a time.
This keeps the two representations in sync: one assessment, applied to both
framework-rank.py (DIC Sev) and the Shortcut custom field (Sev 0-4).
Proved 2026-04-28: default mode was not syncing to Shortcut; product area
wipe recovery (Apr 27) destroyed severity fields that were never restored.
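The severity mapping stated above (Shortcut Sev = 5 - DIC Sev) is small enough to check by hand. A minimal sketch; `shortcut_severity` is a hypothetical name and is not the interface of box/update-severity-field.py.

```python
def shortcut_severity(dic_severity: int) -> int:
    """Map a DIC severity (1-5, 5 = worst) to the Shortcut Sev level (0-4, 0 = worst)."""
    if not 1 <= dic_severity <= 5:
        raise ValueError("DIC severity must be between 1 and 5")
    return 5 - dic_severity
```

DIC 4 maps to Shortcut Sev 1, matching the approval-table example, and DIC 5 maps to Sev 0.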
7. Near Term Refresh
Compute the baseline Near Term set and move cards between Backlog and Near
Term as needed. This is the step that keeps the Near Term pool reflecting
the active epics + top-DIC-ranked standalone cards.
7a. Reconcile manual changes:
Fetch current Near Term card IDs from Shortcut. Load last computed baseline
from near_term_baseline table (most recent run_id batch).
- First run: If no prior baseline exists (empty table), skip reconciliation.
All current Near Term cards are treated as the initial computed set.
- Cards in Near Term but NOT in last baseline and NOT already pinned ->
candidate manual pin. Present to the user for confirmation.
- Cards in last baseline but now in Backlog (not Build/Test/Released) and
NOT already excluded -> candidate manual exclude. Present for confirmation.
- Cards that moved out of the groomed pool entirely are ignored (not manual
removals).
- Auto-clear any override whose card is no longer in the groomed pool
(archived, moved to Build/Test/Released). Set
cleared_at=NOW(), cleared_by='auto'. Report auto-cleared overrides for awareness.
Write confirmed overrides to near_term_overrides.
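The reconciliation rules in 7a can be sketched with set operations. This is an illustration under assumptions: all inputs are sets of card IDs already loaded from Shortcut and the two tables, `reconcile_manual_changes` is a hypothetical name, and an empty baseline is treated as the first run, where nothing is proposed.

```python
def reconcile_manual_changes(
    near_term_now: set[int],
    backlog_now: set[int],
    last_baseline: set[int],
    pinned: set[int],
    excluded: set[int],
) -> dict[str, set[int]]:
    """Return candidate pins, candidate excludes, and overrides to auto-clear."""
    if not last_baseline:
        # First run: skip reconciliation entirely.
        return {"candidate_pins": set(), "candidate_excludes": set(), "auto_clear": set()}
    groomed_pool = near_term_now | backlog_now
    return {
        # In Near Term, not placed there by the last baseline, not already pinned.
        "candidate_pins": near_term_now - last_baseline - pinned,
        # Was in the baseline, now sits in Backlog, not already excluded.
        "candidate_excludes": (last_baseline & backlog_now) - excluded,
        # Overrides whose card has left the groomed pool.
        "auto_clear": (pinned | excluded) - groomed_pool,
    }
```

Baseline cards that left the groomed pool altogether fall out of both candidate sets, which matches the rule that they are not manual removals. Candidates still go to the user for confirmation before anything is written.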
7b. Compute new baseline:
python3 box/near-term-sort.py
The sort applies this precedence per card:
- Card in active epic -> Near Term (overrides excludes)
- Card has active exclude override -> Backlog
- Card has active pin override -> Near Term
- Card qualifies by baseline rules (top-N DIC) -> Near Term
- Otherwise -> Backlog
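The precedence list reads as a chain of early returns. A minimal sketch of that ordering; `target_state` and its boolean parameters are hypothetical and this is not the code in box/near-term-sort.py.

```python
def target_state(
    in_active_epic: bool,
    has_exclude: bool,
    has_pin: bool,
    qualifies_by_baseline: bool,
) -> str:
    """Apply the per-card precedence: epic, then exclude, then pin, then baseline."""
    if in_active_epic:
        return "Near Term"  # epic membership overrides excludes
    if has_exclude:
        return "Backlog"
    if has_pin:
        return "Near Term"
    if qualifies_by_baseline:
        return "Near Term"
    return "Backlog"
```

Because the exclude check comes before the pin check, a card carrying both overrides lands in Backlog unless it belongs to an active epic.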
Baseline rules for standalone cards (not in active epics, not blocked):
- Top 5 DIC-ranked bugs
- Top 1 DIC-ranked bug per Shortcut severity level (critical/high/medium/low)
- Top 5 DIC-ranked chores
- Top 5 DIC-ranked features
- Oldest card in the groomed pool (by created_at)
- Lowest scope code task (by scope_score; non-code chores excluded).
Tiebreak: DIC score desc, intercom desc, card age asc (stable, no
state dependency to prevent oscillation). Proved 2026-04-23.
- Tie-break (for top-N rules): intercom count desc, revenue signal, evidence type strength, card age asc
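The lowest-scope rule and its tiebreak can be expressed as a single sort key. This is a sketch under assumptions: the `Card` dataclass and its field names are hypothetical, and "card age asc" is read here as fewer days since creation sorting first, which may not match the real script's interpretation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Card:
    card_id: int
    scope_score: Optional[int]  # None for non-code chores
    dic_score: float
    intercom_conversations: int
    age_days: int


def lowest_scope_card(cards: list[Card]) -> Optional[Card]:
    """Pick the lowest-scope code task; ties go to DIC desc, intercom desc, age asc."""
    eligible = [c for c in cards if c.scope_score is not None]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda c: (c.scope_score, -c.dic_score, -c.intercom_conversations, c.age_days),
    )
```

None of the key components depends on board position or current state, so moving a card does not change the next run's pick.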
7c. Show proposed moves and reorder:
Present:
- Cards entering Near Term (with reason: epic, top-5 bug, severity rep, etc.)
- Cards leaving Near Term (with reason: no longer qualifies, not pinned)
- Current active overrides for review/cleanup
- #1 DIC bug board position: if the sort outputs a REORDER section, the
#1 DIC bug is not at the top of the Near Term board. Include the reorder
in the set of proposed changes.
Wait for user approval before executing moves.
7d. Execute moves and reorder via execute_approved:
State changes (Backlog ↔ Near Term): use the safe wrapper script:
python3 box/shortcut-mutate.py move SC-NNN "Near Term"
python3 box/shortcut-mutate.py move SC-NNN "Backlog"
The script handles the API call and verifies the result via independent GET.
Route each call through execute_approved. Do NOT write ad-hoc API scripts
for Shortcut mutations — wrapper scripts encode safety lessons (replace-all
semantics, verification) that raw API calls bypass.
Board reorder (position within a state): use the reorder wrapper:
python3 box/reorder-nearterm.py --first 1643
python3 box/reorder-nearterm.py --first 1643 --execute
python3 box/reorder-nearterm.py --before ANCHOR CARD_ID
python3 box/reorder-nearterm.py --after ANCHOR CARD_ID
Default is dry-run; pass --execute to mutate. Route --execute calls
through execute_approved. Note: Shortcut's position field does not work
for reordering — the script uses before_id/after_id. Proved 2026-04-23:
position value set via API returned None and card did not move.
7e. Write new baseline:
Insert computed baseline to near_term_baseline with new run_id. Old
baselines remain for audit; only the most recent is used for reconciliation.
⚠ POST-BASELINE CHECK. After --write-baseline, read the full sort output
— specifically the PROMOTE, DEMOTE, and REORDER sections. If PROMOTE or DEMOTE
is non-empty, the baseline is immediately unstable (the next run would propose
changes). Surface the instability to the user before moving on. The tiebreak
uses classification signals (intercom count, revenue signal, evidence type)
which are stable across card moves — board-position-based oscillation was
eliminated 2026-04-23.
7f. Report final Near Term composition.
8. Framework ranking comparison
python3 box/framework-rank.py mispriority -n 20
Compares framework rank (by DIC score) against live board position in Near
Term. Positive gap = board has the card lower than the framework thinks it
should be (underprioritized). Only includes scored cards — unscored cards
from skipped Step 6 will not appear.
--bugs mode: Use the filtered rank output instead:
python3 box/framework-rank.py rank --state "Near Term" 2>&1 | grep "BUG"
9. Present mispriority flags concretely
After Step 8, propose specific board reorders for any card with a gap of
+5 or more (framework rank significantly higher than board position). Name
the card, the current position, the proposed position, and the anchor card.
Don't ask generic adequacy questions — the mispriority table and proposed
moves are the deliverable. Clean up temp files after the reorders land.
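The +5 threshold can be illustrated with a small gap calculation. This sketch assumes both ranks are 1-based with 1 at the top, that gap = board position minus framework rank (so a positive gap means underprioritized), and that the proposed position is simply the framework rank; `mispriority_flags` is a hypothetical helper, not the mispriority command itself.

```python
def mispriority_flags(rows: list[tuple[int, int, int]], threshold: int = 5) -> list[dict]:
    """rows holds (card_id, framework_rank, board_position) tuples.

    Returns cards whose gap meets the threshold, largest gap first.
    """
    flags = []
    for card_id, framework_rank, board_position in rows:
        gap = board_position - framework_rank
        if gap >= threshold:
            flags.append({
                "card_id": card_id,
                "gap": gap,
                "current_position": board_position,
                "proposed_position": framework_rank,
            })
    return sorted(flags, key=lambda flag: -flag["gap"])
```

A card ranked 1 by the framework but sitting at board position 8 has a gap of +7 and gets a concrete reorder proposal; a card with a gap of +1 is left alone.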
Adapting to other states
The default targets the groomed pool (Backlog + Near Term). The classification
and scoring workflow works for any single state by replacing the --state
flags. The Near Term refresh step (7) only runs in default mode.