| name | budget-gatekeeper |
| description | Runtime budget enforcement for conductor workflows. Tracks model tier usage, delegation count, and estimated token cost across phases. Provides soft/hard limits with escalation patterns to prevent runaway sessions. |
| user-invocable | true |
| argument-hint | [session or phase to report tier usage for] |
Budget Gatekeeper
Provides runtime budget enforcement patterns for conductor workflows, tracking model tier allocation, delegation count, and estimated token cost to prevent runaway sessions.
Copilot CLI Native Metrics
Under the Copilot CLI, use the native commands for budget telemetry instead of manual estimation:
- /context: real-time token usage and visualisation.
- /usage: session-wide model cost and invocation metrics.
Hooks still emit artifacts/sessions/hooks/*.jsonl for post-session analytics. CLI metrics supplement, do not replace, the JSONL stream.
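For post-session analytics over the hook output, a minimal Python sketch might simply count records per file. The record schema is not described here, so the sketch assumes only that each non-empty line holds one JSON value; `count_hook_records` is a hypothetical helper name, not part of any existing script.

```python
import json
from pathlib import Path


def count_hook_records(hooks_dir: str) -> dict:
    """Count JSON records in every *.jsonl file directly under hooks_dir.

    Assumption: one JSON value per non-empty line. No field names are
    inspected, because the hook schema is not documented in this skill.
    """
    counts = {}
    for path in sorted(Path(hooks_dir).glob("*.jsonl")):
        records = 0
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    json.loads(line)  # raises ValueError on a malformed record
                    records += 1
        counts[path.name] = records
    return counts
```

Calling `count_hook_records("artifacts/sessions/hooks")` would return a mapping from file name to record count, which can be compared against the delegation count reported by /usage.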
Description
This skill teaches the conductor and delegating agents how to monitor and enforce session budgets in real time. Inspired by Athena's BudgetGatekeeper pattern, it adapts the concept to our multi-agent orchestration model — where the "cost" is distributed across 16 specialized agents using a 3-tier model allocation.
Unlike our existing token-thresholds.json (which enforces file-level token limits at validation time), this skill provides runtime session-level enforcement during active conductor workflows.
When to Use
This skill is relevant when:
- Conductor is orchestrating a multi-phase workflow with 3+ phases
- Multiple premium-tier agents are being invoked in sequence
- A workflow involves parallel subagent tracks (ULTRADEEP complexity)
- Session has been running for an extended period (15+ minutes)
- User requests cost-aware execution ("keep this lean", "minimize API usage")
When NOT to Use
- Do not use for single-turn questions or instant-complexity tasks that complete in one delegation.
- Do not use for static file-level token enforcement; that is handled by token-thresholds.json and scripts/token-report.ps1.
Entry Points
Trigger Phrases
- "budget check", "cost estimate", "token usage"
- "keep this lean", "minimize cost", "efficient execution"
- "how many delegations", "session cost"
Context Patterns
- Multi-phase workflows with 5+ phases
- Repeated reviewer → implementer iteration loops
- Three or more premium-tier agent invocations in one session
- Extended sessions exceeding 30 minutes
Core Knowledge
Budget Tracking Model
The conductor tracks four budget dimensions across a workflow session:
| Dimension | Soft Limit (80%) | Hard Limit (100%) | Action at Soft | Action at Hard |
|---|---|---|---|---|
| Delegations | 16 | 20 | Warn user, suggest consolidation | Pause + require explicit approval |
| Premium-Tier Calls | 2 | 3 | Reserve for security-only reviews | Block further premium calls |
| Estimated Session Tokens | 400K | 500K | Recommend context compaction | Require session handoff |
| Wall Clock Time | 24 min | 30 min | Suggest checkpoint + fresh session | Force pause point |
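The table above can be read as a simple threshold check per dimension. The following is a sketch only: the dictionary keys, `zone_for`, and `session_zone` are hypothetical names, and the rule that the session takes the worst zone of any dimension is an assumption drawn from the "of any limit" wording used for the zones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    soft: float
    hard: float


# Limits copied from the table above (tokens in thousands, time in minutes).
LIMITS = {
    "delegations": Limit(16, 20),
    "premium_calls": Limit(2, 3),
    "est_tokens_k": Limit(400, 500),
    "wall_clock_min": Limit(24, 30),
}

_SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def zone_for(dimension: str, used: float) -> str:
    """Classify one dimension against its absolute soft and hard limits."""
    limit = LIMITS[dimension]
    if used >= limit.hard:
        return "red"
    if used >= limit.soft:
        return "yellow"
    return "green"


def session_zone(usage: dict) -> str:
    """Return the most severe zone across all reported dimensions."""
    zones = [zone_for(name, used) for name, used in usage.items()]
    return max(zones, key=_SEVERITY.__getitem__, default="green")
```

Comparing against the absolute limits rather than a flat 80% keeps the premium-call row (soft limit 2 of 3) consistent with the table.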
Model Tier Cost Weights
Each delegation carries a cost weight based on the target agent's model tier. Weights are approximate ratios based on current per-token pricing.
| Tier | Relative weight | Agents | Monthly Budget Target |
|---|---|---|---|
| Premium (Opus 4.7/4.6) | ~1.7x | security-mode reviewer only | ≤5% of total delegations |
| Execution (Sonnet 4.6, GPT-5.4, GPT-5.3-Codex) | 1x | conductor, reviewer, implementer, planner, researcher, ops, test, iac, gui-tester, translation-conductor, translator, translation-analyzer, translation-validator | ~80% of total delegations |
| Fast (Haiku 4.5, GPT-5.4 mini) | ~0.33x | docs, ux, translation-styler | ~15% of total delegations |
Annual Plan Multiplier Mode
If the workspace is still on annual-plan multipliers, override the weights with the policy multipliers (e.g., Opus 27x, Sonnet 6x) and treat premium invocations as budget-exception-only.
AI Credits Mode (Default)
Under usage-based billing, cost is driven by token volume and per-model pricing. The tier weights are directional only; apply stricter controls on context size and effort before escalating tiers.
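As a rough illustration of how the tier weights could feed a session estimate, the sketch below sums delegation counts scaled by weight and reports a tier's share of delegations. The function names are hypothetical. The annual-plan mapping contains only the two example multipliers quoted above; no Fast-tier multiplier is given, so none is assumed.

```python
# Relative weights from the tier table (directional only under AI Credits mode).
TIER_WEIGHTS = {"premium": 1.7, "execution": 1.0, "fast": 0.33}

# Example annual-plan multipliers quoted in the text. Deliberately incomplete:
# the document gives no Fast-tier multiplier.
ANNUAL_PLAN_EXAMPLE = {"premium": 27.0, "execution": 6.0}


def weighted_cost(counts: dict, weights: dict = TIER_WEIGHTS) -> float:
    """Sum delegation counts scaled by each tier's relative weight."""
    return sum(weights[tier] * n for tier, n in counts.items())


def tier_share(counts: dict, tier: str) -> float:
    """Fraction of all delegations that went to the given tier."""
    total = sum(counts.values())
    return counts.get(tier, 0) / total if total else 0.0
```

For a session with 1 premium, 16 execution, and 3 fast delegations, `tier_share(counts, "premium")` is 0.05, which sits exactly at the ≤5% premium target.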
Budget State Tracking
The conductor maintains a budget state block in its internal tracking, updated after every delegation:
## Budget Status
- **Delegations:** {used}/{limit} ({percentage}%)
- **Premium Calls:** {used}/{limit}
- **Est. Tokens:** ~{used}K / {limit}K
- **Session Time:** {elapsed} min / {limit} min
- **Effort Profile:** {High: N, Medium: N, Low: N}
- **Zone:** 🟢 Green | 🟡 Yellow (soft limit) | 🔴 Red (hard limit)
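If the status block is produced programmatically, a small formatter along these lines would do. This is a sketch under assumptions: `render_budget_status` and its argument shapes are hypothetical, and the effort profile is rendered without surrounding braces.

```python
def render_budget_status(delegations, premium, tokens_k, minutes, effort, zone):
    """Render the budget status block.

    The first four arguments are (used, limit) tuples; effort is a dict with
    "high", "medium" and "low" counts; zone is the already-formatted label.
    """
    d_used, d_limit = delegations
    pct = round(100 * d_used / d_limit)
    lines = [
        "## Budget Status",
        f"- **Delegations:** {d_used}/{d_limit} ({pct}%)",
        f"- **Premium Calls:** {premium[0]}/{premium[1]}",
        f"- **Est. Tokens:** ~{tokens_k[0]}K / {tokens_k[1]}K",
        f"- **Session Time:** {minutes[0]} min / {minutes[1]} min",
        "- **Effort Profile:** High: {high}, Medium: {medium}, Low: {low}".format(**effort),
        f"- **Zone:** {zone}",
    ]
    return "\n".join(lines)
```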
Effort-Weighted Token Estimates
Thinking effort affects token consumption within the same pricing tier. Use these weights for budget estimation:
| Effort Level | Thinking Token Weight | Typical Use |
|---|---|---|
| None | 0× (no thinking) | Fast-tier agents (Haiku) |
| Low | 0.7× baseline | Docs, Ops, Translation sub-agents |
| Medium | 1× baseline | Implementer, Reviewer, Test (default) |
| High | 1.5× baseline | Security, Beast Mode, Red Team |
When tracking estimated session tokens, multiply the base estimate by the effort weight:
- A Medium-effort delegation to implementer: ~25K tokens × 1.0 = ~25K
- A High-effort delegation to security: ~25K tokens × 1.5 = ~37.5K
- A Low-effort delegation to docs: ~25K tokens × 0.7 = ~17.5K
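The arithmetic above can be sketched in a few lines of Python. The names are hypothetical, and the "None" level is left out on purpose: the table gives it a 0× weight for thinking tokens, and it is not stated how the base estimate should be treated in that case.

```python
# Effort weights from the table above ("None" intentionally omitted, see lead-in).
EFFORT_WEIGHTS = {"low": 0.7, "medium": 1.0, "high": 1.5}


def estimate_tokens_k(base_k: float, effort: str) -> float:
    """Effort-weighted estimate for one delegation, in thousands of tokens."""
    return base_k * EFFORT_WEIGHTS[effort]


def session_estimate_k(delegations) -> float:
    """Sum weighted estimates over an iterable of (base_k, effort) pairs."""
    return sum(estimate_tokens_k(base_k, effort) for base_k, effort in delegations)
```

Summing the three worked examples (25K at Medium, High, and Low) gives roughly 80K against the 500K session limit.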
Yellow Zone Cost Saving via Effort Reduction
When approaching budget limits, reduce effort before switching model tiers:
- Switch Execution-Medium agents to Execution-Low (save ~30% thinking tokens)
- Use premium tier only for explicit security review requests
- Reserve High effort for security and adversarial reviews only
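One way to express these rules is a single effort-adjustment function applied to each pending delegation once the session enters the Yellow Zone. This is a sketch only: `yellow_zone_effort` is a hypothetical name, and stepping a non-security High delegation down to Medium (rather than further) is an assumption, since the text says only that High is reserved for security and adversarial reviews.

```python
def yellow_zone_effort(effort: str, is_security_or_adversarial: bool) -> str:
    """Return the effort level to use for a delegation in the Yellow Zone."""
    if effort == "high":
        # High is reserved for security and adversarial reviews.
        # Assumption: other work drops one step, to medium.
        return "high" if is_security_or_adversarial else "medium"
    if effort == "medium":
        # Medium -> Low saves roughly 30% of thinking tokens (1.0x -> 0.7x).
        return "low"
    return effort
```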
Enforcement Patterns
Green Zone (0-79% of any limit)
Normal operation. No special actions needed.
Yellow Zone (80-99% of any limit — Soft Limit)
⚠️ **Budget Warning**: {dimension} at {percentage}% ({used}/{limit}).
Recommendations:
1. Consolidate remaining phases where possible
2. Consider switching premium agents to execution-tier alternatives:
   - researcher → implementer (if just gathering file contents)
   - planner → conductor (if plan adjustments are minor)
3. Create a handoff artifact if approaching token limit
Red Zone (100%+ of any limit — Hard Limit)
🛑 **Budget Limit Reached**: {dimension} at {used}/{limit}.
**MANDATORY PAUSE POINT**
Options:
a) **Checkpoint & Continue** — Save current progress to `artifacts/sessions/`, start fresh session
b) **Override** — User explicitly approves additional budget: "I acknowledge the budget override for {N} additional {delegations/calls}."
c) **Abort** — Stop workflow, preserve artifacts
Awaiting your decision before proceeding.
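If the override acknowledgement in option (b) is checked mechanically, a strict pattern match is one possible approach. The sketch below is illustrative: `parse_override` is a hypothetical helper, and accepting the phrase with or without the trailing period is an assumption.

```python
import re
from typing import Optional, Tuple

# Mirrors the override sentence from option (b); the trailing period is optional.
_OVERRIDE = re.compile(
    r"I acknowledge the budget override for (\d+) additional (delegations|calls)\.?"
)


def parse_override(response: Optional[str]) -> Optional[Tuple[int, str]]:
    """Return (count, unit) for a valid override phrase, otherwise None."""
    if response is None:
        return None
    match = _OVERRIDE.fullmatch(response.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

Anything that does not match the full sentence returns `None`, so a vague reply such as "go ahead" never extends the budget.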
Premium-Tier Conservation Strategies
When approaching premium-tier limits, apply these substitution patterns:
| Instead of... | Use... | When acceptable |
|---|---|---|
| reviewer --security (Opus) | Standard reviewer (Sonnet) | Non-security review requests, low-risk changes |
| researcher (GPT-5.4) | implementer with search tools | Gathering file contents or API docs (not strategic research) |
| reviewer (GPT-5.4) | quick-review prompt (Haiku) | Minor changes, NIT-only expected findings |
| red-team (GPT-5.4) | reviewer with adversarial prompt | When red-team findings are optional, not mandatory |
| reviewer --adversarial | Standard reviewer with adversarial prompt | When red-team depth is needed but not a full dedicated pass |
Circuit Breaker Pattern
For destructive or high-risk operations, the conductor activates a circuit breaker independent of budget limits:
Ruin-Risk Detection
Trigger the circuit breaker when the workflow involves:
- File deletions affecting 5+ files or entire directories
- Security policy changes (auth, encryption, access control)
- Production environment modifications (deployment configs, infrastructure)
- PII or credential handling (schema changes, migration scripts)
- Irreversible operations (database migrations, external API calls with side effects)
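The trigger list above could be checked with a small predicate over a description of the planned operation. Everything here is a sketch under assumptions: `PlannedOperation`, the tag names, and `ruin_risks` are hypothetical, and only the 5+ file threshold is a concrete rule from the list. The other categories are assumed to be tagged upstream by whoever plans the phase.

```python
from dataclasses import dataclass, field

# Hypothetical tag names mapped to the non-numeric categories in the list above.
RISK_TAGS = {
    "security-policy": "Security policy change",
    "production": "Production environment modification",
    "pii-or-credentials": "PII or credential handling",
    "irreversible": "Irreversible operation",
}


@dataclass
class PlannedOperation:
    deleted_files: int = 0
    deletes_directory: bool = False
    tags: set = field(default_factory=set)


def ruin_risks(op: PlannedOperation) -> list:
    """Return the triggered ruin categories; an empty list means no breaker."""
    risks = []
    if op.deleted_files >= 5 or op.deletes_directory:
        risks.append("File deletions affecting 5+ files or entire directories")
    risks.extend(label for tag, label in RISK_TAGS.items() if tag in op.tags)
    return risks
```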
Circuit Breaker Ceremony
⚠️ **CIRCUIT BREAKER ACTIVATED**
**Detected:** {ruin category} with estimated risk level: {LOW | MEDIUM | HIGH}
**Trigger:** {specific operation that triggered the breaker}
**Impact:** {what could go wrong if this proceeds without review}
**Required:** To proceed, you must:
1. Review the specific changes listed above
2. Confirm with: "I acknowledge the risk and approve proceeding with {operation}."
Any other response or no response = abort this operation.
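The "any other response or no response = abort" rule amounts to a strict comparison against the expected sentence. A minimal sketch follows, assuming surrounding whitespace is ignored and everything else must match exactly; both function names are hypothetical.

```python
from typing import Optional


def approval_phrase(operation: str) -> str:
    """Build the exact confirmation sentence for the given operation."""
    return f"I acknowledge the risk and approve proceeding with {operation}."


def is_approved(response: Optional[str], operation: str) -> bool:
    """True only for the exact phrase; None or anything else means abort."""
    if response is None:
        return False
    return response.strip() == approval_phrase(operation)
```

Defaulting to `False` keeps the breaker fail-closed: a timeout, an empty reply, or a paraphrase all abort the operation.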
Integration with Token Thresholds
This skill complements (does not replace) the existing static enforcement:
| Enforcement Layer | Scope | When | Tool |
|---|---|---|---|
| token-thresholds.json | Per-file token limits | Validation time (CI/PR) | scripts/token-report.ps1 |
| budget-gatekeeper skill | Per-session runtime budget | Active conductor workflow | Conductor state tracking |
| maxFileTokens: 10000 | Individual file size | Validation time | scripts/validate-copilot-assets.ps1 |
Session Handoff on Budget Exhaustion
When the conductor needs to hand off to a fresh session due to budget constraints:
## Session Handoff Artifact
**Save to:** `artifacts/sessions/{date}-{feature-slug}-handoff.md`
### Progress Summary
- **Original objective:** {user's request}
- **Completed phases:** {list with status}
- **Current phase:** {in-progress work}
- **Pending phases:** {remaining work}
### Context for Next Session
- **Key decisions made:** {list}
- **Open questions:** {list}
- **Artifacts created:** {paths}
- **Budget consumed:** {delegations, premium calls, est. tokens}
### Next Steps
1. {immediate next action}
2. {subsequent actions}
### Files Modified
{list of changed files with brief descriptions}
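The save path in the template can be built from a date and a feature name. This sketch assumes an ISO `YYYY-MM-DD` date and a lowercase hyphenated slug; neither format is specified above, and `slugify` and `handoff_path` are hypothetical helpers.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def handoff_path(day: date, feature: str) -> str:
    """Build artifacts/sessions/{date}-{feature-slug}-handoff.md."""
    filename = f"{day.isoformat()}-{slugify(feature)}-handoff.md"
    return str(PurePosixPath("artifacts/sessions") / filename)
```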
Examples
Example 1: Budget Check Mid-Workflow
## Budget Status
- **Delegations:** 12/20 (60%)
- **Premium Calls:** 1/3 (security review)
- **Est. Tokens:** ~280K / 500K
- **Session Time:** 18 min / 30 min
- **Zone:** 🟢 Green
All limits within normal range. Proceeding with Phase 4.
Example 2: Yellow Zone Warning
## Budget Status
- **Delegations:** 17/20 (85%)
- **Premium Calls:** 2/3 (67%)
- **Est. Tokens:** ~410K / 500K
- **Session Time:** 22 min / 30 min
- **Zone:** 🟡 Yellow
⚠️ **Budget Warning**: Delegations at 85% (17/20), premium calls at 67% (2/3), and estimated tokens at 82% (~410K/500K).
Recommendations:
1. Consolidate remaining 2 phases into a single implementer delegation
2. Use the `quick-review` prompt instead of full reviewer for the final phase
3. Skip optional documentation update — flag as follow-up task
Example 3: Circuit Breaker Activation
⚠️ **CIRCUIT BREAKER ACTIVATED**
**Detected:** Database migration affecting production schema
**Trigger:** Phase 3 implementation includes `ALTER TABLE users DROP COLUMN legacy_id`
**Impact:** Irreversible column deletion; any code referencing `legacy_id` will fail
**Required:** To proceed, you must:
1. Confirm no code references `legacy_id` (grep results show 0 matches)
2. Confirm with: "I acknowledge the risk and approve proceeding with schema migration."
Any other response = abort this operation.
References
- Token Thresholds: token-thresholds.json (static per-file limits)
- Token Report Script: scripts/token-report.ps1 (validate token budgets)
- Conductor Instructions: instructions/workflows/conductor.instructions.md (workflow rules and pause expectations)
- Conductor Agent: .github/agents/conductor.agent.md (runtime orchestration entry point)
- Model Selection: instructions/global/03_model-selection.instructions.md (tier definitions)
- Conductor Lifecycle: .github/skills/conductor-lifecycle/SKILL.md (phase management)
- Delegation Routing: .github/skills/delegation-routing/SKILL.md (agent selection)
- Inspiration: Athena-Public BudgetGatekeeper pattern (MIT License)