
cost-counterfactual

Multi-baseline counterfactual cost analysis. Compares actual session spend to hypothetical always-haiku / always-sonnet / always-opus routing baselines. Answers "is the routing earning its keep?" Negative savings flag over-escalation; positive savings quantify the router's win.


Stars: 59,880

Forks: 6,939

Updated: 2026-06-16 16:10

Source

ruvnet

ruvnet/ruflo



SKILL.md


name	cost-counterfactual
description	Multi-baseline counterfactual cost analysis. Compares actual session spend to hypothetical always-haiku / always-sonnet / always-opus routing baselines. Answers "is the routing earning its keep?" Negative savings flag over-escalation; positive savings quantify the router's win.
argument-hint	[--since 7d] [--baseline always-haiku|always-sonnet|always-opus|all] [--format table|json]
allowed-tools	Bash

Multi-baseline counterfactual cost analysis. Pairs with the existing observability surface:

cost-budget-check — "have we crossed a threshold?" (reactive)
cost-projection — "when will we cross a threshold?" (predictive)
cost-counterfactual — "is the routing earning its keep?" (comparative) ← this one

Algorithm

Read all session-* records from the cost-tracking namespace.
Apply --since window filter (default all-time).
Sum input, output, cache-write, and cache-read tokens across the byModel[*] entries of each session.
For each requested baseline (default: all three):
- counterfactualUsd = (input × tier.input + output × tier.output + cache_write × tier.cache_write + cache_read × tier.cache_read) / 1M
Compute savings = counterfactualUsd − actualUsd and savingsPct = savings / counterfactualUsd × 100.
Emit per-baseline totals and savings % across the comparison set.
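The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's implementation: the session record shape (a `ts` epoch-seconds field plus a `byModel` map of per-model token counts and a `costUsd` figure) is hypothetical, and so are the output and cache rates in `TIERS`. Only the input rates ($0.25 / $3 / $15 per million tokens) are implied by the smoke transcript.

```python
import time

# ASSUMPTION: illustrative USD-per-million-token rates. Only the input rates
# (0.25 / 3 / 15) are implied by the smoke transcript; the output and cache
# rates are placeholders, not the skill's real price table.
TIERS = {
    "always-haiku": {"input": 0.25, "output": 1.25, "cache_write": 0.30, "cache_read": 0.03},
    "always-sonnet": {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30},
    "always-opus": {"input": 15.00, "output": 75.00, "cache_write": 18.75, "cache_read": 1.50},
}
TOKEN_KINDS = ("input", "output", "cache_write", "cache_read")


def counterfactual_report(sessions, since_seconds=None, baselines=tuple(TIERS)):
    """Compare actual spend with each always-<tier> baseline.

    ASSUMPTION (hypothetical record shape): each session is a dict with a "ts"
    (epoch seconds) field and a "byModel" dict mapping model name to
    {token kind: count, "costUsd": float}.
    """
    # --since window filter; None means all-time (the skill's default).
    if since_seconds is not None:
        cutoff = time.time() - since_seconds
        sessions = [s for s in sessions if s["ts"] >= cutoff]

    # Sum each token kind, and the actual USD spend, across byModel[*].
    totals = dict.fromkeys(TOKEN_KINDS, 0)
    actual_usd = 0.0
    for session in sessions:
        for usage in session["byModel"].values():
            for kind in TOKEN_KINDS:
                totals[kind] += usage.get(kind, 0)
            actual_usd += usage.get("costUsd", 0.0)

    report = []
    for name in baselines:
        tier = TIERS[name]
        # counterfactualUsd = sum(tokens_k * rate_k) / 1M
        cf_usd = sum(totals[k] * tier[k] for k in TOKEN_KINDS) / 1_000_000
        savings = cf_usd - actual_usd  # negative: routing cost more than this baseline
        report.append({
            "baseline": name,
            "counterfactualUsd": cf_usd,
            "actualUsd": actual_usd,
            "savingsUsd": savings,
            # Percent of the baseline's hypothetical cost, as in the smoke table.
            "savingsPct": 100 * savings / cf_usd if cf_usd else 0.0,
        })
    return report
```

Fed the two smoke-transcript sessions (50,000 input tokens each on haiku and sonnet, costing $0.0125 and $0.15), this sketch reproduces the -550.00% / 45.83% / 89.17% savings column.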

Smoke transcript (2 sessions: 50K haiku tokens + 50K sonnet tokens)

| Metric              | Value     |
|---------------------|-----------|
| Sessions considered | 2         |
| Total input tokens  | 100,000   |
| Actual spend        | $0.162500 |

| Baseline        | Hypothetical | Actual    | Savings    | Savings % |
|-----------------|--------------|-----------|------------|-----------|
| `always-haiku`  | $0.025000    | $0.162500 | -$0.137500 | -550.00%  |
| `always-sonnet` | $0.300000    | $0.162500 | +$0.137500 | 45.83%    |
| `always-opus`   | $1.500000    | $0.162500 | +$1.337500 | 89.17%    |
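The smoke figures can be checked by hand. The per-million input rates used here ($0.25 haiku, $3 sonnet, $15 opus) are back-derived from the table itself, not quoted from the skill's price list; the last column divides savings by the baseline's hypothetical cost.

```latex
\begin{aligned}
\text{actual} &= \frac{50{,}000 \times 0.25 + 50{,}000 \times 3}{10^6} = 0.0125 + 0.15 = \$0.1625 \\
\text{haiku}  &= \frac{100{,}000 \times 0.25}{10^6} = \$0.025, & \frac{0.025 - 0.1625}{0.025} &= -550\% \\
\text{sonnet} &= \frac{100{,}000 \times 3}{10^6} = \$0.30, & \frac{0.30 - 0.1625}{0.30} &\approx 45.83\% \\
\text{opus}   &= \frac{100{,}000 \times 15}{10^6} = \$1.50, & \frac{1.50 - 0.1625}{1.50} &\approx 89.17\%
\end{aligned}
```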

How to read negative savings

A negative always-haiku result means the router spent more than an always-haiku policy would have. Where haiku could have handled those tasks, that's an over-escalation signal:

Maybe qualityBar is set too high
Maybe the sonnet/opus session was warranted by complexity but the baseline doesn't know that
Run cost optimize (or inspect specific sessions via cost conversation) to investigate

Positive savings quantify the router's win against that baseline. The most informative number is usually always-sonnet — it's the standard "safe default" baseline most teams would pick if they didn't have routing.
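Because the percentage is relative to the baseline's hypothetical cost, a large negative value is easier to read as a spend multiple: from savingsPct = (counterfactual − actual) / counterfactual × 100 it follows that actual / counterfactual = 1 − savingsPct / 100. The -550% always-haiku row in the smoke transcript therefore means the routed sessions cost 6.5× an all-haiku run. The helper names below are illustrative only.

```python
def spend_multiple(savings_pct: float) -> float:
    """Actual spend as a multiple of the baseline's hypothetical cost.

    savingsPct = (counterfactual - actual) / counterfactual * 100, so
    actual / counterfactual = 1 - savingsPct / 100.
    """
    return 1.0 - savings_pct / 100.0


def describe(baseline: str, savings_pct: float) -> str:
    """One-line reading of a baseline's savings, flagging negative results."""
    m = spend_multiple(savings_pct)
    if savings_pct < 0:
        # Routing spent more than this baseline would have.
        return f"{baseline}: routing cost {m:.2f}x the baseline (possible over-escalation)"
    return f"{baseline}: routing cost {m:.2f}x the baseline ({savings_pct:.2f}% saved)"
```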

When to use

Quarterly cost review: "We saved $X vs always-Sonnet — here's the proof."
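CI gate: cost counterfactual --format json | jq -e '.baselines[1].savingsPct >= 30' fails the build if routing isn't saving at least 30% vs the sonnet baseline (workload-shift detector). The -e flag makes jq exit non-zero when the expression evaluates to false; without it jq exits 0 either way. Index 1 assumes the baselines are listed in haiku, sonnet, opus order.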
Routing-config validation: When introducing a new qualityBar or cost-ceiling, re-run counterfactual to confirm savings didn't regress.
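The jq gate selects the sonnet baseline by array position. A sketch of the same check that selects the baseline by name follows; it assumes each baselines[*] entry carries a "baseline" name field next to savingsPct, which this page does not confirm, and `routing_gate` is a hypothetical helper name.

```python
def routing_gate(report: dict, baseline: str = "always-sonnet", min_pct: float = 30.0) -> int:
    """Return a process exit code: 0 if routing saves >= min_pct vs `baseline`.

    ASSUMPTION: report["baselines"] is a list of objects with a "baseline"
    name field and a "savingsPct" number; these field names are hypothetical.
    """
    for entry in report.get("baselines", []):
        if entry.get("baseline") == baseline:
            return 0 if entry["savingsPct"] >= min_pct else 1
    # Baseline missing from the report: fail distinctly rather than pass silently.
    return 2
```

In CI it could sit behind a short script that reads `cost counterfactual --format json` on stdin and calls `sys.exit(routing_gate(json.load(sys.stdin)))`. Returning a separate code for a missing baseline keeps an output-schema change from quietly passing the gate.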

Stationarity caveat

Like all counterfactual analyses, this assumes the same tokens at the same complexity would have produced the same outcome from the baseline model. That assumption favors the baseline: the baseline model might have failed and required retries, which the math doesn't capture. Treat the numbers as a quality-blind ceiling on how well each baseline would have done.
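One way to gauge how much the missing retries matter is a sensitivity check. This is an illustrative model only, not something the skill computes: assume each baseline call fails independently with probability p and is retried until it succeeds at the same cost per attempt. The expected number of attempts is then 1 / (1 − p), which scales the baseline's cost by the same factor.

```python
def retry_adjusted_savings(counterfactual_usd: float, actual_usd: float, fail_rate: float) -> float:
    """Savings vs a baseline whose calls fail with probability fail_rate and are retried.

    Hypothetical model: independent failures, retry until success, every
    attempt costs the same. Expected attempts = 1 / (1 - fail_rate).
    """
    if not 0.0 <= fail_rate < 1.0:
        raise ValueError("fail_rate must be in [0, 1)")
    adjusted_baseline = counterfactual_usd / (1.0 - fail_rate)
    return adjusted_baseline - actual_usd
```

With the smoke numbers, even a 50% haiku failure rate only lifts the always-haiku baseline to $0.05, so savings stay negative (-$0.1125); under this model the sign flips only above a failure rate of about 85% (1 − 0.025 / 0.1625 ≈ 0.846).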

More skills from the same repository

cost-anomaly

ruvnet/ruflo

MAD-based outlier detection on session spend. Robust to the very outliers it hunts (unlike mean+sigma). Surfaces specific anomalous sessions with modified-z scores; optional --alert-on-outliers exit code for CI gates. Distinct from cost-burn (aggregate trend) — this answers "which INDIVIDUAL session is the outlier?".

2026-06-16 · 59.9k stars

cost-burn

ruvnet/ruflo

Burn-rate trend over time with optional drift-alert exit code. Bins session spend into buckets, surfaces window-over-window delta, and can exit 1 when latest bucket exceeds prior mean by a configurable %. Distinct from `cost-trend` (benchmark drift); this tracks PRODUCTION spend trajectory.

2026-06-16 · 59.9k stars

cost-diff

ruvnet/ruflo

Snapshot delta between two cost-summary JSON outputs. PR-level cost regression detection — answers "what changed between these two specific snapshots?". Pairs with cost-summary's stable JSON contract.

2026-06-16 · 59.9k stars

cost-health

ruvnet/ruflo

Composite CI gate — runs cost-budget-check + cost-burn + cost-anomaly + cost-projection in parallel and surfaces a single combined health status with max exit code. The operationally-useful entry point — one shell-out covers all four alert ladders.

2026-06-16 · 59.9k stars

cost-projection

ruvnet/ruflo

Forward-looking spend extrapolation. Computes a USD-per-day rate from the recent measurement window, projects to 7d/30d/90d/365d horizons, and surfaces "days until budget exhausted" when a budget is configured. Predictive counterpart to `cost-budget-check` (reactive).

2026-06-16 · 59.9k stars

cost-session

ruvnet/ruflo

Per-message cost breakdown within a single session. The drill-down companion to cost-anomaly — when an outlier session is flagged, this surfaces the specific expensive messages so operators can see whether the cost came from output tokens, cache writes, or model escalations.

2026-06-16 · 59.9k stars
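The cost-anomaly card names MAD-based modified z-scores. A minimal sketch of that statistic follows; the 0.6745 scale factor and 3.5 cutoff are the conventional Iglewicz and Hoaglin choices, assumed here rather than taken from the skill, and the input (a map of session id to USD spend) is hypothetical.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-score per value: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        # At least half the values equal the median; this sketch flags nothing.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def anomalous_sessions(spend_by_session, threshold=3.5):
    """Return {session_id: score} for sessions whose |modified z| exceeds threshold."""
    ids = list(spend_by_session)
    scores = modified_z_scores([spend_by_session[i] for i in ids])
    return {i: z for i, z in zip(ids, scores) if abs(z) > threshold}
```

Because both the center (median) and the spread (MAD) are medians, a single extreme session barely moves them, which is the robustness the card contrasts with mean+sigma.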
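The cost-burn card describes binning session spend into buckets and alerting when the latest bucket exceeds the prior mean by a configurable percentage. A sketch of that check, assuming (timestamp, USD) pairs as input, daily buckets, and a 50% default threshold; the function names and defaults are hypothetical.

```python
def burn_buckets(sessions, bucket_seconds=86_400):
    """Sum (timestamp, usd) pairs into fixed-width buckets, oldest first.

    Buckets with no sessions between the first and last are filled with 0.0.
    """
    totals = {}
    for ts, usd in sessions:
        key = int(ts // bucket_seconds)
        totals[key] = totals.get(key, 0.0) + usd
    if not totals:
        return []
    return [totals.get(k, 0.0) for k in range(min(totals), max(totals) + 1)]


def drift_exit_code(buckets, max_increase_pct=50.0):
    """Return 1 when the latest bucket exceeds the mean of prior buckets by more than max_increase_pct."""
    if len(buckets) < 2:
        return 0  # nothing to compare against
    *prior, latest = buckets
    prior_mean = sum(prior) / len(prior)
    if prior_mean == 0:
        return 1 if latest > 0 else 0
    delta_pct = 100 * (latest - prior_mean) / prior_mean
    return 1 if delta_pct > max_increase_pct else 0
```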
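The cost-projection card describes a linear extrapolation from a recent window. Sketched under the assumption that "days until budget exhausted" means the remaining budget divided by the daily rate; `project_spend` and its `spent_usd` parameter are hypothetical.

```python
def project_spend(window_usd, window_days, horizons=(7, 30, 90, 365),
                  budget_usd=None, spent_usd=0.0):
    """Linear projection from a recent measurement window.

    Returns (usd_per_day, {horizon_days: projected_usd}, days_until_exhausted).
    days_until_exhausted is None when no budget is set or the rate is zero.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    rate = window_usd / window_days
    projections = {days: rate * days for days in horizons}
    days_left = None
    if budget_usd is not None and rate > 0:
        # Remaining budget (floored at zero) divided by the current burn rate.
        days_left = max(budget_usd - spent_usd, 0.0) / rate
    return rate, projections, days_left
```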
