
cost-diff

Snapshot delta between two cost-summary JSON outputs. PR-level cost regression detection — answers "what changed between these two specific snapshots?". Pairs with cost-summary's stable JSON contract.


Stars: 59,880

Forks: 6,939

Updated: June 16, 2026, 16:10

Source

ruvnet

ruvnet/ruflo


SKILL.md


name	cost-diff
description	Snapshot delta between two cost-summary JSON outputs. PR-level cost regression detection — answers "what changed between these two specific snapshots?". Pairs with cost-summary's stable JSON contract.
argument-hint	--baseline <baseline.json> --current <current.json> [--alert-on-pct N] [--alert-on-usd N] [--alert-on-class-pct <class>:N[,<class>:N]] [--format table|json]
allowed-tools	Bash

PR-level cost regression detection. Where cost-counterfactual compares to HYPOTHETICAL baselines (always-haiku/sonnet/opus) and cost-burn compares latest bucket to PRIOR MEAN, cost-diff compares two SPECIFIC known-good snapshots.

Question	Skill
"What would we have spent at always-X?"	`cost-counterfactual`
"Is daily burn accelerating vs prior mean?"	`cost-burn`
"Did THIS PR add spend vs main?"	`cost-diff` ← this

Algorithm

Implementation: scripts/diff.mjs. Consumes the stable JSON contract from cost summary --format json.

Load --baseline and --current JSON snapshots.
Sanity check: both must have total_cost_usd + sessionCount (cost-summary shape).
Per-key delta: byTier (haiku/sonnet/opus) and byModel (each model).
Each entry is tagged added / removed / changed, depending on whether its baseline and current values are zero.
Sort table by |delta| descending so the biggest movers are at the top.
--alert-on-pct N: exit 1 when total_pct > N.
--alert-on-usd N: exit 1 when total_delta_usd > N. Both can be set; first to trigger wins.
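
The per-key delta and sort steps above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/diff.mjs: the helper name `diffSection` is hypothetical, and it assumes byTier / byModel map each key directly to a USD number, which the source does not confirm.

```javascript
// Sketch only. Assumes each section (byTier or byModel) is a plain
// { key: usdNumber } object; the real snapshot shape may differ.
function diffSection(baseline = {}, current = {}) {
  const keys = new Set([...Object.keys(baseline), ...Object.keys(current)]);
  const rows = [];
  for (const key of keys) {
    const before = baseline[key] ?? 0;
    const after = current[key] ?? 0;
    const delta = after - before;
    // Percent change is undefined when the baseline is zero.
    const pct = before === 0 ? null : (delta * 100) / before;
    rows.push({ key, before, after, delta, pct });
  }
  // Biggest movers first, regardless of sign.
  rows.sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta));
  return rows;
}

const tierRows = diffSection(
  { sonnet: 0.7, haiku: 0.3 },
  { opus: 0.6, sonnet: 0.5, haiku: 0.4 }
);
console.log(tierRows.map((r) => r.key).join(" > ")); // opus > sonnet > haiku
```

Sorting on the absolute delta means a large decrease ranks as high as a large increase, so a tier that shrank sharply is not buried below small increases.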

PR-gate workflow

# Capture baseline (e.g. on main, via the cost-tracker-smoke CI workflow)
cost summary --format json > baseline.json

# On the PR branch, capture current state
cost summary --format json > current.json

# Compare; fail the PR if total spend grew >10% OR >$5
cost diff --baseline baseline.json --current current.json \
          --alert-on-pct 10 --alert-on-usd 5.00

The combination of both flags catches:

Percent-only fires: a small absolute change but a meaningful shift (e.g. doubling from $0.10 to $0.20 hits +100% but only +$0.10).
USD-only fires: a large absolute change with a small percent (e.g. growing from $100 to $110 is only +10% but +$10).

Either signal can fail the PR independently — they're OR'd.
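
As a rough sketch of how the two OR'd thresholds might be evaluated: the helper `checkTotalThresholds` and its option names are hypothetical, and the choices to check pct before usd and to skip the percent check on a zero baseline are assumptions rather than documented behavior.

```javascript
// Hypothetical helper. Returns "pct", "usd", or null (no alert).
// Assumption: pct is checked first, so it wins when both would fire.
function checkTotalThresholds(baselineUsd, currentUsd, { alertOnPct, alertOnUsd } = {}) {
  const deltaUsd = currentUsd - baselineUsd;
  // Assumption: with a zero baseline there is no percent to compare.
  const pct = baselineUsd === 0 ? null : (deltaUsd * 100) / baselineUsd;
  if (alertOnPct !== undefined && pct !== null && pct > alertOnPct) return "pct";
  if (alertOnUsd !== undefined && deltaUsd > alertOnUsd) return "usd";
  return null;
}

const gate = { alertOnPct: 10, alertOnUsd: 5 };
console.log(checkTotalThresholds(0.1, 0.2, gate)); // "pct": +100% but only +$0.10
console.log(checkTotalThresholds(100, 110, gate)); // "usd": exactly +10% does not exceed 10, but +$10 exceeds $5
console.log(checkTotalThresholds(100, 101, gate)); // null: neither threshold exceeded
```

Both comparisons are strict (greater than), matching the "total_pct > N" wording in the Algorithm section, so a change that lands exactly on a threshold does not alert.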

--alert-on-class-pct (iter 86)

The two total-spend thresholds above (--alert-on-pct and --alert-on-usd) miss one regression class: a single token type growing disproportionately while total spend grows only modestly. Example: a PR introduces a verbose context-cache pattern; total spend grows only 10% (under --alert-on-pct 50), but cache_write tokens grow 900%. The iter-82 driver hides inside the USD signal.

--alert-on-class-pct cache_write:50 exits 1 when cache_write tokens grow more than 50% baseline → current. Multiple classes can be checked in one flag (comma-separated):

cost diff --baseline baseline.json --current current.json \
          --alert-on-class-pct cache_write:50,output:25

First class to breach wins. Valid classes: input | output | cache_write | cache_read.
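
A minimal sketch of parsing the comma-separated spec and finding the first breaching class. The helper names are hypothetical, and the per-class token counts are passed in as plain { class: count } objects because the snapshot field that holds them is not described here.

```javascript
const VALID_CLASSES = ["input", "output", "cache_write", "cache_read"];

// Parse "cache_write:50,output:25" into [{ cls, limit }, ...].
function parseClassSpec(spec) {
  return spec.split(",").map((part) => {
    const [cls, raw] = part.split(":");
    const limit = Number(raw);
    if (!VALID_CLASSES.includes(cls) || raw === undefined || raw === "" || Number.isNaN(limit)) {
      throw new Error(`invalid class threshold: ${part}`);
    }
    return { cls, limit };
  });
}

// Return the first class whose token growth exceeds its limit, or null.
// baselineTokens / currentTokens are hypothetical { class: count } maps.
function firstClassBreach(spec, baselineTokens, currentTokens) {
  for (const { cls, limit } of parseClassSpec(spec)) {
    const before = baselineTokens[cls] ?? 0;
    const after = currentTokens[cls] ?? 0;
    if (before === 0) continue; // assumption: growth from zero is skipped in this sketch
    const pct = ((after - before) * 100) / before;
    if (pct > limit) return { cls, pct };
  }
  return null;
}

const breach = firstClassBreach(
  "cache_write:50,output:25",
  { cache_write: 1000, output: 400 },
  { cache_write: 10000, output: 440 }
);
console.log(breach); // { cls: 'cache_write', pct: 900 }
```

Classes are checked in the order they appear in the flag, which is one plausible reading of "first class to breach wins".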

Recommended PR-gate triad:

cost diff --baseline ... --current ... \
          --alert-on-pct 25 \
          --alert-on-usd 5.00 \
          --alert-on-class-pct cache_write:100

Three orthogonal signals: pct (total grew), usd (large absolute jump), class-pct (composition shifted). Each catches what the others miss. The signals are OR'd, so any one firing fails the PR.

Smoke transcript (synthetic baseline + current)

| Total spend       | $1.000000 | $1.500000 | +$0.500000 | 50.00% |
| Sessions          | 10        | 13        | +3         | 30.00% |

## By tier
| opus   | $0      | $0.60   | +$0.600000 | new      | added   |
| sonnet | $0.70   | $0.50   | -$0.200000 | -28.57%  | changed |
| haiku  | $0.30   | $0.40   | +$0.100000 | 33.33%   | changed |

Notice the table is sorted by absolute delta, not alphabetically — the biggest mover (opus newly added) bubbles to the top. Operators reading top-down see "what mattered" first.

Exit codes

Exit	Meaning
0	No alert, OR no thresholds set
1	--alert-on-pct, --alert-on-usd, or --alert-on-class-pct threshold exceeded
2	Config error (missing files, invalid JSON, malformed snapshot)
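
The config-error path (exit 2) might look roughly like the sketch below, which validates a snapshot supplied as a JSON string. `parseSnapshot` is a hypothetical name, and treating total_cost_usd and sessionCount as numbers is an assumption; the source only says both fields must be present.

```javascript
// Sketch only. Returns { ok: true, snapshot } or { ok: false, exitCode: 2, reason }.
function parseSnapshot(text) {
  let snapshot;
  try {
    snapshot = JSON.parse(text);
  } catch {
    return { ok: false, exitCode: 2, reason: "invalid JSON" };
  }
  // Assumption: both required fields are numeric in a well-formed snapshot.
  const hasShape =
    snapshot !== null &&
    typeof snapshot === "object" &&
    typeof snapshot.total_cost_usd === "number" &&
    typeof snapshot.sessionCount === "number";
  return hasShape
    ? { ok: true, snapshot }
    : { ok: false, exitCode: 2, reason: "malformed snapshot" };
}

console.log(parseSnapshot('{"total_cost_usd": 1.5, "sessionCount": 13}').ok); // true
console.log(parseSnapshot("{not json").reason); // invalid JSON
console.log(parseSnapshot('{"sessionCount": 13}').reason); // malformed snapshot
```

Keeping exit 2 distinct from exit 1 lets a CI job tell a broken pipeline apart from a genuine cost regression.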

Status column

Status	Meaning
`added`	This tier/model was $0 in baseline, >$0 in current
`removed`	This tier/model was >$0 in baseline, $0 in current
`changed`	Both baseline and current >$0; delta is the difference

Entries with baseline === 0 && current === 0 are dropped (nothing to report).
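
The status rules, including the drop rule for all-zero entries, can be sketched as a small function. `classifyStatus` is a hypothetical name, and the sketch assumes costs are never negative.

```javascript
// Sketch only. Returns "added", "removed", "changed", or null when the
// entry is zero on both sides and should be dropped from the report.
function classifyStatus(before, after) {
  if (before === 0 && after === 0) return null;
  if (before === 0) return "added";
  if (after === 0) return "removed";
  return "changed";
}

console.log(classifyStatus(0, 0.6));   // added   (opus in the smoke transcript)
console.log(classifyStatus(0.7, 0.5)); // changed (sonnet in the smoke transcript)
console.log(classifyStatus(0.3, 0));   // removed
console.log(classifyStatus(0, 0));     // null, dropped
```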

Composition with cost-summary

cost-diff is the SECOND HALF of a contract that cost-summary started: the stable JSON shape from cost summary --format json. Both pieces have been frozen — adding fields to summary is fine; renaming or removing isn't.

If you're consuming snapshots elsewhere (dashboards, alerting), the same shape works — cost-diff is just one consumer.

More Skills from the same repository

cost-anomaly

ruvnet/ruflo

MAD-based outlier detection on session spend. Robust to the very outliers it hunts (unlike mean+sigma). Surfaces specific anomalous sessions with modified-z scores; optional --alert-on-outliers exit code for CI gates. Distinct from cost-burn (aggregate trend) — this answers "which INDIVIDUAL session is the outlier?".

2026-06-16, 59.9k stars

cost-burn

ruvnet/ruflo

Burn-rate trend over time with optional drift-alert exit code. Bins session spend into buckets, surfaces window-over-window delta, and can exit 1 when latest bucket exceeds prior mean by a configurable %. Distinct from `cost-trend` (benchmark drift); this tracks PRODUCTION spend trajectory.

2026-06-16, 59.9k stars

cost-counterfactual

ruvnet/ruflo

Multi-baseline counterfactual cost analysis. Compares actual session spend to hypothetical always-haiku / always-sonnet / always-opus routing baselines. Answers "is the routing earning its keep?" Negative savings flag over-escalation; positive savings quantify the router's win.

2026-06-16, 59.9k stars

cost-health

ruvnet/ruflo

Composite CI gate — runs cost-budget-check + cost-burn + cost-anomaly + cost-projection in parallel and surfaces a single combined health status with max exit code. The operationally-useful entry point — one shell-out covers all four alert ladders.

2026-06-16, 59.9k stars

cost-projection

ruvnet/ruflo

Forward-looking spend extrapolation. Computes a USD-per-day rate from the recent measurement window, projects to 7d/30d/90d/365d horizons, and surfaces "days until budget exhausted" when a budget is configured. Predictive counterpart to `cost-budget-check` (reactive).

2026-06-16, 59.9k stars

cost-session

ruvnet/ruflo

Per-message cost breakdown within a single session. The drill-down companion to cost-anomaly — when an outlier session is flagged, this surfaces the specific expensive messages so operators can see whether the cost came from output tokens, cache writes, or model escalations.

2026-06-16, 59.9k stars
