| name | billing-lib |
| description | Use when writing or reviewing code that meters API token usage, bills accounts, issues invoices, applies credit grants, or computes balances with the internal `billing` library — especially around retries, mid-cycle plan changes, cache-read vs cache-write token pricing, or any place where double-billing or rounding drift would be a problem. |
billing-lib
Overview
billing is the internal usage-metering and invoicing library. It records token consumption per account, prices it against the active plan, and rolls events into invoices, credit grants, and balances. Every metering and money path is idempotent and integer-cent based — getting either wrong silently corrupts customer bills.
When to Use
Use this skill when you are:
- Recording token usage for a request (`meter_usage`) from an inference service or gateway.
- Pricing cache-read vs cache-write vs base input tokens differently.
- Generating invoices, applying credit grants, or reading account balances.
- Handling retries / at-least-once delivery where the same usage event may arrive twice.
- Changing a plan mid-cycle and needing correct proration.
- Reconciling a balance that "looks off" by a few cents.
Do NOT use this for: public-facing pricing display logic, the marketing pricing page, or the Stripe/processor integration layer (that lives in payments, not billing). This library stops at producing the invoice; it does not charge cards.
Core Model
- All money is stored as integer cents (`int`), never floats. A `Money` type wraps cents; never construct prices from a `float`.
- Token quantities are `int`. Pricing is `cents_per_million_tokens` (an `int`), so a line's cost is `tokens * rate / 1_000_000`, computed in integer arithmetic and rounded explicitly rather than truncated by a bare `//` (see Gotchas).
- A usage event is the atomic unit. Events are immutable and keyed by `idempotency_key`. Invoices are derived from events; you never edit an invoice line directly, you append a correcting event.
- Token classes are distinct and priced separately: `input`, `output`, `cache_write`, `cache_read`. They are NOT interchangeable.
See references/api.md for full signatures, types, and error semantics.
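As a rough illustration of the integer-cent model above, the sketch below aggregates raw tokens per (model, token_class) and rounds once per line, then compares that with rounding every event. `RATES`, `round_half_even_div`, and `price_lines` are hypothetical names invented for this example, and the rates are made-up numbers; none of this is the library's actual API or pricing.

```python
from collections import defaultdict

# Hypothetical rates in cents per million tokens (illustrative values only).
RATES = {"input": 1500, "output": 7500, "cache_write": 1875, "cache_read": 150}


def round_half_even_div(numerator: int, denominator: int) -> int:
    """Divide two non-negative ints, rounding ties to the even quotient."""
    quotient, remainder = divmod(numerator, denominator)
    twice = 2 * remainder
    if twice > denominator or (twice == denominator and quotient % 2 == 1):
        quotient += 1
    return quotient


def price_lines(events):
    """Sum raw tokens per (model, token_class), then price and round each total once."""
    totals = defaultdict(int)
    for model, token_class, tokens in events:
        totals[(model, token_class)] += tokens
    return {
        key: round_half_even_div(tokens * RATES[key[1]], 1_000_000)
        for key, tokens in totals.items()
    }


# 1,000 events of 10,000 cache-read tokens: each one costs exactly 1.5 cents.
events = [("claude-opus-4-8", "cache_read", 10_000)] * 1_000

# Correct: round the aggregated line once.
per_line = price_lines(events)[("claude-opus-4-8", "cache_read")]

# Drifting: round every event, then sum the rounded values.
per_event = sum(
    round_half_even_div(tokens * RATES[token_class], 1_000_000)
    for _, token_class, tokens in events
)

print(per_line, per_event)
```

With these made-up numbers the aggregated line prices at 1500 cents, while the per-event sum comes to 2000 cents, because every 1.5-cent event rounds up to 2 on its own.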
Typical Flow
- An inference service finishes a request and calls `meter_usage(...)` with a deterministic `idempotency_key` derived from the request id.
- At cycle close, `create_invoice(account_id, period)` aggregates events into priced lines.
- `apply_credit_grant(...)` records prepaid/promotional credit; balances net against it at read time.
- `get_balance(account_id)` returns the current owed/credit position.
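from billing import meter_usage, get_balance

# request_id is the upstream request's id, so a retry reuses the same key.
meter_usage(
    account_id="acct_8812",
    model="claude-opus-4-8",
    input_tokens=1_240,
    output_tokens=860,
    cache_read_tokens=10_000,
    cache_write_tokens=0,
    idempotency_key=f"req:{request_id}",
)

# Balance is computed at read time, net of any credit grants.
bal = get_balance("acct_8812")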
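To make the role of the deterministic key concrete, here is a small in-memory model of the dedupe behaviour described in this document. `meter_usage_sketch`, `deterministic_key`, and the `_events` dict are hypothetical stand-ins written for this example; the real `meter_usage` and its storage may work differently.

```python
import uuid

# In-memory stand-in for the event store, keyed by (account_id, idempotency_key).
_events = {}


def meter_usage_sketch(account_id, idempotency_key, input_tokens):
    """Hypothetical model of the dedupe rule: first write wins, duplicates return it."""
    key = (account_id, idempotency_key)
    if key not in _events:
        _events[key] = {"account_id": account_id, "input_tokens": input_tokens}
    return _events[key]


def deterministic_key(request_id):
    """Pure function of the upstream request id, so every retry gets the same key."""
    return f"req:{request_id}"


request_id = "r-123"

# Original call plus one retry with the same derived key: one stored event.
first = meter_usage_sketch("acct_8812", deterministic_key(request_id), 1_240)
retry = meter_usage_sketch("acct_8812", deterministic_key(request_id), 1_240)
stable_count = len(_events)

# Anti-pattern: a fresh uuid4() per attempt makes each retry look like new usage.
meter_usage_sketch("acct_8812", str(uuid.uuid4()), 1_240)
meter_usage_sketch("acct_8812", str(uuid.uuid4()), 1_240)
random_count = len(_events) - stable_count

print(stable_count, random_count)
```

In this model the derived key stores one event across both attempts, while the two `uuid4()` attempts store two events for what was a single request.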
Gotchas
ALWAYS treat these as real, observed failure modes — each has caused a customer-visible billing error before.
- Idempotency key must be derived from the request, not generated at call time. A retry that calls `uuid4()` again produces a new key and double-bills. The key must be a pure function of the upstream request id (`f"req:{request_id}"`). `meter_usage` dedupes on `(account_id, idempotency_key)`; a duplicate key is a no-op that returns the original event and does NOT raise. If you see a `DuplicateUsageEvent` being swallowed in logs, that is the system working, not a bug to "fix" by removing the key.
- Rounding is banker's rounding (half-cent ties go to the even cent), applied once at the invoice line and never per event. Summing per-event rounded costs drifts by cents over thousands of events. Aggregate raw token totals per `(model, token_class)`, then price and round the total. `meter_usage` stores raw tokens, not pre-computed cost, precisely so this stays correct. If you cache a `cost_cents` on the event, you will reintroduce the drift.
- Cache-read and cache-write tokens sit at opposite ends of the price scale relative to base input. `cache_write` is typically MORE expensive than base input (you pay to populate the cache); `cache_read` is much CHEAPER. A common mistake is treating all cache tokens as a single discounted class, which under-bills cache writes. Pass them as separate fields; never fold them into `input_tokens`.
- Proration on mid-cycle plan changes splits the cycle at the change timestamp, and the OLD plan owns events before it. Call `change_plan(account_id, new_plan, *, effective_at)`; it closes the current sub-period and opens a new one. Do NOT retroactively reprice already-metered events at the new rate. If `effective_at` is omitted it defaults to now, which is almost never what you want for a scheduled change, so always pass it explicitly.
- Credit-grant ordering matters: the soonest-expiring credit is consumed first (ordered by expiry date, not by grant date). A grant created later but expiring sooner is spent first. `apply_credit_grant` takes an `expires_at`; if you pass `None`, the grant is treated as non-expiring and consumed LAST. Getting this order wrong causes credit to expire unused while later-expiring grants are burned, which is a support escalation, not a rounding nit.
- `get_balance` nets credit at read time; it is not a stored column. Do not sum invoice totals and subtract a `credit_balance` field, because there isn't one. Always go through `get_balance`, which applies the FIFO credit consumption and unexpired-grant rules. Reimplementing this math elsewhere will disagree with the canonical value.
- Negative usage is rejected, but a refund is a real concept. Don't try to "un-bill" by metering negative tokens (`meter_usage` raises `ValueError` on negative quantities). Issue a credit grant or a correcting invoice adjustment instead (`apply_credit_grant` with `reason="refund"`).
- `model` must match the priced model registry exactly. An unknown model id raises `UnknownModel` at metering time; it does not silently bill at $0. Aliases (`claude-opus-4-8[1m]` vs `claude-opus-4-8`) are distinct price points; the 1M-context variant has its own rate. Do not strip suffixes.
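The credit-grant ordering rule in the list above is easy to get backwards, so here is a minimal sketch of it. The grant dict shape and the `consumption_order` and `consume` helpers are assumptions made for this example, not the library's types or functions, and the sketch skips the check that excludes already-expired grants.

```python
from datetime import datetime, timezone

# Sentinel that sorts after any real expiry; only compared among None grants.
_FAR_FUTURE = datetime.max.replace(tzinfo=timezone.utc)


def consumption_order(grants):
    """Soonest-expiring first; expires_at=None (non-expiring) sorts last."""
    return sorted(
        grants,
        key=lambda g: (g["expires_at"] is None, g["expires_at"] or _FAR_FUTURE),
    )


def consume(grants, amount_cents):
    """Return {grant_id: cents_used}, drawing grants down in consumption order."""
    used = {}
    remaining = amount_cents
    for grant in consumption_order(grants):
        if remaining == 0:
            break
        take = min(grant["remaining_cents"], remaining)
        if take:
            used[grant["id"]] = take
            remaining -= take
    return used


grants = [
    # Created first, but expires later.
    {"id": "g_old", "remaining_cents": 500,
     "expires_at": datetime(2026, 12, 31, tzinfo=timezone.utc)},
    # Created later, but expires sooner, so it is spent first.
    {"id": "g_new", "remaining_cents": 300,
     "expires_at": datetime(2026, 6, 30, tzinfo=timezone.utc)},
    # Non-expiring grant: consumed only after every dated grant is exhausted.
    {"id": "g_forever", "remaining_cents": 1_000, "expires_at": None},
]

usage = consume(grants, 600)
print(usage)
```

Applying 600 cents here drains `g_new` completely (300), takes the other 300 from `g_old`, and leaves `g_forever` untouched. This is only a way to reason about the ordering; production code should still read the result through `get_balance`.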
Files
references/api.md — full function signatures, types (Money, UsageEvent, Invoice, CreditGrant, Balance), error classes, and worked pricing/proration/credit examples.