| name | adding-hardware-sku |
| description | Use when adding a new accelerator (GPU/TPU/Trainium/Gaudi), interconnect fabric, or multi-accelerator system to the llm-calc database (calc/src/data/{accelerators,interconnects,systems}.ts). Routes to the right sub-procedure based on whether the SKU is a new chip, a new fabric, or a new product composition. Invoke whenever hardware is being added — the failure modes (sparsity-inflated TFLOPS, confused per-direction vs aggregate BW, wrong variant in a system) all produce plausible-looking perf numbers that are silently wrong. |
Adding a Hardware SKU
Three things live under hardware. Pick the right sub-procedure:
- Accelerator: new chip generation (H200, MI400, TPU v7). Adds an AcceleratorSpec with variants and operating points.
- Interconnect: new fabric (NVLink 6, NVL Switch tray, IB-XDR, EFA v4). Adds an InterconnectSpec.
- System: new product (DGX Spark, GB300 NVL72, p6e-class cloud SKU). Composes existing accelerator + interconnect.
If multiple apply (a new chip + new fabric + new system land together), do them in order: accelerator → interconnect → system. Later layers reference earlier by id.
Source priority (all hardware)
- Vendor whitepaper / architecture brief: primary truth. NVIDIA architecture whitepaper PDFs, AMD CDNA briefs, Intel Gaudi product briefs, Google TPU papers, AWS Neuron docs.
- Vendor datasheet: HBM capacity, package power, form factor.
- Independent microbenchmark paper: for achievable operating points only. See calc/src/data/sources.ts for the registry (arxiv-2501-12084 for Hopper, arxiv-2510-27583 for MI300X, etc.).
- Cloud SKU page: for availability.clouds on systems and to disambiguate variant labels (e.g. "AWS P5e" → which exact accelerator+variant).
Aggregator/marketing sites are not acceptable primary sources for TFLOPS, HBM BW, or fabric specs. See docs/data-philosophy.md for the reasoning.
A. New Accelerator
Files: calc/src/data/accelerators.ts, calc/src/data/sources.ts.
Schema in calc/src/engine/types.ts (AcceleratorSpec / AcceleratorVariant / AcceleratorOperatingPoint).
Shape:
{
  id: 'mi400',
  name: 'AMD MI400',
  vendor: 'AMD',
  family: 'CDNA Next',
  variants: [
    {
      id: 'oam-256',
      label: 'OAM 256GB',
      hbmCapacityGB: 256,
      operatingPoints: [
        {
          id: 'peak', label: 'Peak',
          tflops: { fp16: ..., bf16: ..., fp8: ..., int8: ..., fp4: ... },
          hbmBandwidthGBs: ...
        },
        {
          id: 'achievable', label: 'Achievable',
          tflops: { ... },
          hbmBandwidthGBs: ...,
          tflopsSources: ['arxiv-...'],
          bandwidthSources: ['nvbandwidth'],
          asOf: '2026-Q3',
          notes: 'one-line measurement context'
        }
      ]
    }
  ]
}
TFLOPS table — pitfalls (most common bugs land here)
- Sparsity multipliers: NVIDIA often quotes Tensor Core numbers with 2:4 structured sparsity, which doubles the headline figure. The schema uses dense values. If a spec sheet says "989/1979 TFLOPS BF16 (with sparsity)", record the dense figure (989), not the sparse one (1979).
- FP4 / FP6: Blackwell-class chips quote these with sparsity assumed. Same rule.
- INT8 vs FP8: identical throughput on most modern chips; record both if the vendor lists both.
- TF32: skip — not a serving dtype.
- Boost vs base clock: most vendor TFLOPS are boost-clock peak. Record those (matches the rest of the database).
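The sparsity rule above can be sketched as a small normalization step. This is only an illustration: `QuotedTflops`, `toDenseTflops`, and `SPARSITY_FACTOR` are hypothetical names that do not exist in llm-calc, and the factor of 2 assumes the quoted figure is exactly double the dense one.

```typescript
// Hypothetical helper, not part of llm-calc: converts vendor-quoted TFLOPS
// to the dense values the schema expects.
const DTYPES = ['fp16', 'bf16', 'fp8', 'int8', 'fp4'] as const;
type Dtype = (typeof DTYPES)[number];

interface QuotedTflops {
  value: number;         // number printed on the spec sheet
  withSparsity: boolean; // true if the sheet marks it "with sparsity"
}

// Assumption: a sparse figure is 2x the dense figure.
const SPARSITY_FACTOR = 2;

function toDenseTflops(
  quoted: Partial<Record<Dtype, QuotedTflops>>,
): Partial<Record<Dtype, number>> {
  const dense: Partial<Record<Dtype, number>> = {};
  for (const dtype of DTYPES) {
    const q = quoted[dtype];
    if (q === undefined) continue; // vendor did not list this dtype
    dense[dtype] = q.withSparsity ? q.value / SPARSITY_FACTOR : q.value;
  }
  return dense;
}

// The "989/1979 TFLOPS BF16 (with sparsity)" case from the text.
const example = toDenseTflops({ bf16: { value: 1979, withSparsity: true } });
console.log(example.bf16); // 989.5, which the spec sheet rounds to 989
```

Keeping the `withSparsity` flag next to the raw number makes the halving explicit in review instead of relying on the reader to recognize an inflated value.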
Achievable operating points
Optional but valuable. Acceptable sources:
- mamf-finder MAMF table (PyTorch torch.mm sweeps)
- Microbenchmark papers on arxiv (arxiv-2501-12084 Hopper, arxiv-2510-27583 MI300X, arxiv-2512-02189 Blackwell, etc.)
- AMD MAFs blog, NVIDIA cuBLAS perf posts
Don't fabricate achievable numbers. Either cite a source or skip the operating point — peak-only entries are fine.
If you add a new source, register it in sources.ts first, then reference its key.
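A reviewer could check the "register first, then reference" rule mechanically. The sketch below assumes a registry keyed by source id and operating points carrying `tflopsSources` / `bandwidthSources` arrays as in the shape above; `SourceRegistry`, `OperatingPointRefs`, and `missingSourceKeys` are hypothetical names, and the real types in sources.ts and types.ts may differ.

```typescript
// Hypothetical minimal shapes; the real ones live in calc/src/data/sources.ts
// and calc/src/engine/types.ts and may differ.
type SourceRegistry = Record<string, unknown>;

interface OperatingPointRefs {
  id: string;
  tflopsSources?: string[];
  bandwidthSources?: string[];
}

// Returns every referenced source key that is absent from the registry.
function missingSourceKeys(
  points: OperatingPointRefs[],
  registry: SourceRegistry,
): string[] {
  const missing = new Set<string>();
  for (const p of points) {
    const keys = (p.tflopsSources ?? []).concat(p.bandwidthSources ?? []);
    for (const key of keys) {
      if (!Object.prototype.hasOwnProperty.call(registry, key)) {
        missing.add(key);
      }
    }
  }
  return Array.from(missing);
}

// Placeholder registry entries; 'not-registered-yet' is deliberately absent.
const demoRegistry: SourceRegistry = { 'arxiv-2501-12084': {}, nvbandwidth: {} };
const unresolved = missingSourceKeys(
  [
    { id: 'peak' },
    {
      id: 'achievable',
      tflopsSources: ['arxiv-2501-12084'],
      bandwidthSources: ['not-registered-yet'],
    },
  ],
  demoRegistry,
);
console.log(unresolved); // ['not-registered-yet']
```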
B. New Interconnect
File: calc/src/data/interconnects.ts.
Schema in calc/src/engine/types.ts (InterconnectSpec).
Shape:
{
  id: 'nvlink-6',
  name: 'NVLink 6',
  vendor: 'NVIDIA',
  generation: 'Gen6 (Rubin)',
  perGpuBandwidthGBs: ...,
  perDirectionGBs: ...,
  linksPerGpu: ...,
  perLinkGBs: ...,
  topology: 'switched',
  scale: 'intra-node',
  maxScaleUpGpus: 8,
  sources: ['nvidia-nvlink'],
  notes: '...'
}
Bandwidth conventions (read types.ts:58-72)
- perGpuBandwidthGBs is the bidirectional aggregate number on vendor slides ("900 GB/s NVLink 4"). Halve it to get perDirectionGBs, which is the figure ring all-reduce math uses.
- For point-to-point fabrics (direct NVLink, IB): perLinkGBs × linksPerGpu should equal the per-direction aggregate. Check the arithmetic before committing.
- For switched fabrics (NVSwitch, NVL72): the aggregate is what each chip can pump into the switch; bisection is a separate property (contention.bisectionFactor).
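The arithmetic in these conventions can be checked with a few lines. This is a sketch under the assumptions stated in the text (perGpuBandwidthGBs is bidirectional, perLinkGBs is per direction); `InterconnectBandwidth` and `checkInterconnectBandwidth` are hypothetical names, not part of the engine. The example uses the 900 GB/s NVLink 4 figure from above with 18 links at 25 GB/s per direction each.

```typescript
// Hypothetical subset of the interconnect fields, for illustration only.
interface InterconnectBandwidth {
  perGpuBandwidthGBs: number; // bidirectional aggregate (vendor slide number)
  perDirectionGBs: number;    // half of the aggregate
  linksPerGpu: number;
  perLinkGBs: number;         // per link, per direction (assumption from the text)
}

// Returns human-readable problems; an empty array means the numbers agree.
function checkInterconnectBandwidth(
  ic: InterconnectBandwidth,
  tolerance = 1e-6,
): string[] {
  const problems: string[] = [];
  if (Math.abs(ic.perDirectionGBs * 2 - ic.perGpuBandwidthGBs) > tolerance) {
    problems.push('perDirectionGBs is not half of perGpuBandwidthGBs');
  }
  if (Math.abs(ic.perLinkGBs * ic.linksPerGpu - ic.perDirectionGBs) > tolerance) {
    problems.push('perLinkGBs x linksPerGpu does not equal perDirectionGBs');
  }
  return problems;
}

const nvlink4Like: InterconnectBandwidth = {
  perGpuBandwidthGBs: 900,
  perDirectionGBs: 450,
  linksPerGpu: 18,
  perLinkGBs: 25,
};
console.log(checkInterconnectBandwidth(nvlink4Like)); // []

// Aggregate mistakenly copied into the per-direction field.
const confused = { ...nvlink4Like, perDirectionGBs: 900 };
console.log(checkInterconnectBandwidth(confused).length); // 2
```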
Optional: contention model
Add contention: { bisectionFactor, oversubscription?, hopCostModel, singleHopUtilization } only with data. Hand-waving guesses are worse than omitting; the engine falls back gracefully. Tier overrides (tiers) are for measured collective performance — only with a citable source.
C. New System
File: calc/src/data/systems.ts.
Schema in calc/src/engine/types.ts (MultiAcceleratorSystem).
Pure composition — pick existing accelerator id+variant, existing interconnect id, set form factor, fill aggregates.
{
  id: 'dgx-spark',
  name: 'NVIDIA DGX Spark',
  vendor: 'NVIDIA',
  generation: 'GB10',
  formFactor: 'node',
  accelerator: { id: 'gb10', variantId: 'unified-128', count: 1 },
  interconnectId: '...',
  scaleOutInterconnectId: '...',
  scaleOutNicsPerNode: 1,
  aggregate: {
    totalHbmGB: 128,
    fabricBidirectionalTBs: ...
  },
  availability: { onPrem: true, clouds: [...] },
  notes: '...'
}
Sanity checks (where systems go wrong)
- aggregate.totalHbmGB must equal count × hbmCapacityGB. Off-by-factor-of-ten is the most common bug.
- aggregate.fabricBidirectionalTBs must equal perGpuBandwidthGBs × count / 1000. Same kind of math, easy to flub.
- availability.clouds should match the cloud SKU page exactly: aws, azure, gcp, oci, coreweave, lambda, crusoe are the common ones (full list in CloudProvider in types.ts).
- Confirm accelerator.variantId exists in the referenced accelerator. TS won't catch this (that would require a const-keyed lookup), so the reviewer needs to check.
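The checks above could be scripted roughly as follows. This is a sketch, not an existing llm-calc utility: the `*Lite` interfaces are hypothetical trimmed-down versions of the real types, `checkSystem` is an invented name, and the demo ids and numbers are placeholders rather than real hardware entries.

```typescript
// Hypothetical trimmed-down shapes; the real types are in calc/src/engine/types.ts.
interface VariantLite { id: string; hbmCapacityGB: number }
interface AcceleratorLite { id: string; variants: VariantLite[] }
interface InterconnectLite { id: string; perGpuBandwidthGBs: number }
interface SystemLite {
  id: string;
  accelerator: { id: string; variantId: string; count: number };
  interconnectId: string;
  aggregate: { totalHbmGB: number; fabricBidirectionalTBs: number };
}

function checkSystem(
  sys: SystemLite,
  accelerators: AcceleratorLite[],
  interconnects: InterconnectLite[],
): string[] {
  const problems: string[] = [];
  const { id, variantId, count } = sys.accelerator;

  // variantId must exist on the referenced accelerator.
  const acc = accelerators.find((a) => a.id === id);
  const variant = acc ? acc.variants.find((v) => v.id === variantId) : undefined;
  if (variant === undefined) {
    problems.push(`unknown accelerator/variant: ${id}/${variantId}`);
  } else if (sys.aggregate.totalHbmGB !== count * variant.hbmCapacityGB) {
    problems.push(`totalHbmGB should be ${count * variant.hbmCapacityGB}`);
  }

  // fabricBidirectionalTBs = perGpuBandwidthGBs x count / 1000.
  const ic = interconnects.find((i) => i.id === sys.interconnectId);
  if (ic === undefined) {
    problems.push(`unknown interconnect: ${sys.interconnectId}`);
  } else {
    const expectedTBs = (ic.perGpuBandwidthGBs * count) / 1000;
    if (Math.abs(sys.aggregate.fabricBidirectionalTBs - expectedTBs) > 1e-9) {
      problems.push(`fabricBidirectionalTBs should be ${expectedTBs}`);
    }
  }
  return problems;
}

// Placeholder data: 8 chips x 80 GB and 900 GB/s per chip.
const demoAccelerators = [{ id: 'demo-chip', variants: [{ id: 'sxm-80', hbmCapacityGB: 80 }] }];
const demoInterconnects = [{ id: 'demo-fabric', perGpuBandwidthGBs: 900 }];
const demoSystem: SystemLite = {
  id: 'demo-node',
  accelerator: { id: 'demo-chip', variantId: 'sxm-80', count: 8 },
  interconnectId: 'demo-fabric',
  aggregate: { totalHbmGB: 640, fabricBidirectionalTBs: 7.2 },
};
console.log(checkSystem(demoSystem, demoAccelerators, demoInterconnects)); // []
```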
Validation (all hardware)
- npm test: won't catch numeric errors (no test asserts specific TFLOPS), but ensures imports resolve.
- npm run check: TS catches schema mismatches.
- npm run dev: open the UI, select the new accelerator / system, and verify that the perf panel renders without NaN and that numbers are in the right order of magnitude vs. siblings (e.g. a new "H300" should land between H200 and B200; if it lands at 10× B200, you have a sparsity bug).
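The order-of-magnitude comparison against siblings can also be approximated in code before opening the UI. The sketch below is purely illustrative: `outOfFamily` is a hypothetical function, the default `maxFactor` of 4 is an arbitrary assumption rather than a project threshold, and the values are placeholders, not real chip specs.

```typescript
// Hypothetical sanity check: flags a value that falls far outside the range
// spanned by sibling entries. maxFactor is an arbitrary assumed threshold.
function outOfFamily(
  candidate: number,
  siblings: number[],
  maxFactor = 4,
): boolean {
  if (siblings.length === 0) return false; // nothing to compare against
  const lowest = Math.min(...siblings);
  const highest = Math.max(...siblings);
  return candidate > highest * maxFactor || candidate < lowest / maxFactor;
}

// Placeholder TFLOPS values for two existing siblings.
const siblingTflops = [1000, 2000];
console.log(outOfFamily(1500, siblingTflops));  // false: lands between siblings
console.log(outOfFamily(20000, siblingTflops)); // true: 10x the largest sibling
```

A flagged value is only a prompt to re-read the spec sheet; a real generational jump can exceed any fixed factor.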
No TDD pattern here — data-only changes don't get unit tests. Schema changes do, but those go through the model-skill TDD flow because they touch the engine.
Anti-patterns
- Recording sparsity-inflated TFLOPS without halving. The most common bug, by a wide margin.
- Confusing perGpuBandwidthGBs (aggregate) vs perDirectionGBs (half) on interconnects.
- Inventing achievable numbers without a source citation.
- Picking the wrong accelerator variant in a system (SXM vs PCIe, 80GB vs 96GB).
- Skipping sources / asOf because "the data is obvious". It rots; provenance is the only defense.
- Adding a tier / contention model based on intuition rather than measurement. Empty is better than wrong.