| name | policyengine-microsimulation |
| description | ALWAYS USE THIS SKILL for PolicyEngine microsimulation, population-level analysis, winners/losers calculations.
Triggers: microsimulation, share who would lose/gain, policy impact, national average, weighted analysis, cost, revenue impact, budgetary, estimate the cost, federal revenues, tax revenue, budget score, how much would it cost, how much would the policy cost, total cost of, aggregate impact, cost to the government, revenue loss, fiscal impact, poverty impact, child poverty, deep poverty, poverty rate, poverty reduction, how many people lifted out of poverty, SPM poverty, distributional impact, state tax, state-level, California, New York, UBI, universal basic income, flat tax, standard deduction, winners and losers, winners, losers, inequality, Gini, decile, SALT, marginal tax rate, effective tax rate.
NOT for single-household calculations like "what would my benefit be" - use policyengine-us or policyengine-uk for those. Use this skill's code pattern; explore codebase for parameter paths if needed.
|
PolicyEngine Microsimulation
Documentation References
Latest rules and microdata provenance
When the user asks to use the latest PolicyEngine rules, data, microdata,
variable weights, sampled households, or default datasets, verify the package
release from authoritative live metadata before running the microsimulation.
Do not infer "latest" from search snippets, local installed versions, lockfiles,
or repo constraints.
For policyengine.py bundled rules plus default microdata:
python - <<'PY'
import json
import urllib.request
with urllib.request.urlopen(
"https://pypi.org/pypi/policyengine/json",
timeout=20,
) as response:
print(json.load(response)["info"]["version"])
PY
python -m pip index versions policyengine
uv pip install "policyengine[us]==X.Y.Z"
Then confirm the resolved model/data bundle without importing top-level
policyengine, which may initialize countries and private data you are not
using:
python - <<'PY'
import json
from importlib import metadata
from pathlib import Path
for package in ["policyengine", "policyengine-us"]:
print(f"{package}=={metadata.version(package)}")
direct_url = metadata.distribution(package).read_text("direct_url.json")
if direct_url:
print(f"{package} direct_url={direct_url}")
manifest_path = Path(
metadata.distribution("policyengine").locate_file(
"policyengine/data/release_manifests/us.json"
)
)
manifest = json.loads(manifest_path.read_text())
print(json.dumps({
"bundle_id": manifest.get("bundle_id"),
"model_package": manifest.get("model_package"),
"data_package": manifest.get("data_package"),
"default_dataset": manifest.get("default_dataset"),
"default_dataset_uri": (
manifest.get("certified_data_artifact") or {}
).get("uri"),
"certification": manifest.get("certification"),
}, indent=2, sort_keys=True))
PY
Only call a run "latest" after this verification. If PyPI, npm, GitHub, or the
official provider source cannot be reached, report that latest could not be
verified instead of guessing.
CRITICAL: Use calc() with MicroSeries — never strip weights or fetch them manually
MicroSeries handles all weighting automatically. Never convert to numpy, strip types, or do manual weight math.
NEVER strip MicroSeries weights
calc() and calculate() return MicroSeries with embedded weights AND entity context. Any of these operations strip both, producing silently wrong results:
| Anti-pattern | Why it's wrong |
|---|
np.array(series) | Converts to unweighted numpy array |
series.values / series.to_numpy() | Same — strips weights and entity context |
series.astype(float) / .astype(int) | Converts MicroSeries to plain pandas Series, losing weight metadata |
float(series.sum()) | Premature scalar extraction — usually a sign of manual weight math nearby |
np.average(x, weights=w) | Manual weighting — .mean() already does this correctly |
NEVER fetch weight variables manually
calc() returns MicroSeries that already knows its weights. There is no reason to fetch household_weight, spm_unit_weight, person_weight, or tax_unit_weight yourself. If you're writing sim.calc("spm_unit_weight", ...), something is wrong — calc() handles weight mapping internally via the map_to parameter.
liheap = sim.calc("dc_liheap_payment", period=2026).astype(float)
weights = sim.calc("spm_unit_weight", period=2026).astype(float)
total = float((liheap * weights).sum())
avg = float(np.average(liheap[liheap > 0], weights=weights[liheap > 0]))
liheap = sim.calc("dc_liheap_payment", period=2026)
total = liheap.sum()
avg = liheap[liheap > 0].mean()
change_arr = np.array(sim.calc("income_tax", period=2026))
weights = np.array(sim.calc("household_weight", period=2026))
losers = weights[change_arr < -1].sum()
income_tax_b = baseline.calc("income_tax", period=2026)
income_tax_r = reformed.calc("income_tax", period=2026)
tax_change = income_tax_r - income_tax_b
loser_count = (tax_change > 1).sum()
loser_share = (tax_change > 1).mean()
avg_change = tax_change.mean()
total_change = tax_change.sum()
Entity-level matching
When comparing variables across entities, use map_to to align them — never mix raw arrays from different entities:
tax = np.array(sim.calc("income_tax", period=2026))
wt = np.array(sim.calc("household_weight", period=2026))
tax = sim.calc("income_tax", period=2026)
losers = (tax > 0).sum()
Quick start
from policyengine_us import Microsimulation
from policyengine_core.reforms import Reform
baseline = Microsimulation()
reform = Reform.from_dict({
'gov.irs.credits.ctc.amount.base[0].amount': {'2026-01-01.2100-12-31': 3000}
}, 'policyengine_us')
reformed = Microsimulation(reform=reform)
baseline_income = baseline.calc('household_net_income', period=2026, map_to='person')
reformed_income = reformed.calc('household_net_income', period=2026, map_to='person')
change = reformed_income - baseline_income
print(f"Average impact: ${change.mean():,.0f}")
print(f"Total cost: ${change.sum()/1e9:,.1f}B")
print(f"Share losing: {(change < 0).mean():.1%}")
API methods
- US:
sim.calc() — policyengine_us.Microsimulation uses .calc(). Do NOT use .calculate() for US.
- UK:
sim.calculate() — policyengine_uk.Microsimulation uses .calculate(). It does NOT have .calc().
- Both return MicroSeries with automatic weighting. Use
.sum(), .mean(), arithmetic operators.
- Use the
period= keyword: sim.calc("variable", period=2026), not sim.calc("variable", 2026).
Creating reforms
US (policyengine_us.Microsimulation)
from policyengine_us import Microsimulation
from policyengine_core.reforms import Reform
reform = Reform.from_dict({
'gov.irs.credits.ctc.amount.base[0].amount': {'2026-01-01.2100-12-31': 3600},
}, 'policyengine_us')
baseline = Microsimulation()
reformed = Microsimulation(reform=reform)
UK (policyengine_uk.Microsimulation)
from policyengine_uk import Microsimulation
reform = {
'gov.hmrc.income_tax.allowances.personal_allowance.amount': {'2026-01-01.2100-12-31': 15000}
}
baseline = Microsimulation()
reformed = Microsimulation(reform=reform)
IMPORTANT: For UK, pass the dict directly to Microsimulation(reform=dict). Reform.from_dict() works for US but causes errors in UK.
Dataset argument
- National: Omit dataset argument entirely (default is correct)
- State-level:
Microsimulation(dataset='hf://policyengine/policyengine-us-data/states/NY.h5')
- Never pass short strings like
dataset="cps_2024" — use full HF URLs or omit
Available Datasets (HuggingFace)
sim = Microsimulation()
sim = Microsimulation(dataset='hf://policyengine/policyengine-us-data/states/NY.h5')
sim = Microsimulation(dataset='hf://policyengine/policyengine-us-data/districts/NY-17.h5')
For congressional district analysis (representative's constituents, district-level impacts), use the policyengine-district-analysis skill which has complete examples.
Memory considerations
State-level datasets are large (~590MB each). Loading two Microsimulation objects simultaneously (baseline + reformed) requires ~2GB+ RAM. Options:
- Preferred: Use the national dataset (default) — it includes all states with proper weighting
- If state-calibrated weights needed: Compute baseline values first, delete the baseline object (
del baseline), then create the reformed simulation
Key MicroSeries Methods
MicroSeries (from microdf) handles all weighting automatically — see the microdf skill for full documentation.
income = sim.calc('household_net_income', period=2026, map_to='person')
income.mean()
income.sum()
income.median()
(income > 50000).mean()
income.gini()
Inequality & Distributional Analysis
Use built-in MicroSeries methods — never reimplement Gini or other inequality metrics manually:
Include the full distribution by default. PolicyEngine distributional tables
should not drop the bottom 5%, trim nonpositive incomes, or otherwise exclude
income tails unless the user explicitly requests that exclusion or you are
replicating a cited external method. If you do apply an exclusion, make it
visible in the table/chart labels and methodology.
baseline_income = baseline.calc('household_net_income', period=2026, map_to='person')
reformed_income = reformed.calc('household_net_income', period=2026, map_to='person')
print(f"Baseline Gini: {baseline_income.gini():.4f}")
print(f"Reform Gini: {reformed_income.gini():.4f}")
baseline_in_poverty = baseline.calc('person_in_poverty', period=2026, map_to='person')
print(f"SPM poverty rate: {baseline_in_poverty.mean():.1%}")
income.decile_rank()
Poverty analysis
Overall and child poverty
Use household_weight (the only calibrated weight) with MicroSeries arithmetic for all poverty calculations. No .values or np.sum() needed.
from policyengine_us import Microsimulation
baseline = Microsimulation()
reformed = Microsimulation(reform=reform)
YEAR = 2026
baseline_in_poverty = baseline.calc('person_in_poverty', period=YEAR, map_to='person')
reform_in_poverty = reformed.calc('person_in_poverty', period=YEAR, map_to='person')
baseline_poverty_rate = baseline_in_poverty.mean()
reform_poverty_rate = reform_in_poverty.mean()
print(f"Poverty: {baseline_poverty_rate:.1%} → {reform_poverty_rate:.1%}")
is_child = baseline.calc('is_child', period=YEAR)
baseline_child_pov_rate = (baseline_in_poverty * is_child).sum() / is_child.sum()
reform_child_pov_rate = (reform_in_poverty * is_child).sum() / is_child.sum()
print(f"Child poverty: {baseline_child_pov_rate:.1%} → {reform_child_pov_rate:.1%}")
total_people = baseline_in_poverty.sum() / baseline_in_poverty.mean()
people_lifted = (baseline_poverty_rate - reform_poverty_rate) * total_people
children_lifted = (baseline_child_pov_rate - reform_child_pov_rate) * is_child.sum()
WARNING: Never subtract boolean MicroSeries directly. NumPy 2.4+ raises TypeError on boolean subtraction (True - False). Use .mean() to get float rates first, then subtract:
baseline_rate = baseline_in_poverty.mean()
reform_rate = reform_in_poverty.mean()
reduction_pp = baseline_rate - reform_rate
Deep poverty
in_deep_poverty is at the SPM unit level. Use household_weight mapped to the appropriate entity level:
baseline_in_deep_poverty = baseline.calc('in_deep_poverty', period=YEAR, map_to='person')
reform_in_deep_poverty = reformed.calc('in_deep_poverty', period=YEAR, map_to='person')
baseline_deep_rate = baseline_in_deep_poverty.mean()
reform_deep_rate = reform_in_deep_poverty.mean()
print(f"Deep poverty: {baseline_deep_rate:.1%} → {reform_deep_rate:.1%}")
is_child = baseline.calc('is_child', period=YEAR)
baseline_deep_child_rate = (baseline_in_deep_poverty * is_child).sum() / is_child.sum()
reform_deep_child_rate = (reform_in_deep_poverty * is_child).sum() / is_child.sum()
print(f"Deep child poverty: {baseline_deep_child_rate:.1%} → {reform_deep_child_rate:.1%}")
Subgroup analysis note
MicroSeries arithmetic handles subgroup analysis — you should rarely need .values. Multiply by a boolean MicroSeries to filter (e.g., pov * is_child * pw) and use .sum() / .mean() directly.
UK microsimulation
Key differences from US
- UK uses
.calculate(), NOT .calc() — policyengine_uk.Microsimulation does NOT have a .calc() method at all. Always use .calculate().
- UK reform: Pass a plain dict directly to
Microsimulation(reform=dict). Do NOT use Reform.from_dict() — it causes errors with UK.
- UK poverty variables:
in_poverty_bhc (before housing costs) and in_poverty_ahc (after housing costs) — both at household level. UK poverty analysis typically reports both BHC and AHC rates; always note which measure you are using.
- UK entity structure:
household (not SPM unit), benunit (benefit unit), person. Use household_weight and household_count_people for person-weighted rates.
Example: UK personal allowance reform with poverty analysis
from policyengine_uk import Microsimulation
reform = {'gov.hmrc.income_tax.allowances.personal_allowance.amount': {'2026-01-01.2100-12-31': 15000}}
baseline = Microsimulation()
reformed = Microsimulation(reform=reform)
baseline_income = baseline.calculate('household_net_income', period=2026)
reform_income = reformed.calculate('household_net_income', period=2026)
cost = (reform_income - baseline_income).sum()
baseline_in_poverty = baseline.calculate('in_poverty_bhc', period=2026, map_to='person')
reform_in_poverty = reformed.calculate('in_poverty_bhc', period=2026, map_to='person')
baseline_rate = baseline_in_poverty.mean()
reform_rate = reform_in_poverty.mean()
print(f"Cost: £{cost / 1e9:,.1f}B")
print(f"Poverty (BHC): {baseline_rate:.1%} → {reform_rate:.1%}")
CRITICAL: Budgetary impact calculation
Start with a BOTEC range before running code, and flag if the point estimate diverges
Use household_net_income for total cost — but understand what it includes
The budgetary cost of a reform is the change in household_net_income, NOT the change in the
directly-modified program variable. A reform that changes one program (e.g., CTC) can have
cascading effects on other taxes and benefits through interactions (refundability, phase-outs,
benefit clawbacks). Summing only the program-specific variable will undercount the true cost.
This matches the pattern used in the PolicyEngine API (policyengine-api/endpoints/economy/compare.py).
IMPORTANT: household_net_income includes state tax effects. Many states inherit federal
taxable_income, so a federal reform that changes taxable_income will indirectly change
state taxes too. For federal-only revenue estimates, use income_tax directly:
total_cost = (reformed.calc('household_net_income', period=YEAR).sum() -
baseline.calc('household_net_income', period=YEAR).sum()) / 1e9
federal_rev = (reformed.calc('income_tax', period=YEAR).sum() -
baseline.calc('income_tax', period=YEAR).sum()) / 1e9
state_tax_cost = (baseline.calc('state_income_tax', period=YEAR).sum() -
reformed.calc('state_income_tax', period=YEAR).sum()) / 1e9
benefit_cost = (reformed.calc('household_benefits', period=YEAR).sum() -
baseline.calc('household_benefits', period=YEAR).sum()) / 1e9
print(f"Total budgetary cost: ${total_cost:,.1f}B")
print(f"Federal income tax revenue change: ${federal_rev:,.1f}B")
print(f"State/local tax revenue loss: ${state_tax_cost:,.1f}B")
print(f"Benefit spending increase: ${benefit_cost:,.1f}B")
Why not sum the program variable directly? Example: making the CTC fully refundable
shifts credits from non-refundable to refundable, changing income_tax by much more than
the ctc variable itself changes. The household_net_income change captures the full effect.
Per-program decomposition
Individual program changes are still useful for understanding where the cost comes from,
but they don't substitute for the total household_net_income cost above.
programs = ["income_tax", "ctc", "eitc", "snap", "ssi", "household_benefits"]
for prog in programs:
b = baseline.calc(prog, period=YEAR).sum()
r = reformed.calc(prog, period=YEAR).sum()
if abs(r - b) > 1e6:
print(f"{prog}: ${(r - b) / 1e9:+.1f}B")
Current law context
Always check baseline parameter values before interpreting reform impacts. Tax law changes frequently (TCJA, OBBBA, etc.). Use CountryTaxBenefitSystem().parameters to look up current-law values:
from policyengine_us import CountryTaxBenefitSystem
p = CountryTaxBenefitSystem().parameters
print(p.gov.irs.credits.ctc.amount.base("2026-01-01"))
print(p.gov.irs.credits.ctc.refundable.fully_refundable("2026-01-01"))
Finding parameter paths
grep -r "salt" policyengine_us/parameters/gov/irs/ --include="*.yaml"
Parameter tree: gov.irs.deductions, gov.irs.credits, gov.states.{state}.tax
Patterns: Filing status variants (SINGLE, JOINT, etc.), bracket syntax [index], date format 'YYYY-MM-DD.YYYY-MM-DD'
CRITICAL: Bracket path syntax for scale parameters
When referencing bracket/scale parameters, the bracket index goes directly on the scale node — there is NO .brackets in the path.
'gov.irs.credits.ctc.amount.base[0].amount'
'gov.states.ca.tax.income.rates.single[8].rate'
'gov.states.ca.tax.income.rates.single[8].threshold'
'gov.irs.credits.ctc.amount.base.brackets[0].amount'
'gov.states.ca.tax.income.rates.single.brackets[8].rate'
The YAML file has a brackets: list, but the parameter tree flattens it. The index attaches to the node containing the brackets (the YAML filename without .yaml), not to a child called brackets.
To verify a path, inspect the parameter tree:
from policyengine_us import CountryTaxBenefitSystem
p = CountryTaxBenefitSystem().parameters
print(p.gov.irs.credits.ctc.amount.base[0].amount("2026-01-01"))
Two types of bracket parameters
-
ParameterScale (marginal rate schedules, single YAML with brackets: at root):
- Path:
parent_node.scale_name[index].rate or .threshold
- Example:
gov.states.ca.tax.income.rates.single[8].rate
-
ParameterNode with indexed children (folder-based, separate YAML files):
- Path:
node_name[index].child_name
- Example:
gov.irs.credits.ctc.amount.base[0].amount
Both use [index] syntax in Reform.from_dict() — the difference is in the YAML structure. Use CountryTaxBenefitSystem().parameters to navigate and verify paths.
Complete analysis recipe: single-program impact with breakdowns
This pattern covers the common case of analyzing a single benefit or tax variable with subgroup breakdowns. All operations stay in MicroSeries — no manual weights, no numpy, no .astype().
from policyengine_us import Microsimulation
sim = Microsimulation()
YEAR = 2026
benefit = sim.calc("dc_liheap_payment", period=YEAR)
income_level = sim.calc("dc_liheap_income_level", period=YEAR)
unit_size = sim.calc("spm_unit_size", period=YEAR)
recipients = (benefit > 0)
print(f"Total spending: ${benefit.sum():>12,.0f}")
print(f"Recipient units: {recipients.sum():>12,.0f}")
print(f"Avg benefit: ${benefit[recipients].mean():>12,.0f}")
for level in range(1, 11):
at_level = recipients & (income_level == level)
if at_level.any():
print(f"Level {level}: {at_level.sum():,.0f} units, "
f"avg ${benefit[at_level].mean():,.0f}, "
f"total ${benefit[at_level].sum():,.0f}")
for size in [1, 2, 3]:
mask = recipients & (unit_size == size)
if mask.any():
print(f"Size {size}: {mask.sum():,.0f} units, avg ${benefit[mask].mean():,.0f}")
large = recipients & (unit_size >= 4)
if large.any():
print(f"Size 4+: {large.sum():,.0f} units, avg ${benefit[large].mean():,.0f}")
Common variables for microsimulation
Weights
household_weight — the only calibrated weight. calc() uses it internally via map_to — you should never need to fetch it directly. Use map_to='person' or map_to='spm_unit' on any calc() call to project to other entity levels.
Person-level
person_in_poverty — SPM poverty indicator (boolean)
is_child — under 18
age, employment_income
Household-level
household_net_income — net income after taxes/transfers
household_count_people — number of people in household
SPM unit-level
spm_unit_size, spm_unit_count_children
in_poverty, in_deep_poverty
Tax/benefit variables
income_tax, ctc, eitc, snap, ssi