원클릭으로
policyengine-vectorization
PolicyEngine vectorization patterns - NumPy operations, where/select usage, avoiding scalar logic with arrays
Codex 또는 Claude로 설치 이 Prompt를 복사해 Codex, Claude 또는 다른 어시스턴트에 붙여 넣으면 Skill 페이지를 검토하고 설치를 진행할 수 있습니다.
메뉴
PolicyEngine vectorization patterns - NumPy operations, where/select usage, avoiding scalar logic with arrays
Codex 또는 Claude로 설치 이 Prompt를 복사해 Codex, Claude 또는 다른 어시스턴트에 붙여 넣으면 Skill 페이지를 검토하고 설치를 진행할 수 있습니다.
SOC 직업 분류 기준
Recharts chart patterns, formatting, and styling for PolicyEngine apps
PolicyEngine code writing style guide - formula optimization, direct returns, eliminating unnecessary variables
PolicyEngine parameter patterns - YAML structure, naming conventions, metadata requirements, federal/state separation
PolicyEngine code review patterns - validation checklist, common issues, review standards
PolicyEngine testing patterns - YAML test structure, naming conventions, period handling, and quality standards
PolicyEngine variable patterns - variable creation, no hard-coding principle, federal/state separation, metadata standards
| name | policyengine-vectorization |
| description | PolicyEngine vectorization patterns - NumPy operations, where/select usage, avoiding scalar logic with arrays |
Critical patterns for vectorized operations in PolicyEngine. Scalar logic with arrays will crash the microsimulation.
PolicyEngine processes multiple households simultaneously using NumPy arrays. NEVER use if-elif-else with entity data.
# THIS WILL CRASH - household data is an array
def formula(household, period, parameters):
income = household("income", period)
if income > 1000: # ❌ CRASH: "truth value of array is ambiguous"
return 500
else:
return 100
# CORRECT - works with arrays
def formula(household, period, parameters):
income = household("income", period)
return where(income > 1000, 500, 100) # ✅ Vectorized
where()# Instead of if-else
❌ if age >= 65:
amount = senior_amount
else:
amount = regular_amount
✅ amount = where(age >= 65, senior_amount, regular_amount)
select()# Instead of if-elif-else
❌ if age < 18:
benefit = child_amount
elif age >= 65:
benefit = senior_amount
else:
benefit = adult_amount
✅ benefit = select(
[age < 18, age >= 65],
[child_amount, senior_amount],
default=adult_amount
)
When dispatching on an Enum variable, list all values explicitly in select(). Set the default to match the Enum's default_value so unexpected values get consistent behavior.
# ❌ BAD — hides one option in default, unclear which is the fallback
provider_type = person("ri_ccap_provider_type", period)
types = provider_type.possible_values
return select(
[
provider_type == types.LICENSED_CENTER,
provider_type == types.LICENSED_FAMILY,
],
[center_rate, family_rate],
default=exempt_rate, # Why exempt as fallback?
)
# ✅ GOOD — all options listed, default matches Enum default_value
provider_type = person("ri_ccap_provider_type", period)
types = provider_type.possible_values
return select(
[
provider_type == types.LICENSED_CENTER,
provider_type == types.LICENSED_FAMILY,
provider_type == types.LICENSE_EXEMPT,
],
[center_rate, family_rate, exempt_rate],
default=center_rate, # Matches default_value = LICENSED_CENTER
)
# Combining conditions
eligible = (age >= 18) & (income < threshold) # Use & not 'and'
eligible = (is_disabled | is_elderly) # Use | not 'or'
eligible = ~is_excluded # Use ~ not 'not'
# Instead of if for bounds checking
❌ if amount < 0:
amount = 0
elif amount > maximum:
amount = maximum
✅ amount = clip(amount, 0, maximum)
# Or: amount = max_(0, min_(amount, maximum))
When subtracting values and wanting to floor at zero, you must wrap the entire subtraction in max_():
# Common scenario: income after deductions/losses
❌ WRONG - Creates phantom negative values:
income = max_(income, 0) - capital_loss # If capital_loss > income, result is negative!
✅ CORRECT - Properly floors at zero:
income = max_(income - capital_loss, 0) # Entire subtraction floored
# Real example from MT income tax bug:
❌ WRONG - Tax on phantom negative income:
def formula(tax_unit, period, parameters):
income = tax_unit("adjusted_gross_income", period)
capital_gains = tax_unit("capital_gains", period)
# BUG: If capital_gains is negative (loss), this creates negative income
# But max_() only floors income, not the result
regular_income = max_(income, 0) - capital_gains
return calculate_tax(regular_income) # Tax on negative number!
✅ CORRECT - No phantom income:
def formula(tax_unit, period, parameters):
income = tax_unit("adjusted_gross_income", period)
capital_gains = tax_unit("capital_gains", period)
# Properly floors the entire result
regular_income = max_(income - capital_gains, 0)
return calculate_tax(regular_income) # Never negative
Why this matters:
capital_gains = -3000 (loss), then income - capital_gains = income + 3000max_(income, 0) - capital_gains allows the subtraction to make the result negativeRule: When the formula is A - B and you want the result floored at zero, use max_(A - B, 0), NOT max_(A, 0) - B.
# OK - parameters are scalars, not arrays
def formula(entity, period, parameters):
p = parameters(period).gov.program
# This is fine - p.enabled is a scalar boolean
if p.enabled:
base = p.base_amount
else:
base = 0
# But must vectorize when using entity data
income = entity("income", period)
return where(income < p.threshold, base, 0)
# OK - controlling which calculation to use
def formula(entity, period, parameters):
year = period.start.year
if year >= 2024:
# Use new formula (still vectorized)
return entity("new_calculation", period)
else:
# Use old formula (still vectorized)
return entity("old_calculation", period)
❌ WRONG:
if household("income", period) > 1000:
# Error: truth value of array is ambiguous
✅ CORRECT:
income = household("income", period)
high_income = income > 1000 # Boolean array
benefit = where(high_income, low_benefit, high_benefit)
❌ WRONG:
eligible = is_elderly or is_disabled # Python's 'or'
✅ CORRECT:
eligible = is_elderly | is_disabled # NumPy's '|'
❌ WRONG:
if eligible:
if income < threshold:
return full_benefit
else:
return partial_benefit
else:
return 0
✅ CORRECT:
return where(
eligible,
where(income < threshold, full_benefit, partial_benefit),
0
)
where() for Divisionwhere() evaluates BOTH branches before selecting. This causes divide-by-zero warnings even when the zero case wouldn't be selected:
# ❌ WRONG - causes divide-by-zero warning
proportion = where(
total_income > 0,
person_income / total_income, # Still evaluated when total_income = 0!
0,
)
np.divide with where Parameter# ✅ CORRECT - only divides where mask is True
# The `out` parameter IS the default value - positions where mask=False keep this value
mask = total_income > 0
proportion = np.divide(
person_income,
total_income,
out=np.zeros_like(person_income), # Default to 0 where mask is False
where=mask,
)
How out works as the default:
out=np.zeros_like(...) → default is 0out=np.ones_like(...) → default is 1where=False keep their out value unchanged# ✅ CORRECT - traditional mask assignment
proportion = np.zeros_like(total_income)
mask = total_income > 0
proportion[mask] = person_income[mask] / total_income[mask]
Proportional allocation (e.g., splitting deductions between spouses):
# Allocate proportionally by income
unit_income = tax_unit.sum(person_income)
mask = unit_income > 0
share = np.divide(
person_income,
unit_income,
out=np.zeros_like(person_income),
where=mask,
)
# Default share when unit has no income
share = where(mask, share, where(is_head, 1.0, 0.0))
Calculating ratios:
# AGI ratio for credit calculations
mask = us_agi != 0
ratio = np.divide(
state_agi,
us_agi,
out=np.zeros_like(us_agi),
where=mask,
)
See these files for reference implementations:
taxable_social_security.py - person share of unit benefitsmo_taxable_income.py - AGI share allocationmd_two_income_subtraction.py - head's share of couple incomeok_child_care_child_tax_credit.py - AGI ratio# Instead of if-elif for ranges
❌ if size == 1:
amount = 100
elif size == 2:
amount = 150
elif size == 3:
amount = 190
✅ # Using parameter brackets
amount = p.benefit_schedule.calc(size)
✅ # Or using select
amounts = [100, 150, 190, 220, 250]
amount = select(
[size == i for i in range(1, 6)],
amounts[:5],
default=amounts[-1] # 5+ people
)
# Building complex eligibility
income_eligible = income < p.income_threshold
resource_eligible = resources < p.resource_limit
demographic_eligible = (age < 18) | is_pregnant
# Combine with & (not 'and')
eligible = income_eligible & resource_eligible & demographic_eligible
# Sum only for eligible members
person = household.members
is_eligible = person("is_eligible", period)
person_income = person("income", period)
# Only count income of eligible members
eligible_income = where(is_eligible, person_income, 0)
total = household.sum(eligible_income)
Error messages:
Performance:
# Your formula should work with arrays
def test_vectorization():
# Create array inputs
incomes = np.array([500, 1500, 3000])
# Should return array output
benefits = formula_with_arrays(incomes)
assert len(benefits) == 3
| Operation | Scalar (WRONG) | Vectorized (CORRECT) |
|---|---|---|
| Simple condition | if x > 5: | where(x > 5, ...) |
| Multiple conditions | if-elif-else | select([...], [...]) |
| Boolean AND | and | & |
| Boolean OR | or | | |
| Boolean NOT | not | ~ |
| Bounds checking | if x < 0: x = 0 | max_(0, x) |
| Floor subtraction | max_(x, 0) - y ❌ | max_(x - y, 0) ✅ |
| Complex logic | Nested if | Nested where/select |
When state tax calculations produce small non-zero values (e.g., $277) even though taxable income is zero, the root cause is usually phantom intermediate values in calculation chains.
When taxable income is zero but tax is non-zero, trace the calculation chain:
# Tax calculation chain (Montana example)
taxable_income: 0 # ✓ Correct
rate: 0.0475 # Used despite zero income
brackets: [15_600] # Used despite zero income
tax_before_credits: 277.41 # ❌ PHANTOM VALUE
# The bug: Brackets calculated regular tax even when taxable income was zero
# due to missing zero-check in bracket calculation
When you see phantom tax values:
Check the calculation chain - Run test with verbose output to see intermediate values:
pytest tests/file.py -vv
Verify zero-income handling - Look for formulas that don't short-circuit on zero income:
✅ GOOD:
def formula(entity, period, parameters):
taxable_income = entity("taxable_income", period)
# Short-circuit when income is zero
return where(taxable_income == 0, 0, calculate_tax(...))
❌ BAD:
def formula(entity, period, parameters):
# Always calculates, even when income is zero
return calculate_brackets(taxable_income, rates, brackets)
Check type consistency - Ensure operations preserve NumPy array dtypes:
✅ Use: max_(value, 0) or clip(value, 0, None)
❌ Avoid: max(value, 0) - Python's max can cause type issues