| name | policyengine-vectorization |
| description | PolicyEngine vectorization patterns - NumPy operations, where/select usage, avoiding scalar logic with arrays |
PolicyEngine Vectorization Patterns
Critical patterns for vectorized operations in PolicyEngine. Scalar logic with arrays will crash the microsimulation.
The Golden Rule
PolicyEngine processes multiple households simultaneously using NumPy arrays. NEVER use if-elif-else with entity data.
1. Critical: What Will Crash
❌ NEVER: if-elif-else with Arrays
def formula(household, period, parameters):
income = household("income", period)
if income > 1000:
return 500
else:
return 100
✅ ALWAYS: Vectorized Operations
def formula(household, period, parameters):
income = household("income", period)
return where(income > 1000, 500, 100)
2. Common Vectorization Patterns
Pattern 1: Simple Conditions → where()
❌ if age >= 65:
amount = senior_amount
else:
amount = regular_amount
✅ amount = where(age >= 65, senior_amount, regular_amount)
Pattern 2: Multiple Conditions → select()
❌ if age < 18:
benefit = child_amount
elif age >= 65:
benefit = senior_amount
else:
benefit = adult_amount
✅ benefit = select(
[age < 18, age >= 65],
[child_amount, senior_amount],
default=adult_amount
)
Pattern 2b: Enum Dispatch → List ALL Conditions
When dispatching on an Enum variable, list all values explicitly in select(). Set the default to match the Enum's default_value so unexpected values get consistent behavior.
provider_type = person("ri_ccap_provider_type", period)
types = provider_type.possible_values
return select(
[
provider_type == types.LICENSED_CENTER,
provider_type == types.LICENSED_FAMILY,
],
[center_rate, family_rate],
default=exempt_rate,
)
provider_type = person("ri_ccap_provider_type", period)
types = provider_type.possible_values
return select(
[
provider_type == types.LICENSED_CENTER,
provider_type == types.LICENSED_FAMILY,
provider_type == types.LICENSE_EXEMPT,
],
[center_rate, family_rate, exempt_rate],
default=center_rate,
)
Pattern 3: Boolean Operations
eligible = (age >= 18) & (income < threshold)
eligible = (is_disabled | is_elderly)
eligible = ~is_excluded
Pattern 4: Clipping Values
❌ if amount < 0:
amount = 0
elif amount > maximum:
amount = maximum
✅ amount = clip(amount, 0, maximum)
Pattern 5: Flooring Subtraction Results (CRITICAL)
When subtracting values and wanting to floor at zero, you must wrap the entire subtraction in max_():
❌ WRONG - Creates phantom negative values:
income = max_(income, 0) - capital_loss
✅ CORRECT - Properly floors at zero:
income = max_(income - capital_loss, 0)
❌ WRONG - Tax on phantom negative income:
def formula(tax_unit, period, parameters):
income = tax_unit("adjusted_gross_income", period)
capital_gains = tax_unit("capital_gains", period)
regular_income = max_(income, 0) - capital_gains
return calculate_tax(regular_income)
✅ CORRECT - No phantom income:
def formula(tax_unit, period, parameters):
income = tax_unit("adjusted_gross_income", period)
capital_gains = tax_unit("capital_gains", period)
regular_income = max_(income - capital_gains, 0)
return calculate_tax(regular_income)
Why this matters:
- If
capital_gains = -3000 (loss), then income - capital_gains = income + 3000
- The wrong pattern
max_(income, 0) - capital_gains allows the subtraction to make the result negative
- This creates "phantom income" where none exists, leading to incorrect tax calculations
Rule: When the formula is A - B and you want the result floored at zero, use max_(A - B, 0), NOT max_(A, 0) - B.
3. When if-else IS Acceptable
✅ OK: Parameter-Only Conditions
def formula(entity, period, parameters):
p = parameters(period).gov.program
if p.enabled:
base = p.base_amount
else:
base = 0
income = entity("income", period)
return where(income < p.threshold, base, 0)
✅ OK: Control Flow (Not Data)
def formula(entity, period, parameters):
year = period.start.year
if year >= 2024:
return entity("new_calculation", period)
else:
return entity("old_calculation", period)
4. Common Vectorization Mistakes
Mistake 1: Scalar Comparison with Array
❌ WRONG:
if household("income", period) > 1000:
✅ CORRECT:
income = household("income", period)
high_income = income > 1000
benefit = where(high_income, low_benefit, high_benefit)
Mistake 2: Using Python's and/or/not
❌ WRONG:
eligible = is_elderly or is_disabled
✅ CORRECT:
eligible = is_elderly | is_disabled
Mistake 3: Nested if Statements
❌ WRONG:
if eligible:
if income < threshold:
return full_benefit
else:
return partial_benefit
else:
return 0
✅ CORRECT:
return where(
eligible,
where(income < threshold, full_benefit, partial_benefit),
0
)
5. CRITICAL: Avoiding Divide-by-Zero Warnings
The Problem with where() for Division
where() evaluates BOTH branches before selecting. This causes divide-by-zero warnings even when the zero case wouldn't be selected:
proportion = where(
total_income > 0,
person_income / total_income,
0,
)
✅ CORRECT: Use np.divide with where Parameter
mask = total_income > 0
proportion = np.divide(
person_income,
total_income,
out=np.zeros_like(person_income),
where=mask,
)
How out works as the default:
out=np.zeros_like(...) → default is 0
out=np.ones_like(...) → default is 1
- Positions where
where=False keep their out value unchanged
✅ CORRECT: Alternative Mask Pattern
proportion = np.zeros_like(total_income)
mask = total_income > 0
proportion[mask] = person_income[mask] / total_income[mask]
Common Use Cases
Proportional allocation (e.g., splitting deductions between spouses):
unit_income = tax_unit.sum(person_income)
mask = unit_income > 0
share = np.divide(
person_income,
unit_income,
out=np.zeros_like(person_income),
where=mask,
)
share = where(mask, share, where(is_head, 1.0, 0.0))
Calculating ratios:
mask = us_agi != 0
ratio = np.divide(
state_agi,
us_agi,
out=np.zeros_like(us_agi),
where=mask,
)
Real Examples in Codebase
See these files for reference implementations:
taxable_social_security.py - person share of unit benefits
mo_taxable_income.py - AGI share allocation
md_two_income_subtraction.py - head's share of couple income
ok_child_care_child_tax_credit.py - AGI ratio
6. More Advanced Patterns
Pattern: Vectorized Lookup Tables
❌ if size == 1:
amount = 100
elif size == 2:
amount = 150
elif size == 3:
amount = 190
✅
amount = p.benefit_schedule.calc(size)
✅
amounts = [100, 150, 190, 220, 250]
amount = select(
[size == i for i in range(1, 6)],
amounts[:5],
default=amounts[-1]
)
Pattern: Accumulating Conditions
income_eligible = income < p.income_threshold
resource_eligible = resources < p.resource_limit
demographic_eligible = (age < 18) | is_pregnant
eligible = income_eligible & resource_eligible & demographic_eligible
Pattern: Conditional Accumulation
person = household.members
is_eligible = person("is_eligible", period)
person_income = person("income", period)
eligible_income = where(is_eligible, person_income, 0)
total = household.sum(eligible_income)
7. Testing for Vectorization Issues
Signs Your Code Isn't Vectorized
Error messages:
- "The truth value of an array is ambiguous"
- "ValueError: The truth value of an array with more than one element"
Performance:
- Tests run slowly
- Microsimulation times out
How to Test
def test_vectorization():
incomes = np.array([500, 1500, 3000])
benefits = formula_with_arrays(incomes)
assert len(benefits) == 3
Quick Reference Card
| Operation | Scalar (WRONG) | Vectorized (CORRECT) |
|---|
| Simple condition | if x > 5: | where(x > 5, ...) |
| Multiple conditions | if-elif-else | select([...], [...]) |
| Boolean AND | and | & |
| Boolean OR | or | | |
| Boolean NOT | not | ~ |
| Bounds checking | if x < 0: x = 0 | max_(0, x) |
| Floor subtraction | max_(x, 0) - y ❌ | max_(x - y, 0) ✅ |
| Complex logic | Nested if | Nested where/select |
8. Debugging Phantom Values in Tax Calculations
Problem: Non-Zero Tax Despite Zero Taxable Income
When state tax calculations produce small non-zero values (e.g., $277) even though taxable income is zero, the root cause is usually phantom intermediate values in calculation chains.
Phantom Intermediate Values in Calculation Chains
When taxable income is zero but tax is non-zero, trace the calculation chain:
taxable_income: 0
rate: 0.0475
brackets: [15_600]
tax_before_credits: 277.41
Debugging Pattern
When you see phantom tax values:
-
Check the calculation chain - Run test with verbose output to see intermediate values:
pytest tests/file.py -vv
-
Verify zero-income handling - Look for formulas that don't short-circuit on zero income:
✅ GOOD:
def formula(entity, period, parameters):
taxable_income = entity("taxable_income", period)
return where(taxable_income == 0, 0, calculate_tax(...))
❌ BAD:
def formula(entity, period, parameters):
return calculate_brackets(taxable_income, rates, brackets)
-
Check type consistency - Ensure operations preserve NumPy array dtypes:
✅ Use: max_(value, 0) or clip(value, 0, None)
❌ Avoid: max(value, 0) - Python's max can cause type issues