| name | microcalibrate |
| description | Survey weight calibration to match population targets - used in policyengine-us-data for enhanced microdata.
Triggers: "calibrate", "calibration", "survey weights", "reweighting", "population targets", "benchmarks", "microcalibrate", "weight adjustment", "target matching"
|
MicroCalibrate
MicroCalibrate calibrates survey weights to match population targets, with L0 regularization for sparsity and automatic hyperparameter tuning.
For Users
What is MicroCalibrate?
When you see PolicyEngine population impacts, the underlying data has been "calibrated" using MicroCalibrate to match official population statistics.
What calibration does:
- Adjusts survey weights to match known totals (population, income, employment)
- Creates representative datasets
- Reduces dataset size while maintaining accuracy (the L0 penalty sets some household weights to zero)
- Ensures PolicyEngine estimates match administrative data
Example:
- The 2020 Census counted about 331 million US residents
- Survey has 100,000 households representing the population
- MicroCalibrate adjusts weights so survey totals match census totals
- Result: More accurate PolicyEngine calculations
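With a single target, the idea above reduces to one ratio. The sketch below uses made-up household sizes and a made-up starting weight of 1,200 per household; it only illustrates what "adjusting weights to match a total" means and is not how MicroCalibrate itself solves the multi-target case.

```python
import numpy as np

# Hypothetical survey: 100,000 households, each with a household size
# and an initial weight (how many real households it stands for).
rng = np.random.default_rng(0)
household_size = rng.integers(1, 7, size=100_000)  # 1 to 6 people
initial_weights = np.full(100_000, 1_200.0)

census_population = 331_000_000

# Weighted survey estimate of the total population.
survey_population = float(household_size @ initial_weights)

# Simplest possible calibration: scale every weight by the same ratio.
ratio = census_population / survey_population
calibrated_weights = initial_weights * ratio

calibrated_population = float(household_size @ calibrated_weights)
print(f"Before: {survey_population:,.0f}")
print(f"After:  {calibrated_population:,.0f}")
```

Once there are several targets at the same time (population, income, employment), a single ratio can no longer satisfy all of them, which is where an optimizer comes in.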
For Analysts
Installation
uv pip install microcalibrate
What MicroCalibrate Does
Calibration problem:
You have survey data with initial weights, and you know certain population totals (benchmarks). Calibration adjusts weights so weighted survey totals match benchmarks.
Example:
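from microcalibrate import Calibration
import numpy as np
import pandas as pd
rng = np.random.default_rng(0)
household_incomes = rng.lognormal(mean=10.7, sigma=0.5, size=1000)  # placeholder data
household_employment = rng.binomial(1, 0.6, size=1000)  # placeholder data: 1 if employed
weights = np.ones(1000)  # initial weights, one per household
estimate_matrix = pd.DataFrame({
    'total_income': household_incomes,
    'total_employed': household_employment
})
targets = np.array([
    50_000_000,  # target for total_income
    600,  # target for total_employed
])
cal = Calibration(
    weights=weights,
    targets=targets,
    estimate_matrix=estimate_matrix,
    l0_lambda=0.01  # strength of the sparsity penalty
)
new_weights = cal.calibrate(max_iter=1000)
achieved = estimate_matrix.values.T @ new_weights  # weighted totals, one per target
print(f"Target: {targets}")
print(f"Achieved: {achieved}")
print(f"Non-zero weights: {(new_weights > 0).sum()} / {len(weights)}")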
L0 Regularization for Sparsity
Why sparsity matters:
- Reduces dataset size (fewer households to simulate)
- Faster PolicyEngine calculations
- Easier to validate and understand
L0 penalty:
cal = Calibration(
    weights=weights,
    targets=targets,
    estimate_matrix=estimate_matrix,
    l0_lambda=0.01
)
To see the impact, compare a run without the penalty to a run with it:
cal_dense = Calibration(..., l0_lambda=0.0)
weights_dense = cal_dense.calibrate()
cal_sparse = Calibration(..., l0_lambda=0.01)
weights_sparse = cal_sparse.calibrate()
print(f"Dense: {(weights_dense > 0).sum()} households")
print(f"Sparse: {(weights_sparse > 0).sum()} households")
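Counting non-zero weights shows only half the picture, since a sparse solution is useful only if it still hits the targets. The helper below is a hypothetical sketch (`summarize_weights` is not part of microcalibrate) that reports sparsity and fit together, using NumPy only and a four-household toy dataset.

```python
import numpy as np

def summarize_weights(weights, estimate_matrix, targets):
    """Return sparsity and fit statistics for one weight vector."""
    weights = np.asarray(weights, dtype=float)
    X = np.asarray(estimate_matrix, dtype=float)
    targets = np.asarray(targets, dtype=float)
    achieved = X.T @ weights
    rel_error = np.abs(achieved - targets) / np.abs(targets)
    return {
        "n_nonzero": int((weights > 0).sum()),
        "share_nonzero": float((weights > 0).mean()),
        "max_rel_error": float(rel_error.max()),
    }

# Toy data: 4 households, 2 targets (head count and income).
X = np.array([[1.0, 10.0], [1.0, 20.0], [1.0, 30.0], [1.0, 40.0]])
targets = np.array([4.0, 100.0])
dense = np.array([1.0, 1.0, 1.0, 1.0])
sparse = np.array([2.0, 0.0, 0.0, 2.0])  # half the households dropped

print(summarize_weights(dense, X, targets))
print(summarize_weights(sparse, X, targets))
```

In this toy case both weight vectors reproduce the targets, so the sparse one is strictly cheaper to simulate. On real data the trade-off is usually less clean, which is why the error is worth reporting next to the count.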
Automatic Hyperparameter Tuning
Find optimal l0_lambda:
from microcalibrate import tune_hyperparameters
best_lambda, results = tune_hyperparameters(
    weights=weights,
    targets=targets,
    estimate_matrix=estimate_matrix,
    lambda_min=1e-4,
    lambda_max=1e-1,
    n_trials=50
)
print(f"Best lambda: {best_lambda}")
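Because `lambda_min` and `lambda_max` span three orders of magnitude, candidates are normally spread on a log scale rather than a linear one. The sketch below shows that idea with a plain grid search. It is an illustration only: `pick_lambda` and `toy_score` are hypothetical names, the score is a made-up stand-in for "run a calibration and measure error plus density", and the package's own search (it lists Optuna as a dependency) may work differently.

```python
import numpy as np

def pick_lambda(candidates, score_fn):
    """Return the candidate with the lowest score, plus all scores."""
    scores = [score_fn(lam) for lam in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores

# Log-spaced candidates between lambda_min and lambda_max.
candidates = np.geomspace(1e-4, 1e-1, num=7)

# Hypothetical stand-in score. A real one would run a calibration and
# combine target error with the number of non-zero weights.
def toy_score(lam):
    fit_error = lam * 10    # pretend error grows with lambda
    density = 1e-3 / lam    # pretend density shrinks with lambda
    return fit_error + density

best_lambda, scores = pick_lambda(candidates, toy_score)
print(f"Best lambda: {best_lambda:.4g}")
```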
Robustness Evaluation
Test calibration stability:
from microcalibrate import evaluate_robustness
robustness = evaluate_robustness(
    weights=weights,
    targets=targets,
    estimate_matrix=estimate_matrix,
    l0_lambda=0.01,
    n_folds=5
)
print(f"Mean error: {robustness['mean_error']}")
print(f"Std error: {robustness['std_error']}")
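One way to read a fold-based robustness check is: calibrate while hiding some targets, then see how well the hidden targets are reproduced. The sketch below assumes exactly that (folds over targets), which this document does not confirm for `evaluate_robustness`. It also swaps in a simple least-squares calibrator (`linear_calibrate`, a hypothetical helper) so that it runs with NumPy alone.

```python
import numpy as np

def linear_calibrate(weights, X, targets):
    """Stand-in calibrator: smallest weight change that hits the targets."""
    gap = targets - X.T @ weights
    delta, *_ = np.linalg.lstsq(X.T, gap, rcond=None)  # minimum-norm solution
    return weights + delta

def holdout_errors(weights, X, targets, n_folds=5):
    """Calibrate without one fold of targets, then score that fold."""
    folds = np.array_split(np.arange(len(targets)), n_folds)
    errors = []
    for held_out in folds:
        kept = np.setdiff1d(np.arange(len(targets)), held_out)
        new_w = linear_calibrate(weights, X[:, kept], targets[kept])
        achieved = X[:, held_out].T @ new_w
        rel = np.abs(achieved - targets[held_out]) / np.abs(targets[held_out])
        errors.append(rel.mean())
    return np.array(errors)

rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.5, size=(500, 10))  # 500 records, 10 targets
weights = np.ones(500)
targets = (X.T @ weights) * 1.05           # every total starts 5% too low

errors = holdout_errors(weights, X, targets, n_folds=5)
print(f"Mean error: {errors.mean():.4f}")
print(f"Std error:  {errors.std():.4f}")
```

A small mean with a small spread suggests the calibrated weights generalize to totals they were not fitted on. A large spread suggests the result depends heavily on which targets were included.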
Interactive Dashboard
Visualize calibration:
https://microcalibrate.vercel.app/
Features:
- Upload survey data
- Set targets
- Tune hyperparameters
- View results
- Download calibrated weights
For Contributors
Repository
Location: PolicyEngine/microcalibrate
Clone:
git clone https://github.com/PolicyEngine/microcalibrate
cd microcalibrate
Current Implementation
To see structure:
tree microcalibrate/
ls microcalibrate/
To see specific implementations:
cat microcalibrate/calibration.py
cat microcalibrate/hyperparameter_tuning.py
cat microcalibrate/evaluation.py
Dependencies
Required:
- torch (PyTorch for optimization)
- l0-python (L0 regularization)
- optuna (hyperparameter tuning)
- numpy, pandas, tqdm
To see all dependencies:
cat pyproject.toml
How MicroCalibrate Uses L0
from l0 import HardConcrete
gates = HardConcrete(
    n_items=len(weights),
    temperature=temperature,
    init_mean=0.999
)
effective_weights = weights * gates()
To see L0 integration:
grep -n "HardConcrete\|l0" microcalibrate/calibration.py
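To make the gate idea concrete, here is a NumPy-only sketch of the hard concrete distribution as described by Louizos et al. It follows the formulas in the paper, not the internals of the `l0` package, and the names (`sample_gates`, `deterministic_gates`, `expected_l0`, `log_alpha`) are chosen for this illustration.

```python
import numpy as np

GAMMA, ZETA = -0.1, 1.1  # stretch interval used in Louizos et al.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_gates(log_alpha, temperature, rng):
    """Draw stochastic gates in [0, 1] (training-time behaviour)."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=log_alpha.shape)
    s = sigmoid((np.log(u) - np.log(1 - u) + log_alpha) / temperature)
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def deterministic_gates(log_alpha):
    """Gates without noise (evaluation-time behaviour)."""
    return np.clip(sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def expected_l0(log_alpha, temperature):
    """Expected number of non-zero gates; smooth in log_alpha."""
    return sigmoid(log_alpha - temperature * np.log(-GAMMA / ZETA)).sum()

rng = np.random.default_rng(0)
weights = np.ones(5)
log_alpha = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])  # gate parameters to be learned

gates = deterministic_gates(log_alpha)
effective_weights = weights * gates
print(effective_weights)
print(expected_l0(log_alpha, temperature=2 / 3))
```

Stretching the sigmoid slightly beyond [0, 1] and then clipping is what lets a gate reach exactly 0 or exactly 1, so a record can be dropped outright and not merely down-weighted.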
Optimization Algorithm
Iterative reweighting:
- Start with initial weights
- Apply L0 gates (select samples)
- Optimize to match targets
- Apply penalty for sparsity
- Iterate until convergence
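The loop structure in the steps above can be sketched in plain NumPy. This version deliberately leaves out the L0 gates and the sparsity penalty, uses fixed-step gradient descent on a relative squared error, and runs on synthetic data, so it is a simplified illustration and not the package's PyTorch optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    np.ones(n),                                   # head count
    rng.lognormal(mean=10.0, sigma=1.0, size=n),  # income
])
initial_weights = np.ones(n)
# Synthetic targets: 10% more people and 20% more income than the survey shows.
targets = (X.T @ initial_weights) * np.array([1.1, 1.2])

def relative_loss(weights):
    residual = (X.T @ weights) / targets - 1.0
    return float(np.sum(residual ** 2))

# Optimize log-weights so the weights themselves stay positive.
log_w = np.log(initial_weights)
learning_rate = 1.0
initial_loss = relative_loss(initial_weights)

for step in range(2000):
    w = np.exp(log_w)
    residual = (X.T @ w) / targets - 1.0
    grad_w = 2.0 * (X / targets) @ residual  # d loss / d weight
    log_w -= learning_rate * grad_w * w      # chain rule for log-weights

final_weights = np.exp(log_w)
final_loss = relative_loss(final_weights)
print(f"Loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

Working in log-weights is one common way to keep weights positive without an explicit constraint. Whether microcalibrate does the same is not stated here.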
Loss function:
target_loss = sum((achieved_targets - desired_targets) ** 2)
l0_penalty = l0_lambda * count_nonzero(weights)
total_loss = target_loss + l0_penalty
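As a worked example, the function below evaluates the three terms of the loss as written above on a three-record toy dataset. `calibration_loss` is a hypothetical helper for illustration. Note that a literal non-zero count has no useful gradient, so gradient-based L0 methods in the style of Louizos et al. typically optimize the expected number of open gates in its place.

```python
import numpy as np

def calibration_loss(weights, estimate_matrix, targets, l0_lambda):
    """Evaluate target loss, L0 penalty, and their sum."""
    achieved = estimate_matrix.T @ weights
    target_loss = float(np.sum((achieved - targets) ** 2))
    l0_penalty = float(l0_lambda * np.count_nonzero(weights))
    return target_loss, l0_penalty, target_loss + l0_penalty

# Toy data: 3 records, 2 targets (head count and income).
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]])
targets = np.array([3.0, 12.0])

dense = np.array([1.0, 1.0, 1.0])
sparse = np.array([1.5, 0.0, 1.5])  # middle record dropped

print(calibration_loss(dense, X, targets, l0_lambda=0.01))
print(calibration_loss(sparse, X, targets, l0_lambda=0.01))
```

Both weight vectors hit the targets exactly here, so the only difference in total loss comes from the penalty, which favours the vector with fewer non-zero entries.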
Testing
Run tests:
make test
pytest tests/ -v
To see test patterns:
cat tests/test_calibration.py
cat tests/test_hyperparameter_tuning.py
Usage in policyengine-us-data
To see how the data pipeline uses microcalibrate:
cd ../policyengine-us-data
grep -r "microcalibrate" policyengine_us_data/
grep -r "Calibration" policyengine_us_data/
Common Patterns
Pattern 1: Basic Calibration
from microcalibrate import Calibration
cal = Calibration(
    weights=initial_weights,
    targets=benchmark_values,
    estimate_matrix=contributions,
    l0_lambda=0.01
)
calibrated_weights = cal.calibrate(max_iter=1000)
Pattern 2: With Hyperparameter Tuning
from microcalibrate import tune_hyperparameters, Calibration
best_lambda, results = tune_hyperparameters(
    weights=weights,
    targets=targets,
    estimate_matrix=estimate_matrix
)
cal = Calibration(..., l0_lambda=best_lambda)
calibrated_weights = cal.calibrate()
Pattern 3: Multi-Target Calibration
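import numpy as np
import pandas as pd
from microcalibrate import Calibration
estimate_matrix = pd.DataFrame({  # one row per household, one column per target
    'total_population': population_counts,
    'total_income': incomes,
    'total_employed': employment_indicators,
    'total_children': child_counts
})
targets = np.array([
    331_000_000,  # total_population
    15_000_000_000_000,  # total_income
    160_000_000,  # total_employed
    73_000_000  # total_children
])
cal = Calibration(
    weights=weights,
    targets=targets,
    estimate_matrix=estimate_matrix,
    l0_lambda=0.01
)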
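The targets in this pattern differ by several orders of magnitude, which matters when judging fit. The sketch below uses hypothetical estimates that miss every target by the same 1% and compares an absolute squared error with a relative one. It says nothing about which form microcalibrate uses internally.

```python
import numpy as np

target_names = ["total_population", "total_income",
                "total_employed", "total_children"]
targets = np.array([331_000_000, 15_000_000_000_000,
                    160_000_000, 73_000_000], dtype=float)

# Hypothetical estimates that each miss their target by exactly 1%.
achieved = targets * 0.99

absolute_sq = (achieved - targets) ** 2
relative_sq = ((achieved - targets) / targets) ** 2
absolute_share = absolute_sq / absolute_sq.sum()

for name, share, rel in zip(target_names, absolute_share, relative_sq):
    print(f"{name:18s} share of absolute loss: {share:.6f}  relative loss: {rel:.6f}")
```

In absolute terms the income target accounts for essentially the whole loss, while in relative terms all four targets count equally. When checking results by hand, relative errors are therefore the more informative diagnostic.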
Performance Considerations
Calibration speed:
- 1,000 households, 5 targets: ~1 second
- 100,000 households, 10 targets: ~30 seconds
- Depends on: dataset size, number of targets, l0_lambda
Memory usage:
- PyTorch tensors for optimization
- Scales linearly with dataset size
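A rough size estimate for the estimate matrix follows directly from its shape. The sketch below assumes a dense matrix stored as 32-bit floats; the actual tensors, dtypes, and optimizer state used during calibration may differ, so treat the numbers as a lower bound.

```python
import numpy as np

def dense_matrix_megabytes(n_households, n_targets, dtype=np.float32):
    """Size of a dense (households x targets) matrix in megabytes."""
    return n_households * n_targets * np.dtype(dtype).itemsize / 1e6

for n_households, n_targets in [(1_000, 5), (100_000, 10), (1_000_000, 10)]:
    mb = dense_matrix_megabytes(n_households, n_targets)
    print(f"{n_households:>9,} x {n_targets}: {mb:.2f} MB")
```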
To profile:
import time
start = time.time()
weights = cal.calibrate()
print(f"Calibration took {time.time() - start:.1f}s")
Troubleshooting
Common issues:
1. Calibration not converging (lower l0_lambda and allow more iterations):
cal = Calibration(..., l0_lambda=0.001)
weights = cal.calibrate(max_iter=5000)
2. Targets not matching (check the relative error of each target):
achieved = (estimate_matrix.values.T @ weights)
error = np.abs(achieved - targets) / targets
print(f"Relative errors: {error}")
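Some mismatches have nothing to do with the optimizer: a target may simply be unreachable from the data. The hypothetical helper below (`unreachable_targets`, not part of microcalibrate) flags the most obvious cases, assuming weights are constrained to be non-negative.

```python
import numpy as np

def unreachable_targets(estimate_matrix, targets, names):
    """Flag targets that non-negative weights cannot reach."""
    X = np.asarray(estimate_matrix, dtype=float)
    targets = np.asarray(targets, dtype=float)
    problems = []
    for j, name in enumerate(names):
        column = X[:, j]
        if not np.any(column != 0) and targets[j] != 0:
            problems.append((name, "no record contributes to this target"))
        elif np.all(column >= 0) and targets[j] < 0:
            problems.append((name, "negative target, non-negative contributions"))
        elif np.all(column <= 0) and targets[j] > 0:
            problems.append((name, "positive target, non-positive contributions"))
    return problems

# Toy data: nobody reports farm income, and rent is never negative.
X = np.array([[1.0, 0.0, 5.0], [1.0, 0.0, 7.0]])
names = ["people", "farm_income", "rent"]
problems = unreachable_targets(X, [2.0, 10.0, -3.0], names)
for name, reason in problems:
    print(f"{name}: {reason}")
```

If this check comes back empty and errors are still large, the cause is more likely conflicting targets or too strong a sparsity penalty.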
3. Too sparse (all weights zero), so reduce l0_lambda:
cal = Calibration(..., l0_lambda=0.0001)
Related Skills
- l0-skill - Understanding L0 regularization
- policyengine-us-data-skill - How calibration fits in the data pipeline
- microdf-skill - Working with calibrated survey data
Resources
Repository: https://github.com/PolicyEngine/microcalibrate
Dashboard: https://microcalibrate.vercel.app/
PyPI: https://pypi.org/project/microcalibrate/
Paper: Louizos, Welling, and Kingma (2017), "Learning Sparse Neural Networks through L0 Regularization"