| name | microdf |
| description | Weighted pandas DataFrames for survey microdata analysis - inequality, poverty, and distributional calculations.
Triggers: "weighted mean", "Gini", "poverty rate", "inequality", "MicroDataFrame", "MicroSeries", "weighted statistics", "decile", "quintile", "income distribution", "microdf"
|
MicroDF
MicroDF provides weighted pandas DataFrames and Series for analyzing survey microdata, with built-in support for inequality and poverty calculations.
For Users
What is MicroDF?
When you see poverty rates, Gini coefficients, or distributional charts in PolicyEngine, those are calculated using MicroDF.
MicroDF powers:
- Poverty rate calculations (SPM)
- Inequality metrics (Gini coefficient)
- Income distribution analysis
- Weighted statistics from survey data
Understanding the Metrics
Gini coefficient:
- Calculated using MicroDF from weighted income data
- Ranges from 0 (perfect equality) to 1 (perfect inequality)
- The US value is typically around 0.48, depending on the income definition used
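To make the 0-to-1 scale concrete, here is a minimal sketch of a weighted Gini coefficient using the mean-absolute-difference definition. It is a standalone illustration in plain Python, not MicroDF's implementation; the function name `weighted_gini` and the sample data are assumptions made for this example.

```python
def weighted_gini(incomes, weights):
    """Gini via the weighted mean absolute difference (O(n^2), illustration only)."""
    total_weight = sum(weights)
    mean_income = sum(x * w for x, w in zip(incomes, weights)) / total_weight
    # Sum |x_i - x_j| over all ordered pairs, each pair counted w_i * w_j times.
    abs_diff_sum = sum(
        wi * wj * abs(xi - xj)
        for xi, wi in zip(incomes, weights)
        for xj, wj in zip(incomes, weights)
    )
    return abs_diff_sum / (2 * total_weight**2 * mean_income)


print(weighted_gini([30000, 30000, 30000], [1, 5, 2]))  # everyone equal -> 0.0
print(weighted_gini([0, 100], [1, 1]))  # one of two people holds everything -> 0.5
```

With only two people the maximum is 0.5. The upper bound of 1 is approached only as the population grows and a single unit holds all income.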
Poverty rates:
- Calculated using MicroDF with weighted household data
- Compares income to poverty thresholds
- Accounts for household composition
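The comparison described above can be sketched as a weighted headcount: the share of total weight belonging to units whose income falls below their own threshold. This is a plain-Python illustration with made-up numbers, not MicroDF's code; `weighted_poverty_rate` is a hypothetical helper name.

```python
def weighted_poverty_rate(incomes, thresholds, weights):
    """Share of total weight held by units with income below their threshold."""
    poor_weight = sum(
        w for x, t, w in zip(incomes, thresholds, weights) if x < t
    )
    return poor_weight / sum(weights)


# Three sample households; thresholds differ because household composition differs.
rate = weighted_poverty_rate(
    incomes=[12000, 40000, 18000],
    thresholds=[15000, 25000, 20000],
    weights=[100, 300, 100],
)
print(rate)  # 0.4
```

The first and third households are below their thresholds and together carry 200 of the 500 total weight.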
Percentiles:
- MicroDF calculates weighted percentiles
- Shows income distribution (10th, 50th, 90th percentile)
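One simple way to define a weighted percentile is the smallest value at which the cumulative weight reaches the requested share of total weight. The sketch below uses that definition without interpolation. MicroDF may interpolate or break ties differently, so treat `weighted_percentile` as an assumption for illustration only.

```python
def weighted_percentile(values, weights, pct):
    """Smallest value whose cumulative weight reaches pct percent of the total."""
    pairs = sorted(zip(values, weights))
    target = sum(weights) * pct / 100
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


incomes = [10000, 20000, 30000, 40000, 50000]
weights = [1, 2, 3, 2, 1]
print(weighted_percentile(incomes, weights, 50))  # 30000
```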
For Analysts
Installation
uv pip install microdf-python
Quick Start
import microdf as mdf
import pandas as pd
df = pd.DataFrame({
'income': [10000, 20000, 30000, 40000, 50000],
'weights': [1, 2, 3, 2, 1]
})
mdf_df = mdf.MicroDataFrame(df, weights='weights')
print(f"Weighted mean: ${mdf_df.income.mean():,.0f}")
print(f"Gini coefficient: {mdf_df.income.gini():.3f}")
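As a sanity check on the Quick Start data, the weighted mean can be worked out by hand in plain Python. This assumes the standard definition (sum of value times weight, divided by total weight), which is what a weighted mean conventionally denotes.

```python
incomes = [10000, 20000, 30000, 40000, 50000]
weights = [1, 2, 3, 2, 1]

# Each income counts as many times as its weight says.
weighted_total = sum(x * w for x, w in zip(incomes, weights))
total_weight = sum(weights)
weighted_mean = weighted_total / total_weight

print(weighted_total, total_weight, weighted_mean)  # 270000 9 30000.0
```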
Common Operations
Weighted statistics:
mdf_df.income.mean()
mdf_df.income.median()
mdf_df.income.sum()
mdf_df.income.std()
Inequality metrics:
mdf_df.income.gini()
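mdf_df.income.top_x_pct_share(0.1)  # share held by the top 10% (argument is a fraction between 0 and 1)
mdf_df.income.top_x_pct_share(0.01)  # share held by the top 1%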
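A top-share metric answers "what fraction of all income goes to the richest x of the population?". The sketch below is one possible definition in plain Python: it walks down from the richest unit and prorates the unit that straddles the cutoff. MicroDF's own method may treat the boundary differently, and `top_share` is a hypothetical name used only here.

```python
def top_share(incomes, weights, top_fraction):
    """Share of total weighted income held by the top `top_fraction` of people."""
    total_income = sum(x * w for x, w in zip(incomes, weights))
    remaining = sum(weights) * top_fraction  # population weight still to allocate
    top_income = 0.0
    for income, weight in sorted(zip(incomes, weights), reverse=True):
        taken = min(weight, remaining)  # prorate the unit at the cutoff
        top_income += income * taken
        remaining -= taken
        if remaining <= 0:
            break
    return top_income / total_income


print(top_share([1, 1, 1, 1, 6], [1, 1, 1, 1, 1], 0.2))  # 0.6
```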
Poverty analysis:
poverty_rate = mdf_df.poverty_rate(
income_measure='income',
threshold=poverty_line
)
poverty_gap = mdf_df.poverty_gap(
income_measure='income',
threshold=poverty_line
)
deep_poverty_rate = mdf_df.deep_poverty_rate(
income_measure='income',
threshold=poverty_line,
deep_poverty_line=0.5
)
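For intuition about what these measures capture, here is a plain-Python sketch of a weighted poverty gap (total shortfall below the threshold) and a deep poverty rate (share of weight below half the threshold). The helper names and sample data are assumptions for illustration, and MicroDF's exact definitions and signatures may differ.

```python
def weighted_poverty_gap(incomes, thresholds, weights):
    """Total weighted shortfall of income below each unit's threshold."""
    return sum(
        w * max(t - x, 0) for x, t, w in zip(incomes, thresholds, weights)
    )


def weighted_deep_poverty_rate(incomes, thresholds, weights, deep_fraction=0.5):
    """Share of weight with income below deep_fraction of the threshold."""
    deep_weight = sum(
        w
        for x, t, w in zip(incomes, thresholds, weights)
        if x < deep_fraction * t
    )
    return deep_weight / sum(weights)


incomes = [5000, 12000, 40000]
thresholds = [15000, 15000, 25000]
weights = [100, 100, 300]
print(weighted_poverty_gap(incomes, thresholds, weights))  # 1300000
print(weighted_deep_poverty_rate(incomes, thresholds, weights))  # 0.2
```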
Quantiles:
mdf_df.income.decile_values()
mdf_df.income.quintile_values()
mdf_df.income.quantile(0.25)
MicroSeries
income_series = mdf_df.income
income_series.mean()
income_series.gini()
income_series.percentile(50)
WARNING: .values and .to_numpy() strip weights. These methods now emit a UserWarning because they return plain NumPy arrays, on which operations like .mean() are unweighted. Always use MicroSeries methods directly for weighted calculations:
ms.values.mean()  # WRONG: unweighted mean of the raw array
ms.to_numpy().mean()  # WRONG: unweighted mean of the raw array
ms.mean()  # CORRECT: weighted mean
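The size of the error from dropping weights depends on how uneven the weights are. A small plain-Python illustration with made-up numbers, where one record represents nine times as many households as the other:

```python
incomes = [10000, 50000]
weights = [9, 1]  # the low-income record represents nine times as many households

# Ignoring weights treats both records as equally common.
unweighted_mean = sum(incomes) / len(incomes)

# Respecting weights counts the low-income record nine times.
weighted_mean = sum(x * w for x, w in zip(incomes, weights)) / sum(weights)

print(unweighted_mean)  # 30000.0
print(weighted_mean)  # 14000.0
```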
Working with PolicyEngine Results
import microdf as mdf
import pandas as pd
from policyengine_us import Simulation
# Placeholder: supply a real situation dictionary here
situation_with_axes = {...}
sim = Simulation(situation=situation_with_axes)
# Per-household results and survey weights for 2026
incomes = sim.calculate("household_net_income", 2026)
weights = sim.calculate("household_weight", 2026)
# Attach the weights so the statistics below are weighted
df = pd.DataFrame({'income': incomes, 'weight': weights})
mdf_df = mdf.MicroDataFrame(df, weights='weight')
gini = mdf_df.income.gini()
poverty_rate = mdf_df.poverty_rate('income', threshold=15000)
print(f"Gini: {gini:.3f}")
print(f"Poverty rate: {poverty_rate:.1%}")
For Contributors
Repository
Location: PolicyEngine/microdf
Clone:
git clone https://github.com/PolicyEngine/microdf
cd microdf
Current Implementation
To see the current API:
cat microdf/microframe.py
cat microdf/microseries.py
cat microdf/generic.py
cat microdf/inequality.py
cat microdf/poverty.py
To see all methods:
grep "def " microdf/microframe.py
grep "def " microdf/microseries.py
Testing
To see test patterns:
ls tests/
cat tests/test_microframe.py
Run tests:
make test
pytest tests/ -v
Contributing
Before contributing:
- Check if method already exists
- Ensure it's weighted correctly
- Add tests
- Follow policyengine-standards-skill
Common contributions:
- New inequality metrics
- New poverty measures
- Performance optimizations
- Bug fixes
Advanced Patterns
Custom Aggregations
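def weighted_operation(series, weights):
    # Expects plain (unweighted) pandas Series; the weights are applied here.
    return (series * weights).sum() / weights.sum()
# Pass the plain pandas columns from the Quick Start rather than a MicroSeries,
# so the weights are not applied a second time by MicroSeries.sum().
result = weighted_operation(df['income'], df['weights'])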
Groupby Operations
grouped = mdf_df.groupby('state')
state_means = grouped.income.mean()
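Conceptually, a weighted groupby mean computes a separate weighted mean within each group. The plain-Python sketch below shows that logic. The state labels and numbers are hypothetical, and `weighted_group_means` is an illustrative helper rather than part of MicroDF.

```python
from collections import defaultdict


def weighted_group_means(groups, values, weights):
    """Weighted mean of values within each group label."""
    weighted_totals = defaultdict(float)
    weight_sums = defaultdict(float)
    for group, value, weight in zip(groups, values, weights):
        weighted_totals[group] += value * weight
        weight_sums[group] += weight
    return {g: weighted_totals[g] / weight_sums[g] for g in weighted_totals}


means = weighted_group_means(
    groups=["CA", "CA", "NY"],
    values=[20000, 60000, 50000],
    weights=[3, 1, 2],
)
print(means)  # {'CA': 30000.0, 'NY': 50000.0}
```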
Inequality Decomposition
To see decomposition methods:
grep -r -A 20 "def.*decomp" microdf/
Integration Examples
Example 1: PolicyEngine Blog Post Analysis
import microdf as mdf
import pandas as pd
# baseline_sim and reform_sim are Simulation objects created beforehand
baseline_income = baseline_sim.calculate("household_net_income", 2026)
reform_income = reform_sim.calculate("household_net_income", 2026)
weights = baseline_sim.calculate("household_weight", 2026)
df = pd.DataFrame({
'baseline_income': baseline_income,
'reform_income': reform_income,
'weight': weights
})
mdf_df = mdf.MicroDataFrame(df, weights='weight')
baseline_gini = mdf_df.baseline_income.gini()
reform_gini = mdf_df.reform_income.gini()
print(f"Gini change: {reform_gini - baseline_gini:+.4f}")
Example 2: Poverty Analysis
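import microdf as mdf
import pandas as pd
from policyengine_us import Simulation
# `situation` and `reform` must be defined beforehand
baseline_sim = Simulation(situation=situation)
reform_sim = Simulation(situation=situation, reform=reform)
baseline_income = baseline_sim.calculate("spm_unit_net_income", 2026)
reform_income = reform_sim.calculate("spm_unit_net_income", 2026)
spm_threshold = baseline_sim.calculate("spm_unit_poverty_threshold", 2026)
weights = baseline_sim.calculate("household_weight", 2026, map_to="spm_unit")
df_baseline = mdf.MicroDataFrame(
    pd.DataFrame({'income': baseline_income, 'threshold': spm_threshold, 'weight': weights}),
    weights='weight'
)
df_reform = mdf.MicroDataFrame(
    pd.DataFrame({'income': reform_income, 'threshold': spm_threshold, 'weight': weights}),
    weights='weight'
)
# Weighted share of SPM units with income below their threshold
poverty_baseline = (df_baseline.income < df_baseline.threshold).mean()
poverty_reform = (df_reform.income < df_reform.threshold).mean()
print(f"Poverty reduction: {(poverty_baseline - poverty_reform):.1%}")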
Package Status
Maturity: Stable, production-ready
API stability: Stable (breaking changes are rare)
Performance: Optimized for large datasets
To see the installed version:
pip show microdf-python
To see the changelog:
cat CHANGELOG.md
Related Skills
- policyengine-us-skill - Generating data for microdf analysis
- policyengine-analysis-skill - Using microdf in policy analysis
- policyengine-us-data-skill - Data sources for microdf
Resources
Repository: https://github.com/PolicyEngine/microdf
PyPI: https://pypi.org/project/microdf-python/
Issues: https://github.com/PolicyEngine/microdf/issues