with one click
with one click
| name | panel-data-guide |
| description | Panel data analysis with fixed and random effects models |
| metadata | {"openclaw":{"emoji":"📋","category":"analysis","subcategory":"econometrics","keywords":["panel data","fixed effects","random effects","GMM","Hausman test","dynamic panel","Stata econometrics"],"source":"wentor-research-plugins"}} |
Estimate and interpret fixed effects, random effects, and dynamic panel models using Stata, R, and Python for longitudinal/panel datasets.
Panel data (also called longitudinal or cross-sectional time-series data) tracks the same units (individuals, firms, countries) across multiple time periods. This structure enables:
| unit_id | year | gdp_growth | investment | trade_openness |
|---------|------|-----------|------------|----------------|
| USA | 2015 | 2.9 | 20.5 | 28.3 |
| USA | 2016 | 1.7 | 20.1 | 27.1 |
| USA | 2017 | 2.3 | 20.8 | 27.5 |
| CHN | 2015 | 6.9 | 43.3 | 39.9 |
| CHN | 2016 | 6.7 | 42.7 | 37.2 |
| CHN | 2017 | 6.9 | 43.1 | 38.1 |
Key notation:
Y_it = alpha + beta * X_it + epsilon_it
Ignores panel structure; assumes no unit-specific effects. Rarely appropriate.
Y_it = alpha_i + beta * X_it + epsilon_it
Each unit has its own intercept (alpha_i) that captures all time-invariant unobserved heterogeneity. The "within" estimator removes alpha_i by demeaning.
Y_it = alpha + beta * X_it + u_i + epsilon_it
The unit-specific effect u_i is treated as random and uncorrelated with X_it.
* Declare panel structure
xtset country_id year
* Summarize within and between variation
xtsum gdp_growth investment trade_openness
* Check for gaps in panel
gen gap = year - l.year if l.year != .
tab gap // Should be all 1's for balanced annual panels
* Create balanced subsample
by country_id: gen T_i = _N
keep if T_i == max_T // Keep only units observed in all periods
* Attrition analysis
gen in_panel = 1
tsfill, full
replace in_panel = 0 if missing(in_panel)
* Fixed effects regression
xtreg gdp_growth investment trade_openness, fe
* Store results for Hausman test
estimates store FE
* Fixed effects with robust standard errors (clustered by unit)
xtreg gdp_growth investment trade_openness, fe vce(cluster country_id)
* Test joint significance of fixed effects
testparm i.country_id
* Entity and time fixed effects (fast, memory-efficient)
reghdfe gdp_growth investment trade_openness, ///
absorb(country_id year) cluster(country_id)
* Two-way clustering (entity and year)
reghdfe gdp_growth investment trade_openness, ///
absorb(country_id year) cluster(country_id year)
* Random effects regression
xtreg gdp_growth investment trade_openness, re
* Store results for Hausman test
estimates store RE
* Hausman specification test
hausman FE RE
* If p < 0.05: reject RE, use FE
* If p > 0.05: RE is consistent and efficient, prefer RE
* Mundlak (1978): add group means to RE model (robust to heteroskedasticity)
foreach var of varlist investment trade_openness {
bysort country_id: egen m_`var' = mean(`var')
}
xtreg gdp_growth investment trade_openness ///
m_investment m_trade_openness, re cluster(country_id)
test m_investment m_trade_openness
* Rejection => FE preferred; failure to reject => RE acceptable
* First-differenced regression (alternative to FE)
reg D.gdp_growth D.investment D.trade_openness, vce(cluster country_id)
library(plm)
# Convert to panel data frame
pdata <- pdata.frame(mydata, index = c("country_id", "year"))
# Fixed effects
fe_model <- plm(gdp_growth ~ investment + trade_openness,
data = pdata, model = "within")
summary(fe_model)
# Random effects
re_model <- plm(gdp_growth ~ investment + trade_openness,
data = pdata, model = "random")
summary(re_model)
# Hausman test
phtest(fe_model, re_model)
# Clustered standard errors
library(lmtest)
library(sandwich)
coeftest(fe_model, vcov = vcovHC(fe_model, type = "HC1", cluster = "group"))
# Time fixed effects
fe_twoway <- plm(gdp_growth ~ investment + trade_openness + factor(year),
data = pdata, model = "within")
# Test for time fixed effects
pFtest(fe_twoway, fe_model)
import pandas as pd
from linearmodels.panel import PanelOLS, RandomEffects, compare
# Set multi-index for panel structure
data = data.set_index(["country_id", "year"])
# Fixed effects
fe = PanelOLS.from_formula(
"gdp_growth ~ investment + trade_openness + EntityEffects",
data=data
)
fe_result = fe.fit(cov_type="clustered", cluster_entity=True)
print(fe_result.summary)
# Random effects
re = RandomEffects.from_formula(
"gdp_growth ~ investment + trade_openness + 1",
data=data
)
re_result = re.fit()
print(re_result.summary)
# Two-way fixed effects (entity + time)
twoway = PanelOLS.from_formula(
"gdp_growth ~ investment + trade_openness + EntityEffects + TimeEffects",
data=data
)
twoway_result = twoway.fit(cov_type="clustered", cluster_entity=True)
print(twoway_result.summary)
# Compare models
print(compare({"FE": fe_result, "RE": re_result, "Two-way FE": twoway_result}))
| Test | Stata | R | Null Hypothesis |
|---|---|---|---|
| F-test for FE | Built into xtreg, fe | pFtest() | All alpha_i = 0 (pooled OLS is appropriate) |
| Breusch-Pagan LM | xttest0 | plmtest() | Var(u_i) = 0 (pooled OLS vs. RE) |
| Hausman | hausman FE RE | phtest() | RE is consistent (u_i uncorrelated with X) |
* Wooldridge test for serial correlation in panel data
xtserial gdp_growth investment trade_openness
* If p < 0.05: serial correlation present; use clustered SE or AR(1) correction
# Wooldridge test
pbgtest(fe_model) # Breusch-Godfrey test for serial correlation
* Modified Wald test for groupwise heteroskedasticity
xttest3
* If p < 0.05: heteroskedasticity present; use robust/clustered SE
When a lagged dependent variable is included as a regressor:
* Difference GMM (Arellano & Bond 1991)
xtabond gdp_growth l.gdp_growth investment trade_openness, ///
lags(1) twostep robust artests(2)
* System GMM (Blundell & Bond 1998) via xtabond2
* More efficient than difference GMM, especially with persistent series
xtabond2 gdp_growth l.gdp_growth investment trade_openness i.year, ///
gmm(l.gdp_growth, lag(2 4) collapse) ///
gmm(investment, lag(2 3) collapse) ///
iv(trade_openness i.year) ///
twostep robust orthogonal small
| Test | Null Hypothesis | Desired Result | Stata Command |
|---|---|---|---|
| AR(1) | No first-order autocorrelation | Reject (p < 0.05) | Reported automatically |
| AR(2) | No second-order autocorrelation | Fail to reject (p > 0.10) | Reported automatically |
| Hansen J | Instruments are valid | Fail to reject (p > 0.10) | Reported automatically |
| Diff-in-Hansen | Level instruments valid | Fail to reject (p > 0.10) | Reported automatically |
| Instrument count | -- | N_instruments < N_groups | Check output |
* Basic DID with two-way fixed effects
xtreg outcome treated##post, fe vce(cluster unit_id)
* Event study specification
xtreg outcome i.relative_time##treated, fe vce(cluster unit_id)
* Entity-clustered (default choice for firm/country panels)
xtreg gdp_growth investment trade_openness, fe cluster(country_id)
* Driscoll-Kraay standard errors (cross-sectional dependence)
xtscc gdp_growth investment trade_openness i.year, fe lag(3)
* Diagnostic tests for SE selection
xtreg gdp_growth investment trade_openness, fe
xttest3 // Modified Wald test for heteroskedasticity
xtserial gdp_growth investment trade_openness // Wooldridge test for serial correlation
xtcsd, pesaran abs // Pesaran CD test for cross-sectional dependence
* IV with fixed effects (xtivreg)
xtivreg gdp_growth (investment = tax_incentive foreign_aid) ///
trade_openness i.year, fe first
* Report Kleibergen-Paap rk Wald F for weak instruments
Table X: Panel Regression Results (Fixed Effects)
Dependent Variable: GDP Growth (%)
(1) (2) (3)
FE RE Two-way FE
Investment 0.125*** 0.118*** 0.131***
(0.032) (0.029) (0.035)
Trade Openness 0.045** 0.051** 0.038*
(0.018) (0.017) (0.020)
Entity FE Yes No Yes
Time FE No No Yes
Observations 850 850 850
R-squared (within) 0.234 0.228 0.267
Hausman test (p) -- 0.003 --
Notes: Robust standard errors clustered at the country level in
parentheses. * p<0.10, ** p<0.05, *** p<0.01.
10 econometrics skills. Trigger: causal analysis, regression models, treatment effects, panel data. Design: method-centric guides with R/Python code and diagnostic tests.
Curated guide to generative AI covering LLMs and diffusion models
27 ai & machine learning skills. Trigger: ML experiments, model training, deep learning, NLP, computer vision. Design: covers frameworks, benchmarks, paper reproduction, and AI research workflows.
Papers on LLMs for IT operations and AIOps research
10 computer science skills. Trigger: algorithms, systems research, software engineering, security papers. Design: theory, complexity analysis, code-centric research, and security methods.
8 finance skills. Trigger: financial modeling, market data, risk analysis, quantitative finance. Design: data sources, quantitative methods, and regulatory frameworks.