
ols-regression

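Econometrics skill for OLS regression and linear models. Activates when the user asks about: "run OLS", "linear regression", "ordinary least squares", "interpret regression results", "heteroskedasticity", "multicollinearity", "regression assumptions", "robust standard errors", "GLS", "WLS", "fit a regression model", "check regression diagnostics", "OLS假设" (OLS assumptions), "最小二乘法" (least squares method), "线性回归" (linear regression), "回归系数" (regression coefficients), "残差检验" (residual tests), "异方差" (heteroskedasticity), "多重共线性" (multicollinearity), "普通最小二乘" (ordinary least squares), "稳健标准误" (robust standard errors), "回归诊断" (regression diagnostics)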

Install command

npx skills add https://github.com/zhouziyue233/great-econometrics --skill ols-regression

Copy and paste this command into Claude Code to install the skill

Source: zhouziyue233/great-econometrics

Stars: 4

Forks: 0

Updated: April 3, 2026 at 04:39

SKILL.md


name: ols-regression

OLS Regression Skill

This skill provides comprehensive guidance for OLS regression and linear models in empirical research. It covers model specification, assumption testing, diagnostic checks, and result interpretation, with code examples in Python, R, and Stata.

Core Workflow

When assisting with OLS regression, follow this sequence:

Clarify the research question and data — understand dependent variable, key regressors, and sample
Specify the model — choose functional form, control variables, fixed effects if needed
Run the regression — provide code in the user's preferred language
Check assumptions — run diagnostics systematically (see references)
Interpret and report — explain coefficients, significance, fit, and caveats

Key Concepts

Model Specification

Write the regression equation explicitly: Y = β₀ + β₁X₁ + ... + βₖXₖ + ε
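Consider log transformations for right-skewed variables or when you want an elasticity interpretation (in a log-log model, β₁ is the elasticity of Y with respect to X₁)
Include relevant controls to reduce omitted variable bias
Avoid irrelevant regressors: they do not bias the estimates but can inflate standard errors, especially when they are correlated with the key regressors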
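As a minimal sketch of the elasticity reading of a log-log model, the snippet below simulates data from an assumed constant-elasticity process (the constants 2 and 0.7 are made up for illustration) and fits ln(y) on ln(x) with plain NumPy least squares. The slope should recover the assumed elasticity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical constant-elasticity process: y = 2 * x^0.7 * exp(u),
# so a 1% increase in x raises y by about 0.7%.
x = np.exp(rng.normal(size=n))           # positive, right-skewed regressor
u = rng.normal(scale=0.1, size=n)
y = 2.0 * x**0.7 * np.exp(u)

# Log-log specification: ln(y) = b0 + b1 * ln(x) + u
X = np.column_stack([np.ones(n), np.log(x)])
(b0, b1), *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

print(f"intercept  = {b0:.3f}  (true ln 2 = {np.log(2):.3f})")
print(f"elasticity = {b1:.3f}  (true 0.7)")
```

Fitting y on x in levels would instead give a marginal effect in units of y per unit of x, which varies along the curve here.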

The Gauss-Markov Assumptions

1. Linearity in parameters
2. Random sampling
3. No perfect multicollinearity
4. Zero conditional mean of errors: E(ε|X) = 0
5. Homoskedasticity: Var(ε|X) = σ²
6. (For exact small-sample inference; not part of Gauss-Markov) Normally distributed errors

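Under assumptions 1–5, OLS is the best linear unbiased estimator (BLUE). Violating assumption 5 (heteroskedasticity) does not bias the OLS coefficients, but it invalidates the default standard errors; violating assumption 6 affects only exact small-sample inference. Violating assumption 4 (endogeneity, e.g., from omitted variables, measurement error, or simultaneity) biases the estimates; consider IV methods.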
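A small Monte Carlo sketch can make the contrast between assumptions 4 and 5 concrete. The data-generating processes below are hypothetical: in the first, errors are heteroskedastic but mean-zero given x; in the second, a variable z that affects y and is correlated with x is left out of the regression, so E(ε|x) ≠ 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, true_b1 = 500, 1_000, 1.0

def ols_slope(x, y):
    """Slope coefficient from regressing y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

het_slopes, omitted_slopes = [], []
for _ in range(reps):
    x = rng.normal(size=n)

    # Violates assumption 5 only: E(e|x) = 0 but Var(e|x) grows with |x|.
    e = rng.normal(size=n) * (0.5 + np.abs(x))
    het_slopes.append(ols_slope(x, true_b1 * x + e))

    # Violates assumption 4: z affects y, is correlated with x, and is omitted.
    z = 0.8 * x + rng.normal(size=n)
    y = true_b1 * x + 0.5 * z + rng.normal(size=n)
    omitted_slopes.append(ols_slope(x, y))

print(f"heteroskedastic errors: mean slope = {np.mean(het_slopes):.3f}")
print(f"omitted variable:       mean slope = {np.mean(omitted_slopes):.3f}")
```

The heteroskedastic case averages close to the true slope of 1. The omitted-variable case centers on 1 + 0.5 × 0.8 = 1.4, the textbook omitted-variable-bias formula (true effect plus the omitted coefficient times the slope of z on x).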

Standard Error Options

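Default OLS SE: valid only under homoskedasticity with no within-group or serial correlation of errors
HC robust SE (White): consistent under heteroskedasticity of unknown form; a sensible default for cross-sectional data with independent observations
Clustered SE: use when errors may be correlated within groups (e.g., by firm, region, year); they become unreliable when there are few clusters
Newey-West (HAC) SE: use for time series with autocorrelation (and possibly heteroskedasticity); requires choosing a lag length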
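The sketch below shows how these options map onto statsmodels' `cov_type` argument (`"HC3"`, `"cluster"` with `cov_kwds={"groups": ...}`, and `"HAC"` with `cov_kwds={"maxlags": ...}`). Both datasets are simulated and hypothetical: a firm panel where x and the error share a firm-level component, and a time series with AR(1) regressor and errors. The lag length of 8 is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# --- Grouped data: 50 hypothetical firms, 20 observations each ---
n_firms, per_firm = 50, 20
firm = np.repeat(np.arange(n_firms), per_firm)
# Both x and the error share a firm-level component, so errors are
# correlated within firms.
x = rng.normal(size=n_firms)[firm] + rng.normal(size=firm.size)
u = rng.normal(size=n_firms)[firm] + rng.normal(size=firm.size)
panel = pd.DataFrame({"y": 1.0 + 0.5 * x + u, "x": x, "firm": firm})

mod = smf.ols("y ~ x", data=panel)
fits = {
    "default": mod.fit(),
    "HC3": mod.fit(cov_type="HC3"),
    "cluster(firm)": mod.fit(cov_type="cluster", cov_kwds={"groups": panel["firm"]}),
}
for name, res in fits.items():
    print(f"{name:>14}: b = {res.params['x']:.3f}, se = {res.bse['x']:.4f}")

# --- Time series: AR(1) regressor and AR(1) errors ---
T = 400
x_ts, e_ts = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x_ts[t] = 0.7 * x_ts[t - 1] + rng.normal()
    e_ts[t] = 0.7 * e_ts[t - 1] + rng.normal()
ts = pd.DataFrame({"y": 0.5 * x_ts + e_ts, "x": x_ts})

# Newey-West (HAC) SEs; maxlags=8 is an illustrative choice, not a rule
hac = smf.ols("y ~ x", data=ts).fit(cov_type="HAC", cov_kwds={"maxlags": 8})
print(f"{'HAC(8)':>14}: b = {hac.params['x']:.3f}, se = {hac.bse['x']:.4f}")
```

The point estimates are identical across the three panel fits; only the standard errors change. Here the clustered SE is noticeably larger than the default because both x and the error are correlated within firms.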

Quick Code Templates

Python (statsmodels)

import statsmodels.formula.api as smf

# df: a pandas DataFrame with columns y, x1, x2, x3
# Fit OLS with HC3 heteroskedasticity-robust standard errors
model = smf.ols('y ~ x1 + x2 + x3', data=df).fit(cov_type='HC3')
print(model.summary())

R

library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()

# df: a data frame with columns y, x1, x2, x3
model <- lm(y ~ x1 + x2 + x3, data = df)
coeftest(model, vcov = vcovHC(model, type = "HC3"))  # HC3 robust SEs

Stata

* vce(robust) gives HC1 SEs; vce(hc3) matches the HC3 Python and R templates
reg y x1 x2 x3, vce(hc3)

Diagnostics Checklist

Run all diagnostics after fitting. See references/ols-reference.md for full test details.

Issue	Test	Quick Fix
Heteroskedasticity	Breusch-Pagan, White test	Robust SE
Autocorrelation	Durbin-Watson, Breusch-Godfrey	Newey-West SE
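Multicollinearity	VIF > 10 (rule of thumb)	Drop/combine variables, or accept wider SEs if they are needed as controls
Non-normality of errors	Jarque-Bera	Check outliers; matters little in large samples
Functional-form misspecification	Ramsey RESET	Add nonlinear terms or respecify the model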

Reporting Standards (Academic)

Report coefficients with standard errors in parentheses (or t-stats)
Use asterisks for significance: * p<0.10, ** p<0.05, *** p<0.01
Always state which standard errors are used (robust, clustered, etc.)
Report R², adjusted R², N, and F-statistic
Describe the identification strategy and potential endogeneity concerns
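A minimal sketch of a text regression column that follows these conventions: the coefficient with stars, the standard error in parentheses underneath, then R², adjusted R², N, F, and a label for the SE type. The helper names `stars` and `format_column` are hypothetical, and so are the data. With a robust `cov_type`, statsmodels reports a Wald F computed from the robust covariance.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def stars(p):
    """* p<0.10, ** p<0.05, *** p<0.01"""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""

def format_column(res, se_label):
    """Text column: coefficient with stars, (SE) underneath, then fit statistics."""
    rows = []
    for name in res.params.index:
        rows.append((name, f"{res.params[name]:.3f}{stars(res.pvalues[name])}"))
        rows.append(("", f"({res.bse[name]:.3f})"))
    rows += [
        ("R-squared", f"{res.rsquared:.3f}"),
        ("Adj. R-squared", f"{res.rsquared_adj:.3f}"),
        ("N", f"{int(res.nobs)}"),
        ("F-statistic", f"{float(np.squeeze(res.fvalue)):.2f}"),
        ("Std. errors", se_label),
    ]
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label:<{width}}  {value:>12}" for label, value in rows)

# Hypothetical data for illustration
rng = np.random.default_rng(5)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1.0 + 0.8 * df["x1"] + rng.normal(size=200)

res = smf.ols("y ~ x1 + x2", data=df).fit(cov_type="HC3")
table = format_column(res, "HC3 robust")
print(table)
```

For multi-column journal tables, a dedicated table package is usually more practical; this sketch only shows which quantities belong in each cell.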

For detailed test formulas, code, and extended examples, see references/ols-reference.md.

Common Pitfalls

Claiming causality without identification: OLS with controls does not establish causality — use IV, DID, or RDD for causal claims
Using default SE with clustered data: Always cluster SE at the group level when observations are grouped
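Including "bad controls": Don't control for post-treatment variables such as mediators; they absorb part of the treatment effect, and if they share unobserved causes with the outcome they also introduce collider bias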
Log-transforming variables with zeros: ln(0) is undefined; use asinh(x) or ln(x+1) with appropriate interpretation
Reporting R² as evidence of a good model: High R² does not mean the model is correctly specified or causal
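The "bad controls" pitfall can be sketched with a hypothetical causal structure: a randomly assigned treatment D, a post-treatment mediator M, and an unobserved A that drives both M and Y. All coefficients are made up for illustration. By construction the total effect of D on Y is 2 (a direct effect of 1 plus 1 through M).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

# Hypothetical causal structure:
#   D (randomly assigned) -> M (post-treatment mediator) -> Y
#   D -> Y directly, and an unobserved A affects both M and Y.
D = rng.binomial(1, 0.5, size=n).astype(float)
A = rng.normal(size=n)                        # unobserved cause of M and Y
M = D + A + rng.normal(size=n)
Y = 1.0 * D + 1.0 * M + A + rng.normal(size=n)

def ols(y, *regressors):
    """Coefficients from regressing y on a constant and the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_total = ols(Y, D)[1]          # no bad control
b_bad = ols(Y, D, M)[1]         # controls for the post-treatment variable M

print(f"Y ~ D     : coefficient on D = {b_total:.3f}")
print(f"Y ~ D + M : coefficient on D = {b_bad:.3f}")
```

The first regression recovers the total effect of about 2. The second gives about 0.5, which is neither the total effect nor the direct effect of 1. Conditioning on M removes the mediated channel. Because M is a collider between D and A, it also makes D and A negatively correlated within levels of M.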

More from this repository

literature-review

zhouziyue233/great-econometrics

Search, summarize, and synthesize economics literature; find research gaps and position your contribution.

2026-04-07 · Stars: 4

beamer-ppt

zhouziyue233/great-econometrics

Create Beamer-style academic PPTX presentations using python-pptx. Produces publication-quality .pptx files with navy-blue Metropolis theme (16:9, frame title bars, progress bar) for conference talks, job market presentations, and seminar slides. Called by /present command.

2026-04-03 · Stars: 4

data-pipeline

zhouziyue233/great-econometrics

End-to-end data pipeline for empirical research: fetch economic data from APIs (FRED, World Bank, IMF, BLS, OECD, Yahoo Finance), clean and transform raw data, construct strategy-specific variables, and validate panel structure. Use when asked to fetch data, download data, clean data, merge datasets, prepare analysis-ready data.

2026-04-03 · Stars: 4

did-analysis

zhouziyue233/great-econometrics

Econometrics skill for Difference-in-Differences (DID) analysis. Activates when the user asks about: "difference in differences", "DID", "DiD", "diff-in-diff", "parallel trends", "treatment group", "control group", "pre-treatment", "post-treatment", "policy evaluation", "natural experiment", "staggered DID", "event study regression", "two-way fixed effects DID", "callaway santanna", "sun and abraham", "双重差分" (difference-in-differences), "倍差法" (difference-in-differences method), "平行趋势" (parallel trends), "处理组" (treatment group), "对照组" (control group), "政策评估" (policy evaluation), "事件研究" (event study), "交错DID" (staggered DID), "渐进处理" (phased-in treatment)

2026-04-03 · Stars: 4

figure

zhouziyue233/great-econometrics

Called by /plot to generate and upgrade econometric figures to top-journal standards.

2026-04-03 · Stars: 4

iv-estimation

zhouziyue233/great-econometrics

Econometrics skill for instrumental variables and treatment effect estimation. Activates when the user asks about: "instrumental variables", "IV estimation", "2SLS", "two-stage least squares", "endogeneity", "weak instruments", "first stage", "Sargan test", "overidentification", "propensity score matching", "PSM", "average treatment effect", "ATT", "LATE", "local average treatment effect", "endogenous regressor", "instrument validity", "工具变量" (instrumental variables), "两阶段最小二乘" (two-stage least squares), "内生性" (endogeneity), "弱工具变量" (weak instruments), "倾向得分匹配" (propensity score matching), "平均处理效应" (average treatment effect), "处理效应" (treatment effect), "局部平均处理效应" (local average treatment effect)

2026-04-03 · Stars: 4

Useful for (SOC): Operations Research Analysts · Computer and Mathematical Occupations · 15-2031 · L4
