Ejecuta cualquier Skill en Manus
con un clic

Ejecuta cualquier Skill en Manus con un clic

stata-ai-fusion

全能 Stata AI 助手。覆盖代码生成、执行、调试的完整工作流。在用户提及 Stata、.do 文件、.dta 文件、回归分析、面板数据、生存分析、计量经济学、因果推断、数据清理、或任何统计分析任务时触发。即使用户没有明确提到 Stata，如果之前的对话上下文涉及 Stata 也应触发。也适用于调试 Stata .log 文件中的错误。

Ejecutar en Manus

Resumen

Comando de instalación

npx skills add https://github.com/haoyu-haoyu/stata-ai-fusion --skill stata-ai-fusion

Copia y pega este comando en Claude Code para instalar la habilidad

Fuente

haoyu-haoyu/stata-ai-fusion

Estrellas24

Forks6

Actualizado17 de marzo de 2026, 11:19

Explorador de archivos

15 archivos

SKILL.md

readonly

Fuente

haoyu-haoyu

haoyu-haoyu/stata-ai-fusion

Abrir repositorio de GitHub Ver repositorios del creador

Comando de instalación

Descarga

Ejecutar en Manus

Útil paraSOC

Científicos de datosOcupaciones informáticas y matemáticas15-2051L4

name

stata-ai-fusion

description

Stata AI Fusion Skill

You have access to comprehensive Stata reference files. Do not load all files. Read only the 1-3 files relevant to the user's current task using the routing table below.

1. Execution Mode Detection

Detect which execution mode is available and adapt accordingly.

MCP Mode (Stata MCP Server Available)

When the MCP server is connected, you can execute Stata code directly:

Use stata_run_command to run code and return results
Use stata_run_do_file to run full .do files
Use stata_inspect_data to inspect loaded datasets (summary stats, variable info)
Use stata_codebook to generate a codebook for the current dataset
Use stata_get_results to retrieve stored estimation results (r(), e(), s())
Use stata_export_graph to export the current Stata graph to an image
Use stata_search_log to search through Stata log output
Use stata_install_package to install community-contributed packages
Use stata_cancel_command to cancel a running Stata command
Use stata_list_sessions to list all active Stata sessions
Use stata_close_session to close a specific Stata session
Always check execution results before proceeding
If execution fails, read references/error-codes.md for diagnosis

Offline Mode (No MCP Server)

When MCP is not available, generate code for the user to run manually:

Write complete, self-contained .do files
Include version, clear all, set more off preamble
Add log using for output capture
Comment every non-obvious line
Wrap risky operations with capture and check _rc

Detection Logic

IF MCP tools are listed in available tools:
    -> Use MCP execution mode
    -> Run code incrementally, check results between steps
ELSE:
    -> Use offline code generation mode
    -> Produce complete .do file with full preamble

2. Core Coding Standards (Mandatory Rules)

Every piece of Stata code you generate MUST follow these rules. Violations cause silent bugs.

Missing Values Sort to +Infinity

Stata . (and .a-.z) are greater than all numbers.

* WRONG -- includes observations where income is missing!
gen high_income = (income > 50000)

* RIGHT
gen high_income = (income > 50000) if !missing(income)

* WRONG -- missing ages appear in this list
list if age > 60

* RIGHT
list if age > 60 & !missing(age)

`=` vs `==`

= is assignment; == is comparison. Mixing them is a syntax error or silent bug.

* WRONG -- syntax error
gen employed = 1 if status = 1

* RIGHT
gen employed = 1 if status == 1

Local Macro Syntax

Locals use `name' (backtick + single-quote). Globals use $name or ${name}. Forgetting the closing quote is the number-one macro bug.

local controls "age education income"
regress wage `controls'        // correct
regress wage `controls         // WRONG -- missing closing quote
regress wage 'controls'        // WRONG -- wrong quote characters

`by` Requires Prior Sort (Use `bysort`)

* WRONG -- error if data not sorted by id
by id: gen first = (_n == 1)

* RIGHT -- bysort sorts automatically
bysort id: gen first = (_n == 1)

Factor Variable Notation (`i.` and `c.`)

* WRONG -- treats race as continuous
regress wage race education

* RIGHT -- creates dummies automatically
regress wage i.race education

* Interactions
regress wage i.race##c.education    // full interaction
regress wage i.race#c.education     // interaction only (no main effects)

`generate` vs `replace`

generate creates new variables; replace modifies existing ones.

gen x = 1
gen x = 2          // ERROR: x already defined
replace x = 2      // correct

String Comparison Is Case-Sensitive

* May miss "Male", "MALE", etc.
keep if gender == "male"

* Safer
keep if lower(gender) == "male"

`merge` Always Check `_merge`

merge 1:1 id using other.dta
tab _merge                      // always inspect
assert _merge == 3              // or handle mismatches
drop _merge

Line Continuation Uses `///`

regress y x1 x2 x3 ///
    x4 x5 x6, ///
    vce(robust)

Stored Results: `r()` vs `e()` vs `s()`

r() -- r-class commands (summarize, tabulate, etc.)
e() -- e-class commands (estimation: regress, logit, etc.)
s() -- s-class commands (parsing)

A new estimation command overwrites previous e() results. Store them first:

regress y x1 x2
estimates store model1

Weights Are Not Interchangeable

fweight -- frequency weights (replication)
aweight -- analytic/regression weights (inverse variance)
pweight -- probability/sampling weights (survey data, implies robust SE)
iweight -- importance weights (rarely used)

`preserve` / `restore` for Temporary Changes

preserve
collapse (mean) income, by(state)
* ... do something with collapsed data ...
restore   // original data is back

`capture` Swallows Errors

capture some_command
if _rc != 0 {
    di as error "Failed with code: " _rc
    exit _rc
}

3. Workflow Templates

Data Cleaning Pipeline

* 1. Load and inspect
import delimited "raw_data.csv", clear varnames(1)
describe
codebook, compact

* 2. Clean
rename *, lower
destring income, replace force
replace income = . if income < 0

* 3. Label
label variable income "Annual household income (USD)"
label define yesno 0 "No" 1 "Yes"
label values employed yesno

* 4. Save
compress
save "clean_data.dta", replace

Regression Table Workflow

eststo clear
eststo: regress y x1 x2, vce(robust)
eststo: regress y x1 x2 x3, vce(robust)
eststo: regress y x1 x2 x3 x4, vce(cluster id)

esttab using "results.tex", replace ///
    se star(* 0.10 ** 0.05 *** 0.01) ///
    label booktabs ///
    title("Main Results") ///
    mtitles("(1)" "(2)" "(3)")

Panel Data Setup

xtset panelid timevar
xtdescribe
xtsum outcome

xtreg y x1 x2, fe vce(cluster panelid)
* Or with reghdfe
reghdfe y x1 x2, absorb(panelid timevar) vce(cluster panelid)

Difference-in-Differences

* Classic 2x2 DiD
gen post = (year >= treatment_year)
gen treat_post = treated * post
regress y treated post treat_post, vce(cluster id)

* Modern staggered DiD (Callaway & Sant'Anna)
csdid y x1 x2, ivar(id) time(year) gvar(first_treat) agg(event)
csdid_plot

Publication-Quality Graph

twoway (scatter y x, mcolor(navy%50) msize(small)) ///
       (lfit y x, lcolor(cranberry) lwidth(medthick)), ///
    title("Title Here") ///
    xtitle("X Label") ytitle("Y Label") ///
    legend(off) scheme(plotplain)
graph export "figure1.pdf", replace as(pdf)
graph export "figure1.png", replace as(png) width(2400)

Multiple Imputation

mi set mlong
mi register imputed income education
mi impute chained (regress) income (ologit) education ///
    = age i.gender, add(20) rseed(12345)
mi estimate: regress wage income education age i.gender

Survival Analysis Quick Start

stset time_var, failure(event_var)
stsum
sts graph, by(group)
sts test group
stcox i.group age, vce(robust)
estat phtest, detail

4. Reference Document Navigation (Routing Table)

Read only the files relevant to the user's task. Paths are relative to this SKILL.md file.

Data Operations

File	Topics & Key Commands
`references/syntax-core.md`	Variable types, operators, conditionals, loops, macros, string/numeric/date functions
`references/data-management.md`	import/export, merge, reshape, collapse, egen, encode/decode, append, tempfile
`references/defensive-coding.md`	assert, confirm, capture, isid, version, log management

Econometrics & Statistics

File	Topics & Key Commands
`references/econometrics.md`	OLS, IV/2SLS, GMM, Panel FE/RE, Hausman, cluster SE, robust SE, reghdfe
`references/causal-inference.md`	DID, Event Study, RDD, Synthetic Control, Matching, PSM, IPTW
`references/survival-analysis.md`	stset, KM curves, stcox, streg, competing risks, estat phtest

Clinical & Domain-Specific

File	Topics & Key Commands
`references/clinical-data.md`	MIMIC-IV, ICD coding, lab values, Sepsis-3, KDIGO, mi impute

Output & Visualization

File	Topics & Key Commands
`references/graphics.md`	twoway, graph bar/box/dot/pie, schemes, coefplot, binscatter
`references/tables-export.md`	esttab/estout, outreg2, putexcel, collect framework, LaTeX/Word/CSV

Programming & Advanced

File	Topics & Key Commands
`references/mata.md`	Mata basics, matrix ops, st_view/st_store, custom functions
`references/error-codes.md`	Common Stata error codes with explanations and solutions

Community Packages

File	What It Does
`references/packages/reghdfe.md`	High-dimensional FE regression with multi-way clustering
`references/packages/coefplot.md`	Coefficient plots, event study plots, multi-model comparison
`references/packages/gtools.md`	gcollapse, gegen, greshape -- fast data manipulation

5. MCP Tools Usage Guide

Tool Combination Patterns

Pattern A: Iterative Analysis

1. stata_run_command("use mydata.dta, clear")  -> load data
2. stata_run_command("describe")               -> understand structure
3. stata_run_command("regress y x1 x2")        -> run model
4. Read results, decide next step
5. stata_run_command("esttab using ...")        -> export table

Pattern B: Debug Workflow

1. User pastes error from .log file
2. Read references/error-codes.md for the error code
3. Identify the root cause
4. stata_run_command(corrected_code)      -> verify fix
5. Explain what went wrong

Pattern C: Full Pipeline

1. stata_run_command("import delimited ...")  -> load raw data
2. stata_run_command("describe \n codebook")  -> inspect
3. Generate cleaning code based on inspection
4. stata_run_command(cleaning_code)           -> clean
5. stata_run_command(analysis_code)           -> analyze
6. stata_run_command(export_code)             -> export results

Error Recovery in MCP Mode

When an MCP execution returns an error:

Parse the error code (e.g., r(111), r(198), r(601))
Look up in references/error-codes.md
Apply the documented fix
Re-execute the corrected code
If still failing, try alternative approaches

Code Generation for Offline Mode

When generating .do files for users to run manually:

/* ============================================================
   Project:  [Project Name]
   Purpose:  [What this do-file does]
   Author:   AI-generated via stata-ai-fusion
   Date:     [Current Date]
   Input:    [Input files]
   Output:   [Output files]
   ============================================================ */

version 17
clear all
set more off
set maxvar 10000

* -- Set paths --
global root "[USER_PATH]"
global data "$root/data"
global output "$root/output"

log using "$output/logs/analysis_log.txt", replace text

* -- Body --
// ... analysis code ...

log close

6. Common Code Snippets

Quick Summary Statistics

summarize price mpg weight, detail
tabstat price mpg, by(foreign) stat(n mean sd p25 p50 p75)

Winsorize at 1%/99%

foreach var of varlist income wealth {
    quietly summarize `var', detail
    replace `var' = r(p1) if `var' < r(p1) & !missing(`var')
    replace `var' = r(p99) if `var' > r(p99) & !missing(`var')
}

Create Balanced Panel

xtset id year
bysort id: gen n_years = _N
tab n_years
keep if n_years == [expected_T]

Standardize Variables (Z-score)

foreach var of varlist x1 x2 x3 {
    quietly summarize `var'
    gen `var'_z = (`var' - r(mean)) / r(sd)
}

Lag and Lead Variables in Panel

xtset id year
gen y_lag1 = L.y
gen y_lag2 = L2.y
gen y_lead1 = F.y
gen y_diff = D.y

Quick Coefficient Plot

regress y x1 x2 x3 x4, vce(robust)
coefplot, drop(_cons) xline(0, lcolor(red) lpattern(dash)) ///
    title("Coefficient Estimates") scheme(plotplain)
graph export "coefplot.png", replace width(2000)

Export Summary Table to Excel

putexcel set "summary.xlsx", replace sheet("Summary")
putexcel A1 = "Variable" B1 = "N" C1 = "Mean" D1 = "SD"
local row = 2
foreach var of varlist income age education {
    quietly summarize `var'
    putexcel A`row' = "`var'" B`row' = r(N) ///
        C`row' = r(mean) D`row' = r(sd)
    local ++row
}

Propensity Score Matching (Quick)

logit treatment x1 x2 x3
predict pscore, pr
psmatch2 treatment, pscore(pscore) outcome(y) n(1) common

Event Study Plot (Quick)

gen rel_time = year - treatment_year
forvalues k = -5/5 {
    if `k' < 0 local j = abs(`k')
    else       local j = `k'
    gen D`j'_`=cond(`k'<0,"pre","post")' = (rel_time == `k')
}
* Omit period -1 as reference
reghdfe y D*_pre D*_post, absorb(id year) vce(cluster id)
coefplot, keep(D*) vertical yline(0) xline(5.5, lpattern(dash))

Check for Multicollinearity

regress y x1 x2 x3 x4
estat vif
* VIF > 10 indicates concern

Robust Hausman Test (with clustered SE)

* Standard Hausman fails with robust/cluster SE. Use this instead:
xtreg y x1 x2, fe vce(cluster id)
estimates store fe
xtreg y x1 x2, re vce(cluster id)
estimates store re
hausman fe re, sigmamore

stata-ai-fusion

Stata AI Fusion Skill

1. Execution Mode Detection

MCP Mode (Stata MCP Server Available)

Offline Mode (No MCP Server)

Detection Logic

2. Core Coding Standards (Mandatory Rules)

Missing Values Sort to +Infinity

= vs ==

Local Macro Syntax

by Requires Prior Sort (Use bysort)

Factor Variable Notation (i. and c.)

generate vs replace

String Comparison Is Case-Sensitive

merge Always Check _merge

Line Continuation Uses ///

Stored Results: r() vs e() vs s()

Weights Are Not Interchangeable

preserve / restore for Temporary Changes

capture Swallows Errors

3. Workflow Templates

Data Cleaning Pipeline

Regression Table Workflow

Panel Data Setup

Difference-in-Differences

Publication-Quality Graph

Multiple Imputation

Survival Analysis Quick Start

4. Reference Document Navigation (Routing Table)

Data Operations

Econometrics & Statistics

Clinical & Domain-Specific

Output & Visualization

Programming & Advanced

Community Packages

5. MCP Tools Usage Guide

Tool Combination Patterns

Error Recovery in MCP Mode

Code Generation for Offline Mode

6. Common Code Snippets

Quick Summary Statistics

Winsorize at 1%/99%

Create Balanced Panel

Standardize Variables (Z-score)

Lag and Lead Variables in Panel

Quick Coefficient Plot

Export Summary Table to Excel

Propensity Score Matching (Quick)

Event Study Plot (Quick)

Check for Multicollinearity

Robust Hausman Test (with clustered SE)

Stata AI Fusion Skill

1. Execution Mode Detection

MCP Mode (Stata MCP Server Available)

Offline Mode (No MCP Server)

Detection Logic

2. Core Coding Standards (Mandatory Rules)

Missing Values Sort to +Infinity

= vs ==

Local Macro Syntax

by Requires Prior Sort (Use bysort)

Factor Variable Notation (i. and c.)

generate vs replace

String Comparison Is Case-Sensitive

merge Always Check _merge

Line Continuation Uses ///

Stored Results: r() vs e() vs s()

Weights Are Not Interchangeable

preserve / restore for Temporary Changes

capture Swallows Errors

3. Workflow Templates

Data Cleaning Pipeline

Regression Table Workflow

Panel Data Setup

Difference-in-Differences

Publication-Quality Graph

Multiple Imputation

Survival Analysis Quick Start

4. Reference Document Navigation (Routing Table)

Data Operations

`=` vs `==`

`by` Requires Prior Sort (Use `bysort`)

Factor Variable Notation (`i.` and `c.`)

`generate` vs `replace`

`merge` Always Check `_merge`

Line Continuation Uses `///`

Stored Results: `r()` vs `e()` vs `s()`

`preserve` / `restore` for Temporary Changes

`capture` Swallows Errors

`=` vs `==`

`by` Requires Prior Sort (Use `bysort`)

Factor Variable Notation (`i.` and `c.`)

`generate` vs `replace`

`merge` Always Check `_merge`

Line Continuation Uses `///`

Stored Results: `r()` vs `e()` vs `s()`

`preserve` / `restore` for Temporary Changes

`capture` Swallows Errors