تشغيل أي مهارة في Manus بنقرة واحدة

econ-data-analysis

Guide for rigorous economic data analysis. Use PROACTIVELY whenever performing data analysis on economic or financial datasets — importing, cleaning, merging, constructing variables, or producing summary statistics. Three core principles: (1) describe before and after every transformation, (2) document in jupytext percent format with interleaved code/narrative/outputs, (3) validate against economic intuition, literature, and cross-variable relationships. Includes pitfall checklists for merges, aggregations, filtering, and variable construction. Language-agnostic (Python, Julia). Trigger: any data analysis task involving economic, financial, or panel data.

تشغيل في Manus

نظرة عامة

أمر التثبيت

npx skills add https://github.com/FuZhiyu/AgentContract --skill econ-data-analysis

انسخ والصق هذا الأمر في Claude Code لتثبيت المهارة

المصدر

FuZhiyu/AgentContract

النجوم٠

التفرعات٠

آخر تحديث٨ أبريل ٢٠٢٦ في ٢٢:٠٩

مستكشف الملفات

3 ملفات

SKILL.md

readonly

المزيد من هذا المستودع

نفس المستودع

wrds-data

FuZhiyu/AgentContract

Search and download financial data from WRDS (Wharton Research Data Services). Use when asked to "download from WRDS", "search WRDS", "get CRSP data", "download Compustat", "find WRDS table", "get stock returns", "download IBES", or any WRDS/financial database task involving CRSP, Compustat, IBES, TAQ, OptionMetrics, Fama-French, BoardEx, DealScan, or other WRDS datasets.

2026-04-080

draft-review

FuZhiyu/AgentContract

Comprehensive academic paper review covering mathematical correctness, writing clarity, consistency, argumentation, proofreading, and citations. Use when user asks to 'review draft', 'check paper', 'proofread manuscript', or requests feedback on academic writing. Can also verify code-paper consistency when source code is available. Defaults to comprehensive + standard review, with optional deep parallel review when Codex multi-agent support is available.

2026-04-070

mistral-pdf-to-markdown

FuZhiyu/AgentContract

Convert PDFs to Markdown using Mistral OCR API with image extraction. Use when you need to extract structured text and images from PDFs, especially for scanned documents or documents with complex formatting. Outputs Markdown with embedded images.

2026-04-070

research-project-template

FuZhiyu/AgentContract

Create new academic research projects with two-folder architecture. Use when user wants to create a new research project, start a new paper, set up a new analysis project, or mentions needing a project structure for research.

2026-04-070

work-journal

FuZhiyu/AgentContract

Create formal, fact-checked work journal entries after completing analysis work. Use when user asks to "summarize work", "document results", or "create work journal entry". Ensures code is committed, copies figures to attachments, and creates objective summaries with mandatory citations plus a report-quality verification pass. For quick reports without fact-checking, use the `report-in-markdown` skill.

2026-04-070

worktree-data-sync

FuZhiyu/AgentContract

Sync non-git data between existing git worktrees. Supports seed, diff, and apply modes using explicit --from/--to endpoints. Does not create/remove worktrees or manage sandbox settings.

2026-04-070

المصدر

FuZhiyu

FuZhiyu/AgentContract

فتح مستودع GitHub عرض مستودعات المنشئ

أمر التثبيت

تنزيل

تشغيل في Manus

مفيد لـSOC

علماء البياناتمهن الحاسوب والرياضيات15-2051L4

name	econ-data-analysis
description	Guide for rigorous economic data analysis. Use PROACTIVELY whenever performing data analysis on economic or financial datasets — importing, cleaning, merging, constructing variables, or producing summary statistics. Three core principles: (1) describe before and after every transformation, (2) document in jupytext percent format with interleaved code/narrative/outputs, (3) validate against economic intuition, literature, and cross-variable relationships. Includes pitfall checklists for merges, aggregations, filtering, and variable construction. Language-agnostic (Python, Julia). Trigger: any data analysis task involving economic, financial, or panel data.
user-invocable	true

Economic Data Analysis

Three concurrent principles for rigorous data work. These are not sequential stages — apply all three at every point in the analysis.

Principle 1: Description Before Analysis

The most common analytical error is transforming data you do not understand. Describe thoroughly and often.

After loading any dataset

Panel structure (first priority for panel/longitudinal data — the common case):

Identify the panel ID (firm, fund, country, individual) and time ID (year, quarter, month, day)
Count unique IDs and unique time periods; verify against expectations
Date range: min and max; any expected periods absent?
Balancedness: compute periods-per-unit distribution (mean, median, min, max). Balanced ratio = actual rows / (N_ids × T_periods). If unbalanced, characterize the pattern — entry/exit, mid-panel gaps, or expanding coverage?
For pure cross-sections, note it and skip panel diagnostics

Variable diagnostics — tailor to type, focus on key variables:

Continuous (returns, prices, GDP, weights): mean, median, std, min, max, and tail percentiles (p1, p5, p95, p99) — tails detect outliers
Categorical/binary (sector codes, indicators, country): value counts and shares; check for unexpected categories or near-zero frequencies
Identifiers: does panel ID × time uniquely identify rows? Check for duplicates
Do NOT run blanket describe() on all columns — select key variables explicitly

Data types and missing values:

Column types: dates as dates, numerics as numerics (not object/string)
Missing values: count and share per variable; is missingness random or systematic (concentrated in certain periods, countries, or correlated with other variables)?
Compare to source documentation if expected sample size is stated

When data was already imported and validated upstream, read existing diagnostics rather than re-running full validation.

After every major transformation

Re-run descriptive statistics on affected variables. Compare before/after. Major transformations include: merges, filters, variable construction, aggregations, reshaping, deduplication.

Rule: if something looks unexpected, investigate before proceeding. Do not use a variable downstream until its distribution is understood.

Outlier decisions

Flag observations beyond p1/p99 — are they data errors or genuine extremes?
For naturally skewed variables (firm size, wealth, trade volumes), extreme values may be real — document the decision to keep, winsorize, or trim
If winsorizing, document cutoff and consider robustness with alternatives (see references/data-robustness-checklist.md)

Principle 2: Logs and Documentation

Analysis scripts should be human-readable documents that interleave code, narrative, and outputs. Use jupytext percent format as the default.

Script format

Write .py or .jl files in percent format — # %% separates code cells, # %% [markdown] starts narrative cells. For full syntax and rendering instructions, see references/jupytext-guide.md.

Brief example:

# %% [markdown]
"""
## Load Holdings Data
Source: CRSP mutual fund holdings, 2000-2020.
Expect ~4.7M rows across ~12K funds.
"""

# %%
df = pd.read_parquet("Data/holdings.parquet")
print(f"Shape: {df.shape}")

# Panel structure
print(f"Funds: {df['fund_id'].nunique()}, Dates: {df['date'].nunique()}")
print(f"Period: {df['date'].min()} to {df['date'].max()}")
obs_per_fund = df.groupby('fund_id')['date'].nunique()
print(f"Periods/fund — mean: {obs_per_fund.mean():.0f}, "
      f"median: {obs_per_fund.median():.0f}, "
      f"min: {obs_per_fund.min()}, max: {obs_per_fund.max()}")

# Key continuous variables only
df[["market_value", "weight"]].describe(percentiles=[.01, .05, .5, .95, .99])

# %% [markdown]
"""
## Merge with Fund Characteristics
Left join on fund_id × date. Expect same row count (fund_chars is m:1).
"""

# %%
n_before = len(df)
df = df.merge(chars, on=["fund_id", "date"], how="left")
print(f"Rows: {n_before} → {len(df)} (delta: {len(df) - n_before})")

Row count tracking

Log before/after row counts for every sample-changing operation: merges, filters, drops, deduplication, sample restrictions. Major operations (merges, large filters) typically warrant their own cell; minor operations can share a cell as long as the count is printed.

Decision documentation

Minor decisions (winsorization percentile, filter threshold): inline comment
Major decisions (excluding countries, choosing sample period, variable definition): markdown cell with reasoning

Output rendering

Pair the .py/.jl script with a .ipynb notebook:

The script goes in version control (git-friendly diffs)
The .ipynb holds rendered outputs for human review
Render with: jupytext --set-kernel <name> --to notebook --execute script.py
See references/jupytext-guide.md for details

Principle 3: Multi-Source Validation

Numbers must make economic sense. Validate against intuition, literature, and cross-variable relationships.

Scale check

Does the magnitude match economic intuition? GDP growth of 300% is wrong; stock returns of -99% need investigation. Compare summary statistics to published benchmarks (IMF WEO, World Bank, central bank data, prior literature).

Property check

Is the variable's behavior consistent with priors or what the literature has found? For constructed variables, spot-check a few observations by hand. For growth rates, verify against published figures for well-known cases.

Relationship check

Compute correlations between new variables and known related measures
Signs and magnitudes consistent with published stylized facts? (e.g., GDP growth positively correlated with employment growth)
Conditional means across subgroups behave as expected? (e.g., developed vs. emerging, pre/post crisis)

Reference verification

For key variables, find at least one external reference to verify alignment. If a relationship looks surprising, investigate before proceeding — it may indicate a data or construction error.

Missing data as validation signal

Systematic missingness (concentrated in time/geography) is informative — investigate whether it reflects true data absence or a construction error
Ask: what does "missing" mean here? No position (→ zero) vs didn't report (→ truly missing) — the correct treatment depends on the data source and research question
Missing returns treated as zero is almost always wrong

Pitfalls

Concise checklists for common data manipulation errors. Consult when performing the relevant operation.

Merges and joins

Before: check row counts and unique join-key values in both tables
Join type: 1:1, m:1, or 1:m. Many-to-many is almost always a bug — it creates a Cartesian product that silently inflates row counts
After: row count should match left table for left join (unless right has dupes on the join key — the many-to-many trap)
Unmatched: log how many rows from each side did not match; assess whether non-matching is random or systematic

Time-series operations (lag, lead, diff, cumsum, fill)

Sort first: sort by panel ID + time before any time-series operation. Joins destroy sort order — always re-sort after any merge
Check for gaps before applying lags/leads/diffs. If unit i is missing period t, a naive shift(1) treats period t+1's lag as t-1's value — silently wrong. Diagnose gaps per unit before proceeding
Use time-aware operators when available: in Julia, PanelShift.jl handles gaps correctly; in Python, merge on lagged time index or reindex to a full time grid before shifting. If the framework only supports positional shift, verify there are no gaps first, or fill gaps explicitly (with NaN, not interpolation) so shifts are correct
After: spot-check a few units to confirm the lag/lead aligns with the correct time period, especially near panel entry/exit

Reshaping

After pivot: unique IDs × unique time periods should match original shape
Check for unintended NAs from unbalanced panels going wide

Aggregations

Function: sum dollar amounts, average rates — never the reverse. Averaging dollars or summing rates are common silent errors
Group-by keys: verify they match intended level (country-year, not country-month)
Weights: if weighted average, verify weights sum to expected values
Duplicates: handle before aggregating — dupes cause double-counting

Deduplication

Check uniqueness before operations that assume it (merges, index-setting)
Document which duplicate kept and why (first, last, highest value, etc.)

Filtering

Log rows dropped: count, reason, before/after
Check non-randomness: are drops concentrated in certain countries, periods, or variable ranges? This may introduce sample selection bias
Verify boolean logic: & vs | errors are a common silent bug
Watch chained filters for unintended cumulative effects

Variable construction

Transformation order: log → winsorize → standardize (log after standardize fails because standardized values can be negative)
Ratio denominators: check for zero/near-zero; extreme ratios often come from small denominators
Growth rates: compare to published benchmarks for spot checks; first differences amplify measurement error — inspect for implausible spikes
Standardization: verify mean ≈ 0, std ≈ 1 within the relevant sample; be clear about cross-sectional vs time-series vs pooled

Missing data handling

Explicit handling (.fillna(0), .dropna(), filters) is visible and auditable
Implicit handling (package defaults silently ignoring NaN in aggregations) is easy to miss — check alignment with analytical objective
Ask: what does "missing" mean in this specific context?
Prefer passing missing through the pipeline over filling silently; use fill/coalesce only with explicit justification

Key References

references/jupytext-guide.md — percent format syntax, rendering, pairing
references/data-robustness-checklist.md — sensitivity analysis: outlier alternatives, alternative definitions, sample restrictions, leave-one-out
Gentzkow & Shapiro (2014), "Code and Data for the Social Sciences"
AEA Data Editor, "Guidance for Replication Packages"