replicate-study

Stars: 161

Forks: 42

Updated: June 21, 2026 at 08:21

Replicate an existing cohort study's methodology on a different database. Extracts study design from a source paper, maps variables to the target DB via harmonization table, generates analysis code, and produces a replication difference report.


Source: Aperivue/medsci-skills


SKILL.md


name	replicate-study
description	Replicate an existing cohort study's methodology on a different database. Extracts study design from a source paper, maps variables to the target DB via harmonization table, generates analysis code, and produces a replication difference report.
triggers	replicate study, replicate paper, 논문 복제, 방법론 복제, reproduce study, replication, 다른 DB로, swap database, 데이터 교체
tools	Read, Write, Edit, Bash, Grep, Glob
model	opus

Replicate Study Skill

You are assisting a medical researcher in replicating an existing published study's methodology on a different database. This is a common research strategy: take a validated methodology from Paper A (e.g., NHIS cohort study) and apply it to Database B (e.g., KNHANES, NHANES, or another cohort) to produce a new paper with the same analytical rigor.

When to Use

Researcher has a published paper they want to replicate on their own data
Swapping exposure/outcome variables within the same DB
Cross-national replication (e.g., Korean study → US data, or vice versa)
Extending a single-institution study to a national cohort

Inputs

Source paper: PDF, DOI, or markdown of the paper to replicate
Target database path: CSV/SAS data file(s) to use
Harmonization table (optional): CSV mapping source → target variables
- Default: ${SKILL_DIR}/references/harmonization_knhanes_nhanes.csv (if KNHANES↔NHANES)

Reference Files

${SKILL_DIR}/references/methodology_extraction_template.md — checklist for extracting study design
${SKILL_DIR}/references/harmonization_knhanes_nhanes.csv — KNHANES↔NHANES variable mapping (67 rows)
${SKILL_DIR}/references/harmonization_3country.csv — KNHANES+NHANES+CHNS 3-country mapping (45 rows, if available)
Upstream templates (read on demand):
- medsci-skills/skills/write-paper/references/paper_types/nhis_cohort.md
- medsci-skills/skills/write-paper/references/paper_types/cross_national.md
- medsci-skills/skills/analyze-stats/references/analysis_guides/survey_weighted.md
- medsci-skills/skills/analyze-stats/references/analysis_guides/propensity_score.md

Workflow

Phase 1: Source Paper Analysis

Read the source paper (PDF → text, or markdown).
Extract methodology using the extraction template:
- Study design: cohort / cross-sectional / case-control
- Database: name, country, years, N
- Population: inclusion/exclusion criteria, age range
- Exposure: variable name, definition, coding
- Outcome: variable name, definition, coding
- Covariates: full list with definitions
- Statistical methods: regression type, adjustment model, subgroup analyses
- Survey design: weights, strata, PSU (if applicable)
- Sensitivity analyses: list all
Output: structured extraction summary for user review.
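
The extraction summary can be held in a small typed structure so that empty fields are easy to spot before the user reviews it. The following is a minimal sketch, not the skill's own implementation: the class names `VariableSpec` and `ExtractionSummary`, the `missing_fields` helper, and all example values are hypothetical placeholders.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class VariableSpec:
    """One exposure, outcome, or covariate as described in the source paper."""
    name: str
    definition: str
    coding: str = ""


@dataclass
class ExtractionSummary:
    """Phase 1 fields, mirroring the extraction checklist (hypothetical layout)."""
    study_design: str                 # cohort / cross-sectional / case-control
    database: str                     # name, country, years, N
    population: str                   # inclusion/exclusion criteria, age range
    exposure: VariableSpec
    outcome: VariableSpec
    covariates: list = field(default_factory=list)
    statistical_methods: str = ""     # regression type, adjustment model, subgroups
    survey_design: str = ""           # weights, strata, PSU (if applicable)
    sensitivity_analyses: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of fields that are still empty."""
        return [k for k, v in asdict(self).items() if v in ("", [], None)]


# Placeholder values only; real values must come from the source paper.
summary = ExtractionSummary(
    study_design="cross-sectional",
    database="(placeholder) source database",
    population="(placeholder) inclusion/exclusion criteria",
    exposure=VariableSpec("exposure_x", "(placeholder) definition"),
    outcome=VariableSpec("outcome_y", "(placeholder) definition"),
)
print(json.dumps(asdict(summary), indent=2))
print("Still to extract:", summary.missing_fields())
```

Listing the empty fields explicitly gives the user a concrete set of items to confirm rather than a summary that silently omits them.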

Phase 2: Variable Mapping

Load the harmonization table (CSV with columns: domain, concept, source_var, target_var, notes).
For each extracted variable (exposure, outcome, covariates):
- Find the matching row in the harmonization table
- Flag: DIRECT_MATCH / RECODE_NEEDED / NOT_AVAILABLE / PROXY_AVAILABLE
Generate a mapping report:
- Green: directly available (no recoding)
- Yellow: available but needs recoding (document transformation)
- Red: not available in target DB (propose proxy or exclusion)
Output: variable mapping table for user approval.
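
The lookup-and-flag step can be sketched with the standard `csv` module. This is an illustration under stated assumptions: the rule that derives a status from the `target_var` and `notes` columns (empty target means NOT_AVAILABLE, a note mentioning "proxy" or "recode" selects the matching flag) is hypothetical, as are the variable names and rows in the inline table. Variables absent from the table are marked with the `[VERIFY: ...]` convention instead of being guessed.

```python
import csv
import io

# Hypothetical rows for illustration; the real tables live under references/.
HARMONIZATION_CSV = """domain,concept,source_var,target_var,notes
demographics,age,src_age,tgt_age,
anthropometry,bmi_category,src_bmi_cat,tgt_bmi,recode: derive categories from continuous BMI
lifestyle,physical_activity,src_pa,tgt_walk,proxy: walking days only
lab,hba1c,src_hba1c,,
"""


def classify(row):
    """Assign a match status from one harmonization row (assumed rule)."""
    target = (row.get("target_var") or "").strip()
    notes = (row.get("notes") or "").lower()
    if not target:
        return "NOT_AVAILABLE"
    if "proxy" in notes:
        return "PROXY_AVAILABLE"
    if "recode" in notes:
        return "RECODE_NEEDED"
    return "DIRECT_MATCH"


def map_variables(extracted, table_text):
    """Look up each extracted source variable and attach a status."""
    rows = {r["source_var"]: r for r in csv.DictReader(io.StringIO(table_text))}
    report = []
    for var in extracted:
        row = rows.get(var)
        if row is None:
            # Not in the table: never invent a target name, ask the user.
            report.append({"source_var": var, "target_var": "",
                           "status": f"[VERIFY: {var}]"})
        else:
            report.append({"source_var": var, "target_var": row["target_var"],
                           "status": classify(row)})
    return report


mapping = map_variables(
    ["src_age", "src_bmi_cat", "src_pa", "src_hba1c", "src_income"],
    HARMONIZATION_CSV,
)
for entry in mapping:
    print(entry)
```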

Phase 3: Code Generation

Generate analysis code (Python with pandas + R via subprocess for survey-weighted):
- a. Data loading & cleaning: read target DB, apply inclusion/exclusion
- b. Variable derivation: recode variables per mapping table
- c. Survey design setup: define svydesign object (strata, PSU, weights)
- d. Table 1: demographics by exposure group (weighted)
- e. Main analysis: replicate the primary model (logistic/Cox/linear regression)
- f. Subgroup analyses: if specified in source paper
- g. Sensitivity analyses: replicate all listed in source paper
Use /analyze-stats templates where available (survey_weighted, propensity_score).
All code must be self-contained and reproducible.
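
One way to wire Python to R for the survey-weighted step is to render an R script from a template and hand it to `Rscript`. The sketch below assumes the R `survey` package is installed and uses its `svydesign` and `svyglm` functions; every column name, file path, and helper name (`build_r_script`, `run_r_script`) is a hypothetical placeholder, and the logistic model is only one of the model types a source paper might use.

```python
import shutil
import subprocess
from pathlib import Path

# R template: survey design setup plus a weighted logistic model.
R_TEMPLATE = """\
library(survey)
df <- read.csv("{data_path}")
design <- svydesign(ids = ~{psu}, strata = ~{strata}, weights = ~{weight},
                    data = df, nest = TRUE)
fit <- svyglm({outcome} ~ {rhs}, design = design, family = quasibinomial())
write.csv(summary(fit)$coefficients, "{out_path}")
"""


def build_r_script(data_path, psu, strata, weight, outcome, exposure,
                   covariates, out_path):
    """Fill the template; all names must come from the approved mapping table."""
    rhs = " + ".join([exposure, *covariates])
    return R_TEMPLATE.format(data_path=data_path, psu=psu, strata=strata,
                             weight=weight, outcome=outcome, rhs=rhs,
                             out_path=out_path)


def run_r_script(script_text, script_path="analysis_code.R"):
    """Write the script and run it if Rscript is on PATH; return None otherwise."""
    Path(script_path).write_text(script_text, encoding="utf-8")
    if shutil.which("Rscript") is None:
        return None
    return subprocess.run(["Rscript", script_path], check=True,
                          capture_output=True, text=True)


# Hypothetical column names, shown only to illustrate the rendered script.
script = build_r_script(
    data_path="target.csv", psu="psu_id", strata="stratum", weight="wt",
    outcome="outcome_y", exposure="exposure_x", covariates=["age", "sex"],
    out_path="main_results.csv",
)
print(script)
```

Rendering the script as a file keeps the R code inspectable and rerunnable on its own, which fits the self-contained and reproducible requirement.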

Phase 4: Difference Report

Generate a structured difference report documenting:

Section	Content
Study Design	Same / Modified (explain)
Database	Source DB → Target DB (N, years, country)
Population	Inclusion/exclusion differences
Variable Mapping	Full mapping table with match status
Unavailable Variables	What's missing and how handled
Methodological Differences	Any forced changes (e.g., BMI cutoffs, LDL calculation)
Expected Differences	Why results may differ (population, measurement, cultural)

Save as replication_report.md in the working directory.
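
The report can be rendered from a dictionary keyed by the section names in the table above. This is a minimal sketch with a hypothetical `render_report` helper; refusing to render when a section is empty is an assumed design choice so that no deviation goes undocumented, and the example content is placeholder text.

```python
SECTIONS = [
    "Study Design",
    "Database",
    "Population",
    "Variable Mapping",
    "Unavailable Variables",
    "Methodological Differences",
    "Expected Differences",
]


def render_report(content):
    """Render the difference report as a Markdown table, one row per section."""
    missing = [s for s in SECTIONS if not content.get(s, "").strip()]
    if missing:
        raise ValueError(f"Sections not documented: {missing}")
    lines = ["# Replication Difference Report", "",
             "| Section | Content |", "| --- | --- |"]
    for section in SECTIONS:
        # Escape pipes and flatten newlines so each section stays in one row.
        cell = content[section].replace("|", "\\|").replace("\n", " ")
        lines.append(f"| {section} | {cell} |")
    return "\n".join(lines) + "\n"


# Placeholder content; real text must describe the actual replication.
content = {s: "(to be documented)" for s in SECTIONS}
content["Study Design"] = "Same"
report_text = render_report(content)
print(report_text)
```

Writing `report_text` to `replication_report.md` in the working directory completes the phase.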

Phase 5: Validation Checklist

Before reporting completion, verify:

All source paper covariates accounted for (mapped, proxied, or documented as missing)
Survey weights correctly applied (NEVER analyze unweighted if source used weights)
Obesity/BMI cutoffs match target population standards (Asian vs WHO)
Fasting requirements matched (fasting glucose, lipids)
Age restrictions applied correctly
Code runs without errors on target data
Output tables match source paper structure
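
Two of these checks lend themselves to automation: covariate accounting and survey design completeness. The sketch below is illustrative only. The function names are hypothetical, the status labels reuse the Phase 2 flags, and it assumes that NOT_AVAILABLE counts as "documented as missing" while a `[VERIFY: ...]` marker or an absent entry does not.

```python
HANDLED = {"DIRECT_MATCH", "RECODE_NEEDED", "PROXY_AVAILABLE", "NOT_AVAILABLE"}


def unaccounted_covariates(source_covariates, mapping_rows):
    """Return source covariates that have no resolved status in the mapping."""
    status_by_var = {row["source_var"]: row["status"] for row in mapping_rows}
    return [c for c in source_covariates if status_by_var.get(c) not in HANDLED]


def missing_design_fields(source_used_weights, design):
    """If the source used weights, every design element must be specified."""
    if not source_used_weights:
        return []
    return [k for k in ("weights", "strata", "psu") if not design.get(k)]


# Hypothetical inputs for illustration.
mapping_rows = [
    {"source_var": "age", "status": "DIRECT_MATCH"},
    {"source_var": "income", "status": "[VERIFY: income]"},
]
open_covariates = unaccounted_covariates(["age", "income", "smoking"], mapping_rows)
open_design = missing_design_fields(True, {"weights": "wt", "strata": "", "psu": "psu_id"})
print("Unaccounted covariates:", open_covariates)
print("Missing design fields:", open_design)
```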

Critical Rules

Never pool data across surveys. Analyze each country's data with its own survey design.
Document every deviation from the source methodology in the difference report.
Use Asian BMI cutoffs (≥25 for obesity) when analyzing Korean data, even if the source used WHO cutoffs (≥30).
LDL calculation: note if source used direct measurement vs Friedewald.
Weighted analysis is mandatory for KNHANES/NHANES — never run unweighted models.
IRB: note that KNHANES/NHANES are de-identified public data (IRB exempt or waived).
Outdated source definitions: if the source paper used a pre-2023 definition that has since been superseded (e.g., NAFLD → MASLD 2023, CKD-EPI 2009 → 2021 race-free), call /define-variables to cross-check whether to mirror the legacy definition (pure replication) or upgrade to current (extension). Document the choice explicitly in the difference report.
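
The BMI and LDL rules can be made concrete with two small helpers. This is a sketch under stated assumptions: the obesity thresholds (≥25 Asian, ≥30 WHO) come from the rule above, while the overweight thresholds (23 for Asian, 25 for WHO) and the Friedewald validity limit (triglycerides below 400 mg/dL) are the commonly cited values and should be checked against the guideline the target population uses. The function names are hypothetical.

```python
def bmi_category(bmi, standard):
    """Classify BMI (kg/m^2) under the 'asian' or 'who' cutoff standard."""
    if standard == "asian":
        cuts = (18.5, 23.0, 25.0)   # assumed Asia-Pacific thresholds
    elif standard == "who":
        cuts = (18.5, 25.0, 30.0)
    else:
        raise ValueError(f"Unknown standard: {standard!r}")
    labels = ("underweight", "normal", "overweight", "obese")
    for cut, label in zip(cuts, labels):
        if bmi < cut:
            return label
    return labels[-1]


def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL-C in mg/dL as TC - HDL - TG/5.

    Returns None when triglycerides are 400 mg/dL or higher, where the
    estimate is not considered valid.
    """
    if triglycerides >= 400:
        return None
    return total_chol - hdl - triglycerides / 5


print(bmi_category(26.0, "asian"))    # obese
print(bmi_category(26.0, "who"))      # overweight
print(friedewald_ldl(200, 50, 150))   # 120.0
```

The same BMI of 26 falls into different categories under the two standards, which is why the chosen cutoff belongs in the difference report.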

Output Files

{working_dir}/
├── replication_report.md     — Structured difference report
├── variable_mapping.csv      — Variable mapping table with match status
├── analysis_code.py          — Main analysis script (Python + R calls)
├── analysis_code.R           — R script for survey-weighted analysis
└── results/
    ├── table1.csv            — Demographics table
    ├── main_results.csv      — Primary analysis results
    └── subgroup_results.csv  — Subgroup analysis results (if applicable)

Example Invocation

/replicate-study

Source paper: Joo 2026 (Psychiatry Research) — depression/diabetes cross-national
Target DB: /path/to/knhanes/HN18.csv
Harmonization: /path/to/harmonization_knhanes_nhanes.csv

Anti-Hallucination

Never fabricate variable names, dataset column names, or variable codings. If a variable mapping is uncertain, output [VERIFY: variable_name] and ask the user to confirm against the data dictionary.
Never fabricate statistical results — no invented p-values, effect sizes, confidence intervals, or sample sizes. All numbers must come from executed code output.
Never generate references from memory. Use /search-lit for all citations.
If a function, package, or API does not exist or you are unsure, say so explicitly rather than guessing.
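
The `[VERIFY: variable_name]` convention can be enforced mechanically by checking every variable the generated code refers to against the actual file header. The helpers below are a hypothetical sketch, and the column names in the example are made up for illustration.

```python
import csv
import io


def read_header(csv_text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def verify_columns(requested, header):
    """Map each requested name to itself if present, else to a VERIFY marker."""
    known = set(header)
    return {name: (name if name in known else f"[VERIFY: {name}]")
            for name in requested}


# Made-up header standing in for the first line of the target data file.
header = read_header("id,age,sex,wt\n1,40,1,1.5\n")
checked = verify_columns(["age", "sex", "bmi"], header)
print(checked)
```

Any value that comes back as a marker should be raised with the user against the data dictionary before code generation continues.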

