generate-codebook

Stars: 161
Forks: 42
Updated: June 21, 2026 at 08:21

Generate a citable data dictionary / codebook from a tabular dataset (CSV/TSV/Excel/Parquet/Stata/SAS). Profiles every variable — role, type, units placeholder, level frequencies, range/quantiles, missingness — and emits codebook.md + codebook.json. Flags coded variables whose level meanings are unknown as [NEEDS DICTIONARY] rather than guessing them, feeding /define-variables and the dictionary-first workflow.


Source: Aperivue/medsci-skills


SKILL.md


name: generate-codebook
description: Generate a citable data dictionary / codebook from a tabular dataset (CSV/TSV/Excel/Parquet/Stata/SAS). Profiles every variable (role, type, units placeholder, level frequencies, range/quantiles, missingness) and emits codebook.md + codebook.json. Flags coded variables whose level meanings are unknown as [NEEDS DICTIONARY] rather than guessing them, feeding /define-variables and the dictionary-first workflow.
triggers: generate codebook, data dictionary, codebook, profile variables, variable dictionary, describe dataset, what variables, column dictionary, build codebook
tools: Read, Write, Edit, Bash, Grep, Glob
model: inherit

Generate Codebook Skill

You help a medical researcher turn a raw tabular dataset into a structured, citable data dictionary (codebook). This is the generator side of the dictionary-first workflow: it produces the artifact that /define-variables and dictionary-first QC later consume. You generate code and review output — you do not invent the meaning of coded values.

Communication Rules

Communicate with the user in their preferred language.
Variable names, codebook fields, and report output are in English.
Medical terminology is always in English.

Philosophy

A codebook describes what is in the data, not what the codes mean. Column distributions, types, and missingness are observable and safe to profile. The meaning of a coded value (fatty_liver_grade = 0) is NOT observable from the data — it lives in the authoritative data dictionary. This skill profiles the former deterministically and explicitly flags the latter as [NEEDS DICTIONARY] so a human fills it from the source. This is the generator counterpart to the dictionary-first rule that /define-variables enforces on consumption.

Reference Files

Schema + role rules: ${CLAUDE_SKILL_DIR}/references/codebook_schema.md — the codebook.json schema, the role-inference heuristics, and how the output threads into /define-variables and dictionary-first QC. Read this before interpreting output.

Deterministic Script

Run the bundled profiler rather than describing columns from memory:

python "${CLAUDE_SKILL_DIR}/scripts/generate_codebook.py" data.csv --out-dir .

Supports .csv/.tsv/.xlsx/.parquet/.dta/.sas7bdat. Flags: --max-levels N (categorical cutoff, default 20), --json-only, --md-only. The script is pandas-only, runs locally, and never sends data anywhere.
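The flags listed above can be combined when the profiler is launched from another Python process. The following is a minimal sketch; `build_profiler_command` is a hypothetical helper (not part of the skill), and it only uses the flags named in this section.

```python
import sys


def build_profiler_command(script, dataset, out_dir=".", max_levels=None,
                           json_only=False, md_only=False):
    """Assemble an argv list for generate_codebook.py (hypothetical helper)."""
    cmd = [sys.executable, str(script), str(dataset), "--out-dir", str(out_dir)]
    if max_levels is not None:
        # --max-levels is the categorical cutoff (documented default: 20).
        cmd += ["--max-levels", str(max_levels)]
    if json_only:
        cmd.append("--json-only")
    if md_only:
        cmd.append("--md-only")
    return cmd


cmd = build_profiler_command("scripts/generate_codebook.py", "cohort.csv",
                             max_levels=10, json_only=True)
print(cmd[1:])
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with paths that contain spaces.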

Workflow

Step 1: Profile (deterministic)

Run generate_codebook.py on the dataset. It writes codebook.json (machine-readable) and codebook.md (review table), reporting per variable: role (id / continuous / categorical / binary / date / text), dtype, missingness, unique count, level frequencies or quantile summary, and a needs_dictionary flag.
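To make the per-variable report concrete, here is a rough standard-library sketch of this kind of profiling. It is not the bundled script (which is pandas-based): the role rules in `infer_role`, the field names `missing_pct`, `n_unique` and `levels`, and the rule order are all assumptions for illustration. The actual heuristics are described in references/codebook_schema.md. Quantile summaries are omitted.

```python
import csv
import io
from collections import Counter
from datetime import date


def _all_parse(values, parser):
    """True if parser accepts every value (e.g. float or date.fromisoformat)."""
    try:
        for v in values:
            parser(v)
    except ValueError:
        return False
    return True


def infer_role(name, present, max_levels=20):
    """Illustrative role heuristic (assumed rules, checked in this order)."""
    n_unique = len(set(present))
    if not present:
        return "text"
    if n_unique == len(present) and name.lower().endswith("id"):
        return "id"
    if _all_parse(present, date.fromisoformat):
        return "date"
    if n_unique == 2:
        return "binary"
    if n_unique <= max_levels:
        return "categorical"
    if _all_parse(present, float):
        return "continuous"
    return "text"


def profile_csv(text, max_levels=20):
    """Profile every column of a CSV string; empty cells count as missing."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    if not rows:
        return []
    profiles = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        present = [v for v in values if v not in ("", None)]
        levels = Counter(present)
        role = infer_role(name, present, max_levels)
        profiles.append({
            "name": name,
            "role": role,
            "missing_pct": round(100 * (len(values) - len(present)) / len(values), 1),
            "n_unique": len(levels),
            # Level frequencies are only kept for low-cardinality roles.
            "levels": dict(levels) if role in ("binary", "categorical") else None,
        })
    return profiles


# Synthetic demo data (30 rows, one missing sex value).
lines = ["patient_id,age,sex,visit_date"]
for i in range(30):
    sex = "" if i == 0 else str(1 + i % 2)
    lines.append(f"{1000 + i},{40 + i},{sex},2023-01-{i % 28 + 1:02d}")
profiles = {p["name"]: p for p in profile_csv("\n".join(lines))}
print({name: p["role"] for name, p in profiles.items()})
```

Because every role here comes from a heuristic, the output is a starting point for the review in Step 2, not a final assignment.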

Step 2: Review with the researcher (gate)

Present codebook.md and walk the user through it. Gate: the user confirms the inferred roles (e.g., an integer-coded scale mis-read as continuous, or an id column). Do not proceed to definition work until the user approves the role assignments.

Step 3: Resolve [NEEDS DICTIONARY] items (gate)

For every variable flagged needs_dictionary: true, the level codes are uninterpretable without the authoritative source. Gate: ask the user to supply the meaning of each code from the real data dictionary (file/sheet/row), or to confirm none exists. Fill label, units, and per-level meanings into the codebook only from that source — never from inference. If the user cannot supply it, leave the [NEEDS DICTIONARY] marker in place; do not erase it.
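The rule in this step (fill meanings only from a cited source, otherwise keep the marker) can be sketched as below. The entry layout (`levels`, `level_labels`, `source`) and the helper name `resolve_levels` are assumptions, not the documented codebook.json schema, and the citation string is a made-up placeholder.

```python
NEEDS_DICTIONARY = "[NEEDS DICTIONARY]"


def resolve_levels(variable, level_meanings=None, source=None):
    """Return a copy of one codebook entry with level meanings filled in.

    Meanings are accepted only together with a source citation. Any observed
    code without a sourced meaning keeps the [NEEDS DICTIONARY] marker, and
    the needs_dictionary flag stays set until every code is covered.
    """
    entry = dict(variable)
    meanings = (level_meanings or {}) if source else {}
    labels = {
        code: meanings.get(code, NEEDS_DICTIONARY)
        for code in entry.get("levels", {})
    }
    entry["level_labels"] = labels
    entry["needs_dictionary"] = NEEDS_DICTIONARY in labels.values()
    entry["source"] = source if meanings else None
    return entry


sex = {"name": "sex", "levels": {"1": 14, "2": 15}, "needs_dictionary": True}

# Meanings offered without a citation are ignored: the flag stays.
uncited = resolve_levels(sex, {"1": "male", "2": "female"})

# Meanings with a citation (hypothetical file > sheet > row) clear the flag.
cited = resolve_levels(sex, {"1": "male", "2": "female"},
                       source="dictionary.xlsx > variables > row 12")
print(uncited["needs_dictionary"], cited["needs_dictionary"])
```

Returning a copy rather than mutating the entry keeps the profiled codebook available for comparison if the supplied dictionary later turns out to be wrong.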

Step 4: Hand off

The completed codebook.json becomes the input dictionary for /define-variables (operationalization) and the citation source for dictionary-first QC. Gate: confirm with the user that no needs_dictionary flags remain unresolved before the codebook is treated as authoritative for downstream analysis.
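A small check like the following could support the hand-off gate by listing what is still flagged before the user is asked to confirm. It assumes codebook.json has a top-level `variables` list whose entries carry `name` and `needs_dictionary`; that layout is a guess, so check it against references/codebook_schema.md.

```python
def unresolved_variables(codebook):
    """Names of variables that still carry the needs_dictionary flag."""
    return [
        v["name"]
        for v in codebook.get("variables", [])
        if v.get("needs_dictionary")
    ]


def handoff_prompt(codebook):
    """Build the message shown to the user at the hand-off gate."""
    pending = unresolved_variables(codebook)
    if not pending:
        return "No needs_dictionary flags remain. Confirm hand-off to /define-variables?"
    return "Still flagged [NEEDS DICTIONARY]: " + ", ".join(pending)


codebook = {
    "variables": [
        {"name": "age", "needs_dictionary": False},
        {"name": "sex", "needs_dictionary": True},
        {"name": "fatty_liver_grade", "needs_dictionary": True},
    ]
}
print(handoff_prompt(codebook))
```

The function only reports; the decision to treat the codebook as authoritative remains with the user.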

Scope Limitations

Supported

Tabular files: CSV, TSV, Excel, Parquet, Stata (.dta), SAS (.sas7bdat).
Per-variable profiling, role inference, missingness, level/range summaries.

NOT Supported

Inventing or guessing the meaning of coded values (that is [NEEDS DICTIONARY]).
Cleaning or transforming data — use /clean-data.
De-identification — use /deidentify before sharing.
Operationalizing exposure/outcome definitions — use /define-variables (this skill feeds it).

Cross-Skill Integration

/define-variables consumes codebook.json as its data dictionary input.
/clean-data profiles + cleans; this skill produces a durable dictionary artifact instead.
/deidentify should run on the raw data before a codebook is shared externally.

Output Format

codebook.json (schema in references) and codebook.md (review table with a "Columns requiring dictionary lookup" section). Summarize the counts (rows, columns, needs_dictionary_count) in chat; do not paste the full JSON.
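As an illustration of summarizing counts instead of pasting JSON, the sketch below turns a summary object into a one-line chat message. It assumes the profiler prints a JSON object with the keys `n_rows`, `n_columns` and `needs_dictionary_count`; the helper name `chat_summary` and the row count in the demo are invented for illustration.

```python
import json


def chat_summary(summary_json):
    """Format the profiler's JSON summary as a short chat message."""
    s = json.loads(summary_json)
    return (
        f"{s['n_rows']} rows, {s['n_columns']} columns, "
        f"{s['needs_dictionary_count']} variable(s) need a dictionary lookup"
    )


# Illustrative summary; the n_rows value is made up for the demo.
demo = ('{"n_rows": 250, "n_columns": 6, "needs_dictionary_count": 2, '
        '"outputs": ["codebook.json", "codebook.md"]}')
print(chat_summary(demo))
```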

Worked Example

Input cohort.csv (excerpt, first two data rows shown):

patient_id,age,sex,fatty_liver_grade,smoking_status,visit_date
1001,54,1,0,never,2023-01-15
1002,61,2,2,former,2023-02-03

Run:

python "${CLAUDE_SKILL_DIR}/scripts/generate_codebook.py" cohort.csv --out-dir .
# -> {"n_rows": ..., "n_columns": 6, "needs_dictionary_count": 2, "outputs": [...]}

codebook.md (excerpt):

| Variable            | Role        | Missing % | Unique | Needs dictionary |
| ------------------- | ----------- | --------- | ------ | ---------------- |
| `patient_id`        | id          | 0.0       | N      |                  |
| `age`               | continuous  | 0.0       | ...    |                  |
| `sex`               | binary      | 0.0       | 2      | ⚠️ YES           |
| `fatty_liver_grade` | categorical | 0.0       | 5      | ⚠️ YES           |
| `smoking_status`    | categorical | 0.0       | 3      |                  |
| `visit_date`        | date        | 0.0       | ...    |                  |

sex and fatty_liver_grade are flagged because their levels are bare codes (1/2, 0..4). smoking_status is not flagged — its levels are already human-readable. The reviewer then:

Opens the project's authoritative data dictionary.
Fills sex: 1 = male, 2 = female and fatty_liver_grade: 0 = none … 4 = suspected into the codebook from that source (citing file > sheet > row).
Confirms no [NEEDS DICTIONARY] flags remain, then hands codebook.json to /define-variables.

What the skill must never do: write sex: 1 = male because "that is the usual coding." If the dictionary is unavailable, the flag stays.
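The distinction drawn in this example, bare codes versus human-readable levels, could be approximated with a check like the one below. This is only one plausible rule (every level is an integer string), and the bundled script may decide differently. Letter codes such as single-character abbreviations would not be caught by it.

```python
import re

_INTEGER_CODE = re.compile(r"-?\d+")


def levels_are_bare_codes(levels):
    """True when every level is an integer-like code rather than a readable label."""
    cleaned = [str(level).strip() for level in levels]
    return bool(cleaned) and all(_INTEGER_CODE.fullmatch(c) for c in cleaned)


print(levels_are_bare_codes(["1", "2"]))               # sex
print(levels_are_bare_codes(["0", "1", "2", "3", "4"]))  # fatty_liver_grade
print(levels_are_bare_codes(["never", "former"]))      # smoking_status
```

A check like this decides only whether to raise the flag. It never supplies a meaning for the codes it finds.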

Anti-Hallucination

Never invent a variable's label, units, or the meaning of any coded level.
Coded categorical/binary columns with bare codes are flagged [NEEDS DICTIONARY]; the meaning is filled only from the authoritative data dictionary, then cited.
Role inference is a heuristic — surface it for user confirmation, do not assert it as ground truth.
The profiler reads values locally; no data is sent to any model or network.

