describe-concept-set
// Generate a detailed clinical description for an INDICATE concept set, using UMLS, LOINC, and SNOMED vocabulary sources. Use when the user wants to describe or document a concept set.
| name | describe-concept-set |
| description | Generate a detailed clinical description for an INDICATE concept set, using UMLS, LOINC, and SNOMED vocabulary sources. Use when the user wants to describe or document a concept set. |
| allowed-tools | Bash, Read, Write, Glob, Grep, WebFetch, WebSearch, AskUserQuestion, TodoWrite |
| argument-hint | [concept-set-name] |
Generate a detailed clinical description for an INDICATE concept set, using UMLS, LOINC, and SNOMED vocabulary sources.
You are an expert in OHDSI/OMOP vocabularies and clinical terminologies. Your task is to generate a concise description of a concept set that helps data engineers and data scientists, typically without a medical background, understand what the concept set captures and how to align source data to it. The description should give just enough context to make correct mapping decisions — not to teach clinical medicine.
First, read config.local.json at the repo root if it exists. The keys loincPath, snomedPath, umlsPath, and npuCodesPath give terminology source paths the user has already configured — use these silently and do not re-prompt for them.
Then ask the user only for what's still missing:
Concept set name (e.g., "Heart rate", "Mechanical ventilation")
Concept set ID (the numeric ID, e.g., 327)
Terminology source paths — only those not already set in config.local.json. All four are useful; ask the user which of the missing ones they have available and request the path for each. For any source the user does not have, give them the download URL below — they can either download it and re-invoke the skill, or skip it. Sources the user cannot provide are simply omitted from the lookup; the description is written using the available sources only, and the description itself states what was used (LOINC Part description / NPU / etc.) so the reader can see which inputs informed it.
When the user gives you a new path, suggest they save it to config.local.json so they don't have to provide it again.
- loincPath: path to the folder containing the LOINC distribution (must contain LoincTable/Loinc.csv and AccessoryFiles/). Used for COMPONENT / PROPERTY / METHOD decomposition, EXAMPLE_UCUM_UNITS, Part descriptions, and Consumer names.
  Download: https://loinc.org/downloads/ (registration required, free).
- snomedPath: path to the SNOMED CT RF2 release ZIP or extracted folder (must contain sct2_Description_Snapshot-en_*.txt and sct2_TextDefinition_Snapshot-en_*.txt). Used for FSN, synonyms, and text definitions of SNOMED concepts.
  Download: https://www.nlm.nih.gov/healthit/snomedct/international.html (UMLS licence required, free).
- umlsPath: path to the folder containing the UMLS Metathesaurus RRF files (MRCONSO.RRF, MRDEF.RRF, MRSTY.RRF, MRREL.RRF). Used as a fallback for clinical definitions (MeSH, NCI, etc.) when LOINC Part descriptions are thin.
  Download: UMLS Metathesaurus Full Subset from https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html (UMLS licence required, free).
- npuCodesPath: full path to the NPU codes CSV file (typically named npu-codes-latest.csv). Used as the authoritative source for the SI/IFCC-recommended unit of laboratory measurands (see Step 3f).
  Download: https://npu-terminology.org/npu-database/.

If none of the four are available, the description will rely solely on the concept names and any web information found in Step 3d. Flag this clearly to the user before proceeding so they can decide whether to download the missing sources first.
Language for the output description (default: English)
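The configuration check at the start of this step can be sketched as follows. This is a minimal illustration, assuming config.local.json is a flat JSON object holding the four keys named above; the helper names `load_config` and `missing_sources` are hypothetical and not part of the skill.

```python
import json
from pathlib import Path

# The four keys named in this skill; the helper names below are illustrative.
SOURCE_KEYS = ("loincPath", "snomedPath", "umlsPath", "npuCodesPath")


def load_config(repo_root):
    """Return the parsed config.local.json, or {} when the file is absent."""
    config_file = Path(repo_root) / "config.local.json"
    if not config_file.is_file():
        return {}
    return json.loads(config_file.read_text(encoding="utf-8"))


def missing_sources(config):
    """List the source keys that are unset or empty, in a stable order."""
    return [key for key in SOURCE_KEYS if not config.get(key)]
```

Only the keys returned by `missing_sources` would need to be asked for; the others are used silently.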
The concept set files live in the repository:
- concept_sets/{id}.json
- concept_sets_resolved/{id}.json

Fetch both JSON files:
- the concept set definition (with its isExcluded, includeDescendants, includeMapped flags)

Parse these to identify:
- included standard concepts (standardConcept: "S")
- excluded concepts (isExcluded: true)

For each vocabulary source, search for definitions and metadata.
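The parsing step can be sketched as below. The exact layout of the two JSON files is not shown here, so this sketch assumes the relevant entries have already been pulled out as lists of dictionaries carrying the `isExcluded` and `standardConcept` keys; the function names are hypothetical.

```python
def split_by_exclusion(items):
    """Split definition items into (included, excluded) using isExcluded."""
    included = [item for item in items if not item.get("isExcluded", False)]
    excluded = [item for item in items if item.get("isExcluded", False)]
    return included, excluded


def standard_only(concepts):
    """Keep only the concepts flagged as standard (standardConcept == "S")."""
    return [c for c in concepts if c.get("standardConcept") == "S"]
```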
Priority order for the analyte definition (used to write the "Definition & Clinical Context" paragraph):
- Fetch https://loinc.org/{concept_code}/ for any of the standard concepts in the set; the "Part description" section gives a short, authoritative definition of the analyte. Prefer this when available: it is the most concise and well-targeted source for a data engineer.

Find the parent CUI: search MRCONSO.RRF for the general concept (not individual LOINC codes, which rarely have definitions):
grep "|ENG|" MRCONSO.RRF | grep "|MSH|" | grep -i "concept_name"
Get definitions from MRDEF.RRF — Search by CUI for definitions from these sources (in priority order):
grep "^CUI_HERE|" MRDEF.RRF
Get semantic type from MRSTY.RRF:
grep "^CUI_HERE|" MRSTY.RRF
Get MeSH hierarchy from MRREL.RRF — find parent/child MeSH concepts:
grep "^CUI_HERE|" MRREL.RRF | grep "|MSH|"
Get synonyms from MRCONSO.RRF — find all English terms for the CUI:
grep "^CUI_HERE|" MRCONSO.RRF | grep "|ENG|"
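The grep commands above match a pattern anywhere on the line. Where an exact match on a specific column is needed, a field-based reader is one alternative. The sketch below assumes the standard pipe-delimited MRCONSO.RRF column order; the function names and the sample rows are made up for illustration.

```python
# MRCONSO.RRF column order (each line also ends with a trailing "|").
MRCONSO_FIELDS = (
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
)


def parse_mrconso(line):
    """Map one MRCONSO.RRF line onto its named columns."""
    values = line.rstrip("\n").split("|")
    return dict(zip(MRCONSO_FIELDS, values))


def english_terms(lines, cui):
    """Return (source vocabulary, term) pairs for the English rows of a CUI."""
    terms = []
    for line in lines:
        row = parse_mrconso(line)
        if row.get("CUI") == cui and row.get("LAT") == "ENG":
            terms.append((row["SAB"], row["STR"]))
    return terms


# Made-up rows in MRCONSO layout, for illustration only.
SAMPLE_MRCONSO = [
    "C0000001|ENG|P|L1|PF|S1|Y|A1||M1|D1|MSH|MH|D1|Heart Rate|0|N|256|",
    "C0000001|FRE|P|L2|PF|S2|Y|A2||M1|D1|MSHFRE|MH|D1|Rythme cardiaque|3|N||",
    "C0000002|ENG|P|L3|PF|S3|Y|A3||||NCI|PT|C3|Pulse Rate|0|N||",
]
```

With the sample rows, `english_terms(SAMPLE_MRCONSO, "C0000001")` keeps only the English MSH row.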
Search LoincTable/Loinc.csv for each LOINC code to get:
Also check:
- AccessoryFiles/ConsumerName/ConsumerName.csv for simplified names
- AccessoryFiles/PartFile/Part.csv for component definitions

If the concept set contains SNOMED concepts, extract from the RF2 release (the filename includes a release date, e.g. INT_20251101):
- sct2_Description_Snapshot-en_*.txt: FSN and synonyms
- sct2_TextDefinition_Snapshot-en_*.txt: full text definitions (only ~3.6% of concepts have them)

The snomedPath from config.local.json may point to:
- an extracted release folder (containing Snapshot/Terminology/...), or
- a folder that holds the release ZIP (e.g. SNOMED source/SRC/<release>.zip). In that case, locate the ZIP first with find <snomedPath> -name '*.zip' | head -1.

# If the release is still a ZIP:
unzip -p "<path-to-snomed-zip>" "*/sct2_Description_Snapshot-en*" | grep "<SNOMED_ID>"
unzip -p "<path-to-snomed-zip>" "*/sct2_TextDefinition_Snapshot-en*" | grep "<SNOMED_ID>"
# If already extracted, grep the txt files directly inside the Snapshot/Terminology subfolder.
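A grep on a SNOMED ID also returns rows where the number appears in another column or as part of a longer identifier. The sketch below filters the description file by column instead. It assumes the standard tab-delimited RF2 description layout and the standard typeId values for fully specified names and synonyms; the function name and the sample rows are illustrative.

```python
import csv

# RF2 description type identifiers.
FSN_TYPE_ID = "900000000000003001"
SYNONYM_TYPE_ID = "900000000000013009"

RF2_HEADER = "\t".join([
    "id", "effectiveTime", "active", "moduleId", "conceptId",
    "languageCode", "typeId", "term", "caseSignificanceId",
])


def describe_concept(rf2_lines, concept_id):
    """Collect the active FSN and synonyms for one concept from RF2 lines."""
    # RF2 files are not quoted, so quote characters are read literally.
    reader = csv.DictReader(rf2_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    result = {"fsn": None, "synonyms": []}
    for row in reader:
        if row["conceptId"] != concept_id or row["active"] != "1":
            continue
        if row["typeId"] == FSN_TYPE_ID:
            result["fsn"] = row["term"]
        elif row["typeId"] == SYNONYM_TYPE_ID:
            result["synonyms"].append(row["term"])
    return result


# Made-up rows in RF2 layout, for illustration only.
SAMPLE_RF2 = [
    RF2_HEADER,
    "\t".join(["1", "20020131", "1", "m", "364075005", "en", FSN_TYPE_ID,
               "Heart rate (observable entity)", "c"]),
    "\t".join(["2", "20020131", "1", "m", "364075005", "en", SYNONYM_TYPE_ID,
               "Heart rate", "c"]),
    "\t".join(["3", "20020131", "0", "m", "364075005", "en", SYNONYM_TYPE_ID,
               "Retired synonym", "c"]),
    "\t".join(["4", "20020131", "1", "m", "3640750051", "en", SYNONYM_TYPE_ID,
               "Different concept", "c"]),
]
```

When reading from the extracted file, the open file object can be passed directly as `rf2_lines`.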
Use web search to find additional clinical information that vocabulary files may not provide:
This is especially useful for:
Cite what you use. If a specific factual claim in the description (e.g. "this is most often measured on venous samples", "this is the SI unit", "blood-gas analysers can measure it via co-oximetry") came from a web source, cite that source in the References section. Do NOT make assertive statements based on general knowledge alone — either find a citable source, soften the wording (e.g. "in practice, varies by lab"), or remove the claim. Present web sources found to the user and let them decide which to include in the final description.
The INDICATE data dictionary maintains two unit files that specify how measurements should be stored:
- units/recommended_units.json: maps each OMOP measurement concept to its recommended unit (fields: conceptId, conceptName, conceptCode, vocabularyId, domainId, recommendedUnitConceptId, recommendedUnitName, recommendedUnitCode, recommendedUnitVocabularyId).
- units/unit_conversions.json: lists conversion factors between units for a given measurement concept. One row = one direction for one concept. Fields: conceptId, conceptName, sourceUnitConceptId, sourceUnitCode, sourceUnitName, conversionFactor, targetUnitConceptId, targetUnitCode, targetUnitName, meaning "1 unit of sourceUnitConceptId for conceptId = conversionFactor units of targetUnitConceptId". Each measurement concept that supports unit conversion has multiple rows (one per pair × direction).

Read these files directly from the repository working tree (units/recommended_units.json, units/unit_conversions.json).
Lookup process:
- Search recommended_units.json for entries where conceptId matches any of the OMOP concept IDs of the included concepts. This gives the recommended unit concept ID for each measurement.
- Search unit_conversions.json for entries where conceptId matches any of the same OMOP concept IDs. This gives the accepted alternative units and their conversion factors.
- For anything these files do not resolve, check Athena (https://athena.ohdsi.org/search-terms/terms/{unitConceptId}) or infer from the LOINC EXAMPLE_UCUM_UNITS field.

# Example: filter recommended_units.json by a list of concept IDs
jq '[.[] | select(.conceptId == 3027018 or .conceptId == 4239408)]' units/recommended_units.json
# Example: filter unit_conversions.json by a list of concept IDs
jq '[.[] | select((.conceptId as $c | [3027018, 4239408] | index($c)))]' units/unit_conversions.json
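To make the row semantics concrete ("1 unit of source = conversionFactor units of target"), here is a minimal sketch of applying a conversion row. The field names come from the file description above; the function names, the concept and unit IDs, and the sample rows are hypothetical and not read from the repository.

```python
def find_factor(rows, concept_id, source_unit_id, target_unit_id):
    """Return the conversion factor for one concept and direction, or None."""
    for row in rows:
        if (row["conceptId"] == concept_id
                and row["sourceUnitConceptId"] == source_unit_id
                and row["targetUnitConceptId"] == target_unit_id):
            return row["conversionFactor"]
    return None


def convert(value, rows, concept_id, source_unit_id, target_unit_id):
    """Convert a value between units using the matching conversion row."""
    if source_unit_id == target_unit_id:
        return value
    factor = find_factor(rows, concept_id, source_unit_id, target_unit_id)
    if factor is None:
        raise LookupError("no conversion row for this concept and direction")
    # 1 source unit equals `factor` target units, so multiply.
    return value * factor


# Hypothetical rows: concept 1001, unit 1 = mg/dL, unit 2 = g/L.
SAMPLE_CONVERSIONS = [
    {"conceptId": 1001, "sourceUnitConceptId": 1, "targetUnitConceptId": 2,
     "conversionFactor": 0.01},
    {"conceptId": 1001, "sourceUnitConceptId": 2, "targetUnitConceptId": 1,
     "conversionFactor": 100.0},
]
```

Because each direction has its own row, the sketch never inverts a factor; a missing direction is reported rather than guessed.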
What to extract:
When the description claims a unit is the "SI unit", "IFCC-recommended", "metrological standard", or similar, the claim must be backed by an authoritative source. The recommended primary source is NPU (Nomenclature for Properties and Units), an IUPAC/IFCC-maintained codification system that defines, for each clinical laboratory measurand, the system (specimen), the kind-of-property, and the standard unit. NPU is broader than JCTLM (it covers concentrations, activities, ratios, qualitative panels) and explicitly publishes both the SI variant and the conventional variant when both are in use (e.g. µkat/L and U/L for enzyme activities).
The NPU codes CSV path is collected in Step 1 (item 3). If the user did not provide it, skip NPU lookup and fall back to LOINC's EXAMPLE_UCUM_UNITS for the unit recommendation; in that case, do not claim "SI unit" / "IFCC-recommended" / "metrological standard" in the description text — those claims require NPU as a citable source.
Each row of the CSV contains, among others: npu_code, system (e.g. Plasma, Serum, Capillary blood), component (analyte), kind_of_property (e.g. substance concentration, catalytic activity concentration), unit (full name, e.g. micromole per litre, microkatal per litre), unit_short (UCUM-like short form, e.g. µmol/L, µkat/L, U/L), active.
Lookup approach:
- Filter rows with active == 1 and component matching the analyte (e.g. Bilirubins, Gamma-Glutamyltransferase, Alkaline phosphatase). Restrict system to blood-related values (Plasma, Serum, Capillary blood, Blood, Patient(blood)) for "in blood" concept sets.
- Read the unit from the unit_short column. For analytes with both an SI variant and a conventional variant (typical for enzymes), there will be two rows: one with µkat/L, one with U/L. Treat both as accepted units, but pick the one matching what real labs report (usually the conventional variant for enzymes, the SI variant for concentrations).

# Example: list NPU codes for bilirubin in plasma/serum
import csv

# newline="" is the setting the csv module expects for files it reads
with open("<path-to-npu-codes.csv>", encoding="utf-8", newline="") as f:
    for r in csv.DictReader(f):
        # keep active bilirubin rows measured in plasma or serum
        if r["active"] == "1" and r["component"].startswith("Bilirubin") and r["system"] in {"Plasma", "Serum"}:
            print(r["npu_code"], r["short_definition"], r["unit_short"])
For non-laboratory measurements (vital signs, scales, anthropometrics), there is no NPU equivalent — fall back to clinical guidelines or the LOINC EXAMPLE_UCUM_UNITS field.
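Spotting analytes that carry both an SI and a conventional unit can be sketched by grouping the unit_short values per component. This assumes a comma-delimited CSV with the columns listed above; the function name, the NPU codes, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def units_by_component(csv_text, systems):
    """Map each component to the sorted unit_short values of its active rows."""
    units = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["active"] == "1" and row["system"] in systems:
            units[row["component"]].add(row["unit_short"])
    return {component: sorted(found) for component, found in units.items()}


# Made-up rows using the column names described in this step.
SAMPLE_NPU = """npu_code,system,component,kind_of_property,unit,unit_short,active
NPU00001,Plasma,Alkaline phosphatase,catalytic activity concentration,microkatal per litre,µkat/L,1
NPU00002,Plasma,Alkaline phosphatase,catalytic activity concentration,unit per litre,U/L,1
NPU00003,Plasma,Bilirubins,substance concentration,micromole per litre,µmol/L,1
NPU00004,Urine,Bilirubins,substance concentration,micromole per litre,µmol/L,1
NPU00005,Serum,Bilirubins,substance concentration,milligram per litre,mg/L,0
"""
```

A component mapped to more than one unit is a candidate for the "both variants accepted" wording; which one real labs report still has to be decided as described above.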
Before generating the description, show the user what was found:
=== VOCABULARY DATA FOUND ===
UMLS DEFINITIONS:
- [Source]: "Definition text..."
- [Source]: "Definition text..."
SEMANTIC TYPE: T201 - Clinical Attribute
MESH HIERARCHY:
- Parents: ...
- Children: ...
- Related: ...
SYNONYMS: term1, term2, term3
LOINC DECOMPOSITION (for each included standard concept):
| LOINC Code | Long Name | System | Method | Condition | UCUM Units | Class |
|------------|-----------|--------|--------|-----------|------------|-------|
| ... | ... | ... | ... | ... | ... | ... |
SNOMED DATA (if applicable):
- FSN: ...
- Synonyms: ...
- TextDefinition: ...
EXCLUDED CONCEPTS:
| Concept | Reason for exclusion (inferred) |
|---------|-------------------------------|
| ... | ... |
UNITS DATA:
- Recommended unit: [unit name] (concept ID: ...)
- Alternative units with conversions:
| From Unit | To Unit | Conversion Factor |
|-----------|---------|-------------------|
| ... | ... | ... |
- Concepts without recommended unit: [list or "none"]
WEB SOURCES (if found):
- [URL]: key information found
Ask the user if they want to adjust anything before generating the description.
Before generating the description, present the proposed logical grouping of included standard concepts to the user. Group concepts by method, site, condition, or clinical context — NOT alphabetically.
A concept may appear in multiple groups if it logically belongs to more than one category (e.g., a concept specifying both a method and a patient condition should appear under both "By measurement method" and "By clinical condition").
Ask the user if they want to adjust the grouping before proceeding.
Generate a structured description in Markdown and write it to a temporary file (tmp_{concept_set_name}_description.md at the repository root) so the user can preview it easily.
- Cite every concept as concept_id / concept_code — concept_name, followed by the vocabulary name in parentheses if not already implied by context. Example: 3024128 / 1975-2 — Bilirubin.total [Mass/volume] in Serum or Plasma (LOINC). Never write a bare code like 20570-8 or 8554 without its label: the reader may not know the codes by heart, and a bare code forces them to look it up. This rule applies uniformly to LOINC concepts, SNOMED concepts, OMOP unit concepts (e.g. 8554 / % — percent (UCUM)), and any other vocabulary. The vocabulary suffix (LOINC) / (SNOMED) / (UCUM) may be omitted when the surrounding context already makes it obvious (e.g. inside a list of LOINC concepts), but the concept_id, code, and name must all be present.
- Do not cite classification concepts (e.g. LG6199-6, LOINC Hierarchy nodes, or other non-standard classifier concepts). The convention across all concept sets in this repository is that these classification concepts are only used to anchor descendant inclusion. They are never standard concepts and are never mapping targets, so the reader does not need to be told that. Describe the inclusion strategy in plain natural language ("anchored on the two LOINC groups for X, pulling in all their descendants") rather than referencing the classification codes or OHDSI flags.
- Do not write OHDSI flags such as includeDescendants: true, isExcluded: false, standardConcept: "S" in the description body. Translate to plain English ("descendants are included", "excluded from the set", "the standard concepts in this set"). The audience may not know the OHDSI conventions.

Use numbered Vancouver-style references throughout the description:
- Use [1], [2], etc. in the body text to cite sources.
- End the description with a ## References section with this format:

## References
[1] {Author(s)}. {Title}. In: {Book/Journal}. {Publisher}; {Year}.
Available: <a href="{URL}" target="_blank">{URL}</a>
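The reference format above can be sketched as a small formatter. This is an illustration only: the function name `format_reference` is hypothetical, and the URL is HTML-escaped on the assumption that the entry ends up rendered as HTML.

```python
from html import escape


def format_reference(number, citation, url):
    """Render one numbered reference with a link that opens in a new tab."""
    safe_url = escape(url, quote=True)
    return (
        f"[{number}] {citation}\n"
        f'Available: <a href="{safe_url}" target="_blank">{safe_url}</a>'
    )
```

For example, `format_reference(1, "Regenstrief Institute. LOINC code 1975-2 — Bilirubin.total [Mass/volume] in Serum or Plasma.", "https://loinc.org/1975-2/")` yields a two-line entry in the layout shown above.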
Key rules for references:
- Use <a href="..." target="_blank"> for all URLs in the references section, since the description is rendered as HTML/Markdown in a web application.
- Cite LOINC pages as:
  Regenstrief Institute. LOINC code {loinc_code} — {long_common_name}.
  Available: <a href="https://loinc.org/{loinc_code}/" target="_blank">https://loinc.org/{loinc_code}/</a>
- Cite NPU as:
  Nomenclature for Properties and Units (NPU). IFCC/IUPAC. NPU codes (current release).
  Available: <a href="https://npu-terminology.org/npu-database/" target="_blank">https://npu-terminology.org/npu-database/</a>
  Cite NPU once even when multiple NPU codes inform the description; do not link individual code pages.
- Do not cite the internal unit files (recommended_units.json, unit_conversions.json) as references. These are internal data and not authoritative external sources.

## Definition & Clinical Context
[ONE short paragraph, 3-5 sentences maximum, answering only:]
- What is it? (plain-language definition — one sentence)
- What does "measuring it" produce concretely? (what a lab machine or device outputs — one sentence)
- In which clinical setting is it typically collected? (one short sentence)
- What units is it reported in? (one sentence)
DO NOT include: physiopathology, disease mechanisms, normal ranges (unless a range
actually disambiguates concepts in this set — e.g. adult vs neonatal), differential
diagnoses, detailed measurement methods, historical context, or patient-management
advice. If a sentence is not directly useful to a data engineer deciding how to
map source data, remove it.
## Included Concepts
[Brief intro explaining the concept set structure]
### [Group 1 name — e.g., "General heart rate"]
[For each concept:]
- *[OMOP concept_id] / [vocabulary code] — [Long common name] ([vocabulary])*: [1-2 sentence clinical description. No LOINC technical fields.]
### [Group 2 name — e.g., "By measurement method"]
...
[Group concepts logically. A concept may appear in multiple groups if relevant.]
**Note on units**: If all concepts share the same unit, state it once at the set level. Only call out individual units when they differ.
## Excluded Concepts
[Brief intro explaining the exclusion strategy]
[For each excluded concept or group:]
### [Group name]
[Why excluded]
## Mapping Notes
- Which concept to use as default when the source doesn't specify method/site
- Common source system names that map to specific concepts
- Disambiguation tips for similar-sounding concepts within this set (e.g. "serum or plasma" vs "serum, plasma or blood")
- Any gotchas specific to this concept set
- Always cite concepts with both code and concept name (e.g. ``3024128 / 1975-2 — Bilirubin.total [Mass/volume] in Serum or Plasma``)
- IMPORTANT: Never suggest fallback to a specific concept when information is missing — always default to the most general concept
- DO NOT include generic ETL rules that apply to all concept sets (e.g., unit inference, source preservation conventions). These belong in the project-wide mapping recommendations.
## References
[Numbered Vancouver-style references as described above]
Write the generated description to tmp_{concept_set_name}_description.md at the repository root so the user can preview it in their IDE. Present it and ask if they want any adjustments.
The description can then be:
- stored in the longDescription field in the concept set metadata