mic-nutrient-creation
// Skill for creating new nutrient YAML files from MIC website content. Use this when extracting a nutrient from lpi.oregonstate.edu/mic. Also useful for enhancing existing nutrient entries.
| name | mic-nutrient-creation |
| description | Skill for creating new nutrient YAML files from MIC website content. Use this when extracting a nutrient from lpi.oregonstate.edu/mic. Also useful for enhancing existing nutrient entries. |
Guide the creation of new nutrient YAML files in the MIC knowledge base. The MIC website (lpi.oregonstate.edu/mic) serves as the authoritative research source. This skill emphasizes evidence-based extraction with proper ontology grounding.
New nutrient files are created under kb/nutrients/. This skill can also be consulted for ongoing curation of existing nutrients.
Determine the nutrient and its category:
- vitamins/ - Water-soluble and fat-soluble vitamins
- minerals/ - Essential mineral elements
- dietary-factors/ - Other dietary factors (fiber, flavonoids, etc.)
- food-beverages/ - Specific foods or beverages

Check if it already exists:
ls kb/nutrients/**/*.yaml
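When only one nutrient is of interest, a targeted lookup can be easier to read than the full listing. The following is a minimal sketch; `nutrient_exists` is a hypothetical helper (not a repository command) and it assumes the file is named after the nutrient slug.

```shell
# nutrient_exists is a hypothetical helper, not part of the repository.
# Prints the path of any existing file for SLUG under ROOT (default: kb/nutrients)
# and returns non-zero when no such file is found.
nutrient_exists() {
  local slug="$1" root="${2:-kb/nutrients}" hits
  hits="$(find "$root" -type f -name "${slug}.yaml" 2>/dev/null)"
  [ -n "$hits" ] && printf '%s\n' "$hits"
}
```

For example, `nutrient_exists biotin || echo "biotin not yet curated"`.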
Update CURATION-PROGRESS.md to mark the nutrient as in progress:
Change `[ ]` to `[~]` to indicate work has started.

Example: change this line:
- [ ] Folate (`kb/nutrients/vitamins/folate.yaml`)
To:
- [~] Folate (`kb/nutrients/vitamins/folate.yaml`) - in progress
This helps track curation status and prevents duplicate work.
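The marker change can also be scripted. This is a sketch only: `mark_in_progress` is a hypothetical helper, and it assumes each nutrient sits on its own line starting with `- [ ] ` and mentioning the file path, as in the example above.

```shell
# mark_in_progress is a hypothetical helper, not a repository command.
# Prints FILE to stdout with the line that mentions NUTRIENT_PATH switched
# from "- [ ] " to "- [~] " and " - in progress" appended.
mark_in_progress() {
  local file="$1" path="$2"
  awk -v path="$path" '
    # Literal (non-regex) matching, so "/" and "." in the path are safe.
    index($0, "- [ ] ") == 1 && index($0, path) > 0 {
      $0 = "- [~] " substr($0, 7) " - in progress"
    }
    { print }
  ' "$file"
}
```

The helper writes to stdout, so the result can be reviewed before it replaces the original, for example `mark_in_progress CURATION-PROGRESS.md kb/nutrients/vitamins/folate.yaml > progress.tmp && mv progress.tmp CURATION-PROGRESS.md`.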
Download the MIC page using the just fetch-mic-page command:
# Format: just fetch-mic-page {category}/{nutrient}
just fetch-mic-page vitamins/biotin
just fetch-mic-page minerals/calcium
just fetch-mic-page dietary-factors/lipoic-acid
This downloads the HTML to cache/mic-pages/{nutrient}.html.
Verify the download:
ls -la cache/mic-pages/biotin.html
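The cache path follows directly from the argument passed to `just fetch-mic-page`, so the verification can be wrapped in a small check. A minimal sketch; both function names are hypothetical and the path layout is the one stated above.

```shell
# Both helpers are hypothetical; the cache layout is taken from the text above.

# Map "{category}/{nutrient}" to the cached HTML path.
mic_cache_path() {
  # Strip everything up to the last "/" so "vitamins/biotin" becomes "biotin".
  printf 'cache/mic-pages/%s.html\n' "${1##*/}"
}

# Succeeds only when the cached page exists and is non-empty.
page_is_cached() {
  [ -s "$(mic_cache_path "$1")" ]
}
```

For example, `page_is_cached vitamins/biotin || just fetch-mic-page vitamins/biotin` skips the download when a copy is already present.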
IMPORTANT: Before adding evidence, extract the MIC reference number → PMID mapping:
# Get TSV mapping of ref# to PMID
just extract-refs cache/mic-pages/biotin.html
This outputs a TSV with columns: source, reference_number, pubmed_id, citation
Example output:
source reference_number pubmed_id citation
biotin.html 1 Zempleni J, Wijeratne SSK...
biotin.html 2 PMID:10357733 Mock DM. Biotin...
biotin.html 3 PMID:15992684 Zempleni J, Hassan YI...
Key points:
- Empty pubmed_id = book chapter or non-PMID source (use mic_references only)
- Non-empty pubmed_id = can fetch abstract and add PMID evidence
- reference_number corresponds to mic_references values in the YAML

Save the mapping for reference during curation:
just extract-refs cache/mic-pages/biotin.html > cache/refs/biotin-refs.tsv
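Once the mapping is saved, individual lookups can be done with `awk`. The sketch below assumes the file is tab-separated with a header row and the four columns listed above; `pmid_for_ref` and `refs_without_pmid` are hypothetical helper names.

```shell
# Both helpers are hypothetical and assume tab-separated columns with a header row:
# source, reference_number, pubmed_id, citation.

# Print the PMID for one MIC reference number (empty output if it has none).
pmid_for_ref() {
  awk -F '\t' -v ref="$2" 'NR > 1 && $2 == ref { print $3 }' "$1"
}

# List the reference numbers that have no PMID and so stay mic_references-only.
refs_without_pmid() {
  awk -F '\t' 'NR > 1 && $3 == "" { print $2 }' "$1"
}
```

For example, `pmid_for_ref cache/refs/biotin-refs.tsv 2` would print the PMID recorded for reference 2, if any.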
Create kb/nutrients/{category}/{nutrient}.yaml with the basic structure:
name: Biotin
nutrient_term:
  preferred_term: biotin
  term:
    id: CHEBI:15956
    label: biotin
category: vitamin
source_url: https://lpi.oregonstate.edu/mic/vitamins/biotin
alternate_names:
  - Vitamin B7
  - Vitamin H
description: |
  Biotin is a water-soluble B vitamin...
functions: []
deficiency: null
toxicity: null
food_sources: []
drug_interactions: []
nutrient_interactions: []
disease_associations: []
recommendations: null
references: []
Validate the structure:
just validate kb/nutrients/vitamins/biotin.yaml
For each MIC page section, read the content and extract structured data:
Extract biological roles, enzymes, and processes:
functions:
  - name: Cofactor for Carboxylases
    description: |
      Biotin serves as a covalently bound cofactor for five mammalian
      carboxylases that catalyze carbon dioxide transfer reactions.
    biological_processes:
      - preferred_term: fatty acid biosynthesis
        term:
          id: GO:0006633
          label: fatty acid biosynthetic process
    genes:
      - preferred_term: PC
        description: Pyruvate carboxylase
        term:
          id: HGNC:8636
          label: PC
    evidence:
      - reference: PMID:10357733
        supports: SUPPORT
        snippet: "Biotin serves as a covalently bound coenzyme for five mammalian carboxylases"
        explanation: Directly supports biotin's role as carboxylase cofactor
Extract symptoms, at-risk groups, and causes:
deficiency:
  name: Biotin Deficiency
  description: |
    Biotin deficiency is rare in healthy individuals...
  phenotypes:
    - name: Dermatitis
      phenotype_term:
        preferred_term: dermatitis
        term:
          id: HP:0000964
          label: Eczema
      frequency: FREQUENT
      evidence:
        - reference: PMID:10357733
          supports: SUPPORT
          snippet: "dermatitis, conjunctivitis, and alopecia"
  at_risk_groups:
    - name: Pregnant women
      description: Marginal biotin deficiency common during pregnancy
Extract disease associations:
disease_associations:
  - name: Biotin and Neural Tube Defects
    disease_term:
      preferred_term: neural tube defect
      term:
        id: MONDO:0005343
        label: neural tube defect
    relationship_type: RISK_FACTOR
    direction: DECREASED
    population_context: pregnant women with marginal biotin status
    evidence:
      - reference: PMID:16549401
        supports: SUPPORT
        snippet: "low biotin status during early pregnancy..."
Extract dietary sources:
food_sources:
  - name: Egg yolk
    food_term:
      preferred_term: egg yolk
      term:
        id: FOODON:00002669
        label: egg yolk
    amount: "10 mcg"
    serving_size: "1 large egg"
Extract interactions with medications:
drug_interactions:
  - name: Anticonvulsants and Biotin
    drug_term:
      preferred_term: anticonvulsant
    interaction_type: REDUCES_ABSORPTION
    clinical_significance: MODERATE
    evidence:
      - reference: PMID:8157857
        supports: SUPPORT
        snippet: "long-term anticonvulsant therapy..."
Use OAK to find correct ontology terms:
uv run runoak -i sqlite:obo:chebi info "biotin"
uv run runoak -i sqlite:obo:chebi info "l~ascorbic acid"
uv run runoak -i sqlite:obo:hp info "l~dermatitis"
uv run runoak -i sqlite:obo:hp info HP:0000964
uv run runoak -i sqlite:obo:go info "l~fatty acid biosynthesis"
uv run runoak -i sqlite:obo:mondo info "l~neural tube defect"
uv run runoak -i sqlite:obo:foodon info "l~egg yolk"
uv run runoak -i sqlite:obo:hgnc info "l~pyruvate carboxylase"
Use the reference mapping produced by just extract-refs to add evidence. For each mic_references number:

- If the mapping has a PMID for it, fetch the abstract and add PMID evidence
- If it has no PMID, keep mic_references only, no PMID evidence

evidence:
  - reference: PMID:12345678
    supports: SUPPORT
    snippet: "Exact quote from abstract"
    explanation: "Why this supports the claim"
Fetch and verify abstracts:
# Look up the PMID for a MIC reference number (from the extract-refs mapping)
# e.g., ref 55 → PMID:15585762
just fetch-reference PMID:15585762
cat cache/references/pmid_15585762.md
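Before committing a snippet, it can help to confirm that the quoted text really occurs in the cached reference. The following is a rough sketch, not the repository's validator: both function names are hypothetical, the cache path pattern is inferred from the example above, and matching is exact apart from whitespace normalization.

```shell
# Both helpers are hypothetical; the path pattern follows the example above.

# "PMID:15585762" -> "cache/references/pmid_15585762.md"
reference_cache_path() {
  printf 'cache/references/pmid_%s.md\n' "${1#PMID:}"
}

# Succeeds when SNIPPET occurs verbatim in FILE, ignoring line wrapping.
snippet_in_file() {
  local file="$1" snippet="$2"
  # Collapse newlines and repeated blanks into single spaces, then do a
  # fixed-string (non-regex) search.
  tr -s '[:space:]' ' ' < "$file" | grep -qF -- "$snippet"
}
```

For example, `snippet_in_file "$(reference_cache_path PMID:15585762)" "exact quote"` returns non-zero when the quote does not appear. `just validate-references` remains the authoritative check.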
If a reference has no PMID, keep mic_references without an evidence block.

Run all validation checks:
# Schema validation
just validate kb/nutrients/vitamins/biotin.yaml
# Term validation (ontology IDs and labels)
just validate-terms-file kb/nutrients/vitamins/biotin.yaml
# Reference validation (snippets match abstracts)
just validate-references kb/nutrients/vitamins/biotin.yaml
# Full QC
just qc
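The three per-file checks can be chained so that the run stops at the first failure. A minimal sketch: `run_checks` is a hypothetical wrapper, and the `JUST` variable is an assumption introduced here to allow a dry run; it is not something the repository defines.

```shell
# run_checks is a hypothetical wrapper around the per-file recipes above.
# JUST defaults to the real "just" binary and can be overridden
# (for example JUST=echo) to preview the commands without running them.
run_checks() {
  local file="$1" recipe
  for recipe in validate validate-terms-file validate-references; do
    "${JUST:-just}" "$recipe" "$file" || {
      echo "FAILED: just $recipe $file" >&2
      return 1
    }
  done
}
```

For example, `run_checks kb/nutrients/vitamins/biotin.yaml && just qc` runs the full QC only after the file-level checks pass.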
Check completeness:
just compliance kb/nutrients/vitamins/biotin.yaml
Address high-priority missing fields first:
- nutrient_term.term (weight 5.0)
- disease_associations[].disease_term.term (weight 3.0)
- deficiency.phenotypes[].phenotype_term.term (weight 3.0)

When all validation passes, update CURATION-PROGRESS.md to mark the nutrient as completed:
Change `[~]` to `[x]` to indicate completion.

Example: change this line:
- [~] Folate (`kb/nutrients/vitamins/folate.yaml`) - in progress
To:
- [x] Folate (`kb/nutrients/vitamins/folate.yaml`)
Use lowercase with hyphens:
- vitamin-b12.yaml
- alpha-lipoic-acid.yaml
- coenzyme-q10.yaml

A new nutrient file MUST include at minimum:
| Field | Source | Notes |
|---|---|---|
| name | MIC page | Human-readable nutrient name |
| nutrient_term | OAK lookup | CHEBI term binding |
| category | MIC category | vitamin, mineral, etc. |
| source_url | MIC URL | Full URL to MIC page |
| functions (1+) | MIC content | At least one function |
| evidence (1+) | MIC references | At least one PMID reference |
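A quick pre-check for the top-level keys in the table can catch an incomplete stub before the validators run. This is only a sketch: `has_required_keys` is a hypothetical helper, it looks for key names at the start of a line, and it does not count functions or evidence items, so it does not replace `just validate`.

```shell
# has_required_keys is a hypothetical pre-check, not a repository command.
# It only verifies that each required top-level key is present in FILE.
has_required_keys() {
  local file="$1" key missing=0
  for key in name nutrient_term category source_url functions; do
    if ! grep -q "^${key}:" "$file"; then
      echo "missing top-level key: $key" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```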
If an ontology term is not found or fails term validation, re-run the OAK lookup with fuzzy search:
uv run runoak -i sqlite:obo:chebi info "l~<term>"
If reference validation fails, remember that the quoted text must come from the PMID's abstract. Fetch and verify:
just fetch-reference PMID:12345678
If schema validation fails, check the schema for required fields. nutrient_term is always required.
Before finalizing a nutrient file, verify:
- just validate passes
- just validate-terms-file passes
- just validate-references passes