clinvar-database
Use when needing clinical significance, pathogenicity classifications (e.g., Pathogenic, Benign, VUS), clinical evidence rationales, or finding "hard positive" benchmark controls for human genomic variants.
| name | clinvar-database |
| description | Use when needing clinical significance, pathogenicity classifications (e.g., Pathogenic, Benign, VUS), clinical evidence rationales, or finding "hard positive" benchmark controls for human genomic variants. |
uv: Read the uv skill and follow its Setup instructions to ensure
uv is installed and on PATH.
User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://www.ncbi.nlm.nih.gov/clinvar/, then (2) create the file recording the notification text and timestamp.
.env file: Make sure the .env file exists in your home directory.
Create one if it does not exist.
NCBI_API_KEY (optional): Raises the NCBI rate limit from 3 to 10
requests/second. The skill works without it, but a key is recommended if the
user plans many queries or encounters a 429 error. The user can obtain one
for free by registering at https://www.ncbi.nlm.nih.gov/account/settings/.
If the variable is missing from .env, do NOT ask the user to paste it into
the chat (this would leak the key into the agent's context). Instead, give
the user this command — substituting ENV_FILE with the resolved literal
path to the .env file:
printf "Enter NCBI API key (typing hidden): " && read -s key && echo && echo "NCBI_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
The scripts load credentials automatically via dotenv. NEVER read,
print, or inspect the .env file or its variables (e.g. no cat, grep,
echo, printenv, or os.environ.get on keys). Credentials must stay out
of the agent's context. See the
API Key section for more details.
ClinVar is the primary consensus record for clinical classifications of human genomic variations. It provides the "clinical ground truth" for pathogenicity labels (Pathogenic, Likely Pathogenic, Benign, VUS) based on assertions from global laboratories.
Use when you need to:
Do NOT use when you need to:
ClinVar queries are executed via a Python wrapper script (scripts/clinvar_api.py) that handles NCBI's strict rate limiting and XML/JSON parsing.
Example: Search for BRCA1 variants
uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json
--retmax 200. For any "List all" or gene-wide request, you MUST explicitly set --retmax
higher (e.g., 1000) to ensure data completeness.
NCBI_API_KEY to the .env file.
count — Count Matching Variants
Purpose: Check how many variants match a query without fetching IDs. Use to
decide whether a full search is warranted.
Arguments:
--query: (Required) NCBI Entrez search query string.
--output: (Required) Output JSON file path.
Example:
uv run scripts/clinvar_api.py count \
  --query "TP53[gene] AND \"uncertain significance\"[clinsig]" \
  --output count.json
Output:
{"total_count": <int>}
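Here is a minimal Python sketch of the count-first decision. It assumes the count output has exactly the {"total_count": <int>} shape shown above. The helper names and the 5000-variant threshold are hypothetical choices and are not part of clinvar_api.py.

```python
import json
from pathlib import Path

# Hypothetical cut-off: above this many matches, narrow the query or
# confirm with the user before fetching every Variation ID.
LARGE_RESULT_THRESHOLD = 5000


def load_json(path: str):
    """Read a JSON file written by clinvar_api.py."""
    return json.loads(Path(path).read_text())


def should_run_full_search(count_data: dict, threshold: int = LARGE_RESULT_THRESHOLD) -> bool:
    """Decide from {"total_count": <int>} whether a full search is warranted."""
    total = int(count_data["total_count"])
    if total == 0:
        # Nothing matches, so a search would only return an empty list.
        return False
    return total <= threshold
```

For example, should_run_full_search(load_json("count.json")) returns False for an empty or very large result set. At that point, refine the query instead of searching blindly.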
search — Search Variants
Purpose: Identify variants based on genomic location, gene symbols, or clinical attributes using NCBI Entrez search syntax. The search command automatically paginates through all matching results to ensure complete, deterministic retrieval.
# Fetch ALL matching variants (default behavior)
uv run scripts/clinvar_api.py search \
--query "BRCA1[gene]" --output results.json
# Search by Chromosome and Position Range
uv run scripts/clinvar_api.py search \
--query "11[chr] AND 5225000:5226000[chrpos]" --output results.json
# Combine terms using Entrez syntax
uv run scripts/clinvar_api.py search \
--query "HBB[gene] AND pathogenic[clinsig]" --output results.json
# Cap results at 50
uv run scripts/clinvar_api.py search \
--query "TP53[gene]" --retmax 50 --output results.json
Arguments:
--query: (Required) NCBI Entrez search query string.
--retmax: Maximum total number of variant IDs to return. Default is 0,
which means "fetch all matching results." Set to a positive integer to cap
the result set.
--page_size: Number of IDs to fetch per API request (default: 500, max:
10000 per NCBI limits).
--output: (Required) Output JSON file path.
Output: A JSON object containing:
total_count — Total number of matching variants in ClinVar.
fetched_count — Number of IDs actually retrieved.
variant_ids — List of ClinVar Variation ID strings.
summary — Get Interpretation Summary
Purpose: Retrieve top-line clinical significance labels, star ratings (review status), and basic phenotype data for rapid variant screening.
# Get summary for one or more Variation IDs
uv run scripts/clinvar_api.py summary \
--variant_ids 12345 67890 --output summary.json
Arguments:
--variant_ids: (Required) One or more ClinVar Variation IDs.
--output: (Required) Output JSON file path.
Output: A JSON list of summary objects, each containing:
variant_id, title, clinical_significance, review_status, last_evaluated, phenotypes
genes — list of {gene_id, symbol, strand}
variation_type — e.g., single nucleotide variant, Deletion, Insertion
molecular_consequences — list of strings (e.g., ["missense variant"])
evidence — Get Clinical Evidence
Purpose: Fetch the full clinical record for a single variant, including free-text clinician rationales, assertion methods, and specific submitter notes.
# Get full evidence for a single Variation ID
uv run scripts/clinvar_api.py evidence \
--variant_id 12345 --output evidence.json
Arguments:
--variant_id: (Required) A single ClinVar Variation ID.
--output: (Required) Output JSON file path.
Output: A JSON object containing:
variant_id
allele_info — {chromosome, position_start, position_stop, reference_allele, alternate_allele, cytogenetic_band, dbsnp_rsid} (GRCh38 preferred)
conditions — list of {name, medgen_cui, omim_id, orphanet_id, hpo_terms}
functional_consequences — list of {value, sequence_ontology_id}
structural_variant_details — {outer_start, inner_start, inner_stop, outer_stop, copy_number} (present only for CNVs, otherwise null)
citation_references — list of PubMed IDs cited in the global "Citations" section
submissions — list of per-submitter records, each containing:
  submitter_name, classification, curator_notes, assertion_criteria
  date_last_evaluated — when the submitter last reviewed the classification
For large or unknown result sets, use count first to decide whether to
proceed, then search (which auto-paginates and returns total_count /
fetched_count), then summary to screen.
# Step 1: Gauge size (optional — search also returns total_count)
uv run scripts/clinvar_api.py count \
--query "HBB[gene] AND pathogenic[clinsig]" --output count.json
# Step 2: Fetch all variant IDs (auto-paginates)
uv run scripts/clinvar_api.py search \
--query "HBB[gene] AND pathogenic[clinsig]" --output ids.json
# Step 3: Get summaries (extract variant_ids from search output)
uv run scripts/clinvar_api.py summary \
--variant_ids 12345 67890 --output summary.json
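Step 3 needs the IDs from ids.json. The sketch below, with hypothetical helper names, does three things:

- reads the search output object;
- checks that fetched_count equals total_count, as recommended for complete retrieval;
- assembles the summary command line.

It relies only on the output fields documented for search.

```python
from __future__ import annotations

import json
import shlex
from pathlib import Path


def load_search_result(path: str) -> dict:
    """Load the JSON object written by `clinvar_api.py search`."""
    return json.loads(Path(path).read_text())


def extract_variant_ids(search_result: dict, allow_partial: bool = False) -> list[str]:
    """Return the Variation IDs from a search result after a completeness check.

    The search output is an object with total_count, fetched_count and
    variant_ids, not a bare list. A mismatch between the two counts means
    the result set was capped (e.g. by --retmax) or retrieval stopped early.
    """
    total = int(search_result["total_count"])
    fetched = int(search_result["fetched_count"])
    ids = [str(v) for v in search_result["variant_ids"]]
    if fetched != total and not allow_partial:
        raise ValueError(f"Incomplete retrieval: fetched {fetched} of {total} variants")
    return ids


def build_summary_command(ids: list[str], output: str = "summary.json") -> str:
    """Assemble a shell-quoted summary call for the given Variation IDs."""
    args = ["uv", "run", "scripts/clinvar_api.py", "summary",
            "--variant_ids", *ids, "--output", output]
    return shlex.join(args)
```

print(build_summary_command(extract_variant_ids(load_search_result("ids.json")))) prints a command you can run. If you capped the search with --retmax on purpose, pass allow_partial=True.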
When you need the full clinical picture for a specific variant — including
submitter rationales, PubMed citations, ontology-linked conditions, and allele
coordinates — use evidence.
uv run scripts/clinvar_api.py evidence \
--variant_id 12345 --output evidence.json
ClinVar metadata is inconsistent. To fulfill "List all" requests, do not rely on a single filter. Perform the following searches in a single turn and merge the results:
1. Consequence-label search (e.g., "3 prime UTR variant"[molecular_consequence]).
2. HGVS pattern match on the variant title (e.g., c.*).
3. Coordinate range search ([chrpos]).
This "triangulation" ensures that structural variants with missing labels are not overlooked.
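Below is a sketch of the merge step. It assumes each triangulation leg was saved as its own search output; the file names in the usage line are hypothetical. If you run the HGVS leg by filtering summary titles rather than by a search, wrap its IDs as {"variant_ids": [...]} before merging.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_search_outputs(paths: list[str]) -> list[dict]:
    """Read several JSON files written by `clinvar_api.py search`."""
    return [json.loads(Path(p).read_text()) for p in paths]


def merge_variant_ids(*search_results: dict) -> list[str]:
    """Union the variant_ids of several search outputs, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for result in search_results:
        for vid in result.get("variant_ids", []):
            vid = str(vid)  # normalise in case some IDs were stored as integers
            if vid not in seen:
                seen.add(vid)
                merged.append(vid)
    return merged
```

Calling merge_variant_ids(*load_search_outputs(["label_hits.json", "hgvs_hits.json", "coord_hits.json"])) gives a de-duplicated ID list, ready for the summary command.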
molecular_consequences alone can be ambiguous (e.g., splice donor variant
appears in both coding and non-coding contexts). Always cross-check the title
field for HGVS patterns:
c.-… — 5' UTR (non-coding)
c.*… — 3' UTR (non-coding)
c.123+N / c.123-N — intronic (non-coding)
p.Trp146Arg etc. — protein effect (coding)
A variant with UTR/intronic HGVS and no p. annotation is non-coding, even with
splicing labels. Conversely, any p. annotation indicates a coding effect.
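The title cross-check can be sketched as a few regular expressions. This is a rough heuristic that only encodes the cues listed above, in the stated priority: any p. annotation wins. The function name and the "unknown" fallback are assumptions. Titles that match none of the patterns still need manual review.

```python
import re

# Patterns mirror the HGVS cues above and are checked in this order.
PROTEIN_RE = re.compile(r"\bp\.")            # any p. annotation: coding effect
UTR5_RE = re.compile(r"c\.-\d")              # c.-26G>A style: 5' UTR
UTR3_RE = re.compile(r"c\.\*\d")             # c.*102C>T style: 3' UTR
INTRONIC_RE = re.compile(r"c\.\d+[+-]\d")    # c.123+5G>A or c.123-2A>G: intronic


def classify_hgvs_title(title: str) -> str:
    """Rough coding / non-coding call from a ClinVar variant title."""
    if PROTEIN_RE.search(title):
        return "coding"
    if UTR5_RE.search(title):
        return "5' UTR (non-coding)"
    if UTR3_RE.search(title):
        return "3' UTR (non-coding)"
    if INTRONIC_RE.search(title):
        return "intronic (non-coding)"
    return "unknown"
```

The test titles use a placeholder transcript (NM_000000.0) and are not real ClinVar records.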
"3 prime UTR variant"[mol_consequence] → c.*
"5 prime UTR variant"[mol_consequence] → c.-
review_status filter. This is the most efficient way to distinguish
between single-laboratory assertions and panel-reviewed ground truth.
summary → clinical_significance
summary → genes
summary → variation_type
summary → molecular_consequences
evidence → allele_info
evidence → conditions
evidence → functional_consequences
evidence → structural_variant_details
evidence → citation_references
last_evaluated
evidence → submissions[].curator_notes
To get precise genomic coordinates in the format <chrom>:<pos>:<ref>><alt>
(e.g., chr5:70951945:G>A), you must use the evidence command, as these
details are not available in the summary output.
You MUST always include genomic coordinates in the format
<chrom>:<pos>:<ref>><alt> when listing or presenting variants, even if not
explicitly requested by the user. If coordinates are missing from the summary,
use the evidence command or dbSNP fallback to retrieve them.
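Here is a sketch of building the required coordinate string from the evidence output's allele_info fields. It makes two assumptions:

- chromosome may be stored with or without a "chr" prefix;
- position_start is the variant position.

For indels, ClinVar's VCF-style attributes (positionVCF, referenceAlleleVCF, alternateAlleleVCF) may be preferable. This sketch only uses the fields documented for allele_info. The function name is hypothetical.

```python
from typing import Optional


def format_variant_coordinate(allele_info: dict) -> Optional[str]:
    """Build <chrom>:<pos>:<ref>><alt> from an evidence allele_info object.

    Returns None when any required field is missing, which signals that
    the dbSNP fallback is needed.
    """
    chrom = str(allele_info.get("chromosome") or "").strip()
    pos = allele_info.get("position_start")
    ref = allele_info.get("reference_allele")
    alt = allele_info.get("alternate_allele")
    if not chrom or pos is None or not ref or not alt:
        return None
    # Assumption: chromosome may be "5" or "chr5"; normalise to "chr5".
    if not chrom.lower().startswith("chr"):
        chrom = f"chr{chrom}"
    return f"{chrom}:{int(pos)}:{ref}>{alt}"
```

After loading evidence.json, call format_variant_coordinate(evidence["allele_info"]).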
uv run scripts/clinvar_api.py evidence --variant_id <ID> --output evidence.json
The evidence command parses the XML. Extract Chr, positionVCF (or start),
referenceAlleleVCF (or referenceAllele), and alternateAlleleVCF (or alternateAllele) from the
SequenceLocation element with Assembly="GRCh38".
Fallback for Imprecise Coordinates (Gene Range): ClinVar often returns the
full gene range for non-coding variants. If the extracted coordinates correspond
to the gene range instead of a specific position, use the dbsnp-database skill
to resolve the precise coordinates using the dbsnp_rsid or HGVS title:
1. Check for dbsnp_rsid in the evidence output.
2. Run uv run scripts/dbsnp_cli.py resolve-rsid {rsid} to get precise GRCh38 coordinates.
3. Format as <chrom>:<pos>:<ref>><alt> using the SPDI or HGVS data from dbSNP.
The structural_variant_details field is only populated for copy number
variants (CNVs). For standard SNVs and small indels this field will be null.
Use the allele_info fields (position_start, position_stop,
reference_allele, alternate_allele) instead.
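A heuristic like the one below may help decide between allele_info, structural_variant_details, and the dbSNP fallback. It is an assumption, not ClinVar logic, and follows two rules:

- CNVs (non-null structural_variant_details) are skipped.
- Missing alleles, or a span much wider than the reference allele, are treated as gene-range coordinates.

The 50 bp tolerance and the function name are arbitrary.

```python
def needs_dbsnp_fallback(evidence: dict, max_extra_bp: int = 50) -> bool:
    """Heuristic check for imprecise (gene-range) coordinates in evidence output."""
    if evidence.get("structural_variant_details"):
        # CNVs legitimately span large ranges; use their structural coordinates instead.
        return False
    info = evidence.get("allele_info") or {}
    start, stop = info.get("position_start"), info.get("position_stop")
    ref, alt = info.get("reference_allele"), info.get("alternate_allele")
    if start is None or stop is None or not ref or not alt:
        return True
    span = int(stop) - int(start) + 1
    # A precise small variant spans roughly its reference allele; a gene range is far wider.
    return span > len(ref) + max_extra_bp
```

When it returns True, take evidence["allele_info"]["dbsnp_rsid"], if present, and pass it to resolve-rsid as described in the fallback steps.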
Large copy-number variants (CNVs) frequently have empty
molecular_consequences. If a variant title mentions "del" and coordinates
overlap your target region, it is relevant regardless of missing labels.
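The CNV rule above reduces to two checks: the title mentions "del", and the coordinates overlap the target. The overlap part is a closed-interval test. In this sketch the function name and argument order are hypothetical. Take the CNV start and stop from structural_variant_details or allele_info, whichever is available. The plain "del" substring check is deliberately as loose as the rule it encodes.

```python
def _strip_chr(chrom) -> str:
    """Normalise "chr11" and "11" to the same key."""
    chrom = str(chrom).strip()
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def is_relevant_deletion(title: str, chrom, start: int, stop: int,
                         target_chrom, target_start: int, target_stop: int) -> bool:
    """True when the title mentions "del" and [start, stop] overlaps the target region."""
    if _strip_chr(chrom) != _strip_chr(target_chrom):
        return False
    # Closed intervals overlap when each one starts before the other ends.
    overlaps = int(start) <= int(target_stop) and int(target_start) <= int(stop)
    return overlaps and "del" in title.lower()
```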
To raise the rate limit to 10 requests per second, obtain an NCBI API key by
following the instructions in the NCBI ClinVar API docs, then follow the
prerequisite instructions to add it to the .env file.
uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json
If a RateLimitError is encountered, follow the prerequisite instructions to
help the user add NCBI_API_KEY to the .env file, providing the
NCBI ClinVar API docs URL for instructions on how to obtain one.
uv run to execute python.
jq is unavailable, pivot immediately to using Python one-liners for
processing JSON (e.g., uv run python3 -c "import json; ...").
count before search to understand the result set size.
search command fetches all results by default and includes
total_count and fetched_count in the output; always verify these match
to confirm complete retrieval.
last_evaluated.
clinvar_api.py client, which handles the unpredictable XML schemas
robustly.
NCBI_API_KEY to the .env file, then retry.
11[chr] AND 1234[chrpos]), not raw ATCG strings.
.lower()) when filtering.
search output as a bare list: search returns a JSON object
with total_count, fetched_count, and variant_ids, not a bare list.
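Following the .lower() filtering advice, the sketch below filters summary records by clinical significance, case-insensitively. It assumes clinical_significance is a string label such as "Pathogenic/Likely pathogenic". Splitting on separators, rather than substring matching, stops "pathogenic" from also matching "Likely pathogenic" or "Conflicting interpretations of pathogenicity". The function names are hypothetical.

```python
from __future__ import annotations

import re


def significance_terms(label) -> set[str]:
    """Split a clinical_significance label into lowercase terms.

    "Pathogenic/Likely pathogenic" becomes {"pathogenic", "likely pathogenic"}.
    """
    return {part.strip().lower() for part in re.split(r"[/;,]", str(label or "")) if part.strip()}


def filter_by_significance(summaries: list[dict], *wanted: str) -> list[dict]:
    """Keep summary records whose clinical_significance contains any wanted term."""
    targets = {w.strip().lower() for w in wanted}
    return [s for s in summaries
            if significance_terms(s.get("clinical_significance")) & targets]
```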