comparative-genomics-agent

Stars: 3

Forks: 0

Last updated: 12 February 2026 at 19:05

Compare a gene across multiple species — find orthologs, retrieve sequences, compute alignments, and summarize conservation

Source

syntheticgio

syntheticgio/AIBioSkills

SKILL.md

name	comparative-genomics-agent
description	Compare a gene across multiple species — find orthologs, retrieve sequences, compute alignments, and summarize conservation

Perform a multi-species comparative genomics analysis for: $ARGUMENTS

Use the MCP tools available to you to find orthologs, retrieve sequences, align them, and report on conservation. Follow the steps below in order. If a step fails for a particular species, note the gap and continue with the remaining species.

Input Parsing

Parse the user input to identify:

Gene identifier — a gene symbol (e.g., TP53), Ensembl gene ID (e.g., ENSG00000141510), or NCBI Gene ID
Species list — extract species names. Convert common names to Ensembl species names:
- human → homo_sapiens
- mouse → mus_musculus
- rat → rattus_norvegicus
- zebrafish → danio_rerio
- chicken → gallus_gallus
- frog → xenopus_tropicalis
- fly / fruit fly → drosophila_melanogaster
- worm → caenorhabditis_elegans
- dog → canis_lupus_familiaris
- cat → felis_catus
- pig → sus_scrofa
- cow → bos_taurus

If the user says "across vertebrates" or similar, use: human, mouse, zebrafish, chicken (4 representative species). If no species are specified, default to: human, mouse, zebrafish.
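The name conversion and the default species rules above can be sketched as a small lookup. This is only an illustration: `resolve_species` and its `across_vertebrates` flag are hypothetical names, and passing unknown names through with spaces replaced by underscores is an assumption, not part of the skill.

```python
# Mapping copied from the list above: common name -> Ensembl species name.
COMMON_TO_ENSEMBL = {
    "human": "homo_sapiens",
    "mouse": "mus_musculus",
    "rat": "rattus_norvegicus",
    "zebrafish": "danio_rerio",
    "chicken": "gallus_gallus",
    "frog": "xenopus_tropicalis",
    "fly": "drosophila_melanogaster",
    "fruit fly": "drosophila_melanogaster",
    "worm": "caenorhabditis_elegans",
    "dog": "canis_lupus_familiaris",
    "cat": "felis_catus",
    "pig": "sus_scrofa",
    "cow": "bos_taurus",
}

DEFAULT_SPECIES = ["human", "mouse", "zebrafish"]
VERTEBRATE_SPECIES = ["human", "mouse", "zebrafish", "chicken"]


def resolve_species(names, across_vertebrates=False):
    """Return de-duplicated Ensembl species names for user-supplied names."""
    if not names:
        names = VERTEBRATE_SPECIES if across_vertebrates else DEFAULT_SPECIES
    resolved = []
    for name in names:
        key = name.strip().lower()
        # Assumption: an unknown name is already a species name such as
        # "Homo sapiens", so only lowercase it and replace spaces.
        ensembl_name = COMMON_TO_ENSEMBL.get(key, key.replace(" ", "_"))
        if ensembl_name not in resolved:
            resolved.append(ensembl_name)
    return resolved
```

The first species in the returned list can then serve as the reference species for step 1.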

Data Gathering Steps

1. Reference Gene Information

If the input is a gene symbol, call ensembl_lookup_gene with the symbol and the reference species (homo_sapiens by default, otherwise the first species listed) to get the Ensembl gene ID.
Call datasets_summary_gene with the gene symbol and the reference taxon (human by default) for NCBI gene metadata (full name, summary).
Note the reference Ensembl gene ID for the next step.

2. Find Orthologs

Call ensembl_get_homologs with the reference Ensembl gene ID, homology_type: "orthologues".
From the results, extract the ortholog Ensembl gene IDs for each of the requested target species.
If a requested species has no ortholog in the results, note it as "No ortholog found."
Record the percent identity values reported by Ensembl for each ortholog pair.
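The extraction described above might look like the following sketch. It assumes the tool returns JSON shaped like the Ensembl REST homology response (`data` → `homologies` → `target` with `species`, `id`, and `perc_id`); the actual output of ensembl_get_homologs may differ, and `pick_orthologs` is a hypothetical helper.

```python
def pick_orthologs(homology_response, target_species):
    """Map each requested species to its ortholog ID and percent identity.

    Species with no ortholog in the response map to None, which the
    report should render as "No ortholog found."
    """
    found = {}
    for entry in homology_response.get("data", []):
        for homology in entry.get("homologies", []):
            target = homology.get("target", {})
            species = target.get("species")
            # Keep only the first hit per species; one-to-many
            # relationships would need a closer look by the analyst.
            if species in target_species and species not in found:
                found[species] = {
                    "gene_id": target.get("id"),
                    "percent_identity": target.get("perc_id"),
                    "type": homology.get("type"),
                }
    return {species: found.get(species) for species in target_species}


# Placeholder response: the ID and the identity value are made up.
example_response = {
    "data": [
        {
            "homologies": [
                {
                    "type": "ortholog_one2one",
                    "target": {
                        "species": "mus_musculus",
                        "id": "PLACEHOLDER_MOUSE_GENE_ID",
                        "perc_id": 80.0,
                    },
                }
            ]
        }
    ]
}
orthologs = pick_orthologs(example_response, ["mus_musculus", "danio_rerio"])
```

Here `orthologs["danio_rerio"]` is `None`, which corresponds to the "No ortholog found." case.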

3. Retrieve Protein Sequences

For the reference gene and each ortholog found:

Call ensembl_get_sequence with the Ensembl gene ID, seq_type: "protein", format: "json".
Store the protein sequence and its length.

If a protein sequence is not available for a gene ID, try the canonical transcript ID instead.
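A minimal sketch of this fallback, assuming a hypothetical callable `get_sequence` that wraps ensembl_get_sequence and either returns a protein string, returns nothing, or raises on failure:

```python
def fetch_protein(gene_id, canonical_transcript_id, get_sequence):
    """Try the gene ID first, then the canonical transcript ID.

    Returns a dict with the identifier that worked, the sequence, and its
    length, or None if neither identifier yields a protein sequence.
    """
    for identifier in (gene_id, canonical_transcript_id):
        if not identifier:
            continue
        try:
            sequence = get_sequence(identifier)
        except Exception:
            # Any tool failure is treated as "not available" so that the
            # remaining species can still be processed.
            sequence = None
        if sequence:
            return {
                "id_used": identifier,
                "sequence": sequence,
                "length": len(sequence),
            }
    return None  # caller records the gap and continues
```

Returning `None` instead of raising keeps a single missing sequence from stopping the rest of the analysis, in line with the instruction to note gaps and continue.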

4. Pairwise Alignments

For each ortholog protein sequence, align it against the reference protein (human by default):

Call sequence_align with the two protein sequences, sequence_type: "protein", mode: "global".
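Record: alignment score, percent identity, gap count, and alignment length, plus percent positives if the tool reports it (the Ortholog Summary table has a column for it).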

If three or more species are analyzed, also consider running one pairwise comparison between two distant non-reference species (e.g., mouse vs zebrafish) to show the range of divergence.
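If sequence_align returns the two gapped sequences but not every statistic listed above, the numbers can be derived from the aligned strings. The sketch below is a hypothetical helper and assumes both strings have equal length and use `-` for gaps.

```python
def alignment_stats(aligned_ref, aligned_other, gap="-"):
    """Compute length, matches, gap columns, and percent identity."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    length = len(aligned_ref)
    columns = list(zip(aligned_ref, aligned_other))
    # A match is an identical, non-gap residue in both sequences.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    # Each column containing a gap in either sequence counts once.
    gap_count = sum(1 for a, b in columns if a == gap or b == gap)
    percent_identity = 100.0 * matches / length if length else 0.0
    return {
        "alignment_length": length,
        "matches": matches,
        "gap_count": gap_count,
        "percent_identity": round(percent_identity, 1),
    }
```

Identity here is computed over the full alignment length. Tools differ in the denominator they use (alignment length, shorter sequence, or aligned columns only), so these values may not match the percent identity reported by Ensembl in step 2.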

5. Sequence Statistics (Optional)

For the reference protein:

Call sequence_stats with the protein sequence to get molecular weight and amino acid composition.
Note any unusual composition differences across species if evident from the alignments.
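For orientation, the two quantities named above can be computed as in the following sketch. It is not the sequence_stats implementation: it assumes standard average residue masses, adds one water for the free termini, and ignores any non-standard residue codes when summing the mass.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def protein_stats(sequence):
    """Return length, residue counts, and an approximate molecular weight."""
    sequence = sequence.strip().upper()
    composition = Counter(sequence)
    unknown = sorted(set(sequence) - set(AVERAGE_RESIDUE_MASS))
    mass = sum(
        AVERAGE_RESIDUE_MASS[residue] * count
        for residue, count in composition.items()
        if residue in AVERAGE_RESIDUE_MASS
    )
    if mass:
        mass += WATER_MASS  # one water for the N- and C-terminal groups
    return {
        "length": len(sequence),
        "composition": dict(composition),
        "molecular_weight_da": round(mass, 2),
        "unknown_residues": unknown,
    }
```

Running the same function on each ortholog gives comparable composition tables, which is one way to spot the composition differences mentioned above.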

6. Domain Conservation Check (Optional)

Call interpro_get_domains with the UniProt accession of the reference protein; if the accession is not known, look it up first with uniprot_search for the reference gene.
Note which key functional domains exist — these regions are expected to be highly conserved.
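One way to test that expectation is to compute identity separately inside each domain from a pairwise alignment. The sketch below is hypothetical: it assumes domain coordinates are 1-based, inclusive positions on the ungapped reference protein, and that the UniProt and Ensembl reference sequences are the same isoform, which should be checked before trusting the numbers.

```python
def domain_identity(aligned_ref, aligned_other, domains, gap="-"):
    """Percent identity within each domain of the reference protein.

    domains: iterable of (name, start, end) in reference coordinates.
    Returns {name: percent identity, or None if the domain is not covered}.
    """
    results = {}
    for name, start, end in domains:
        ref_position = 0
        covered = 0
        matches = 0
        for ref_residue, other_residue in zip(aligned_ref, aligned_other):
            if ref_residue == gap:
                # Insertion relative to the reference: no reference position.
                continue
            ref_position += 1
            if start <= ref_position <= end:
                covered += 1
                if ref_residue == other_residue:
                    matches += 1
        results[name] = round(100.0 * matches / covered, 1) if covered else None
    return results
```

A domain whose identity is well above the whole-protein figure supports the conservation expectation; a value of `None` means the coordinates fall outside the aligned reference sequence.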

Report Format

Present the analysis as a structured comparative genomics report:

# Comparative Genomics Report: [GENE SYMBOL]

## Gene Overview
- Full name, function summary (from NCBI/Ensembl)
- Reference species and Ensembl gene ID
- Number of species analyzed

## Ortholog Summary

| Species | Ensembl Gene ID | Protein Length | % Identity to [Reference] | % Positives |
|---------|-----------------|----------------|---------------------------|-------------|
| Human (reference) | ENSG... | 393 aa | — | — |
| Mouse | ENSMUSG... | 390 aa | 77.8% | 85.2% |
| Zebrafish | ENSDARG... | 373 aa | 52.1% | 66.3% |

## Pairwise Alignments

For each species pair aligned:
- **[Reference] vs [Species]**: X% identity, Y gaps, alignment length Z
- Key observations: conserved regions, notable insertions/deletions

## Conservation Analysis

Summarize the overall conservation pattern:
- Which regions are most conserved (relate to known domains if domain data was retrieved)
- Which regions show the most divergence
- Overall trend: is this gene highly conserved, moderately conserved, or rapidly evolving?
- Note any species-specific insertions or deletions

## Functional Domain Context
If domain data was retrieved:
- List key domains with positions
- Note whether these domains span the conserved regions

## Evolutionary Insights
Brief interpretation:
- What does the conservation pattern suggest about functional constraints?
- Are there species-specific adaptations visible in the sequence differences?
- How does the conservation level compare to expectations for this gene family?

## Data Sources
List which databases were queried and whether each returned data successfully for each species.

Keep the report factual — only include data returned by the tools. Do not hallucinate sequences, identity scores, or ortholog relationships. If alignment data is unavailable for a species, note "Alignment not performed — sequence unavailable."
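To keep the Ortholog Summary table limited to tool-returned values, it can be rendered from collected records with missing fields shown explicitly. This is a sketch only: `render_ortholog_table`, the record keys, and the `n/a` marker are illustrative choices, not part of the skill.

```python
def format_cell(value, suffix=""):
    """Render a value for a table cell, marking missing data explicitly."""
    return "n/a" if value is None else f"{value}{suffix}"


def render_ortholog_table(reference_label, rows):
    """Build the Ortholog Summary table from a list of record dicts."""
    lines = [
        f"| Species | Ensembl Gene ID | Protein Length | % Identity to {reference_label} | % Positives |",
        "|---------|-----------------|----------------|---------------------------|-------------|",
    ]
    for row in rows:
        lines.append(
            "| {species} | {gene_id} | {length} | {identity} | {positives} |".format(
                species=row["species"],
                gene_id=format_cell(row.get("gene_id")),
                length=format_cell(row.get("protein_length"), " aa"),
                identity=format_cell(row.get("percent_identity"), "%"),
                positives=format_cell(row.get("percent_positives"), "%"),
            )
        )
    return "\n".join(lines)
```

Because every cell comes from a record field or the explicit marker, a value that no tool returned cannot be filled in silently.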

More from this repository

clinical-variant-agent

syntheticgio/AIBioSkills

Full clinical variant workup — gnomAD population frequency, ClinVar significance, protein domain impact, AlphaFold structure context, and PubMed literature

2026-02-12 · 3 stars

drug-target-report

syntheticgio/AIBioSkills

Druggability assessment — protein function, 3D structures, ligand-bound PDBs, binding sites, interaction network, disease associations, and literature

2026-02-12 · 3 stars

gene-list-agent

syntheticgio/AIBioSkills

Functional analysis of a gene list — batch summaries, pathway mapping, protein interactions, tissue expression, and phenotype associations

2026-02-12 · 3 stars

gene-report

syntheticgio/AIBioSkills

Generate a comprehensive gene report by combining data from NCBI, Ensembl, UniProt, ClinVar, PDB, InterPro, STRING, and KEGG

2026-02-12 · 3 stars

lab-notebook

syntheticgio/AIBioSkills

Research session lab notebook — start, annotate, update, report, or check status

2026-02-12 · 3 stars

literature-agent

syntheticgio/AIBioSkills

Search PubMed for relevant biomedical literature on a gene, variant, disease, or topic and produce a structured literature summary

2026-02-12 · 3 stars
