| name | biomni |
| description | Autonomous biomedical AI agent framework for executing complex research tasks across genomics, drug discovery, molecular biology, and clinical analysis. Use this skill when conducting multi-step biomedical research including CRISPR screening design, single-cell RNA-seq analysis, ADMET prediction, GWAS interpretation, rare disease diagnosis, or lab protocol optimization. Leverages LLM reasoning with code execution and integrated biomedical databases. |
Biomni
Overview
Biomni is an open-source biomedical AI agent framework from Stanford's SNAP lab that autonomously executes complex research tasks across biomedical domains. Use this skill when working on multi-step biological reasoning tasks, analyzing biomedical data, or conducting research spanning genomics, drug discovery, molecular biology, and clinical analysis.
Core Capabilities
Biomni excels at:
- Multi-step biological reasoning - Autonomous task decomposition and planning for complex biomedical queries
- Code generation and execution - Dynamic analysis pipeline creation for data processing
- Knowledge retrieval - Access to ~11GB of integrated biomedical databases and literature
- Cross-domain problem solving - Unified interface for genomics, proteomics, drug discovery, and clinical tasks
When to Use This Skill
Use biomni for:
- CRISPR screening - Design screens, prioritize genes, analyze knockout effects
- Single-cell RNA-seq - Cell type annotation, differential expression, trajectory analysis
- Drug discovery - ADMET prediction, target identification, compound optimization
- GWAS analysis - Variant interpretation, causal gene identification, pathway enrichment
- Clinical genomics - Rare disease diagnosis, variant pathogenicity, phenotype-genotype mapping
- Lab protocols - Protocol optimization, literature synthesis, experimental design
Quick Start
Installation and Setup
Install Biomni and configure API keys for LLM providers:
uv pip install biomni --upgrade
Configure API keys (store them in a .env file or as environment variables):
export ANTHROPIC_API_KEY="your-key-here"
Use scripts/setup_environment.py for interactive setup assistance.
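Before starting an agent, it can help to confirm that the expected keys are actually visible to Python. The following is a minimal sketch using only the standard library; `missing_api_keys` is a hypothetical helper for illustration and is not part of Biomni or of scripts/setup_environment.py.

```python
import os


# Hypothetical helper, not part of Biomni: report which of the expected
# provider keys are missing or blank in the given environment mapping.
def missing_api_keys(required, env=None):
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys(["ANTHROPIC_API_KEY"])
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

The check only tests for presence. It never prints the key itself, so it is safe to run in shared terminals or logs.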
Basic Usage Pattern
from biomni.agent import A1
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
agent.go("Your biomedical research question or task")
agent.save_conversation_history("report.pdf")
Working with Biomni
1. Agent Initialization
The A1 class is the primary interface for biomni:
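from biomni.agent import A1
from biomni.config import default_config
# Basic initialization
agent = A1(
    path='./data',  # data directory (~11GB is downloaded here on first use)
    llm='claude-sonnet-4-20250514'  # model used by this agent
)
# Global defaults can also be adjusted through default_config
default_config.llm = "gpt-4"
default_config.timeout_seconds = 1200
default_config.max_iterations = 50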
Supported LLM Providers:
- Anthropic Claude (recommended): claude-sonnet-4-20250514, claude-opus-4-20250514
- OpenAI: gpt-4, gpt-4-turbo
- Azure OpenAI: via Azure configuration
- Google Gemini: gemini-2.0-flash-exp
- Groq: llama-3.3-70b-versatile
- AWS Bedrock: Various models via Bedrock API
See references/llm_providers.md for detailed LLM configuration instructions.
2. Task Execution Workflow
Biomni follows an autonomous agent workflow:
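# Create the agent
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
# Describe the task in plain language; the agent plans and executes it autonomously
result = agent.go("""
Design a CRISPR screen to identify genes regulating autophagy in
HEK293 cells. Prioritize genes based on essentiality and pathway
relevance.
""")
# Save the conversation history as a PDF report
agent.save_conversation_history("autophagy_screen_report.pdf")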
3. Common Task Patterns
CRISPR Screening Design
agent.go("""
Design a genome-wide CRISPR knockout screen for identifying genes
affecting [phenotype] in [cell type]. Include:
1. sgRNA library design
2. Gene prioritization criteria
3. Expected hit genes based on pathway analysis
""")
Single-Cell RNA-seq Analysis
agent.go("""
Analyze this single-cell RNA-seq dataset:
- Perform quality control and filtering
- Identify cell populations via clustering
- Annotate cell types using marker genes
- Conduct differential expression between conditions
File path: [path/to/data.h5ad]
""")
Drug ADMET Prediction
agent.go("""
Predict ADMET properties for these drug candidates:
[SMILES strings or compound IDs]
Focus on:
- Absorption (Caco-2 permeability, HIA)
- Distribution (plasma protein binding, BBB penetration)
- Metabolism (CYP450 interaction)
- Excretion (clearance)
- Toxicity (hERG liability, hepatotoxicity)
""")
GWAS Variant Interpretation
agent.go("""
Interpret GWAS results for [trait/disease]:
- Identify genome-wide significant variants
- Map variants to causal genes
- Perform pathway enrichment analysis
- Predict functional consequences
Summary statistics file: [path/to/gwas_summary.txt]
""")
See references/use_cases.md for comprehensive task examples across all biomedical domains.
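The bracketed fields in the patterns above (for example [phenotype] and [cell type]) are meant to be replaced before the text is passed to agent.go. A minimal sketch of doing that programmatically follows; `fill_template` is a hypothetical helper, not a Biomni function, and the example values are illustrative only.

```python
import re


# Hypothetical helper, not part of Biomni: substitute [placeholder] fields
# in a task template and fail loudly if any field has no value.
def fill_template(template, values):
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for [{key}]")
        return values[key]

    return re.sub(r"\[([^\[\]]+)\]", substitute, template)


template = (
    "Design a genome-wide CRISPR knockout screen for identifying genes "
    "affecting [phenotype] in [cell type]."
)
task = fill_template(template, {"phenotype": "autophagy", "cell type": "HEK293 cells"})
print(task)
```

Because every bracketed span is treated as a placeholder, this approach assumes the template contains no literal square brackets. Templates that embed SMILES strings directly would need a different delimiter.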
4. Data Integration
Biomni integrates ~11GB of biomedical knowledge sources:
- Gene databases - Ensembl, NCBI Gene, UniProt
- Protein structures - PDB, AlphaFold
- Clinical datasets - ClinVar, OMIM, HPO
- Literature indices - PubMed abstracts, biomedical ontologies
- Pathway databases - KEGG, Reactome, GO
Data is automatically downloaded to the specified path on first use.
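Since the first run downloads roughly 11GB, a quick free-space check can prevent a partial download. This is a sketch using the standard library only; `has_room_for_data_lake` and the 2GB safety margin are assumptions made for illustration, not part of Biomni.

```python
import shutil

DATA_LAKE_GB = 11  # approximate size stated above; treat it as an estimate


# Hypothetical pre-flight check, not part of Biomni: is there enough free
# space on the volume holding `path` for the first-use download?
# `path` must already exist, so pass the parent directory if ./data has
# not been created yet.
def has_room_for_data_lake(path=".", required_gb=DATA_LAKE_GB, margin_gb=2):
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb + margin_gb


if __name__ == "__main__":
    print("Enough space:", has_room_for_data_lake("."))
```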
5. MCP Server Integration
Extend biomni with external tools via the Model Context Protocol (MCP).
6. Evaluation Framework
Benchmark agent performance on biomedical tasks:
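from biomni.eval import BiomniEval1
evaluator = BiomniEval1()
# Score a single answer; agent_output is a placeholder for the agent's answer to this instance
score = evaluator.evaluate(
    task_type='crispr_design',
    instance_id='test_001',
    answer=agent_output
)
# Load the benchmark dataset
dataset = evaluator.load_dataset()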
Best Practices
Task Formulation
- Be specific - Include biological context, organism, cell type, conditions
- Specify outputs - Clearly state desired analysis outputs and formats
- Provide data paths - Include file paths for datasets to analyze
- Set constraints - Mention time/computational limits if relevant
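One way to apply the four points above consistently is to assemble every task from the same fields. The sketch below is an assumption about how one might structure this; `build_task_prompt` is a hypothetical helper, not a Biomni API, and the example values are illustrative.

```python
# Hypothetical helper, not part of Biomni: assemble a task description that
# covers context, desired outputs, data paths, and constraints.
def build_task_prompt(goal, context=None, outputs=(), data_paths=(), constraints=()):
    lines = [goal.strip()]
    if context:
        lines.append(f"Context: {context.strip()}")
    for title, items in (
        ("Expected outputs", outputs),
        ("Data files", data_paths),
        ("Constraints", constraints),
    ):
        if items:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


prompt = build_task_prompt(
    goal="Identify differentially expressed genes between conditions.",
    context="Human PBMC single-cell RNA-seq, treated vs. control",
    outputs=["Ranked gene table as CSV", "Volcano plot"],
    data_paths=["[path/to/data.h5ad]"],
    constraints=["Finish within 30 minutes"],
)
print(prompt)
```

The resulting string can be passed to agent.go in place of a hand-written prompt. Empty sections are omitted, so a bare goal still produces a valid task.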
Security Considerations
⚠️ Important: Biomni executes LLM-generated code with full system privileges. For production use:
- Run in isolated environments (Docker, VMs)
- Avoid exposing sensitive credentials
- Review generated code before execution in sensitive contexts
- Use sandboxed execution environments when possible
Performance Optimization
- Choose appropriate LLMs - Claude Sonnet 4 is recommended for a balance of speed and quality
- Set reasonable timeouts - Adjust default_config.timeout_seconds for complex tasks
- Monitor iterations - Track max_iterations to prevent runaway loops
- Cache data - Reuse downloaded data lake across sessions
Result Documentation
agent.save_conversation_history("results/project_name_YYYYMMDD.pdf")
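The project_name_YYYYMMDD pattern can be generated, not typed by hand. A minimal sketch follows; `report_path` is a hypothetical helper, not part of Biomni.

```python
from datetime import date
from pathlib import Path


# Hypothetical helper, not part of Biomni: build a dated report path of the
# form results/<project>_YYYYMMDD.pdf.
def report_path(project, day=None, directory="results"):
    day = date.today() if day is None else day
    return Path(directory) / f"{project}_{day:%Y%m%d}.pdf"


path = report_path("autophagy_screen", date(2025, 1, 31))
print(path.as_posix())  # results/autophagy_screen_20250131.pdf
```

Create the target directory first (for example with path.parent.mkdir(parents=True, exist_ok=True)), then pass str(path) to agent.save_conversation_history.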
Resources
References
Detailed documentation is available in the references/ directory:
- api_reference.md - Complete API documentation for A1 class, configuration, and evaluation
- llm_providers.md - LLM provider setup (Anthropic, OpenAI, Azure, Google, Groq, AWS)
- use_cases.md - Comprehensive task examples for all biomedical domains
Scripts
Helper scripts in the scripts/ directory:
- setup_environment.py - Interactive environment and API key configuration
- generate_report.py - Enhanced PDF report generation with custom formatting
External Resources
Troubleshooting
Common Issues
Data download fails
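# Data is downloaded to the given path on first use; re-creating the agent should retry the download
agent = A1(path='./data', llm='your-llm')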
API key errors
# Verify that the key is set in the current shell
echo $ANTHROPIC_API_KEY
Timeout on complex tasks
from biomni.config import default_config
default_config.timeout_seconds = 3600
Memory issues with large datasets
- Use streaming for large files
- Process data in chunks
- Increase system memory allocation
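As an illustration of chunked processing, the sketch below reads a text stream in fixed-size batches of lines so that only one batch is in memory at a time. It is generic Python, not a description of how Biomni handles large inputs; `iter_line_chunks` is a hypothetical helper.

```python
import io


# Yield lists of at most `chunk_size` lines from a text stream, so only one
# batch is held in memory at a time.
def iter_line_chunks(handle, chunk_size=10_000):
    chunk = []
    for line in handle:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, batch


# Demonstration on an in-memory stream; a file opened with open() works the same way.
demo = io.StringIO("a\nb\nc\nd\ne\n")
sizes = [len(chunk) for chunk in iter_line_chunks(demo, chunk_size=2)]
print(sizes)  # [2, 2, 1]
```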
Getting Help
For issues or questions: