en un clic
chembl-database
// Query ChEMBL bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry.
// Query ChEMBL bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry.
Access AlphaFold 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.
Comprehensive molecular biology toolkit. Use for sequence manipulation, file parsing (FASTA/GenBank/PDB), phylogenetics, and programmatic NCBI/PubMed access (Bio.Entrez). Best for batch processing, custom bioinformatics pipelines, BLAST automation.
Access BRENDA enzyme database via SOAP API. Retrieve kinetic parameters (Km, kcat), reaction equations, organism data, and substrate-specific enzyme information for biochemical research and metabolic pathway analysis.
Query openFDA API for drugs, devices, adverse events, recalls, regulatory submissions (510k, PMA), substance identification (UNII), for FDA regulatory data analysis and safety research.
Direct REST API access to KEGG (academic use only). Pathway analysis, gene-pathway mapping, metabolic pathways, drug interactions, ID conversion. Use this for direct HTTP/REST work or KEGG-specific control.
Low-level plotting library for full customization. Use when you need fine-grained control over every plot element, creating novel plot types, or integrating with specific scientific workflows. Export to PNG/PDF/SVG for publication. For quick statistical plots use seaborn; for interactive plots use plotly; for publication-ready multi-panel figures with journal styling, use scientific-visualization.
| name | chembl_database |
| description | Query ChEMBL bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry. |
| license | Unknown |
| metadata | {"skill-author":"VenusFactory2."} |
ChEMBL is a manually curated database of bioactive molecules maintained by the European Bioinformatics Institute (EBI), containing over 2 million compounds, 19 million bioactivity measurements, 13,000+ drug targets, and data on approved drugs and clinical candidates. In this project the agent exposes only download tools: save molecule data, similarity search results, substructure search results, and drug information to files; each returns rich JSON {status, file_info, content_preview, biological_metadata, execution_context}. For programmatic use, the package also provides query-style APIs (see Project Modules).
This skill should be used when:
The skill provides:
src/tools/database/chembl/: chembl_client.py (API client), chembl_molecule.py, chembl_target.py, chembl_activity.py, chembl_similarity.py, chembl_substructure.py, chembl_drug.py (atomic modules), chembl_queries.py (high-level helpers), chembl_operations.py (query/download operations); all re-exported via package. For programmatic use (including query-style APIs), import e.g. from src.tools.database.chembl import download_chembl_molecule_by_id, ....references/api_reference.md| Tool name | Arguments | Purpose |
|---|---|---|
download_chembl_molecule_by_id | mol_id, out_path | Download molecule JSON by ChEMBL ID to file |
download_chembl_similarity_by_smiles | smiles, out_path, threshold (optional, 0–100, default 70), max_results (optional) | Download Tanimoto similarity search results to JSON file |
download_chembl_substructure_by_smiles | smiles, out_path, max_results (optional) | Download substructure search results to JSON file |
download_chembl_drug_by_id | chembl_id, out_path, max_results (optional) | Download drug info (drug, mechanisms, indications) to JSON file |
No authentication required; ChEMBL API is publicly accessible.
| Capability | Function | Module | Purpose |
|---|---|---|---|
| Client | get_client() | chembl_client.py | Return ChEMBL new_client singleton |
| Molecule by ID | get_molecule(chembl_id) | chembl_molecule.py | Get molecule by ChEMBL ID |
| Molecule filter | filter_molecules(**kwargs) | chembl_molecule.py | Filter by name/properties (Django-style) |
| Target by ID | get_target(chembl_id) | chembl_target.py | Get target by ChEMBL ID |
| Target filter | filter_targets(**kwargs) | chembl_target.py | Filter targets by type/name |
| Activity filter | filter_activities(**kwargs) | chembl_activity.py | Filter bioactivities (target, type, value) |
| Similarity | similarity_search(smiles, threshold=85) | chembl_similarity.py | Find similar compounds by SMILES |
| Substructure | substructure_search(smiles) | chembl_substructure.py | Find compounds containing substructure |
| Drug / mechanism / indication | get_drug(id), get_mechanisms(mol_id), get_indications(mol_id) | chembl_drug.py | Drug record, MoA, indications |
| Query: molecule | query_chembl_molecule_by_id(chembl_id) | chembl_operations.py | Returns rich JSON in memory |
| Query: similarity | query_chembl_similarity_by_smiles(smiles, threshold, max_results) | chembl_operations.py | Returns rich JSON in memory |
| Query: substructure | query_chembl_substructure_by_smiles(smiles, max_results) | chembl_operations.py | Returns rich JSON in memory |
| Query: drug | query_chembl_drug_by_id(chembl_id, max_results) | chembl_operations.py | Returns rich JSON in memory |
| Download: molecule | download_chembl_molecule_by_id(chembl_id, out_path) | chembl_operations.py | Save molecule JSON to file, return rich JSON |
| Download: similarity | download_chembl_similarity_by_smiles(smiles, out_path, threshold, max_results) | chembl_operations.py | Save similarity results to file, return rich JSON |
| Download: substructure | download_chembl_substructure_by_smiles(smiles, out_path, max_results) | chembl_operations.py | Save substructure results to file, return rich JSON |
| Download: drug | download_chembl_drug_by_id(chembl_id, out_path, max_results) | chembl_operations.py | Save drug info to file, return rich JSON |
| High-level: molecule | get_molecule_info(id), search_molecules_by_name(name), find_molecules_by_properties(...) | chembl_queries.py | Convenience molecule helpers |
| High-level: target | get_target_info(id), search_targets_by_name(name) | chembl_queries.py | Convenience target helpers |
| High-level: activity / drug | get_bioactivity_data(...), get_compound_bioactivities(mol_id), get_drug_info(mol_id), find_kinase_inhibitors(...) | chembl_queries.py | Bioactivity and drug helpers |
| High-level: structure | find_similar_compounds(smiles, threshold) | chembl_queries.py | Similarity wrapper |
| Export | export_to_dataframe(data) | chembl_queries.py | Convert results to pandas DataFrame |
uv pip install chembl_webresource_client
Optional for export: uv pip install pandas
Download molecule by ChEMBL ID:
from src.tools.database.chembl import download_chembl_molecule_by_id
result = download_chembl_molecule_by_id("CHEMBL25", "output/chembl_aspirin.json")
# Returns rich JSON: {status, file_info, content_preview, biological_metadata, execution_context}
Query molecule (in-memory, no file):
from src.tools.database.chembl.chembl_operations import query_chembl_molecule_by_id
result = query_chembl_molecule_by_id("CHEMBL25")
# Returns rich JSON: {status, content, content_preview, biological_metadata, execution_context}
Download similar compounds by SMILES:
from src.tools.database.chembl import download_chembl_similarity_by_smiles
result = download_chembl_similarity_by_smiles(
"CC(=O)Oc1ccccc1C(=O)O", # Aspirin SMILES
"output/chembl_similarity.json",
threshold=70, # Tanimoto threshold 0-100
max_results=100, # Optional limit
)
Query similarity (in-memory):
from src.tools.database.chembl.chembl_operations import query_chembl_similarity_by_smiles
result = query_chembl_similarity_by_smiles("CC(=O)Oc1ccccc1C(=O)O", threshold=85)
Download substructure matches:
from src.tools.database.chembl import download_chembl_substructure_by_smiles
result = download_chembl_substructure_by_smiles(
"c1ccccc1", # Benzene substructure
"output/chembl_substructure.json",
max_results=50,
)
Query substructure (in-memory):
from src.tools.database.chembl.chembl_operations import query_chembl_substructure_by_smiles
result = query_chembl_substructure_by_smiles("c1ccccc1", max_results=50)
Download drug info (drug, mechanisms, indications):
from src.tools.database.chembl import download_chembl_drug_by_id
result = download_chembl_drug_by_id(
"CHEMBL25",
"output/chembl_drug.json",
max_results=100,
)
Query drug info (in-memory):
from src.tools.database.chembl.chembl_operations import query_chembl_drug_by_id
result = query_chembl_drug_by_id("CHEMBL25")
For direct programmatic access (not exposed as agent tools):
from src.tools.database.chembl import (
get_client, get_molecule, filter_molecules,
get_target, filter_targets,
filter_activities,
similarity_search, substructure_search,
get_drug, get_mechanisms, get_indications,
)
# Client
client = get_client()
# Molecule
aspirin = get_molecule("CHEMBL25")
results = filter_molecules(pref_name__icontains="aspirin")
# Target
egfr = get_target("CHEMBL203")
kinases = filter_targets(target_type="SINGLE PROTEIN", pref_name__icontains="kinase")
# Activities
activities = filter_activities(
target_chembl_id="CHEMBL203",
standard_type="IC50",
standard_value__lte=100,
)
# Similarity / substructure
similar = similarity_search("CC(=O)Oc1ccccc1C(=O)O", threshold=85)
sub_results = substructure_search("c1ccccc1")
# Drug
drug = get_drug("CHEMBL25")
mechanisms = get_mechanisms("CHEMBL25")
indications = get_indications("CHEMBL25")
from src.tools.database.chembl import (
download_chembl_molecule_by_id,
download_chembl_drug_by_id,
download_chembl_similarity_by_smiles,
)
# Step 1: Download molecule data
mol_result = download_chembl_molecule_by_id("CHEMBL25", "output/aspirin_molecule.json")
# Step 2: Download drug info (mechanisms, indications)
drug_result = download_chembl_drug_by_id("CHEMBL25", "output/aspirin_drug.json")
# Step 3: Find similar compounds
sim_result = download_chembl_similarity_by_smiles(
"CC(=O)Oc1ccccc1C(=O)O", "output/aspirin_similar.json", threshold=80
)
from src.tools.database.chembl import (
download_chembl_similarity_by_smiles,
download_chembl_substructure_by_smiles,
)
# Find similar compounds for SAR
download_chembl_similarity_by_smiles(
"query_smiles_here", "output/sar_similar.json", threshold=70
)
# Find compounds sharing a core structure
download_chembl_substructure_by_smiles(
"core_smiles_here", "output/sar_substructure.json", max_results=100
)
from src.tools.database.chembl.chembl_molecule import get_molecule
from src.tools.database.chembl.chembl_target import filter_targets
from src.tools.database.chembl.chembl_activity import filter_activities
targets = filter_targets(pref_name__icontains="EGFR")
target_id = targets[0]["target_chembl_id"] if targets else None
if target_id:
activities = filter_activities(
target_chembl_id=target_id,
standard_type="IC50",
standard_value__lte=100,
)
compound_ids = [act["molecule_chembl_id"] for act in activities]
compounds = [get_molecule(cid) for cid in compound_ids[:10]]
ChEMBL supports Django-style query filters for the low-level atomic functions:
__exact - Exact match__iexact - Case-insensitive exact match__contains / __icontains - Substring matching__startswith / __endswith - Prefix/suffix matching__gt, __gte, __lt, __lte - Numeric comparisons__range - Value in range__in - Value in list__isnull - Null/not null check{
"status": "success",
"file_info": {
"file_path": "/absolute/path/to/file.json",
"file_name": "file.json",
"file_size": 12345,
"format": "json"
},
"content_preview": "first 500 chars...",
"biological_metadata": {"chembl_id": "CHEMBL25"},
"execution_context": {"download_time_ms": 234, "source": "ChEMBL"}
}
{
"status": "success",
"content": "{...full JSON...}",
"content_preview": "first 500 chars...",
"biological_metadata": {"chembl_id": "CHEMBL25"},
"execution_context": {"query_time_ms": 123, "source": "ChEMBL"}
}
{
"status": "error",
"error": {"type": "NotFound", "message": "...", "suggestion": "..."},
"file_info": null
}
Scripts live in src/tools/database/chembl/. Import from package: from src.tools.database.chembl import ...
Central operations module providing both query and download functions:
query_chembl_molecule_by_id(chembl_id) — returns rich JSON in memoryquery_chembl_similarity_by_smiles(smiles, threshold, max_results) — returns rich JSON in memoryquery_chembl_substructure_by_smiles(smiles, max_results) — returns rich JSON in memoryquery_chembl_drug_by_id(chembl_id, max_results) — returns rich JSON in memorydownload_chembl_molecule_by_id(chembl_id, out_path) — save to file, return rich JSONdownload_chembl_similarity_by_smiles(smiles, out_path, threshold, max_results) — save to file, return rich JSONdownload_chembl_substructure_by_smiles(smiles, out_path, max_results) — save to file, return rich JSONdownload_chembl_drug_by_id(chembl_id, out_path, max_results) — save to file, return rich JSONTest: bash script/tools/database/test_chembl.sh — runs chembl_operations.py --test, outputs under example/database/chembl/.
chembl_client.py — get_client(): ChEMBL new_client singletonchembl_molecule.py — get_molecule(chembl_id), filter_molecules(**kwargs)chembl_target.py — get_target(chembl_id), filter_targets(**kwargs)chembl_activity.py — filter_activities(**kwargs)chembl_similarity.py — similarity_search(smiles, threshold, max_results)chembl_substructure.py — substructure_search(smiles, max_results)chembl_drug.py — get_drug(chembl_id), get_mechanisms(mol_id, max_results), get_indications(mol_id, max_results)get_molecule_info(id), search_molecules_by_name(name), find_molecules_by_properties(...)get_target_info(id), search_targets_by_name(name)get_bioactivity_data(...), get_compound_bioactivities(mol_id), get_drug_info(mol_id), find_kinase_inhibitors(...)find_similar_compounds(smiles, threshold)export_to_dataframe(data) — convert results to pandas DataFrameThe client automatically caches results for 24 hours. Configure:
from chembl_webresource_client.settings import Settings
Settings.Instance().CACHING = False # Disable caching
Settings.Instance().CACHE_EXPIRE = 86400 # Adjust expiration (seconds)
Queries execute only when data is accessed. Convert to list to force execution:
results = filter_molecules(pref_name__icontains='aspirin')
results_list = list(results) # Force execution
data_validity_comment field in activity recordspotential_duplicate flagspchembl_value provides normalized activity (-log scale)standard_type to understand measurement type (IC50, Ki, EC50, etc.)Comprehensive API documentation including complete endpoint listing, filter operators, molecular properties, and advanced query examples.