// "Query ChEMBL's bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry."
| name | chembl-database |
| description | Query ChEMBL's bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry. |
ChEMBL is a manually curated database of bioactive molecules maintained by the European Bioinformatics Institute (EBI), containing over 2 million compounds, 19 million bioactivity measurements, 13,000+ drug targets, and data on approved drugs and clinical candidates. Access and query this data programmatically using the ChEMBL Python client for drug discovery and medicinal chemistry research.
This skill should be used when:
The ChEMBL Python client is required for programmatic access:
uv pip install chembl_webresource_client
from chembl_webresource_client.new_client import new_client
# Access different endpoints
molecule = new_client.molecule
target = new_client.target
activity = new_client.activity
drug = new_client.drug
Retrieve by ChEMBL ID:
molecule = new_client.molecule
aspirin = molecule.get('CHEMBL25')
Search by name:
results = molecule.filter(pref_name__icontains='aspirin')
Filter by properties:
# Find small molecules (MW <= 500) with favorable LogP
results = molecule.filter(
molecule_properties__mw_freebase__lte=500,
molecule_properties__alogp__lte=5
)
Retrieve target information:
target = new_client.target
egfr = target.get('CHEMBL203')
Search for specific target types:
# Find all kinase targets
kinases = target.filter(
target_type='SINGLE PROTEIN',
pref_name__icontains='kinase'
)
Query activities for a target:
activity = new_client.activity
# Find potent EGFR inhibitors
results = activity.filter(
target_chembl_id='CHEMBL203',
standard_type='IC50',
standard_value__lte=100,
standard_units='nM'
)
Get all activities for a compound:
compound_activities = activity.filter(
molecule_chembl_id='CHEMBL25',
pchembl_value__isnull=False
)
Similarity search:
similarity = new_client.similarity
# Find compounds similar to aspirin
similar = similarity.filter(
smiles='CC(=O)Oc1ccccc1C(=O)O',
similarity=85 # 85% similarity threshold
)
Substructure search:
substructure = new_client.substructure
# Find compounds containing benzene ring
results = substructure.filter(smiles='c1ccccc1')
Retrieve drug data:
drug = new_client.drug
drug_info = drug.get('CHEMBL25')
Get mechanisms of action:
mechanism = new_client.mechanism
mechanisms = mechanism.filter(molecule_chembl_id='CHEMBL25')
Query drug indications:
drug_indication = new_client.drug_indication
indications = drug_indication.filter(molecule_chembl_id='CHEMBL25')
Identify the target by searching by name:
targets = new_client.target.filter(pref_name__icontains='EGFR')
target_id = targets[0]['target_chembl_id']
Query bioactivity data for that target:
activities = new_client.activity.filter(
target_chembl_id=target_id,
standard_type='IC50',
standard_value__lte=100
)
Extract compound IDs and retrieve details:
compound_ids = [act['molecule_chembl_id'] for act in activities]
compounds = [new_client.molecule.get(cid) for cid in compound_ids]
Get drug information:
drug_info = new_client.drug.get('CHEMBL1234')
Retrieve mechanisms:
mechanisms = new_client.mechanism.filter(molecule_chembl_id='CHEMBL1234')
Find all bioactivities:
activities = new_client.activity.filter(molecule_chembl_id='CHEMBL1234')
Find similar compounds:
similar = new_client.similarity.filter(smiles='query_smiles', similarity=80)
Get activities for each compound:
for compound in similar:
activities = new_client.activity.filter(
molecule_chembl_id=compound['molecule_chembl_id']
)
Analyze property-activity relationships using molecular properties from results.
ChEMBL supports Django-style query filters:
__exact - Exact match__iexact - Case-insensitive exact match__contains / __icontains - Substring matching__startswith / __endswith - Prefix/suffix matching__gt, __gte, __lt, __lte - Numeric comparisons__range - Value in range__in - Value in list__isnull - Null/not null checkConvert results to pandas DataFrame for analysis:
import pandas as pd
activities = new_client.activity.filter(target_chembl_id='CHEMBL203')
df = pd.DataFrame(list(activities))
# Analyze results
print(df['standard_value'].describe())
print(df.groupby('standard_type').size())
The client automatically caches results for 24 hours. Configure caching:
from chembl_webresource_client.settings import Settings
# Disable caching
Settings.Instance().CACHING = False
# Adjust cache expiration (seconds)
Settings.Instance().CACHE_EXPIRE = 86400
Queries execute only when data is accessed. Convert to list to force execution:
# Query is not executed yet
results = molecule.filter(pref_name__icontains='aspirin')
# Force execution
results_list = list(results)
Results are paginated automatically. Iterate through all results:
for activity in new_client.activity.filter(target_chembl_id='CHEMBL203'):
# Process each activity
print(activity['molecule_chembl_id'])
# Identify kinase targets
kinases = new_client.target.filter(
target_type='SINGLE PROTEIN',
pref_name__icontains='kinase'
)
# Get potent inhibitors
for kinase in kinases[:5]: # First 5 kinases
activities = new_client.activity.filter(
target_chembl_id=kinase['target_chembl_id'],
standard_type='IC50',
standard_value__lte=50
)
# Get approved drugs
drugs = new_client.drug.filter()
# For each drug, find all targets
for drug in drugs[:10]:
mechanisms = new_client.mechanism.filter(
molecule_chembl_id=drug['molecule_chembl_id']
)
# Find compounds with desired properties
candidates = new_client.molecule.filter(
molecule_properties__mw_freebase__range=[300, 500],
molecule_properties__alogp__lte=5,
molecule_properties__hba__lte=10,
molecule_properties__hbd__lte=5
)
Ready-to-use Python functions demonstrating common ChEMBL query patterns:
get_molecule_info() - Retrieve molecule details by IDsearch_molecules_by_name() - Name-based molecule searchfind_molecules_by_properties() - Property-based filteringget_bioactivity_data() - Query bioactivities for targetsfind_similar_compounds() - Similarity searchingsubstructure_search() - Substructure matchingget_drug_info() - Retrieve drug informationfind_kinase_inhibitors() - Specialized kinase inhibitor searchexport_to_dataframe() - Convert results to pandas DataFrameConsult this script for implementation details and usage examples.
Comprehensive API documentation including:
Refer to this document when detailed API information is needed or when troubleshooting queries.
data_validity_comment field in activity recordspotential_duplicate flagspchembl_value provides normalized activity (-log scale)standard_type to understand measurement type (IC50, Ki, EC50, etc.)