SciAgent-Skills

deseq2-differential-expression

Query PubChem (110M+ compounds) directly via the PUG-REST/JSON API with plain `requests` — no SDK install required. Search by name/CID/SMILES/InChIKey/formula, retrieve properties (MW, XLogP, TPSA, H-bond counts), do similarity/substructure searches with async ListKey polling, fetch synonyms, descriptions, assay summaries, and download SDF/PNG. For local cheminformatics use rdkit; for bioactivity-centric workflows use chembl-database-bioactivity.
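The property lookup in the PubChem entry above can be sketched as a plain URL builder. This is a minimal sketch, not the skill's own code: the helper name `property_url` and its defaults are illustrative, and the path layout follows the public PUG-REST pattern of `compound/<namespace>/<identifier>/property/<list>/JSON`.

```python
from urllib.parse import quote

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(identifier, namespace="name",
                 properties=("MolecularWeight", "XLogP", "TPSA")):
    """Build a PUG-REST URL that asks for the given properties as JSON.

    `property_url` is a hypothetical helper used only for illustration.
    """
    # Percent-encode the identifier so names containing spaces or slashes
    # stay inside a single path segment.
    ident = quote(str(identifier), safe="")
    return f"{PUG_BASE}/compound/{namespace}/{ident}/property/{','.join(properties)}/JSON"


url = property_url("aspirin")
print(url)
```

Passing the resulting URL to `requests.get(url, timeout=30)` performs the lookup; keeping the builder separate from the network call makes it easy to check offline.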

2026-06-13

bioservices-multi-database

Bulk RNA-seq DE with R/Bioconductor DESeq2. Negative binomial GLM, empirical Bayes shrinkage, Wald/LRT tests, multi-factor designs, Salmon tximeta import, apeglm LFC shrinkage, MA/volcano/heatmap viz. R gold standard. Use pydeseq2-differential-expression for Python; use edgeR for TMM normalization.

2026-06-10

Unified Python interface to 40+ bioinformatics web services: UniProt proteins, KEGG pathways, ChEMBL/ChEBI/PubChem, BLAST, cross-database ID mapping, GO annotations, PPI. For deep single-DB queries use dedicated tools (gget for Ensembl, pubchempy for PubChem); bioservices excels at cross-database workflows.

cbioportal-database

Cancer genomics (TCGA et al.) via cBioPortal REST API. Retrieve somatic mutations, CNAs, expression, clinical data (survival/stage/treatment) across thousands of studies. Use for TMB, oncoprints, survival analysis. For population frequencies use gnomad-database; for drug-gene interactions use opentargets-database.

cellxgene-census

Query CELLxGENE Census (61M+ cells). Search by cell type/tissue/disease/organism; get AnnData, stream out-of-core, train PyTorch models. For your own data use scanpy; for annotated data use anndata.

esm-protein-language-model

Protein language models (ESM3, ESM C) for sequence generation, structure prediction, inverse folding, and embeddings. Design novel proteins, extract ML features, or fold sequences. Local GPU or EvolutionaryScale Forge API. Use AlphaFold for traditional folding; RDKit for small molecules.

uniprot-protein-database

Query UniProt REST API: search by gene/protein name, fetch FASTA, map IDs (Ensembl, PDB, RefSeq), access Swiss-Prot annotations. Use bioservices for multi-DB access; alphafold-database-access for structures.

zarr-python

Chunked N-D arrays with compression and cloud storage. NumPy-style indexing. Backends: local, S3, GCS, ZIP, memory. Dask/Xarray integration for parallel and labeled computation. For lineage use lamindb; for labeled arrays use xarray.

molfeat-molecular-featurization

Molecular featurization hub (100+ featurizers) for ML. SMILES to fingerprints (ECFP, MACCS, MAP4), descriptors (RDKit 2D, Mordred), pretrained embeddings (ChemBERTa, GIN, Graphormer), pharmacophores. Scikit-learn compatible with parallelization/caching. For QSAR, virtual screening, similarity, and molecular DL.

smina-molecular-docking

smina molecular docking CLI. AutoDock Vina fork with customizable scoring functions, native SDF/MOL2/PDB ligand input, autoboxing, local energy minimization, and per-atom score breakdowns. Pipeline: receptor PDBQT prep -> ligand prep (RDKit/OpenBabel) -> dock via autobox or explicit grid -> rescore/minimize with custom scoring -> rank poses by affinity. Choose smina over Vina when you need custom scoring terms (--custom_scoring), local optimization of an existing pose (--local_only), per-atom contributions (--atom_term_data), or SDF/MOL2 ligands without manual PDBQT conversion. For unknown binding sites use diffdock; for the Python-bindings/Vinardo workflow use autodock-vina-docking.
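The autobox docking step in the smina pipeline above could be assembled along these lines. It is a sketch under stated assumptions: the file names are placeholders, `smina_dock_command` is a hypothetical helper, and the command is only built here, not executed.

```python
import shlex


def smina_dock_command(receptor, ligand, reference_ligand, out_path,
                       autobox_add=4.0, exhaustiveness=8, custom_scoring=None):
    """Assemble a smina command line that docks inside a box around a reference ligand."""
    cmd = [
        "smina",
        "-r", receptor,                        # prepared receptor
        "-l", ligand,                          # ligand to dock (SDF/MOL2/PDB/PDBQT)
        "--autobox_ligand", reference_ligand,  # box is derived from this ligand
        "--autobox_add", str(autobox_add),     # padding added around the box
        "--exhaustiveness", str(exhaustiveness),
        "-o", out_path,
    ]
    if custom_scoring is not None:
        # Optional file describing custom scoring terms.
        cmd += ["--custom_scoring", custom_scoring]
    return cmd


# Placeholder file names, for illustration only.
cmd = smina_dock_command("receptor.pdbqt", "ligand.sdf",
                         "crystal_ligand.sdf", "docked.sdf")
print(shlex.join(cmd))
```

Building the argument list first and handing it to `subprocess.run(cmd, check=True)` avoids shell quoting problems with unusual file names.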

lamindb-data-management

Open-source FAIR biology data framework. Version artifacts (AnnData, DataFrame, Zarr), track lineage, validate via ontologies (Bionty), query datasets. Integrates with Nextflow, Snakemake, W&B, scVI. For scRNA-seq use scanpy; for ontology lookups use bionty.

sciagent-skill-creator

Scaffold a new SciAgent-Skills entry. Picks pipeline/toolkit/database/guide template, creates skills/{category}/{name}/SKILL.md with valid frontmatter, appends the registry.yaml entry, runs validation. Enforces name uniqueness, kebab-case, description keyword rules, schema rules from CLAUDE.md. TRIGGER when user says (any language): "add a SciAgent skill", "add a skill for <X>", "create new skill", "create a SKILL.md for <X>", "scaffold a skill", "new skill entry", "register a skill", "신규 skill 추가", "스킬 만들어줘", "스킬 생성", "skill 만들어", or any request to add a new SKILL.md to this repo. ALWAYS invoke this skill BEFORE writing to skills/ or registry.yaml. DO NOT TRIGGER when: editing existing entry's content (just edit the file directly); migrating an existing entry (read CLAUDE.md "Migrating from Existing Entries" first); only updating registry.yaml without creating a new SKILL.md.

mdtraj-trajectory-analysis

mdtraj molecular dynamics trajectory analysis (Python). Reads DCD/XTC/TRR/NetCDF/H5/PDB topologies and trajectories; computes RMSD vs time, radius of gyration, per-residue RMSF, residue-residue contact frequency maps, phi/psi torsions for Ramachandran plots (general + Gly/Pro), and 8-state DSSP secondary structure. Modules: trajectory I/O, geometry (distances/angles/dihedrals), structural analysis (RMSD/Rg/RMSF/SASA), contacts, hydrogen bonds, secondary structure (DSSP), NMR observables. For broader atom-selection grammar use mdanalysis-trajectory; for running MD simulations use OpenMM/GROMACS.

seaborn-statistical-plots

Statistical visualization on matplotlib with native pandas support. Auto aggregation, CIs, grouping for distributions (histplot, kdeplot), categorical (boxplot, violinplot), relational (scatterplot, lineplot), regression (regplot, lmplot), matrix (heatmap, clustermap), grids (pairplot, FacetGrid). Use for quick statistical summaries; matplotlib for fine control; plotly for interactive HTML.

benchling-integration

Benchling R&D Python SDK: CRUD on registry entities (DNA, RNA, proteins, custom), inventory, ELN, workflow automation. Needs Benchling account and API key. Use biopython for local sequence analysis; pubchem for chemical DBs.

protocolsio-integration

protocols.io REST API: search and fetch wet-lab, bioinformatics, and clinical protocols by keyword, DOI, or category, with steps, reagents, materials, equipment, timing. Public access free; auth needed for private or publishing. Pair with opentrons-protocol-api or benchling-integration to execute.

networkx-graph-analysis

Graph and network analysis toolkit. Four graph classes (Graph, DiGraph, MultiGraph, MultiDiGraph), centrality, shortest paths, community detection, generators, I/O (GraphML, GML, edge list), matplotlib viz. For large graphs (100K+ nodes) use igraph or graph-tool; for GNNs use PyG.

latex-research-posters

Research posters in LaTeX using beamerposter, tikzposter, or baposter. Layout, typography, color schemes, figure integration, accessibility, and QA for conferences. Includes templates. For figure generation use matplotlib-scientific-plotting or plotly-interactive-plots.

scientific-manuscript-writing

technical-writers

Scientific manuscript writing: IMRAD, citation styles (APA/AMA/Vancouver/IEEE), figures/tables, reporting guidelines (CONSORT/STROBE/PRISMA/ARRIVE), writing principles (clarity/conciseness/accuracy), venue-specific style. For LaTeX see companion assets.

scientific-slides

Scientific presentations for conferences, seminars, thesis defenses, and grant pitches. Slide design, talk structure, timing, data viz for slides, QA. PowerPoint and LaTeX Beamer. For posters use latex-research-posters.

interpro-database

Query InterPro REST API for protein domain architecture, family classification, and member-DB integration. Search entries, retrieve a protein's domains, list family members, get taxonomic distribution, link to PDB. Unifies Pfam, PANTHER, PIRSF, PRINTS, PROSITE, SMART, CDD, NCBIfam. Use uniprot-protein-database for sequences; pdb-database for 3D structures.

metabolomics-workbench-database

Query Metabolomics Workbench REST API (4,200+ NIH studies) for metabolite ID, study discovery, RefMet standardization, m/z precursor searches, and gene/protein annotations. Quirks: compound input_item rejects `name` (use pubchem_cid/kegg_id/inchi_key/etc.); free-text → compound is a two-step refmet/match→refmet/name flow; moverz endpoint returns TSV text, not JSON. Use hmdb-database for local XML; pubchem-compound-search for general compound lookup.
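Two of the quirks listed in the Metabolomics Workbench entry above, the rejected `name` input item and the TSV body from moverz, can be guarded against in client code. The sketch below assumes the REST path layout `<context>/<input_item>/<input_value>/<output_item>` and uses a made-up TSV sample; the helper names and sample column names are hypothetical.

```python
import csv
import io

REST_BASE = "https://www.metabolomicsworkbench.org/rest"  # assumed base URL


def compound_url(input_item, input_value, output_item="all"):
    """Build a compound-context URL, refusing the input item the entry says is rejected."""
    if input_item == "name":
        raise ValueError(
            "compound lookups reject 'name'; resolve free text through RefMet first"
        )
    return f"{REST_BASE}/compound/{input_item}/{input_value}/{output_item}"


def parse_tsv(text):
    """Parse a tab-separated response body into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


print(compound_url("pubchem_cid", 5281365))

# Hypothetical TSV body; the real moverz column names may differ.
sample = "query_mz\tname\n635.52\texample-lipid\n"
rows = parse_tsv(sample)
print(rows[0]["name"])
```

Failing early on `name` keeps the error next to the mistake instead of surfacing as an opaque server response.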

pride-database

Search the PRIDE Archive v3 REST API for proteomics datasets: discover projects by keyword + faceted filters (organism, instrument, disease, software), fetch project metadata, list and download RAW/PEAK/RESULT/FASTA files (with FTP/Aspera URLs), look up which projects mention a UniProt accession, and find similar projects. PRIDE v3 no longer exposes peptide/PSM-level identification endpoints — for spectrum-level data download the project's RESULT files. Use uniprot-protein-database for protein sequences; interpro-database for domain architecture.

biorxiv-database

Query bioRxiv/medRxiv preprints via REST API. Search by DOI, category, or date range; retrieve metadata (title, abstract, authors, category, DOI, version history) and PDFs. No auth. For peer-reviewed biomedical use pubmed-database; broader scholarly search use openalex-database.

openalex-database

Query OpenAlex REST API for 250M+ scholarly works, authors, institutions, journals, concepts. Search by keyword, author, DOI, ORCID, or ID; filter by year, OA, citations, field; retrieve citations, references, author disambiguation. Free, no auth. For PubMed use pubmed-database; preprints use biorxiv-database.
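The year, open-access, and citation filters mentioned in the OpenAlex entry above combine into a single comma-separated `filter` parameter. A minimal sketch, assuming the `/works` endpoint and the filter keys `publication_year`, `is_oa`, and `cited_by_count`; `works_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

OPENALEX_WORKS = "https://api.openalex.org/works"


def works_url(search=None, year=None, open_access=None, min_citations=None, per_page=25):
    """Build a works query; filters are joined with commas, which OpenAlex treats as AND."""
    filters = []
    if year is not None:
        filters.append(f"publication_year:{year}")
    if open_access is not None:
        filters.append(f"is_oa:{str(open_access).lower()}")
    if min_citations is not None:
        filters.append(f"cited_by_count:>{min_citations}")

    params = {}
    if search:
        params["search"] = search
    if filters:
        params["filter"] = ",".join(filters)
    params["per-page"] = per_page
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


url = works_url(search="single cell atlas", year=2023, open_access=True, min_citations=50)
# Decode the query string again to show what the server receives.
decoded = parse_qs(urlsplit(url).query)
print(decoded["filter"][0])
```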

chembl-database-bioactivity

Query ChEMBL (2M+ compounds, 19M+ bioactivity measurements, 13K+ targets) via the public REST/JSON API with plain `requests` — no SDK install required. Search compounds, retrieve IC50/Ki/EC50 bioactivities, find target inhibitors, run SAR, access drug mechanism/indication data.

emdb-database

Look up EMDB cryo-EM density maps and fitted atomic models via the entry REST API + EBI Search WS. Fetch entry metadata (resolution, method, organism, sample), map download URLs, fitted PDB IDs, and citations. Keyword search via EBI Search. No auth. For atomic coordinates use pdb-database; for AlphaFold predictions use alphafold-database-access.

gtopdb-database

Query IUPHAR/BPS Guide to Pharmacology (GtoPdb) for receptor-ligand interactions, target/ligand metadata, families, and approved drugs. Affinities (pKi/pIC50/pKd), action (Agonist/Antagonist/etc.), species, structures (SMILES/InChI). No auth. Always resolve targets via geneSymbol/accession; most metadata lives in sub-resources (/databaseLinks, /structure, /synonyms).

opentargets-database

Query Open Targets GraphQL API for target-disease associations, evidence, drug links, safety. Search targets by gene, diseases by EFO ID; scores from 20+ sources, drug mechanisms, tractability. For ChEMBL use chembl-database-bioactivity; for trials use clinicaltrials-database-search.

pdb-database

Query RCSB PDB (200K+ structures) via the public REST + GraphQL APIs with plain `requests` (no SDK). Search by text, attribute, sequence, or 3D structure similarity (Search API); retrieve metadata via GraphQL (Data API); download PDB/mmCIF from files.rcsb.org. For AlphaFold predictions use alphafold-database-access; for protein sequences only use uniprot-protein-database.

unichem-database

Cross-reference compound IDs across 20+ databases (ChEMBL, DrugBank, PubChem, ChEBI, PDB, SureChEMBL, HMDB, DrugCentral, BindingDB) via UniChem REST API. Resolve InChIKeys to source IDs, translate between source-specific IDs, find structurally related compounds by connectivity. POST with a JSON body for all cross-reference queries; only /sources is GET. No auth required.
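Because the UniChem entry above requires POST with a JSON body for cross-reference queries, a request might be prepared as follows. This is a sketch only: the `/compounds` path and the body fields `type` and `compound` are assumptions about the v1 API, and the request is constructed but never sent.

```python
import json
from urllib.request import Request

UNICHEM_API = "https://www.ebi.ac.uk/unichem/api/v1"  # assumed API root


def compounds_request(inchikey):
    """Prepare (but do not send) a POST that looks up source IDs for an InChIKey."""
    body = json.dumps({"type": "inchikey", "compound": inchikey}).encode("utf-8")
    return Request(
        f"{UNICHEM_API}/compounds",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = compounds_request("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")  # aspirin
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Sending it with `urllib.request.urlopen(req, timeout=30)` would perform the lookup; only the `/sources` listing is a plain GET, per the entry.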

western-blot-quantification

biochemists-and-biophysicists

Protocols and best practices for western blot quantification and analysis including band detection, normalization, and statistical methods.

biochemists-and-biophysicists

sgrna-design-guide

Three-tiered sgRNA design guide for CRISPR experiments: validated Addgene sequences, CRISPick pre-computed datasets, or de novo design rules.

biochemists-and-biophysicists

sar-analysis

Structure-activity relationship (SAR) analysis guide for drug discovery including molecular descriptor analysis, scaffold analysis, and activity cliff detection.

biological-scientists-all-other

omics-analysis-guide

Three-tiered approach to omics data analysis (transcriptomics, proteomics): validated pipelines, standard workflows, and custom methods.

scientific-literature-search

computer-science-teachers-postsecondary

biological-science-teachers-postsecondary

Systematic strategies for searching scientific literature across PubMed, arXiv, Google Scholar, and AI-assisted tools. Covers PICO framework for clinical questions, three-tiered search (database-specific, AI-assisted, content extraction), PubMed field tags and MeSH, boolean query construction, and full-text extraction. Use when planning a literature search or choosing a search tier.

2026-04-30

pymc-bayesian-modeling

statisticians-152041

Bayesian modeling with PyMC 5: priors, likelihood, NUTS/ADVI sampling, diagnostics (R-hat, ESS), LOO/WAIC comparison, prediction. Hierarchical, logistic, GP variants; predictive checks.

scikit-survival-analysis

Time-to-event modeling with scikit-survival: Cox PH (elastic net), Random Survival Forests, Boosting, SVMs for censored data. C-index, Brier, time-dependent AUC; Kaplan-Meier, Nelson-Aalen, competing risks. Pipeline/GridSearchCV compatible. Use statsmodels for frequentist, pymc for Bayesian, lifelines for parametric.

statistical-analysis

Guided statistical analysis: test choice, assumption checks, effect sizes, power, APA reporting. Pick tests, verify assumptions, or format results for publication. Covers frequentist (t-test, ANOVA, chi-square, regression, correlation, survival, count, reliability) and Bayesian. Use statsmodels or pymc-bayesian-modeling to fit.

statsmodels-statistical-modeling

Python statistical modeling: regression (OLS, WLS, GLM), discrete (Logit, Poisson, NegBin), time series (ARIMA, SARIMAX, VAR), with rigorous inference, diagnostics, and hypothesis tests. Use scikit-learn for ML; statistical-analysis for test choice.