singlem
// Use SingleM to profile metagenomes and genomes, generate OTU tables, and produce GTDB-based taxonomic profiles.
| name | singlem |
| description | Use SingleM to profile metagenomes and genomes, generate OTU tables, and produce GTDB-based taxonomic profiles. |
SingleM is a tool for profiling shotgun metagenomes (short- and long-read) by targeting 20 amino acid "window" sequences within single-copy marker genes. It generates GTDB-based taxonomic profiles and is particularly strong at handling novel lineages.
The primary subcommand for taxonomic profiling is singlem pipe.
Skill corresponds to SingleM v0.21.3.
conda create -c conda-forge -c bioconda --override-channels \
--name singlem 'singlem>=0.21.3'
conda activate singlem
# Download reference data (metapackage) — required after conda install
singlem data --output-directory /path/to/metapackage
docker pull wwood/singlem:0.21.3
# Run pipe directly:
docker run -v `pwd`:`pwd` wwood/singlem:0.21.3 pipe \
--sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4
singularity pull docker://wwood/singlem:0.21.3
singularity run -B `pwd`:`pwd` singlem_0.21.3.sif pipe \
--sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4
- OTU table: the primary output of singlem pipe. Contains per-marker-gene OTU sequences with their coverage/abundance across samples.
- Taxonomic profile: produced by the condense algorithm, which uses trimmed means and expectation maximisation across 59 marker genes.
- Coverage: reported for each OTU alongside num_hits. The default minimum coverage to report in a taxonomic profile is 0.35× for reads, 0.1× for genomes.
- Taxonomy: GTDB-style strings (Root; d__Bacteria; p__Proteobacteria; ...).

SingleM (and its phage-focused sibling Lyrebird) is a suite of subcommands. Most users only need pipe (and data once, to fetch reference data). The rest support downstream analysis, reference-data management, and package development.
| Subcommand | Purpose |
|---|---|
| singlem pipe | Main workflow: profile reads/genomes → OTU table + GTDB taxonomic profile |
| singlem data | Download / verify the reference metapackage |
| singlem summarise | Mechanical transformations of pipe results (Krona, species-by-site tables, combining OTU tables, etc.) |
| singlem renew | Re-run taxonomy assignment on an existing archive OTU table against a new metapackage |
| singlem supplement | Add new genomes to a metapackage to create a custom reference |
| singlem prokaryotic_fraction | Estimate the bacterial/archaeal fraction (and average genome size) of a metagenome |
| singlem appraise | Assess how much of a metagenome is represented by a set of genomes/assemblies |
| lyrebird data | Download / verify the Lyrebird (phage) reference metapackage |
| lyrebird pipe | Profile dsDNA phages, with the same interface as singlem pipe |
| Subcommand | Purpose |
|---|---|
| singlem condense | Generate a taxonomic profile from an existing (archive) OTU table |
| singlem makedb | Build a searchable database (.sdb) from OTU tables |
| singlem query | Find sequences in a makedb database similar to query OTU sequences |
| singlem seqs | Choose the best window position within an HMM (step 1 of building a SingleM package) |
| singlem create | Create a SingleM package from a GraftM package + taxonomy (step 2 of package building) |
| singlem regenerate | Update an existing SingleM package with new sequences/taxonomy |
| singlem metapackage | Create (or --describe) a metapackage from individual SingleM packages |
| lyrebird condense | condense for Lyrebird (non-universal phage markers) |
| lyrebird renew | renew for Lyrebird archive OTU tables |
singlem pipe \
--forward sample_R1.fastq.gz \
--reverse sample_R2.fastq.gz \
--taxonomic-profile sample.profile.tsv \
--threads 8
singlem pipe \
--sequences sample.fastq.gz \
-p sample.profile.tsv \
--threads 8
(-p is the short form of --taxonomic-profile)
singlem pipe \
--sequences sample_nanopore.fastq.gz \
-p sample.profile.tsv \
--threads 8
Long reads use the same interface; SingleM auto-detects read length.
singlem pipe \
--forward S1_R1.fq.gz S2_R1.fq.gz \
--reverse S1_R2.fq.gz S2_R2.fq.gz \
--otu-table all_samples.otu_table.csv \
--taxonomic-profile all_samples.profile.tsv \
--threads 16
For more than 100 samples, run each sample individually and combine the resulting OTU tables with singlem summarise.
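The per-sample approach can be sketched as a small command generator. This is only an illustration under stated assumptions: the helper names `print_pipe_commands` and `print_combine_command`, the `<sample>_R1.fq.gz` / `<sample>_R2.fq.gz` naming scheme, and the output file layout are hypothetical and not part of SingleM. Only flags already shown in this document are used. The helpers print commands instead of running them, so the list can be reviewed first.

```bash
# Hypothetical helpers (not part of SingleM): print, but do not run,
# one `singlem pipe` command per read pair, then one combine command.
# Assumes reads are named <sample>_R1.fq.gz / <sample>_R2.fq.gz and
# that paths contain no whitespace.

print_pipe_commands() {
  reads_dir=$1   # directory holding the paired FASTQ files
  out_dir=$2     # directory where per-sample OTU tables should go
  for r1 in "$reads_dir"/*_R1.fq.gz; do
    [ -e "$r1" ] || continue                 # no matches: skip the literal glob
    r2="${r1%_R1.fq.gz}_R2.fq.gz"            # derive the mate file name
    sample=$(basename "$r1" _R1.fq.gz)       # sample name without the suffix
    printf 'singlem pipe --forward %s --reverse %s --otu-table %s/%s.otu_table.csv --threads 8\n' \
      "$r1" "$r2" "$out_dir" "$sample"
  done
}

print_combine_command() {
  out_dir=$1     # directory holding the per-sample OTU tables
  printf 'singlem summarise --input-otu-tables %s/*.otu_table.csv --output-otu-table combined.otu_table.csv\n' \
    "$out_dir"
}
```

One possible use is `print_pipe_commands reads otu_tables > commands.sh`, followed by `print_combine_command otu_tables >> commands.sh`, then running the file with `sh commands.sh` once the commands look right.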
# Single genome
singlem pipe \
--genome-fasta-files genome.fna \
-p genome.profile.tsv
# Many genomes from a directory
singlem pipe \
--genome-fasta-directory /path/to/genomes/ \
--genome-fasta-extension fna \
-p genomes.profile.tsv \
--threads 16
# From a file listing genome paths
singlem pipe \
--genome-fasta-list genomes.txt \
-p genomes.profile.tsv \
--threads 16
Genome mode uses different defaults: a lower --min-taxon-coverage (0.1, versus 0.35 for reads) and a higher --min-orf-length (300 bp).
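A list file for --genome-fasta-list can be built with standard tools. The sketch below assumes the list is plain text with one genome path per line; that format, and the helper name `make_genome_list`, are assumptions for illustration and are not part of SingleM.

```bash
# Hypothetical helper (not part of SingleM): write the path of every file
# with the given extension under a directory to a list file,
# one path per line, sorted for reproducibility.
make_genome_list() {
  genome_dir=$1        # directory to search (recursively)
  extension=${2:-fna}  # file extension without the dot
  list_file=$3         # output list file
  find "$genome_dir" -type f -name "*.${extension}" | sort > "$list_file"
}
```

For example, `make_genome_list /path/to/genomes fna genomes.txt` writes the list that the --genome-fasta-list example above reads.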
OTU table (--otu-table)
Useful for alpha/beta diversity metrics, ordination, and inspecting raw data (e.g. which marker genes fired, which OTU sequences were found). Compatible with singlem summarise and singlem appraise.
singlem pipe \
--forward sample_R1.fastq.gz \
--reverse sample_R2.fastq.gz \
--otu-table sample.otu_table.csv \
--taxonomic-profile sample.profile.tsv \
--threads 8
Save an archive OTU table (--archive-otu-table): recommended for long-term archiving
The archive OTU table stores additional information (full sequence context, alignment data) needed to regenerate results without re-running the pipeline. It is the right format for two important downstream modes:
- singlem condense: re-derive the taxonomic profile from the archive OTU table (e.g. with different --min-taxon-coverage settings) without re-running pipe
- singlem renew: re-assign taxonomy against an updated metapackage without re-running pipe

singlem pipe \
--forward sample_R1.fastq.gz \
--reverse sample_R2.fastq.gz \
--archive-otu-table sample.archive.otu_table.json.gz \
--taxonomic-profile sample.profile.tsv \
--threads 8
# Later: re-derive profile with different coverage threshold
singlem condense \
--input-archive-otu-tables sample.archive.otu_table.json.gz \
--taxonomic-profile sample_recondensed.profile.tsv \
--min-taxon-coverage 0.1
# Later: re-assign taxonomy with a newer metapackage
singlem renew \
--input-archive-otu-table sample.archive.otu_table.json.gz \
--taxonomic-profile sample_updated.profile.tsv \
--metapackage /path/to/new_metapackage
| Option | Description | Default |
|---|---|---|
| --forward / -1 / --reads / --sequences | Forward or unpaired reads (FASTA/FASTQ, gzipped ok) | required |
| --reverse / -2 | Reverse reads for paired-end | — |
| --taxonomic-profile / -p | Output taxonomic profile (TSV) | not set |
| --otu-table | Output OTU table (CSV) | not set |
| --threads | Number of CPU threads | 1 |
| --metapackage | Path to reference metapackage | default system metapackage |
| --min-taxon-coverage | Min coverage to report in profile | 0.35 (reads), 0.1 (genomes) |
| --assignment-method | Taxonomy assignment algorithm for OTUs | smafa_naive_then_diamond |
| --genome-fasta-files | Input genome FASTA(s) | — |
| --genome-fasta-directory / -d | Directory of genome FASTAs | — |
| --genome-fasta-extension | Extension for genome FASTAs | fna |
| --genome-fasta-list | File listing genome paths | — |
Taxonomic profile (-p / --taxonomic-profile): SingleM condensed format
Tab-separated file (.tsv) with three columns: sample, coverage, taxonomy.
sample coverage taxonomy
marine0.1 3.64 Root; d__Archaea
marine0.1 0.02 Root; d__Bacteria
marine0.1 0.56 Root; d__Archaea; p__Thermoproteota
marine0.1 0.80 Root; d__Bacteria; p__Desulfobacterota
marine0.1 2.17 Root; d__Bacteria; p__Proteobacteria
Key properties of the condensed format:
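- Coverage is not cumulative: Root; d__Bacteria (coverage 0.02) does not include the coverage from p__Desulfobacterota (0.80) or p__Proteobacteria (2.17); those are reported on their own lines.
- Taxonomy strings list every rank from the root down, e.g. Root; d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; ...
- When several samples are profiled together, they share one file and are distinguished by the sample column.

OTU table (--otu-table)
CSV with columns: gene, sample, sequence, num_hits, coverage, taxonomy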
- Use --genome-fasta-files for assemblies/MAGs; --sequences/--forward is for raw reads only.
- After a conda install, run singlem data before using pipe. Docker images include reference data.

singlem summarise
singlem summarise transforms the condensed profile into several more analysis-friendly formats.
singlem summarise \
--input-taxonomic-profile sample.profile.tsv \
--output-taxonomic-profile-krona sample.krona.html
Produces an interactive hierarchical chart viewable in any web browser. Can also be generated directly from pipe with --taxonomic-profile-krona.
Outputs a taxon-by-sample matrix with relative abundance as percentages. Use --output-species-by-site-level to choose the taxonomic rank (domain, phylum, class, order, family, genus, or species):
singlem summarise \
--input-taxonomic-profile sample.profile.tsv \
--output-species-by-site-relative-abundance sample.phylum.csv \
--output-species-by-site-level phylum
Example output (one column per sample when multiple samples are present):
taxonomy marine0.1
unassigned 50.9
Root; d__Archaea; p__Thermoproteota 7.79
Root; d__Bacteria; p__Desulfobacterota 11.13
Root; d__Bacteria; p__Proteobacteria 30.18
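The percentages in this example can be cross-checked by hand from the condensed profile shown earlier. The awk sketch below is an illustration only, not a description of how singlem summarise is implemented. It assumes a single-sample, tab-separated profile with a header line, treats coverage assigned above phylum level as "unassigned", and credits coverage at deeper ranks to the phylum it belongs to. The helper name `phylum_relative_abundance` is hypothetical.

```bash
# Hypothetical helper (not part of SingleM): recompute phylum-level
# relative abundance (%) from a single-sample condensed profile.
phylum_relative_abundance() {
  awk -F'\t' '
    NR == 1 { next }                          # skip the header line
    {
      total += $2                             # total coverage of the sample
      n = split($3, ranks, "; ")              # ranks[1] is "Root"
      if (n < 3) {
        unassigned += $2                      # Root or domain only
      } else {
        key = ranks[1] "; " ranks[2] "; " ranks[3]
        if (!(key in cov)) order[++count] = key   # remember first-seen order
        cov[key] += $2                        # credit deeper ranks to the phylum
      }
    }
    END {
      if (total == 0) exit
      printf "unassigned\t%.2f\n", 100 * unassigned / total
      for (i = 1; i <= count; i++)
        printf "%s\t%.2f\n", order[i], 100 * cov[order[i]] / total
    }
  ' "$1"
}
```

With the marine0.1 profile the total coverage is 7.19, so Thermoproteota comes to 0.56 / 7.19 = 7.79% and the unassigned row to (3.64 + 0.02) / 7.19 = 50.90%, matching the table above.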
To generate tables for all taxonomic levels at once, use a prefix:
singlem summarise \
--input-taxonomic-profile sample.profile.tsv \
--output-species-by-site-relative-abundance-prefix myprefix
# produces: myprefix-domain.tsv, myprefix-phylum.tsv, ..., myprefix-species.tsv
singlem summarise \
--input-taxonomic-profile sample.profile.tsv \
--output-taxonomic-profile-with-extras sample.with_extras.tsv
Adds full_coverage (coverage including descendants), relative_abundance (%), and level columns:
sample coverage full_coverage relative_abundance level taxonomy
marine0.1 0 7.19 100.0 root Root
marine0.1 3.64 4.20 58.41 domain Root; d__Archaea
marine0.1 0.02 2.99 41.59 domain Root; d__Bacteria
marine0.1 0.56 0.56 7.79 phylum Root; d__Archaea; p__Thermoproteota
marine0.1 0.80 0.80 11.13 phylum Root; d__Bacteria; p__Desulfobacterota
marine0.1 2.17 2.17 30.18 phylum Root; d__Bacteria; p__Proteobacteria
Note: coverage here is unfilled (not including descendants); full_coverage is filled (sum of a taxon and all its descendants).
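The relationship between unfilled and filled coverage can be reproduced directly from a condensed profile. The sketch below sums a taxon's own coverage and that of every descendant line. It assumes a single-sample, tab-separated profile with a header line; the helper name `filled_coverage` is hypothetical and not part of SingleM.

```bash
# Hypothetical helper (not part of SingleM): filled coverage of one taxon,
# i.e. its own coverage plus the coverage of all of its descendants.
filled_coverage() {
  # $1 = condensed profile (TSV), $2 = full taxonomy string of the taxon
  awk -F'\t' -v taxon="$2" '
    NR == 1 { next }                              # skip the header line
    # a line counts if it is the taxon itself or starts with "<taxon>; "
    $3 == taxon || index($3, taxon "; ") == 1 { sum += $2 }
    END { printf "%.2f\n", sum }
  ' "$1"
}
```

For the marine0.1 profile, `filled_coverage sample.profile.tsv "Root; d__Bacteria"` adds 0.02 + 0.80 + 2.17 and prints 2.99, the full_coverage value in the table above.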
singlem pipe \
--forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
-p sample.profile.tsv --threads 8
singlem prokaryotic_fraction \
--forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
-p sample.profile.tsv \
> sample.prokaryotic_fraction.tsv
Requires that the original run saved an --archive-otu-table.
singlem renew \
--input-archive-otu-table sample.archive.otu_table.json.gz \
--taxonomic-profile sample_updated.profile.tsv \
--metapackage /path/to/new_metapackage
renew also accepts --assignment-method, --threads, and --min-taxon-coverage, just like pipe.
singlem summarise \
--input-otu-tables s1.otu_table.csv s2.otu_table.csv s3.otu_table.csv \
--output-otu-table combined.otu_table.csv
singlem appraise
appraise compares OTU sequences from genomes and/or assemblies against those from the raw metagenome, reporting which lineages are represented and which are missing.
singlem pipe --sequences raw.fq.gz --otu-table metagenome.otu_table.csv
singlem pipe --genome-fasta-files my-genomes/*.fasta --otu-table genomes.otu_table.csv
singlem appraise \
--metagenome-otu-tables metagenome.otu_table.csv \
--genome-otu-tables genomes.otu_table.csv
Useful extras:
- --assembly-otu-tables: appraise an assembly alongside (or instead of) binned genomes.
- --imperfect: match OTU sequences that are similar but not identical (e.g. to credit a genus-level representative); tune with --sequence-identity.
- --plot appraise.svg: render the appraisal visually (one rectangle per OTU sequence, sized by abundance).
- --output-binned-otu-table / --output-unbinned-otu-table / --output-unaccounted-for-otu-table: write OTU tables of the represented vs. missing populations.

These subcommands support custom reference data and lower-level analyses. Most users never need them.
singlem supplement
Creates a new metapackage that includes your genomes, so future pipe runs can identify them. Taxonomy for the new genomes is assigned with GTDB-Tk (installed separately, with a version matching the metapackage's GTDB release) unless supplied via --taxonomy-file or --new-fully-defined-taxonomies.
singlem supplement \
--new-genome-fasta-files genome1.fna genome2.fna \
--input-metapackage /path/to/metapackage \
--output-metapackage supplemented.smpkg \
--checkm2-quality-file checkm2_quality.tsv \
--dereplicate-with-galah \
--threads 8
A dereplication mode is required: either --dereplicate-with-galah (run galah at species level) or --no-dereplication (inputs are already dereplicated). A quality-filtering choice is also required: pass CheckM2 results with --checkm2-quality-file, or skip with --no-quality-filter (and optionally --no-taxon-genome-lengths if no CheckM2 file is supplied).
singlem makedb / singlem query
Useful for asking "is this OTU sequence (or anything similar) present in samples B, C, D?". .sdb is the conventional database extension.
# Build a database from OTU tables
singlem makedb \
--otu-tables B.otu_table.csv C.otu_table.csv D.otu_table.csv \
--db BCD.sdb
# Find database sequences within a given divergence of query OTUs
singlem query \
--db BCD.sdb \
--query-otu-table A.otu_table.csv \
--max-divergence 3
query can also dump database contents filtered by sample (--sample-names), by taxonomy (--taxonomy Archaea), or in full (--dump).
singlem condense
condense turns an archive OTU table into a taxonomic profile. It is normally invoked implicitly by pipe's -p / --taxonomic-profile, but can be run standalone, for example to recompute a profile with a different --min-taxon-coverage without re-running pipe. See "Save an archive OTU table" under Output Options for an example.
singlem metapackage
Assemble individual SingleM packages (.spkg) into a metapackage, or inspect an existing one with --describe.
# Describe the contents of an existing metapackage
singlem metapackage --metapackage /path/to/metapackage --describe
# Create a metapackage from individual packages
singlem metapackage \
--singlem-packages pkg1.spkg pkg2.spkg \
--metapackage new.smpkg \
--nucleotide-sdb markers.sdb
singlem seqs → create → regenerate
Building a marker package is a multi-step expert workflow:
1. singlem seqs: given an HMM-aligned FASTA, choose the best (most conserved) window position.
2. singlem create: finalise a SingleM package from a GraftM package, a taxonomy file, and the window position from seqs.
3. singlem regenerate: update an existing SingleM package with new sequences/taxonomy without rebuilding from scratch.

# 1. Choose the window position within the HMM
singlem seqs --alignment aligned.fasta --alignment-type aa --hmm marker.hmm
# 2. Create the package using the hmm-position reported by step 1
singlem create \
--input-graftm-package marker.gpkg \
--input-taxonomy marker_taxonomy.tsv \
--hmm-position 25 \
--target-domains Bacteria Archaea \
--gene-description "Ribosomal protein S2" \
--output-singlem-package marker.spkg
--gene-description is required — it is the free-form text shown by singlem metapackage --describe.
For dsDNA phage profiling, use the lyrebird command with the same interface:
# Download lyrebird reference data
lyrebird data --output-directory /path/to/lyrebird_metapackage
lyrebird pipe \
--forward sample_R1.fq.gz \
--reverse sample_R2.fq.gz \
-p sample.phage_profile.tsv \
--threads 8
Lyrebird uses >500 phage marker genes and vConTACT3-based taxonomy (not GTDB).
Lyrebird also provides condense and renew for archive OTU tables, mirroring their SingleM counterparts but using a Lyrebird metapackage. Save an archive OTU table from lyrebird pipe with --archive-otu-table to use them:
# Re-derive a phage profile from an archive OTU table
lyrebird condense \
--input-archive-otu-table sample.archive.otu_table.json.gz \
-p sample.phage_profile.tsv
# Re-assign phage taxonomy against an updated Lyrebird metapackage
lyrebird renew \
--input-archive-otu-table sample.archive.otu_table.json.gz \
-p sample.updated.phage_profile.tsv \
--metapackage /path/to/new_lyrebird_metapackage
# 1. Download reference data (once, after conda install)
singlem data --output-directory ~/singlem_metapackage
# 2. Profile paired-end metagenome (save archive OTU table for future re-use)
singlem pipe \
--forward sample_R1.fq.gz \
--reverse sample_R2.fq.gz \
--archive-otu-table sample.archive.otu_table.json.gz \
--taxonomic-profile sample.profile.tsv \
--threads 16
# 3. View profile
cat sample.profile.tsv
# 4. Convert to Krona chart
singlem summarise \
--input-taxonomic-profiles sample.profile.tsv \
--output-taxonomic-profile-krona sample.krona.html
If you use SingleM, please cite:
Ben J. Woodcroft et al. Comprehensive taxonomic identification of microbial species in metagenomic data using SingleM and Sandpiper. Nat Biotechnol (2025). https://doi.org/10.1038/s41587-025-02738-1