تشغيل أي مهارة في Manus بنقرة واحدة

bio-workflows-edna-pipeline

النجوم٩٤٣

التفرعات١٦٥

آخر تحديث٢٠ يونيو ٢٠٢٦ في ٠١:٤٨

End-to-end eDNA metabarcoding from raw amplicons to community ecology. Covers QC, primer removal (mandatory before DADA2 filterAndTrim), denoising with OBITools3 v3 (obi stats plural; DMS-based) or DADA2 ASVs (Callahan 2017), decontam combined method as screening-not-classifier (Davis 2018), tag-jumping with NovaSeq 10x MiSeq caveat (Schnell 2015), Hill-number effective species counts with coverage-based rarefaction (Jost 2006; Chao & Jost 2012; doubling rule), beta-diversity decomposition with MANDATORY PERMANOVA + PERMDISP pair (Anderson & Walsh 2013), constrained ordination, and the read-counts-not-abundance critique (Lamb 2019). Use when processing eDNA samples for biodiversity assessment, deciding ASV vs OTU, configuring OBITools3 v3, interpreting decontam screening, or reporting community comparisons with the dispersion confound check.

التثبيت

التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.

تشغيل في Manus

المصدر

GPTomics

GPTomics/bioSkills

فتح مستودع GitHub عرض مستودعات المنشئ

تنزيل

تشغيل في Manus

المهن ذات الصلةSOC

استنادا إلى تصنيف SOC المهني

مطوّرو البرمجياتمهن الحاسوب والرياضيات·SOC 15-1252

مستكشف الملفات

4 ملفات

SKILL.md

readonly

المزيد من هذا المستودع

نفس المستودع

bio-ribo-seq-initiation-site-mapping

GPTomics/bioSkills

Map translation initiation sites, including non-AUG and alternative starts, from initiation-drug ribosome profiling (TI-seq). Use when locating start codons, detecting near-cognate or upstream initiation, or analyzing harringtonine, lactimidomycin (GTI-seq/QTI-seq), or retapamulin (Ribo-RET) data.

2026-06-20943

bio-ribo-seq-orf-detection

GPTomics/bioSkills

Detect and quantify translated ORFs from Ribo-seq using 3-nucleotide periodicity, including uORFs, internal ORFs, dORFs, and novel ORFs. Use when finding actively translated regions beyond annotated CDS, classifying ORFs by the 2022 community standard, quantifying ORF-level translation, or choosing between periodicity-based callers.

2026-06-20943

bio-ribo-seq-riboseq-preprocessing

GPTomics/bioSkills

Preprocess ribosome profiling reads with UMI handling, adapter trimming, contaminant/rRNA depletion, and footprint-aware alignment. Use when preparing Ribo-seq FASTQ for periodicity QC, ORF detection, translation efficiency, or stalling analysis, or when deciding how to deduplicate, which aligner to use, or how to size-select ribosome-protected fragments.

2026-06-20943

bio-ribo-seq-ribosome-periodicity

GPTomics/bioSkills

Validate Ribo-seq library quality by measuring 3-nucleotide periodicity and calibrating read-length-specific P-site offsets. Use when checking whether footprints capture genuine translation, determining P-site offsets for downstream ORF/TE/stalling analysis, or deciding which read lengths to keep.

2026-06-20943

bio-ribo-seq-ribosome-stalling

GPTomics/bioSkills

Detect ribosome pausing and stalling at codon resolution from Ribo-seq, using local-relative occupancy metrics and A-site assignment. Use when studying elongation dynamics, codon dwell times, pause motifs, or ribosome collisions, and when judging whether a pause is real biology or a cycloheximide artifact.

2026-06-20943

bio-ribo-seq-translation-efficiency

GPTomics/bioSkills

Quantify translation efficiency (TE) as ribosome occupancy relative to mRNA abundance and test for differential TE between conditions. Use when separating translational from transcriptional regulation, distinguishing genuine translational control from buffering, or choosing between riborex, Xtail, anota2seq, and DESeq2 interaction models.

2026-06-20943

name	bio-workflows-edna-pipeline
description	End-to-end eDNA metabarcoding from raw amplicons to community ecology. Covers QC, primer removal (mandatory before DADA2 filterAndTrim), denoising with OBITools3 v3 (obi stats plural; DMS-based) or DADA2 ASVs (Callahan 2017), decontam combined method as screening-not-classifier (Davis 2018), tag-jumping with NovaSeq 10x MiSeq caveat (Schnell 2015), Hill-number effective species counts with coverage-based rarefaction (Jost 2006; Chao & Jost 2012; doubling rule), beta-diversity decomposition with MANDATORY PERMANOVA + PERMDISP pair (Anderson & Walsh 2013), constrained ordination, and the read-counts-not-abundance critique (Lamb 2019). Use when processing eDNA samples for biodiversity assessment, deciding ASV vs OTU, configuring OBITools3 v3, interpreting decontam screening, or reporting community comparisons with the dispersion confound check.
tool_type	mixed
primary_tool	obitools3
workflow	true
depends_on	["ecological-genomics/edna-metabarcoding","ecological-genomics/biodiversity-metrics","ecological-genomics/community-ecology","read-qc/quality-reports"]
qc_checkpoints	[{"after_demux":"Reads per sample >1000; negative controls <100 reads"},{"after_denoising":"Chimera rate <20% (>30% indicates library-prep issues); ASV/OTU count reasonable for marker"},{"after_decontam":"decontam combined method at threshold 0.1 (0.05 for low-biomass); biological-plausibility review of each flagged ASV; tag-jumping rate quantified and filtered (~0.001-0.005 MiSeq; ~0.005-0.01 NovaSeq per Schnell 2015)"},{"after_taxonomy":"Assignment rate marker-specific: 50-85% unassigned is typical per Wangensteen 2018; report gap honestly"},{"after_diversity":"Hill numbers reported as effective species counts (not raw Shannon); coverage-based rarefaction at C=0.95; extrapolation bounded at 2x reference (doubling rule); sample completeness >80%"},{"after_ordination":"PERMANOVA + PERMDISP reported TOGETHER (Anderson & Walsh 2013); if betadisper significant, location conclusion is not supported"}]

Version Compatibility

Reference examples tested with: DADA2 1.30+, FastQC 0.12+, MultiQC 1.21+, cutadapt 4.4+, phyloseq 1.46+, vegan 2.6+

Before using code patterns, verify installed versions match. If versions differ:

R: packageVersion('<pkg>') then ?function_name to verify parameters
CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

eDNA Metabarcoding Pipeline

"Process my eDNA samples from raw reads to community ecology" -> Orchestrate primer removal, denoising (OBITools3 or DADA2), contamination filtering, taxonomy assignment, Hill number diversity estimation, and constrained ordination for species-environment analysis.

Complete workflow from raw amplicon sequences to community ecology analysis, supporting both OBITools3 and DADA2 processing paths.

Pipeline Overview

Raw amplicon FASTQ (demultiplexed)
    |
    v
[1. QC] ------------------> FastQC / MultiQC quality assessment
    |
    v
[2. Primer Removal] ------> Cutadapt (remove forward + reverse primers)
    |                            |
    |                            +---> QC: reads per sample >1000
    |
    +--- Path A: OBITools3     +--- Path B: DADA2
    |       |                  |       |
    |       v                  |       v
    |   [3a. obi alignpairedend]   [3b. filterAndTrim]
    |       |                  |       |
    |       v                  |       v
    |   [4a. obi uniq]        |   [4b. learnErrors + dada]
    |       |                  |       |
    |       v                  |       v
    |   [5a. obi ecotag]      |   [5b. assignTaxonomy]
    |                          |
    +-------- Merge -----------+
                |
                v
[6. Contamination Filter] -> decontam / microDecon (negative control removal)
    |
    v
[7. Taxonomy Table] -------> Species x sample matrix
    |
    v
[8. Diversity Analysis] ---> iNEXT Hill numbers (q=0,1,2)
    |
    v
[9. Community Comparison] -> vegan CCA/RDA + indicspecies
    |
    v
Species table + diversity metrics + ordination plots

Step 1: Quality Assessment

fastqc -t 8 -o fastqc_output/ raw_reads/*.fastq.gz
multiqc fastqc_output/ -o multiqc_report/

Step 2: Primer Removal

# Adapter sequences are marker-specific; examples below for common eDNA markers
# --discard-untrimmed: remove reads without primers (likely off-target)
# --minimum-length 50: discard very short fragments after trimming

# COI (Leray primers mlCOIintF / jgHCO2198)
cutadapt -g GGWACWGGWTGAACWGTWTAYCCYCC -G TAIACYTCIGGRTGICCRAARAAYCA \
    --discard-untrimmed --minimum-length 50 -j 8 \
    -o trimmed/{sample}_R1.fastq.gz -p trimmed/{sample}_R2.fastq.gz \
    raw_reads/{sample}_R1.fastq.gz raw_reads/{sample}_R2.fastq.gz

Common primer sets by marker:

Marker	Forward Primer	Reverse Primer	Target
COI	mlCOIintF	jgHCO2198	Metazoan invertebrates
12S MiFish	MiFish-U-F	MiFish-U-R	Fish
ITS2	ITS3	ITS4	Fungi
rbcL	rbcLa-F	rbcLa-R	Plants
18S V9	1389F	1510R	Eukaryotes

QC Checkpoint: Demultiplexing

# Gate: reads per sample >1000; negative controls <100 reads
for f in trimmed/*_R1.fastq.gz; do
    sample=$(basename "$f" _R1.fastq.gz)
    count=$(zcat "$f" | awk 'END{print NR/4}')
    echo "$sample: $count reads"
done

Step 3: Paired-End Merging and Denoising

Path A: OBITools3

# Import paired FASTQ into OBITools3 DMS
obi import --fastq-input trimmed/reads_R1.fastq.gz reads/reads1
obi import --fastq-input trimmed/reads_R2.fastq.gz reads/reads2

# Paired-end alignment
obi alignpairedend -R reads/reads2 reads/reads1 reads/aligned

# Filter by alignment score
# score >= 50: removes poorly overlapping pairs
obi grep -p 'sequence["score"] >= 50' reads/aligned reads/filtered

# Filter by merged length (marker-dependent range)
# 100-500 bp: typical COI amplicon range
obi grep -p 'len(sequence) >= 100 and len(sequence) <= 500' \
    reads/filtered reads/length_filtered

# Dereplicate
obi uniq reads/length_filtered reads/dereplicated

# Remove singletons
# count >=2: removes sequencing errors; increase to 5-10 for noisy datasets
obi grep -p 'sequence["count"] >= 2' reads/dereplicated reads/denoised

# Denoise (remove PCR/sequencing errors)
# ratio 0.05: sequences <5% abundance of a 1-mismatch parent are merged
obi clean -s merged_sample -r 0.05 -H reads/denoised reads/cleaned

Path B: DADA2 (R)

library(dada2)

path <- 'trimmed/'
fnFs <- sort(list.files(path, pattern = '_R1.fastq.gz', full.names = TRUE))
fnRs <- sort(list.files(path, pattern = '_R2.fastq.gz', full.names = TRUE))
sample_names <- gsub('_R1.fastq.gz', '', basename(fnFs))

# Filter and trim
# truncLen: set based on quality profiles; marker-dependent
# maxEE c(2,2): max expected errors; standard for eDNA
# minLen 50: minimum after trimming
filtFs <- file.path('filtered', paste0(sample_names, '_F_filt.fastq.gz'))
filtRs <- file.path('filtered', paste0(sample_names, '_R_filt.fastq.gz'))
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                     truncLen = c(200, 180), maxEE = c(2, 2),
                     minLen = 50, truncQ = 2, rm.phix = TRUE,
                     multithread = TRUE)

# Learn error rates
errF <- learnErrors(filtFs, multithread = TRUE)
errR <- learnErrors(filtRs, multithread = TRUE)

# Denoise
dadaFs <- dada(filtFs, err = errF, multithread = TRUE)
dadaRs <- dada(filtRs, err = errR, multithread = TRUE)

# Merge paired reads
# minOverlap 20: standard; increase if amplicon has short overlap region
merged <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, minOverlap = 20)

# Build ASV table
seqtab <- makeSequenceTable(merged)

# Remove chimeras
# method 'consensus': standard; 'pooled' for higher sensitivity
seqtab_nochim <- removeBimeraDenovo(seqtab, method = 'consensus', multithread = TRUE)
chimera_rate <- 1 - sum(seqtab_nochim) / sum(seqtab)
message(sprintf('Chimera rate: %.1f%%', chimera_rate * 100))

QC Checkpoint: Denoising

# Gate 1: Chimera rate <20%
if (chimera_rate > 0.20) message('WARNING: High chimera rate. Check primer removal and PCR conditions.')

# Gate 2: ASV count reasonable for marker
n_asvs <- ncol(seqtab_nochim)
message(sprintf('ASVs after denoising: %d', n_asvs))
# Typical ranges: COI 500-5000, 12S 50-500, ITS 200-3000

Step 4: Contamination Filtering

R (decontam)

library(decontam)
library(phyloseq)

ps <- phyloseq(otu_table(seqtab_nochim, taxa_are_rows = FALSE),
               sample_data(meta))

# Identify negative controls
sample_data(ps)$is_neg <- sample_data(ps)$sample_type == 'negative_control'

# Combined method (Davis 2018): uses BOTH negative controls AND DNA concentration
# threshold=0.1 default; 0.05 for low-biomass samples
# CRITICAL: output is SCREENING, not classification; biological-plausibility check required
if ('dna_concentration' %in% sample_variables(ps)) {
    contam <- isContaminant(ps, method = 'combined', neg = 'is_neg',
                            conc = 'dna_concentration', threshold = 0.1)
} else {
    # Fall back to prevalence-only if no qPCR/Qubit DNA-concentration data
    contam <- isContaminant(ps, method = 'prevalence', neg = 'is_neg',
                            threshold = 0.1)
}
message(sprintf('Flagged candidate contaminant ASVs: %d', sum(contam$contaminant)))
message('Manual review required: verify biological plausibility before deletion')

ps_clean <- prune_taxa(!contam$contaminant, ps)

# Remove negative control samples
ps_clean <- subset_samples(ps_clean, sample_type != 'negative_control')

Tag-jumping removal (Schnell 2015; NovaSeq caveat)

# Tag-jumping: cross-contamination from index hopping during library prep / sequencing
# Schnell 2015 Mol Ecol Resour 15:1289-1303 documented 0.1-2% per read pair
# NovaSeq patterned flow cells have ~10x higher rates than MiSeq
# Use platform-appropriate threshold:
#   MiSeq: 0.001-0.005 (0.1-0.5% of ASV total)
#   NovaSeq: 0.005-0.01 (0.5-1% of ASV total)
# Quantify residual rate first from per-ASV cross-sample appearance
otu <- as(otu_table(ps_clean), 'matrix')
max_per_asv <- apply(otu, 2, max)
otu_filtered <- otu
# Set threshold by platform; default below is for MiSeq
tag_jump_frac <- 0.001
for (j in 1:ncol(otu)) {
    threshold <- max_per_asv[j] * tag_jump_frac
    otu_filtered[otu[, j] < threshold, j] <- 0
}
otu_table(ps_clean) <- otu_table(otu_filtered, taxa_are_rows = FALSE)
# Modern alternative: metabaR::tagjumpslayer(metabarlist_obj, threshold = 0.03)

QC Checkpoint: Decontamination

# Gate: verify contaminant ASVs were removed from real samples
n_before <- ntaxa(ps)
n_after <- ntaxa(ps_clean)
message(sprintf('ASVs removed as contaminants: %d (%.1f%%)',
                n_before - n_after, (n_before - n_after) / n_before * 100))
if ((n_before - n_after) / n_before > 0.5) {
    message('WARNING: >50% ASVs flagged as contaminants. Review decontam threshold.')
}

Step 5: Taxonomy Assignment

OBITools3 (ecotag)

# ecotag assigns taxonomy using LCA algorithm against reference database
# Reference databases: EMBL, BOLD, MIDORI2, UNITE (marker-dependent)
obi ecotag -R reads/refdb --taxonomy reads/taxonomy reads/cleaned reads/assigned

# Filter by assignment quality (species-level for COI)
obi grep -p 'sequence["best_identity"] >= 0.97' reads/assigned reads/filtered_assigned

obi export --tab-output reads/filtered_assigned > taxonomy_results.tsv

DADA2 (assignTaxonomy)

# SILVA for 16S/18S, UNITE for ITS, custom for COI/12S
# Reference databases must be formatted for DADA2
# minBoot 50: minimum bootstrap confidence; 80 for more conservative assignments
taxa <- assignTaxonomy(seqtab_nochim, 'reference_db.fa.gz', multithread = TRUE, minBoot = 50)
taxa <- addSpecies(taxa, 'species_db.fa.gz')

Marker-specific taxonomy databases

Marker	Database	Typical Assignment Rate
COI	BOLD / Midori2	>90% to phylum, 60-80% to species
12S	MitoFish / 12S-seqdb	>90% to family for fish
ITS	UNITE	>80% to genus for fungi
rbcL	GenBank / NCBI nt	>85% to family for plants
18S	SILVA / PR2	>90% to phylum

QC Checkpoint: Taxonomy

# Gate: assignment rate should meet marker expectations
assigned <- !is.na(taxa[, 'Phylum'])
assignment_rate <- sum(assigned) / length(assigned) * 100
message(sprintf('Taxonomy assignment rate (phylum level): %.1f%%', assignment_rate))
if (assignment_rate < 60) message('WARNING: Low assignment rate. Check reference database completeness.')

Step 6: Diversity Analysis

R (iNEXT)

library(iNEXT)

otu_matrix <- as(otu_table(ps_clean), 'matrix')

# Hill numbers: q=0 (richness), q=1 (Shannon diversity), q=2 (Simpson diversity)
# endpoint: extrapolation to 2x observed sample size
inext_out <- iNEXT(as.list(as.data.frame(t(otu_matrix))),
                   q = c(0, 1, 2), datatype = 'abundance',
                   endpoint = 2 * max(rowSums(otu_matrix)))

# Sample completeness: fraction of estimated diversity observed
completeness <- inext_out$DataInfo$SC
message(sprintf('Sample completeness range: %.1f%% - %.1f%%',
                min(completeness) * 100, max(completeness) * 100))

QC Checkpoint: Diversity

# Gate 1: rarefaction approaching asymptote
if (min(completeness) < 0.80) {
    message('WARNING: Some samples have low completeness (<80%). Deeper sequencing recommended.')
}

# Gate 2: reasonable richness estimates
q0_estimates <- inext_out$AsyEst[inext_out$AsyEst$Diversity == 'Species richness', ]
message(sprintf('Estimated richness range: %d - %d species',
                min(q0_estimates$Estimator), max(q0_estimates$Estimator)))

Step 7: Community Comparison

R (vegan + indicspecies)

library(vegan)
library(indicspecies)

otu_matrix <- as(otu_table(ps_clean), 'matrix')
env_data <- as(sample_data(ps_clean), 'data.frame')

# Hellinger transformation: standard for community composition data
# Reduces influence of dominant species
otu_hell <- decostand(otu_matrix, method = 'hellinger')

# DCA on untransformed data to determine gradient length
dca <- decorana(otu_matrix)
gradient_length <- diff(range(scores(dca, display = 'sites', choices = 1)))
message(sprintf('DCA gradient length: %.2f SD', gradient_length))

# RDA: linear response (<=3 SD), uses Hellinger-transformed data
# CCA: unimodal response (>3 SD), uses raw abundances (chi-squared distance)
if (gradient_length <= 3) {
    ord <- rda(otu_hell ~ temperature + depth + season, data = env_data)
    method_name <- 'RDA'
} else {
    ord <- cca(otu_matrix ~ temperature + depth + season, data = env_data)
    method_name <- 'CCA'
}

# Permutation test for significance
# permutations 999: standard; increase to 9999 for publication
anova_result <- anova.cca(ord, permutations = 999)
message(sprintf('%s significance: p = %.4f', method_name, anova_result$`Pr(>F)`[1]))

# MANDATORY companion: PERMANOVA + PERMDISP (Anderson & Walsh 2013 Ecol Monogr 83:557-574)
# adonis2 tests centroid difference; betadisper tests dispersion homogeneity
# If betadisper is also significant, PERMANOVA significance is dispersion-confounded
bray_dist <- vegdist(otu_matrix, method = 'bray')
permanova <- adonis2(bray_dist ~ site, data = env_data,
                     by = 'margin', permutations = 999)
disp <- betadisper(bray_dist, env_data$site)
disp_test <- permutest(disp, permutations = 999)
message(sprintf('PERMANOVA p = %.4f; PERMDISP p = %.4f',
                permanova[['Pr(>F)']][1], disp_test$tab[['Pr(>F)']][1]))
if (permanova[['Pr(>F)']][1] < 0.05 && disp_test$tab[['Pr(>F)']][1] < 0.05) {
    message('WARNING: Both PERMANOVA and PERMDISP significant; location-vs-dispersion confounded')
}

# Indicator species analysis with group-size equalization (NOT basic IndVal)
# func='IndVal.g' corrects for unbalanced group sizes (De Caceres & Legendre 2009)
indval <- multipatt(otu_matrix, env_data$site, func = 'IndVal.g',
                    control = how(nperm = 999))
summary(indval, alpha = 0.05)

Parameter Recommendations

Step	Parameter	Recommendation
Cutadapt	--discard-untrimmed	Always use; removes off-target reads. MANDATORY before DADA2 filterAndTrim
Cutadapt	--minimum-length	50 (general); adjust per expected amplicon size
DADA2	truncLen	Set from quality profiles; marker-dependent (typical 2x250 COI: c(220,180))
DADA2	maxEE	c(2,2) standard; c(5,5) for degraded eDNA
DADA2	minOverlap	20 (standard); increase for short overlaps
DADA2	chimera method	'consensus' standard (conservative); 'pooled' more aggressive
OBITools3 v3	command syntax	`obi stats` (plural; was `obistat` in v1); `.tar.gz` taxonomy
OBITools3	--min-count	2 (removes singletons); 5-10 for noisy datasets
decontam	method	'combined' if concentration AND controls; 'prevalence' fallback
decontam	threshold	0.1 default; 0.05 for low-biomass samples
Tag-jumping MiSeq	threshold	0.001-0.005 fraction of ASV total
Tag-jumping NovaSeq	threshold	0.005-0.01 (~10x MiSeq per Schnell 2015)
Taxonomy	minBoot	50 (sensitive); 80 (conservative)
ecotag	--min-identity	0.97 (COI species); 0.95 (genus); marker-dependent
iNEXT	endpoint	2x max observed sample size
vegan	permutations	999 (standard); 9999 (publication)

Troubleshooting

Issue	Likely Cause	Solution
Few reads after primer removal	Wrong primer sequences or orientation	Verify primer sequences; try --revcomp
High chimera rate (>20%)	Excessive PCR cycles or low-quality input	Reduce PCR cycles; improve DNA extraction
Many unassigned ASVs	Incomplete reference database	Use marker-specific database; lower minBoot
Contamination in negatives	Tag-jumping or lab contamination	Apply tag-jump filter; review extraction protocol
Low sample completeness	Insufficient sequencing depth	Increase sequencing; pool fewer samples
Ordination axes not significant	Weak environmental gradients	Add more environmental variables; check sample size
Unexpected taxa (e.g., human)	Sample contamination	Filter known contaminants; review field protocols
Very few ASVs	Over-aggressive filtering	Relax truncLen, maxEE, or min-count thresholds

Related Skills

ecological-genomics/edna-metabarcoding - Detailed eDNA processing
ecological-genomics/biodiversity-metrics - Diversity analysis details
ecological-genomics/community-ecology - Ordination and indicator species
read-qc/quality-reports - Raw read quality assessment
reporting/automated-qc-reports - Aggregate FastQC across samples with MultiQC (a triage snapshot, not a pass/fail gate)
microbiome/amplicon-processing - 16S clinical alternative