| name | bio-workflows-longread-sv-pipeline |
| description | End-to-end workflow for detecting structural variants from long-read sequencing data. Covers ONT/PacBio alignment with minimap2 and SV calling with Sniffles or cuteSV. Use when detecting structural variants from long reads. |
| tool_type | cli |
| primary_tool | Sniffles |
| workflow | true |
| depends_on | ["long-read-sequencing/long-read-alignment","long-read-sequencing/long-read-qc","long-read-sequencing/structural-variants"] |
| qc_checkpoints | [{"after_qc":"Read N50 >10kb, quality score >Q10"},{"after_alignment":"Mapping rate >90%, coverage sufficient"},{"after_calling":"SV count reasonable, genotypes concordant"}] |
Version Compatibility
Reference examples tested with: minimap2 2.28+, Sniffles 2.2+, cuteSV 2.1+, bcftools 1.19+, samtools 1.19+, truvari 4.0+
Before using code patterns, verify installed versions match. If versions differ:
- CLI: <tool> --version then <tool> --help to confirm flags
Use minimap2 >= 2.28 (has the lr:hq accurate-read preset and fixes the 2.27 --MD regression). Supply a reference-matched tandem-repeat BED to the caller - it is the single biggest false-positive lever in repeats.
If code throws an error, introspect the installed tool and adapt the example to the actual API rather than retrying.
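The minimum-version check can be scripted rather than eyeballed. A minimal sketch, assuming GNU sort with -V (version sort) is available and that minimap2 --version prints a string of the form 2.28-r1209; the helper name version_ge is hypothetical:

```shell
# version_ge INSTALLED REQUIRED: succeed when INSTALLED >= REQUIRED.
# Assumes GNU sort with -V (version sort); the helper name is illustrative.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

if command -v minimap2 >/dev/null 2>&1; then
    installed=$(minimap2 --version)
    installed=${installed%%-*}   # strip the "-rNNNN" build suffix
    if version_ge "$installed" 2.28; then
        echo "minimap2 $installed: OK"
    else
        echo "minimap2 $installed is older than 2.28" >&2
    fi
fi
```

The same helper can be reused for the other tools once their version string has been reduced to a bare dotted number.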
Long-Read SV Pipeline
"Detect structural variants from my long-read sequencing data" -> Orchestrate minimap2 alignment, SV calling (Sniffles2/cuteSV), VCF merging across callers, annotation (AnnotSV), and visualization for ONT or PacBio data.
Complete workflow for detecting structural variants from ONT or PacBio long-read data.
Workflow Overview
Long reads (ONT/PacBio)
|
v
[1. QC] ----------------> NanoPlot
|
v
[2. Alignment] ---------> minimap2
|
v
[3. SV Calling] --------> Sniffles / cuteSV
|
v
[4. Filtering] ---------> bcftools
|
v
[5. Annotation] --------> AnnotSV (optional)
|
v
Filtered SV VCF
Primary Path: minimap2 + Sniffles
Step 1: Quality Control
NanoPlot --fastq reads.fastq.gz \
--outdir nanoplot_output \
--threads 8
Step 2: Alignment with minimap2
The -Y (soft-clip supplementary) flag is load-bearing for SV calling: it keeps the breakpoint sequence on the split reads that callers reconstruct SVs from. Use lr:hq for accurate R10/Q20 ONT instead of map-ont.
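# Run ONE of the following, matching the sequencing platform.
# ONT (swap map-ont for lr:hq on accurate R10/Q20 reads)
minimap2 -ax map-ont \
-t 16 \
--MD \
-Y \
reference.fa \
reads.fastq.gz | \
samtools sort -@ 4 -o aligned.bam
samtools index aligned.bam
# PacBio HiFi
minimap2 -ax map-hifi \
-t 16 \
--MD \
-Y \
reference.fa \
reads.fastq.gz | \
samtools sort -@ 4 -o aligned.bam
samtools index aligned.bam
# PacBio CLR
minimap2 -ax map-pb \
-t 16 \
--MD \
-Y \
reference.fa \
reads.fastq.gz | \
samtools sort -@ 4 -o aligned.bam
samtools index aligned.bam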
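The preset choice described in this step can be captured in a small lookup so that a wrapper script never silently falls back to the wrong preset. This is a sketch only; the platform labels (ont-r9, ont-r10, hifi, clr) and the function name are hypothetical, while the preset names are the ones used in this workflow:

```shell
# minimap2_preset PLATFORM: print the -ax preset named in this workflow.
# The platform labels (ont-r9, ont-r10, hifi, clr) are illustrative.
minimap2_preset() {
    case "$1" in
        ont-r9)  echo "map-ont" ;;
        ont-r10) echo "lr:hq" ;;
        hifi)    echo "map-hifi" ;;
        clr)     echo "map-pb" ;;
        *)       echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the alignment command for an accurate R10 ONT run.
preset=$(minimap2_preset ont-r10)
echo "minimap2 -ax $preset -t 16 --MD -Y reference.fa reads.fastq.gz"
```

Failing on an unknown label is deliberate: a mistyped platform should stop the run rather than align with a mismatched preset.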
QC Checkpoint: Check alignment stats
samtools flagstat aligned.bam
samtools depth -a aligned.bam | awk '{sum+=$3} END {print "Average coverage:",sum/NR}'
- Mapping rate >90%
- Average coverage >10x for SV calling (>20x preferred)
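The mapping-rate threshold above can be turned into an automated gate. A minimal sketch, assuming the flagstat layout of recent samtools releases, where a line such as "920 + 0 primary mapped (92.00% : N/A)" is present. It reads the primary mapped line because long reads often carry supplementary alignments, which are counted in the plain mapped line. The helper names are hypothetical:

```shell
# primary_mapped_pct: read `samtools flagstat` text on stdin and print the
# percentage from the "primary mapped" line. Assumes the flagstat layout
# of recent samtools releases, e.g. "920 + 0 primary mapped (92.00% : N/A)".
primary_mapped_pct() {
    awk '/ primary mapped \(/ && match($0, /\([0-9.]+%/) {
        print substr($0, RSTART + 1, RLENGTH - 2)
        exit
    }'
}

# exceeds VALUE THRESHOLD: succeed when VALUE > THRESHOLD (floating point).
exceeds() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'
}

if command -v samtools >/dev/null 2>&1 && [ -f aligned.bam ]; then
    pct=$(samtools flagstat aligned.bam | primary_mapped_pct)
    if exceeds "$pct" 90; then
        echo "mapping rate ${pct}%: OK"
    else
        echo "mapping rate ${pct}% is at or below 90%" >&2
    fi
fi
```

The exceeds helper can gate the average coverage value in the same way, for example exceeds "$coverage" 10.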
Step 3: SV Calling with Sniffles
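# Basic call, reporting only SVs of at least 50 bp
sniffles \
--input aligned.bam \
--vcf svs.vcf.gz \
--reference reference.fa \
--threads 8 \
--minsvlen 50
# Recommended: also pass the reference-matched tandem-repeat BED
# (writes the same svs.vcf.gz, so run one variant or rename the output)
sniffles \
--input aligned.bam \
--vcf svs.vcf.gz \
--reference reference.fa \
--tandem-repeats tandem_repeats.bed \
--threads 8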
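Because the tandem-repeat BED must match the reference, it is worth checking that its contig names exist in the reference index before calling (a BED using 1 against a reference using chr1 is a common mismatch). A minimal sketch, assuming a tab-delimited BED and a reference.fa.fai produced by samtools faidx; the function name is hypothetical:

```shell
# bed_contigs_not_in_fai BED FAI: print each contig name used in BED that
# is absent from the reference index FAI (e.g. "1" vs "chr1" mismatches).
bed_contigs_not_in_fai() {
    [ -s "$2" ] || { echo "empty or missing index: $2" >&2; return 1; }
    awk -F '\t' '
        NR == FNR { ref[$1] = 1; next }   # first file: contigs in the .fai
        /^#/      { next }                # skip BED comment lines
        !($1 in ref) && !seen[$1]++ { print $1 }
    ' "$2" "$1"
}

if [ -f tandem_repeats.bed ] && [ -f reference.fa.fai ]; then
    missing=$(bed_contigs_not_in_fai tandem_repeats.bed reference.fa.fai)
    if [ -n "$missing" ]; then
        echo "BED contigs not in reference:" $missing >&2
    fi
fi
```

Matching contig names is a necessary check, not a sufficient one: it does not prove the BED coordinates were built on the same assembly.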
Alternative: cuteSV
cuteSV's defaults are NOT platform-appropriate; pass the platform-matched cluster-bias/merge-ratio set (ONT shown below; HiFi uses 1000/0.9/1000/0.5, CLR uses 100/0.3/200/0.5). --genotype is off by default.
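# cuteSV expects the working directory to exist before it runs
mkdir -p work_dir
cuteSV \
aligned.bam \
reference.fa \
svs.vcf \
work_dir/ \
--threads 8 \
--genotype \
--max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 \
--max_cluster_bias_DEL 100 --diff_ratio_merging_DEL 0.3
# Compress and index so the filtering step can read svs.vcf.gz
bgzip svs.vcf
tabix svs.vcf.gz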
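The three platform-specific parameter sets listed in this section can be kept in one lookup so the flags are never mixed between platforms. A sketch only; the platform labels (ont, hifi, clr) and the function name are hypothetical, and the values are copied from the text above:

```shell
# cutesv_platform_args PLATFORM: print the cluster-bias / merge-ratio flags
# listed in this section. Platform labels (ont, hifi, clr) are illustrative.
cutesv_platform_args() {
    case "$1" in
        ont)  ins_bias=100;  ins_ratio=0.3; del_bias=100;  del_ratio=0.3 ;;
        hifi) ins_bias=1000; ins_ratio=0.9; del_bias=1000; del_ratio=0.5 ;;
        clr)  ins_bias=100;  ins_ratio=0.3; del_bias=200;  del_ratio=0.5 ;;
        *)    echo "unknown platform: $1" >&2; return 1 ;;
    esac
    printf '%s %s %s %s %s %s %s %s\n' \
        --max_cluster_bias_INS "$ins_bias" --diff_ratio_merging_INS "$ins_ratio" \
        --max_cluster_bias_DEL "$del_bias" --diff_ratio_merging_DEL "$del_ratio"
}

# Print (not run) the HiFi command; $args is left unquoted on purpose so
# that it splits into separate flags.
args=$(cutesv_platform_args hifi)
echo cuteSV aligned.bam reference.fa svs.vcf work_dir/ --threads 8 --genotype $args
```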
Step 4: Filtering
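# Keep QUAL >= 20 and |SVLEN| >= 50 bp; records without an SVLEN tag (typically BND) fail this test and are dropped
bcftools view -i 'QUAL>=20 && ABS(SVLEN)>=50' svs.vcf.gz -Oz -o svs.filtered.vcf.gz
# Deletions and insertions only
bcftools view -i 'SVTYPE="DEL" || SVTYPE="INS"' svs.filtered.vcf.gz -Oz -o del_ins.vcf.gz
# Calls genotyped heterozygous or homozygous-alt
bcftools view -i 'GT="1/1" || GT="0/1"' svs.filtered.vcf.gz -Oz -o genotyped.vcf.gz
# Summary statistics for the filtered set
bcftools stats svs.filtered.vcf.gz > sv_stats.txt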
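For the "SV count reasonable" checkpoint, a per-type tally is often easier to judge than a single total. A minimal sketch that counts the SVTYPE values in the INFO column of an uncompressed VCF stream; the function name is hypothetical, and it assumes every SV record carries an SVTYPE tag:

```shell
# svtype_counts: read uncompressed VCF text on stdin and print
# "SVTYPE<TAB>count" lines, sorted by type. Records without an SVTYPE
# tag in INFO (column 8) are ignored.
svtype_counts() {
    awk -F '\t' '!/^#/ {
        n = split($8, kv, ";")
        for (i = 1; i <= n; i++)
            if (kv[i] ~ /^SVTYPE=/) count[substr(kv[i], 8)]++
    }
    END { for (t in count) print t "\t" count[t] }' | sort
}

if command -v bcftools >/dev/null 2>&1 && [ -f svs.filtered.vcf.gz ]; then
    bcftools view svs.filtered.vcf.gz | svtype_counts
fi
```

Running it on both svs.vcf.gz and svs.filtered.vcf.gz shows which SV types a filter removes, for example whether BND records disappear after the SVLEN filter.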
Step 5: Annotation (Optional)
AnnotSV -SVinputFile svs.filtered.vcf.gz \
-outputFile annotated_svs \
-genomeBuild GRCh38
Multi-Sample SV Calling
for sample in sample1 sample2 sample3; do
sniffles --input ${sample}.bam \
--snf ${sample}.snf \
--reference reference.fa
done
sniffles --input sample1.snf sample2.snf sample3.snf \
--vcf merged_svs.vcf.gz \
--reference reference.fa
Parameter Recommendations
| Tool | Parameter | ONT | PacBio HiFi |
|---|---|---|---|
| minimap2 | -ax | map-ont (R9) / lr:hq (R10) | map-hifi |
| Sniffles | --minsvlen | default varies by release (35 in early 2.0.x; confirm with sniffles --help); set 50 explicitly for the GIAB >=50bp convention | same |
| Sniffles | --minsupport | auto (coverage-derived) | auto |
| Sniffles | --tandem-repeats | reference-matched TR BED (critical) | same |
| cuteSV | INS/DEL cluster-bias, merge-ratio | 100/0.3, 100/0.3 | 1000/0.9, 1000/0.5 |
Benchmark calls with Truvari against GIAB (truvari bench then truvari refine), and state the region set, TR BED, and Truvari params - they move precision/recall as much as the caller. For tumor-normal somatic SVs use a paired caller (Severus/nanomonsv), not Sniffles --mosaic.
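When comparing benchmark runs it helps to recompute precision, recall, and F1 from the raw counts, so that two runs are compared on the same definitions. A minimal sketch using the standard formulas; the argument names mirror the TP-base, TP-comp, FP, and FN counts that a Truvari summary is assumed to report, and that mapping should be verified against the installed version:

```shell
# bench_metrics TP_BASE TP_COMP FP FN: print precision, recall and F1.
# precision = TP_COMP / (TP_COMP + FP); recall = TP_BASE / (TP_BASE + FN).
# The split into base/comp true positives is an assumption about the
# benchmark summary being read; pass the same value twice if only one
# true-positive count is available.
bench_metrics() {
    awk -v tpb="$1" -v tpc="$2" -v fp="$3" -v fn="$4" 'BEGIN {
        p = (tpc + fp) ? tpc / (tpc + fp) : 0
        r = (tpb + fn) ? tpb / (tpb + fn) : 0
        f = (p + r)    ? 2 * p * r / (p + r) : 0
        printf "precision=%.4f recall=%.4f f1=%.4f\n", p, r, f
    }'
}

bench_metrics 90 90 10 30
```

The call above uses made-up counts purely to show the output format.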
SV Types Detected
| Type | Abbreviation | Description |
|---|---|---|
| Deletion | DEL | Sequence removed |
| Insertion | INS | Sequence added |
| Duplication | DUP | Sequence copied |
| Inversion | INV | Sequence reversed |
| Translocation / breakend | BND | Breakend pair joining distant loci (often interchromosomal) |
Troubleshooting
| Issue | Likely Cause | Solution |
|---|---|---|
| Few SVs | Low coverage | Increase sequencing depth |
| Many false positives | Low quality reads | Filter by QUAL, increase min support |
| Missing known SV | Repeat region | Use tandem repeat annotations |
| High breakend count | Mapping artifacts | Check alignment quality |
Complete Pipeline Script
#!/bin/bash
set -euo pipefail  # pipefail: a minimap2 failure must not be masked by samtools sort
THREADS=16
READS="reads.fastq.gz"
REF="reference.fa"
SAMPLE="sample1"
OUTDIR="sv_results"
mkdir -p ${OUTDIR}/{qc,aligned,sv}
echo "=== QC ==="
NanoPlot --fastq ${READS} --outdir ${OUTDIR}/qc -t ${THREADS}
echo "=== Alignment ==="
minimap2 -ax map-ont -t ${THREADS} --MD -Y ${REF} ${READS} | \
samtools sort -@ 4 -o ${OUTDIR}/aligned/${SAMPLE}.bam
samtools index ${OUTDIR}/aligned/${SAMPLE}.bam
echo "Alignment stats:"
samtools flagstat ${OUTDIR}/aligned/${SAMPLE}.bam
echo "=== SV Calling ==="
sniffles --input ${OUTDIR}/aligned/${SAMPLE}.bam \
--vcf ${OUTDIR}/sv/${SAMPLE}.vcf.gz \
--reference ${REF} \
--threads ${THREADS}
echo "=== Filtering ==="
bcftools view -i 'QUAL>=20' ${OUTDIR}/sv/${SAMPLE}.vcf.gz \
-Oz -o ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz
bcftools index ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz
bcftools stats ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz > ${OUTDIR}/sv/stats.txt
echo "=== Complete ==="
echo "SVs: $(bcftools view -H ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz | wc -l)"
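A pipeline like this fails late and confusingly if a tool is missing from PATH, so a preflight check at the top of the script is a cheap safeguard. A minimal sketch; the function name is hypothetical and the tool list is the one this script calls:

```shell
# missing_tools TOOL...: print each named tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools NanoPlot minimap2 samtools sniffles bcftools)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on one line.
    echo "missing tools:" $missing >&2
fi
```

In the real script the if branch would end with exit 1, so that nothing runs until the environment is complete.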
Related Skills
- long-read-sequencing/long-read-alignment - minimap2 preset and -Y details
- long-read-sequencing/structural-variants - Sniffles2 .snf workflow, cuteSV per-platform params, Truvari
- long-read-sequencing/long-read-qc - Read QC and chimera screening before SV calling
- long-read-sequencing/haplotype-phasing - Haplotag the BAM for phased/somatic SVs
- variant-calling/structural-variant-calling - Short-read SV methods