| Field | Value |
|---|---|
| name | bio-genome-assembly-assembly-polishing |
| description | Polish genome assemblies to reduce errors using short reads (Pilon), long reads (Racon), or ONT-specific tools (medaka). Essential for improving long-read assembly accuracy. Use when improving assembly accuracy with polishing tools. |
| tool_type | cli |
| primary_tool | Pilon |
Version Compatibility
Reference examples tested with: BWA 0.7.17+, QUAST 5.2+, minimap2 2.26+, samtools 1.19+
Before using these code patterns, verify that your installed versions match. If versions differ:
- CLI: run `<tool> --version`, then `<tool> --help` to confirm flag names.
- If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Assembly Polishing
"Polish my genome assembly": iteratively correct base-level errors in a draft assembly using short-read or long-read alignments.
- CLI: `pilon --genome draft.fa --frags short_reads.bam` (short reads), `medaka_consensus` (ONT), `racon` (long reads)
Polishing Strategies
| Tool | Input Reads | Best For |
|---|---|---|
| Pilon | Illumina | Final polishing |
| medaka | ONT | ONT assemblies |
| Racon | Long reads | Quick polishing |
| NextPolish | Long and/or short reads | Combined approach |
Recommended Workflows
ONT Assembly
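- Racon (2-3 rounds with ONT reads; recent medaka models are trained on unpolished Flye output, so check the medaka documentation for your model before adding these rounds)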
- medaka (1 round)
- Pilon (2-3 rounds with Illumina)
PacBio CLR Assembly
- Racon (2-3 rounds)
- Pilon (2-3 rounds with Illumina)
PacBio HiFi Assembly
- Often no polishing needed (HiFi reads are >99% accurate)
- Optional: Pilon if Illumina reads are available
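The workflows above can be encoded as a small lookup, for example to drive a pipeline from a read-type setting. This is a minimal sketch: the function name `polish_plan` and the labels `ont`, `clr`, and `hifi` are hypothetical, and the round counts simply mirror the lists above.

```bash
#!/bin/bash
# polish_plan: hypothetical helper printing the recommended polishing steps,
# one per line, for a read type. Mirrors the workflow lists above.
# Usage: polish_plan <ont|clr|hifi> [yes|no]   (2nd arg: Illumina reads available?)
polish_plan() {
    local tech=$1
    local illumina=${2:-no}
    case "$tech" in
        ont)
            echo "racon: 2-3 rounds with ONT reads"
            echo "medaka: 1 round"
            ;;
        clr)
            echo "racon: 2-3 rounds"
            ;;
        hifi)
            # HiFi assemblies often need no long-read polishing
            ;;
        *)
            echo "unknown read type: $tech" >&2
            return 1
            ;;
    esac
    if [ "$illumina" = "yes" ]; then
        if [ "$tech" = "hifi" ]; then
            echo "pilon: optional"
        else
            echo "pilon: 2-3 rounds with Illumina"
        fi
    fi
}

polish_plan ont yes
```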
Pilon (Illumina Polishing)
Installation
```bash
conda install -c bioconda pilon
```
Basic Usage
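```bash
# Index the draft, align Illumina pairs, then sort and index the BAM
bwa index assembly.fasta
bwa mem -t 16 assembly.fasta R1.fq.gz R2.fq.gz | samtools sort -o aligned.bam
samtools index aligned.bam
# Writes polished.fasta
pilon --genome assembly.fasta --frags aligned.bam --output polished
```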
Key Options
| Option | Description |
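|---|---|
| --genome | Input assembly (FASTA) |
| --frags | BAM of paired-end reads aligned to the assembly |
| --output | Output prefix |
| --changes | Write a `.changes` file listing every edit |
| --vcf | Write a VCF of Pilon's calls |
| --fix | What to fix (e.g. snps, indels, gaps, local, all) |
| --threads | Threads for Pilon itself (alignment is done beforehand) |
| --mindepth | Minimum depth to make a correction (absolute if ≥ 1, fraction of mean coverage if < 1) |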
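Pilon's help text describes `--mindepth` as an absolute depth when the value is 1 or more, and otherwise as a fraction of the region's mean coverage with a floor of 5 (default 0.1). A minimal sketch of that rule, useful for estimating whether low-coverage regions will be corrected; the function name `pilon_min_depth` is hypothetical, and the rule should be confirmed against `pilon --help` for your version.

```bash
#!/bin/bash
# pilon_min_depth: hypothetical helper reproducing the --mindepth rule as
# described in Pilon's help text (verify for your installed version).
# Usage: pilon_min_depth <mindepth> <mean_coverage>
pilon_min_depth() {
    awk -v md="$1" -v cov="$2" 'BEGIN {
        if (md + 0 >= 1) {
            # Values >= 1 are an absolute depth
            print md + 0
        } else {
            # Fractions < 1 scale the mean coverage, with a floor of 5
            d = md * cov
            if (d < 5) d = 5
            print d
        }
    }'
}

pilon_min_depth 0.1 80   # 10% of 80x coverage
pilon_min_depth 0.1 30   # 10% of 30x is below the floor, so 5
pilon_min_depth 12 80    # absolute depth
```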
Multiple Rounds
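```bash
#!/bin/bash
# Usage: bash <script> assembly.fasta R1.fq.gz R2.fq.gz [rounds] [threads]
set -euo pipefail
ASSEMBLY=$1
R1=$2
R2=$3
ROUNDS=${4:-3}
THREADS=${5:-16}
current=$ASSEMBLY
for i in $(seq 1 "$ROUNDS"); do
    echo "=== Pilon round $i ==="
    # Re-index and re-align against the latest polished sequence
    bwa index "$current"
    bwa mem -t "$THREADS" "$current" "$R1" "$R2" | samtools sort -@ "$THREADS" -o "round${i}.bam"
    samtools index "round${i}.bam"
    # --changes writes pilon_round${i}.changes (one line per edit)
    pilon --genome "$current" --frags "round${i}.bam" --output "pilon_round${i}" --changes
    current="pilon_round${i}.fasta"
    changes=$(wc -l < "pilon_round${i}.changes")
    echo "Changes made: $changes"
    # Stop early once a round makes no further corrections
    if [ "$changes" -eq 0 ]; then
        echo "No more changes, stopping"
        break
    fi
done
cp "$current" final_polished.fasta
```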
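To see what each round actually changed, the `.changes` files can be tallied by edit type. This is a minimal sketch under an assumed file layout: four whitespace-separated columns (original coordinate, polished coordinate, original bases, new bases) with `.` marking the empty side of an insertion or deletion. `summarize_changes` is a hypothetical helper; inspect a few lines of your own file before trusting the categories.

```bash
#!/bin/bash
# summarize_changes: hypothetical helper tallying edits in a Pilon .changes file.
# ASSUMPTION: columns are <orig coord> <new coord> <orig bases> <new bases>,
# with "." for the empty side of an insertion or deletion.
summarize_changes() {
    awk '
        NF < 4 { next }                              # skip malformed lines
        $3 == "." { ins++; next }                    # nothing in original: insertion
        $4 == "." { del++; next }                    # nothing in new: deletion
        length($3) == 1 && length($4) == 1 { snp++; next }
        { other++ }                                  # multi-base or local fixes
        END { printf "snp\t%d\nins\t%d\ndel\t%d\nother\t%d\n", snp, ins, del, other }
    ' "$1"
}

# Example:
# for f in pilon_round*.changes; do echo "== $f"; summarize_changes "$f"; done
```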
Fix Specific Issues
```bash
# Correct only base-level errors
pilon --genome assembly.fa --frags aligned.bam --output polished --fix snps,indels
# Only attempt to fill gaps
pilon --genome assembly.fa --frags aligned.bam --output polished --fix gaps
```
medaka (ONT Polishing)
Installation
```bash
conda install -c bioconda medaka
```
Basic Usage
```bash
medaka_consensus -i reads.fastq.gz -d assembly.fasta -o medaka_output -t 8
```
Key Options
| Option | Description |
|---|---|
| -i | Input reads |
| -d | Draft assembly |
| -o | Output directory |
| -t | Threads |
| -m | Model name |
Model Selection
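```bash
# List the models available in the installed medaka
medaka tools list_models
# Run with a model that matches the basecaller chemistry and accuracy mode
medaka_consensus -i reads.fq.gz -d assembly.fa -o output -m r1041_e82_400bps_sup_v5.1.0
```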
Models for Common Chemistries
| Chemistry | Model |
|---|---|
| R10.4.1 + SUP | r1041_e82_400bps_sup_* |
| R10.4.1 + HAC | r1041_e82_400bps_hac_* |
| R9.4.1 + SUP | r941_min_sup_* or r941_prom_sup_* |
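Because the table lists name patterns rather than exact models, the newest matching model can be picked from the output of `medaka tools list_models`. A minimal sketch; `pick_model` is a hypothetical helper, and it assumes model names appear as comma- or space-separated tokens (as on the "Available:" line) and that variant-calling models contain `_variant_`. Version sorting uses GNU `sort -V`.

```bash
#!/bin/bash
# pick_model: hypothetical helper that prints the highest-versioned model
# whose name starts with the given prefix, reading candidate names on stdin.
# ASSUMPTIONS: names are comma/space-separated tokens; variant-calling
# models contain "_variant_" and are excluded. Requires GNU sort (-V).
pick_model() {
    local prefix=$1
    tr ', ' '\n\n' \
        | grep "^${prefix}_" \
        | grep -v '_variant_' \
        | sort -V \
        | tail -n 1
}

# Example (requires medaka):
# model=$(medaka tools list_models | pick_model r1041_e82_400bps_sup)
# medaka_consensus -i reads.fq.gz -d assembly.fa -o output -m "$model"
```

The test input below uses illustrative model names only; always confirm against the list printed by your installed medaka.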
Output
```text
medaka_output/
├── consensus.fasta      # Polished assembly
├── calls_to_draft.bam   # Alignments
└── *.hdf                # Intermediate files
```
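A quick sanity check after medaka (or any polishing step) is to compare sequence counts and total length between the draft and `consensus.fasta`; polishing normally changes lengths only slightly, so a large difference merits a closer look. A minimal sketch in plain awk; `fasta_summary` is a hypothetical helper and reads uncompressed FASTA only.

```bash
#!/bin/bash
# fasta_summary: hypothetical helper printing "<sequences>\t<total bases>"
# for an uncompressed FASTA file; multi-line sequences are handled.
fasta_summary() {
    awk '
        /^>/ { n++; next }
        { gsub(/[[:space:]]/, ""); total += length($0) }
        END { printf "%.0f\t%.0f\n", n, total }   # %.0f avoids 32-bit overflow
    ' "$1"
}

# Example:
# fasta_summary assembly.fasta
# fasta_summary medaka_output/consensus.fasta
```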
Racon (Long-Read Polishing)
Installation
```bash
conda install -c bioconda racon
```
Basic Usage
```bash
# Align long reads to the draft, then build a consensus from the alignments
minimap2 -ax map-ont assembly.fasta reads.fastq.gz > aligned.sam
racon -t 16 reads.fastq.gz aligned.sam assembly.fasta > polished.fasta
```
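By default Racon writes only the target sequences it could polish; sequences without read coverage are dropped unless `-u` (`--include-unpolished`) is given. Racon also appends tags such as `LN:i:` to polished headers, so names should be compared on the first word only. A minimal sketch that lists draft contigs missing from the polished FASTA; `missing_contigs` is a hypothetical helper.

```bash
#!/bin/bash
# missing_contigs: hypothetical helper listing sequence names present in the
# draft FASTA but absent from the polished FASTA. Only the first word of each
# header is compared, so extra header tags are ignored.
# Usage: missing_contigs <draft.fasta> <polished.fasta>
missing_contigs() {
    awk '
        # First file read (polished): remember every sequence name
        FILENAME == ARGV[1] { if (/^>/) have[substr($1, 2)] = 1; next }
        # Second file (draft): report names never seen in the polished file
        /^>/ { name = substr($1, 2); if (!(name in have)) print name }
    ' "$2" "$1"
}

# Example:
# missing_contigs assembly.fasta polished.fasta
```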
Multiple Rounds
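```bash
#!/bin/bash
# Usage: bash <script> assembly.fasta reads.fastq.gz [rounds] [threads]
set -euo pipefail
ASSEMBLY=$1
READS=$2
ROUNDS=${3:-3}
THREADS=${4:-16}
current=$ASSEMBLY
for i in $(seq 1 "$ROUNDS"); do
    echo "=== Racon round $i ==="
    # Re-align reads to the latest consensus (use map-pb for PacBio CLR reads)
    minimap2 -t "$THREADS" -ax map-ont "$current" "$READS" > "round${i}.sam"
    racon -t "$THREADS" "$READS" "round${i}.sam" "$current" > "racon_round${i}.fasta"
    current="racon_round${i}.fasta"
done
cp "$current" racon_polished.fasta
```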
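The loop above hard-codes the `map-ont` preset. For the PacBio CLR workflow, minimap2's `map-pb` preset is the matching choice, and `map-hifi` for HiFi reads. A minimal sketch that selects the preset from a read-type label; `minimap2_preset` and the labels are hypothetical, and the result would replace `map-ont` in the `minimap2 -ax` call.

```bash
#!/bin/bash
# minimap2_preset: hypothetical helper mapping a read-type label to the
# minimap2 preset used for read-to-assembly alignment before Racon.
minimap2_preset() {
    case "$1" in
        ont)  echo "map-ont" ;;
        clr)  echo "map-pb" ;;
        hifi) echo "map-hifi" ;;
        *)    echo "unknown read type: $1" >&2; return 1 ;;
    esac
}

# Example inside the Racon loop (READ_TYPE is a hypothetical variable):
# minimap2 -t "$THREADS" -ax "$(minimap2_preset "$READ_TYPE")" "$current" "$READS" > "round${i}.sam"
```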
Key Options
| Option | Description |
|---|---|
| -t | Threads |
| -m | Match score (default: 3) |
| -x | Mismatch score (default: -5) |
| -g | Gap penalty (default: -4) |
| -w | Window length in bp (default: 500) |
Complete Polishing Workflow
Goal: Maximize assembly accuracy through iterative multi-tool polishing.
Approach: Apply Racon with long reads, then medaka for ONT-specific error correction, then Pilon with short reads for final accuracy.
ONT Assembly Polishing
```bash
#!/bin/bash
# Usage: bash <script> assembly.fasta ont.fastq.gz R1.fq.gz R2.fq.gz outdir
set -euo pipefail
ASSEMBLY=$1
ONT_READS=$2
ILLUMINA_R1=$3
ILLUMINA_R2=$4
OUTDIR=$5
mkdir -p "$OUTDIR"

echo "=== Racon Polishing ==="
current=$ASSEMBLY
for i in 1 2; do
    minimap2 -ax map-ont "$current" "$ONT_READS" > "${OUTDIR}/racon_${i}.sam"
    racon -t 16 "$ONT_READS" "${OUTDIR}/racon_${i}.sam" "$current" > "${OUTDIR}/racon_${i}.fasta"
    current="${OUTDIR}/racon_${i}.fasta"
done

echo "=== medaka Polishing ==="
# Add -m <model> to match the basecaller chemistry (see Model Selection)
medaka_consensus -i "$ONT_READS" -d "$current" -o "${OUTDIR}/medaka" -t 8
current="${OUTDIR}/medaka/consensus.fasta"

echo "=== Pilon Polishing ==="
for i in 1 2; do
    bwa index "$current"
    bwa mem -t 16 "$current" "$ILLUMINA_R1" "$ILLUMINA_R2" | samtools sort -o "${OUTDIR}/pilon_${i}.bam"
    samtools index "${OUTDIR}/pilon_${i}.bam"
    pilon --genome "$current" --frags "${OUTDIR}/pilon_${i}.bam" --output "${OUTDIR}/pilon_${i}"
    current="${OUTDIR}/pilon_${i}.fasta"
done

cp "$current" "${OUTDIR}/final_polished.fasta"
echo "Done: ${OUTDIR}/final_polished.fasta"
```
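Pilon appends `_pilon` to each sequence name it writes, so after several rounds names look like `contig_1_pilon_pilon`. A minimal sketch that strips the accumulated suffixes from `final_polished.fasta` headers so names match the original draft; `strip_pilon_suffix` is a hypothetical helper, and it assumes your contig names do not legitimately end in `_pilon`.

```bash
#!/bin/bash
# strip_pilon_suffix: hypothetical helper removing one or more trailing
# "_pilon" suffixes from the first word of each FASTA header.
# Sequence lines pass through unchanged; a modified header is rebuilt
# by awk with single spaces between words.
strip_pilon_suffix() {
    awk '/^>/ { sub(/(_pilon)+$/, "", $1) } { print }' "$1"
}

# Example:
# strip_pilon_suffix "${OUTDIR}/final_polished.fasta" > "${OUTDIR}/final_polished.renamed.fasta"
```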
NextPolish (Combined Approach)
Installation
```bash
conda install -c bioconda nextpolish
```
Usage
```bash
# Write the NextPolish configuration
cat > run.cfg << EOF
[General]
job_type = local
job_prefix = nextPolish
task = best
rewrite = yes
rerun = 3
parallel_jobs = 2
multithread_jobs = 8
genome = assembly.fasta
genome_size = auto
workdir = ./01_rundir
[lgs_option]
lgs_fofn = lgs.fofn
lgs_options = -min_read_len 1k -max_depth 100
lgs_minimap2_options = -x map-ont
[sgs_option]
sgs_fofn = sgs.fofn
sgs_options = -max_depth 100
EOF
# List long-read (lgs) and short-read (sgs) files, one per line
ls reads.fastq.gz > lgs.fofn
ls R1.fq.gz R2.fq.gz > sgs.fofn
nextPolish run.cfg
# Polished genome: ./01_rundir/genome.nextpolish.fasta
```
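NextPolish reads its input lists (`.fofn`) as one read file per line. Writing absolute paths keeps the lists valid if NextPolish is started from another directory, and checking that each file exists catches typos before a long run. A minimal sketch; `write_fofn` is a hypothetical helper, and `realpath` comes from GNU coreutils.

```bash
#!/bin/bash
# write_fofn: hypothetical helper writing one absolute path per line to a
# .fofn file; fails without writing if any read file is missing or empty.
# Usage: write_fofn <out.fofn> <reads>...
write_fofn() {
    local out=$1
    shift
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty read file: $f" >&2
            return 1
        fi
    done
    realpath "$@" > "$out"
}

# Equivalent to the ls commands above, with absolute paths:
# write_fofn lgs.fofn reads.fastq.gz
# write_fofn sgs.fofn R1.fq.gz R2.fq.gz
```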
Quality Assessment
Goal: Measure the accuracy improvement from polishing.
Approach: Compare original and polished assemblies with QUAST against a reference, and check alignment error rates.
After polishing, assess improvement:
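```bash
# Reference-based comparison (requires a closely related reference genome)
quast.py -r reference.fa original.fa polished.fa -o quast_comparison
# Read-based check: error rate of reads aligned back to the assembly
minimap2 -ax map-ont polished.fa reads.fq.gz | samtools stats | grep "error rate"
```
With ONT reads, the reported error rate is dominated by read errors, so compare it between the original and polished assemblies rather than reading it as absolute assembly accuracy.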
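To compare before and after polishing, the `error rate` value can be pulled from the `SN` section of `samtools stats` for each assembly. A minimal sketch; `sn_error_rate` and `lower_is_better` are hypothetical helpers, and the parser assumes the tab-separated `SN<TAB>error rate:<TAB>value` layout that `samtools stats` prints.

```bash
#!/bin/bash
# sn_error_rate: hypothetical helper printing the "error rate" value from
# samtools stats output read on stdin.
sn_error_rate() {
    awk -F '\t' '$1 == "SN" && $2 == "error rate:" { print $3 }'
}

# lower_is_better: hypothetical helper; exit status 0 if the second rate is
# lower than the first (scientific notation such as 1.5e-02 is accepted).
lower_is_better() {
    awk -v before="$1" -v after="$2" 'BEGIN { exit !(after + 0 < before + 0) }'
}

# Example (requires minimap2 and samtools):
# before=$(minimap2 -ax map-ont original.fa reads.fq.gz | samtools stats - | sn_error_rate)
# after=$(minimap2 -ax map-ont polished.fa reads.fq.gz | samtools stats - | sn_error_rate)
# lower_is_better "$before" "$after" && echo "error rate dropped: $before -> $after"
```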
Related Skills
- long-read-assembly - Initial assembly
- short-read-assembly - Source of polishing reads
- assembly-qc - Assess polishing improvement
- long-read-sequencing - medaka variant calling