Run any Skill in Manus with one click

bio-ctdna-mutation-detection

Detects somatic mutations in circulating tumor DNA, treating low-VAF detection as a signal-versus-noise problem set by error suppression and molecules sampled, not by the choice of caller. Distinguishes de novo CALLING (scanning a panel for unknown variants, bounded by per-locus error and multiple testing) from tumor-informed DETECTION (tracking a pre-specified variant set, where panel integration reaches single-ppm). Covers VarDict and Mutect2 for de novo calling, UMI-aware callers, and a pysam-based known-variant VAF tracker, with matched-WBC subtraction as the mandatory defense against clonal hematopoiesis (the dominant false positive). Use when calling or tracking tumor mutations from plasma cfDNA, setting a VAF threshold, or deciding whether a low-VAF call is tumor versus CHIP.

Run Skill in Manus

Stars1

Forks0

UpdatedJune 8, 2026 at 03:24

Source

bg-szy

bg-szy/TOP-SKILLS

View GitHub Repository View Creator Repositories

Install command

Download

Run Skill in Manus

File Explorer

2 files

SKILL.md

readonly

More from this repository

same repository

bio-analytical-validation

bg-szy/TOP-SKILLS

Treats a ctDNA assay as a molecule-counting experiment at the Poisson edge and builds its analytical-validation case the measurement-science way. Covers the genome-equivalent currency (~330 haploid copies/ng), the lambda = input_GE x VAF sampling ceiling (lambda>=3 for ~95% detection), the error-suppression ladder (raw NGS ~1e-3 -> single-strand UMI ~1e-4/1e-5 -> duplex <1e-7), the CLSI EP17 LoB/LoD/LoD95/LoQ framework, the per-locus-vs-panel-integrated LoD distinction that lets bespoke MRD reach ppm, contrived/SEQC2 reference standards, and honest LoD reporting conditioned on input mass + consensus depth + replicate detection rate. Use when stating or trusting a sensitivity claim, designing a dilution-series validation, deciding how many genome equivalents are needed at a target VAF, choosing a single-locus vs panel-integrated LoD, or auditing a "detects 0.1% VAF" claim.

2026-06-081

bio-cfdna-preprocessing

bg-szy/TOP-SKILLS

Decides how to preprocess plasma cfDNA sequencing data so the recoverable signal survives - library-prep-aware fragment expectations (dsDNA vs ssDNA/adaptase prep), UMI/duplex consensus with fgbio (ExtractUmisFromBam, GroupReadsByUmi --strategy paired for duplex, CallMolecularConsensusReads vs CallDuplexConsensusReads, FilterConsensusReads min-reads "total s1 s2"), the align->group->consensus->RE-align ordering, and the cfDNA dedup trap where naive coordinate dedup collapses nucleosome-coincident independent molecules. Covers when single-strand consensus suffices vs when duplex is mandatory, the singleton/sensitivity tax at low input, and reading the insert-size histogram as a pre-analytical QC instrument. Use when processing plasma cfDNA reads before fragmentomics, ctDNA mutation calling, or tumor-fraction estimation.

2026-06-081

bio-comparative-genomics-ortholog-inference

bg-szy/TOP-SKILLS

Infer orthologous genes and gene families across species using OrthoFinder3 (HOG-based phylogenetic orthology), SonicParanoid2, Broccoli, ProteinOrtho, OMA / FastOMA hierarchical orthologous groups, eggNOG-mapper, JustOrthologs, and TOGA whole-genome-alignment orthology. Use when building single-copy ortholog sets for phylogenomics, classifying co-orthologs and in/out-paralogs after gene duplication, propagating functional annotation via orthology with awareness of the ortholog conjecture, distinguishing speciation from duplication via gene-tree species-tree reconciliation, computing Quest-for-Orthologs benchmark performance, or running synteny-aware ortholog detection in WGD-affected lineages.

2026-06-081

bio-crispr-screens-batch-correction

bg-szy/TOP-SKILLS

Batch effect correction for CRISPR screens covering ComBat empirical-Bayes, RUV, SVA, control-sgRNA normalization, and the model-based alternative of including batch as a covariate in MAGeCK MLE or Chronos. Covers screen-specific batch sources (passage cohort, library lot, infection day, sequencing run, Cas9 lot, FBS lot), PCA + variance-decomposition diagnostic to decide if correction is needed, when correction harms biology by over-correcting condition into batch, limma removeBatchEffect for visualization-only correction, and relationship to multi-condition design matrices. Use when combining screens for joint analysis, when passage cohort confounds biology, when DepMap-style panels need Chronos with batch covariates, when picking ComBat vs RUV, or when correction harms biology and should be replaced with explicit covariate modeling.

2026-06-081

bio-crispr-screens-batch-correction

bg-szy/TOP-SKILLS

Batch effect correction for CRISPR screens. Covers normalization across batches, technical replicate handling, and batch-aware analysis. Use when combining screens from multiple batches or correcting systematic technical variation.

2026-06-081

bio-hi-c-analysis-compartment-analysis

bg-szy/TOP-SKILLS

Detects A/B chromatin compartments from balanced Hi-C contact matrices via eigenvector decomposition of the distance-normalized, Pearson-correlated cis matrix with cooltools (eigs_cis), then orients (phases) the compartment eigenvector against a GC or gene-density track so the active (A) sign is not arbitrary. Covers the eigenvector-is-a-choice problem (per-arm view_df to remove the centromere gradient; picking the eigenvector by max correlation with activity, not by eigenvalue), GC phasing with bioframe.frac_gc, resolution choice (100kb-1Mb), saddle plots and saddle_strength for compartmentalization strength, the cohesin-loss-strengthens-compartments result, subcompartments (SNIPER/Calder/dcHiC), and cross-condition compartment switching. Use when calling A/B compartments, computing E1/eigenvectors, phasing the eigenvector, building saddle plots, choosing a compartment resolution, quantifying compartment strength, or comparing compartmentalization across conditions.

2026-06-081

name	bio-ctdna-mutation-detection
description	Detects somatic mutations in circulating tumor DNA, treating low-VAF detection as a signal-versus-noise problem set by error suppression and molecules sampled, not by the choice of caller. Distinguishes de novo CALLING (scanning a panel for unknown variants, bounded by per-locus error and multiple testing) from tumor-informed DETECTION (tracking a pre-specified variant set, where panel integration reaches single-ppm). Covers VarDict and Mutect2 for de novo calling, UMI-aware callers, and a pysam-based known-variant VAF tracker, with matched-WBC subtraction as the mandatory defense against clonal hematopoiesis (the dominant false positive). Use when calling or tracking tumor mutations from plasma cfDNA, setting a VAF threshold, or deciding whether a low-VAF call is tumor versus CHIP.
tool_type	mixed
primary_tool	VarDict

Version Compatibility

Reference examples tested with: pysam 0.22+, pandas 2.2+, VarDictJava 1.8+, GATK 4.5+, Ensembl VEP 111+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Notes specific to this skill: VarDict's -c -S -E -g are 1-based BED COLUMN INDICES, not genomic coordinates; var2vcf_valid.pl's -E suppresses the END tag (opposite meaning to VarDict's -E). VEP gnomAD flags are --af_gnomade (exomes)/--af_gnomadg (genomes); the bare --af_gnomad is a legacy alias that returns only exome AF, so prefer the explicit forms.

ctDNA Mutation Detection

"Detect mutations in my cfDNA sample" -> Either scan a panel for unknown low-VAF somatic variants (de novo calling) or quantify a pre-specified mutation set across samples (tumor-informed tracking) — two different statistical problems.

CLI: vardict-java | teststrandbias.R | var2vcf_valid.pl for de novo low-VAF calling on a consensus BAM
CLI: gatk Mutect2 with the read-orientation model for de novo calling with artifact filtering
Python: pysam pileup of ref/alt counts at fixed loci for known-variant tracking (MRD)

The Single Most Important Modern Insight -- detection is a signal-vs-noise problem, and tracking is not the same problem as calling

Low-VAF ctDNA detection is set by two limits that no caller can overcome: the per-base error floor (raw Illumina ~1e-3 caps naive VAF detection near 0.5-1%) and the number of tumor molecules physically present in the tube (1 ng cfDNA ~= 303 haploid genome-equivalents; at 0.01% VAF in 10 ng the expected mutant count is ~0.3 copies — there is nothing to detect at any depth). The achieved limit of detection is the worse of the two. Error suppression (UMI consensus -> ~1e-5, duplex -> <1e-7) and input mass move the floor; swapping VarDict for Mutect2 does not.

Critically, de novo CALLING and known-variant DETECTION are different statistical problems. De novo calling scans every covered position for an unknown alt and pays a multiple-testing tax across 1e5-1e6 loci, so per-locus thresholds must be stringent (practical LoD ~0.1-0.5% on UMI consensus). Tumor-informed detection tests ONE hypothesis — "is tumor present?" — by integrating signal across a pre-specified set of N patient-specific loci; the multiple-testing penalty collapses and per-locus signal that is individually indistinguishable from noise sums into a confident panel-level call. This is why per-locus LoD is poor while panel-integrated LoD reaches single-ppm. Conflating the two is the most common conceptual error in the field.

Methods Landscape

Method	Class	Citation	Role	When
VarDict / vardict-java	de novo caller	AstraZeneca-NGS	sensitive low-VAF amplicon/capture calling with explicit strand-bias test	de novo panel calling on a UMI-consensus BAM
Mutect2 (tumor-only)	de novo caller	GATK	local-assembly somatic caller + learned orientation-bias artifact model	de novo calling needing FFPE/OxoG artifact filtering, PoN, germline resource
umi-varcal	UMI-aware caller	Sater 2020 Bioinformatics 36(9):2718	own UMI-aware pileup + per-position Poisson test against local background	UMI-tagged BAM where a consensus-aware caller is wanted (floor ~0.3%)
CAPP-Seq / iDES	tumor-informed integration	Newman 2014 Nat Med 20:548; 2016 Nat Biotechnol 34:547	hybrid-capture deep panel + molecular barcoding + in-silico background polishing	de novo ctDNA to ~0.02%; iDES stacks ~15x error suppression
INVAR	tumor-informed integration	Wan 2020 Sci Transl Med 12:eaaz8084	integrate variant reads across 100s-1000s patient loci, background-weighted	MRD/monitoring with tumor WES; quantifies to ~1e-5, best ~2.5 ppm
MRDetect	tumor-informed integration	Zviran 2020 Nat Med 26:1114	shallow WGS vs patient SNV compendium, read-level SVM noise model	MRD trading depth for breadth (~35x WGS, thousands of SNVs); ~1e-5
PhasED-Seq	tumor-informed integration	Kurtz 2021 Nat Biotechnol 39:1537	enrich phased (co-occurring) variants to suppress single-molecule error	sub-ppm MRD where phased variants are available

Per-locus LoD for any of these is error- and sampling-limited (~0.1-0.5%); the tumor-informed methods reach ppm only by integrating across a known, large, patient-specific variant set. Methodology evolves — verify current best practice against each tool's live docs before committing to one.

Decision Tree by Scenario

Scenario	Recommended	Why
Tumor tissue available, MRD/monitoring of a known cancer	Tumor-informed tracking (INVAR/MRDetect/Signatera-class), or the pysam tracker below for a fixed list	Integrating across N pre-specified loci is the only route to ppm; CHIP excluded by construction (CHIP variants are not on the tumor list)
No tumor tissue, screening/discovery	de novo panel calling (VarDict or Mutect2) + matched WBC	Must scan for unknown variants; CHIP subtraction is mandatory or most calls are not tumor
VAF regime > 1%	Any standard caller on a deduplicated BAM	Above the raw error floor; consensus not strictly required
VAF regime 0.1-1%	UMI single-strand consensus + VarDict/umi-varcal/Mutect2	Below the raw 1e-3 floor; consensus needed to recover real signal from error
VAF regime < 0.1%	Duplex consensus + tumor-informed integration	Single-strand consensus cannot remove one-strand deamination/oxidation; only duplex + panel integration reaches this regime
No matched WBC available	Do NOT report de novo calls as somatic-tumor	Without WBC subtraction, CHIP (the majority of non-germline cfDNA variants) is indistinguishable from tumor

CHIP -- the dominant false positive, not background

Clonal hematopoiesis of indeterminate potential (CHIP) is the single largest source of false-positive somatic calls in plasma, and it is the null hypothesis for any low-VAF cfDNA variant. Razavi 2019 sequenced cfDNA with matched white-blood-cell DNA (508 genes, >60,000x) and found that 53.2% of non-germline cfDNA variants in cancer patients and 81.6% in non-cancer controls had features consistent with clonal hematopoiesis; only ~24.4% of cfDNA somatic variants in patients were also in the matched tumor (the remainder split between white-cell CHIP and variants of uncertain origin). These are bona fide somatic mutations — in cancer genes — that come from lysed leukocytes, not tumor. No error-suppression tier removes them because they are not errors.

The biology: CHIP arises in hematopoietic stem cells and rises steeply with age (Jaiswal 2014: ~10% prevalence over age 70), enriched for PPM1D/TP53/CHEK2 clones after prior chemo/radiation — exactly the monitored population. The canonical genes are DNMT3A, TET2, ASXL1 (the big three), then PPM1D, TP53, JAK2, SF3B1, SRSF2, GNB1, GNAS, CBL, ATM, CHEK2. TP53 and ATM are both CHIP genes and bona fide tumor suppressors, so a low-VAF TP53 cfDNA call is the ambiguous case par excellence.

The only reliable filter is matched buffy-coat/WBC subtraction: sequence the WBC fraction of the same draw at comparable depth and remove any cfDNA variant also present in WBC. gnomAD filtering removes germline only — CHIP variants are somatic and absent from germline databases, so they sail straight through. A canonical-CHIP-gene list (the example's CHIP_GENES) is a heuristic flag for extra scrutiny, NOT a substitute for WBC subtraction. See analytical-validation for the LoB/LoD statistics that quantify how confidently a subtracted call clears background.

De Novo Calling with VarDict

Goal: Scan a target panel for unknown low-VAF somatic variants on a UMI-consensus BAM.

Approach: Run vardict-java with a lowered -f, pipe through the strand-bias test, then convert to VCF — matching -f across both stages so the threshold is not silently re-applied.

AF_THR=0.005   # 0.5% — practical UMI-consensus de novo floor; below this approaches the per-base error floor
vardict-java -G ref.fa -f $AF_THR -N sample -b consensus.bam \
  -c 1 -S 2 -E 3 -g 4 targets.bed | \
  teststrandbias.R | \
  var2vcf_valid.pl -N sample -E -f $AF_THR > sample.vcf

Key flags: -G indexed reference; -f min VAF (VarDict default 0.01); -N sample name; -b BAM. -c 1 -S 2 -E 3 -g 4 are the 1-based BED COLUMN INDICES for chrom/start/end/gene in a standard 4-column BED — they are column positions, not genomic values. On var2vcf_valid.pl, -E means "do NOT print the END tag" (unrelated to VarDict's -E); its -f default is 0.02, so set it to match. For PCR/amplicon data add -P 0 (positional std is expected to be ~0). For paired tumor/normal use the testsomatic.R | var2vcf_paired.pl path instead.

De Novo Calling with Mutect2 and the Orientation-Bias Model

Goal: Call de novo somatic variants while filtering FFPE-deamination (C>T) and OxoG (G>T) strand-biased artifacts that dominate low-VAF false positives.

Approach: Collect F1R2/F2R1 counts during calling, learn the orientation-bias prior, then apply it during filtering alongside a panel of normals and germline resource.

gatk Mutect2 -R ref.fa -I consensus.bam --f1r2-tar-gz f1r2.tar.gz \
  --germline-resource af-only-gnomad.vcf.gz --panel-of-normals pon.vcf.gz \
  -O unfiltered.vcf.gz
gatk LearnReadOrientationModel -I f1r2.tar.gz -O read-orientation-model.tar.gz
gatk FilterMutectCalls -R ref.fa -V unfiltered.vcf.gz \
  --ob-priors read-orientation-model.tar.gz -O filtered.vcf.gz

Mutect2 is run tumor-only here (no normal sample arg); the orientation model is the load-bearing low-VAF filter. At true ctDNA VAFs Mutect2 is underpowered relative to a dedicated UMI/duplex + background-polishing pipeline, and local assembly can miss extremely low-AF alt support — it is a reasonable de novo caller on consensus reads with the orientation model + PoN (and ideally a matched normal, not shown in this tumor-only command), not a substitute for tumor-informed integration at ppm.

Track Known Mutations Across Serial Samples

Goal: Quantify the VAF of a pre-specified mutation set at fixed loci for MRD monitoring — the detection (not calling) problem.

Approach: For each target mutation, pileup reads at the position, count ref/alt/other alleles, and compute VAF with depth; aggregate across loci as the panel-level detection signal. The single-base pileup below tracks SNVs only — indel reporters (e.g. EGFR exon-19 deletions) need read.indel/CIGAR-aware counting; a single-base comparison silently scores every indel read as other and reports the locus as cleared.

import pysam

def track_known_variants(bam_file, variants):
    '''Pileup ref/alt counts at fixed (chrom, pos, ref, alt) SNV loci; pos is 1-based.
    SNVs only - indel reporters need read.indel/CIGAR handling, not a single-base compare.'''
    bam = pysam.AlignmentFile(bam_file, 'rb')
    rows = []
    for chrom, pos, ref, alt in variants:
        counts = {'ref': 0, 'alt': 0, 'other': 0}
        for col in bam.pileup(chrom, pos - 1, pos, truncate=True):
            for read in col.pileups:
                if read.is_del or read.is_refskip:
                    continue
                base = read.alignment.query_sequence[read.query_position]
                counts['alt' if base == alt else 'ref' if base == ref else 'other'] += 1
        depth = sum(counts.values())
        rows.append({'chrom': chrom, 'pos': pos, 'ref': ref, 'alt': alt,
                     'depth': depth, 'alt_count': counts['alt'],
                     'vaf': counts['alt'] / depth if depth else 0.0})
    bam.close()
    return rows

Annotate calls for interpretation with Ensembl VEP (--cache --offline --fasta --vcf --everything); the gnomAD allele-frequency flags are --af_gnomade (exomes) and --af_gnomadg (genomes) — the bare --af_gnomad is a legacy alias returning only exome AF. gnomAD presence separates germline; only WBC presence separates CHIP.

Per-Method Failure Modes

CHIP misclassified as tumor

Trigger: de novo calling without matched WBC. Mechanism: leukocyte-derived clonal somatic variants in cancer genes look identical to tumor signal and pass gnomAD filtering. Symptom: low-VAF calls in DNMT3A/TET2/TP53; "tumor" mutations not in the matched tissue. Fix: subtract matched buffy-coat/WBC genotype; never report somatic-tumor without it.

Strand-biased / deamination artifacts at low VAF

Trigger: FFPE-style C>T or oxidative G>T at VAF near the floor. Mechanism: damage on one template strand is inherited by every PCR copy, so single-strand UMI consensus votes unanimously for the artifact. Symptom: alt support concentrated on one strand. Fix: VarDict strand-bias test or Mutect2 orientation model; for sub-0.1% require duplex consensus.

Calling below the error floor

Trigger: lowering -f to e.g. 0.001 on a non-consensus BAM. Mechanism: the raw ~1e-3 error rate manufactures alt reads at that frequency. Symptom: a flood of low-VAF calls scaling with depth. Fix: do consensus upstream; do not set a VAF threshold below the demonstrated error floor of the input.

Germline-vs-somatic confusion at low coverage

Trigger: classifying by VAF alone when depth is low. Mechanism: a true 50% het reads 3/12 = 0.25 by chance. Symptom: germline hets mislabeled subclonal somatic. Fix: gnomAD + matched-WBC presence (germline ~0.5 in WBC; CHIP at clone VAF; tumor-only absent from WBC).

Per-locus LoD quoted as the assay LoD

Trigger: reporting a single-variant sensitivity for a multi-locus tracking assay (or vice versa). Mechanism: panel-integrated LoD is orders of magnitude below per-locus LoD. Symptom: a "0.1%" claim that does not match observed ppm-level tracking. Fix: state per-locus vs panel-integrated explicitly; see analytical-validation.

Quantitative Thresholds

Threshold	Source	Rationale
Raw Illumina error ~1e-3 caps naive VAF near 0.5-1%	Schmitt 2012 PNAS 109:14508	Per-base miscall rate sets the per-locus VAF floor; alt support below it is mostly error
UMI single-strand consensus -> ~1e-5; duplex -> <1e-7	Schmitt 2012; Newman 2016 Nat Biotechnol 34:547	Family consensus erases PCR/sequencing error; duplex strand concordance also catches one-strand damage
CAPP-Seq de novo LoD ~0.02% at 96% specificity	Newman 2014 Nat Med 20:548	Deep hybrid-capture + reporter set; demonstrates the de novo panel floor
iDES ~15x error suppression (UMI ~3x x polishing ~3x)	Newman 2016 Nat Biotechnol 34:547	Molecular consensus and in-silico background polishing are orthogonal and stack
INVAR quantifies to ~1e-5, detects to ~2.5 ppm	Wan 2020 Sci Transl Med 12:eaaz8084	Integrating variant reads across 100s-1000s of patient loci collapses multiple testing
MRDetect ~1e-5 tumor fraction at ~35x WGS, 95% spec	Zviran 2020 Nat Med 26:1114	Breadth (thousands of SNVs) + read-level SVM (~14.4x error reduction) substitutes for depth
CHIP = 53.2% (cancer pts) / 81.6% (controls) of cfDNA variants	Razavi 2019 Nat Med 25:1928	Most non-germline cfDNA variants are not tumor; matched WBC is mandatory
Depth >= 1000-5000x unique consensus for panels	community / Phallen 2017 Sci Transl Med 9:eaan2415 (~30,000x)	Detecting <1% VAF needs enough unique molecules sampled at each locus
~303 genome-equivalents per ng cfDNA (3.3 pg/haploid)	standard constant	Input mass sets a hard Poisson ceiling on detectable VAF independent of sequencing
LoB / LoD / LoD95 per CLSI EP17	CLSI EP17-A2	A bare VAF without input mass + replicate detection rate is not a sensitivity spec

Common Errors

Error / symptom	Cause	Solution
Flood of low-VAF calls scaling with depth	`-f` set below the input's error floor on non-consensus reads	Do UMI/duplex consensus first; keep `-f` >= demonstrated floor
VarDict emits nothing or wrong regions	`-c -S -E -g` read as genomic values	They are 1-based BED column indices; use `-c 1 -S 2 -E 3 -g 4` for a 4-column BED
var2vcf re-filters away VarDict calls	var2vcf_valid.pl `-f` default 0.02 mismatched	Set var2vcf `-f` to match VarDict's `-f`
Real amplicon calls dropped as positional artifacts	var2vcf `-P` (filter pstd=0) on by default	Add `-P 0` for PCR/amplicon data
"Tumor" variants absent from matched tissue	CHIP not subtracted	Sequence and subtract matched WBC; flag CHIP-gene hits
`--af_gnomad` returns only exome AF	bare flag is a legacy exome-only alias	Use `--af_gnomade` (exomes) / `--af_gnomadg` (genomes)
ppm "LoD" not reproducible	per-locus LoD quoted for a tracking assay	Report panel-integrated LoD with input mass and LoD95

References

Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB, Loeb LA. 2012. Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci USA 109(36):14508-14513. — Duplex Sequencing; DCS error <1e-7.
Newman AM, Bratman SV, To J, et al. 2014. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med 20(5):548-554. — CAPP-Seq; ~0.02% LoD.
Newman AM, Lovejoy AF, Klass DM, et al. 2016. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat Biotechnol 34(5):547-555. — iDES; UMI + background polishing.
Phallen J, Sausen M, Adleff V, et al. 2017. Direct detection of early-stage cancers using circulating tumor DNA. Sci Transl Med 9(403):eaan2415. — TEC-Seq deep panel.
Razavi P, Li BT, Brown DN, et al. 2019. High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants. Nat Med 25(12):1928-1937. — CHIP is the majority of cfDNA variants; matched WBC.
Wan JCM, Heider K, Gale D, et al. 2020. ctDNA monitoring using patient-specific sequencing and integration of variant reads. Sci Transl Med 12(548):eaaz8084. — INVAR; integration to ~2.5 ppm.
Zviran A, Schulman RC, Shah M, et al. 2020. Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring. Nat Med 26(7):1114-1124. — MRDetect; shallow WGS + read SVM.
Kurtz DM, Soo J, Co Ting Keh L, et al. 2021. Enhanced detection of minimal residual disease by targeted sequencing of phased variants in circulating tumor DNA. Nat Biotechnol 39(12):1537-1547. — PhasED-Seq; phased-variant enrichment.
Sater V, Viailly P-J, Lecroq T, et al. 2020. UMI-VarCal: a new UMI-based variant caller that efficiently improves low-frequency variant detection in paired-end sequencing NGS libraries. Bioinformatics 36(9):2718-2724. — UMI-aware pileup + per-position Poisson test.

Related Skills

cfdna-preprocessing - UMI/duplex consensus input that sets the error floor
analytical-validation - LoD/LoB and the panel-integration math behind detection
longitudinal-monitoring - track detected variants across serial samples
tumor-fraction-estimation - orthogonal burden estimate to cross-check
variant-calling/variant-calling - general somatic calling principles
clinical-databases/variant-prioritization - clinical annotation and interpretation