| name | bio-hi-c-analysis-loop-calling |
| description | Detects focal chromatin loops (point interactions / corner-dots) in balanced Hi-C and Micro-C contact maps and aggregates/validates a loop set. Covers de-novo calling with cooltools dots (HiCCUPS-style 4-background local enrichment with lambda-chunked FDR), chromosight (template-correlation), and Mustache (scale-space blob detection); aggregate peak analysis (APA) via cooltools pileup for confirmation; the depth/resolution prerequisite (de-novo needs ~5-10kb resolution = hundreds of millions to billions of valid pairs); consensus across callers and convergent-CTCF support as validation; and differential loops via union anchors plus chromosight quantify. Use when calling chromatin loops or dots from a cooler, deciding whether a map is deep enough to call de-novo vs running APA on known CTCF/cohesin anchors, building an aggregate peak pileup, comparing loops across conditions, or validating loop calls. For HiChIP/PLAC-seq/PCHi-C protein-anchored data use FitHiChIP/MAPS, not dots. |
| tool_type | mixed |
| primary_tool | cooltools |
Version Compatibility
Reference examples tested with: cooltools 0.7+, cooler 0.10+, bioframe 0.7+, chromosight 1.6+, mustache 1.3+
Before using code patterns, verify installed versions match. If versions differ:
- Python: pip show <package>, then help(module.function) to check signatures
- CLI: <tool> --version, then <tool> --help to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
The .cool must be BALANCED before calling loops -- dots/pileup read the weight column and raw counts are unsupported. An .mcool is multi-resolution; pass a single-resolution URI (file.mcool::/resolutions/10000), not the bare .mcool. cooltools changed signatures around 0.5 -> 0.7 (view_df/expected_value_col conventions); verify help(cooltools.dots) for the installed version. The view_df passed to expected_cis MUST be the same one passed to dots/pileup.
Chromatin Loop Calling
"Where are the focal loops (CTCF/cohesin corner-dots, E-P contacts) in my Hi-C map?" -> Test each off-diagonal pixel for focal enrichment against its local background (on a balanced, expected-normalized matrix), control FDR, then validate the set by aggregation and orthogonal support.
- Python: cooltools.dots(clr, expected=cooltools.expected_cis(clr, view_df=arms), view_df=arms)
- CLI: chromosight detect --pattern loops --min-dist 20000 --max-dist 2000000 sample.mcool::/resolutions/5000 out
The Single Most Important Modern Insight -- Loop Calling Is Depth-Limited, Not Algorithm-Limited; the First Question Is "How Deep Is the Map?"
The caller choice is second-order. The dominant variable in whether loops are found at all is sequencing depth / map resolution. Rao 2014 needed ~4.9 BILLION contacts in GM12878 to reach 1kb bins and call ~10,000 loops; robust de-novo calling realistically wants 5-10kb resolution, which is hundreds of millions to billions of valid cis pairs. Below that, every caller returns near-nothing or noise, and tuning the FDR will not rescue it. So the workflow forks on depth before any tool is chosen:
- Deep map (>=~500M-1B valid pairs, 5-10kb resolution): de-novo calling is licensed. Run cooltools dots (or chromosight / Mustache), then validate (see below).
- Shallow map: do NOT de-novo call. Run APA / pileup on a KNOWN anchor set -- loops imported from a deep reference map, or anchor pairs built from CTCF/cohesin ChIP-seq peaks. This is the single most important practical reframe in the skill: shallow data can still confirm and quantify a hypothesized loop set even when it cannot discover one.
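The depth fork above can be written down as a tiny triage step that runs before any caller is chosen. This is a minimal sketch: the function name and its return labels are hypothetical, and the default cut-offs simply encode the rough figures quoted above (about 500M valid pairs, 10kb bins), which are rules of thumb rather than hard limits.

```python
def choose_loop_strategy(valid_cis_pairs, bin_size_bp,
                         min_pairs=500_000_000, max_bin_bp=10_000):
    """Pick a workflow branch from map depth and bin size (hypothetical helper).

    The defaults mirror the approximate thresholds in the text; tune them
    for your own data rather than treating them as exact.
    """
    deep_enough = valid_cis_pairs >= min_pairs
    fine_enough = bin_size_bp <= max_bin_bp
    if deep_enough and fine_enough:
        return "de_novo"
    # Shallow or coarse map: confirm a known anchor set instead of discovering.
    return "apa_on_known_anchors"


print(choose_loop_strategy(800_000_000, 10_000))
print(choose_loop_strategy(40_000_000, 25_000))
```

Making the decision explicit keeps a shallow map from silently reaching a de-novo caller further down the pipeline.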
Two corollaries that follow directly:
De-novo calling DISCOVERS; APA CONFIRMS -- never conflate them. APA aggregates many putative loops to surface mean signal no individual loop could pass FDR for. An enriched APA center pixel proves "this SET of pairs is enriched on average"; it does NOT prove any single pair is a loop and it cannot discover new loops. Presenting an APA pileup as evidence that "these loops exist" is the classic abuse. And the APA score is meaningless without a corner control -- center pixel divided by an off-diagonal corner block of the flank is the on-vs-off measurement; the bare center value alone says nothing.
Loops form between convergent CTCF motifs -- biology AND a validation filter. Loops preferentially link two CTCF motifs in CONVERGENT orientation (Rao 2014 observation; de Wit 2015, Sanborn 2015 extrusion mechanism; proven by CTCF-site inversion experiments that kill or reroute the loop). A called corner-dot whose two anchors carry convergent CTCF motifs is high-confidence; one with no CTCF/anchor support on a shallow map is likely a false positive. Not all loops are CTCF loops (E-P and polycomb loops exist), so convergent-CTCF is a strong positive filter, not a universal requirement.
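The convergent-CTCF filter reduces to a strand comparison once each anchor has a motif orientation. The sketch below assumes the strands come from a separate motif scan and that the upstream anchor is the one with the smaller coordinate; the function name and labels are illustrative only.

```python
def ctcf_pair_orientation(upstream_strand, downstream_strand):
    """Classify the CTCF motif orientation of a loop's two anchors.

    upstream_strand   : '+' or '-' for the anchor with the smaller coordinate
    downstream_strand : '+' or '-' for the anchor with the larger coordinate
    """
    pair = (upstream_strand, downstream_strand)
    if pair == ("+", "-"):
        return "convergent"   # motifs point toward each other
    if pair == ("-", "+"):
        return "divergent"    # motifs point away from each other
    if pair in {("+", "+"), ("-", "-")}:
        return "tandem"       # motifs point the same way
    return "unknown"          # missing or unrecognized strand


print(ctcf_pair_orientation("+", "-"))
```

Because non-CTCF loops exist, a label other than "convergent" is a reason to look for other support, not an automatic rejection.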
Loop-Caller Taxonomy
| Tool | Philosophy | Mechanism | When |
|---|---|---|---|
| cooltools dots | local enrichment (CPU HiCCUPS) | pixel must beat 4 local backgrounds (donut/horizontal/vertical/lower-left); Poisson p; lambda-binned BH-FDR | cooler/.mcool pipelines, the modern default; pure-CPU |
| Juicer HiCCUPS | local enrichment (GPU original) | same 4-kernel model on .hic; CUDA-bound | .hic/Juicer ecosystem with a GPU available |
| chromosight detect | template correlation | Pearson correlation of a loop/border/stripe kernel vs each window | want loops AND borders AND stripes from one engine; Micro-C-friendly |
| Mustache | scale-space blobs | Difference-of-Gaussians across scales; multi-scale catches loops of different sizes | mixed loop sizes, kb-resolution Micro-C, recovers more E-P/ChIA-PET loops |
| SIP | image processing | Gaussian blur + regional-max + watershed | .hic image-based alternative |
| cooltools pileup (APA) | CONFIRMATION, not discovery | aggregate snippets centered on an anchor set; measure center vs corner | validate/quantify a loop SET; works on shallow maps |
Forcato 2017 (Nat Methods 14:679) is the canonical finding that loop callers show LOW pairwise overlap and poor replicate reproducibility -- far worse than TAD callers. Practical consequence: a loop called by only one tool is suspect. Trust comes from consensus across >=2 callers plus orthogonal support (convergent CTCF, ChIA-PET/HiChIP), not from any single tool's list length.
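Consensus across callers needs a tolerance, because two tools rarely place the same loop on the identical pixel. The following is a minimal sketch that keeps loops from one call set when a second set has a partner with both anchors within a window. The tuple layout (chrom, anchor1, anchor2), the function name, and the 10kb default tolerance are assumptions for illustration, and the quadratic scan is only suitable for modest list sizes.

```python
def consensus_loops(calls_a, calls_b, tol_bp=10_000):
    """Return loops in calls_a that have a match in calls_b.

    Each call is (chrom, anchor1_pos, anchor2_pos). Two calls match when
    both anchors lie within tol_bp of each other on the same chromosome.
    """
    by_chrom = {}
    for chrom, pos1, pos2 in calls_b:
        by_chrom.setdefault(chrom, []).append((pos1, pos2))

    shared = []
    for chrom, pos1, pos2 in calls_a:
        for other1, other2 in by_chrom.get(chrom, []):
            if abs(pos1 - other1) <= tol_bp and abs(pos2 - other2) <= tol_bp:
                shared.append((chrom, pos1, pos2))
                break  # one supporting call is enough
    return shared


dots_calls = [("chr1", 1_000_000, 1_500_000),
              ("chr1", 3_000_000, 3_400_000),
              ("chr2", 500_000, 900_000)]
mustache_calls = [("chr1", 1_005_000, 1_495_000),
                  ("chr2", 500_000, 2_000_000)]
print(consensus_loops(dots_calls, mustache_calls))
```

A tolerance of one or two bins is a reasonable starting point; widening it raises apparent agreement without making the calls any more trustworthy.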
Decision Tree by Scenario
| Scenario | Recommended | Why |
|---|---|---|
| Shallow map (tens of M pairs) | APA/pileup on KNOWN anchors (CTCF/cohesin ChIP or reference loops) -- STOP de-novo | callers return noise below ~5-10kb resolution |
| Deep cooler/.mcool, CPU only | cooltools.dots (default) | pure-CPU HiCCUPS reimplementation on balanced cooler |
| Deep .hic with a GPU | Juicer HiCCUPS | CUDA original built for billion-contact .hic scans |
| Mixed loop sizes / kb Micro-C | Mustache | scale-space natively spans loop sizes; sub-5kb-friendly |
| Want stripes/borders too | chromosight (swap --pattern) | same template engine; stripes are a SEPARATE class, not loops |
| Validate a call set | cooltools.pileup -> APA score vs corner control | aggregate enrichment + visual QC of the dot |
| Confirm anchors are real loops | convergent-CTCF check -> chip-seq/peak-annotation, atac-seq/footprinting | extrusion loops carry convergent CTCF motifs |
| Annotate loop anchors | -> chip-seq/peak-annotation, atac-seq/enhancer-gene-linking | E-P / TF context lives there |
| Anchor-overlap enrichment p-value | -> genome-intervals/overlap-significance | turn an anchor-overlap count into a permutation test |
| Two conditions, loop strength shift | union anchors -> chromosight quantify per condition -> test delta (or diff_mustache) | NO bin-level DESeq for loops; quantify a fixed coordinate set |
| HiChIP / PLAC-seq / PCHi-C | FitHiChIP / MAPS / HiC-DC+ -> chip-seq/peak-calling | protein-anchored, coverage-biased; HiCCUPS null is wrong |
De-Novo Loop Calling with cooltools dots
Goal: Discover focal loops genome-wide on a deep, balanced map with honest FDR control.
Approach: Build chromosome-arm regions, compute the distance-decay expected on those arms, then run dots -- which convolves four local-background kernels and runs Benjamini-Hochberg FDR independently within geometrically-spaced lambda-bins of locally-adjusted expected. The arms view_df must be identical for expected_cis and dots.
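import bioframe
import cooler
import cooltools
# Open one resolution of the multi-resolution file; it must already be balanced.
clr = cooler.Cooler('matrix.mcool::/resolutions/10000')
# NOTE: make_viewframe on chromsizes yields whole-chromosome regions. For true
# arm-level regions, split at centromeres (e.g. with bioframe.make_chromarms).
arms = bioframe.make_viewframe(clr.chromsizes)
# Distance-decay expected, computed on the same view that dots will use.
expected = cooltools.expected_cis(clr, view_df=arms, nproc=4)
loops = cooltools.dots(
    clr, expected=expected, view_df=arms,
    max_loci_separation=10_000_000,         # skip pixels more than 10 Mb off-diagonal
    n_lambda_bins=40, lambda_bin_fdr=0.1,   # BH-FDR applied within each lambda-bin
    clustering_radius=20_000,               # merge nearby significant pixels into one call
    nproc=4,
)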
The four backgrounds, and why lower-left is the clever one. A pixel must beat ALL four local-background kernels, not one. Donut = is it a focal enrichment at all. Horizontal and vertical = is it actually a STRIPE pixel masquerading as a dot (these kernels exist to NOT call architectural stripes as loops). Lower-left = is it just a TAD/contact-domain CORNER -- a domain corner is enriched vs the donut but NOT vs its lower-left neighborhood, so requiring the pixel to also beat lower-left separates a genuine point loop from a generic domain corner. Skipping lower-left inflates calls with domain corners.
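To make the four footprints concrete, the sketch below builds boolean-style masks for a window half-width w and an inner exclusion half-width p around the tested pixel. It is modeled on the HiCCUPS-style layout described above, not copied from cooltools; the function name and the defaults w=5, p=2 are assumptions, and the kernels in an installed cooltools may differ in detail.

```python
import numpy as np


def hiccups_style_kernels(w=5, p=2):
    """Return 0/1 masks of the four local-background footprints.

    Offsets are relative to the tested pixel at the center: y is the row
    offset (positive = down), x is the column offset (negative = left).
    """
    y, x = np.ogrid[-w:w + 1, -w:w + 1]
    inner = (np.abs(x) <= p) & (np.abs(y) <= p)      # the peak itself, excluded

    donut = ~inner & (x != 0) & (y != 0)             # ring, minus the central cross
    horizontal = (np.abs(y) <= 1) & (np.abs(x) > p)  # 3-row band left/right of peak
    vertical = (np.abs(x) <= 1) & (np.abs(y) > p)    # 3-column band above/below peak
    lowleft = (x < 0) & (y > 0) & ~inner             # quadrant toward the diagonal

    masks = {"donut": donut, "horizontal": horizontal,
             "vertical": vertical, "lowleft": lowleft}
    return {name: mask.astype(int) for name, mask in masks.items()}


kernels = hiccups_style_kernels(w=5, p=2)
print(kernels["lowleft"])
```

Printing the lower-left mask shows that it is a subset of the donut: it isolates the quadrant where a domain corner carries its extra signal, which is why a corner can beat the donut but not this kernel.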
Lambda-chunking is why HiCCUPS FDR is honest. Contact counts span orders of magnitude with genomic distance, so a single genome-wide BH-FDR would be dominated by the high-count near-diagonal regime and over-call. dots bins pixels by their locally-adjusted expected into geometrically-spaced lambda-bins (n_lambda_bins=40) and runs BH-FDR independently within each (lambda_bin_fdr=0.1), so low-count and high-count regimes are each thresholded correctly.
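The per-bin thresholding idea can be illustrated in a few lines. This is a simplified sketch, not the cooltools implementation: it takes precomputed p-values and locally-adjusted expected values, bins pixels on geometric edges (the base of 2^(1/3) and the edge layout are assumptions), and applies Benjamini-Hochberg separately in each bin. Real callers additionally require a pixel to pass against all four backgrounds.

```python
import numpy as np


def bh_reject(pvals, fdr):
    """Benjamini-Hochberg: boolean mask of hypotheses rejected at level fdr."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject
    order = np.argsort(pvals)
    passed = pvals[order] <= fdr * np.arange(1, m + 1) / m
    if passed.any():
        last = np.nonzero(passed)[0].max()   # largest rank meeting the bound
        reject[order[:last + 1]] = True
    return reject


def lambda_binned_bh(pvals, expected, n_lambda_bins=40, fdr=0.1,
                     base=2 ** (1 / 3)):
    """Apply BH independently within geometric bins of the expected value."""
    pvals = np.asarray(pvals, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # Edges 1, base, base**2, ...: n_lambda_bins - 1 edges give n_lambda_bins bins.
    edges = base ** np.arange(n_lambda_bins - 1)
    bin_idx = np.digitize(expected, edges)
    significant = np.zeros(pvals.size, dtype=bool)
    for b in np.unique(bin_idx):
        in_bin = bin_idx == b
        significant[in_bin] = bh_reject(pvals[in_bin], fdr)
    return significant


pvals = [1e-6, 0.5, 0.8, 0.9, 0.9, 0.95]
expected = [0.5, 0.5, 0.5, 0.5, 100.0, 100.0]
print(lambda_binned_bh(pvals, expected))
```

Each bin gets its own multiple-testing budget, so a crowded high-count bin cannot set the threshold for sparse long-range pixels, or the reverse.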
Template-Matching with chromosight
chromosight detect --pattern loops --threads 8 \
  --min-dist 20000 --max-dist 2000000 --pearson 0.4 \
  sample.mcool::/resolutions/5000 sample_loops
The score is a Pearson correlation (-1..1) between a loop kernel and each windowed submatrix. The same engine finds borders and stripes by swapping --pattern (loops, loops_small, borders, hairpins, centromeres, stripes_left, stripes_right) -- but stripes are a separate feature class, NOT loops. --pearson is the correlation cutoff; raise it for fewer, higher-confidence calls.
Scale-Space with Mustache
mustache -f sample.mcool -r 5000 -o loops.tsv -pt 0.1 -st 0.88 -norm weight -p 8
-pt is the FDR/p-value threshold (default 0.1), -st the sparsity filter (default 0.88), -norm weight for a balanced .cool (KR for .hic). Mustache spans loop sizes natively via Difference-of-Gaussians across scales, which is why it adapts to kb-resolution Micro-C better than fixed-kernel HiCCUPS.
Aggregate Peak Analysis (APA) -- Confirm, Don't Discover
Goal: Quantify whether a loop SET is enriched on average and visually QC the call set (a clean aggregate dot = mostly real; a smeared/absent center = contaminated).
Approach: Compute expected, pile up observed/expected snippets centered on each anchor pair, average across the stack, then report the APA score = center pixel divided by an off-diagonal corner-control block. Pass expected_df so snippets are O/E and comparable across genomic separations.
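import numpy as np
import cooltools
# clr, arms, and loops are the cooler, view, and loop table from the dots example above.
expected = cooltools.expected_cis(clr, view_df=arms, nproc=4)
# O/E snippets centered on each loop, +/- 100 kb of flank on either side.
stack = cooltools.pileup(clr, loops, view_df=arms, expected_df=expected, flank=100_000, nproc=4)
# Axis 0 indexes snippets here; confirm the stack layout for the installed version.
apa = np.nanmean(stack, axis=0)
center = apa.shape[0] // 2    # index of the central (loop) pixel
corner = 3                    # side length of the corner-control block, in bins
# APA score: central pixel over the mean of the lower-left corner block.
apa_score = apa[center, center] / np.nanmean(apa[-corner:, :corner])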
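A quick sanity check on synthetic data shows how the score behaves. The sketch below wraps the same center-over-corner ratio in a function (the name apa_score_from_stack is hypothetical) and assumes a snippet-first stack of shape (n_snippets, size, size) holding observed/expected values.

```python
import numpy as np


def apa_score_from_stack(stack, corner=3):
    """Center pixel of the mean snippet divided by the lower-left corner mean."""
    mean_map = np.nanmean(stack, axis=0)
    center = mean_map.shape[0] // 2
    corner_block = mean_map[-corner:, :corner]   # last rows, first columns
    return mean_map[center, center] / np.nanmean(corner_block)


# 50 flat O/E snippets (all ones): no focal enrichment.
flat = np.ones((50, 21, 21))
# Same stack with a threefold enrichment at the central pixel.
looped = flat.copy()
looped[:, 10, 10] = 3.0

print(apa_score_from_stack(flat))
print(apa_score_from_stack(looped))
```

A flat stack scores 1 and the enriched stack scores 3, so the ratio reads directly as fold enrichment over the control block, which the bare center value cannot provide.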
Differential Loops -- Union Anchors, Not a Bin-Level Tool
Goal: Find loops whose strength changes between conditions.
Approach: There is NO DESeq-for-loops. Build a UNION anchor set across conditions, then score each loop's strength per condition at a FIXED coordinate set (chromosight quantify, which is purpose-built for this, or APA per condition), then test the strength delta.
chromosight quantify --pattern loops union_anchors.bed2d condA.cool condA_q
chromosight quantify --pattern loops union_anchors.bed2d condB.cool condB_q
diffHic, multiHiCcompare, and dcHiC operate on BINS or COMPARTMENTS, not focal loops -- do NOT use them as a loop-differential tool. Cross-reference hic-differential for the bin/compartment regime.
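Once both conditions are scored on the same union coordinates, the comparison is a keyed subtraction. The sketch below assumes the per-condition scores have already been parsed into dictionaries keyed by loop coordinates; parsing the chromosight output is deliberately left out because column names should be checked against the installed version. The function name and key layout are hypothetical.

```python
import math


def loop_strength_deltas(scores_a, scores_b):
    """Per-loop score change (B minus A) on the shared coordinate set.

    scores_a / scores_b map a loop key, e.g. (chrom, anchor1, anchor2),
    to that loop's score in one condition. Loops missing or NaN in either
    condition are skipped rather than imputed.
    """
    deltas = {}
    for key in sorted(set(scores_a) & set(scores_b)):
        a, b = scores_a[key], scores_b[key]
        if math.isnan(a) or math.isnan(b):
            continue
        deltas[key] = b - a
    return deltas


cond_a = {("chr1", 1_000_000, 1_500_000): 0.5,
          ("chr1", 3_000_000, 3_400_000): 0.25,
          ("chr2", 500_000, 900_000): float("nan")}
cond_b = {("chr1", 1_000_000, 1_500_000): 0.25,
          ("chr1", 3_000_000, 3_400_000): 0.75}

deltas = loop_strength_deltas(cond_a, cond_b)
# Rank loops by the size of the change, largest first.
ranked = sorted(deltas, key=lambda k: abs(deltas[k]), reverse=True)
print(ranked[0], deltas[ranked[0]])
```

The deltas are only descriptive; deciding which changes are significant still needs replicates or a null distribution, and the two maps should be brought to comparable depth before scoring.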
Per-Method Failure Modes
De-novo calling on a shallow map
Trigger: running dots/chromosight/Mustache on tens of millions of pairs or >=25kb bins. Mechanism: focal signal is below the noise floor without depth. Symptom: zero or a handful of scattered, irreproducible calls. Fix: STOP de-novo; run APA on a known anchor set (CTCF/cohesin ChIP or reference loops).
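When no reference loop list is available, a candidate anchor set can be assembled from ChIP-seq peak midpoints. The sketch below pairs peaks on the same chromosome within a distance window; the function name is hypothetical, the 20kb to 2Mb window is borrowed from the chromosight example distances, and all-against-all pairing is a crude candidate set that should usually be narrowed (for example to convergent CTCF pairs) before piling up.

```python
def candidate_anchor_pairs(peaks, min_dist=20_000, max_dist=2_000_000):
    """Pair peak midpoints on the same chromosome within a distance window.

    peaks : iterable of (chrom, midpoint_bp)
    Returns a list of (chrom, pos1, pos2) with pos1 < pos2.
    """
    by_chrom = {}
    for chrom, mid in peaks:
        by_chrom.setdefault(chrom, []).append(mid)

    pairs = []
    for chrom, mids in by_chrom.items():
        mids.sort()
        for i, left in enumerate(mids):
            for right in mids[i + 1:]:
                dist = right - left
                if dist > max_dist:
                    break              # sorted, so later peaks are farther still
                if dist >= min_dist:
                    pairs.append((chrom, left, right))
    return pairs


ctcf_peaks = [("chr1", 100_000), ("chr1", 110_000), ("chr1", 600_000),
              ("chr1", 5_000_000), ("chr2", 50_000)]
print(candidate_anchor_pairs(ctcf_peaks))
```

To feed these pairs to a pileup they still need to be expanded into a paired-interval table (two chrom/start/end triplets per row) on the cooler's bin grid.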
APA reported without a corner control
Trigger: quoting the aggregate center-pixel value as the loop "strength." Mechanism: without an off-diagonal corner the number has no on-vs-off baseline. Symptom: a "high" APA that reflects distance-decay, not looping. Fix: APA score = center / corner-control block (Rao 2014 lower-left convention).
APA presented as proof loops exist
Trigger: showing a pileup to claim "these N loops are real." Mechanism: APA surfaces mean enrichment across a SET; it cannot validate any single loop or discover new ones. Symptom: confident per-loop claims backed only by an aggregate. Fix: treat APA as set-level confirmation; for per-loop confidence use consensus + convergent-CTCF.
Trusting a single caller's list
Trigger: reporting "Mustache found N loops" with no cross-check. Mechanism: callers have low pairwise overlap (Forcato 2017). Symptom: a list that barely overlaps a second tool or replicate. Fix: intersect >=2 callers and require convergent-CTCF / ChIA-PET / HiChIP support.
Calling domain corners as loops
Trigger: a caller without a lower-left background (or a custom kernel set). Mechanism: a TAD corner beats the donut but is not a point loop. Symptom: "loops" sitting exactly at TAD corners with no anchor support. Fix: use dots (it beats all four backgrounds); cross-check anchors.
Calling stripe pixels as dots
Trigger: detecting on a map with strong architectural stripes. Mechanism: a stripe pixel is enriched vs the donut but lies on a horizontal/vertical band. Symptom: "loops" smeared along a row/column. Fix: the horizontal/vertical kernels suppress these in dots; treat stripes as a separate class (chromosight stripes_*).
10kb-tuned kernels on 1kb Micro-C
Trigger: default HiCCUPS donut/peak-width on sub-5kb Micro-C. Mechanism: kernels are sized for 5-10kb Hi-C. Symptom: blurred or missed fine E-P loops. Fix: shrink the kernels for sub-5kb, or use Mustache/chromosight which adapt more gracefully.
Raw (unbalanced) matrix into dots
Trigger: dots on a cooler with no weight column. Mechanism: dots requires balancing weights + expected. Symptom: error or meaningless output. Fix: cooler balance first; confirm that a fetched region, e.g. clr.matrix(balance=True).fetch(<region>), is not all-NaN.
HiCCUPS-style calling on HiChIP/PLAC-seq
Trigger: running dots on cohesin/H3K27ac HiChIP or PLAC-seq. Mechanism: protein-anchored data is coverage-biased; the Hi-C null is wrong. Symptom: distorted FDR, wrong loop counts. Fix: use FitHiChIP/MAPS/HiC-DC+ against a protein-anchored background.
Quantitative Thresholds
| Threshold | Source | Rationale |
|---|---|---|
| De-novo loop resolution 5-10kb | Rao 2014 depth scaling | ~4.9B contacts reached 1kb / ~10k loops; coarser bins blur anchors, shallow maps cannot resolve them |
| max_loci_separation 2-10Mb | loop size distribution | most loops are <2Mb; 10Mb is the cooltools default ceiling on diagonal distance |
| n_lambda_bins=40, lambda_bin_fdr=0.1 | cooltools/HiCCUPS default | geometric lambda-binning + per-bin BH-FDR keeps FDR honest across the count dynamic range |
| clustering_radius=20_000 | cooltools default | merges adjacent called pixels into one loop call |
| chromosight --pearson ~0.4 loops | chromosight default | template-correlation cutoff; raise for higher-confidence, fewer calls |
| Mustache -pt 0.1, -st 0.88 | Mustache defaults | p/FDR threshold and sparsity filter |
| Consensus across >=2 callers | Forcato 2017 low overlap | single-caller lists are unreliable; require intersection or orthogonal support |
| APA score = center / corner block | Rao 2014 | the corner is the on-vs-off control; the bare center is uninterpretable |
Common Errors
| Error / symptom | Cause | Solution |
|---|---|---|
| dots returns nothing / scattered junk | map too shallow or resolution too coarse | check depth; below ~5-10kb resolution run APA on known anchors instead |
| clr.matrix(balance=True) all NaN | cooler not balanced | cooler balance / cooler.balance_cooler before calling loops |
| expected/dots shape or view error | different view_df for expected vs dots | reuse the same view_df (arms) for expected_cis and dots |
| Empty / wrong-resolution result on .mcool | bare .mcool passed | use file.mcool::/resolutions/<bp> URI |
| Empty result, no error | chrom naming mismatch (chr1 vs 1) across cooler/anchors/peaks | harmonize chromosome naming everywhere |
| AttributeError on a cooltools function | pre-0.7 vs 0.7+ signature change | help(cooltools.dots); adapt to the installed signature |
| APA center looks high but loops are weak | no corner control / expected_df omitted | pass expected_df and divide center by a corner block |
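The chromosome-naming mismatch in the table above is cheap to rule out up front. A minimal sketch of a normalizer follows; the function name is hypothetical, and it covers only the common chr-prefix and mitochondrial (chrM vs MT) differences, not assembly-specific contig aliases.

```python
def harmonize_chrom(name, use_chr_prefix=True):
    """Convert between 'chr1'-style and '1'-style chromosome names."""
    bare = name[3:] if name.lower().startswith("chr") else name
    if use_chr_prefix:
        # UCSC style: chr1, chrX, chrM
        return "chr" + ("M" if bare == "MT" else bare)
    # Ensembl style: 1, X, MT
    return "MT" if bare == "M" else bare


anchors = [("1", 1_000_000, 1_500_000), ("X", 200_000, 700_000)]
fixed = [(harmonize_chrom(c), a, b) for c, a, b in anchors]
print(fixed)
```

Apply the same convention to the anchor table, the peak files, and the view, then confirm every name appears in the cooler's chromosome list before running dots or pileup.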
References
- Rao SSP, Huntley MH, Durand NC, et al. 2014. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159(7):1665-1680.
- Open2C, Abdennur N, Abraham S, Fudenberg G, et al. 2024. Cooltools: enabling high-resolution Hi-C analysis in Python. PLoS Comput Biol 20(5):e1012067.
- Matthey-Doret C, Baudry L, Breuer A, et al. 2020. Computer vision for pattern detection in chromosome contact maps (chromosight). Nat Commun 11:5795.
- Roayaei Ardakany A, Gezer HT, Lonardi S, Ay F. 2020. Mustache: multi-scale detection of chromatin loops from Hi-C and Micro-C maps using scale-space representation. Genome Biol 21:256.
- Rowley MJ, Poulet A, Nichols MH, et al. 2020. Analysis of Hi-C data using SIP effectively identifies loops in organisms from C. elegans to mammals. Genome Res 30(3):447-458.
- Forcato M, Nicoletti C, Pal K, et al. 2017. Comparison of computational methods for Hi-C data analysis. Nat Methods 14:679-685.
- de Wit E, Vos ESM, Holwerda SJB, et al. 2015. CTCF binding polarity determines chromatin looping. Mol Cell 60(4):676-684.
- Sanborn AL, Rao SSP, Huang SC, et al. 2015. Chromatin extrusion explains key features of loop and domain formation. PNAS 112(47):E6456-E6465.
- Rao SSP, Huang SC, Glenn St Hilaire B, et al. 2017. Cohesin loss eliminates all loop domains. Cell 171(2):305-320.
- Haarhuis JHI, van der Weide RH, Blomen VA, et al. 2017. The cohesin release factor WAPL restricts chromatin loop extension. Cell 169(4):693-707.
- Schwarzer W, Abdennur N, Goloborodko A, et al. 2017. Two independent modes of chromatin organization revealed by cohesin removal. Nature 551:51-56.
- Krietenstein N, Abraham S, Venev SV, et al. 2020. Ultrastructural details of mammalian chromosome architecture (Micro-C). Mol Cell 78(3):554-565.
- Hsieh THS, Cattoglio C, Slobodyanyuk E, et al. 2020. Resolving the 3D landscape of transcription-linked mammalian chromatin folding (Micro-C). Mol Cell 78(3):539-553.
- Bhattacharyya S, Chandra V, Vijayanand P, Ay F. 2019. Identification of significant chromatin contacts from HiChIP data by FitHiChIP. Nat Commun 10:4221.
Related Skills
- hic-data-io - Load and access the cooler files this skill calls loops on
- matrix-operations - Balancing and expected/O/E that dots and pileup depend on
- hic-visualization - Render called loops and APA pileups on the heatmap
- hic-differential - Bin/compartment-level differential (the regime loops are NOT in)
- tad-detection - TAD corners vs point loops; the lower-left background separates them
- chip-seq/peak-calling - CTCF/cohesin peaks to anchor and validate loops; HiChIP peak context
- chip-seq/peak-annotation - Annotate loop anchors with TF/CTCF peaks
- atac-seq/enhancer-gene-linking - E-P contacts complementing loop calls
- atac-seq/footprinting - TF footprints at loop anchors
- genome-intervals/overlap-significance - Permutation test for anchor/feature enrichment