| name | chromatin-state-inference |
| description | This skill should be used when users need to infer chromatin states from histone modification ChIP-seq data using ChromHMM. It provides workflows for chromatin state segmentation, model training, and state annotation. |
ChromHMM Chromatin State Inference
Overview
This skill enables chromatin state analysis of histone modification ChIP-seq data using ChromHMM. ChromHMM uses a multivariate Hidden Markov Model to segment the genome into discrete chromatin states based on combinatorial patterns of histone modifications.
Main steps include:
- Refer to Inputs & Outputs to verify necessary files.
- Always prompt user if required files are missing.
- Always prompt user for genome assembly used.
- Always prompt user for the bin size for generating binarized files.
- Always prompt user for the number of states the ChromHMM model should target.
- Run ChromHMM workflow: Binarization → Learning.
When to use this skill
Use this skill when you need to infer chromatin states from histone modification ChIP-seq data using ChromHMM.
Inputs & Outputs
Inputs
(1) Option 1: BED files of aligned reads
<mark1>.bed
<mark2>.bed
...
(1) Option 2: BAM files of aligned reads
<mark1>.bam
<mark2>.bam
...
Outputs
chromhmm_output/
binarized/
*.txt
model/
*.txt
...
Decision Tree
Step 0: Initialize Project
Call:
mcp__project-init-tools__project_init
with:
sample: all
task: chromhmm
Step 1: Prepare the cellmarkfile (skip this step if signal files are provided)
cell1 mark1 cell1_mark1.bam cell1_control.bam
cell1 mark2 cell1_mark2.bam cell1_control.bam
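The cellmarkfile is a tab-delimited table with one row per cell/mark pair: cell name, mark name, signal file, and an optional control file. A minimal sketch of generating it is shown here; the function name `build_cellmark_table` and the file names are hypothetical and only illustrate the layout.

```python
import csv
import io


def build_cellmark_table(rows):
    """Return tab-delimited cellmarkfile text.

    Each row is (cell, mark, signal_file, control_file); control_file may be
    None, in which case the fourth column is omitted for that row.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for cell, mark, signal, control in rows:
        fields = [cell, mark, signal]
        if control:
            fields.append(control)
        writer.writerow(fields)
    return buf.getvalue()


# Hypothetical file names, for illustration only.
rows = [
    ("cell1", "mark1", "cell1_mark1.bam", "cell1_control.bam"),
    ("cell1", "mark2", "cell1_mark2.bam", "cell1_control.bam"),
]
table_text = build_cellmark_table(rows)
```

Writing `table_text` to a file gives a table that can be passed to the binarization step.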
Step 2: Data Binarization
Step 3: Model Learning
Call:
mcp__chromhmm-tools__learn_model
with:
binarized_dir: Directory containing the binarized files
num_states: Provided by user (e.g. 15)
output_model_dir: Output directory for the model (e.g. model_15_states/)
genome: Provided by user (e.g. hg38)
threads: Provided by user (e.g. 16)
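For orientation, the Binarization → Learning workflow corresponds to the ChromHMM command-line subcommands `BinarizeBed`/`BinarizeBam` and `LearnModel`. The sketch below only builds the argument lists and does not run anything. It assumes the MCP tools wrap the standard CLI, which this document does not confirm; the helper names, the `ChromHMM.jar` path, and the example file names are hypothetical.

```python
def binarize_command(input_format, chrom_sizes, input_dir, cellmark_table,
                     out_dir, bin_size=200, jar="ChromHMM.jar"):
    """Build the argv list for ChromHMM BinarizeBed or BinarizeBam."""
    if input_format not in ("bed", "bam"):
        raise ValueError("input_format must be 'bed' or 'bam'")
    subcommand = "BinarizeBam" if input_format == "bam" else "BinarizeBed"
    return ["java", "-jar", jar, subcommand, "-b", str(bin_size),
            chrom_sizes, input_dir, cellmark_table, out_dir]


def learn_command(binarized_dir, out_dir, num_states, genome,
                  bin_size=200, threads=1, jar="ChromHMM.jar"):
    """Build the argv list for ChromHMM LearnModel.

    -b must match the bin size used for binarization; -p sets the maximum
    number of processors.
    """
    return ["java", "-jar", jar, "LearnModel", "-b", str(bin_size),
            "-p", str(threads), binarized_dir, out_dir,
            str(num_states), genome]


# Hypothetical paths, mirroring the example parameter values above.
cmd = learn_command("chromhmm_output/binarized", "model_15_states",
                    15, "hg38", threads=16)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if running the CLI directly is appropriate for the environment.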
Parameter Optimization
Number of States
- 8 states: Basic chromatin states
- 15 states: Standard comprehensive states
- 25 states: High-resolution states
- Optimization: Compare models trained with different state counts using the Bayesian Information Criterion (BIC); lower values are preferred
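A minimal sketch of a BIC comparison follows, using the standard definition BIC = k ln(n) - 2 ln(L). It assumes a binary-emission HMM with K states and M marks, so k = K*M emission parameters + K*(K-1) transition parameters + (K-1) initial-state parameters, and it assumes the natural-log likelihood of each trained model has been obtained separately. The function names are hypothetical, and no real likelihood values are shown.

```python
import math


def num_free_parameters(num_states, num_marks):
    """Free parameters of an HMM with independent binary emissions per mark."""
    emissions = num_states * num_marks          # one Bernoulli per state/mark
    transitions = num_states * (num_states - 1)  # each row sums to 1
    initial = num_states - 1                    # initial distribution sums to 1
    return emissions + transitions + initial


def bic(log_likelihood, num_states, num_marks, num_bins):
    """BIC = k * ln(n) - 2 * ln(L); lower is better."""
    k = num_free_parameters(num_states, num_marks)
    return k * math.log(num_bins) - 2.0 * log_likelihood


def best_num_states(log_likelihoods, num_marks, num_bins):
    """Pick the state count with the lowest BIC.

    log_likelihoods maps num_states -> natural-log likelihood of that model.
    """
    return min(log_likelihoods,
               key=lambda s: bic(log_likelihoods[s], s, num_marks, num_bins))
```

Because the penalty term grows with both the state count and the number of bins, a larger model is selected only when its likelihood gain outweighs the added parameters.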
Bin Size
- 200bp: Standard resolution
- 100bp: High resolution (requires more memory)
- 500bp: Low resolution (faster computation)
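The memory and runtime trade-off follows from the number of bins, which scales inversely with bin size. The sketch below estimates the bin count from chromosome lengths; the function name `count_bins` and the toy lengths are hypothetical, and partial bins at chromosome ends are rounded up, so the result is an upper-bound estimate.

```python
def count_bins(chrom_lengths, bin_size):
    """Estimate the total number of bins across all chromosomes.

    chrom_lengths maps chromosome name -> length in bp. A trailing partial
    bin is counted as a full bin (integer ceiling division).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return sum((length + bin_size - 1) // bin_size
               for length in chrom_lengths.values())


# Toy chromosome lengths, for illustration only.
toy_lengths = {"chr1": 1000, "chr2": 450}
bins_by_size = {size: count_bins(toy_lengths, size) for size in (100, 200, 500)}
```

Halving the bin size roughly doubles the number of bins, and with it the size of the binarized data that learning has to process.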
State Interpretation
Common Chromatin States
- Active Promoter: H3K4me3, H3K27ac
- Weak Promoter: H3K4me3
- Poised Promoter: H3K4me3, H3K27me3
- Strong Enhancer: H3K27ac, H3K4me1
- Weak Enhancer: H3K4me1
- Insulator: CTCF
- Transcribed: H3K36me3
- Repressed: H3K27me3
- Heterochromatin: Low signal across marks
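The mark combinations above can be applied as a first-pass rule set to the emission probabilities of each learned state. The sketch below is illustrative only: the function name `label_state`, the 0.5 threshold, and the order in which rules are checked are assumptions, and real annotations usually need manual review.

```python
def label_state(emissions, threshold=0.5):
    """Assign a candidate label to one state.

    emissions maps mark name -> emission probability for that state.
    A mark counts as present when its probability is >= threshold.
    """
    on = {mark for mark, prob in emissions.items() if prob >= threshold}
    # More specific combinations are checked before single-mark rules.
    if {"H3K4me3", "H3K27me3"} <= on:
        return "Poised Promoter"
    if {"H3K4me3", "H3K27ac"} <= on:
        return "Active Promoter"
    if "H3K4me3" in on:
        return "Weak Promoter"
    if {"H3K27ac", "H3K4me1"} <= on:
        return "Strong Enhancer"
    if "H3K4me1" in on:
        return "Weak Enhancer"
    if "CTCF" in on:
        return "Insulator"
    if "H3K36me3" in on:
        return "Transcribed"
    if "H3K27me3" in on:
        return "Repressed"
    if not on:
        return "Heterochromatin"
    return "Unassigned"  # marks present but no rule matched
```

For example, a state with high H3K4me3 and H3K27ac emissions is labeled "Active Promoter", while a state with no mark above the threshold is labeled "Heterochromatin".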
Troubleshooting
- Memory errors: Increase the bin size or reduce the number of states
- Convergence problems: Increase the maximum number of iterations or adjust the convergence threshold
- Uninterpretable states: Check input data quality and mark combinations
- Missing chromosomes: Verify chromosome naming consistency
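Chromosome naming mismatches are often a `chr` prefix difference (for example `1` versus `chr1`) between the input files and the chromosome length file. The sketch below reports names that are absent from the reference and suggests the prefixed or unprefixed alternative when it exists; the function names are hypothetical, and other differences such as `MT` versus `chrM` are not handled.

```python
def toggle_chr_prefix(name):
    """Return the name with the 'chr' prefix removed if present, else added."""
    return name[3:] if name.startswith("chr") else "chr" + name


def check_chromosome_names(data_chroms, reference_chroms):
    """Map each data chromosome missing from the reference to a suggestion.

    The value is the reference name obtained by toggling the 'chr' prefix,
    or None when no such name exists in the reference.
    """
    reference = set(reference_chroms)
    report = {}
    for name in sorted(set(data_chroms) - reference):
        alternative = toggle_chr_prefix(name)
        report[name] = alternative if alternative in reference else None
    return report
```

An empty report means every chromosome name in the data is present in the reference.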