Computing Enrichments For External Annotations
After learning a ConsHMM annotation for a genome of interest, it can be useful to compute the enrichment of the states for external annotations in order to better understand the biological significance of the states. Using the included ChromHMM software, these enrichments can be computed either relative to each state or relative to a set of anchor points. Both types of enrichment were computed in the original ConsHMM paper.
The steps below use the hg19 100 state segmentation based on the Multiz 100-way alignment, which can be downloaded here. Depending on the size of the external annotations, these enrichments can take several hours to compute.
The coords folder provides an example set of external annotations, which is used in the example below. Any external annotations must be provided in .bed format and may be gzipped to save space. To compute the enrichments of the 100 states in the hg19 segmentation for the example annotations in coords, run:
java -jar ChromHMM/ChromHMM.jar OverlapEnrichment -lowmem -b 1 GW_segmentation.bed.gz coords/hg19/ hg19_multiz100way_enrichments
The flags -lowmem and -b 1 are necessary because the ConsHMM state annotation has single-nucleotide resolution. The output of this command will be a file named hg19_multiz100way_enrichments.txt, where each row is a state and each column contains the enrichments of the states for one of the external annotations in the coords/hg19/ directory.
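Once the enrichment file exists, it is often convenient to rank the states by their enrichment for a single annotation. The Python sketch below is one way to do that, not part of ConsHMM or ChromHMM. It assumes the output is tab-delimited with one header row and the state label in the first column, and that a trailing summary row labeled Base may be present and should be skipped; check these assumptions against your own file. The function names and the small table are made up for illustration.

```python
import csv
import io


def read_enrichment_table(handle, skip_labels=("Base",)):
    """Parse a tab-delimited enrichment table into {column: {state: value}}.

    Assumes one header row and the state label in the first column.
    Rows whose label is in skip_labels (an assumed summary row) are dropped.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    table = {name: {} for name in header[1:]}
    for row in reader:
        if not row or row[0] in skip_labels:
            continue
        for name, value in zip(header[1:], row[1:]):
            table[name][row[0]] = float(value)
    return table


def top_states(table, column, n=5):
    """Return the n (state, enrichment) pairs with the highest value in a column."""
    ranked = sorted(table[column].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Tiny made-up table standing in for hg19_multiz100way_enrichments.txt.
example = io.StringIO(
    "state\tGenome %\texons.bed.gz\n"
    "1\t60.0\t0.5\n"
    "2\t30.0\t4.2\n"
    "3\t10.0\t1.1\n"
    "Base\t100.0\t2.0\n"
)
table = read_enrichment_table(example)
print(top_states(table, "exons.bed.gz", n=2))
```

To use it on real output, pass an open file handle, for example open("hg19_multiz100way_enrichments.txt"), in place of the in-memory example.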
The anchorFiles folder provides an example set of anchor points, which is used in the example below. Any set of anchor points must be provided as a file with one anchor point per line, given by a chromosome, a coordinate, and an optional strand field. Gzipping is accepted. To compute the enrichments of the hg19 100 states within 200 bases of exon starts at single-nucleotide resolution, run:
java -mx25000M -jar ChromHMM/ChromHMM.jar NeighborhoodEnrichment -lowmem -b 1 -s 1 -l 200 -r 200 GW_segmentation.bed.gz anchorFiles/GENCODE_exons_start.txt.gz hg19_multiz100way_positional_enrichments
The flags -lowmem and -b 1 are necessary because the ConsHMM state annotation has single-nucleotide resolution. The output of this command will be a file named hg19_multiz100way_positional_enrichments.txt, where each row is a state and each column contains the enrichment of the states at a position relative to the anchor point, in this case ranging from -200 to +200.
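To run the same analysis on your own anchor points, you need a file in the format described above: one anchor per line with a chromosome, a coordinate, and an optional strand. The Python sketch below shows one way such a file might be written. It is an illustration under several assumptions and not the procedure used to build the example file: fields are tab-separated, the input exons are hypothetical (chrom, start, end, strand) tuples with 0-based, half-open coordinates, and the anchor is taken to be the 5' end of each exon. Confirm the coordinate convention expected by ChromHMM before relying on the output.

```python
import gzip
import os
import tempfile


def anchor_lines(exons):
    """Yield one anchor line per exon: chromosome, coordinate, strand.

    `exons` holds hypothetical (chrom, start, end, strand) tuples with
    0-based, half-open coordinates; the anchor is the exon's 5' end.
    """
    for chrom, start, end, strand in exons:
        position = start if strand == "+" else end - 1
        yield f"{chrom}\t{position}\t{strand}"


def write_anchor_file(exons, path):
    """Write the anchor lines to a gzipped text file."""
    with gzip.open(path, "wt") as out:
        for line in anchor_lines(exons):
            out.write(line + "\n")


# Made-up exons, for illustration only.
exons = [("chr1", 100, 250, "+"), ("chr1", 400, 520, "-")]
path = os.path.join(tempfile.mkdtemp(), "example_anchors.txt.gz")
write_anchor_file(exons, path)
with gzip.open(path, "rt") as handle:
    print(handle.read(), end="")
```

The resulting file could then take the place of anchorFiles/GENCODE_exons_start.txt.gz in the NeighborhoodEnrichment command.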