A map of constrained coding regions (CCRs) in the human genome.

Click on the badge above to read the docs and learn more about how to run the model.

See INSTALL for the packages and software required to run the model.

Overview

This repository is linked to our manuscript describing constrained coding regions in the human genome. If you would like to view CCRs throughout the genome or download the model in its most current form, go to the CCR Browser. The version used in the paper, which is the version currently available for download, utilizes the hg19/GRCh37 reference genome.

The constrained coding regions (CCR) model uses the Genome Aggregation Database (gnomAD, version 2.0.1 in the paper) to reveal regions of protein-coding genes that are likely to be under purifying selection. We used protein-altering variation from the exomes of 123,136 ostensibly healthy individuals to identify coding regions that are completely devoid of protein-altering variation. We infer such coding regions to be constrained; the higher the constraint percentile, the more constrained we predict the region to be.
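The core idea can be illustrated with a toy sketch: given the positions of protein-altering variants inside a single coding interval, the variant-free stretches between them are the candidate constrained regions. This is a simplified illustration under stated assumptions, not the pipeline in this repository. The function name `variant_free_regions`, the single-interval input, and the example coordinates are all hypothetical, and the sketch does not attempt the percentile ranking.

```python
from typing import List, Tuple


def variant_free_regions(
    start: int, end: int, variant_positions: List[int]
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) stretches of [start, end) with no variant.

    Coordinates are 0-based and each variant is assumed to occupy one base.
    """
    regions = []
    cursor = start  # first base not yet assigned to a region or a variant
    for pos in sorted(set(variant_positions)):
        if pos < start or pos >= end:
            continue  # ignore variants outside the coding interval
        if pos > cursor:
            regions.append((cursor, pos))  # gap before this variant
        cursor = pos + 1  # resume just after the variant base
    if cursor < end:
        regions.append((cursor, end))  # trailing gap after the last variant
    return regions


# Hypothetical coordinates, for illustration only.
gaps = variant_free_regions(100, 120, [105, 106, 115])
print(gaps)
```

Longer gaps are the more interesting ones here; ranking them into percentiles requires the additional covariates described in the BED column table.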

The most constrained regions (≥90th percentile, and especially ≥99th percentile) have been shown to be extremely enriched for pathogenic variation in ClinVar, de novo dominant mutations in patients with severe developmental disorders, and critical Pfam domains exome-wide. Even more exciting, 72% of genes harboring a CCR at or above the 99th percentile have no known pathogenic variants. There is great opportunity to discover the function of these understudied genes, as well as their role in disease phenotypes or, potentially, in embryonic lethality when altered.

Citation

If you use this model in any way, please cite the paper:

Havrilla, J.M., Pedersen, B.S., Layer, R.M. & Quinlan, A.R. A map of constrained coding regions in the human genome. Nature Genetics (2018). doi:10.1038/s41588-018-0294-6

CCR BED Files

Each column in the CCR BED files is described below:

BED file columns

Column	Description
chrom	Chromosome ID
start	Start coordinate (0-based, may be part of a multi-exon CCR)
end	End coordinate (1-based, may be part of a multi-exon CCR)
ccr_pct	CCR percentile. 0 marks a position with a gnomAD variant and represents no constraint. 100 represents complete constraint, the most constrained region in the model.
gene	HGNC gene name.
ranges	The range of coordinates that represent the CCR. For multi-exon spanning CCRs, this will be a comma-separated list of ranges.
varflag	VARTRUE = 0th percentile CCR, and thus an ExAC variant coordinate (or several ExAC deletions merged into one CCR). VARFALSE = Anything that is not a 0th percentile CCR.
syn_density	Synonymous variant density of the CCR. Only SNPs that do not change the amino acid or a stop/start codon are counted; multiple alleles at the same base pair are allowed.
cpg	CpG dinucleotide density of the whole CCR.
cov_score	Length of the CCR, scaled by the coverage proportion at 10x for each base pair.
resid	Raw residual value from the linear regression model.
resid_pctile	Raw residual percentile, not weighted by proportion of exome represented.
unique_key	A unique key ID for each CCR.
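
As a minimal sketch of how these columns might be consumed, the snippet below reads tab-delimited CCR records and keeps those at or above a chosen percentile. It assumes the file columns appear in the order listed in the table, that a header line (if present) starts with `#`, and that `ranges` is formatted as comma-separated `start-end` pairs; none of these details are confirmed here, so check them against the actual files. The helper names `parse_ranges` and `read_ccrs` are hypothetical, and the two example rows are made-up values, not real CCR data.

```python
import csv
import io
from typing import Dict, Iterator, List, Tuple

# Column order taken from the table above; assumed to match the file layout.
COLUMNS = [
    "chrom", "start", "end", "ccr_pct", "gene", "ranges", "varflag",
    "syn_density", "cpg", "cov_score", "resid", "resid_pctile", "unique_key",
]


def parse_ranges(field: str) -> List[Tuple[int, int]]:
    """Split 'start-end,start-end' into integer pairs (assumed format)."""
    pairs = []
    for chunk in field.split(","):
        lo, hi = chunk.split("-")
        pairs.append((int(lo), int(hi)))
    return pairs


def read_ccrs(handle, min_pct: float = 0.0) -> Iterator[Dict[str, object]]:
    """Yield one dict per CCR row whose ccr_pct is at least min_pct."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and an assumed '#'-prefixed header
        rec: Dict[str, object] = dict(zip(COLUMNS, row))
        rec["start"] = int(row[1])
        rec["end"] = int(row[2])
        rec["ccr_pct"] = float(row[3])
        rec["ranges"] = parse_ranges(row[5])
        if rec["ccr_pct"] >= min_pct:
            yield rec


# Made-up rows for illustration only; not real CCR data.
example = (
    "#" + "\t".join(COLUMNS) + "\n"
    "1\t1000\t1050\t99.5\tGENE1\t1000-1050\tVARFALSE\t0.0\t0.02\t50.0\t12.3\t99.1\t1\n"
    "1\t1050\t1051\t0.0\tGENE1\t1050-1051\tVARTRUE\t0.0\t0.0\t1.0\t-3.2\t0.0\t2\n"
)
top = list(read_ccrs(io.StringIO(example), min_pct=99.0))
for rec in top:
    print(rec["gene"], rec["ccr_pct"], rec["ranges"])
```

For a real file, replace the `io.StringIO` handle with an opened file object (or `gzip.open(path, "rt")` if the download is compressed).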
