First attempt at using HiC-Pro to complete a 3D genome analysis.
HiC-Pro documentation: http://nservant.github.io/HiC-Pro/
- Install HiC-Pro and dependencies
- Download data and preprocess
- Setting the configuration file
- About the output
- Visualization with HiCPlotter
Dependency | function |
---|---|
bowtie2 | index |
Python (>2.7 & <3) | with 4 packages following |
pysam (>=0.8.3) | |
bx-python(>=0.5.0) | |
numpy(>=1.8.2) | |
scipy(>=0.15.1) | |
R | with packages RColorBrewer and ggplot2 (>2.2.1) |
g++ compiler | check with `which g++` whether it is already installed |
samtools (>1.1) | |
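
A quick way to see which of the command-line dependencies in the table are already available is to look each one up on `PATH`. The sketch below assumes the tools are installed under their usual executable names (`bowtie2`, `samtools`, `R`, `python`, `g++`); the helper name `missing_tools` is made up for illustration. It checks neither versions nor the Python and R packages.

```shell
#!/usr/bin/env bash
# Print every tool from the argument list that cannot be found on PATH.
# `missing_tools` is a hypothetical helper, not part of HiC-Pro.
missing_tools() {
  local tool
  for tool in "$@"; do
    # `command -v` is the portable equivalent of `which`
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Executable names assumed from the dependency table above.
missing_tools bowtie2 samtools R python g++
```

An empty result means all five executables were found; anything printed still needs to be installed, for example through conda.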
- pip (for the Python packages)
- conda: the most convenient option
mkdir -p ~/biosoft/hicpro
cd ~/biosoft/hicpro
git clone https://github.com/nservant/HiC-Pro.git
- Edit the `config-install.txt` file and manually define the paths to the dependencies
cat config-install.txt
# edit
PREFIX =/home/zengjianming/biosoft/hicpro/bin
BOWTIE2_PATH =/home/zengjianming/miniconda3/envs/hic/bin/bowtie2
SAMTOOLS_PATH =/home/zengjianming/miniconda3/envs/hic/bin/samtools
R_PATH =/home/zengjianming/miniconda3/envs/hic/bin/R
PYTHON_PATH =/home/zengjianming/miniconda3/envs/hic/bin/python
CLUSTER_SYS =TORQUE # Resource management system of our server
# install
make configure
make install
# add HiC-Pro to $PATH by appending the two export lines below to ~/.bashrc
vim ~/.bashrc
export HICPRO_PATH=/public/home/liuj626/tools/biosoft/hicpro/bin/HiC-Pro_2.11.1/bin
export PATH=$HICPRO_PATH:$PATH
# check that the installation succeeded
HiC-Pro -h
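If `make configure` complains, a mistyped path in `config-install.txt` is a likely cause. The following is a minimal sketch that lists every `*_PATH` entry whose target does not exist. It assumes the `KEY =value` layout shown above and paths without spaces; `check_install_paths` is a hypothetical helper name, not something HiC-Pro provides.

```shell
#!/usr/bin/env bash
# Report *_PATH entries of a config-install.txt-style file whose path does
# not exist. Returns 1 if at least one path is missing, 0 otherwise.
# `check_install_paths` is a hypothetical helper, not part of HiC-Pro.
check_install_paths() {
  local file="$1" key value status=0
  # Lines look like "BOWTIE2_PATH =/some/path # optional comment"
  while IFS='=' read -r key value || [ -n "$key" ]; do
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    value=${value%%#*}                                  # drop a trailing comment
    value=$(printf '%s' "$value" | tr -d '[:space:]')   # assumes no spaces in paths
    case $key in
      *_PATH)
        if [ ! -e "$value" ]; then
          printf 'missing: %s -> %s\n' "$key" "$value"
          status=1
        fi
        ;;
    esac
  done < "$file"
  return "$status"
}

# Example: check_install_paths config-install.txt
```

`PREFIX` is skipped on purpose, since the installation folder may not exist before `make install` has run.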
- Two parts: experimental data and the reference genome
Article: 3D genome of multiple myeloma reveals spatial genome disorganization associated with copy number variations. Nature Communications. SRX: SRX2208970; SRR: SRR4341901-SRR4341904
- ==Download the SRA Toolkit==
# use the SRA Toolkit
prefetch SRR4341901
# convert to fastq.gz
# all FASTQ files of one sample must be placed in the same folder
fastq-dump --gzip --split-3 SRR4341901
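The accession list above names four runs (SRR4341901 to SRR4341904), while the commands fetch only the first. A loop over the range could look like the sketch below; `srr_range` and `download_run` are hypothetical helper names, and the download itself is left commented out because it needs the SRA Toolkit on `PATH` and a network connection.

```shell
#!/usr/bin/env bash
# Expand an inclusive accession range such as SRR4341901..SRR4341904.
# `srr_range` and `download_run` are hypothetical helper names.
srr_range() {
  local first="${1#SRR}" last="${2#SRR}" n
  n=$first
  while [ "$n" -le "$last" ]; do
    printf 'SRR%s\n' "$n"
    n=$((n + 1))
  done
}

# Fetch one run and convert it with the same flags as above.
download_run() {
  prefetch "$1" && fastq-dump --gzip --split-3 "$1"
}

# Uncomment to download all four runs:
# for acc in $(srr_range SRR4341901 SRR4341904); do download_run "$acc"; done
```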
Reference genome: hg19
- Prepare the .bed and genome size files
$ cd /your/path/of/reference/
$ wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
$ tar zvfx chromFa.tar.gz
# Note that the downloaded archive contains one FASTA file per chromosome
$ cat *.fa > hg19.fa
$ rm chr*.fa
# bowtie2 index: add bowtie2 to your PATH first
$ bowtie2-build hg19.fa hg19.fa
# first "hg19.fa": input file; second "hg19.fa": prefix of the output files
# generates six ".bt2" files
# .bed file of restriction fragments
mkdir -p ~/data/project/hic/digest # create new folder
cd ~/data/project/hic/digest
bin=/public/.../HiC-Pro_2.10.0/HiC-Pro/bin/utils/digest_genome.py # use digest_genome.py to generate .bed file
$bin -r C^CATGG -o hg19.bed ../ref/hg19.fa
- If you use hg19 as the reference, HiC-Pro already provides some annotation files (such as `chrom_hg19.sizes`)
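For a reference that has no ready-made sizes file, the genome size table can be derived from the FASTA itself. The sketch below assumes the usual two-column, tab-separated layout of UCSC-style chrom.sizes files (sequence name, length in bp); `fasta_sizes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print "<name><TAB><length>" for every sequence in a FASTA file.
# `fasta_sizes` is a hypothetical helper, not part of HiC-Pro.
fasta_sizes() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len   # flush the previous sequence
      name = substr($1, 2)                  # header up to the first space, without ">"
      len = 0
      next
    }
    { len += length($0) }                   # sequence line: add its length
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# Example: fasta_sizes hg19.fa > hg19.sizes
```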
**Every time you run the pipeline you should create a configuration file.** See the manual for details about the configuration file.
- Copy and edit the configuration file "config-hicpro.txt" in your local folder.
- Put all input files in a rawdata folder. The input files have to be organized with a folder per sample.
- Use `qsub` to submit your PBS script
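The first two steps above (editing the configuration file and arranging the input as one folder per sample) can be scripted. This is only a sketch: both helper names are hypothetical, and the key names in the example (`PAIR1_EXT`, `PAIR2_EXT`) are written from memory of the `config-hicpro.txt` template, so check them against your own copy before relying on them.

```shell
#!/usr/bin/env bash
# Both helpers are hypothetical; they are not shipped with HiC-Pro.

# Set "KEY = VALUE" in a config file, replacing an existing KEY line or
# appending a new one. VALUE must not contain "|" or "&" (special to sed).
set_config_key() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}[[:space:]]*=" "$file"; then
    sed -i.bak "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}

# Move paired FASTQ files (<sample>_1.fastq.gz and <sample>_2.fastq.gz, the
# names written by fastq-dump --split-3) into one folder per sample.
organize_by_sample() {
  local dir="$1" f base sample
  for f in "$dir"/*_[12].fastq.gz; do
    [ -e "$f" ] || continue            # the glob matched nothing
    base=$(basename "$f")
    sample=${base%_[12].fastq.gz}
    mkdir -p "$dir/$sample"
    mv "$f" "$dir/$sample/"
  done
}

# Example (file and folder names are placeholders):
#   set_config_key config-hicpro.txt PAIR1_EXT _1
#   set_config_key config-hicpro.txt PAIR2_EXT _2
#   organize_by_sample rawdata
```

The read-pair suffixes in the configuration have to match the file names, which is why the example sets them to `_1` and `_2` for fastq-dump output. `set_config_key` leaves a `.bak` copy of the file it edits.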
I have provided my file as an example.
# basic command; -p runs HiC-Pro in parallel (cluster) mode
HiC-Pro -i FULL_PATH_TO_RAW_DATA -o FULL_PATH_TO_OUTPUTS -c MY_LOCAL_CONFIG_FILE -p
- You will get a message like the following; follow its instructions.
Please run HiC-Pro in two steps :
1- The following command will launch the parallel workflow through 12 torque jobs:
qsub HiCPro_step1.sh
2- The second command will merge all outputs to generate the contact maps:
qsub HiCPro_step2.sh
- The results provide:
- mapping results
- contact maps (in matrix form)
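A quick sanity check on a contact map is to count its entries and sum the contacts. The sketch below assumes the sparse text layout that, to my knowledge, HiC-Pro uses for raw `.matrix` files: one line per non-zero cell with three tab-separated columns (bin index, bin index, count). `matrix_summary` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Summarise a sparse contact matrix (assumed columns: bin_i, bin_j, count).
# Prints "<non-zero entries><TAB><total contacts>".
# `matrix_summary` is a hypothetical helper, not part of HiC-Pro.
matrix_summary() {
  awk '{ n++; total += $3 } END { printf "%d\t%.0f\n", n, total }' "$1"
}

# Example: matrix_summary sample_1000000.matrix   (file name is a placeholder)
```

Normalized matrices hold fractional values, so the total is rounded to the nearest integer here; the check is most meaningful on raw counts.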
kcakdemir/HiCPlotter: https://github.com/kcakdemir/HiCPlotter/releases
- Note that the output (.png) is written to the home directory instead of the working directory (I have not found a solution yet)
- It has many optional parameters
- I have uploaded my file as an example
# basic syntax
python HiCPlotter.py -f file -n name -chr chrX -o output