Merge pull request #41 from raphael-group/py3

Py3

Showing 23 changed files with 4,606 additions and 4,411 deletions.
@@ -0,0 +1,74 @@

### Manual Installation

If you wish to install `HATCHet` directly from this repository, the steps are a bit more involved.
The core module of HATCHet is written in C++11 and thus requires a modern C++ compiler (GCC >= 4.8.1, or Clang).
As long as you have a recent version of GCC or Clang installed, `setuptools` should automatically be able to download a
recent version of `cmake` and compile the HATCHet code into a working package.

The installation process can be broken down into the following steps:

1. **Get [Gurobi](http://www.gurobi.com/)** (>= 6.0)

   The coordinate-descent method applied by HATCHet is based on several integer linear programming (ILP) formulations. Gurobi is a commercial ILP solver with two licensing options: (1) a single-host license, tied to a single computer, and (2) a network license for use in a compute cluster (served by a license server in the cluster). Both options are freely and [easily available](http://www.gurobi.com/academia/academia-center) to academic users.
   [Download](https://www.gurobi.com/downloads/gurobi-optimizer-eula) Gurobi for your specific platform.

2. **Set the GUROBI_HOME environment variable**
   ```shell
   $ export GUROBI_HOME=/path/to/gurobiXXX
   ```
   Set `GUROBI_HOME` to the directory where you downloaded and extracted Gurobi. Here `XXX` is the three-digit Gurobi version.

3. **Build Gurobi**
   ```shell
   $ cd "${GUROBI_HOME}"
   $ cd linux64/src/build/
   $ make
   $ cp libgurobi_c++.a ../../lib
   ```
   Substitute `mac64` for `linux64` if you are on macOS.

4. **Create a new venv/conda environment for HATCHet**

   `HATCHet` is a Python 3 package. Unless you want to compile and install it in your default Python 3 environment, you will
   want to either create a new Conda environment for Python 3 and activate it:
   ```
   conda create --name hatchet python=3.8
   conda activate hatchet
   ```
   or use `virtualenv`, installed through `pip`:
   ```
   python3 -m pip install virtualenv
   python3 -m virtualenv env
   source env/bin/activate
   ```

5. **Install basic packages**

   It is **highly recommended** that you upgrade `pip` and `setuptools` to their latest versions:
   ```shell
   pip install -U pip
   pip install -U setuptools
   ```

6. **Build and install HATCHet**

   Execute the following command from the root of HATCHet's repository:
   ```shell
   $ pip install .
   ```
   **NOTE**: If compilation fails with an error message like:
   ```
   undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
   ```
   you may need to set `CXXFLAGS` to `-pthread` before invoking the command:
   ```shell
   $ CXXFLAGS=-pthread pip install .
   ```
   If compilation still fails, or if your environment has special requirements, you may have to specify the paths to Gurobi manually by following the [detailed instructions](doc/doc_compilation.md).

7. **Install required utilities**

   For reading BAM files, read counting, allele counting, and SNP calling, you need to install [SAMtools and BCFtools](http://www.htslib.org/doc/).
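
Several of the steps above can be wrapped in small shell helpers that fail early instead of midway through a build. This is a minimal sketch under stated assumptions: `gurobi_platform_dir`, `build_gurobi_cxx`, `active_env` and `require_tools` are hypothetical names (not part of HATCHet, Gurobi, or pip); only the `linux64` and `mac64` layouts named in step 3 are handled; and the environment check relies on the `VIRTUAL_ENV` and `CONDA_DEFAULT_ENV` variables exported by the standard venv/virtualenv and conda activation scripts.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers around the manual installation steps.
# None of these functions are part of HATCHet, Gurobi, or pip.

# Step 3: map the kernel name from `uname -s` to Gurobi's platform directory.
# Only the two layouts mentioned in step 3 (linux64, mac64) are handled.
gurobi_platform_dir() {
    local kernel="${1:-$(uname -s)}"
    case "${kernel}" in
        Linux)  echo "linux64" ;;
        Darwin) echo "mac64" ;;
        *)      echo "unsupported platform: ${kernel}" >&2; return 1 ;;
    esac
}

# Step 3: build the Gurobi C++ library in a subshell, so the caller's working
# directory is unchanged. Fails early if GUROBI_HOME (step 2) is not set.
build_gurobi_cxx() {
    local plat
    plat=$(gurobi_platform_dir) || return 1
    (
        cd "${GUROBI_HOME:?GUROBI_HOME is not set}/${plat}/src/build" &&
            make &&
            cp libgurobi_c++.a ../../lib
    )
}

# Step 4: report which isolated environment is active, based on the variables
# exported by the standard activation scripts. Conda's "base" env counts as none.
active_env() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "virtualenv:$(basename "${VIRTUAL_ENV}")"
    elif [ -n "${CONDA_DEFAULT_ENV:-}" ] && [ "${CONDA_DEFAULT_ENV}" != "base" ]; then
        echo "conda:${CONDA_DEFAULT_ENV}"
    else
        echo "none"
        return 1
    fi
}

# Step 7: print every command that is not on PATH; non-zero status if any is missing.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "${tool}" >/dev/null 2>&1; then
            echo "missing: ${tool}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

For example, run `build_gurobi_cxx` after step 2, `active_env || echo 'warning: installing into the default environment' >&2` before step 6, and `require_tools samtools bcftools` after step 7. `require_tools` only inspects `PATH`, so directories holding SAMtools or BCFtools that are not yet on `PATH` must be added first.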
@@ -0,0 +1,37 @@

# Scripts for running the HATCHet workflow

## 1) Set environment variables

Set all variables in `config.txt` to appropriate values. You will likely not need to change anything in the `*.sh` scripts.

## 2a) HATCHet without phasing

Use the following command to run HATCHet without phasing:

```
bash runUnphased.sh > out.txt 2> err.txt
```

Feel free to rename the standard output (`out.txt`) and standard error (`err.txt`) files as you wish.

## 2b) HATCHet with phasing

Running HATCHet with phasing is currently a two-part process. It requires more work from the user but may produce cleaner results.

First run `runPhased_01.sh`, which executes the first three steps of HATCHet:
```
bash runPhased_01.sh > out1.txt 2> err1.txt
```

After this script finishes, go to the `snps` subdirectory within the working directory given to HATCHet in `config.txt`. There you will find a collection of VCF files, one for each chromosome. These must then be phased (e.g. with the [Michigan Imputation Server](https://imputationserver.sph.umich.edu/index.html#!)), and the location of the resulting phased VCF file must be specified in `config.txt` under the `PHASE` variable. If you use the Michigan Imputation Server:

1. you may have to use `bcftools annotate` to convert between chromosome names (e.g. chr20 -> 20);
2. results are always returned in hg19 coordinates, so you may need to convert coordinates back to hg38 using e.g. Picard's [LiftoverVcf](https://broadinstitute.github.io/picard/command-line-overview.html#LiftoverVcf);
3. the by-chromosome phased VCF files you receive must be combined with the `bcftools concat` command to give HATCHet a single phased VCF file.

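The renaming and concatenation steps in the list above can be scripted with `bcftools annotate --rename-chrs` and `bcftools concat`. This is a sketch under assumptions: `write_chr_map`, `rename_chrs`, `concat_phased` and the `chr_map.txt` file name are hypothetical, the map covers only chromosomes 1-22, X and Y, and the liftover step (Picard's `LiftoverVcf`) is left out because its arguments depend on your chain file and reference.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the Michigan Imputation Server round trip.

# Print one "old new" pair per line, the mapping format read by
# `bcftools annotate --rename-chrs`. Default: chr20 -> 20; with --add: 20 -> chr20.
write_chr_map() {
    local c
    for c in $(seq 1 22) X Y; do
        if [ "${1:-}" = "--add" ]; then
            echo "${c} chr${c}"
        else
            echo "chr${c} ${c}"
        fi
    done
}

# Rename chromosomes in one VCF and write bgzipped output.
# Usage: rename_chrs IN.vcf.gz OUT.vcf.gz [--add]
rename_chrs() {
    write_chr_map "${3:-}" > chr_map.txt
    bcftools annotate --rename-chrs chr_map.txt -O z -o "$2" "$1"
}

# Concatenate per-chromosome phased VCFs (same samples, given in chromosome order).
# Usage: concat_phased OUT.vcf.gz chr1.vcf.gz chr2.vcf.gz ...
concat_phased() {
    local out=$1
    shift
    bcftools concat -O z -o "${out}" "$@"
}
```

For example, `rename_chrs snps/chr20.vcf.gz chr20.renamed.vcf.gz` strips the prefix before upload, `--add` restores it afterwards if your reference uses `chr` names, and the output of `concat_phased` is the single file whose path goes into `PHASE`.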
Then run the second half of the HATCHet workflow, which should have a shorter runtime than the first part:

```
bash runPhased_02.sh > out2.txt 2> err2.txt
```
@@ -0,0 +1,49 @@

```shell
#!/usr/bin/env bash

####################################################################################
# Please set up the correct configuration values here below before running HATCHet #
####################################################################################

REF="/path/to/reference.fa" #Please make sure to have produced the reference dictionary /path/to/reference.dict
REF_VERS="" # Reference version used to select list of known SNPs; possible values are "hg19" or "hg38", or leave blank "" if you wish for all positions to be genotyped by bcftools
CHR_NOTATION=true # Does your reference name chromosomes with "chr" prefix?; possible values true/false
SAM="/path/to/samtools-home/bin/" #Comment out if samtools is already in PATH
BCF="/path/to/bcftools-home/bin/" #Comment out if bcftools is already in PATH
XDIR="/path/to/running-dir/" #Path for output
NORMAL="/path/to/matched-normal.bam"
BAMS="/path/to/tumor-sample1.bam /path/to/tumor-sample2.bam"
NAMES="Primary Met" #Use the same order as the related tumor BAM files in BAMS above
J=$(python -c 'import multiprocessing as mp; print(mp.cpu_count())') #Replace with fixed number if you do not want to use all available cpus
MINREADS=8 #Use 8 for WGS with >30x and 20 for WES with ~100x
MAXREADS=300 #Use 300 for WGS with >30x and 1000 for WES with ~100x
BIN="50kb" #Bin size for calculating RDR and BAF

################################################################################################################################
# To run HATCHet with phasing please do the following: #
# 1. Use a phasing algorithm with the SNP VCF files generated in ${SNP}*.vcf.gz (snps folder by default) #
# 2. Combine the phased SNPs for all chromosomes in a unique phased file with `CHROM POS PHASE` where: #
# - CHROM is the chromosome of the SNP; #
# - POS is the genomic position of the SNP; #
# - PHASE is any string that contains 0|1 or 1|0 (lines without those will be excluded as well as those starting with #) #
# 3. Provide the path to the phased file in the variable PHASE here below #
# 4. Choose haplotype block size BLOCK, 50kb is used by default #
# Note: a phased VCF file (with phased genotypes 0|1 and 1|0) works and `bcftools concat` can be used to combine chromosomes #
# If using a reference-phasing algorithm please make sure the output VCF files are w.r.t. the same reference genome, otherwise #
# use LiftOver to convert them or `bcftools annotate` to add or remove the `chr` notation #
################################################################################################################################
PHASE="None" #Path to phased file; specify "None" to run hatchet without phasing
BLOCK="50kb" #Haplotype block size used for combining SNPs


# These specify the subdirectories created and used by HATCHet and do not need to be changed

RDR="rdr/"
SNP="snps/"
BAF="baf/"
BB="bb/"
BBC="bbc/"
PLO="plots/"
RES="results/"
SUM="summary/"
```

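A mismatch between `BAMS` and `NAMES`, a missing reference dictionary, or a wrong `PHASE` path may only become apparent after the workflow has started. A minimal pre-flight check is sketched below. It assumes `config.txt` has already been sourced, that `REF` ends in an extension such as `.fa` (so the dictionary is `reference.dict`), and it applies the phased-file rule from the comment block literally: skip lines starting with `#`, keep lines containing `0|1` or `1|0`. `check_config` and `count_phased_lines` are hypothetical names, not HATCHet commands.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical sanity checks for the values set in config.txt.

check_config() {
    local -a bams names
    read -r -a bams <<< "${BAMS}"
    read -r -a names <<< "${NAMES}"
    # NAMES must list one name per tumor BAM, in the same order
    if [ "${#bams[@]}" -ne "${#names[@]}" ]; then
        echo "NAMES has ${#names[@]} entries but BAMS has ${#bams[@]}" >&2
        return 1
    fi
    # Dictionary expected next to the FASTA: /path/to/reference.fa -> /path/to/reference.dict
    if [ ! -f "${REF%.*}.dict" ]; then
        echo "missing reference dictionary: ${REF%.*}.dict" >&2
        return 1
    fi
    # PHASE is either the literal "None" or the path to an existing file
    if [ "${PHASE}" != "None" ] && [ ! -f "${PHASE}" ]; then
        echo "phased file not found: ${PHASE}" >&2
        return 1
    fi
    return 0
}

# Count lines that the stated rule would keep: not starting with '#',
# and containing 0|1 or 1|0. Handles plain-text or gzip-compressed input.
count_phased_lines() {
    local reader="cat"
    case "$1" in
        *.gz) reader="gzip -dc" ;;
    esac
    ${reader} "$1" | grep -v '^#' | grep -cE '0[|]1|1[|]0'
}
```

After `source ./config.txt`, calling `check_config` before launching a run script catches these mistakes up front, and `count_phased_lines "${PHASE}"` gives a rough idea of how many phased SNP lines the file provides.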
@@ -0,0 +1,53 @@

```shell
#!/usr/bin/env bash

source ./config.txt

LIST=""
# Select list of known SNPs based on reference genome
if [ "$REF_VERS" = "hg19" ]
then
    if [ "$CHR_NOTATION" = true ]
    then
        LIST="https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/GATK/00-All.vcf.gz"
    else
        LIST="https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/00-All.vcf.gz"
    fi
else
    if [ "$REF_VERS" = "hg38" ]
    then
        if [ "$CHR_NOTATION" = true ]
        then
            LIST="https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/GATK/00-All.vcf.gz"
        else
            LIST="https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-All.vcf.gz"
        fi
    fi
fi
####################################################################################


##################################################################
# For default run please execute the following without changes #
# Otherwise please follow the related HATCHet recommendations #
# To run HATCHet with phasing of SNPs please see below #
##################################################################
set -e
set -o pipefail # without this, a failing hatchet step is masked by the exit status of `tee`
set -o xtrace
PS4='[\t]'
ALLNAMES="Normal ${NAMES}"
export PATH=$PATH:${SAM}
export PATH=$PATH:${BCF}
export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1

cd ${XDIR}
mkdir -p ${RDR}
mkdir -p ${SNP}
mkdir -p ${BAF}

python3 -m hatchet binBAM -N ${NORMAL} -T ${BAMS} -S ${ALLNAMES} -b ${BIN} -g ${REF} -j ${J} -O ${RDR}normal.1bed -o ${RDR}tumor.1bed -t ${RDR}total.tsv |& tee ${RDR}bins.log

python3 -m hatchet SNPCaller -N ${NORMAL} -r ${REF} -j ${J} -c ${MINREADS} -C ${MAXREADS} -R ${LIST} -o ${SNP} |& tee ${SNP}snps.log

python3 -m hatchet deBAF -N ${NORMAL} -T ${BAMS} -S ${ALLNAMES} -r ${REF} -j ${J} -c ${MINREADS} -C ${MAXREADS} -L ${SNP}*.vcf.gz -O ${BAF}normal.1bed -o ${BAF}tumor.1bed |& tee ${BAF}bafs.log
```
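
When `REF_VERS` is left blank, `LIST` stays empty and the unquoted `-R ${LIST}` in the `SNPCaller` call expands to a bare `-R`, so the next token (`-o`) lands where the list path was expected; how `SNPCaller` reacts depends on its argument parser. The sketch below shows one way to pass `-R` only when a list was selected, and also condenses the nested `if` blocks into a single `case` statement yielding the same four URLs. `snp_list_url`, `build_list_args`, `DBSNP_BASE` and `LIST_ARGS` are hypothetical names; this is an illustration, not the script's official form.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical rewrite of the known-SNP list selection.
DBSNP_BASE="https://ftp.ncbi.nih.gov/snp/organisms"

# Print the known-SNP list URL for a reference version, or nothing when REF_VERS
# is blank or unrecognized (meaning: let bcftools genotype all positions).
# Usage: snp_list_url REF_VERS CHR_NOTATION
snp_list_url() {
    local build
    case "$1" in
        hg19) build="human_9606_b151_GRCh37p13" ;;
        hg38) build="human_9606_b151_GRCh38p7" ;;
        *)    return 0 ;;
    esac
    if [ "$2" = true ]; then
        echo "${DBSNP_BASE}/${build}/VCF/GATK/00-All.vcf.gz"
    else
        echo "${DBSNP_BASE}/${build}/VCF/00-All.vcf.gz"
    fi
}

# Fill the global array LIST_ARGS with "-R <list>" only when LIST is non-empty.
build_list_args() {
    LIST_ARGS=()
    if [ -n "${LIST}" ]; then
        LIST_ARGS=(-R "${LIST}")
    fi
}

# Intended use inside the run script (not executed here):
#   LIST=$(snp_list_url "${REF_VERS}" "${CHR_NOTATION}")
#   build_list_args
#   python3 -m hatchet SNPCaller -N ${NORMAL} -r ${REF} -j ${J} -c ${MINREADS} -C ${MAXREADS} "${LIST_ARGS[@]}" -o ${SNP}
```

Quoting `"${LIST_ARGS[@]}"` expands to zero words when the array is empty, which is why an array is used here rather than a plain string.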