LOGBOOK

## Nicolas Servant
## HiC-Pro

################################################################################
##
## FUTUR DEV
##
################################################################################

o Error if sample dir is named 'fastq'

o Add a way to load paired BAM file

o New graph for Dumped pairs (see. Angelo's idea on the forum)

o HiCUP compare

o Assign reads using its 5' end ...

o bug in samtools

o look at pull request

o bug in plot using R3.1.2 and ggplot 2.2.1

o update qsub in step2 for paralleization per sample

o EV - update error tracking

o EV - update sort -V - Add a unix sort in the requirement !

o EV - bug in build_maps

o Deal with reads with multiple RS sites ?

o Check the impact of reads ordering in pairs classification

o Update of HiC-Pro for capture-C data

o Think about a docker version of HiC-Pro


################################################################################
##
## LOGBOOK
##
################################################################################

##--------------------
## 21-03-16
##--------------------

   o Change validPairs format for Juicebox compatibility

##--------------------
## 08-01-16
##--------------------

  o Extend HiC-pro to work with other schedulers such as SGE or SLURM

##--------------------
## 21-10-15
##--------------------

   o New version of iced normalization


##--------------------
## 30-09-15
##--------------------

   o Stable release v2.7.0
   o Update manual

##--------------------
## 10-09-15
##--------------------

   o Release v2.6.1
   o New utility digest_genome.py
   o Update HiC-Pro for DNase Hi-C data
   o Update manual

##--------------------
## 23-07-15
##--------------------

   o Release v2.6.0
   o Update split_reads.py utility to handle fastq.gz files
   o Update manual and create gh-pages under github. The HiC-Pro website is now available at http://nservant.github.io/HiC-Pro/

##--------------------
## 20-07-15
##--------------------

   o Add script to combine perfastq stats files in a single file. Counts are summed, percentage are averaged.
   o Split G1/G2 valid interactions using a script rather than using awk parsing. Using script is less efficient, but allow to generate stats after duplicates removal.

##--------------------
## 19-07-15
##--------------------

   o Fix conflicts between versions
   o Merge AS branch with master

##--------------------
## 30-06-15
##--------------------

   o Add check when running the pipeline
   o New quality controls plots for duplicates, short range versus long range interactions, and fragment size

##--------------------
## 05-06-15
##--------------------

   o First version of allele specific HiC-Pro ready !!!
 
   o Update of the mapped_2hic_fragments.py script in order to add the haplotype tag in the list of valid interaction

##--------------------
## 01-06-15
##--------------------

   o First version of markAllelicStatus.py with read a BAM file, and add a tag according to the parental genome. The idea in to use the MD tag, get the N position within the reads and then on the genome.

   o Once the data are mapped. Add a step to reconstruct the haplotype at 'N' position and therefore to assign a read to a parental genome.

##--------------------
## 27-05-15
##--------------------

   o First version of extract_snps.py scripts 

   o First part of allele specific pipeline - be able to extract the haplotype from the SANGER database. The mapping has to be done on masked genome.

##--------------------
## 03-04-15
##--------------------

   o Check version nnumber in install_dependencies.sh


##--------------------
## 02-04-15
##--------------------

   o mergeSAM.pl is now written in python. This version is more efficient than the previous one (I/O).
   o All SAM are removed and converted in the fly into BAM files.

##--------------------
## 24-03-15
##--------------------
   
   o Add new utilities - split_reads 

##--------------------
## 19-03-15
##--------------------

  o Update scripts' name to ease debug
  o Change ORGANISM to REFERENCE_GENOME
  o Change CUT_SITE_5OVER to LIGATION_SITE. The reads that do not map using the end-to-end strategy should be trimmed after the ligation site. The ligation sequence depends on the fill in strategy. In classical Hi-C, the ligation is expected to create a Nhe1 site flanked by two HindII site, ie.e AAGCTAGCTT
  o Support both long and short options. see HiC-Pro -- help


##--------------------
## 19-01-15
##--------------------

  o Change README file
  o Fix bug in get_hic_files when R1, R2, R3 files are available

##--------------------
## 19-01-15
##--------------------

   o Change error handling
   o Change logs
   o Automatic Installation procedure
   o Change HiC-Pro architecture

##--------------------
## 15-12-14
##--------------------

   o Change mapping strategy. Local mapping is replaced by a global mapping on trimmed reads. This avoid the reporting of 3' end site, and therefore decrease the number of DE.

##--------------------
## 28-11-14
##--------------------

   o Add the ICE normalization (N. Varoquaux)

##--------------------
## 30-10-14
##--------------------

   o Release version 2.1
   o Changes to support fastq.gz files
   o Plotting function for pairing statistics

##--------------------
## 29-10-14
##--------------------

   o Plotting functions for mapping and valid pairs statistics

##--------------------
## 20-10-14
##--------------------

   o Removed unused variable
   o Test on small dataset - 3 x 1M reads - v2.1 (local) = 16m14.715s / v2.1_pbs = 00:09:52 + 0m11.259s / v2.0 = 20min40.009s / v2.0_pbs = 00:14:04
   o disk results (bowtie_results, hic_results, logs) without SAM files - v2.0 = 3.6 Go / v2.1 = 1.8 Go

##--------------------
## 10-10-14
##--------------------

   o Reads assignment to bins are now based on the middle of the reads
   o New options for overlapMapped2HiCFragments.py (-S) to produce a SAM file with tags related to pairs classification
   o New filter on mapping pairs. Check the presence of 5' overhang within locally align reads
   o Remove duplicate (currently 2 versions. The unix sort looks the best way)

##--------------------
## 30-09-14
##--------------------

   o New version of overlapMapped2HiCFragments.py. The script starts with a SAM file as input and output the valid ligation pairs.

##--------------------
## 01-09-14
##--------------------

   o New version of mapping_stat script

##--------------------
## 13-08-14
##--------------------

   o Test the complete mapping step from raw reads to single PE file. The results are a little bit different from v2.0. This comes from the format of unaligned reads after the global alignment. The read's name seems to have an impact on the mapping results !!! (exemple with SRR941287.278)
   o Add filter on MAPQ
   o Fix bug in discarding multiple hits. The regexp in v2.0 was not correct (-?)
   o New mergeSAM.pl script. Create a PE file from two SE files and clean the reads (unique, multiple, MAPQ)

##--------------------
## 10-08-14
##--------------------
   
   o Remove separateAlignment step and replace by Bowtie2 --un option
   o Remove all .aln file generator from the Makefile
   o Update the Makefile to run only the mapping
   o Clean the scripts dir to remove all unused scripts