forked from nservant/HiC-Pro
-
Notifications
You must be signed in to change notification settings - Fork 0
/
LOGBOOK
255 lines (173 loc) · 7.16 KB
/
LOGBOOK
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
## Nicolas Servant
## HiC-Pro
################################################################################
##
## FUTUR DEV
##
################################################################################
o Error if sample dir is named 'fastq'
o Add a way to load paired BAM file
o New graph for Dumped pairs (see. Angelo's idea on the forum)
o HiCUP compare
o Assign reads using its 5' end ...
o bug in samtools
o look at pull request
o bug in plot using R3.1.2 and ggplot 2.2.1
o update qsub in step2 for paralleization per sample
o EV - update error tracking
o EV - update sort -V - Add a unix sort in the requirement !
o EV - bug in build_maps
o Deal with reads with multiple RS sites ?
o Check the impact of reads ordering in pairs classification
o Update of HiC-Pro for capture-C data
o Think about a docker version of HiC-Pro
################################################################################
##
## LOGBOOK
##
################################################################################
##--------------------
## 21-03-16
##--------------------
o Change validPairs format for Juicebox compatibility
##--------------------
## 08-01-16
##--------------------
o Extend HiC-pro to work with other schedulers such as SGE or SLURM
##--------------------
## 21-10-15
##--------------------
o New version of iced normalization
##--------------------
## 30-09-15
##--------------------
o Stable release v2.7.0
o Update manual
##--------------------
## 10-09-15
##--------------------
o Release v2.6.1
o New utility digest_genome.py
o Update HiC-Pro for DNase Hi-C data
o Update manual
##--------------------
## 23-07-15
##--------------------
o Release v2.6.0
o Update split_reads.py utility to handle fastq.gz files
o Update manual and create gh-pages under github. The HiC-Pro website is now available at http://nservant.github.io/HiC-Pro/
##--------------------
## 20-07-15
##--------------------
o Add script to combine perfastq stats files in a single file. Counts are summed, percentage are averaged.
o Split G1/G2 valid interactions using a script rather than using awk parsing. Using script is less efficient, but allow to generate stats after duplicates removal.
##--------------------
## 19-07-15
##--------------------
o Fix conflicts between versions
o Merge AS branch with master
##--------------------
## 30-06-15
##--------------------
o Add check when running the pipeline
o New quality controls plots for duplicates, short range versus long range interactions, and fragment size
##--------------------
## 05-06-15
##--------------------
o First version of allele specific HiC-Pro ready !!!
o Update of the mapped_2hic_fragments.py script in order to add the haplotype tag in the list of valid interaction
##--------------------
## 01-06-15
##--------------------
o First version of markAllelicStatus.py with read a BAM file, and add a tag according to the parental genome. The idea in to use the MD tag, get the N position within the reads and then on the genome.
o Once the data are mapped. Add a step to reconstruct the haplotype at 'N' position and therefore to assign a read to a parental genome.
##--------------------
## 27-05-15
##--------------------
o First version of extract_snps.py scripts
o First part of allele specific pipeline - be able to extract the haplotype from the SANGER database. The mapping has to be done on masked genome.
##--------------------
## 03-04-15
##--------------------
o Check version nnumber in install_dependencies.sh
##--------------------
## 02-04-15
##--------------------
o mergeSAM.pl is now written in python. This version is more efficient than the previous one (I/O).
o All SAM are removed and converted in the fly into BAM files.
##--------------------
## 24-03-15
##--------------------
o Add new utilities - split_reads
##--------------------
## 19-03-15
##--------------------
o Update scripts' name to ease debug
o Change ORGANISM to REFERENCE_GENOME
o Change CUT_SITE_5OVER to LIGATION_SITE. The reads that do not map using the end-to-end strategy should be trimmed after the ligation site. The ligation sequence depends on the fill in strategy. In classical Hi-C, the ligation is expected to create a Nhe1 site flanked by two HindII site, ie.e AAGCTAGCTT
o Support both long and short options. see HiC-Pro -- help
##--------------------
## 19-01-15
##--------------------
o Change README file
o Fix bug in get_hic_files when R1, R2, R3 files are available
##--------------------
## 19-01-15
##--------------------
o Change error handling
o Change logs
o Automatic Installation procedure
o Change HiC-Pro architecture
##--------------------
## 15-12-14
##--------------------
o Change mapping strategy. Local mapping is replaced by a global mapping on trimmed reads. This avoid the reporting of 3' end site, and therefore decrease the number of DE.
##--------------------
## 28-11-14
##--------------------
o Add the ICE normalization (N. Varoquaux)
##--------------------
## 30-10-14
##--------------------
o Release version 2.1
o Changes to support fastq.gz files
o Plotting function for pairing statistics
##--------------------
## 29-10-14
##--------------------
o Plotting functions for mapping and valid pairs statistics
##--------------------
## 20-10-14
##--------------------
o Removed unused variable
o Test on small dataset - 3 x 1M reads - v2.1 (local) = 16m14.715s / v2.1_pbs = 00:09:52 + 0m11.259s / v2.0 = 20min40.009s / v2.0_pbs = 00:14:04
o disk results (bowtie_results, hic_results, logs) without SAM files - v2.0 = 3.6 Go / v2.1 = 1.8 Go
##--------------------
## 10-10-14
##--------------------
o Reads assignment to bins are now based on the middle of the reads
o New options for overlapMapped2HiCFragments.py (-S) to produce a SAM file with tags related to pairs classification
o New filter on mapping pairs. Check the presence of 5' overhang within locally align reads
o Remove duplicate (currently 2 versions. The unix sort looks the best way)
##--------------------
## 30-09-14
##--------------------
o New version of overlapMapped2HiCFragments.py. The script starts with a SAM file as input and output the valid ligation pairs.
##--------------------
## 01-09-14
##--------------------
o New version of mapping_stat script
##--------------------
## 13-08-14
##--------------------
o Test the complete mapping step from raw reads to single PE file. The results are a little bit different from v2.0. This comes from the format of unaligned reads after the global alignment. The read's name seems to have an impact on the mapping results !!! (exemple with SRR941287.278)
o Add filter on MAPQ
o Fix bug in discarding multiple hits. The regexp in v2.0 was not correct (-?)
o New mergeSAM.pl script. Create a PE file from two SE files and clean the reads (unique, multiple, MAPQ)
##--------------------
## 10-08-14
##--------------------
o Remove separateAlignment step and replace by Bowtie2 --un option
o Remove all .aln file generator from the Makefile
o Update the Makefile to run only the mapping
o Clean the scripts dir to remove all unused scripts