<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<meta name="description" content=""/>
<meta name="keywords" content="" />
<meta name="author" content="carl" />
<title>ECCB 2014 T01</title>
<link rel="stylesheet" type="text/css" href="style.css" media="screen" />
<style type="text/css">
#wrapper{
margin:0 auto;
#padding:15px 15% 8em;
text-align:left
}
#content {
max-width:70em;
width:100%;
margin:0 auto;
padding-bottom:20px;
overflow:hidden
}
.demo {
margin:1.5em 0;
padding:1.5em 1.5em 0.75em;
border:1px solid #ccc;
position:relative;
overflow:hidden
}
.post {
position:relative;
overflow:hidden
}
.collapse p {padding:0 10px 1em}
.switch {position:absolute; top:1.5em; right: 1.5em; padding:3px}
#.post .switch {position:static; text-align:right}
.post .main{margin-bottom:0; padding-bottom:0}
.other li, .summary {margin-bottom:.3em; padding:1em; border:1px solid #e8e7e8; background-color:#f8f7f8}
.other ul {list-style-type:none; text-align:center}
.expand{padding-bottom:.75em}
/* --- Links --- */
#download {
border-style:none;
background:white;
}
.expand a {
display:block;
padding:3px 10px
}
.expand a:link, .expand a:visited {
border-width:1px;
background-image:url(img/arrow-down.gif);
background-repeat:no-repeat;
background-position:98% 50%;
}
.expand a:hover, .expand a:active, .expand a:focus {
}
.expand a.open:link, .expand a.open:visited {
#border-style:solid;
#background:#eee url(img/arrow-up.gif) no-repeat 98% 50%;
color:black;
}
</style>
<!--[if lte IE 6]>
<style type="text/css">
h3 a, .demo {position:relative; height:1%}
</style>
<![endif]-->
<!--[if lte IE 6]>
<script type="text/javascript">
try { document.execCommand( "BackgroundImageCache", false, true); } catch(e) {};
</script>
<![endif]-->
<!--[if !lt IE 6]><!-->
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript" src="scripts/expand.js"></script>
<script type="text/javascript">
<!--//--><![CDATA[//><!--
$(function() {
$("#content h3.expand").toggler();
$("#content h2.expand").toggler();
$("#content div.demo").expandAll({trigger: "h3.expand", ref: "h3.expand"});
$("#content div.other").expandAll({
expTxt : "[Show]",
cllpsTxt : "[Hide]",
ref : "ul.collapse",
showMethod : "show",
hideMethod : "hide"
});
$("#content div.post").expandAll({
expTxt : "[Show tip]",
cllpsTxt : "[Hide tip]",
ref : "div.collapse",
localLinks: "p.top a"
});
});
//--><!]]>
</script>
<!--<![endif]-->
<script type="text/javascript" src="syntaxhighlight/shCore.js"></script>
<script type="text/javascript" src="syntaxhighlight/shBrushBash.js"></script>
<script type="text/javascript" src="syntaxhighlight/shBrushR.js"></script>
<link type="text/css" rel="stylesheet" href="syntaxhighlight/shCore.css"/>
<link type="text/css" rel="stylesheet" href="syntaxhighlight/shThemeDefault.css"/>
<script type="text/javascript">
SyntaxHighlighter.config.clipboardSwf = 'syntaxhighlight/clipboard.swf';
SyntaxHighlighter.all();
</script>
</head>
<body>
<div id="site-wrapper">
<div id="header">
<div id="top">
<div class="left" id="logo">
<a href="#"><h2 class="label label-green">Session 1 : Website</h2></a>
</div>
<div class="clearer"> </div>
</div>
<div class="navigation" id="sub-nav">
<ul class="tabbed">
<li ><a href="index.html">Home</a></li>
<li class="current-tab"><a href="session1.html">Session1:Web site</a></li>
<li><a href="session2.html">Session2:Command line</a></li>
<li><a href="session3.html">Session3:SOAP Web services</a></li>
</ul>
<div class="clearer"> </div>
</div>
</div>
<div class="main" id="main-two-columns">
<div class="left" id="main-content">
<h2 id="contents"> Introduction </h2>
<b>Goal</b><br/>
<u>The aims are to:</u>
<ul>
<li> Get familiar with motif analysis of ChIP-seq data.</li>
<li> Learn de novo motif discovery methods.</li>
<li> Become familiar with using RSAT via the website.</li>
</ul>
<u>In practice:</u>
<ul>
<li> Motif discovery with <i>peak-motifs</i>.</li>
<li> Advanced parameter settings.</li>
<li> Visualisation of the putative TFBS.</li>
<li> Motif enrichment with <i>matrix-quality</i>.</li>
</ul>
<div id="wrapper">
<div id="content">
<!-- SEQUENCES -->
<div class="demo">
<h2 id="download" class="expand">Retrieving sequences from your peaks</h2>
<div class="collapse">
<div class="notice">
<b>Goal:</b> Given a set of peaks from a ChIP-seq experiment in <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format1" target='_blank'>bed format</a>, retrieve the genomic sequences corresponding to those coordinates in fasta format.
<br/>
</div>
<h3 class="expand">1 - Example dataset1: CEBPa binding regions in dog liver </h3>
<div class="collapse">
<p>Schmidt, Wilson and Ballester published a ChIP-seq
experiment on liver tissue to identify binding regions for
the transcription factor
CEBPa <a href="http://www.sciencemag.org/content/328/5981/1036"
target='_blank'>(PMID:20378774)</a> in five
species (human, mouse, dog, short-tailed opossum and
chicken). This data set is publicly available through
ArrayExpress <a href="http://www.ebi.ac.uk/arrayexpress/experiments/E-TABM-722/"
target='_blank'>(E-TABM-722)</a>.<br/>
As done by the authors, CEBPa binding regions (peaks) were
called by running <a target="_blank"
href="http://www.ebi.ac.uk/~swilder/SWEMBL/">SWEMBL</a> with
parameter R=0.05 on merged reads from two biological
replicates and their corresponding input controls. In this
tutorial we will analyze the CEBPa binding pattern in dog. The peaks
can be downloaded
from <a href="./data/dataset_cebpa_dog/do61+do79_cfam_CEBPA_liver.bed.SWEMBL.3.3.bed">here</a>.</p>
</div>
<h3 class="expand">2 - Fetch sequences from a bed file </h3>
<div class="collapse">
<ol>
<li>In a web browser window open the <a href="http://rsat.sb-roscoff.fr/" target="_blank">RSAT</a> web page</li>
<li>In the menu (left side) click on the NGS-ChIP-seq drop down menu, and select the tool: fetch-sequences from UCSC.</li>
<li>Select the genome of interest, in this case: <b>canFam2</b>.</li>
<li>There are several ways to provide a bed file: paste
the coordinates, give a URL, or upload the file
from your computer. To limit traffic between the teaching room and the internet,
we will use the URL option. Right click on this <a href="./data/dataset_cebpa_dog/do61+do79_cfam_CEBPA_liver.bed.SWEMBL.3.3.bed">link</a> and choose "copy link location" to get the URL of the peak coordinates (BED) file. Alternatively, save the file on your computer and use the "upload" method.</li>
<li>Enter your email address to receive a mail once the job is done. </li>
<li>Click on Go once the form is complete.<br/>
<img class="bordered" src="./figures/Screen1_c.png" alt="fetch_seq_SC1" width="550"/></li>
<li>When the job is finished you will receive a link to the fasta file containing the sequences corresponding to the coordinates in the bed file.</li>
</ol>
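<p>One way to answer the check point below is to compare the number of peaks in the bed file with the number of sequences in the returned fasta file. The following is a minimal Python sketch, not part of RSAT; the function names and the toy data are assumptions made for illustration only.</p>
<pre>
```python
def count_bed_records(lines):
    """Count BED records, skipping blank lines, comments and track/browser headers."""
    n = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        n += 1
    return n


def count_fasta_records(lines):
    """Count FASTA records: each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


# Toy data standing in for the downloaded files (hypothetical content).
bed = ["track name=peaks", "chr1\t100\t200", "chr2\t50\t120", ""]
fasta = [">chr1_100_200", "ACGT", ">chr2_50_120", "GGCA"]

n_peaks = count_bed_records(bed)
n_seqs = count_fasta_records(fasta)
print(n_peaks, n_seqs, n_peaks == n_seqs)
```
</pre>
<p>With real files, the same functions can be applied to open file handles instead of lists. If the two counts differ, the log file is the first place to look.</p>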
<div class="post">
<div class="success"><p><b>Check point:</b> Did you recover all sequences in the bed file?</p>
<b>Anticipated Results</b>
<ol>
<li><a href="./results/session1/1_do61+do79_cfam_CEBPA_liver.SWEMBL.3.3_peaks.fasta">Fasta file</a>.</li>
<li><a href="./results/session1/1_do61+do79_cfam_CEBPA_liver.SWEMBL.3.3_peaks_log.txt">Log</a>.</li>
</ol>
</div>
</div>
<div class="error"><p><b>Selecting the correct genome version:</b> Genome assemblies are updated as sequencing technologies, alignment tools and annotations improve. Always verify that you are selecting the correct version.</p>
<b>Results:</b> At this point, you should have the URL of the fasta file.
</div>
</div>
</div>
</div>
<!-- PEAK-MOTIFS-->
<div class="demo">
<h2 id="quality" class="expand" >Discovering motifs from peak sequences</h2>
<div class="collapse">
<div class="notice">
<b>Goal:</b> Discover binding motifs or patterns from a fasta file containing the binding regions of a transcription factor, as determined by ChIP-seq.
</div>
<h3 class="expand">1 - Getting to know peak-motifs</h3>
<div class="collapse">
<i>peak-motifs</i> is a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report. <br/>
The following articles describe peak-motifs and its usage:
<ol>
<li>Thomas-Chollier, M., Herrmann, C., Defrance, M., Sand, O., Thieffry, D. and van Helden, J. (2012). <i>RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets</i>. Nucleic Acids Research 40(4): e31. doi:10.1093/nar/gkr1104 <a href="http://nar.oxfordjournals.org/content/40/4/e31.long"> [Paper] </a></li>
<li>Thomas-Chollier M, Darbo E, Herrmann C, Defrance M, Thieffry D, van Helden J. (2012). <i>A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs</i>. Nat Protoc 7(8): 1551-1568. <a href="http://www.nature.com/nprot/journal/v7/n8/full/nprot.2012.088.html"> [Paper]</a></li>
</ol>
In this section we will get familiar with this tool and its general usage.
Its basic usage requires as input:
<ol>
<li>A title for the analysis. For the studied dataset, use <b>CEBPa_ChIP-seq_in_dog_liver</b>.</li>
<li>A set of peak sequences in fasta format. Sequences can be pasted in the available box, given as a URL, or uploaded as a file from your computer. Alternatively, the sequences can come directly from another RSAT program (like <i>fetch-sequences</i>), as detailed just below.</li>
<li>Your email address, to receive a mail once the job is done. </li>
</ol>
<img class="bordered" src="./figures/Screen2_a.png" alt="fetch_seq_SC2" width="550"/>
<br/>
<b>Passing results from one tool to the next one:</b> As a suite of tools, RSAT is designed to pass the output of one tool as input to related tools. For example, from the output display of <i>fetch-sequences</i> you can send the sequences directly to <i>peak-motifs</i>.
<img class="bordered" src="./figures/Screen1_2_PM_color.png" alt="fetch_seq_SC3" width="550"/>
<br/>
The <i>peak-motifs</i> output consists of the following parts:
<ol>
<li><b>Sequence Composition:</b> The distribution of sequence lengths provides a useful way to detect outlier peaks (i.e., exceptionally long peaks that may ‘dilute’ the motif signal) or irregular length distributions resulting from problems during the peak-calling procedure. Nucleotide and dinucleotide compositions are computed and displayed in the form of heat maps and positional profiles.</li>
<img class="bordered" src="./figures/peak-motifs_archive_cebpa_liver_dog_SC1.png" alt="PMSC3" width="550"/>
<li><b>Motif Discovery:</b> The workflow combines four word-based pattern-discovery algorithms that rely on two complementary criteria (overrepresentation and positional bias) to detect exceptional words (oligonucleotides) and spaced pairs of words (dyads). Significant words are used as seeds to build probabilistic descriptions of motifs (position-specific scoring matrices), which indicate residue variability at each position of the motif. By default, motif discovery uses only oligonucleotide detection.</li>
<img class="bordered" src="./figures/peak-motifs_archive_cebpa_liver_dog_SC2.png" alt="PMSC3" width="550"/>
<li><b>Motif comparisons:</b> Discovered motifs are compared with one or several public databases of annotated motifs to predict associated transcription factors. Comparison results are displayed as multiple motif alignments to highlight matches with several annotated motifs (e.g., factors belonging to the same family, composite motifs bound by protein complexes). Motif comparison is performed against the vertebrate transcription factor binding motifs in the <a href="http://jaspar.genereg.net/" target='_blank'>JASPAR database</a>. </li>
<li><b>Binding site predictions:</b> Sequences are scanned with the discovered motifs to locate binding sites, and their positioning within peaks is analyzed (coverage, positional distribution along peaks).</li>
<img class="bordered" src="./figures/peak-motifs_archive_cebpa_liver_dog_SC3.png" alt="PMSC3" width="550"/>
</ol>
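<p>To make the idea of word over-representation more concrete, here is a toy Python sketch that counts words (k-mers) in a few sequences and compares the observed count of a word with the count expected if nucleotides were independent. This is only an illustration: it is not the statistics used by <i>oligo-analysis</i>, and the function names and sequences are invented for the example.</p>
<pre>
```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every overlapping word of length k in a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def expected_count(word, seqs):
    """Expected occurrences of a word under an independent-nucleotide model."""
    residues = Counter("".join(seqs).upper())
    total = sum(residues.values())
    prob = 1.0
    for base in word:
        prob *= residues[base] / total
    # Number of positions where a word of this length can start.
    positions = sum(max(0, len(s) - len(word) + 1) for s in seqs)
    return prob * positions


# Toy peak sequences (hypothetical, for illustration only).
peaks = ["TTGCGCAATA", "ATTGCGCAAT", "CCTTGCGCAA"]
counts = count_kmers(peaks, 6)
word, observed = counts.most_common(1)[0]
print(word, observed, expected_count(word, peaks))
```
</pre>
<p>A word whose observed count is far above its expected count is a candidate seed for a motif. Real tools additionally handle both strands, use more realistic background models and correct for multiple testing.</p>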
<div class="post">
<div class="success"><b>Understanding <i>peak-motifs</i> results</b>
<ol>
<li>Do you have any concerns regarding peak composition?</li>
<li>Are there any significant motifs discovered?</li>
<li>Were you expecting these results?</li>
</ol>
<b>Anticipated Results:</b> <a href="./results/session1/2_peak-motifs_archive_cebpa_liver_dog/peak-motifs_synthesis.html">peak-motifs results</a>
</div>
</div>
</div>
<h3 class="expand">2 - Fine-tuning peak-motifs parameters </h3>
<div class="collapse">
Several parameters can be tuned in <i>peak-motifs</i> in order to obtain better results.
<ol>
<li><b>Reduce peak sequences:</b> In our previous results, most of the discovered motifs lie in the middle of the peaks. To focus the analysis on this part of the peaks, we can use this option to reduce the sequence length.</li>
<img class="bordered" src="figures/Screen_peak_motifs_cutseq.png" alt="PMSC4" width="550"/>
<li><b>Motif-discovery parameters:</b> The choice of motif discovery algorithms markedly affects the result. It is recommended to combine the analysis of overrepresentation (oligo-analysis) and positional bias (position-analysis). Other available analyses are based on spaced pairs (dyad-analysis) and locally overrepresented words (local-word).</li>
<img class="bordered" src="figures/Screen_peak_motifs_motif_dis.png" alt="PMSC5" width="550"/>
<li><b>Motif Comparison:</b> Several databases of binding motifs are available. Users can also add their own collections; in this case we will also add the motif reported by Schmidt, Wilson and Ballester, based on a CEBPa ChIP-seq experiment in mouse [<a href="./data/dataset_cebpa_dog/do560+do843_mmus_cebpa_liver_top_meme.tf" target='_blank'>motif</a>].</li>
<img class="bordered" src="figures/Screen_peak_motifs_motif_comp.png" alt="PMSC6" width="550"/>
<li><b>Locate motifs:</b> Locating the discovered motifs in the peaks is useful to detect positional bias. Once an interesting motif is found, it becomes important to locate its sites in the genomic context; <i>peak-motifs</i> offers options that facilitate this task.</li>
<img class="bordered" src="figures/Screen_peak_motifs_motif_locate.png" alt="PMSC7" width="550"/>
</ol>
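<p>The effect of reducing peak sequences can be pictured with a small Python sketch that keeps a fixed number of residues on each side of the sequence center. The function name and the rounding convention are assumptions for illustration; the exact clipping rule applied by <i>peak-motifs</i> may differ.</p>
<pre>
```python
def clip_around_center(seq, flank):
    """Keep at most `flank` residues on each side of the sequence center."""
    center = len(seq) // 2
    start = max(0, center - flank)      # do not run past the left end
    end = min(len(seq), center + flank)  # do not run past the right end
    return seq[start:end]


# Toy sequence: only the central part is kept.
print(clip_around_center("AAAACCGGTTTT", 2))
# Sequences shorter than the window are returned unchanged.
print(clip_around_center("ACG", 5))
```
</pre>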
<div class="post">
<div class="success"><b>Results and parameters</b>
<ol>
<li>Try different combinations of parameters. How would you improve these results?</li>
<li>How different is the discovered dog CEBPa motif in comparison to the mouse reported motif?</li>
</ol>
<a href="./results/session1/3_peak-motifs_archive_cebpa_liver_dog/peak-motifs_synthesis.html"><b>Anticipated Results</b></a>
</div>
</div>
</div>
<h3 class="expand">3 - Using a sequence set as control </h3>
<div class="collapse">
<i>peak-motifs</i> can take as input a second set of sequences to be used as a control. For example, if there is a set of peaks produced by a ChIP-seq experiment on a mutant that does not contain the transcription factor, this peak set can be used as a control for motif discovery.<br/>
In their most recent work (eLife, in press), Ballester et al. classified the CEBPa peaks into two groups: peaks belonging to cis-regulatory modules (CRMs, several TFs binding together) and singletons (only CEBPa). Using the singleton peaks as control for the CRM peaks makes it possible to discover the CEBPa co-factors.
<ol>
<li><b>CRMs:</b> <a href="./data/dataset_cebpa_dog/cfam2_cebpa_inModules_EPO.chr_crm.bed">bed file</a>, <a href="./data/dataset_cebpa_dog/cfam2_cebpa_inModules_EPO.chr_crm.fasta">fasta file</a></li>
<li><b>Singletons:</b> <a href="./data/dataset_cebpa_dog/cfam2_cebpa_inModules_EPO.chr_singleton.bed">bed file</a>, <a href="./data/dataset_cebpa_dog/cfam2_cebpa_inModules_EPO.chr_crm.fasta">fasta file</a></li>
</ol>
<div class="post">
<div class="success"><b>Results </b>
<ol>
<li>There are now two sequence composition results, one for the query sample and one for the control. The query is composed of peaks in the CRM category and the control of singletons; as one would expect, the sequence lengths differ between the two sets.</li>
<li>The CEBPa motif is no longer found, since it is also over-represented in the control data set. It is now possible to observe enrichment for other transcription factors such as HNF4a, a known important liver transcription factor that is likely to bind together with CEBPa.</li>
</ol>
<b>Anticipated Results</b>:<a href="results/session1/3_peak-motifs_archive_cebpa_liver_dog_controlset_CRM_Single/peak-motifs_synthesis.html">peak-motifs control, CRMs vs Singletons.</a>
</div>
</div>
</div>
</div>
</div>
<!--Visualisation -->
<div class="demo">
<h2 id="mapping" class="expand">Visualizing the sites in the context of genome annotations</h2>
<div class="collapse">
<div class="notice">
<b>Goal:</b> Visualize the predicted binding sites with the discovered motifs in genomic context.
<br/>
</div>
<h3 class="expand">1 - UCSC browser </h3>
<div class="collapse">
<p>Visualization of ChIP-seq data in the genome context can be very useful; it can be used to empirically assess quality and to identify interesting genomic regions.</p>
<p>The <a href="https://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=canFam2&amp;position=chrX%3A112423225-112747804&amp;hgsid=198963761_uRsmrbdyiFmQRhnsq1KM2m5zgDBS">UCSC browser</a> contains several annotations and data sets (mostly for human and model organisms) that can be visualized together with user-specified samples.</p>
<p><img class="bordered" src="./figures/ucsc_browser_SC1.png" alt="VSC1" width="550"/> </p>
<p>Users can create and share personalized sessions with their data.</p>
<p><img class="bordered" src="./figures/ucsc_browser_SC2.png" alt="VSC2" width="550"/> </p>
<p>You can find a session we prepared containing the dog data set <a href="https://genome-euro.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&amp;hgS_otherUserName=amedina&amp;hgS_otherUserSessionName=canFam2_eccb14">here</a>.</p>
</div>
<h3 class="expand">2 - Load predicted binding sites into UCSC browser </h3>
<div class="collapse">
To visualize our binding site predictions we need to:
<ol>
<li>Download the bed file with the coordinates of the predicted sites from the <i>peak-motifs</i> output.</li>
<p><img class="bordered" src="./figures/ucsc_browser_SC3_color.png" alt="VSC3" width="550"/> </p>
<li>In the UCSC browser select: My Data / Custom Tracks / add custom tracks. </li>
<li>Select the bed file and click on submit.</li>
<li>This task can take some time; do not close the window.</li>
<li>Once the file is loaded you will see one track per motif in the table.</li>
<li>Now go back to the genome browser.</li>
</ol>
</div>
<h3 class="expand">3 - Interpretation</h3>
<div class="collapse">
<div class="post">
<div class="success"><b>Sites in perspective</b>
<ol>
<li>Do all peaks have sites?</li>
<li>Did you expect these results?</li>
<li>Do you have any interesting findings?</li>
</ol>
<b>Anticipated Results:</b> <a href="https://genome-euro.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=amedina&hgS_otherUserSessionName=canFam2_eccb14_with_sites">UCSC Session.</a>
</div>
</div>
</div>
</div>
</div>
<!-- FNR data set -->
<div class="demo">
<h2 id="peak" class="expand">ChIP-seq in bacteria</h2>
<div class="collapse">
<div class="notice">
<b>Goal:</b> Apply the knowledge acquired today to a second data set.
</div>
<h3 class="expand">1 - FNR ChIP-seq in <i>E. coli</i> K12 </h3>
<div class="collapse">
<p>Myers et al. recently characterized, through ChIP-seq, the binding profile of the transcription factor FNR <a href="http://dx.plos.org/10.1371/journal.pgen.1003565">(Paper)</a> in <i>Escherichia coli</i> K12 MG1655.</p>
<p>Data was processed in the following way:</p>
<ol>
<li>Raw reads were downloaded from the GEO database. ID: <a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41195">GSE41195</a>.</li>
<li>Reads were aligned to the <i>E. coli</i> K12 MG1655 genome, version <b>NC_000913.2</b>, using bowtie. </li>
<li>Peaks were called using MACS with parameters: --gsize 4639675 --name "macs14" --bw 400 --keep-dup 1 --bdg --single-profile --diag.</li>
<li>The peak set is <a href="./data/FNR_coli/macs14_peaks.bed">here</a>.</li>
</ol>
</div>
<h3 class="expand">2 - Get the genome sequence</h3>
<div class="collapse">
<p>For the following step we will need the fasta file of the <b>NC_000913.2</b> version of the <i>E. coli</i> K12 MG1655 genome. </p>
<ol>
<li>We will download this file from <a href="http://www.ncbi.nlm.nih.gov/nuccore/NC_000913.2">NCBI</a>.</li>
<li>In the "Send to" menu, select file and then specify fasta format.</li>
<p><img class="bordered" src="./figures/NCBI_genome_ecoli.png" alt="FNRSC1" width="550"/></p>
</ol>
</div>
<h3 class="expand">3 - Fetching peak sequences with Galaxy</h3>
<div class="collapse">
<p><a href="https://usegalaxy.org/">Galaxy</a> provides web access to useful tools for NGS analysis. We will use the tool <i>Extract Genomic DNA</i> to get the peak regions from the fasta file of the <i>E. coli</i> K12 MG1655 genome. This tool is especially useful for genomes that are not supported by resources like the UCSC genome browser.</p>
<ol>
<li>Go to Galaxy: <a href="https://usegalaxy.org/">https://usegalaxy.org/</a> and log in. If you do not have an account, please create one; it is fast and might be useful in the future.</li>
<li>Now we will upload the genome (fasta format) and peaks (bed format) files into Galaxy with the tool <i>Upload File</i> under the "Get Data" menu. Select the corresponding data type and select "unspecified" genome (the <i>E. coli</i> K12 genome available in Galaxy does not correspond to the version we are using).</li>
<p><img class="bordered" src="./figures/galaxy_upload_files_2.png" alt="FNRSC2" width="550"/></p>
<li>The tool "Extract Genomic DNA" can be found under the <i>Fetch Sequences</i> dropdown menu.</li>
<li>Select the bed and fasta files saved in the history as inputs and execute the job.</li>
<p><img class="bordered" src="./figures/galaxy_fetch_2.png" alt="FNRSC3" width="550"/></p>
<li>Once the job is finished the result will appear in the history.</li>
</ol>
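<p>What the sequence extraction step does can be sketched in a few lines of Python: read the genome from the fasta file, then slice it at each bed interval. This is an illustration under simplifying assumptions (plus strand only, the whole genome held in memory, invented function names and toy data), not the code run by Galaxy.</p>
<pre>
```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence name to sequence."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def extract_regions(genome, bed_lines):
    """Return (header, sequence) pairs, one per BED interval.

    BED coordinates are 0-based and half-open, so they map directly
    onto Python slices.
    """
    records = []
    for line in bed_lines:
        fields = line.split()
        if not fields:
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        records.append((f"{chrom}:{start}-{end}", genome[chrom][start:end]))
    return records


# Toy genome and peaks (hypothetical content).
genome = read_fasta([">chr", "ACGTACGTAC", "GGGTTTAAAC"])
peaks = ["chr\t2\t6", "chr\t8\t13"]
for header, seq in extract_regions(genome, peaks):
    print(">" + header)
    print(seq)
```
</pre>
<p>Note that the sequence name in the first column of the bed file must match the name in the fasta header; a mismatch is a common reason for an empty result.</p>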
<div class="post">
<div class="success"><b>Fetching sequences for new genomes</b>
<ol>
<li>Do you know other options to fetch peak sequences for genomes that are new or not supported?</li>
<div class="collapse">
You can also do this using the command line.
<pre>$ bedtools getfasta -fi Escherichia_coli_K_12_MG1655.fasta -bed macs14_peaks.bed -fo macs14_peaks.fa </pre>
</div>
</ol>
<b>Anticipated Results:</b><a href="./results/session1/4_E_coliK12_peaks-seq_galaxy.fasta"> fasta file.</a>
</div>
</div>
</div>
<h3 class="expand">4 - Motif Discovery </h3>
<div class="collapse">
Now that we have the peak sequences we can do motif discovery.
<p>You now know what to do!</p>
<ol>
<li>Go to the RSAT <a href="http://rsat.sb-roscoff.fr/">web page</a>.</li>
<li>Fill in the form and input the peak sequences in fasta format.</li>
<li>Select the desired options.</li>
<li>Go!</li>
</ol>
<div class="post">
<div class="success"><b>Tuning parameters</b>
<ol>
<li>Which parameters did you use?</li>
<li>Did you select any specific set of motifs to compare with? Why?</li>
<li>Which algorithm gave you the expected motif?</li>
<li>Try comparing the discovered motifs with motifs from binding interfaces of proteins in <a href="http://floresta.eead.csic.es/footprintdb/index.php">footprintDB</a>. These motifs are inferred from a collection of protein-DNA complexes.</li>
</ol>
<b>Anticipated Results:</b> <a href="./results/session1/6_peak-motifs_archive_FNR_Ecoli/peak-motifs_synthesis.html"> [Default parameters]</a> <a href="results/session1/6_peak-motifs_archive_FNR_Ecoli_dyad/peak-motifs_synthesis.html"> [Tuned parameters]</a>
</div>
</div>
</div>
<h3 class="expand">5 - Focusing motif discovery on the summits </h3>
<div class="collapse">
The last analysis did not show the expected results, probably because the peaks were too long, which makes the search more difficult. We will now focus the search on +/- 50 base pairs around the reported <a href="./data/FNR_coli/macs14_summits.bed">summits</a>.
<ol>
<li>Go back to the Galaxy server.</li>
<li>Upload the summits.</li>
<li>We will use the tool <i>Compute</i> under the Text Manipulation menu twice: once to calculate start - 50 bp and once to calculate end + 50 bp.</li>
<p><img class="bordered" src="./figures/galaxy_compute_2.png" alt="FNRSC3" width="550"/></p>
<li>With the tool <i>Cut</i> under the Text Manipulation menu we will select the first, sixth and seventh columns to create a new bed file.</li>
<p><img class="bordered" src="./figures/galaxy_cut_2.png" alt="FNRSC4" width="550"/></p>
<li>As done before, use the bed file to obtain the sequences from the genome data file.</li>
</ol>
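<p>The Compute and Cut steps above amount to simple arithmetic on the bed columns. The Python sketch below does the same thing in one pass; the function name and the toy summit lines are assumptions for illustration. Unlike the Galaxy steps, it also prevents the start coordinate from becoming negative.</p>
<pre>
```python
def widen_summits(bed_lines, flank=50):
    """Turn each summit interval into a window of `flank` bp on both sides."""
    windows = []
    for line in bed_lines:
        fields = line.split()
        if not fields:
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        new_start = max(0, start - flank)  # never go below the first base
        new_end = end + flank              # not clipped to the genome length
        windows.append(f"{chrom}\t{new_start}\t{new_end}")
    return windows


# Toy summit lines (hypothetical values): chrom, start, end, name, score.
summits = ["chr\t1000\t1001\tpeak_1\t50.2", "chr\t20\t21\tpeak_2\t12.0"]
for window in widen_summits(summits):
    print(window)
```
</pre>
<p>The resulting three-column lines form a valid bed file that can be used for sequence extraction in the same way as the original peaks.</p>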
<div class="post">
<div class="success"><b>Signal in the summit</b>
<ol>
<li>Now the results show the expected FNR motif.</li>
</ol>
<b>Anticipated Results:</b> <a href="./results/session1/5_FNR_summit_50bps_sequences.fasta">[summit fasta file]</a> <a href="./results/session1/6_peak-motifs_archive_FNR_Ecoli_summits_50bps/peak-motifs_synthesis.html"> [Summit peak-motifs]</a>
</div>
</div>
</div>
</div>
</div>
<!-- ENRICHMENT -->
<div class="demo">
<h2 id="vizu" class="expand">Measure the enrichment of your peak for expected motifs</h2>
<div class="collapse">
<div class="notice">
<b>Goal:</b> To identify whether a collection of peaks is enriched for a set of specific motifs.
</div>
<h3 class="expand">1 - Motif enrichment in RSAT</h3>
<div class="collapse">
<i>matrix-quality</i> is a tool that highlights the enrichment of binding sites in sequence sets obtained from high-throughput experiments such as ChIP-chip and ChIP-seq. To assess enrichment, it combines information from theoretical and empirical score distributions.
The tool can be found under the Matrix Tools menu on the <a href="http://rsat.sb-roscoff.fr/"> RSAT </a> web site. The following paper describes the algorithm behind this tool in detail.
<ul>
<li>Medina-Rivera, A., Abreu-Goodger, C., Thomas-Chollier, M., Salgado, H., Collado-Vides, J., &amp; Van Helden, J. (2011).<i> Theoretical and empirical quality assessment of transcription factor-binding motifs. </i> Nucleic Acids Research, 39(3), 808–824.<a href="http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&amp;pmid=20923783"> [Paper] </a></li>
</ul>
</div>
<h3 class="expand">2 - Enrichment of liver Transcription factors binding sites in ChIP-seq peak sequences </h3>
<div class="collapse">
In a recent paper (in press), Ballester et al. characterized the binding profiles of three other relevant liver transcription factors: ONECUT1 (HNF6), FOXA1 and HNF4A. The mouse matrices reported in this paper can be found <a href="data/dataset_cebpa_dog/Liver_TFs_mmus_ballester_zoo-chip.tf">here</a>.
We will use <i>matrix-quality</i> to assess enrichment for the four TFs (CEBPa, ONECUT1, HNF4A and FOXA1).
<ol>
<li>Define a title for the job. We will use the title: <b>CEBPa motif enrichment in ChIP-seq</b></li>
<li>Paste the motifs to be used and select <b>transfac</b> format.</li>
<li>Input the peak sequences using the URL.</li>
<p><img class="bordered" src="./figures/matrix_quality_SC1_2.png" alt="MQSC1" width="550"/></p>
<li>Permutations are used as a negative control; select 2.</li>
<li>Enter your email.</li>
<li>Go!</li>
<p><img class="bordered" src="./figures/matrix_quality_SC2_2.png" alt="MQSC1" width="550"/></p>
</ol>
</div>
<h3 class="expand">3 - Understanding enrichment graphs</h3>
<div class="collapse">
First we will analyse the enrichment of the reported CEBPa motif in the collection of peaks from the ChIP-seq experiment in dog liver.
<ol>
<li>Decreasing cumulative distribution function (dCDF).</li>
<p><img class="bordered" src="./results/session1/7_matrix-quality/do560plusdo843_mmus_cebpa_liver_meme_top_m1/matrix-quality_2014-09-05.235841_do560plusdo843_mmus_cebpa_liver_meme_top_m1_score_distrib_compa.png" alt="MQ3" width="550"/></p>
<li>Decreasing cumulative distribution function (dCDF) with a logarithmic y axis. The logarithmic scale makes differences at high scores easier to see.</li>
<p><img class="bordered" src="./results/session1/7_matrix-quality/do560plusdo843_mmus_cebpa_liver_meme_top_m1/matrix-quality_2014-09-05.235841_do560plusdo843_mmus_cebpa_liver_meme_top_m1_score_distrib_compa_logy.png" alt="MQ4" width="550"/></p>
<li>Receiver Operating Characteristic (ROC) curves.</li>
<p><img class="bordered" src="./results/session1/7_matrix-quality/do560plusdo843_mmus_cebpa_liver_meme_top_m1/matrix-quality_2014-09-05.235841_do560plusdo843_mmus_cebpa_liver_meme_top_m1_score_distrib_compa_roc_xlog.png" alt="MQ5" width="550"/></p>
</ol>
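<p>A decreasing cumulative distribution simply gives, for each score threshold, the fraction of scores at or above that threshold. The Python sketch below computes it for two toy score lists; the function name and the values are invented for illustration and do not come from <i>matrix-quality</i>.</p>
<pre>
```python
def dcdf(scores, thresholds):
    """For each threshold t, return the fraction of scores that are >= t."""
    n = len(scores)
    return [sum(1 for s in scores if s >= t) / n for t in thresholds]


# Toy score sets (hypothetical values): peaks versus a negative control.
peak_scores = [1.0, 2.5, 4.0, 7.5]
control_scores = [0.5, 1.0, 1.5, 3.0]
thresholds = [0, 2, 4, 8]

print(dcdf(peak_scores, thresholds))
print(dcdf(control_scores, thresholds))
```
</pre>
<p>In this toy case the peak curve stays above the control curve at high thresholds, which is the pattern to look for in the enrichment graphs.</p>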
<div class="post">
<div class="success"><p><b>Is the CEBPa motif enriched?</b></p>
<b>Anticipated Results:</b> Comparing the theoretical score distribution with the empirical one obtained with the CEBPa motif shows enrichment for high score values. High scores are more likely to correspond to biologically relevant sites.
<p><a href="./results/session1/7_matrix-quality/matrix-quality_2014-09-05.235841_synthesis.html">matrix-quality result</a>.</p>
</div>
</div>
</div>
</div>
</div>
<!--Background Model-->
<div class="demo">
<h2 id="mapping" class="expand">Background model</h2>
<div class="collapse">
<div class="notice">
<b>Goal:</b> Understand the relevance of background model selection and how to create one.
<br/>
</div>
<h3 class="expand">1 - Creating a background model</h3>
<div class="collapse">
<p>The tool <i>create-background-model</i> can be used to create a customized background model from a set of sequences.</p>
<ol>
<li>Input the peak sequences in fasta format.</li>
<li>Select the markov order to be used.</li>
<li>Specify an email.</li>
</ol>
<p><img class="bordered" src="./figures/creat_bgmodel_2.png" alt="VSC1" width="550"/> </p>
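<p>Conceptually, a Markov background model of order <i>m</i> stores the probability of each nucleotide given the <i>m</i> preceding ones. The Python sketch below estimates these probabilities from raw counts. It is a simplified illustration (single strand, no pseudo-counts, invented function name and toy sequence) and does not reproduce the RSAT implementation or its file format.</p>
<pre>
```python
from collections import Counter, defaultdict


def markov_model(seqs, order):
    """Estimate P(next base | previous `order` bases) from sequence counts."""
    counts = defaultdict(Counter)
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            prefix = seq[i - order:i]
            counts[prefix][seq[i]] += 1
    model = {}
    for prefix, nexts in counts.items():
        total = sum(nexts.values())
        model[prefix] = {base: n / total for base, n in nexts.items()}
    return model


# Toy sequence (hypothetical): transitions A-C, C-A, A-C, C-A, A-G.
model = markov_model(["ACACAG"], order=1)
for prefix in sorted(model):
    print(prefix, model[prefix])
```
</pre>
<p>With <code>order=0</code> the model reduces to plain nucleotide frequencies. Higher orders capture more sequence context but need more sequence data to be estimated reliably.</p>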
</div>
</div>
</div>
</div>
</div>
</div>
<!-- SIDE BAR -->
<div class="right sidebar" id="sidebar">
<h5>Slides:</h5>
<ul>
<li><a href="booklet/booklet_chip-seq.pdf" target='_blank'>Presentation</a></li>
</ul>
<h5>Datasets:</h5>
<ul>
<li>Schmidt, Wilson and Ballester. <i>Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding.</i> Science, 328(5981), 1036–1040. 2010 <a href="http://www.sciencemag.org/content/328/5981/1036" target='_blank'>Article</a><br/>
<u>peaks:</u><a href="./data/dataset_cebpa_dog/do61-do79_cfam_CEBPA_liver.bed.SWEMBL.3.3.bed" target='_blank'> bed file</a> </li>
<li>Myers, K. S., Yan, H., Ong, I. M., Chung, D., Liang, K., Tran, F., et al. (2013). <i>Genome-scale analysis of Escherichia coli FNR reveals complex features of transcription factor binding.</i> PLoS Genetics, 9(6), e1003565. doi:10.1371/journal.pgen.1003565 <a href="http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1003565" target='_blank'>Article</a><br/>
<u>peaks:</u><a href="./data/FNR_coli/macs14_peaks.bed" target='_blank'> peak bed file</a>, <a href="./data/FNR_coli/macs14_summits.bed" target='_blank'> summit bed file</a><br/></li>
</ul>
<h5>Suggested Reading:</h5>
<ul>
<li>Bailey et al. <i>Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data.</i> PLoS Comput Biol 9, e1003326 (2013).<a href="http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003326" target='_blank'> [Paper] </a><br/></li>
<li>Thomas-Chollier, M., Herrmann, C., Defrance, M., Sand, O., Thieffry, D. and van Helden, J. (2012). <i>RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets</i>. Nucleic Acids Research 40(4): e31. doi:10.1093/nar/gkr1104 <a href="http://nar.oxfordjournals.org/content/40/4/e31.long"> [Paper] </a></li>
<li>Thomas-Chollier M, Darbo E, Herrmann C, Defrance M, Thieffry D, van Helden J. (2012). <i>A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs</i>. Nat Protoc 7(8): 1551-1568. <a href="http://www.nature.com/nprot/journal/v7/n8/full/nprot.2012.088.html"> [Paper]</a></li>
<li>Medina-Rivera, A., Abreu-Goodger, C., Thomas-Chollier, M., Salgado, H., Collado-Vides, J., &amp; Van Helden, J. (2011).<i> Theoretical and empirical quality assessment of transcription factor-binding motifs. </i> Nucleic Acids Research, 39(3), 808–824.<a href="http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&amp;pmid=20923783"> [Paper] </a></li>
</ul>
<p/>
</div>
<div class="clearer"> </div>
</div>
<div id="footer">
<div class="left" id="footer-left">
<p>© 2014 Morgane Thomas-Chollier. All rights reserved.</p>
<p class="quiet"><a href="http://templates.arcsin.se/">Website template</a> by <a href="http://arcsin.se/">Arcsin</a></p>
<div class="clearer"> </div>
</div>
<div class="right" id="footer-right">
</div>
<div class="clearer"> </div>
</div>
</div>
</body>
</html>