<!DOCTYPE html>
<html>
<style>
seq {font-family:Monaco, "Courier New";
font-size:0.9em;
text-indent: 2.7em;
}
align.small {
font-family:Monaco, "Courier New";
font-size:0.9em;
line-height:65%;
}
align.long {
font-family:Monaco, "Courier New";
font-size:0.75em;
line-height:70%;
}
s5 {color:#6baed6;}
s7 {color:#fc9272;}
p5 {color:#08519c;}
p7 {color:#a50f15;}
me {color:#969696;}
t7 {color:blue;}
w1 {color:#f03b20;}
pri {color:#75c37c;}
i71 {color:#a6cee3;}
i72 {color:#1f78b4;}
i73 {color:#b2df8a;}
i74 {color:#33a02c;}
i75 {color:#fb9a99;}
i51 {color:#fdbf6f;}
i52 {color:#ff7f00;}
i53 {color:#cab2d6;}
i54 {color:#6a3d9a;}
i55 {color:#b15928;}
h3 {font-family:verdana;}
</style>
<head>
<title>Illumina sequencing</title>
</head>
<body>
<h1>Illumina sequencing libraries</h1>
<p><span style="font-family:verdana; font-size:1.1em;">Illumina sequencing by synthesis requires special oligonucleotide adapters to be attached to both ends of the purified target DNA before sequencing can begin. These adapters consist of three main components: (1) the P5 and P7 sequences, which allow the library to bind to the flow cell and generate clusters; (2) the i5 and i7 index sequences (barcodes), which uniquely label the molecules from different samples so that multiple samples can be pooled (multiplexed) in a single sequencing run or flow cell lane; and (3) the binding sites for the Read 1 and Read 2 sequencing primers, which initiate the sequencing reaction itself. A variety of Illumina and third-party adapter designs can be used for Illumina sequencing, with the TruSeq and Nextera adapter systems being the most popular:</span></p>
<h3>TruSeq Dual Index Library:</h3>
<pre>
<seq>
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>-insert-<s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5>-insert-<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
<p5>Illumina P5</p5> <t7>i5</t7> <s5>TruSeq Read 1</s5> <s7>TruSeq Read 2</s7> <t7>i7</t7> <p7>Illumina P7</p7>
</seq>
</pre>
<h3>Nextera Dual Index Library:</h3>
<pre>
<seq>
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>TCGTCGGCAGCGTC</s5><me>AGATGTGTATAAGAGACAG</me>-insert-<me>CTGTCTCTTATACACATCT</me><s7>CCGAGCCCACGAGAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me>-insert-<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
<p5>Illumina P5</p5> <t7>i5</t7> <s5>Next</s5><me>era Read 1</me> <me>Next</me><s7>era Read 2</s7> <t7>i7</t7> <p7>Illumina P7</p7>
</seq>
</pre>
<p><span style="font-family:verdana; font-size:1.1em;">The "insert" is the commonly used term for the target DNA that is to be sequenced. In metabarcoding libraries, the insert also includes the forward and reverse PCR primers used to amplify the target DNA.</span></p>
<p><span style="font-family:verdana; font-size:1.1em;">The "N"s in the above diagrams indicate the "indexes", or "barcodes", used to discriminate between samples. These are short 8-10 bp sequences (e.g. CTATGTTA) that identify each sample. The index on the right-hand side is the "i7 index", or "index 1", and the index on the left-hand side is the "i5 index", or "index 2".</span></p>
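<p><span style="font-family:verdana; font-size:1.1em;">As a rough illustration of the diagrams above, the Python sketch below assembles the top strand of a TruSeq dual-index library from its parts and derives the bottom strand by complementing each base. The adapter sequences are copied from the TruSeq diagram. The function names and the 8 bp insert are made up for this example and stand in for a real amplicon.</span></p>
<pre>
<code class="language-python">
# Adapter parts copied from the TruSeq diagram (top strand, 5' to 3')
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
P7 = "ATCTCGTATGCCGTCTTCTGCTTG"

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def complement(seq):
    """Complement each base without reversing, so the result reads 3' to 5'."""
    return seq.translate(COMPLEMENT)


def truseq_top_strand(insert, i5, i7):
    """Join P5, i5, Read 1, insert, Read 2, i7 and P7 in 5' to 3' order."""
    return P5 + i5 + READ1 + insert + READ2 + i7 + P7


# Hypothetical 8 bp insert and an arbitrary pair of 8 bp indexes
top = truseq_top_strand(insert="ACGTACGT", i5="CTATGTTA", i7="GATCAACA")
bottom = complement(top)
print("5'- " + top + " -3'")
print("3'- " + bottom + " -5'")
</code>
</pre>
<p><span style="font-family:verdana; font-size:1.1em;">The two printed lines follow the same layout as the diagrams: the second line is the complement of the first, written 3' to 5' so that paired bases sit above each other.</span></p>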
<p><span style="font-family:verdana; font-size:1.1em;">Most modern sequencing protocols use dual indexing rather than single indexing. Dual-indexed libraries can be combinatorial, where individual indexes are reused across samples and only the i5/i7 combination is unique to each sample. In this example the i7 index differs between samples while the i5 index is shared:</span></p>
<pre>
<seq>
Sample 1 - <i51>-AATAACGT</i51>...<i71>AATCGTTA</i71>
Sample 2 - <i51>-AATAACGT</i51>...<i72>GTCTACAT</i72>
Sample 3 - <i51>-AATAACGT</i51>...<i73>CGCTGCTC</i73>
Sample 4 - <i51>-AATAACGT</i51>...<i74>GATCAACA</i74>
Sample 5 - <i51>-AATAACGT</i51>...<i75>CGAAGGAC</i75>
<t7>i5</t7> <t7>i7</t7>
</seq>
</pre>
<p><span style="font-family:verdana; font-size:1.1em;">Or they can be unique (unique dual indexing, UDI), where both the i5 and the i7 index are unique to each sample:</span></p>
<pre>
<seq>
Sample 1 - <i51>-AATAACGT</i51>...<i71>AATCGTTA</i71>
Sample 2 - <i52>-TTCTTGAA</i52>...<i72>GTCTACAT</i72>
Sample 3 - <i53>-GGCAGATC</i53>...<i73>CGCTGCTC</i73>
Sample 4 - <i54>-CTATGTTA</i54>...<i74>GATCAACA</i74>
Sample 5 - <i55>-GTTGACGC</i55>...<i75>CGAAGGAC</i75>
<t7>i5</t7> <t7>i7</t7>
</seq>
</pre>
<p><span style="font-family:verdana; font-size:1.1em;">Illumina now encourages customers to use unique dual indexing (UDI) whenever possible, as it gives the most accurate demultiplexing and therefore reduces the risk of reads being assigned to the wrong sample.</span></p>
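<p><span style="font-family:verdana; font-size:1.1em;">The difference between the two schemes can be checked programmatically before a run. The sketch below is one possible way to do it in Python: it takes a mapping of sample names to (i5, i7) pairs and reports whether the set is unique dual, combinatorial, or ambiguous (two samples with the same pair). The function name and the dictionary layout are assumptions made for this example, not part of any Illumina tool.</span></p>
<pre>
<code class="language-python">
from collections import Counter


def indexing_scheme(samples):
    """Classify a mapping of sample name to (i5, i7) index pair."""
    pairs = list(samples.values())
    if not pairs:
        raise ValueError("no samples given")
    if len(set(pairs)) != len(pairs):
        return "ambiguous"  # two samples carry the same i5 + i7 pair
    i5_counts = Counter(i5 for i5, _ in pairs)
    i7_counts = Counter(i7 for _, i7 in pairs)
    if max(i5_counts.values()) == 1 and max(i7_counts.values()) == 1:
        return "unique dual"
    return "combinatorial"


i7_indexes = ["AATCGTTA", "GTCTACAT", "CGCTGCTC", "GATCAACA", "CGAAGGAC"]
i5_indexes = ["AATAACGT", "TTCTTGAA", "GGCAGATC", "CTATGTTA", "GTTGACGC"]

# One shared i5 index, five different i7 indexes
combinatorial = {"Sample %d" % (n + 1): ("AATAACGT", i7) for n, i7 in enumerate(i7_indexes)}
# Every sample has its own i5 and its own i7 index
unique = {"Sample %d" % (n + 1): pair for n, pair in enumerate(zip(i5_indexes, i7_indexes))}

print(indexing_scheme(combinatorial))  # combinatorial
print(indexing_scheme(unique))         # unique dual
</code>
</pre>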
<h1>Library preparation:</h1>
<p><span style="font-family:verdana; font-size:1.1em;">In our current metabarcoding protocol, we use the TruSeq adapter system and attach the adapters to the target molecules in two separate PCRs:</span></p>
<h3>First PCR:</h3>
<p><span style="font-family:verdana; font-size:1.1em;">The first PCR amplifies the target DNA and adds the Illumina Read 1 primer binding site to the left side of the insert and the Read 2 primer binding site to the right side. To achieve this, we modify our locus-specific primers to include the universal adapter sequences as 5' tails. In the example below we use the fwhF2-fwhR2n primer set, which amplifies a short region of the mitochondrial COI barcode:</span></p>
<pre>
<seq>
<p>Tailed F primer: 5'- <s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>-<pri>GGDACWGGWTGAACWGTWTAYCCHCC</pri> -3'
<s5>TruSeq Read 1</s5> <pri>Forward primer</pri>
<p>Tailed R primer: 5'- <s7>GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT</s7>-<pri>GTRATWGCHCCDGCTARWACWGG</pri> -3'
<s7>TruSeq Read 2</s7> <pri>Reverse primer</pri>
</seq>
</pre>
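<p><span style="font-family:verdana; font-size:1.1em;">Building a tailed primer is a simple concatenation: the adapter tail goes on the 5' end of the locus-specific primer. The Python sketch below reproduces the two tailed primers shown above and, as an extra check, counts how many distinct oligos each degenerate primer represents. Letters such as W, D, H, Y and R are standard IUPAC codes for mixed bases. The function names are illustrative only.</span></p>
<pre>
<code class="language-python">
# Number of bases each IUPAC nucleotide code can stand for
IUPAC_OPTIONS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

READ1_TAIL = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2_TAIL = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
FWH_F2 = "GGDACWGGWTGAACWGTWTAYCCHCC"
FWH_R2N = "GTRATWGCHCCDGCTARWACWGG"


def tailed_primer(tail, locus_primer):
    """Prepend the adapter tail to the 5' end of the locus-specific primer."""
    return tail + locus_primer


def degeneracy(primer):
    """Count how many distinct oligos a degenerate primer represents."""
    total = 1
    for base in primer:
        total *= IUPAC_OPTIONS[base]
    return total


for name, tail, primer in [("F", READ1_TAIL, FWH_F2), ("R", READ2_TAIL, FWH_R2N)]:
    oligo = tailed_primer(tail, primer)
    print(name, oligo, len(oligo), "nt, degeneracy", degeneracy(primer))
</code>
</pre>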
<p><span style="font-family:verdana; font-size:1.1em;">Following amplification with these tailed primers, the molecules will look like this:</span></p>
<pre>
<seq>
5'- <s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5><pri>GGDACWGGWTGAACWGTWTAYCCHCC</pri>-Target-<pri>CCWGTWYTAGCHGGDGCWATYAC</pri><s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7> -3'
3'- <s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5><pri>CCHTGWCCWACTTGWCAWATRGGDGG</pri>-Target-<pri>GGWCAWRATCGDCCHCGWTARTG</pri><s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7> -5'
<s5>TruSeq Read 1</s5> <pri>Forward primer</pri> <pri>Reverse primer</pri> <s7>TruSeq Read 2</s7>
</seq>
</pre>
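<p><span style="font-family:verdana; font-size:1.1em;">The reverse primer appears in the top strand as its reverse complement, which is why the right-hand side of the product reads differently from the primer as ordered. The following Python sketch shows this with a reverse complement function that also handles IUPAC degenerate codes. The helper names are assumptions for this example, and "-Target-" is only a text placeholder for the amplified region.</span></p>
<pre>
<code class="language-python">
# Each IUPAC code is mapped to the code for its complementary bases
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

TAILED_F = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT" + "GGDACWGGWTGAACWGTWTAYCCHCC"
TAILED_R = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT" + "GTRATWGCHCCDGCTARWACWGG"


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC degenerate codes."""
    return seq.translate(IUPAC_COMPLEMENT)[::-1]


def first_pcr_top_strand(target):
    """Top strand (5' to 3') of the product; target is the region between the primers."""
    return TAILED_F + target + reverse_complement(TAILED_R)


print(reverse_complement("GTRATWGCHCCDGCTARWACWGG"))  # CCWGTWYTAGCHGGDGCWATYAC
print(first_pcr_top_strand("-Target-"))
</code>
</pre>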
<p><span style="font-family:verdana; font-size:1.1em;">Note that in some sequencing protocols, such as those used for whole-genome, metagenomic, or metatranscriptomic sequencing, the Read 1 and Read 2 adapters are attached to the molecules by methods other than PCR, such as tagmentation or ligation.</span></p>
<br>
<h3>Second PCR:</h3>
<p><span style="font-family:verdana; font-size:1.1em;">The second PCR uses the Read 1 and Read 2 sequences added in the first PCR as priming sites to add the P5 and P7 flow cell binding sequences, as well as the i5 and i7 indexes. This second set of primers, commonly referred to as indexing primers, is normally purchased in a kit or designed in-house. Either way, the primers are generally structured as follows:</span></p>
<pre>
<seq>
<p>iTru_R1_5: 5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5> -3'
<p5>Illumina P5</p5> <t7>i5</t7> <s5>TruSeq Read 1</s5>
<p>iTru_R2_5:  5'- <p7>CAAGCAGAAGACGGCATACGAGAT</p7><t7>NNNNNNNN</t7><s7>GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT</s7> -3'
                      <p7>Illumina P7</p7>          <t7>i7</t7>             <s7>TruSeq Read 2</s7>
</seq>
</pre>
<p><span style="font-family:verdana; font-size:1.1em;">Following amplification with the second set of primers (indexing primers), the molecules will look like this:</span></p>
<pre>
<seq>
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5><pri>GGDACWGGWTGAACWGTWTAYCCHCC</pri>-Target-<pri>CCWGTWYTAGCHGGDGCWATYAC</pri><s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5><pri>CCHTGWCCWACTTGWCAWATRGGDGG</pri>-Target-<pri>GGWCAWRATCGDCCHCGWTARTG</pri><s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
<p5>Illumina P5</p5> <t7>i5</t7> <s5>TruSeq Read 1</s5> <pri>Forward primer</pri> <pri>Reverse primer</pri> <s7>TruSeq Read 2</s7> <t7>i7</t7> <p7>Illumina P7</p7>
</seq>
</pre>
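<p><span style="font-family:verdana; font-size:1.1em;">Because each PCR adds a fixed amount of adapter sequence, the expected fragment size after each step can be worked out from the amplicon length, which is useful when checking products on a gel. The Python sketch below does this arithmetic using the sequences from the diagram above and assumes 8 bp indexes. The function names and the 254 bp amplicon length are placeholders for this example, not measured values.</span></p>
<pre>
<code class="language-python">
# Adapter parts of the final TruSeq dual-index library (top strand, 5' to 3')
LEFT_PARTS = {
    "P5": "AATGATACGGCGACCACCGAGATCTACAC",
    "i5": "NNNNNNNN",
    "Read 1": "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
}
RIGHT_PARTS = {
    "Read 2": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "i7": "NNNNNNNN",
    "P7": "ATCTCGTATGCCGTCTTCTGCTTG",
}


def added_length(parts):
    """Total number of bases contributed by a group of adapter parts."""
    return sum(len(seq) for seq in parts.values())


def expected_sizes(amplicon_length):
    """Return (size after first PCR, size after second PCR) in bp.

    amplicon_length covers the target plus both locus-specific primers.
    """
    first = amplicon_length + len(LEFT_PARTS["Read 1"]) + len(RIGHT_PARTS["Read 2"])
    second = amplicon_length + added_length(LEFT_PARTS) + added_length(RIGHT_PARTS)
    return first, second


# 254 bp is a placeholder amplicon length, not a measured value
print(expected_sizes(254))  # (321, 390)
</code>
</pre>
<p><span style="font-family:verdana; font-size:1.1em;">With these sequences the first PCR adds 67 bp and the complete adapters add 136 bp in total.</span></p>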
<p><span style="font-family:verdana; font-size:1.1em;">Once the adapters have been added, the libraries are ready to be sequenced.</span></p>
<h1>Sequencing:</h1>
<p><span style="font-family:verdana; font-size:1.1em;">The steps below are performed automatically by the sequencer and its chemistry and do not need to be carried out by the operator. From here on, the target DNA together with the forward and reverse primers will be referred to as the "insert".</span></p>
<p><span style="font-family:verdana; font-size:1.1em;">The sequencing primers in the reagents provided by Illumina are actually a mixture of different primers, including TruSeq, Nextera and even primers from obsolete kits. This means that different types of libraries can be sequenced together.</span></p>
<h3>(Step 1) Add Read 1 sequencing primer mixture to sequence the first read (bottom strand as template):</h3>
<pre>
<seq>
TruSeq Dual Index Library:
5'- <s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>---->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5>-insert-<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>
<pre>
<seq>
Nextera Dual Index Library:
5'- <s5>TCGTCGGCAGCGTC</s5><me>AGATGTGTATAAGAGACAG</me>------>
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me>-insert-<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>
<h3>(Step 2) Add Index 1 sequencing primer mixture to sequence the first index (index 1, i7, bottom strand as template):</h3>
<pre>
<seq>
TruSeq Dual Index Library:
5'- <s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7>------->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5>-insert-<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>
<pre>
<seq>
Nextera Dual Index Library:
5'- <me>CTGTCTCTTATACACATCT</me><s7>CCGAGCCCACGAGAC</s7>------->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me>-insert-<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>
<h3>(Step 3 of MiSeq, HiSeq2000/2500 and NovaSeq 6000) Fold over and sequence the second index (index 2, i5, bottom strand as template):</h3>
<pre>
<seq>
TruSeq Dual Index Library:
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5>------->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5>-insert-<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>
<pre>
<seq>
Nextera Dual Index Library:
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5>------->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me>-insert-<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>
<h3>(Step 3 of iSeq 100, MiniSeq, NextSeq, HiSeq X and HiSeq 3000/4000) Add Index 2 sequencing primer mixture to sequence the second index (index 2, i5, top strand as template):</h3>
<pre>
<seq>
TruSeq Dual Index Library:
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>-insert-<s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
<-------<s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5> -5'
</seq>
</pre>
<pre>
<seq>
Nextera Dual Index Library:
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>TCGTCGGCAGCGTC</s5><me>AGATGTGTATAAGAGACAG</me>-insert-<me>CTGTCTCTTATACACATCT</me><s7>CCGAGCCCACGAGAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
<-------<s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me> -5'
</seq>
</pre>
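<p><span style="font-family:verdana; font-size:1.1em;">A practical consequence of the two versions of Step 3 is that the i5 index is read in opposite orientations: with the bottom strand as template the read matches the i5 sequence as written in the top strand, while with the top strand as template the read is its reverse complement. The Python sketch below illustrates this. The instrument grouping is copied from the two headings above and should be treated as an assumption for any particular run, since the workflow can also depend on the reagent kit. The function names are made up for this example.</span></p>
<pre>
<code class="language-python">
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Grouping copied from the two Step 3 headings above (an assumption for any given run)
FORWARD_STRAND = {"MiSeq", "HiSeq 2000/2500", "NovaSeq 6000"}
REVERSE_COMPLEMENT = {"iSeq 100", "MiniSeq", "NextSeq", "HiSeq X", "HiSeq 3000/4000"}


def reverse_complement(seq):
    """Complement each base and reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def i5_as_sequenced(i5, instrument):
    """Return the Index 2 read expected for an i5 index written as in the top strand."""
    if instrument in FORWARD_STRAND:
        return i5
    if instrument in REVERSE_COMPLEMENT:
        return reverse_complement(i5)
    raise ValueError("unknown instrument: " + instrument)


print(i5_as_sequenced("CTATGTTA", "MiSeq"))    # CTATGTTA
print(i5_as_sequenced("CTATGTTA", "NextSeq"))  # TAACATAG
</code>
</pre>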
<h3>(Step 4) Cluster regeneration, add Read 2 sequencing primer mixture to sequence the second read (top strand as template):</h3>
<pre>
<seq>
TruSeq Dual Index Library:
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>-insert-<s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
<------<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7> -5'
</seq>
</pre>
<pre>
<seq>
Nextera Dual Index Library:
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>TCGTCGGCAGCGTC</s5><me>AGATGTGTATAAGAGACAG</me>-insert-<me>CTGTCTCTTATACACATCT</me><s7>CCGAGCCCACGAGAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
<------<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7> -5'
</seq>
</pre>
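<p><span style="font-family:verdana; font-size:1.1em;">Taken together, Steps 1 and 4 mean that Read 1 starts at the left end of the insert and matches the top strand, while Read 2 starts at the right end and is the reverse complement of the top strand. A minimal Python sketch of this relationship is shown below. It assumes the insert is at least as long as the read length, so neither read runs into the adapter on the far side. The 16 bp insert and the function names are invented for the example.</span></p>
<pre>
<code class="language-python">
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement each base and reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(insert, read_length):
    """Return (Read 1, Read 2) for an insert given as the top strand, 5' to 3'.

    Assumes the insert is at least read_length bases long, so neither read
    runs into the adapter on the far side.
    """
    read1 = insert[:read_length]
    read2 = reverse_complement(insert)[:read_length]
    return read1, read2


# Made-up 16 bp insert, read with 6 cycles per read
read1, read2 = paired_reads("ATGCCGTAAGGCTTAC", 6)
print(read1)  # ATGCCG
print(read2)  # GTAAGC
</code>
</pre>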
</body>
</html>