<!DOCTYPE html>
<html>

<style>
seq {font-family:Monaco, "Courier New";
  font-size:0.9em;
  text-indent: 2.7em;
}

align.small {
    font-family:Monaco, "Courier New";
    font-size:0.9em;
    line-height:65%;
}

align.long {
  font-family:Monaco, "Courier New";
  font-size:0.75em;
  line-height:70%;
}

s5 {color:#6baed6;}
s7 {color:#fc9272;}
p5 {color:#08519c;}
p7 {color:#a50f15;}
me {color:#969696;}
t7 {color:blue;}
w1 {color:#f03b20;}
pri {color:#75c37c;}
i71 {color:#a6cee3;}
i72 {color:#1f78b4;}
i73 {color:#b2df8a;}
i74 {color:#33a02c;}
i75 {color:#fb9a99;}
i51 {color:#fdbf6f;}
i52 {color:#ff7f00;}
i53 {color:#cab2d6;}
i54 {color:#6a3d9a;}
i55 {color:#b15928;}
h3 {font-family:verdana;}

</style>

<head>
<title>Illumina sequencing</title>
</head>
<body>

<h1>Illumina sequencing libraries</h1>

<p><span style="font-family:verdana; font-size:1.1em;">Illumina sequencing by synthesis requires special oligonucleotide adapters to be attached to both ends of the purified target DNA before sequencing can begin. These adapters consist of three main components: (1) the P5 and P7 sequences, which allow the library to bind to the flow cell and generate clusters; (2) the i5 and i7 index sequences (barcodes), which uniquely label the molecules from each sample so that multiple samples can be pooled (multiplexed) in a single sequencing run or flow cell lane; and (3) the binding sites for the Read 1 and Read 2 sequencing primers, which initiate the sequencing reaction itself. A variety of Illumina and third-party adapter designs can be used for Illumina sequencing, with the TruSeq and Nextera adapter systems being the most popular:</span></p>

<h3>TruSeq Dual Index Library:</h3>
<pre>
<seq>
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>-insert-<s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5>-insert-<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
          <p5>Illumina P5</p5>               <t7>i5</t7>            <s5>TruSeq Read 1</s5>                          <s7>TruSeq Read 2</s7>                 <t7>i7</t7>        <p7>Illumina P7</p7>
</seq>
</pre>

<h3>Nextera Dual Index Library:</h3>
<pre>
<seq>
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>TCGTCGGCAGCGTC</s5><me>AGATGTGTATAAGAGACAG</me>-insert-<me>CTGTCTCTTATACACATCT</me><s7>CCGAGCCCACGAGAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me>-insert-<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
           <p5>Illumina P5</p5>              <t7>i5</t7>             <s5>Next</s5><me>era Read 1</me>                                <me>Next</me><s7>era Read 2</s7>        <t7>i7</t7>         <p7>Illumina P7</p7>
</seq>
</pre>

<p><span style="font-family:verdana; font-size:1.1em;">The "insert" is the commonly used term for the target DNA that is to be sequenced. In metabarcoding libraries, the insert also includes the forward and reverse PCR primers used to amplify the target DNA.</span></p>

<p><span style="font-family:verdana; font-size:1.1em;">The "N"s in the above diagrams indicate the "indexes", or "barcodes", used to discriminate between samples. These are short 8-10 bp sequences (e.g. CTATGTTA) that identify each sample. The index on the right-hand side is the "i7 index", or "index 1", and the index on the left-hand side is the "i5 index", or "index 2".</span></p>
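
<p><span style="font-family:verdana; font-size:1.1em;">As a rough illustration of how the components in the TruSeq diagram fit together, the following Python sketch joins them in the order shown and derives the bottom strand by complementing each base. The adapter sequences are copied from the diagram; the function names, the 12 bp insert and the choice of example indexes are assumptions made purely for illustration.</span></p>
<pre>
```python
# Adapter components copied from the TruSeq diagram (top strand, 5' to 3').
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
P7 = "ATCTCGTATGCCGTCTTCTGCTTG"

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def complement(seq):
    """Base-by-base complement, kept in the same left-to-right order."""
    return seq.translate(COMPLEMENT)


def truseq_top_strand(i5, insert, i7):
    """Join the components in the order shown in the diagram."""
    return P5 + i5 + READ1 + insert + READ2 + i7 + P7


# Hypothetical 12 bp insert and example 8 bp indexes (illustration only).
top = truseq_top_strand(i5="CTATGTTA", insert="ACGTTGCAACGT", i7="GATCAACA")
bottom = complement(top)  # aligned under the top strand, so it reads 3' to 5'

print("5'- " + top + " -3'")
print("3'- " + bottom + " -5'")
```
</pre>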

<p><span style="font-family:verdana; font-size:1.1em;">Most modern sequencing protocols use dual indexing rather than single indexing. Dual-indexed libraries can be either combinatorial, where only one index differs between samples while the other is shared:</span></p>
<pre>
<seq>
Sample 1 - <i51>-AATAACGT</i51>...<i71>AATCGTTA</i71>
Sample 2 - <i51>-AATAACGT</i51>...<i72>GTCTACAT</i72>
Sample 3 - <i51>-AATAACGT</i51>...<i73>CGCTGCTC</i73>
Sample 4 - <i51>-AATAACGT</i51>...<i74>GATCAACA</i74>
Sample 5 - <i51>-AATAACGT</i51>...<i75>CGAAGGAC</i75>
               <t7>i5</t7>        <t7>i7</t7>
</seq>
</pre>

<p><span style="font-family:verdana; font-size:1.1em;">Or unique (Unique Dual Indexing), where both the i5 and i7 indexes are unique to each sample:</span></p>
<pre>
<seq>
Sample 1 - <i51>-AATAACGT</i51>...<i71>AATCGTTA</i71>
Sample 2 - <i52>-TTCTTGAA</i52>...<i72>GTCTACAT</i72>
Sample 3 - <i53>-GGCAGATC</i53>...<i73>CGCTGCTC</i73>
Sample 4 - <i54>-CTATGTTA</i54>...<i74>GATCAACA</i74>
Sample 5 - <i55>-GTTGACGC</i55>...<i75>CGAAGGAC</i75>
               <t7>i5</t7>        <t7>i7</t7>
</seq>
</pre>

<p><span style="font-family:verdana; font-size:1.1em;">Illumina now encourages customers to use unique dual indexing (UDI) whenever possible to ensure the most accurate demultiplexing, and therefore reduce the risk of cross-contamination between samples (reads being assigned to the wrong sample).</span></p>
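
<p><span style="font-family:verdana; font-size:1.1em;">The difference between the two schemes can be checked programmatically. The sketch below is one possible way to classify a set of index pairs; the function name, the returned labels and the sample names are hypothetical, and the index sequences are example values only.</span></p>
<pre>
```python
from collections import Counter


def classify_indexing(samples):
    """Classify a dict of sample name -> (i5, i7) index pair."""
    pairs = list(samples.values())
    if not pairs:
        raise ValueError("no samples given")
    if len(set(pairs)) != len(pairs):
        return "ambiguous"  # two samples share the same i5/i7 pair
    i5_counts = Counter(i5 for i5, _ in pairs)
    i7_counts = Counter(i7 for _, i7 in pairs)
    if max(i5_counts.values()) == 1 and max(i7_counts.values()) == 1:
        return "unique dual"  # no index is reused on either side
    return "combinatorial"  # pairs are unique, but an index is reused


# Combinatorial: the i5 index is shared and only the i7 index differs.
combinatorial = {
    "Sample 1": ("AATAACGT", "AATCGTTA"),
    "Sample 2": ("AATAACGT", "GTCTACAT"),
    "Sample 3": ("AATAACGT", "CGCTGCTC"),
}

# Unique dual indexing: every i5 and every i7 is used exactly once.
unique_dual = {
    "Sample 1": ("AATAACGT", "AATCGTTA"),
    "Sample 2": ("TTCTTGAA", "GTCTACAT"),
    "Sample 3": ("GGCAGATC", "CGCTGCTC"),
}

print(classify_indexing(combinatorial))
print(classify_indexing(unique_dual))
```
</pre>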

<h1>Library preparation:</h1>

<p><span style="font-family:verdana; font-size:1.1em;">In our current metabarcoding protocol, we use the TruSeq adapter system and attach the adapters to the target molecule using two separate PCRs:</span></p>

<h3>First PCR:</h3>

<p><span style="font-family:verdana; font-size:1.1em;">The first PCR amplifies the target DNA and adds the binding site for the Illumina Read 1 sequencing primer to the left side of the insert, and the binding site for the Read 2 sequencing primer to the right side. To achieve this, we modify our locus-specific primers to include the TruSeq Read 1 and Read 2 adapter sequences as 5' tails. In the example below we use the fwhF2-fwhR2n primer set, which amplifies a short region of the mitochondrial COI barcode:</span></p>

<pre>
<seq>
Tailed F primer: 5'- <s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>-<pri>GGDACWGGWTGAACWGTWTAYCCHCC</pri> -3'
                                <s5>TruSeq Read 1</s5>                    <pri>Forward primer</pri>
Tailed R primer: 5'- <s7>GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT</s7>-<pri>GTRATWGCHCCDGCTARWACWGG</pri> -3'
                                <s7>TruSeq Read 2</s7>                    <pri>Reverse primer</pri>
</seq>
</pre>

<p><span style="font-family:verdana; font-size:1.1em;">Following amplification with these tailed primers, the molecules will look like this:</span></p>
<pre>
<seq>
5'- <s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5><pri>GGDACWGGWTGAACWGTWTAYCCHCC</pri>-Target-<pri>CCWGTWYTAGCHGGDGCWATYAC</pri><s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7> -3'
3'- <s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5><pri>CCHTGWCCWACTTGWCAWATRGGDGG</pri>-Target-<pri>GGWCAWRATCGDCCHCGWTARTG</pri><s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7> -5'
            <s5>TruSeq Read 1</s5>                <pri>Forward primer</pri>                          <pri>Reverse primer</pri>               <s7>TruSeq Read 2</s7>
</seq>
</pre>
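
<p><span style="font-family:verdana; font-size:1.1em;">The structure of the first PCR product can be reproduced with a few lines of Python. The sketch below is illustrative only: it builds the two tailed primers from the sequences given above and uses a reverse complement that understands IUPAC degenerate bases to write the top strand of the amplicon. The function names and the short placeholder target are assumptions, not part of the protocol.</span></p>
<pre>
```python
# IUPAC nucleotide codes and their complements (e.g. R = A/G pairs with Y = C/T).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return seq.translate(IUPAC_COMPLEMENT)[::-1]


# Tails and locus-specific primers, all written 5' to 3' as ordered.
READ1_TAIL = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2_TAIL = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
FWH_F2 = "GGDACWGGWTGAACWGTWTAYCCHCC"
FWH_R2N = "GTRATWGCHCCDGCTARWACWGG"

tailed_f = READ1_TAIL + FWH_F2
tailed_r = READ2_TAIL + FWH_R2N


def first_pcr_product(target):
    """Top strand (5' to 3') of the amplicon after the first PCR.

    `target` is the region between the two primer binding sites.
    """
    return tailed_f + target + reverse_complement(tailed_r)


# Hypothetical 8 bp placeholder standing in for the real COI target region.
print(first_pcr_product("ACGTACGT"))
```
</pre>

<p><span style="font-family:verdana; font-size:1.1em;">Because the reverse primer binds the opposite strand, it appears in the top strand as its reverse complement, followed by the reverse complement of its Read 2 tail.</span></p>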

<p><span style="font-family:verdana; font-size:1.1em;">Note that in some sequencing protocols, such as those used for whole-genome, metagenomic, or metatranscriptomic sequencing, the Read 1 and Read 2 adapters are attached to the molecules using alternatives to PCR such as tagmentation or ligation.</span></p>


<br>

<h3>Second PCR:</h3>
<p><span style="font-family:verdana; font-size:1.1em;">The second PCR uses the Read 1 and Read 2 sequences added in the first PCR as priming sites to add the P5 and P7 flow cell binding sequences, as well as the i5 and i7 indexes. This second set of primers, commonly referred to as indexing primers, is normally purchased in a kit or designed in-house. Either way, they are generally structured as follows:</span></p>

<pre>
<seq>
iTru_R1_5: 5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5> -3'
                        <p5>Illumina P5</p5>               <t7>i5</t7>            <s5>TruSeq Read 1</s5>
iTru_R2_5: 5'- <p7>CAAGCAGAAGACGGCATACGAGAT</p7><t7>NNNNNNNN</t7><s7>GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT</s7> -3'
                     <p7>Illumina P7</p7>          <t7>i7</t7>             <s7>TruSeq Read 2</s7>
</seq>
</pre>

<p><span style="font-family:verdana; font-size:1.1em;">Following amplification with the second set of primers (indexing primers), the molecules will look like this:</span></p>

<pre>
<seq>
5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5><pri>GGDACWGGWTGAACWGTWTAYCCHCC</pri>-Target-<pri>CCWGTWYTAGCHGGDGCWATYAC</pri><s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5><pri>CCHTGWCCWACTTGWCAWATRGGDGG</pri>-Target-<pri>GGWCAWRATCGDCCHCGWTARTG</pri><s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
          <p5>Illumina P5</p5>               <t7>i5</t7>            <s5>TruSeq Read 1</s5>               <pri>Forward primer</pri>                         <pri>Reverse primer</pri>             <s7>TruSeq Read 2</s7>                 <t7>i7</t7>        <p7>Illumina P7</p7>
</seq>
</pre>

<p><span style="font-family:verdana; font-size:1.1em;">Once the adapters are added, the libraries are ready to be sequenced.</span></p>
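
<p><span style="font-family:verdana; font-size:1.1em;">The second PCR can be sketched in the same way. In the hedged example below, both indexing primers are written 5' to 3' as the oligos that would be synthesised, so the i7 primer is the reverse complement of the Read 2 - i7 - P7 region of the top strand. The function names, the example index pair and the 8 bp placeholder amplicon are assumptions for illustration.</span></p>
<pre>
```python
# Adapter components as they appear in the top strand (5' to 3').
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
P7 = "ATCTCGTATGCCGTCTTCTGCTTG"

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def indexing_primers(i5, i7):
    """Return the (i5 primer, i7 primer) pair, each written 5' to 3'."""
    i5_primer = P5 + i5 + READ1
    # The i7 primer anneals to the 3' end of the top strand, so the oligo
    # itself is the reverse complement of the Read 2 - i7 - P7 region.
    i7_primer = reverse_complement(READ2 + i7 + P7)
    return i5_primer, i7_primer


def second_pcr_product(first_pcr_top, i5, i7):
    """Top strand after the second PCR, given the first-PCR top strand."""
    if not (first_pcr_top.startswith(READ1) and first_pcr_top.endswith(READ2)):
        raise ValueError("template lacks the Read 1 / Read 2 tails from the first PCR")
    return P5 + i5 + first_pcr_top + i7 + P7


# Hypothetical first-PCR product: an 8 bp placeholder instead of a real amplicon.
template = READ1 + "ACGTACGT" + READ2
i5_primer, i7_primer = indexing_primers("AATAACGT", "AATCGTTA")
library = second_pcr_product(template, "AATAACGT", "AATCGTTA")

print(i5_primer)
print(i7_primer)
print(library)
```
</pre>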

<h1>Sequencing:</h1>

<p><span style="font-family:verdana; font-size:1.1em;">The steps below are performed automatically by the sequencer and its sequencing chemistry, and do not need to be performed by the operator. From here on, the target DNA together with the forward and reverse primers will be referred to as the "insert".</span></p>

<p><span style="font-family:verdana; font-size:1.1em;">In the sequencing reagents provided by Illumina, each sequencing primer is actually a mixture of different primers, including TruSeq, Nextera and even primers from obsolete kits. Different types of libraries can therefore be sequenced together in the same run.</span></p>

<h3>(Step 1) Add Read 1 sequencing primer mixture to sequence the first read (bottom strand as template):</h3>

<pre>
<seq>
TruSeq Dual Index Library:

                                     5'- <s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>---->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5>-insert-<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>

<pre>
<seq>
Nextera Dual Index Library:

                                     5'- <s5>TCGTCGGCAGCGTC</s5><me>AGATGTGTATAAGAGACAG</me>------>
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me>-insert-<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>

<h3>(Step 2) Add Index 1 sequencing primer mixture to sequence the first index (index 1, i7, bottom strand as template):</h3>

<pre>
<seq>
TruSeq Dual Index Library:

                                                                              5'- <s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7>------->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5>-insert-<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>

<pre>
<seq>
Nextera Dual Index Library:

                                                                              5'- <me>CTGTCTCTTATACACATCT</me><s7>CCGAGCCCACGAGAC</s7>------->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me>-insert-<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>

<h3>(Step 3 of MiSeq, HiSeq 2000/2500 and NovaSeq 6000) The strand folds over and the second index is sequenced using the flow cell P5 oligo as the primer (index 2, i5, bottom strand as template):</h3>

<pre>
<seq>
TruSeq Dual Index Library:

5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5>------->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5>-insert-<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>

<pre>
<seq>
Nextera Dual Index Library:

5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5>------->
3'- <p5>TTACTATGCCGCTGGTGGCTCTAGATGTG</p5><t7>NNNNNNNN</t7><s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me>-insert-<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7><t7>NNNNNNNN</t7><p7>TAGAGCATACGGCAGAAGACGAAC</p7> -5'
</seq>
</pre>

<h3>(Step 3 of iSeq 100, MiniSeq, NextSeq, HiSeq X and HiSeq 3000/4000) Add Index 2 sequencing primer mixture to sequence the second index (index 2, i5, top strand as template):</h3>

<pre>
<seq>
TruSeq Dual Index Library:

5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>-insert-<s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
                                 &lt;-------<s5>TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA</s5> -5'
</seq>
</pre>

<pre>
<seq>
Nextera Dual Index Library:

5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>TCGTCGGCAGCGTC</s5><me>AGATGTGTATAAGAGACAG</me>-insert-<me>CTGTCTCTTATACACATCT</me><s7>CCGAGCCCACGAGAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
                                 &lt;-------<s5>AGCAGCCGTCGCAG</s5><me>TCTACACATATTCTCTGTC</me> -5'
</seq>
</pre>
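
<p><span style="font-family:verdana; font-size:1.1em;">A practical consequence of the two Step 3 workflows is that the i5 index is read in opposite orientations. The following sketch shows how the expected i5 read could be derived for a given instrument. The instrument groupings are taken from the step headings above and can also depend on the reagent kit version, so they should be checked against current Illumina documentation; the function and variable names are hypothetical.</span></p>
<pre>
```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Groupings as listed in the Step 3 headings on this page (an assumption
# that may not hold for every reagent kit version).
FORWARD_STRAND = {"MiSeq", "HiSeq 2000/2500", "NovaSeq 6000"}
REVERSE_COMPLEMENT = {"iSeq 100", "MiniSeq", "NextSeq", "HiSeq X", "HiSeq 3000/4000"}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def expected_i5_read(i5_top_strand, instrument):
    """Return the i5 sequence as it would appear in the Index 2 read.

    `i5_top_strand` is the i5 index as drawn in the top strand (5' to 3').
    """
    if instrument in FORWARD_STRAND:
        # Bottom strand is the template, so the read matches the top strand.
        return i5_top_strand
    if instrument in REVERSE_COMPLEMENT:
        # Top strand is the template, so the read is its reverse complement.
        return reverse_complement(i5_top_strand)
    raise ValueError("unknown instrument: " + instrument)


print(expected_i5_read("AATAACGT", "MiSeq"))
print(expected_i5_read("AATAACGT", "NextSeq"))
```
</pre>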

<h3>(Step 4) Cluster regeneration, add Read 2 sequencing primer mixture to sequence the second read (top strand as template):</h3>

<pre>
<seq>
TruSeq Dual Index Library:

5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>ACACTCTTTCCCTACACGACGCTCTTCCGATCT</s5>-insert-<s7>AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
                                                                           &lt;------<s7>TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG</s7> -5'
</seq>
</pre>

<pre>
<seq>
Nextera Dual Index Library:

5'- <p5>AATGATACGGCGACCACCGAGATCTACAC</p5><t7>NNNNNNNN</t7><s5>TCGTCGGCAGCGTC</s5><me>AGATGTGTATAAGAGACAG</me>-insert-<me>CTGTCTCTTATACACATCT</me><s7>CCGAGCCCACGAGAC</s7><t7>NNNNNNNN</t7><p7>ATCTCGTATGCCGTCTTCTGCTTG</p7> -3'
                                                                           &lt;------<me>GACAGAGAATATGTGTAGA</me><s7>GGCTCGGGTGCTCTG</s7> -5'
</seq>
</pre>
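
<p><span style="font-family:verdana; font-size:1.1em;">Putting the four steps together, the sketch below derives the sequences that each read would ideally return from a TruSeq library molecule, ignoring sequencing errors. It is a simplified model under stated assumptions: the function name, the 20 bp insert, the index pair and the cycle numbers are all hypothetical.</span></p>
<pre>
```python
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
P7 = "ATCTCGTATGCCGTCTTCTGCTTG"

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def simulate_reads(top, cycles, index_cycles=8, i5_reverse_complement=False):
    """Ideal reads from a TruSeq top strand (5' to 3'), without errors."""
    read1_start = top.index(READ1) + len(READ1)  # first base after Read 1 site
    read2_site = top.rindex(READ2)               # start of the Read 2 site
    index1_start = read2_site + len(READ2)       # i7 follows the Read 2 site
    index2_start = len(P5)                       # i5 follows P5
    i5 = top[index2_start:index2_start + index_cycles]
    return {
        # Steps 1 and 2 extend along the bottom strand, so they match the top strand.
        "read1": top[read1_start:read1_start + cycles],
        "index1": top[index1_start:index1_start + index_cycles],
        # Step 3 depends on the instrument workflow.
        "index2": reverse_complement(i5) if i5_reverse_complement else i5,
        # Step 4 extends along the top strand, away from the Read 2 site.
        "read2": reverse_complement(top[max(read2_site - cycles, 0):read2_site]),
    }


# Hypothetical 20 bp insert and example index pair.
insert = "ACGTTGCAAGGCTTAACCGG"
top = P5 + "AATAACGT" + READ1 + insert + READ2 + "AATCGTTA" + P7
reads = simulate_reads(top, cycles=10)
for name, seq in reads.items():
    print(name, seq)
```
</pre>

<p><span style="font-family:verdana; font-size:1.1em;">Note that Read 2 is returned as the reverse complement of the end of the insert, which is why paired reads have to be reverse complemented before they can be merged.</span></p>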

</body>
</html>