Deployed 31373d5 to 2024.2 with MkDocs 1.4.1 and mike 1.1.2
Geert van Geest committed Feb 23, 2024
1 parent dfbef34 commit 0d92964
Showing 9 changed files with 24 additions and 24 deletions.
Binary file modified 2024.2/.DS_Store
Binary file not shown.
Binary file modified 2024.2/assets/pdf/04_sequencing_alignment.pdf
Binary file not shown.
18 changes: 9 additions & 9 deletions 2024.2/day1/alignment/index.html
@@ -781,7 +781,7 @@ <h2 id="exercises">Exercises</h2>
<p>Use the &ldquo;Comments&rdquo; box at the bottom of each page 👇 for asking questions or giving feedback. It requires a <a href="https://github.com/">github account</a>.</p>
</div>
<h3 id="1-download-data-and-prepare-the-reference-genome">1. Download data and prepare the reference genome</h3>
<p>Let&rsquo;s start with the first script of our &lsquo;pipeline&rsquo;. We will use it to download and unpack the course data. Use the code snippet below to create a script called <code>A01_download_course_data.sh</code>. Store it in <code>~/workdir/scripts/A-prepare_references/</code>, and run it.</p>
<p>Let&rsquo;s start with the first script of our &lsquo;pipeline&rsquo;. We will use it to download and unpack the course data. Use the code snippet below to create a script called <code>A01_download_course_data.sh</code>. Store it in <code>~/project/scripts/A-prepare_references/</code>, and run it.</p>
<div class="highlight"><span class="filename">A01_download_course_data.sh</span><pre><span></span><code><span class="ch">#!/usr/bin/env bash</span>
<span class="nb">cd</span><span class="w"> </span>~/project

@@ -832,7 +832,7 @@ <h3 id="1-download-data-and-prepare-the-reference-genome">1. Download data and p
<p>The software <code>bwa</code> is in this environment. We will use it for the alignment. Like all alignment software, it requires an index of the reference genome. You can make an index like this:</p>
<div class="highlight"><pre><span></span><code>bwa<span class="w"> </span>index<span class="w"> </span>&lt;reference.fa&gt;
</code></pre></div>
<p>Make an index of the reference sequence of chromosome 20 of the human genome. You can find the fasta file in <code>~/workdir/data/reference/Homo_sapiens.GRCh38.dna.chromosome.20.fa</code>. Do it with a script called <code>A02_create_bwa_index.sh</code>. Also store this in the directory <code>A-prepare_references</code>.</p>
<p>Make an index of the reference sequence of chromosome 20 of the human genome. You can find the fasta file in <code>~/project/data/reference/Homo_sapiens.GRCh38.dna.chromosome.20.fa</code>. Do it with a script called <code>A02_create_bwa_index.sh</code>. Also store this in the directory <code>A-prepare_references</code>.</p>
<details class="done">
<summary>Answer</summary>
<div class="highlight"><span class="filename">A02_create_bwa_index.sh</span><pre><span></span><code><span class="ch">#!/usr/bin/env bash</span>
@@ -842,7 +842,7 @@ <h3 id="1-download-data-and-prepare-the-reference-genome">1. Download data and p
</code></pre></div>
</details>
<h3 id="2-read-alignment">2. Read alignment</h3>
<p>Check out the <a href="http://bio-bwa.sourceforge.net/bwa.shtml">synopsis and manual of <code>bwa mem</code></a>. We&rsquo;ll be using paired-end reads of three samples that can be found at <code>~/workdir/data/fastq</code>. If we run <code>bwa mem</code> with default options, which three arguments do we need?</p>
<p>Check out the <a href="http://bio-bwa.sourceforge.net/bwa.shtml">synopsis and manual of <code>bwa mem</code></a>. We&rsquo;ll be using paired-end reads of three samples that can be found at <code>~/project/data/fastq</code>. If we run <code>bwa mem</code> with default options, which three arguments do we need?</p>
<details class="done">
<summary>Answer</summary>
<p>The manual says:
@@ -864,8 +864,8 @@ <h3 id="2-read-alignment">2. Read alignment</h3>
&gt;<span class="w"> </span>&lt;alignment.sam&gt;
</code></pre></div>
</details>
<p>We will now go through all the steps concerning alignment for the sample <code>mother</code>. To store the results of these steps, we will create a directory within <code>~/workdir</code> called <code>results</code>. For the alignment, make a script called <code>B01_alignment.sh</code>. Since we will perform a similar analysis later on for all samples, we store this script in<code>~/workdir/scripts/B-mother_only</code>. </p>
<p>Your directory <code>~/workdir/scripts</code> should now like this:</p>
<p>We will now go through all the steps concerning alignment for the sample <code>mother</code>. To store the results of these steps, we will create a directory within <code>~/project</code> called <code>results</code>. For the alignment, make a script called <code>B01_alignment.sh</code>. Since we will perform a similar analysis later on for all samples, we store this script in<code>~/project/scripts/B-mother_only</code>. </p>
<p>Your directory <code>~/project/scripts</code> should now like this:</p>
<div class="highlight"><pre><span></span><code>scripts
├── A-prepare_references
│ ├── A01_download_course_data.sh
@@ -874,7 +874,7 @@ <h3 id="2-read-alignment">2. Read alignment</h3>
│ └── B01_alignment.sh
└── C-all_samples
</code></pre></div>
<p>In <code>B01_alignment.sh</code> write the commands to perform an alignment with <code>bwa mem</code> of the reads from the mother (<code>mother_R1.fastq</code> and <code>mother_R2.fastq</code>) against chromosome 20. Write the resulting <code>.sam</code> file to a directory in <code>~/workdir/results</code> called <code>alignments</code>.</p>
<p>In <code>B01_alignment.sh</code> write the commands to perform an alignment with <code>bwa mem</code> of the reads from the mother (<code>mother_R1.fastq</code> and <code>mother_R2.fastq</code>) against chromosome 20. Write the resulting <code>.sam</code> file to a directory in <code>~/project/results</code> called <code>alignments</code>.</p>
<div class="admonition note">
<p class="admonition-title">Index prefix is the same a reference filename</p>
<p>With default values, the name of the index of a reference for <code>bwa mem</code> is the same as the name of the reference itself. In this case, this would be <code>Homo_sapiens.GRCh38.dna.chromosome.20.fa</code>.</p>
@@ -894,7 +894,7 @@ <h3 id="2-read-alignment">2. Read alignment</h3>
</code></pre></div>
</details>
<h3 id="3-alignment-statistics">3. Alignment statistics</h3>
<p><strong>Exercise:</strong> Check out the statistics of the alignment by using <code>samtools flagstat</code>. Write the output of samtools flagstat to a file called <code>mother.sam.flagstat</code>. Do this by creating a script called <code>B02_get_alignment_statistics.sh</code>, and add this script to <code>~/workdir/scripts/B-mother_only</code>. Find the documentation of <code>samtools flagstat</code> <a href="http://www.htslib.org/doc/samtools-flagstat.html">here</a>. Any duplicates in there?</p>
<p><strong>Exercise:</strong> Check out the statistics of the alignment by using <code>samtools flagstat</code>. Write the output of samtools flagstat to a file called <code>mother.sam.flagstat</code>. Do this by creating a script called <code>B02_get_alignment_statistics.sh</code>, and add this script to <code>~/project/scripts/B-mother_only</code>. Find the documentation of <code>samtools flagstat</code> <a href="http://www.htslib.org/doc/samtools-flagstat.html">here</a>. Any duplicates in there?</p>
<details class="done">
<summary>Answer</summary>
<div class="highlight"><span class="filename">B02_get_alignment_statistics.sh</span><pre><span></span><code><span class="ch">#!/usr/bin/env bash</span>
@@ -922,7 +922,7 @@ <h3 id="3-alignment-statistics">3. Alignment statistics</h3>
</details>
<h3 id="4-sorting-and-compression">4. Sorting and compression</h3>
<p>Many downstream analyses require a coordinate sorted alignment file. Now, your alignment file is in the same order as the fastq file. You can coordinate sort an alignment file with <code>samtools sort</code>. You can find the documentation <a href="http://www.htslib.org/doc/samtools-sort.html">here</a>. </p>
<p><strong>Exercise</strong>: Sort the alignment file according to coordinate. In order to do this, create a script called <code>B03_sort_alignment.sh</code> (in <code>~/workdir/scripts/B-mother_only</code>). </p>
<p><strong>Exercise</strong>: Sort the alignment file according to coordinate. In order to do this, create a script called <code>B03_sort_alignment.sh</code> (in <code>~/project/scripts/B-mother_only</code>). </p>
<details class="done">
<summary>Answer</summary>
<div class="highlight"><span class="filename">B03_sort_alignment.sh</span><pre><span></span><code><span class="ch">#!/usr/bin/env bash</span>
@@ -937,7 +937,7 @@ <h3 id="4-sorting-and-compression">4. Sorting and compression</h3>
<p>Like <code>bwa mem</code>, <code>samtools sort</code> and <code>samtools view</code> can write its output to stdout. This means that you need to redirect your output to a file with <code>&gt;</code> or use the the output option <code>-o</code>.</p>
</div>
<p>The command <code>samtools view</code> is very versatile. It takes an alignment file and writes a filtered or processed alignment to the output. You can for example use it to compress your SAM file into a BAM file. Let&rsquo;s start with that.</p>
<p><strong>Exercise</strong>: compress our SAM file into a BAM file and include the header in the output. For this, use the <code>-b</code> and <code>-h</code> options. Perform the calculation from a script called <code>B04_compress_alignment.sh</code> (in <code>~/workdir/scripts/B-mother_only</code>). Find the required documentation <a href="http://www.htslib.org/doc/samtools-view.html">here</a>. How much was the disk space reduced by compressing the file?</p>
<p><strong>Exercise</strong>: compress our SAM file into a BAM file and include the header in the output. For this, use the <code>-b</code> and <code>-h</code> options. Perform the calculation from a script called <code>B04_compress_alignment.sh</code> (in <code>~/project/scripts/B-mother_only</code>). Find the required documentation <a href="http://www.htslib.org/doc/samtools-view.html">here</a>. How much was the disk space reduced by compressing the file?</p>
<details class="done">
<summary>Answer</summary>
<p><div class="highlight"><span class="filename">B04_compress_alignment.sh</span><pre><span></span><code><span class="ch">#!/usr/bin/env bash</span>
14 changes: 7 additions & 7 deletions 2024.2/day1/alignment_advanced/index.html
@@ -810,7 +810,7 @@ <h3 id="1-adding-readgroups">1. Adding readgroups</h3>
</code></pre></div>
<p>Here, e.g. the <code>PU</code> field would be <code>FC706VJ.2.ATCACG</code></p>
</div>
<p><strong>Exercise:</strong> Have a look at the <a href="https://gatk.broadinstitute.org/hc/en-us/articles/360037226472-AddOrReplaceReadGroups-Picard-">documentation</a> of <code>AddOrReplaceReadGroups</code>. Specify the required arguments, and run the command. Do this from a script called <code>B05_add_readgroups.sh</code> (in <code>~/workdir/scripts/B-mother_only</code>).</p>
<p><strong>Exercise:</strong> Have a look at the <a href="https://gatk.broadinstitute.org/hc/en-us/articles/360037226472-AddOrReplaceReadGroups-Picard-">documentation</a> of <code>AddOrReplaceReadGroups</code>. Specify the required arguments, and run the command. Do this from a script called <code>B05_add_readgroups.sh</code> (in <code>~/project/scripts/B-mother_only</code>).</p>
<details class="done">
<summary>Answer</summary>
<p>We can use the answers of the previous exercise, and use them in the command:</p>
@@ -847,7 +847,7 @@ <h3 id="1-adding-readgroups">1. Adding readgroups</h3>
</details>
<h3 id="2-mark-duplicates">2. Mark duplicates</h3>
<p>Now that we have specified read groups, we can mark the duplicates with <code>gatk MarkDuplicates</code>. </p>
<p><strong>Exercise:</strong> Have a look at the <a href="https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-">documentation</a>, and run <code>gatk MarkDuplicates</code> with the three required arguments. Do this from a script called <code>B06_mark_duplicates.sh</code> (in <code>~/workdir/scripts/B-mother_only</code>). </p>
<p><strong>Exercise:</strong> Have a look at the <a href="https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-">documentation</a>, and run <code>gatk MarkDuplicates</code> with the three required arguments. Do this from a script called <code>B06_mark_duplicates.sh</code> (in <code>~/project/scripts/B-mother_only</code>). </p>
<details class="done">
<summary>Answer</summary>
<div class="highlight"><span class="filename">B06_mark_duplicates.sh</span><pre><span></span><code><span class="ch">#!/usr/bin/env bash</span>
@@ -860,7 +860,7 @@ <h3 id="2-mark-duplicates">2. Mark duplicates</h3>
--METRICS_FILE<span class="w"> </span>alignments/marked_dup_metrics_mother.txt<span class="w"> </span>
</code></pre></div>
</details>
<p><strong>Exercise:</strong> Run <code>samtools flagstat</code> on the alignment file with marked duplicates, and write the output to a file called <code>mother.rg.md.bam.flagstat</code>. Create a script called <code>B07_get_alignment_stats_after_md.sh</code> (in <code>~/workdir/scripts/B-mother_only</code>). How many reads were marked as duplicate?</p>
<p><strong>Exercise:</strong> Run <code>samtools flagstat</code> on the alignment file with marked duplicates, and write the output to a file called <code>mother.rg.md.bam.flagstat</code>. Create a script called <code>B07_get_alignment_stats_after_md.sh</code> (in <code>~/project/scripts/B-mother_only</code>). How many reads were marked as duplicate?</p>
<details class="done">
<summary>Answer</summary>
<div class="highlight"><span class="filename">B07_get_alignment_stats_after_md.sh</span><pre><span></span><code><span class="ch">#!/usr/bin/env bash</span>
@@ -891,7 +891,7 @@ <h3 id="3-indexing">3. Indexing</h3>
<p>To look up specific alignments, it is convenient to have your alignment file indexed. An indexing can be compared to a kind of &lsquo;phonebook&rsquo; of your sequence alignment file. Indexing can be done with <code>samtools</code> as well, but it first needs to be sorted on coordinate (i.e. the alignment location). You can do it like this:</p>
<div class="highlight"><pre><span></span><code>samtools<span class="w"> </span>index<span class="w"> </span>&lt;bam<span class="w"> </span>file&gt;
</code></pre></div>
<p><strong>Exercise</strong>: Create a script called <code>B08_index_alignment.sh</code> (in <code>~/workdir/scripts/B-mother_only</code>) to perform the alignment.</p>
<p><strong>Exercise</strong>: Create a script called <code>B08_index_alignment.sh</code> (in <code>~/project/scripts/B-mother_only</code>) to perform the alignment.</p>
<details class="done">
<summary>Answer</summary>
<div class="highlight"><span class="filename">B08_index_alignment.sh</span><pre><span></span><code><span class="ch">#!/usr/bin/env bash</span>
@@ -934,7 +934,7 @@ <h3 id="5-apply-it-on-all-three-samples-with-pipes-and-loops">5. Apply it on all
<span class="p">|</span><span class="w"> </span>samtools<span class="w"> </span>sort<span class="w"> </span><span class="se">\</span>
<span class="p">|</span><span class="w"> </span>samtools<span class="w"> </span>view<span class="w"> </span>-bh<span class="w"> </span>&gt;<span class="w"> </span>results/alignments/<span class="s2">&quot;</span><span class="nv">$SAMPLE</span><span class="s2">&quot;</span>.bam
</code></pre></div>
<p><strong>Exercise</strong>: Make a directory in the scripts directory <code>C-all_samples</code> (so <code>~/workdir/scripts/C-all_samples</code>). In here, create a script called <code>C01_alignment_sorting_compression.sh</code>. Within that script use the above snippet to make a loop that performs the alignment, sorting and compression for all three samples (i.e. <code>mother</code>, <code>father</code> and <code>son</code>).</p>
<p><strong>Exercise</strong>: Make a directory in the scripts directory <code>C-all_samples</code> (so <code>~/project/scripts/C-all_samples</code>). In here, create a script called <code>C01_alignment_sorting_compression.sh</code>. Within that script use the above snippet to make a loop that performs the alignment, sorting and compression for all three samples (i.e. <code>mother</code>, <code>father</code> and <code>son</code>).</p>
<details class="done">
<summary>Answer</summary>
<p>Your <code>scripts</code> directory should look like:</p>
@@ -970,7 +970,7 @@ <h3 id="5-apply-it-on-all-three-samples-with-pipes-and-loops">5. Apply it on all
</code></pre></div>
</details>
<p>Now we continue with adding the readgroups. For each sample, we have to add specific information to the different readgroup fields. We can do that by looping over a tab delimited file with sample-specific information in each row. Let&rsquo;s create that tab-delimited file. </p>
<p><strong>Exercise</strong> Generate a tab-delimited file called <code>sample_rg_fields.txt</code> and store it in <code>~/workdir/results/</code>. In this file, each line should represent a sample (mother, father and son), and you specify the <code>SM</code>, <code>LB</code>, <code>PU</code> and <code>ID</code> fields. E.g., the first line (for &lsquo;mother&rsquo;) would look like:</p>
<p><strong>Exercise</strong> Generate a tab-delimited file called <code>sample_rg_fields.txt</code> and store it in <code>~/project/results/</code>. In this file, each line should represent a sample (mother, father and son), and you specify the <code>SM</code>, <code>LB</code>, <code>PU</code> and <code>ID</code> fields. E.g., the first line (for &lsquo;mother&rsquo;) would look like:</p>
<div class="highlight"><pre><span></span><code>mother lib1 H0164.2.ALXX140820 H0164.2
</code></pre></div>
<div class="admonition warning">
@@ -985,7 +985,7 @@ <h3 id="5-apply-it-on-all-three-samples-with-pipes-and-loops">5. Apply it on all
son lib3 H0164.6.ALXX140820 H0164.6
</code></pre></div>
</details>
<p><strong>Exercise</strong> Generate a script called <code>C02_add_readgroups.sh</code> (in <code>~/workdir/scripts/C-all_samples</code>) to loop over the tab-delimited file (have a look at the last exercise in <a href="../server_login#loops">Setup</a>), and add the correct readgroups to the bam file of each sample with <code>gatk AddOrReplaceReadGroups</code>. </p>
<p><strong>Exercise</strong> Generate a script called <code>C02_add_readgroups.sh</code> (in <code>~/project/scripts/C-all_samples</code>) to loop over the tab-delimited file (have a look at the last exercise in <a href="../server_login#loops">Setup</a>), and add the correct readgroups to the bam file of each sample with <code>gatk AddOrReplaceReadGroups</code>. </p>
<details class="hint">
<summary>Hint</summary>
<p>Try to just print the variables from a loop in order to check to see whether the loop performs according to your expectation. E.g.:</p>
4 changes: 2 additions & 2 deletions 2024.2/day1/reproducibility/index.html
@@ -729,7 +729,7 @@ <h2 id="exercises">Exercises</h2>
<li>Alignment and variant calling on one sample (&lsquo;mother&rsquo;)</li>
<li>Alignment, variant calling and filtering on all samples</li>
</ul>
<p>We store the scripts required for these subprojects in different subdirectories of <code>~/workdir/scripts</code> named:</p>
<p>We store the scripts required for these subprojects in different subdirectories of <code>~/project/scripts</code> named:</p>
<ul>
<li><code>A-prepare_references</code></li>
<li><code>B-mother_only</code></li>
@@ -743,7 +743,7 @@ <h2 id="exercises">Exercises</h2>
B-mother_only<span class="w"> </span><span class="se">\</span>
C-all_samples
</code></pre></div>
<p>By the end of day 2 <code>~/workdir/scripts</code> should look (something) like this:</p>
<p>By the end of day 2 <code>~/project/scripts</code> should look (something) like this:</p>
<div class="highlight"><pre><span></span><code>scripts
├── A_prepare_references
│ ├── A01_download_course_data.sh