Skip to content

Commit

Permalink
update alignment QC
Browse files Browse the repository at this point in the history
  • Loading branch information
malachig committed Nov 16, 2024
1 parent 6aeaf31 commit 0f46b51
Showing 1 changed file with 8 additions and 5 deletions.
13 changes: 8 additions & 5 deletions _posts/0002-06-01-Alignment_QC.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,9 @@ Try filtering the BAM file to require or exclude certain flags. This can be done

* [http://broadinstitute.github.io/picard/explain-flags.html](http://broadinstitute.github.io/picard/explain-flags.html)

Try requiring that alignments are 'paired' and 'mapped in a proper pair' (=3). Also filter out alignments that are 'unmapped', the 'mate is unmapped', and 'not primary alignment' (=268)
Try requiring that alignments are 'paired' and 'mapped in a proper pair' (=3).

Also filter out alignments that are 'unmapped', the 'mate is unmapped', and 'not primary alignment' (=268)

```bash
samtools view -f 3 -F 268 UHR.bam | head | column -t | less -S
Expand Down Expand Up @@ -89,7 +91,7 @@ mv *fastqc.zip fastqc/
### Using Picard
You can use Picard to generate RNA-seq specific quality metrics and figures

In this section we need to create some additional formats of our reference files.
In this section we need to create some additional formats of our reference transcriptome files.

Picard uses a "sequence dictionary" file for many commands (simply a list of reference sequences and their sizes)

Expand All @@ -105,14 +107,15 @@ cd $RNA_HOME/refs
# Create a .dict file for our reference
java -jar $PICARD CreateSequenceDictionary -R chr22_with_ERCC92.fa -O chr22_with_ERCC92.dict

# Create a bed file of the location of ribosomal sequences in our reference (first extract from the gtf then convert to bed)
# Create a bed file of the location of ribosomal sequences in our reference (first extract them from the GTF then convert to BED format)
# Note that here we pull all the "rrna" transcripts from the GTF. This is a good strategy for the whole transcriptome ...
# ... but on chr22 there is very little "rrna" content, leading to 0 coverage for all samples, so we are also adding a single protein coding ribosomal gene "RRP7A" (normally we would not do this)
# Note that for the convert2bed command we will treat out GTF file as a GFF file to retain exon level features when converting it to BED

# Note the convert2bed command will convert our GTF to BED format
# "<" is used to feed the GTF file into the tool. ">2/dev/null" is used to throw away a harmless warning. "1>" is use to save our result to a file

grep --color=none -i -P "rrna|rrp7a" chr22_with_ERCC92.gtf > ref_ribosome.gtf
convert2bed --input=gff --output=bed < ref_ribosome.gtf 2>/dev/null 1>ref_ribosome.bed
convert2bed --input=gtf --output=bed < ref_ribosome.gtf 2>/dev/null 1>ref_ribosome.bed

# Create interval list file for the location of just the ribosomal sequences in our reference
java -jar $PICARD BedToIntervalList -I ref_ribosome.bed -O ref_ribosome.interval_list -SD chr22_with_ERCC92.dict
Expand Down

0 comments on commit 0f46b51

Please sign in to comment.