diff --git a/doc/body.rst b/doc/body.rst
index e68770b..27dcfa0 100644
--- a/doc/body.rst
+++ b/doc/body.rst
@@ -259,8 +259,7 @@ is given below:
 - In the dataset section select **Ensembl genes 84** and choose **Homo sapiens genes**.
 - In the filter section and **Gene** subsection, input your gene IDs to **external references ID** textbox and select appropriate nomenclature (e.g. **HGNC symbol** when using gene symbols).
 - In attributes section select **Sequences** and untick all header information features.
-- Under the **SEQUENCES** menu select **Exon sequences** and input the 5' and 3' flank base count, e.g. 50 (it should be the same for both 5'
- and 3' flank). This is to ensure that reads produced by exome capture techniques are fully mapped.
+- Under the **SEQUENCES** menu select **Exon sequences** and input the 5' and 3' flank base count, e.g. 50 (it should be the same for both 5' and 3' flank). This is to ensure that reads produced by exome capture techniques are fully mapped.
 - Select the following features (order matters here): **Associated Gene Name**, **Ensembl Exon ID**, **Chromosome Name**, **Exon Chr Start (bp)**, **Exon Chr End (bp)** and **Strand**.
 - Go to results, select **unique results only** and download the resulting FASTA file (save it as ``biomart_refs.fa``).
 - Download and run the following `script `__ (requires `Groovy `__ to be installed): ``groovy ExtractBedFormBiomartRefs.groovy biomart_refs.fa refs.fa refs.bed 50``. The last argument specifies the flank size.
diff --git a/doc/index.rst b/doc/index.rst
index ace2f04..dfd4cb3 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -18,13 +18,13 @@ be browsed and post-processed by the majority of conventional bioinformatics sof
 Terminology
 -----------
 
-- UMI - unique molecular identifier, a short (typically 6-20nt) degenerate nucleotide sequence, that is attach to cDNA/DNA molecules in order to trace them throughout the entire experiment.
+- UMI - unique molecular identifier, a short (4-20bp) degenerate nucleotide sequence, that is attach to cDNA/DNA molecules in order to trace them throughout the entire experiment.
 - Sample barcode - a short specific nucleotide sequence used to mark cDNA/DNA molecules that correspond to a given sample in a pooled sequencing library
 - MIG - molecular identifier group, a set of reads or read pairs that have an identical UMI sequence
-- MIG consensus - the consensus sequence of MIG, calculated from MIG position-weight matrix by taking the letter with highest frequency at each position
-- CQS - consensus quality score, calculated as the maximal relative nucleotide frequency at a given PWM position. Usually scaled to [2, 40] range to fit Phred33 string representation. Indicates our confidence in the consensus sequence at a given position.
-- Major variant (aka dominant variant, supermutant) - a sequence variant that is present in MIG consensus sequence
-- Minor variant - a sequence variant that is present in reads within an MIG, but doesn’t get to the final MIG consensus sequence
+- MIG consensus - the consensus sequence of MIG, that is, the consensus of multiple alignment of all reads in a given MIG
+- CQS - consensus quality score, calculated as the fraction of reads matching the consensus sequence at a given position. Can be scaled to ``[2, 40]`` range to fit Phred33 quality representation.
+- Major variant (aka dominant variant, supermutant) - a sequence variant that is present in MIG consensus, but doesn't match the reference sequence
+- Minor variant - a sequence variant that differs from the consensus sequence found in one or more reads within a given MIG
 
 Table of contents
 -----------------
diff --git a/pom.xml b/pom.xml
index 594f043..d2e6d88 100644
--- a/pom.xml
+++ b/pom.xml
@@ -4,7 +4,7 @@
 com.antigenomics
 mageri
- 1.0.1-SNAPSHOT
+ 1.1.0
 jar
 MAGERI