Commit

update readme
skanwal committed Jun 19, 2024
1 parent 34a1058 commit e0399c8
Showing 1 changed file with 36 additions and 50 deletions.
86 changes: 36 additions & 50 deletions README.Rmd
@@ -63,8 +63,7 @@ below. For more details, see [workflow.md](/workflow.md).

<img src="man/figures/RNAsum_workflow.png" width="100%">

1. Collect patient **WTS data** from the [DRAGEN RNA][dragen-rna]
pipeline including per-gene read counts and gene fusions.
1. Collect sample **WTS data** including per-gene read counts and gene fusions.
2. Add expression data from **[reference cohorts](#reference-data)** to get an
idea about the expression levels of genes of interest in other cancer patient
cohorts. The read counts are normalised, transformed and converted into a scale
@@ -83,14 +82,12 @@ below. For more details, see [workflow.md](/workflow.md).
plots presenting expression levels of the genes of interest. The report
consists of several sections described [here](./articles/report_structure.md).

[dragen-rna]: <https://www.illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/edico-genome-inc-dragen-rna-pipeline.html>

## Reference data

The reference expression data are available for **33 cancer types** and were
derived from [external](#external-reference-cohorts)
([TCGA](https://tcga-data.nci.nih.gov/)) and [internal](#internal-reference-cohort)
([UMCCR](https://research.unimelb.edu.au/centre-for-cancer-research/our-research/precision-oncology-research-group))
([UMCCR](https://mdhs.unimelb.edu.au/centre-for-cancer-research/our-research/precision-oncology-research-group))
resources.

### External reference cohorts
@@ -148,46 +145,45 @@ There are two rationales for using the internal reference cohort:

## Input data

`RNAsum` accepts [WTS](#wts) data processed by the `DRAGEN RNA`
pipeline. Additionally, the WTS data
can be integrated with [WGS](#wgs)-based data processed using the
`umccrise` pipeline. In the latter case, the genome-based findings from the
corresponding patient sample are incorporated into the report and are used as a
`RNAsum` accepts [WTS](#wts) data processed by state-of-the-art bioinformatics
tools such as kallisto and salmon for quantification and Arriba for fusion calling.
`RNAsum` can also process and combine fusion output from Illumina's DRAGEN pipeline.
Additionally, the WTS data can be integrated with [WGS](#wgs)-based data processed
using the tools discussed in the [WGS](#wgs) section.

In the latter case, the genome-based findings from the
corresponding sample are incorporated into the report and are used as a
primary source for expression profile prioritisation.

### WTS

The only required WTS input data are **read counts** provided in a
quantification file from the `DRAGEN RNA` pipeline.
quantification file.

#### DRAGEN RNA
#### RNA

The table below lists all input data accepted in `RNAsum`:

| Input file | Tool | Example | Required |
| -------------------------------------- | ---------------------------------------------- | ----------------------------------------------- | ---------- |
| Quantified transcript **abundances** | [salmon][salmon] ([description][salmon-res]) | [TEST.quant.sf][salmon-ex] | **Yes** |
| **Fusion gene** list | [DRAGEN RNA][dragen-rna] | [TEST.fusion_candidates.final][dragen-rna-ex] | No |
| Quantified transcript **abundances** | [salmon][salmon] ([description][salmon-res]) | [*.quant.sf][salmon-ex] | **Yes** |
| Quantified gene **abundances** | [salmon][salmon] ([description][salmon-res]) | [*.quant.gene.sf][salmon-ex2] | **Yes** |
| **Fusion gene** list | [Arriba][arriba] | [fusions.tsv][arriba-res] | No |
| **Fusion gene** list | [DRAGEN RNA][dragen-rna] | [*.fusion_candidates.final][dragen-rna-ex] | No |

[salmon]: <https://salmon.readthedocs.io/en/latest/salmon.html>
[salmon-ex]: </inst/rawdata/test_data/dragen/TEST.quant.sf>
[salmon-ex2]: </inst/rawdata/test_data/dragen/TEST.quant.gene.sf>
[salmon-res]: <https://salmon.readthedocs.io/en/latest/file_formats.html#fileformats>
[arriba]: <https://arriba.readthedocs.io/en/latest/>
[arriba-res]: </inst/rawdata/test_data/final/test_sample_WTS/arriba/fusions.tsv>
[dragen-rna]: <https://sapac.illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/edico-genome-inc-dragen-rna-pipeline.html>
[dragen-rna-ex]: </inst/rawdata/test_data/dragen/test_sample_WTS.fusion_candidates.final>

These files are expected to be organised in the following structure:

```text
|
|____<SampleName>
|____<SampleName>quant.sf
|____<SampleName>.fusion_candidates.final
```
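
A pre-flight check for the two required salmon files listed in the table above can be sketched in shell. This is only an illustration: `check_wts_inputs` and `has_match` are hypothetical helpers, not part of `RNAsum`, and the sketch assumes both files sit directly inside one sample directory.

```bash
# Hypothetical helpers, not part of RNAsum.

# Succeeds if the glob "$1"/$2 expands to at least one existing file.
has_match() {
  for f in "$1"/$2; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Checks a sample directory for the two salmon outputs marked as required.
# Prints one message per missing file and returns non-zero if any is absent.
check_wts_inputs() {
  wts_dir="$1"
  wts_missing=0
  for pattern in '*.quant.sf' '*.quant.gene.sf'; do
    if ! has_match "$wts_dir" "$pattern"; then
      echo "missing required input: ${pattern}" >&2
      wts_missing=1
    fi
  done
  return "$wts_missing"
}
```

For example, `check_wts_inputs inst/rawdata/test_data/dragen` should succeed if that folder contains the example files linked above. The optional fusion lists are deliberately not checked, since the table marks them as not required.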

### WGS

`RNAsum` is designed to be compatible with WGS outputs generated from
`umccrise`.
`RNAsum` is designed to be compatible with WGS outputs.

The table below lists all input data accepted in `RNAsum`:

@@ -204,20 +200,6 @@ The table below lists all input data accepted in `RNAsum`:
[manta]: <https://github.com/Illumina/manta>
[manta-ex]: </inst/rawdata/test_data/umccrised/test_sample_WGS/structural/sv-prioritize-manta.tsv>

These files are expected to be organised in the following structure:

```text
|
|____umccrised
|____<SampleName>
|____pcgr
| |____<SampleName>-somatic.pcgr.snvs_indels.tiers.tsv
|____purple
| |____<SampleName>.purple.gene.cnv
|____structural
|____<SampleName>-manta.tsv
```

## Usage

```{bash echo=TRUE, eval=FALSE}
@@ -242,23 +224,24 @@ echo ""

Human reference genome
***[GRCh38](https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39)***
(*Ensembl* based annotation version ***86***) is used for gene annotation by
(*Ensembl* based annotation version ***105***) is used for gene annotation by
default. GRCh37 is no longer supported.

### Examples

Below are `RNAsum` CLI commands for generating HTML reports under different
data availability scenarios:

1. [WTS data only](#1-wts-data-only)
2. [WTS and WGS data](#2-wts-and-wgs-data)

1. [WTS and WGS data](#1-wts-and-wgs-data)
2. [WTS data only](#2-wts-data-only)
3. [WTS, WGS and clinical data](#3-wts-wgs-and-clinical-data)

**Note**

* Example data is provided in the `/inst/rawdata/test_data` folder of the GitHub
[repo][rnasum-gh].
* The `RNAsum` runtime should be less than **20 minutes** using **16GB RAM**
* The `RNAsum` runtime should be less than **15 minutes** using **16GB RAM**
and **1 CPU**.

#### 1. WTS and WGS data
@@ -267,9 +250,8 @@ This is the **most frequent and preferred case**, in which the
[WGS](#wgs)-based findings will be used as a primary source for expression
profile prioritisation. The genome-based results can be incorporated into the
report by specifying the location of the corresponding
`umccrise` output files (including results from `PCGR`, `PURPLE`, and `Manta`)
using the `--umccrise` argument. The **`Mutated genes`**,
**`Structural variants`** and **`CN altered genes`** report sections will
output files (including results from `PCGR`, `PURPLE`, and `Manta`).
The **`Mutated genes`**, **`Structural variants`** and **`CN altered genes`** report sections will
contain information about expression levels of the mutated genes,
genes located within detected SVs and CN altered regions, respectively.
The results in the **`Fusion genes`** section will be ordered based on the
@@ -280,9 +262,12 @@ adenocarcinoma dataset is used as reference cohort (`--dataset TEST `).
rnasum.R \
--sample_name test_sample_WTS \
--dataset TEST \
--dragen_rnaseq inst/rawdata/test_data/dragen \
--manta_tsv \
--pcgr_tiers_tsv \
--purple_gene_tsv \
--salmon \
--arriba_tsv \
--report_dir inst/rawdata/test_data/dragen/RNAsum \
--umccrise inst/rawdata/test_data/umccrised/test_sample_WGS \
--save_tables FALSE
```
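
Because each input is passed through its own flag in the command above, a missing file is easy to overlook. The sketch below pairs flag names with paths and stops at the first path that does not exist. `rnasum_args` is a hypothetical helper, not part of `RNAsum`; the flag names are taken from the command above, and the paths in the usage comment are placeholders to be replaced with real locations.

```bash
# Hypothetical helper, not part of RNAsum.
# Usage: rnasum_args FLAG=PATH [FLAG=PATH ...]
# Prints "--FLAG PATH ..." on one line, or fails if any PATH does not exist.
# Limitation: paths containing whitespace are not supported by this sketch.
rnasum_args() {
  args=""
  for pair in "$@"; do
    flag=${pair%%=*}   # text before the first '='
    path=${pair#*=}    # text after the first '='
    if [ ! -e "$path" ]; then
      echo "no such input for --${flag}: ${path}" >&2
      return 1
    fi
    args="${args} --${flag} ${path}"
  done
  echo "${args# }"     # drop the leading space
}

# Example (placeholder paths, shown as a comment so nothing runs here):
#   inputs=$(rnasum_args manta_tsv=path/to/manta.tsv arriba_tsv=path/to/fusions.tsv) || exit 1
#   rnasum.R --sample_name test_sample_WTS --dataset TEST $inputs
```

Validating the paths before calling `rnasum.R` means a typo in one location is reported immediately rather than partway through report generation.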

@@ -292,7 +277,7 @@ The HTML report `test_sample_WTS.RNAsum.html` will be created in the
#### 2. WTS data only

In this scenario, only [WTS](#wts) data will be used and only expression levels
of key **[`UMCCR Cancer genes`](https://github.com/umccr/umccrise/blob/master/workflow.md#key-cancer-genes)**,
of key **[`Cancer genes`](https://github.com/umccr/umccrise/blob/master/workflow.md#key-cancer-genes)**,
**`Fusion genes`**, **`Immune markers`** and homologous recombination deficiency
genes (**`HRD genes`**) will be reported. Moreover, gene fusions reported in
the `Fusion genes` report section will not contain information about evidence
@@ -303,7 +288,8 @@ is used as the reference cohort (`--dataset TEST`).
rnasum.R \
--sample_name test_sample_WTS \
--dataset TEST \
--dragen_rnaseq inst/rawdata/test_data/dragen \
--salmon inst/rawdata/test_data/dragen \
--arriba \
--report_dir inst/rawdata/test_data/dragen/RNAsum \
--save_tables FALSE
```
@@ -386,4 +372,4 @@ that are presented in the HTML report.

#### Code of Conduct

The code of conduct can be accessed [here](./CODE_OF_CONDUCT.md).
The code of conduct can be accessed [here](./CODE_OF_CONDUCT.md)
