diff --git a/README.Rmd b/README.Rmd
index 4a001977..f2d460ce 100755
--- a/README.Rmd
+++ b/README.Rmd
@@ -63,8 +63,7 @@ below. For more details, see [workflow.md](/workflow.md).
-1. Collect patient **WTS data** from the [DRAGEN RNA][dragen-rna]
- pipeline including per-gene read counts and gene fusions.
+1. Collect sample **WTS data** including per-gene read counts and gene fusions.
2. Add expression data from **[reference cohorts](#reference-data)** to get an
idea about the expression levels of genes of interest in other cancer patient
cohorts. The read counts are normalised, transformed and converted into a scale
@@ -83,14 +82,12 @@ below. For more details, see [workflow.md](/workflow.md).
plots presenting expression levels of the genes of interest. The report
consists of several sections described [here](./articles/report_structure.md).
-[dragen-rna]:
-
## Reference data
The reference expression data are available for **33 cancer types** and were
derived from [external](#external-reference-cohorts)
([TCGA](https://tcga-data.nci.nih.gov/)) and [internal](#internal-reference-cohort)
-([UMCCR](https://research.unimelb.edu.au/centre-for-cancer-research/our-research/precision-oncology-research-group))
+([UMCCR](https://mdhs.unimelb.edu.au/centre-for-cancer-research/our-research/precision-oncology-research-group))
resources.
### External reference cohorts
@@ -148,46 +145,45 @@ There are two rationales for using the internal reference cohort:
## Input data
-`RNAsum` accepts [WTS](#wts) data processed by the `DRAGEN RNA`
-pipeline. Additionally, the WTS data
-can be integrated with [WGS](#wgs)-based data processed using the
-`umccrise` pipeline. In the latter case, the genome-based findings from the
-corresponding patient sample are incorporated into the report and are used as a
+`RNAsum` accepts [WTS](#wts) data processed by bioinformatic tools
+such as kallisto and salmon for quantification and Arriba for fusion calling.
+`RNAsum` can also process and combine fusion output from Illumina's DRAGEN pipeline.
+Additionally, the WTS data can be integrated with [WGS](#wgs)-based data processed
+using the tools listed in the [WGS](#wgs) section.
+
+In the latter case, the genome-based findings from the
+corresponding sample are incorporated into the report and are used as a
primary source for expression profile prioritisation.
### WTS
The only required WTS input data are **read counts** provided in a
-quantification file from the `DRAGEN RNA` pipeline.
+quantification file.
-#### DRAGEN RNA
+#### RNA
The table below lists all input data accepted in `RNAsum`:
| Input file | Tool | Example | Required |
| -------------------------------------- | ---------------------------------------------- | ----------------------------------------------- | ---------- |
-| Quantified transcript **abundances** | [salmon][salmon] ([description][salmon-res]) | [TEST.quant.sf][salmon-ex] | **Yes** |
-| **Fusion gene** list | [DRAGEN RNA][dragen-rna] | [TEST.fusion_candidates.final][dragen-rna-ex] | No |
+| Quantified transcript **abundances** | [salmon][salmon] ([description][salmon-res]) | [*.quant.sf][salmon-ex] | **Yes** |
+| Quantified gene **abundances** | [salmon][salmon] ([description][salmon-res]) | [*.quant.gene.sf][salmon-ex2] | **Yes** |
+| **Fusion gene** list | [Arriba][arriba] | [fusions.tsv][dragen-rna-ex] | No |
+| **Fusion gene** list | [DRAGEN RNA][dragen-rna] | [*.fusion_candidates.final][dragen-rna-ex] | No |
[salmon]:
[salmon-ex]:
+[salmon-ex2]:
[salmon-res]:
+[arriba]:
+[arriba-res]:
[dragen-rna]:
[dragen-rna-ex]:
-These files are expected to be organised in the following structure:
-
-```text
-|
-|____
- |____quant.sf
- |____.fusion_candidates.final
-```
### WGS
-`RNAsum` is designed to be compatible with WGS outputs generated from
-`umccrise`.
+`RNAsum` is designed to be compatible with WGS outputs from `PCGR`, `PURPLE` and `Manta`.
The table below lists all input data accepted in `RNAsum`:
@@ -204,20 +200,6 @@ The table below lists all input data accepted in `RNAsum`:
[manta]:
[manta-ex]:
-These files are expected to be organised in the following structure:
-
-```text
-|
-|____umccrised
- |____
- |____pcgr
- | |____-somatic.pcgr.snvs_indels.tiers.tsv
- |____purple
- | |____.purple.gene.cnv
- |____structural
- |____-manta.tsv
-```
-
## Usage
```{bash echo=TRUE, eval=FALSE}
@@ -242,7 +224,7 @@ echo ""
Human reference genome
***[GRCh38](https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39)***
-(*Ensembl* based annotation version ***86***) is used for gene annotation by
+(*Ensembl* based annotation version ***105***) is used for gene annotation by
default. GRCh37 is no longer supported.
### Examples
@@ -250,15 +232,16 @@ default. GRCh37 is no longer supported.
Below are `RNAsum` CLI commands for generating HTML reports under different
data availability scenarios:
-1. [WTS data only](#1-wts-data-only)
-2. [WTS and WGS data](#2-wts-and-wgs-data)
+
+1. [WTS and WGS data](#1-wts-and-wgs-data)
+2. [WTS data only](#2-wts-data-only)
3. [WTS WGS and clinical data](#3-wts-wgs-and-clinical-data)
**Note**
* Example data is provided in the `/inst/rawdata/test_data` folder of the GitHub
[repo][rnasum-gh].
-* The `RNAsum` runtime should be less than **20 minutes** using **16GB RAM**
+* The `RNAsum` runtime should be less than **15 minutes** using **16GB RAM**
memory and **1 CPU**.
#### 1. WTS and WGS data
@@ -267,9 +250,8 @@ This is the **most frequent and preferred case**, in which the
[WGS](#wgs)-based findings will be used as a primary source for expression
profile prioritisation. The genome-based results can be incorporated into the
report by specifying the location of the corresponding
-`umccrise` output files (including results from `PCGR`, `PURPLE`, and `Manta`)
-using the `--umccrise` argument. The **`Mutated genes`**,
-**`Structural variants`** and **`CN altered genes`** report sections will
+output files (including results from `PCGR`, `PURPLE`, and `Manta`).
+The **`Mutated genes`**, **`Structural variants`** and **`CN altered genes`** report sections will
contain information about expression levels of the mutated genes,
genes located within detected SVs and CN altered regions, respectively.
The results in the **`Fusion genes`** section will be ordered based on the
@@ -280,9 +262,12 @@ adenocarcinoma dataset is used as reference cohort (`--dataset TEST `).
rnasum.R \
--sample_name test_sample_WTS \
--dataset TEST \
- --dragen_rnaseq inst/rawdata/test_data/dragen \
+ --manta_tsv \
+ --pcgr_tiers_tsv \
+ --purple_gene_tsv \
+ --salmon \
+ --arriba_tsv \
--report_dir inst/rawdata/test_data/dragen/RNAsum \
- --umccrise inst/rawdata/test_data/umccrised/test_sample_WGS \
--save_tables FALSE
```
@@ -292,7 +277,7 @@ The HTML report `test_sample_WTS.RNAsum.html` will be created in the
#### 2. WTS data only
In this scenario, only [WTS](#wts) data will be used and only expression levels
-of key **[`UMCCR Cancer genes`](https://github.com/umccr/umccrise/blob/master/workflow.md#key-cancer-genes)**,
+of key **[`Cancer genes`](https://github.com/umccr/umccrise/blob/master/workflow.md#key-cancer-genes)**,
**`Fusion genes`**, **`Immune markers`** and homologous recombination deficiency
genes (**`HRD genes`**) will be reported. Moreover, gene fusions reported in
the `Fusion genes` report section will not contain information about evidence
@@ -303,7 +288,8 @@ is used as the reference cohort (`--dataset TEST`).
rnasum.R \
--sample_name test_sample_WTS \
--dataset TEST \
- --dragen_rnaseq inst/rawdata/test_data/dragen \
+ --salmon inst/rawdata/test_data/dragen \
+ --arriba_tsv \
--report_dir inst/rawdata/test_data/dragen/RNAsum \
--save_tables FALSE
```
@@ -386,4 +372,4 @@ that are presented in the HTML report.
#### Code of Conduct
-The code of conduct can be accessed [here](./CODE_OF_CONDUCT.md).
+The code of conduct can be accessed [here](./CODE_OF_CONDUCT.md).