Merge pull request #159 from umccr/readme_update
Sync READMEs
pdiakumis authored Aug 9, 2024
2 parents d2c49fc + dc75fb1 commit 08625ad
Showing 2 changed files with 47 additions and 69 deletions.
33 changes: 15 additions & 18 deletions README.Rmd
@@ -16,8 +16,8 @@ knitr::opts_chunk$set(

# RNAsum

`RNAsum` is an R package that can post-process, summarise and visualise outputs
primarily from [DRAGEN RNA][dragen-rna] pipelines.
`RNAsum` is an R package that can post-process, summarise and visualise
outputs primarily from [DRAGEN RNA][dragen-rna] pipelines.
Its main application is to complement genome-based findings from the
[umccrise][umccrise] pipeline and to provide additional evidence for detected
alterations.
@@ -34,7 +34,7 @@ alterations.
[rnasum-gh]: <https://github.com/umccr/RNAsum>

```r
remotes::install_github("umccr/RNAsum") # latest master commit
remotes::install_github("umccr/RNAsum") # latest main commit
remotes::install_github("umccr/RNAsum@v0.0.X") # version 0.0.X
remotes::install_github("umccr/RNAsum@abcde") # commit abcde
remotes::install_github("umccr/RNAsum#123") # PR 123
@@ -58,14 +58,15 @@ docker pull ghcr.io/umccr/rnasum:latest

## Workflow

The pipeline consists of five main components illustrated and briefly described
below. For more details, see [workflow.md](/workflow.md).
The pipeline consists of five main components illustrated and briefly
described below. For more details, see [workflow.md](/workflow.md).

<img src="man/figures/RNAsum_workflow.png" width="100%">
<img src="man/figures/RNAsum_workflow_updated.png" width="100%">

1. Collect sample **WTS data** including per-gene read counts and gene fusions.
2. Add expression data from **[reference cohorts](#reference-data)** to get an
idea about the expression levels of genes of interest in other cancer patient
1. Collect patient **WTS data** from the [DRAGEN RNA][dragen-rna] pipeline
including per-gene read counts and gene fusions.
2. Add expression data from **[reference cohorts](#reference-data)** to
get an idea about the expression levels of genes of interest in other cancer patient
cohorts. The read counts are normalised, transformed and converted into a scale
that allows the patient's expression measurements to be presented in the context
of the reference cohorts.
@@ -261,12 +262,9 @@ adenocarcinoma dataset is used as reference cohort (`--dataset TEST `).
rnasum.R \
--sample_name test_sample_WTS \
--dataset TEST \
--manta_tsv \
--pcgr_tiers_tsv \
--purple_gene_tsv \
--salmon \
--arriba_tsv \
--dragen_wts_dir inst/rawdata/test_data/dragen \
--report_dir inst/rawdata/test_data/dragen/RNAsum \
--umccrise inst/rawdata/test_data/umccrised/test_sample_WGS \
--save_tables FALSE
```

@@ -287,8 +285,7 @@ is used as the reference cohort (`--dataset TEST`).
rnasum.R \
--sample_name test_sample_WTS \
--dataset TEST \
--salmon inst/rawdata/test_data/dragen \
--arriba \
--dragen_wts_dir inst/rawdata/test_data/dragen \
--report_dir inst/rawdata/test_data/dragen/RNAsum \
--save_tables FALSE
```
@@ -313,7 +310,7 @@ cohort (`--dataset TEST `).
rnasum.R \
--sample_name test_sample_WTS \
--dataset TEST \
--dragen_rnaseq $(pwd)/../rawdata/test_data/dragen \
--dragen_wts_dir $(pwd)/../rawdata/test_data/dragen \
--report_dir $(pwd)/../rawdata/test_data/dragen/RNAsum \
--umccrise $(pwd)/../rawdata/test_data/umccrised/test_sample_WGS \
--save_tables FALSE \
@@ -371,4 +368,4 @@ that are presented in the HTML report.

#### Code of Conduct

The code of conduct can be accessed [here](./CODE_OF_CONDUCT.md)
The code of conduct can be accessed [here](./CODE_OF_CONDUCT.md).
83 changes: 32 additions & 51 deletions README.md
@@ -31,7 +31,7 @@ provide additional evidence for detected alterations.
source](https://github.com/umccr/RNAsum):

``` r
remotes::install_github("umccr/RNAsum") # latest master commit
remotes::install_github("umccr/RNAsum") # latest main commit
remotes::install_github("umccr/RNAsum@v0.0.X") # version 0.0.X
remotes::install_github("umccr/RNAsum@abcde") # commit abcde
remotes::install_github("umccr/RNAsum#123") # PR 123
@@ -88,7 +88,7 @@ The reference expression data are available for **33 cancer types** and
were derived from [external](#external-reference-cohorts)
([TCGA](https://tcga-data.nci.nih.gov/)) and
[internal](#internal-reference-cohort)
([UMCCR](https://research.unimelb.edu.au/centre-for-cancer-research/our-research/precision-oncology-research-group))
([UMCCR](https://mdhs.unimelb.edu.au/centre-for-cancer-research/our-research/precision-oncology-research-group))
resources.

### External reference cohorts
@@ -150,62 +150,44 @@ There are two rationales for using the internal reference cohort:

## Input data

`RNAsum` accepts [WTS](#wts) data processed by the `DRAGEN RNA`
pipeline. Additionally, the WTS data can be integrated with
[WGS](#wgs)-based data processed using the `umccrise` pipeline. In the
latter case, the genome-based findings from the corresponding patient
`RNAsum` accepts [WTS](#wts) data processed by state-of-the-art
bioinformatics tools such as kallisto and salmon for quantification and
Arriba for fusion calling. `RNAsum` can also process and combine fusion
output from Illumina’s DRAGEN pipeline. Additionally, the WTS data can
be integrated with [WGS](#wgs)-based data processed using the tools
discussed in the [WGS](#wgs) section.

In the latter case, the genome-based findings from the corresponding
sample are incorporated into the report and are used as a primary source
for expression profile prioritisation.

### WTS

The only required WTS input data are **read counts** provided in a
quantification file from the `DRAGEN RNA` pipeline.
quantification file.

#### DRAGEN RNA
#### RNA

The table below lists all input data accepted in `RNAsum`:

| Input file | Tool | Example | Required |
|--------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|----------|
| Quantified transcript **abundances** | [salmon](https://salmon.readthedocs.io/en/latest/salmon.html) ([description](https://salmon.readthedocs.io/en/latest/file_formats.html#fileformats)) | [TEST.quant.sf](/inst/rawdata/test_data/dragen/TEST.quant.sf) | **Yes** |
| **Fusion gene** list | [DRAGEN RNA](https://sapac.illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/edico-genome-inc-dragen-rna-pipeline.html) | [TEST.fusion_candidates.final](/inst/rawdata/test_data/dragen/test_sample_WTS.fusion_candidates.final) | No |

These files are expected to be organised in the following structure:

``` text
|
|____<SampleName>
|____<SampleName>quant.sf
|____<SampleName>.fusion_candidates.final
```
| Input file | Tool | Example | Required |
|----|----|----|----|
| Quantified transcript **abundances** | [salmon](https://salmon.readthedocs.io/en/latest/salmon.html) ([description](https://salmon.readthedocs.io/en/latest/file_formats.html#fileformats)) | [\*.quant.sf](/inst/rawdata/test_data/dragen/TEST.quant.sf) | **Yes** |
| Quantified gene **abundances** | [salmon](https://salmon.readthedocs.io/en/latest/salmon.html) ([description](https://salmon.readthedocs.io/en/latest/file_formats.html#fileformats)) | [\*.quant.gene.sf](/inst/rawdata/test_data/dragen/TEST.quant.gene.sf) | **Yes** |
| **Fusion gene** list | [Arriba](https://arriba.readthedocs.io/en/latest/) | [fusions.tsv](/inst/rawdata/test_data/dragen/test_sample_WTS.fusion_candidates.final) | No |
| **Fusion gene** list | [DRAGEN RNA](https://sapac.illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/edico-genome-inc-dragen-rna-pipeline.html) | [\*.fusion_candidates.final](/inst/rawdata/test_data/dragen/test_sample_WTS.fusion_candidates.final) | No |
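
As a rough illustration of the table above, the following shell sketch checks a directory for the listed WTS inputs by file-name pattern. The functions `has_match` and `check_wts_inputs` are hypothetical helpers, not part of `RNAsum`. The assumption that all files sit in one directory is taken from the layout of the example test data, not from a documented requirement.

```shell
# Hypothetical helpers for illustration only; not part of RNAsum.

# Succeeds if at least one existing file in directory $1 matches glob $2.
has_match() {
  for hm_file in "$1"/$2; do   # $2 is left unquoted so the glob expands
    [ -e "$hm_file" ] && return 0
  done
  return 1
}

# Reports which WTS inputs from the table are present in directory $1.
# Returns 1 if a required input is missing, 0 otherwise.
check_wts_inputs() {
  cwi_dir="$1"
  cwi_status=0
  # Required: transcript- and gene-level salmon quantification files.
  for cwi_pattern in '*.quant.sf' '*.quant.gene.sf'; do
    if has_match "$cwi_dir" "$cwi_pattern"; then
      echo "found ${cwi_pattern}"
    else
      echo "MISSING ${cwi_pattern} (required)"
      cwi_status=1
    fi
  done
  # Optional: fusion calls from Arriba or DRAGEN RNA.
  for cwi_pattern in 'fusions.tsv' '*.fusion_candidates.final'; do
    if has_match "$cwi_dir" "$cwi_pattern"; then
      echo "found ${cwi_pattern}"
    else
      echo "absent ${cwi_pattern} (optional)"
    fi
  done
  return "$cwi_status"
}
```

After sourcing the functions, a call such as `check_wts_inputs inst/rawdata/test_data/dragen` prints one line per input and exits non-zero only when a required file is missing, so it could serve as a guard before invoking `rnasum.R`.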

### WGS

`RNAsum` is designed to be compatible with WGS outputs generated from
`umccrise`.
`RNAsum` is designed to be compatible with WGS outputs.

The table below lists all input data accepted in `RNAsum`:

| Input file | Tool | Example | Required |
|-----------------|-------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|----------|
| **SNVs/Indels** | [PCGR](https://github.com/sigven/pcgr) | [pcgr.snvs_indels.tiers.tsv](/inst/rawdata/test_data/umccrised/test_sample_WGS/small_variants/pcgr.snvs_indels.tiers.tsv) | No |
| **CNVs** | [PURPLE](https://github.com/hartwigmedical/hmftools/tree/master/purple) | [purple.cnv.gene.tsv](/inst/rawdata/test_data/umccrised/test_sample_WGS/purple/purple.gene.cnv) | No |
| **SVs** | [Manta](https://github.com/Illumina/manta) | [sv-prioritize-manta.tsv](/inst/rawdata/test_data/umccrised/test_sample_WGS/structural/sv-prioritize-manta.tsv) | No |

These files are expected to be organised in the following structure:

``` text
|
|____umccrised
|____<SampleName>
|____pcgr
| |____<SampleName>-somatic.pcgr.snvs_indels.tiers.tsv
|____purple
| |____<SampleName>.purple.gene.cnv
|____structural
|____<SampleName>-manta.tsv
```
| Input file | Tool | Example | Required |
|----|----|----|----|
| **SNVs/Indels** | [PCGR](https://github.com/sigven/pcgr) | [pcgr.snvs_indels.tiers.tsv](/inst/rawdata/test_data/umccrised/test_sample_WGS/small_variants/pcgr.snvs_indels.tiers.tsv) | No |
| **CNVs** | [PURPLE](https://github.com/hartwigmedical/hmftools/tree/master/purple) | [purple.cnv.gene.tsv](/inst/rawdata/test_data/umccrised/test_sample_WGS/purple/purple.gene.cnv) | No |
| **SVs** | [Manta](https://github.com/Illumina/manta) | [sv-prioritize-manta.tsv](/inst/rawdata/test_data/umccrised/test_sample_WGS/structural/sv-prioritize-manta.tsv) | No |

## Usage

@@ -215,7 +197,7 @@ export PATH="${rnasum_cli}:${PATH}"
```

$ rnasum.R --version
1.0.0
1.1.0

$ rnasum.R --help
Usage
@@ -332,23 +314,23 @@ export PATH="${rnasum_cli}:${PATH}"

Human reference genome
***[GRCh38](https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39)***
(*Ensembl* based annotation version ***86***) is used for gene
(*Ensembl* based annotation version ***105***) is used for gene
annotation by default. GRCh37 is no longer supported.

### Examples

Below are `RNAsum` CLI commands for generating HTML reports under
different data availability scenarios:

1. [WTS data only](#1-wts-data-only)
2. [WTS and WGS data](#2-wts-and-wgs-data)
1. [WTS and WGS data](#1-wts-and-wgs-data)
2. [WTS data only](#2-wts-data-only)
3. [WTS WGS and clinical data](#3-wts-wgs-and-clinical-data)

**Note**

- Example data is provided in the `/inst/rawdata/test_data` folder of
the GitHub [repo](https://github.com/umccr/RNAsum).
- The `RNAsum` runtime should be less than **20 minutes** using **16GB
- The `RNAsum` runtime should be less than **15 minutes** using **16GB
RAM** memory and **1 CPU**.

#### 1. WTS and WGS data
@@ -357,9 +339,8 @@ This is the **most frequent and preferred case**, in which the
[WGS](#wgs)-based findings will be used as a primary source for
expression profile prioritisation. The genome-based results can be
incorporated into the report by specifying the location of the
corresponding `umccrise` output files (including results from `PCGR`,
`PURPLE`, and `Manta`) using the `--umccrise` argument. The
**`Mutated genes`**, **`Structural variants`** and
corresponding output files (including results from `PCGR`, `PURPLE`, and
`Manta`). The **`Mutated genes`**, **`Structural variants`** and
**`CN altered genes`** report sections will contain information about
expression levels of the mutated genes, genes located within detected
SVs and CN altered regions, respectively. The results in the
@@ -384,7 +365,7 @@ The HTML report `test_sample_WTS.RNAsum.html` will be created in the

In this scenario, only [WTS](#wts) data will be used and only expression
levels of key
**[`UMCCR Cancer genes`](https://github.com/umccr/umccrise/blob/master/workflow.md#key-cancer-genes)**,
**[`Cancer genes`](https://github.com/umccr/umccrise/blob/master/workflow.md#key-cancer-genes)**,
**`Fusion genes`**, **`Immune markers`** and homologous recombination
deficiency genes (**`HRD genes`**) will be reported. Moreover, gene
fusions reported in the `Fusion genes` report section will not contain