-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathREADME.Rmd
429 lines (317 loc) · 22.8 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
<!--You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. `devtools::build_readme()` is handy for this. You could also use GitHub Actions to re-render `README.Rmd` every time you push. An example workflow can be found here: <https://github.com/r-lib/actions/tree/master/examples>. -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "inst/figures/README-",
out.width = "100%"
)
```
<!-- badges: start -->
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![DOI](https://zenodo.org/badge/374398121.svg)](https://zenodo.org/badge/latestdoi/374398121)
<!-- badges: end -->
![](./GRETTA_hex_logo-02.png)
# Introduction
Genetic inteRaction and EssenTiality mApper (GRETTA) is an R package that leverages data generated by the [Cancer Dependency Map (DepMap) project](https://depmap.org/portal/) to perform in-silico genetic knockout screens and map essentiality networks. A manuscript describing this tool is available at [bioinformatics (Takemon, Y. and Marra, MA., 2023)](https://doi.org/10.1093/bioinformatics/btad381).
The DepMap data used in this tutorial is version 22Q2. This version along with all versions provided in this repository were downloaded through the DepMap data portal, which was distributed and used under the terms and conditions of [CC Attribution 4.0 license](https://creativecommons.org/licenses/by/4.0/).
## Maintainer
This repository is maintained by [Yuka Takemon](https://github.com/ytakemon), a PhD candidate in [Dr. Marco Marra](https://www.bcgsc.ca/labs/marra-lab)'s laboratory at [Canada's Michael Smith Genome Sciences Centre](https://www.bcgsc.ca/).
## Citations
When using GRETTA, please cite the manuscript describing GRETTA:
Yuka Takemon, Marco A Marra, GRETTA: an R package for mapping in silico genetic interaction and essentiality networks, Bioinformatics, Volume 39, Issue 6, June 2023, btad381, https://doi.org/10.1093/bioinformatics/btad381
Please also cite the DepMap project and the appropriate data version found on https://depmap.org/portal/:
Tsherniak A, Vazquez F, Montgomery PG, Weir BA, Kryukov G, Cowley GS, Gill S, Harrington WF, Pantel S, Krill-Burger JM, Meyers RM, Ali L, Goodale A, Lee Y, Jiang G, Hsiao J, Gerath WFJ, Howell S, Merkel E, Ghandi M, Garraway LA, Root DE, Golub TR, Boehm JS, Hahn WC. Defining a Cancer Dependency Map. Cell. 2017 Jul 27;170(3):564-576.
## Questions
Please check the [FAQ section](https://github.com/ytakemon/GRETTA/wiki/Frequently-Asked-Questions) for additional information and if you cannot find your answer there or have a request please submit an [issue](https://github.com/ytakemon/GRETTA/issues).
## Requirements
* GRETTA is supported and compatible for R versions >= 4.2.0.
* 12G of space to store one DepMap data set with and an additional 11G of temporary space to for .tar.gz prior to extraction.
# Installation
> **Warning**
> The new version of [dbplyr (v2.4.0)](https://stackoverflow.com/questions/77370659/error-failed-to-collect-lazy-table-caused-by-error-in-db-collect-using/77370920#77370920
) is currently incompatable with another library used in GRETTA. If you encounter an error message like the one below. Please install the previous working version also shown below.
>
> Error message:
>
> ```
> Error in `collect()`:
> ! Failed to collect lazy table.
> Caused by error in `db_collect()`:
> ! Arguments in `...` must be used.
> ✖ Problematic argument:
> • ..1 = Inf
> ℹ Did you misspell an argument name?
> ```
>
> Solution:
>
> ```
> install.packages("devtools")
> devtools::install_version("dbplyr", version = "2.3.4")`
> ```
You can install the GRETTA package from [GitHub](https://github.com) with:
``` r
install.packages(c("devtools", "dplyr","forcats","ggplot2"))
devtools::install_github("ytakemon/GRETTA")
```
DepMap 22Q2 data and the data documentation files are provided above and can be extracted directly in terminal using the following bash code (not in R/RStudio). For other DepMap data versions please refer to the [FAQ section](https://github.com/ytakemon/GRETTA/wiki/Frequently-Asked-Questions#q-how-to-download-and-use-other-versions-of-depmap-data).
``` bash
# Make a new directory/folder called GRETTA_project and go into directory
mkdir GRETTA_project
cd GRETTA_project
# Download data from the web
wget https://www.bcgsc.ca/downloads/ytakemon/GRETTA/22Q2/GRETTA_DepMap_22Q2_data.tar.gz
# Extract data and data documentation
tar -zxvf GRETTA_DepMap_22Q2_data.tar.gz
```
A singularity container has also been provided and instructions can be found [here](https://github.com/ytakemon/GRETTA/wiki/Frequently-Asked-Questions#q-how-to-run-singularity).
### Additional DepMap versions
In this example we use DepMap's 2022 data release (22Q2). However, we also provide previous data released in 2020 (v20Q1) and 2021 (v21Q4), which are available at :`https://www.bcgsc.ca/downloads/ytakemon/GRETTA/`. We are hoping to make new data sets available as the are released by DepMap.
# Workflows
## Genetic interaction mapping
1. Install `GRETTA` and download accompanying data.
1. Select mutant cell lines that carry mutations in the gene of interest and control cell lines.
+ ([optional specifications](https://github.com/ytakemon/GRETTA/wiki/Frequently-Asked-Questions#q-how-can-context-specific-genetic-screens-or-essentiality-network-analyses-be-performed)) can be used to select cell lines based on disease type, disease subtype, or amino acid change.
1. Determine differential expression between mutant and control cell line groups.
+ (optional but recommended).
1. Perform *in silico* genetic screen.
1. Visualize results.
## Co-essential network mapping
1. Install `GRETTA` and download accompanying data.
1. Run correlation coefficient analysis.
+ ([optional specifications](https://github.com/ytakemon/GRETTA/wiki/Frequently-Asked-Questions#q-how-can-context-specific-genetic-screens-or-essentiality-network-analyses-be-performed:~:text=For%20the%20essentiality%20network%20analysis%2C%20context%2Dspecific%20cell%20lines%20can%20be%20selected%20in%20two%20ways%3A)) can be used to perform analysis on cell lines of a specific disease type(s).
1. Calculate inflection points of negative/positive curve to determine a threshold.
1. Apply threshold.
1. Visualize results.
# Example: Identifying *ARID1A* genetic interactions
*ARID1A* encodes a member of the chromatin remodeling SWItch/Sucrose Non-Fermentable (SWI/SNF) complex and is a frequently mutated gene in cancer. It is known that *ARID1A* and its homolog, *ARID1B*, are synthetic lethal to one another: The dual loss of ARID1A and its homolog, ARID1B, in a cell is lethal; however, the loss of either gene alone is not ([Helming et al., 2014](https://doi.org/10.1038/nm.3480)). This example will demonstrate how we can identify synthetic lethal interactors of *ARID1A* using `GRETTA` and predict this known interaction.
For this example you will need to call the following libraries. If you they are not installed yet use `install.packages()` (eg. `install.packages("dplyr")`).
```{r libraries}
# Load library
library(tidyverse)
library(GRETTA)
```
## Download example data
A small data set has been created for this tutorial and can be downloaded using the following code.
```{r download_data}
path <- getwd()
download_example_data(path)
```
Then, assign variable that point to where the `.rda` files are stored and where result files should go.
```{r data_path}
gretta_data_dir <- paste0(path,"/GRETTA_example/")
gretta_output_dir <- paste0(path,"/GRETTA_example_output/")
```
## Exploring cell lines
One way to explore cell lines that are available in DepMap is through their [portal](https://depmap.org/portal/). However, there are some simple built-in methods in GRETTA to provide users with a way to glimpse the data using the series of `list_available` functions: `list_mutations()`, `list_cancer_types()`, `list_cancer_subtypes()`
Current DepMap data used by default is version 22Q2, which contains whole-genome sequencing or whole-exome sequencing annotations for `1771` cancer cell lines (`1406` cell lines with RNA-seq data, `375` cell lines with quantitative proteomics data, and `1086` cell lines with CRISPR-Cas9 knockout screen data)
```{r listing_available_muts, eval = FALSE}
## Find ARID1A hotspot mutations detected in all cell lines
list_mutations(gene = "ARID1A", is_hotspot = TRUE, data_dir = gretta_data_dir)
```
```{r listing_available_cancer_types, eval = TRUE}
## List all available cancer types
list_cancer_types(data_dir = gretta_data_dir)
## List all available cancer subtypes
list_cancer_subtypes(input_disease = "Lung Cancer", data_dir = gretta_data_dir)
```
## Selecting mutant and control cell line groups
As default `select_cell_lines()` will identify cancer cell lines with loss-of-function alterations in the gene specified and group them into six different groups.
Loss-of-function alterations include variants that are annotated as: `"Nonsense_Mutation", "Frame_Shift_Ins", "Splice_Site", "De_novo_Start_OutOfFrame", "Frame_Shift_Del", "Start_Codon_SNP", "Start_Codon_Del",` and `"Start_Codon_Ins"`. Copy number alterations are also taken into consideration and group as `"Deep_del", "Loss", "Neutral",` or `"Amplified"`.
The cell line groups assigned by default are:
* `Control` cell lines do not harbor any single nucleotide variations (SNVs) or insertions and deletions (InDels) with a neutral copy number (CN).
* `HomDel` cell lines harbor one or more homozygous deleterious SNVs or have deep CN loss.
* `T-HetDel` cell lines harbor two or more heterozygous deleterious SNVs/InDels with neutral or CN loss.
* `HetDel` cell lines harbor one heterozygous deleterious SNV/InDel with neutral CN, or no SNV/InDel with CN loss.
* `Amplified` cell lines harbor no SNVs/InDels with increased CN.
* `Others` cell lines harbor deleterious SNVs with increased CN.
```{r select_groups, eval = TRUE}
ARID1A_groups <- select_cell_lines(input_gene = "ARID1A", data_dir = gretta_data_dir)
## Show number of cell lines in each group
count(ARID1A_groups, Group)
```
### Optional cell line filters
There are several additional filters that can be combined together to narrow down your search. These
* `input_aa_change` - by amino acid change (eg. "p.Q515*").
* `input_disease` - by disease type (eg. "Pancreatic Cancer")
* `input_disease_subtype` - by disease subtype (eg. "Ductal Adenosquamous Carcinoma")
```{r select_groups_option, eval = TRUE}
## Find pancreatic cancer cell lines with ARID1A mutations
ARID1A_pancr_groups <- select_cell_lines(input_gene = "ARID1A",
input_disease = "Pancreatic Cancer",
data_dir = gretta_data_dir)
## Show number of cell lines in each group
count(ARID1A_pancr_groups, Group)
```
## Check for differential expression
Of the three mutant cancer cell line groups `ARID1A_HomDel`, `ARID1A_T-HetDel`, and `ARID1A_HetDel`, cancer cell lines with `ARID1A_HomDel` mutations are most likely to result in a loss or reduced expression of *ARID1A*. Therefore, we want to check whether cell lines in `ARID1A_HomDel` mutant group have significantly less *ARID1A* RNA or protein expression compared to control cell lines.
```{r Check_expression_rna, eval = TRUE}
## Select only HomDel and Control cell lines
ARID1A_groups_subset <- ARID1A_groups %>% filter(Group %in% c("ARID1A_HomDel", "Control"))
## Get RNA expression
ARID1A_rna_expr <- extract_rna(
input_samples = ARID1A_groups_subset$DepMap_ID,
input_genes = "ARID1A",
data_dir = gretta_data_dir)
```
Not all cell lines contain RNA and/or protein expression profiles, and not all proteins were detected by mass spectrometer. (Details on data generation can be found on the [DepMap site](https://depmap.org/portal/).)
```{r Check_expression_protein, eval = FALSE}
## Get protein expression
ARID1A_prot_expr <- extract_prot(
input_samples = ARID1A_groups_subset$DepMap_ID,
input_genes = "ARID1A",
data_dir = gretta_data_dir)
## Produces an error message since ARID1A protein data is not available
```
Using Welch's t-test, we can check to see whether *ARID1A* RNA expression (in TPM) is significantly reduced in `ARID1A_HomDel` cell lines compared to `Controls`.
```{r Check_expression_rna_stats, eval = TRUE}
## Append groups and test differential expression
ARID1A_rna_expr <- left_join(
ARID1A_rna_expr,
ARID1A_groups_subset %>% select(DepMap_ID, Group)) %>%
mutate(Group = fct_relevel(Group,"Control")) # show Control group first
## T-test
t.test(ARID1A_8289 ~ Group, ARID1A_rna_expr)
## plot
ggplot(ARID1A_rna_expr, aes(x = Group, y = ARID1A_8289)) +
geom_boxplot()
```
## Perform genome-wide *in silico* genetic screen
After determining cell lines in the `ARID1A_HomDel` group has statistically significant reduction in RNA expression compared to `Control` cell lines, the next step is to perform a *in silico* genetic screen using `screen_results()`. This uses the dependency probabilities (or **"lethality probabilities"**) generated from DepMap's genome-wide CRISPR-Cas9 knockout screen.
**Lethality probabilities** range from 0.0 to 1.0 and is quantified for each gene knock out in every cancer cell line screened (There are 18,334 genes targeted in 739 cancer cell lines). A gene knock out with a lethality probability of 0.0 indicates a non-essential for the cell line, and a gene knock out with a 1.0 indicates an essential gene (ie. very lethal). Details can be found in [Meyers, R., et al., 2017](https://doi.org/10.1038/ng.3984)
At its core, `screen_results()` performs multiple Mann-Whitney U tests, comparing lethality probabilities of each targeted gene between mutant and control groups. This generates a data frame with the following columns:
* `GeneName_ID` - Hugo symbol with NCBI gene ID
* `GeneNames` - Hugo symbol
* `_median, _mean, _sd, _iqr` - Control and mutant group's median, mean, standard deviation (sd), and interquartile range (iqr) of dependency probabilities. Dependency probabilities range from zero to one, where one indicates a essential gene (ie. KO of gene was lethal) and zero indicates a non-essential gene (KO of gene was not lethal)
* `Pval` - P-value from Mann Whitney U test between control and mutant groups.
* `Adj_pval` - BH-adjusted P-value.
* `log2FC_by_median` - Log2 normalized median fold change of dependency probabilities (mutant / control). Dependency probabilities range from 0.0-1.0 where 1.0 indicates high probability of KO leading to lethality, while 0.0 indicates little to no lethality.
* `log2FC_by_mean` - Log2 normalized mean fold change of dependency probabilities (mutant / control). Dependency probabilities range from 0.0-1.0 where 1.0 indicates high probability of KO leading to lethality, while 0.0 indicates little to no lethality.
* `CliffDelta` - Cliff's delta non-parametric effect size between mutant and control dependency probabilities. Ranges between -1 to 1.
* `dip_pval` - Hartigan's dip test p-value. Tests whether distribution of mutant dependency probability is unimodel. If dip test is rejected (p-value < 0.05), this indicates that there is a multimodel dependency probability distribution and that there may be another factor contributing to this separation.
* `Interaction_score` - Combined value generated from signed p-values: -log10(Pval) \* sign(log2FC_by_median). Positive scores indicate possible lethal genetic interaction, and negative scores indicate possible alleviating genetic interaction.
> **Warning**
> This process may take a few hours depending on the number of cores assigned. Our example below `GI_screen()` took ~2 hours to process. To save time, we have preprocessed this step for you.
```{r run_screen, eval = FALSE}
ARID1A_mutant_id <- ARID1A_groups %>% filter(Group %in% c("ARID1A_HomDel")) %>% pull(DepMap_ID)
ARID1A_control_id <- ARID1A_groups %>% filter(Group %in% c("Control")) %>% pull(DepMap_ID)
## See warning above.
## This can take several hours depending on number of lines/cores used.
screen_results <- GI_screen(
control_id = ARID1A_control_id,
mutant_id = ARID1A_mutant_id,
core_num = 5, # depends on how many cores you have
output_dir = gretta_output_dir, # Will save your results here as well as in the variable
data_dir = gretta_data_dir,
test = FALSE) # use TRUE to run a short test to make sure all will run overnight.
```
```{r load_prepared_results, eval = TRUE}
## Load prepared ARID1A screen result
load(paste0(gretta_data_dir,"/sample_22Q2_ARID1A_KO_screen.rda"), envir = environment())
```
We can quickly determine whether any lethal genetic interactions were predicted by `GRETTA`. We use a `Pval` cut off of 0.05 and rank based on the `Interaction_score`.
```{r checkout_res, eval = TRUE}
screen_results %>%
filter(Pval < 0.05) %>%
arrange(-Interaction_score) %>%
select(GeneNames:Mutant_median, Pval, Interaction_score) %>% head
```
We immediately see that *ARID1B*, a known synthetic lethal interaction of *ARID1A*, was a the top of this list.
### Optional: Performing a small-scale screen
To perform a small in silico screen, a list of genes can be provided in the `gene_list =` argument.
```{r run_screen_subset, eval = FALSE}
small_screen_results <- GI_screen(
control_id = ARID1A_control_id,
mutant_id = ARID1A_mutant_id,
gene_list = c("ARID1A", "ARID1B", "SMARCA2", "GAPDH", "SMARCC2"),
core_num = 5, # depends on how many cores you have
output_dir = gretta_output_dir, # Will save your results here as well as in the variable
data_dir = gretta_data_dir)
```
## Visualize screen results
Finally once the *in silico* screen is complete, results can be quickly visualized using `plot_screen()`. Positive genetic interaction scores indicate potential synthetic lethal genetic interactors, and negative scores indicate potential alleviating genetic interactors. As expected, we identified *ARID1B* as a synthetic lethal interactor of *ARID1A*.
```{r plot, eval = TRUE}
## Visualize results, turn on gene labels,
## and label three genes each that are predicted to have
## lethal and alleviating genetic interactions, respectively
plot_screen(result_df = screen_results,
label_genes = TRUE,
label_n = 3)
```
# Example: Identifying *ARID1A* co-essential genes
Perturbing genes that function in same/synergistic pathways or in the same complex are said to show similar fitness effects, and these that show effects are considered to be "co-essential". The strategy of mapping co-essential gene have been used by several studies to attribute functions to previously annotated genes as well as to identify a novel subunit of a large complex ([Wainberg et al. 2021](https://doi.org/10.1038/s41588-021-00840-z); [Pan et al. 2018](https://doi.org/10.1016/j.cels.2018.04.011)).
Given that ARID1A is known subunit of the mammalian SWI/SNF complex ([Mashtalir et al. 2018](https://doi.org/10.1016/j.cell.2018.09.032)), we expect that members of the SWI/SNF complex would share co-essentiality with *ARID1A*. This example will demonstrate how we can map *ARID1A*'s co-essential gene network using `GRETTA`.
## Identifying genes with highest correlation coefficients
To determine co-essential genes, we will perform multiple Pearson correlation coefficient analyses between *ARID1A* KO effects and the KO effects of all 18,333 genes. A cut off will be determined by calculating the inflection point of the ranked coefficient curve. As expected find SWI/SNF subunit encoding genes, *SMARCE1* and *SMARCB1*, as the top two co-essential genes.
> **Warning**
> This process may take several minutes. Our example below `coessential_map()` + `get_inflection_points()` took ~17 minutes to process. To save time we have pre-processed this setp for you.
```{r coess_df, eval = FALSE}
## Map co-essential genes
coess_df <- coessential_map(
input_gene = "ARID1A",
data_dir = gretta_data_dir,
output_dir = gretta_output_dir)
## Calculate inflection points of positive and negative curve using co-essential gene results.
coess_inflection_df <- get_inflection_points(coess_df)
```
```{r load_prepared_results_coess, eval = TRUE, include = FALSE}
# Load pre-processed results
load(paste0(gretta_data_dir,"/sample_22Q2_ARID1A_coessential_result.rda"), envir = environment())
load(paste0(gretta_data_dir,"/sample_22Q2_ARID1A_coessential_inflection.rda"), envir = environment())
```
Next, we annotate the data frame containing the co-essential network data and visualize.
```{r combine_n_visualize, eval = TRUE}
## Combine and annotate data frame containing co-essential genes
coess_annotated_df <- annotate_df(coess_df, coess_inflection_df)
plot_coess(
result_df = coess_annotated_df,
inflection_df = coess_inflection_df,
label_genes = TRUE, # Should gene names be labeled?
label_n = 3) # Number of genes to display from each end
```
We also see that the top ten *ARID1A* co-essential genes include eight known SWI/SNF subunits, namely *ARID1A*, *SMARCB1*, *SMARCE1*, *SMARCC1*, *SS18*, *DPF2*, *SMARCC2*, and *SMARCD2*.
```{r top10_coess, eval = TRUE}
## Show top 10 co-essential genes.
coess_annotated_df %>% arrange(Rank) %>% head(10)
```
### Optional filter for specific cancer types
Instead of mapping for essentiality across all available cell lines, users can also subset by disease type using the option `input_disease = ""`, or within a pre-selected group of cell lines using the option `input_cell_lines = c()`. Below we provide an example of how ARID1A essential genes are mapped for pancreatic cancers.
> **Warning**
> Depending on the number of cell lines that are available after the subsetting step,
> the inflection point calculation and thresholds may not be optimal.
> Please use caution when interpreting these results.
```{r coess_for_panc, eval = FALSE}
## Map co-essential genes in pancreatic cancers only
coess_df <- coessential_map(
input_gene = "ARID1A",
input_disease = "Pancreatic Cancer",
core_num = 5, ## Depending on how many cores you have access to, increase this value to shorten processing time.
data_dir = gretta_data_dir,
output_dir = gretta_output_dir,
test = FALSE)
```
### Optional filter for custom cell lines
We can also map essentiality across a manually defined list of cell lines using the `input_cell_lines = c()` option.
> **Warning**
> Depending on the number of cell lines provided, the inflection point may not be calculated.
> Please use caution when interpreting these results.
```{r coess_for_lines, eval = FALSE}
custom_lines <- c("ACH-000001", "ACH-000002", "ACH-000003",...)
coess_df <- coessential_map(
input_gene = "ARID1A",
input_cell_lines = custom_lines,
core_num = 5, ## Depending on how many cores you have access to, increase this value to shorten processing time.
data_dir = gretta_data_dir,
output_dir = gretta_output_dir,
test = FALSE)
```
# Session information
```{r sessioninfo}
sessionInfo()
```