Frequently Asked Questions
Below are answers to several common questions. More will be added as recurring issues are raised in the Issues section.
About GRETTA:
Accessing data:
Interpreting results:
Specific applications:
A manuscript describing GRETTA and its usage is now available on bioRxiv (Takemon, Y. and Marra, MA., 2020).
Each GRETTA release has also been archived and made citable on Zenodo. Run `citation("GRETTA")` in R for the full citation, including the DOI (10.5281/zenodo.6940757).
Copy number (CN) data from DepMap are provided as log2(relative CN + 1), so a diploid cell line with two copies of a gene has a value of 1 (log2(1 + 1) = 1). We infer zygosity by binning cell lines by CN using the following fixed thresholds:
- Amplified: >= 1.25
- Neutral: >= 0.75 and < 1.25
- Shallow loss: > 0.25 and < 0.75
- Deep loss: <= 0.25
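As an illustration only, the thresholds above can be applied to a vector of DepMap CN values with a few lines of base R. This is a minimal sketch, not GRETTA's internal code, and the function name `bin_zygosity()` is hypothetical. Note the boundary handling implied by the thresholds: a value of exactly 1.25 is Amplified, exactly 0.75 is Neutral, and exactly 0.25 is Deep loss.

```r
# Hypothetical helper (not part of GRETTA): bin DepMap CN values, given as
# log2(relative CN + 1), into the zygosity categories listed above.
bin_zygosity <- function(log2_cn) {
  ifelse(log2_cn >= 1.25, "Amplified",
         ifelse(log2_cn >= 0.75, "Neutral",
                ifelse(log2_cn > 0.25, "Shallow loss", "Deep loss")))
}

# Relative CN of 2 (gain), 1 (diploid), 0.5 (one copy lost), 0 (both copies lost)
relative_cn <- c(2, 1, 0.5, 0)
log2_cn <- log2(relative_cn + 1)  # approx. 1.585, 1.000, 0.585, 0.000
bin_zygosity(log2_cn)
# "Amplified" "Neutral" "Shallow loss" "Deep loss"
```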
A detailed description can be found in the bioRxiv preprint (Takemon, Y. and Marra, MA., 2020).
This package used to be called GINI, GINIR, or GRETA.
TL;DR: People (who decide the fate of my Ph.D. candidacy) told me they didn't like the name, so I pivoted, as I am a people pleaser.
Long version: A member of my Ph.D. committee pointed out that GINI (Genetic Interaction Network Identifier) was an overused acronym, which was true (see the Gini coefficient, Gini impurity for decision trees, etc.). As a result, I stuck an R on the end of the name, GINIR (Genetic Interaction Network IdentifieR), to give it an R-programming vibe, which my PI said sounded weird (he pronounced it "Guh-nerr"). So I re-renamed the package GRETA (Genetic inteRaction and EssenTiality mApper), and people generally seemed to react positively to it. However, while submitting this package to Bioconductor, I realized that a package called greta (lowercase) already existed on CRAN. Therefore, it's now GRETTA with an extra T (Genetic inteRaction and EssenTialiTy mApper)! I am hoping that GRETTA sticks!
Currently, one DepMap release per year is prepared and made available on this GitHub page. Download the version you need using the commands below:
```bash
# 20Q1
wget https://github.com/ytakemon/GRETTA/raw/main/GRETTA_DepMap_20Q1_data.tar.gz
# 21Q4
wget https://github.com/ytakemon/GRETTA/raw/main/Additional_DepMap_data_versions/21Q4/GRETTA_DepMap_21Q4_data.tar.gz
# 22Q2
wget https://github.com/ytakemon/GRETTA/raw/main/Additional_DepMap_data_versions/22Q2/GRETTA_DepMap_22Q2_data.tar.gz
# 23Q4
wget https://github.com/ytakemon/GRETTA/raw/main/Additional_DepMap_data_versions/23Q4/GRETTA_DepMap_23Q4_data.tar.gz
```
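If you prefer to stay in R, the same archives can be fetched with base R's `download.file()` and `untar()`. This is an assumption-based convenience sketch, not part of GRETTA: the helper names are hypothetical, the list of versions is just the four listed above, and the URL layout is copied from the wget commands (20Q1 at the repository root, later releases under `Additional_DepMap_data_versions/<version>/`).

```r
# Hypothetical helpers (not part of GRETTA).
gretta_data_url <- function(version) {
  # Only the releases listed above; any other value raises an error
  version <- match.arg(version, c("20Q1", "21Q4", "22Q2", "23Q4"))
  base <- "https://github.com/ytakemon/GRETTA/raw/main"
  file <- paste0("GRETTA_DepMap_", version, "_data.tar.gz")
  if (version == "20Q1") {
    # 20Q1 sits at the top level of the repository
    paste(base, file, sep = "/")
  } else {
    # Later releases live under Additional_DepMap_data_versions/<version>/
    paste(base, "Additional_DepMap_data_versions", version, file, sep = "/")
  }
}

download_gretta_data <- function(version, dest_dir = ".") {
  url <- gretta_data_url(version)
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  tarball <- file.path(dest_dir, basename(url))
  # Large archives can exceed R's default 60-second download timeout;
  # raise it for this call only and restore the old value afterwards
  old <- options(timeout = max(3600, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  download.file(url, destfile = tarball, mode = "wb")  # binary mode for .tar.gz
  untar(tarball, exdir = dest_dir)                     # extract into dest_dir
  invisible(tarball)
}

# Example (not run here):
# download_gretta_data("23Q4", dest_dir = "GRETTA_data")
```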
The output of GI_screen() is a data frame listing every screened gene and its Interaction_score. A positive Interaction_score indicates a lethal genetic interaction, and a negative Interaction_score indicates an alleviating interaction. Users can set thresholds of their choosing, but we suggest starting with the following filters to identify candidate interactors:
```r
library(dplyr)

# Suggested starting filters for candidate interactors
screen_results %>%
  filter(
    Pval < 0.05,
    Mutant_median > 0.3 | Control_median > 0.3,
    abs(log2FC_by_median) > 0.5) %>%
  arrange(desc(Interaction_score))
```
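To make these filters concrete, the sketch below applies the same thresholds in base R to a small made-up table that uses the column names above, then labels each hit as lethal or alleviating from the sign of Interaction_score. The genes and values are purely hypothetical, and the `Gene` and `Interaction_type` columns are placeholders added for illustration, not columns GI_screen() is claimed to return.

```r
# Toy stand-in for GI_screen() output; all genes and values are made up.
screen_results <- data.frame(
  Gene              = c("GENE1", "GENE2", "GENE3", "GENE4"),
  Interaction_score = c(4.2, -3.1, 2.5, 5.0),
  Pval              = c(0.001, 0.004, 0.20, 0.01),
  Mutant_median     = c(0.60, 0.10, 0.50, 0.12),
  Control_median    = c(0.20, 0.55, 0.45, 0.05),
  log2FC_by_median  = c(1.58, -2.46, 0.15, 1.26),
  stringsAsFactors  = FALSE
)

# Same three filters; the OR condition must be grouped explicitly because
# filter() combines its comma-separated conditions with AND
keep <- with(screen_results,
             Pval < 0.05 &
               (Mutant_median > 0.3 | Control_median > 0.3) &
               abs(log2FC_by_median) > 0.5)

# which() drops NA results, matching how filter() treats NA
hits <- screen_results[which(keep), ]
hits <- hits[order(-hits$Interaction_score), ]
hits$Interaction_type <- ifelse(hits$Interaction_score > 0, "lethal", "alleviating")
hits[, c("Gene", "Interaction_score", "Interaction_type")]
```

Here GENE3 fails the p-value filter and GENE4 fails the median filter, leaving GENE1 (lethal) and GENE2 (alleviating).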
References for understanding genetic interaction networks:
- Mair, B., et al. Curr Opin Genet Dev. 2019
- Mani, R., et al. PNAS. 2008
- Tsherniak, A., et al. Cell. 2017 (https://doi.org/10.1016/j.cell.2017.06.010)
The output of coessential_map() is a data frame of Pearson correlation coefficients from each pairwise comparison against the user's gene of interest. The estimate column holds the coefficient: a score of 1 indicates perfect correlation (co-essential genes), and a score of -1 indicates perfect anti-correlation (anti-essential genes). A gene is considered a candidate co-essential or anti-essential gene if its p-value is < 0.05 and its estimate passes the inflection point of the positive or negative curve, respectively.
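Each comparison is, in essence, a Pearson correlation test between two genes' essentiality scores across cell lines. The base R sketch below illustrates that idea on made-up scores using `stats::cor.test()`, then applies the candidate rule above. It is not coessential_map()'s implementation: the scores, the `Gene` and `p.value` column names, and the fixed cutoffs standing in for the inflection points are all hypothetical.

```r
# Hypothetical essentiality scores across eight cell lines (made-up values).
goi <- c(-1.2, -0.8, -0.5, -0.1, 0.2, 0.4, 0.9, 1.3)  # gene of interest
other_genes <- list(
  GENE_A = goi + c(0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, 0.04),
  GENE_B = -goi + c(0.02, 0.04, -0.03, 0.01, -0.02, 0.03, 0.01, -0.04),
  GENE_C = c(0.3, -0.2, 0.1, 0.4, -0.3, 0.2, -0.1, 0.0)
)

# One Pearson correlation test per pairwise comparison with the gene of interest
res <- do.call(rbind, lapply(names(other_genes), function(g) {
  ct <- cor.test(goi, other_genes[[g]], method = "pearson")
  data.frame(Gene = g, estimate = unname(ct$estimate), p.value = ct$p.value,
             stringsAsFactors = FALSE)
}))

# Arbitrary placeholder cutoffs; in practice these would be the inflection
# points of the positive and negative curves described above.
pos_cutoff <- 0.5
neg_cutoff <- -0.5
res$Candidate <- ifelse(res$p.value < 0.05 & res$estimate > pos_cutoff, "co-essential",
                 ifelse(res$p.value < 0.05 & res$estimate < neg_cutoff, "anti-essential",
                        "none"))
res
```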
References for understanding essential gene networks:
- Wainberg, M., et al. Nat. Genet. 2021
- Pan, J., et al. Cell Systems. 2019
- Kim, E., et al. Life Sci. Alliance. 2019
To run GRETTA in a container, you will need to install Singularity. Please refer to the Singularity documentation for installation instructions.
There are two ways to use the GRETTA container:
- Externally
- Within the container
After pulling the image, the following warning may appear. It can either be ignored or resolved by verifying the container:
```
WARNING: unable to verify container: gretta_latest.sif
WARNING: Skipping container verification
```
To verify:
```bash
singularity verify gretta_latest.sif
```
This should report:
```
Container is signed by 1 key(s):
Verifying partition: FS:
8A189A7929B18DBE38DC651806AF5BD75B1CC530
[LOCAL]   Yuka Takemon <[email protected]>
[OK]      Data integrity verified
```
To use GRETTA externally, run an R script containing all the code needed to reproduce the tutorial through the container:
```bash
## On terminal/shell
# Create a directory for GRETTA
mkdir GRETTA_tutorial
cd GRETTA_tutorial
# Download the R script containing the ARID1A tutorial
wget https://raw.githubusercontent.com/ytakemon/GRETTA/main/Singularity/ARID1A_tutorial.R
# Download the 22Q2 data set and extract it
wget https://www.bcgsc.ca/downloads/ytakemon/GRETTA/22Q2/GRETTA_DepMap_22Q2_data.tar.gz
tar -zxvf GRETTA_DepMap_22Q2_data.tar.gz
# Pull the GRETTA container from Sylabs, https://cloud.sylabs.io/library/ytakemon/gretta/gretta
singularity pull --arch amd64 library://ytakemon/gretta/gretta:latest
# Run the ARID1A tutorial. This may take several hours depending on how many threads you have access to.
# Here we use 20 threads; please adjust accordingly.
singularity run gretta_latest.sif ARID1A_tutorial.R 20 data/
```
You should see the following files in ./output/:
- GINI_coessentiality_network_results.csv
- GRETTA_GI_screen_results.csv
- Tutorial_ARID1A_essentiality_ranked_plot.pdf
- Tutorial_ARID1A_GI_ranked_plot.pdf
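Once the run finishes, a quick way to confirm that all four files were written is a small base R check. This is a hypothetical helper, not part of GRETTA or the tutorial script; the file names are taken from the list above and the default directory `output` is assumed to be relative to where the tutorial was run.

```r
# Hypothetical helper (not part of GRETTA): report which expected tutorial
# output files are missing from an output directory.
expected_outputs <- c(
  "GINI_coessentiality_network_results.csv",
  "GRETTA_GI_screen_results.csv",
  "Tutorial_ARID1A_essentiality_ranked_plot.pdf",
  "Tutorial_ARID1A_GI_ranked_plot.pdf"
)

missing_outputs <- function(output_dir = "output") {
  expected_outputs[!file.exists(file.path(output_dir, expected_outputs))]
}

# character(0) means every expected file is present
missing_outputs("output")

# To inspect the GI screen results afterwards:
# gi <- read.csv(file.path("output", "GRETTA_GI_screen_results.csv"))
# head(gi)
```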
To work within the container instead, start with the same setup, then open an interactive shell:
```bash
## On terminal/shell
# Create a directory for GRETTA
mkdir GRETTA_tutorial
cd GRETTA_tutorial
# Download the R script containing the ARID1A tutorial
wget https://raw.githubusercontent.com/ytakemon/GRETTA/main/Singularity/ARID1A_tutorial.R
# Download the 22Q2 data set and extract it
wget https://www.bcgsc.ca/downloads/ytakemon/GRETTA/22Q2/GRETTA_DepMap_22Q2_data.tar.gz
tar -zxvf GRETTA_DepMap_22Q2_data.tar.gz
# Pull the GRETTA container from Sylabs, https://cloud.sylabs.io/library/ytakemon/gretta/gretta
singularity pull --arch amd64 library://ytakemon/gretta/gretta:latest
# Open an interactive shell inside the container
singularity shell gretta_latest.sif
```
Inside the container, open the pre-configured R:
```
Singularity> R
```
Within R, the necessary packages are already installed and can be loaded directly:
```r
library(GRETTA)
library(tidyverse)
# Ready to follow the tutorial on the main page: https://github.com/ytakemon/GRETTA
```
For the genetic interaction screen, context-specific cell lines can be selected in two ways:
- Using the optional arguments of select_cell_lines(). See ?select_cell_lines in R for more info.
```r
# List of optional arguments
select_cell_lines(
  input_gene = NULL,
  input_AA_change = NULL,
  input_disease = NULL,
  input_disease_subtype = NULL,
  data_dir = NULL)

# Example: select TP53 mutants in SCLC only
select_cell_lines(
  input_gene = "TP53",
  input_disease = "Lung Cancer",
  input_disease_subtype = "Small Cell Lung Cancer (SCLC)",
  data_dir = "/path/to/DepMap_data/")
```
- Manual selection
```r
# If you already have specific cell lines in mind to use as mutants and controls,
# you can also set them manually:
GRETTA_data_dir <- "/path/to/DepMap_data/"
custom_controls <- c("ACH-000001", "ACH-000002", "ACH-000003")
custom_mutants <- c("ACH-000010", "ACH-000011", "ACH-000012")
screen_results <- GI_screen(
  control_IDs = custom_controls,
  mutant_IDs = custom_mutants,
  core_num = 5, # depends on how many cores you have
  output_dir = "path/to/results/folder/", # results are saved here and also returned
  data_dir = GRETTA_data_dir,
  test = FALSE)
```
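When groups are picked by hand, it can help to sanity-check them before a long GI_screen() run. The sketch below is a hypothetical helper, not part of GRETTA; it assumes DepMap IDs follow the `ACH-` plus six digits pattern seen in the examples above, and it simply rejects malformed IDs and any cell line assigned to both groups.

```r
# Hypothetical helper (not part of GRETTA): sanity-check hand-picked groups
# before passing them to GI_screen().
check_custom_groups <- function(control_IDs, mutant_IDs) {
  ids <- c(control_IDs, mutant_IDs)
  # Assumed DepMap ID format: "ACH-" followed by six digits
  bad <- ids[!grepl("^ACH-[0-9]{6}$", ids)]
  if (length(bad) > 0) {
    stop("Not in ACH-###### format: ", paste(bad, collapse = ", "))
  }
  # A cell line should not be both a control and a mutant
  shared <- intersect(control_IDs, mutant_IDs)
  if (length(shared) > 0) {
    stop("IDs present in both groups: ", paste(shared, collapse = ", "))
  }
  invisible(TRUE)
}

check_custom_groups(
  c("ACH-000001", "ACH-000002", "ACH-000003"),
  c("ACH-000010", "ACH-000011", "ACH-000012")
)
```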
For the essentiality network analysis, context-specific cell lines can be selected in two ways:
- Using the optional arguments of coessential_map(). See ?coessential_map in R for more info.
```r
# List of optional arguments
coessential_map(
  input_gene = NULL,
  input_disease = NULL,
  input_cell_lines = NULL,
  core_num = NULL,
  output_dir = NULL,
  data_dir = NULL,
  output_filename = NULL,
  test = FALSE
)

# Example essentiality analysis for ARID1A in Lung Cancer cell lines only:
res <- coessential_map(
  input_gene = "ARID1A",
  input_disease = "Lung Cancer",
  core_num = 4,
  output_dir = "path/to/results/folder/", # results are saved here and also returned
  data_dir = "/path/to/DepMap_data/"
)
```
- Manual selection
```r
# If you already have a specific group of cell lines in mind, you can also set them manually:
custom_line_list <- c("ACH-000001", "ACH-000002", "ACH-000003") # ..., add further DepMap IDs as needed
res <- coessential_map(
  input_gene = "ARID1A",
  input_cell_lines = custom_line_list,
  core_num = 4,
  output_dir = "path/to/results/folder/", # results are saved here and also returned
  data_dir = "/path/to/DepMap_data/"
)
```