
Frequently Asked Questions

Yuka Takemon edited this page Jun 3, 2024 · 24 revisions

Below are answers to several common questions. More will be added as related questions come up in the Issues section.

About GRETTA:

Accessing data:

Interpreting results:

Specific applications:


About GRETTA:

Q: How to cite this package?

A manuscript describing GRETTA and its usage is now available on BioRxiv (Takemon, Y. and Marra, MA., 2020).

Each GRETTA release has also been archived and made citable on Zenodo. Run citation("GRETTA") in R for the full citation, including the DOI (10.5281/zenodo.6940757).

Q: How does GRETTA determine zygosity?

DepMap provides copy number (CN) data as log2(relative CN + 1), so a diploid cell with two alleles has a CN of one. We infer zygosity by binning cell lines on their CN values using the following fixed thresholds:

  • Amplified >= 1.25
  • Neutral < 1.25 & >= 0.75
  • Shallow loss < 0.75 & > 0.25
  • Deep loss <= 0.25
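
As an illustration only, the thresholds above can be written as a small base R function. The name classify_cn() and the example values are hypothetical and are not part of GRETTA; this is a sketch of the binning rule, not the package's internal code.

```r
# Hypothetical helper (not part of GRETTA): bin log2(relative CN + 1) values
# into the four copy number categories listed above.
classify_cn <- function(cn) {
  ifelse(cn >= 1.25, "Amplified",
    ifelse(cn >= 0.75, "Neutral",
      ifelse(cn > 0.25, "Shallow loss", "Deep loss")))
}

# Made-up CN values, including the boundary cases
cn_values <- c(1.60, 1.25, 1.00, 0.75, 0.50, 0.25, 0.10)
print(data.frame(CN = cn_values, Status = classify_cn(cn_values)))
```

Note that the boundaries are inclusive for Amplified (1.25), Neutral (0.75), and Deep loss (0.25), matching the list above.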

A detailed description can be found on BioRxiv (Takemon, Y. and Marra, MA., 2020).

Q: Wasn't this package called GINI, GINIR, or GRETA? Why the name change?

This package was previously called GINI, GINIR, and GRETA. It is now called GRETTA.

TLDR version: People (who decide the fate of my Ph.D. candidacy) told me they didn't like the name, so I pivoted as I am a people pleaser.

Long version: My Ph.D. committee member pointed out that GINI (Genetic Interaction Network Identifier) was an overused acronym, which was true (see Gini coefficient, GINI impurity for decision trees, etc.)... As a result, I stuck an R behind the name, GINIR (Genetic Interaction Network IdentifieR), to give it an R-programming vibe, which my PI said sounded weird (he pronounced it "Guh-nerr"). So then I re-re-named the package to GRETA (Genetic inteRaction and EssenTiality mApper), and people generally seemed to react positively to it. However, in the process of submitting this package to Bioconductor, I realized that the lowercase greta was already a package on CRAN. Therefore, it's now GRETTA with an extra T (Genetic inteRaction and EssenTialiTy mApper)! I am hoping that GRETTA sticks!


Accessing data:

Q: How to download and use other versions of DepMap data?

Currently, one DepMap version per year is prepared and made accessible on this GitHub page. Download the version you need using the matching command below:

# 20Q1
wget https://github.com/ytakemon/GRETTA/raw/main/GRETTA_DepMap_20Q1_data.tar.gz

# 21Q4
wget https://github.com/ytakemon/GRETTA/raw/main/Additional_DepMap_data_versions/21Q4/GRETTA_DepMap_21Q4_data.tar.gz

# 22Q2
wget https://github.com/ytakemon/GRETTA/raw/main/Additional_DepMap_data_versions/22Q2/GRETTA_DepMap_22Q2_data.tar.gz

# 23Q4
wget https://github.com/ytakemon/GRETTA/raw/main/Additional_DepMap_data_versions/23Q4/GRETTA_DepMap_23Q4_data.tar.gz
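
If you would rather stay in R, the same files can presumably be fetched with base R's download.file() and untar(). The sketch below is only an illustration: gretta_data_url() and fetch_gretta_data() are hypothetical helpers, not GRETTA functions, and the URLs are assembled from the wget commands above.

```r
# Hypothetical helper (not part of GRETTA): rebuild the URLs listed above.
gretta_data_url <- function(version) {
  base <- "https://github.com/ytakemon/GRETTA/raw/main"
  tarball <- paste0("GRETTA_DepMap_", version, "_data.tar.gz")
  if (version == "20Q1") {
    # The 20Q1 archive sits at the top level of the repository
    paste(base, tarball, sep = "/")
  } else {
    paste(base, "Additional_DepMap_data_versions", version, tarball, sep = "/")
  }
}

# Hypothetical helper: download one archive and extract it into dest_dir.
fetch_gretta_data <- function(version, dest_dir = ".") {
  stopifnot(version %in% c("20Q1", "21Q4", "22Q2", "23Q4"))
  tarball <- file.path(dest_dir, basename(gretta_data_url(version)))
  download.file(gretta_data_url(version), destfile = tarball, mode = "wb")
  untar(tarball, exdir = dest_dir)
  invisible(tarball)
}

# fetch_gretta_data("22Q2")  # uncomment to download; the archive is large
```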

Interpreting results:

Q: How to interpret the genetic screening results from GI_screen()?

GI_screen() returns a data frame listing every screened gene along with its Interaction_score. A positive Interaction_score indicates a lethal genetic interaction, and a negative Interaction_score indicates an alleviating interaction. Users can set thresholds of their choosing, but we suggest starting with the following filters to discover candidate interactors:

library(dplyr) # provides %>%, filter(), arrange(), and desc()

screen_results %>%
  filter(
    Pval < 0.05,
    Mutant_median > 0.3 | Control_median > 0.3,
    abs(log2FC_by_median) > 0.5) %>%
  arrange(desc(Interaction_score)) # highest scores first
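
To see what these filters do, here is a minimal base R sketch on a toy table. All values and the gene column are made up for illustration; only the column names Pval, Mutant_median, Control_median, log2FC_by_median, and Interaction_score come from the filter above.

```r
# Toy stand-in for GI_screen() output; every number here is invented.
toy_results <- data.frame(
  gene              = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  Pval              = c(0.010, 0.200, 0.030, 0.001),
  Mutant_median     = c(0.80, 0.90, 0.10, 0.05),
  Control_median    = c(0.20, 0.10, 0.05, 0.60),
  log2FC_by_median  = c(2.00, 3.17, 1.00, -3.58),
  Interaction_score = c(2.0, 0.7, 1.5, -3.0),
  stringsAsFactors  = FALSE
)

# Same three conditions as the suggested dplyr filter
keep <- toy_results$Pval < 0.05 &
  (toy_results$Mutant_median > 0.3 | toy_results$Control_median > 0.3) &
  abs(toy_results$log2FC_by_median) > 0.5

candidates <- toy_results[keep, ]
candidates <- candidates[order(-candidates$Interaction_score), ]
print(candidates)
```

In this toy table GENE_B is dropped by the p-value filter and GENE_C by the median filter, leaving GENE_A and GENE_D as candidates.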

References for understanding genetic interaction networks:

Q: How to interpret the essentiality network results from coessential_map()?

coessential_map() returns a data frame with the Pearson correlation coefficient of each pairwise comparison against the user's gene of interest. The estimate column holds the coefficient: a value of 1 indicates a perfect correlation (co-essential genes), and a value of -1 indicates a perfect anti-correlation (anti-essential genes). A gene is considered a candidate co-/anti-essential gene if its p-value is < 0.05 and its coefficient passes the inflection point of the positive or negative curve, respectively.
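
For intuition about the estimate column, the sketch below runs base R's cor.test() on invented dependency scores for a gene of interest and two partner genes. It is not GRETTA's implementation; the gene labels, the scores, and the p_value column name are assumptions, and the inflection-point step is left out.

```r
# Invented dependency scores across five cell lines
gene_of_interest <- c(0.10, 0.40, 0.50, 0.80, 0.90)
partner_a        <- c(0.20, 0.30, 0.60, 0.70, 0.95) # tends to rise with the gene of interest
partner_b        <- c(0.90, 0.70, 0.40, 0.30, 0.05) # tends to fall as the gene of interest rises

# Hypothetical helper: one Pearson test against the gene of interest
test_one <- function(scores, label) {
  res <- cor.test(gene_of_interest, scores, method = "pearson")
  data.frame(
    gene = label,
    estimate = unname(res$estimate), # Pearson correlation coefficient
    p_value = res$p.value,
    stringsAsFactors = FALSE
  )
}

toy_map <- rbind(test_one(partner_a, "PARTNER_A"), test_one(partner_b, "PARTNER_B"))
toy_map$direction <- ifelse(toy_map$estimate > 0, "co-essential", "anti-essential")
print(toy_map)
```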

References for understanding essential gene networks:


Specific applications:

Q: How to run GRETTA with Singularity?

For this, you will need to install Singularity. Please refer to their documentation.

There are two ways you can choose to interact with GRETTA:

  1. Externally
  2. Within the container

After pulling the image, the following warning may appear. You can either ignore it or verify the container.

WARNING: unable to verify container: gretta_latest.sif
WARNING: Skipping container verification

To verify:

singularity verify gretta_latest.sif
Container is signed by 1 key(s):

Verifying partition: FS:
8A189A7929B18DBE38DC651806AF5BD75B1CC530
[LOCAL]   Yuka Takemon <[email protected]>
[OK]      Data integrity verified

1. External method

Here we show you how to run an R script containing all code necessary to reproduce the tutorial.

## On terminal/shell
# Create a directory for GRETTA
mkdir GRETTA_tutorial
cd GRETTA_tutorial

# Download the R script containing the ARID1A tutorial
wget https://raw.githubusercontent.com/ytakemon/GRETTA/main/Singularity/ARID1A_tutorial.R

# Download the 22Q2 data set and extract 
wget https://www.bcgsc.ca/downloads/ytakemon/GRETTA/22Q2/GRETTA_DepMap_22Q2_data.tar.gz
tar -zxvf GRETTA_DepMap_22Q2_data.tar.gz

# Pull the GRETTA container on Sylabs, https://cloud.sylabs.io/library/ytakemon/gretta/gretta
singularity pull --arch amd64 library://ytakemon/gretta/gretta:latest

# Run the ARID1A tutorial. This may take several hours depending on how many threads you have access to.
# Here we use 20 threads. Please adjust accordingly.
singularity run gretta_latest.sif ARID1A_tutorial.R 20 data/

You should see the following files in ./output/:

  • GINI_coessentiality_network_results.csv
  • GRETTA_GI_screen_results.csv
  • Tutorial_ARID1A_essentiality_ranked_plot.pdf
  • Tutorial_ARID1A_GI_ranked_plot.pdf
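
A quick way to confirm the run finished is to check for these files from R. The helper missing_outputs() below is a hypothetical convenience, not part of GRETTA; it assumes you start R in the GRETTA_tutorial directory.

```r
# File names copied from the list above
expected_files <- c(
  "GINI_coessentiality_network_results.csv",
  "GRETTA_GI_screen_results.csv",
  "Tutorial_ARID1A_essentiality_ranked_plot.pdf",
  "Tutorial_ARID1A_GI_ranked_plot.pdf"
)

# Hypothetical helper: return the expected files that are absent from output_dir
missing_outputs <- function(output_dir, files = expected_files) {
  files[!file.exists(file.path(output_dir, files))]
}

missing <- missing_outputs("output")
if (length(missing) == 0) {
  message("All tutorial outputs found.")
} else {
  message("Missing: ", paste(missing, collapse = ", "))
}
```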

2. Within container method

## On terminal/shell
# Create a directory for GRETTA
mkdir GRETTA_tutorial
cd GRETTA_tutorial

# Download the R script containing the ARID1A tutorial
wget https://raw.githubusercontent.com/ytakemon/GRETTA/main/Singularity/ARID1A_tutorial.R

# Download the 22Q2 data set and extract 
wget https://www.bcgsc.ca/downloads/ytakemon/GRETTA/22Q2/GRETTA_DepMap_22Q2_data.tar.gz
tar -zxvf GRETTA_DepMap_22Q2_data.tar.gz

# Pull the GRETTA container on Sylabs, https://cloud.sylabs.io/library/ytakemon/gretta/gretta
singularity pull --arch amd64 library://ytakemon/gretta/gretta:latest

# Open an interactive shell inside the container
singularity shell gretta_latest.sif

## Inside the container:
# Open pre-configured R
Singularity> R

## Within R:
# Necessary packages have been installed and thus can be loaded directly
> library(GRETTA)
> library(tidyverse)

# Ready to follow tutorial on the main page: https://github.com/ytakemon/GRETTA

Q: How can context-specific genetic screens or essentiality network analyses be performed?

For the genetic interaction screen, context-specific cell lines can be selected in two ways:

  1. Using optional arguments provided in select_cell_lines(). See ?select_cell_lines in R for more info.
# List of optional arguments 
select_cell_lines(
  input_gene = NULL,
  input_AA_change = NULL,
  input_disease = NULL,
  input_disease_subtype = NULL,
  data_dir = NULL)

# Example selecting TP53 mutants only in SCLC subtypes
select_cell_lines(input_gene = "TP53",
  input_disease = "Lung Cancer",
  input_disease_subtype = "Small Cell Lung Cancer (SCLC)",
  data_dir = "/path/to/DepMap_data/")
  2. Manual selection
# If you already have specific cell lines in mind to use as mutants and controls, you can also set them manually:

custom_controls <- c("ACH-000001", "ACH-000002", "ACH-000003")
custom_mutants <- c("ACH-000010", "ACH-000011", "ACH-000012")
GRETTA_data_dir <- "/path/to/DepMap_data/" # folder containing the extracted DepMap data

screen_results <- GI_screen(
  control_IDs = custom_controls,
  mutant_IDs = custom_mutants,
  core_num = 5, # depends on how many cores you have
  output_dir = "path/to/results/folder/", # Will save your results here as well as in the variable
  data_dir = GRETTA_data_dir,
  test = FALSE)
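
If your cell lines live in a table rather than in hand-typed vectors, the two ID vectors can be derived from it. The sketch below assumes a hypothetical table with DepMap_ID and Group columns (both names and all IDs are placeholders) and also shows one way to pick a core count with parallel::detectCores().

```r
# Hypothetical table of pre-selected cell lines; names and IDs are placeholders.
my_lines <- data.frame(
  DepMap_ID = c("ACH-000001", "ACH-000002", "ACH-000003",
                "ACH-000010", "ACH-000011", "ACH-000012"),
  Group = c("control", "control", "control", "mutant", "mutant", "mutant"),
  stringsAsFactors = FALSE
)

custom_controls <- my_lines$DepMap_ID[my_lines$Group == "control"]
custom_mutants  <- my_lines$DepMap_ID[my_lines$Group == "mutant"]

# Sanity check: a cell line should not appear in both groups
stopifnot(length(intersect(custom_controls, custom_mutants)) == 0)

# Leave one core free; fall back to 1 if the core count cannot be detected
n_cores <- parallel::detectCores()
core_num <- if (is.na(n_cores)) 1 else max(1, n_cores - 1)
```

The resulting custom_controls, custom_mutants, and core_num can then be passed to GI_screen() as in the example above.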

For the essentiality network analysis, context-specific cell lines can be selected in two ways:

  1. Using optional arguments provided in coessential_map(). See ?coessential_map in R for more info.
# List of optional arguments 
coessential_map(
       input_gene = NULL,
       input_disease = NULL,
       input_cell_lines = NULL,
       core_num = NULL,
       output_dir = NULL,
       data_dir = NULL,
       output_filename = NULL,
       test = FALSE
     )

# Example essentiality analysis for ARID1A in Lung Cancer cell lines only:
res <- coessential_map(
       input_gene = "ARID1A",
       input_disease = "Lung Cancer",
       core_num = 4,
       output_dir = "path/to/results/folder/", # Will save your results here as well as in the variable
       data_dir = "/path/to/DepMap_data/"
     )
  2. Manual selection
# If you already have specific groups of cell lines in mind, you can also set them manually:

custom_line_list <- c("ACH-000001", "ACH-000002", "ACH-000003") # add further DepMap IDs as needed

res <- coessential_map(
       input_gene = "ARID1A",
       input_cell_lines = custom_line_list,
       core_num = 4,
       output_dir = "path/to/results/folder/", # Will save your results here as well as in the variable
       data_dir = "/path/to/DepMap_data/"
     )