GAMBLR.results - an R package with convenience functions for working with GAMBL results (restricted to users with access to GSC).
GAMBLR.results is an open-source package. It can be easily installed directly from GitHub:
devtools::install_github("morinlab/GAMBLR.results", repos = BiocManager::repositories())
This will install the GAMBLR.results package with all necessary dependencies. It requires access to the GSC resources and is not intended to be used outside of GSC. If you are interested in standalone functionality, please refer to the documentation of the GAMBLR.data package or any other individual child package.
If you have access to gphost, the easiest way to obtain and contribute to GAMBLR is to clone the repository:
cd
git clone git@github.com:morinlab/GAMBLR.results.git
In your R editor of choice, set your working directory to the place you just cloned the repo.
setwd("~/GAMBLR.results")
Install the package in R by running the following command (requires the devtools package)
devtools::install()
As GAMBL users (GAMBLRs, so to speak) rely on the functionality of this package, the Master branch is protected. All commits must be submitted via pull request on a branch. Please refer to the GAMBL documentation for details on how to do this.
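The branch-then-pull-request rule above could look roughly like this from a terminal. This is only a sketch: the helper name `start_feature_branch` and any branch names are hypothetical, and the GAMBL documentation remains the reference for the actual contribution workflow.

```shell
# Hypothetical helper (not part of GAMBLR): refuse to work on the protected
# branch and create a feature branch instead.
start_feature_branch() {
  branch="$1"
  case "$branch" in
    ""|master|main)
      echo "Pick a feature branch name other than '$branch'." >&2
      return 1
      ;;
  esac
  # Create the branch and switch to it; commits now land here, not on master.
  git checkout -b "$branch"
}
```

For example, running `start_feature_branch fix-docs` inside `~/GAMBLR.results`, committing, and then pushing with `git push -u origin fix-docs` leaves a branch that can be opened as a pull request.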
If you don't have access to gphost on GSC, you can still run GAMBLR functions: remote support was developed for this purpose. This section explains how to run GAMBLR remotely on a local machine (i.e. on your own computer). There are two approaches, each with its own advantages and limitations, and both are covered below.
This section details how to deploy GAMBLR with limited functionality. This approach requires a working GSC VPN connection, unless you are connected directly to the GSC network.
- You need a working GSC VPN connection to use this approach. For setting up a VPN connection see this guide. Keep in mind that a VPN connection is not needed if you're already connected to the GSC network.
- Clone GAMBL and GAMBLR to your local computer. From your terminal run the following commands (folder structures can be whatever you want...)
mkdir ~/git_repos
cd ~/git_repos # set as working directory
git clone https://github.com/morinlab/gambl
git clone https://github.com/morinlab/GAMBLR
- Update the paths in your local config.yml (GAMBLR) to point to the recently cloned, local gambl folder (repo_base). In your favorite text editor, edit the repo_base line shown below (under remote). Similarly, edit the project_base line above it to point to where you will eventually sync the GAMBL results.
remote:
project_base: "/path/to/your/local/gambl_results_directory/"
repo_base: "/path/to/your/local/gambl_repo/"
- Set the working directory in RStudio. Open RStudio on your local machine and locate the repo you cloned previously.
setwd("~/git_repos/GAMBLR")
- Install GAMBLR in your local RStudio.
devtools::install()
- Load packages.
library(GAMBLR)
- Execute the following in the RStudio console to make use of the updated paths in the config.yml from step 3.
Sys.setenv(R_CONFIG_ACTIVE = "remote")
Alternatively, you can add the content of ~/git_repos/GAMBLR/.Rprofile to your ~/.Rprofile file. This way, you do not need to enter the command above every time you start your R session (recommended).
cat ~/git_repos/GAMBLR/.Rprofile >> ~/.Rprofile
- Test if the setup was successful (e.g. call get_gambl_metadata() to retrieve metadata for all GAMBL samples).
get_gambl_metadata() %>%
head()
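One caveat with the `cat ... >> ~/.Rprofile` step above is that running it twice appends the same content twice. A minimal sketch of a guarded append follows; the helper name `append_once` and the marker comments are hypothetical, not something GAMBLR provides.

```shell
# Hypothetical helper (not part of GAMBLR): append SRC to DEST once, wrapped in
# marker comments so a second run can detect the earlier copy and skip it.
append_once() {
  src="$1"
  dest="$2"
  marker="# >>> GAMBLR remote settings >>>"

  if [ ! -f "$src" ]; then
    echo "Source file not found: $src" >&2
    return 1
  fi

  # Skip if the marker from an earlier run is already in DEST.
  if [ -f "$dest" ] && grep -qF "$marker" "$dest"; then
    echo "Already present in $dest; nothing to do."
    return 0
  fi

  # Lines starting with '#' are comments in R, so the profile stays valid.
  {
    echo "$marker"
    cat "$src"
    echo "# <<< GAMBLR remote settings <<<"
  } >> "$dest"
}
```

With the paths used in this guide, the call would be `append_once ~/git_repos/GAMBLR/.Rprofile ~/.Rprofile`.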
This section details how to obtain GAMBLR with full functionality, using a dedicated Snakemake file to retrieve all necessary files and dependencies.
- Make sure you have a working SSH key set up with a passphrase. If not, follow the instructions at GSC Wiki. Warning: this will not work with a passphrase-less SSH connection.
- Clone GAMBL and GAMBLR to your local computer. From your terminal, run the following commands.
mkdir ~/git_repos
cd ~/git_repos
git clone https://github.com/morinlab/gambl
git clone https://github.com/morinlab/GAMBLR
- On your local machine, make a new directory, called gambl_results in this example.
mkdir ~/gambl_results/
- In the config.yml file of your local GAMBLR folder, update the paths under the remote field to point to the recently cloned local gambl folder (repo_base) and the recently created gambl_results folder (project_base). Also update the host field to contain your username (you can use any gphost here). For example:
remote:
project_base: "~/gambl_results/"
repo_base: "~/git_repos/gambl/"
...
host: "your_username@gphost01.bcgsc.ca"
- Copy the following files from your recently cloned GAMBLR directory into the gambl_results folder: config.yml and get_gambl_results.smk.
cp ~/git_repos/GAMBLR/config.yml ~/gambl_results/
cp ~/git_repos/GAMBLR/get_gambl_results.smk ~/gambl_results/
- Add the following environment variables to your ~/.bashrc or ~/.zshrc, or set them in some other way that ensures they are present in your session (you can set them manually each time if you prefer, as long as they are set). For example, run the following commands in your local terminal (with updated values):
export GSC_USERNAME="your_gsc_username"
export GSC_KEY="path_to_SSH_key_with_passphrase_from_step_1"
export GSC_PASSPHRASE="passphrase_from_step_1"
- Open RStudio locally and, in the RStudio console, set the working directory to the GAMBLR folder you cloned in step 2.
setwd("~/git_repos/GAMBLR")
- Install and load GAMBLR into your local R session.
devtools::install()
- In the terminal on your local machine, create a new snakemake environment from the get_gambl_results.yml file (get_gambl_results_linux.yml for Linux). Note that you can name this new environment whatever you would like. In this example, the new environment is called snakemake_gambl .
cd ~/gambl_results
conda env create --name snakemake_gambl --file ~/git_repos/GAMBLR/get_gambl_results.yml
- Activate this newly created snakemake environment with:
conda activate snakemake_gambl
- Retrieve the necessary files (this downloads a local copy of all files needed to run a collection of GAMBLR functions). Using --cores 1 is strongly advised, since it seems to be the more stable option. In addition, if your sync gets interrupted, you only need to restart the syncing of one file, rather than several as would be the case when running on multiple cores.
snakemake -s get_gambl_results.smk --cores 1
- In RStudio (local), open test_remote.R in the GAMBLR master folder.
- Execute the following in the RStudio console to make use of the updated paths in the config.yml from step 5.
Sys.setenv(R_CONFIG_ACTIVE = "remote")
Alternatively, you can add the content of ~/git_repos/GAMBLR/.Rprofile to your ~/.Rprofile file. This way, you do not need to enter the command above every time you start your R session (recommended).
cat ~/git_repos/GAMBLR/.Rprofile >> ~/.Rprofile
- Check what files (if any) are currently missing.
check_gamblr_config()
- You should now be all set to explore a collection of GAMBLR functions remotely on your local machine. For example, you could try the following test code to confirm that your setup was successful. For a set of comprehensive examples and tutorials, please refer to the test_remote.R script.
get_gambl_metadata() %>%
head()
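Several of the steps above have to be in place before the snakemake sync can work: the three exported variables, the SSH key file, and the two files copied into the results folder. A small pre-flight check, sketched below, could catch a missed step early. The function name `preflight_gambl_sync` is hypothetical, and the default folder simply mirrors the example paths used in this guide.

```shell
# Hypothetical pre-flight check (not part of GAMBLR): confirm the pieces the
# steps above set up are in place before starting the sync.
preflight_gambl_sync() {
  results_dir="${1:-$HOME/gambl_results}"
  problems=0

  # The three environment variables exported earlier must be non-empty.
  for name in GSC_USERNAME GSC_KEY GSC_PASSPHRASE; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Environment variable $name is not set." >&2
      problems=$((problems + 1))
    fi
  done

  # GSC_KEY should point at an existing SSH key file.
  if [ -n "${GSC_KEY:-}" ] && [ ! -f "$GSC_KEY" ]; then
    echo "GSC_KEY does not point to a file: $GSC_KEY" >&2
    problems=$((problems + 1))
  fi

  # The two files copied from the GAMBLR clone must sit in the results folder.
  for file in config.yml get_gambl_results.smk; do
    if [ ! -f "$results_dir/$file" ]; then
      echo "Missing $results_dir/$file" >&2
      problems=$((problems + 1))
    fi
  done

  # Succeed only when nothing was flagged.
  [ "$problems" -eq 0 ]
}
```

Running `preflight_gambl_sync ~/gambl_results && snakemake -s get_gambl_results.smk --cores 1` from the results folder would then start the sync only when every check passes.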
Note: if you are seeing the following message when trying to use GAMBLR, please ensure that the config/gambl repo is set up properly (step 5) and that you are using the remote mode (step 13).
get_gambl_metadata(seq_type_filter = "capture") %>%
pull(cohort) %>%
table()
Error: '/projects/rmorin/projects/gambl-repos/gambl-rmorin/data/metadata/gambl_all_outcomes.tsv' does not exist.
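The path in the error above is a GSC location, which suggests R is still reading the default paths rather than the remote ones. The sketch below checks the two things the note points at from a terminal. It rests on assumptions: that the metadata file sits at data/metadata/gambl_all_outcomes.tsv inside the local gambl clone (mirroring the error path), and that the GAMBLR .Rprofile content mentions R_CONFIG_ACTIVE. The function name `diagnose_remote_setup` is hypothetical.

```shell
# Hypothetical diagnostic (not part of GAMBLR). The metadata location and the
# .Rprofile content checked here are assumptions, as stated in the text above.
diagnose_remote_setup() {
  repo_base="${1:-$HOME/git_repos/gambl}"
  rprofile="${2:-$HOME/.Rprofile}"
  problems=0

  # Remote mode: R_CONFIG_ACTIVE is exported in the shell or set via .Rprofile.
  if [ "${R_CONFIG_ACTIVE:-}" = "remote" ]; then
    echo "remote mode: exported in this shell"
  elif [ -f "$rprofile" ] && grep -q 'R_CONFIG_ACTIVE' "$rprofile"; then
    echo "remote mode: mentioned in $rprofile"
  else
    echo "remote mode: not configured"
    problems=$((problems + 1))
  fi

  # Local counterpart of the file named in the error message.
  if [ -f "$repo_base/data/metadata/gambl_all_outcomes.tsv" ]; then
    echo "metadata: found under $repo_base"
  else
    echo "metadata: missing under $repo_base (check repo_base in config.yml)"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}
```

A session started from RStudio may not inherit variables exported in a terminal, so a "not configured" result here does not rule out that `Sys.setenv(R_CONFIG_ACTIVE = "remote")` was already run inside R.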