Interactive Analysis on Terra

This repository and its associated files are a toy demo of interactive analysis on the Terra.Bio cloud platform.

NOTE: It DOES NOT teach how to perform an analysis methodically; instead, it acts as a "cookbook" of best practices to ease your work when analysing interactively on Terra, specifically:

Files/code organization.
Data handling and organization.
Associated output handling and archiving/storage.
Version control and collaboration.

# create `data` folder if it doesn't exist (in the current working directory)
if (!dir.exists("data")) {
  dir.create("data", recursive = TRUE)
}

# create `output` folder if it doesn't exist (in the current working directory)
if (!dir.exists("output")) {
  dir.create("output", recursive = TRUE)
}
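From the Terminal, a minimal shell equivalent of the R check above is sketched below; `mkdir -p` creates each folder only when it is missing and does not complain if it already exists.

```shell
# Create the `data` and `output` folders in the current working directory.
# `mkdir -p` succeeds silently when a folder already exists, so this is safe to re-run.
set -eu
mkdir -p data output
```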

IMPORTANT: Copy input data from another workspace

Use the gsutil command-line tool to copy the data from a Data Assets workspace into the data subdirectory of your working directory. You can find the workspace_bucket name in the Cloud Information section after opening the source Terra workspace.

# copy raw data into `data` subdirectory (command line)
gsutil cp -r gs://workspace_bucket/path_to_terra_workspace data/

# OPTIONAL Advanced: from an R script
system(command = "gsutil cp -r gs://workspace_bucket/path_to_terra_workspace data/")
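A slightly more defensive sketch of the copy step, assuming the same placeholder bucket path as above: `gsutil -m` (a standard gsutil top-level flag) parallelises the copy, while the `run` helper and the `DRY_RUN` switch are hypothetical additions for this sketch that let you preview the command before anything is copied.

```shell
# Sketch: copy raw input data from a workspace bucket into data/.
# The bucket path is the placeholder from the text; replace it with your own.
# DRY_RUN is a hypothetical switch for this sketch: by default (DRY_RUN=1) the
# command is only printed; set DRY_RUN=0 to actually run gsutil.
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # preview: print the command without executing it
  else
    "$@"        # execute the command exactly as given
  fi
}

mkdir -p data
# -m copies files in parallel (useful for many files); -r copies folders recursively.
run gsutil -m cp -r "gs://workspace_bucket/path_to_terra_workspace" data/
```

Once the printed command looks right, run the script again with DRY_RUN=0 to perform the copy.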

Perform your Analysis

Now that your raw data is inside the data subfolder, it's time to import it into your software (Python or R) and begin your analysis. All your data import commands should point to relative paths such as data/some_file.ext or data/data_sub_folder/some_file.ext.

This matters for three reasons:

It keeps your file paths consistent across all your scripts.
Your whole analysis is reproducible on another VM/cloud environment (CE) with the same setup, i.e. a data subfolder for storing raw data.
Above all, collaboration becomes easy: there is no need to change file paths back and forth in the same script.

output subfolder: all analysis products (processed data, plots and documents) should be saved in this folder. You can expand the organization inside this folder with further subfolders as needed.
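Because every import uses paths relative to the project root, a script started from another directory will fail with confusing "file not found" errors. A minimal guard, assuming the data/output layout described here (`check_layout` is a hypothetical helper name), could look like:

```shell
# Hypothetical helper: returns non-zero (with a message on stderr) unless both
# data/ and output/ exist in the current working directory.
check_layout() {
  for d in data output; do
    if [ ! -d "$d" ]; then
      echo "missing '$d/' in $(pwd); run this from the project root" >&2
      return 1
    fi
  done
  return 0
}
```

Call `check_layout || exit 1` at the top of a shell script before any step that reads from data/ or writes to output/.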

About the analysis

This is a toy analysis for demo purposes, using the mtcars and/or iris datasets from R to perform simple wrangling and visualization. The same ideas can be applied to any rectangular dataset, i.e. a flat file.

Tasks:

Copy data from the workspace
Clean the iris data by converting the measurement columns from centimetres to metres
Create a simple scatter plot of Sepal.Length vs Petal.Length, with points coloured by Species
Create a multi-faceted histogram of all the measurement columns, coloured by Species
Save/export the cleaned iris data back to the workspace under a "processed" folder
Save/export the two plots back to the workspace under a "processed" folder
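The demo performs these tasks in R or Python. As a language-neutral illustration of the cleaning task only, converting centimetres to metres means dividing each measurement column by 100 (1 m = 100 cm). The awk sketch below assumes a hypothetical data/iris.csv with a header row, no quoted fields, and R's iris column order (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species); the output name output/iris_clean.csv is also an assumption.

```shell
# Sketch of the cleaning task: divide the four iris measurement columns by 100.
# Assumes a hypothetical comma-separated file with a header row and columns
# Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species (no quoted fields).
set -eu

cm_to_m() {
  # Print the header unchanged; on every other row, divide columns 1-4 by 100.
  awk -F',' 'BEGIN { OFS = "," }
    NR == 1 { print; next }
    { for (i = 1; i <= 4; i++) $i = $i / 100; print }' "$1"
}

mkdir -p output
# Only run when the (hypothetical) input file is present.
if [ -f data/iris.csv ]; then
  cm_to_m data/iris.csv > output/iris_clean.csv
fi
```

For example, the row 5.1,3.5,1.4,0.2,setosa becomes 0.051,0.035,0.014,0.002,setosa.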

IMPORTANT: Export your output to Terra Workspace

Once your analysis is done, it's time for some housekeeping. If you created files or analysis results inside the output folder that you need to store in Terra for future sharing or reuse, copy them to the workspace bucket. The process mirrors the earlier step of copying raw data into the data subfolder, again using the gsutil command-line tool.

For the purposes of this demo, we use the environment variable $OWNER_EMAIL so that you and everyone else can find your own output in the shared workspace. Do this by appending '/$OWNER_EMAIL' to the end of your bucket path. Note that, within the IMCM, this is not commonly done outside of this demo.

# copy files/products from `output` subdirectory to the Terra workspace (command line)
# list several files, or use a wildcard such as output/*, to copy more than one
gsutil cp -r output/some_file.ext gs://workspace_bucket/path_to_terra_workspace_sub_folder/$OWNER_EMAIL/

# OPTIONAL Advanced: programmatically from an R script
system(command = "gsutil cp -r output/some_file.ext gs://workspace_bucket/path_to_terra_workspace_sub_folder/$OWNER_EMAIL/")
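To push the whole output folder rather than individual files, `gsutil rsync -r` copies only new or changed files and, without the `-d` flag, does not delete anything already in the bucket. The sketch below assumes the same placeholder bucket path as above; `dest_for`, `export_output` and the `DRY_RUN` switch are hypothetical names introduced here, and the sketch stops with an error if OWNER_EMAIL is not set.

```shell
# Sketch: sync output/ to your own sub-folder of the shared workspace bucket.
# The bucket path is the placeholder from the text; replace it with your own.
# DRY_RUN is a hypothetical switch: DRY_RUN=1 (default) only prints the command.
set -eu

dest_for() {
  # Abort with a clear message if OWNER_EMAIL is unset or empty.
  : "${OWNER_EMAIL:?OWNER_EMAIL is not set}"
  echo "gs://workspace_bucket/path_to_terra_workspace_sub_folder/${OWNER_EMAIL}"
}

export_output() {
  dest="$(dest_for)"
  # rsync -r mirrors new or changed files under output/ into the destination.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "gsutil -m rsync -r output $dest"
  else
    gsutil -m rsync -r output "$dest"
  fi
}
```

In a Terra Terminal, where OWNER_EMAIL should already be defined, run `export_output` to preview the command, then `DRY_RUN=0 export_output` to perform the sync.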

This ensures that files or data generated by an analysis, which need to be reused later, are safely stored in the respective Terra workspace and are not lost when you shut down the VM and delete the persistent disk (PD).

Shut down the VM

This is the end of the analysis task. With your outputs safe in a Terra workspace and your code versioned and pushed to GitHub, it's highly recommended that you shut down your VM to avoid incurring costs.

When shutting down the VM, you can either:

Delete the VM and the PD (all generated files on the PD are lost; RECOMMENDED)
Delete the VM but keep the PD (you lose the CE configuration, but keep all the files on the PD)

OPTIONAL ADVANCED: Push your code and scripts to GitHub

Your analysis output is safe in a Terra workspace; now it's time to give your updated analysis code a safe home as well 🙂. Note that this step requires a GitHub account.

Do the following (usual git versioning workflow):

# stage the edited code
git add .                          # stage all the edited scripts, OR
git add R/script_name.R            # stage a specific script (e.g. python/script_name.py)

# commit
git commit -m "descriptive commit message"   # commit the staged scripts

# push to GitHub
git push origin branch_name        # replace branch_name with your branch
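Since data must never reach GitHub, a quick check before committing can catch files under data/ or output/ that were staged anyway, for example with `git add -f` or because they were tracked before .gitignore was set up. The helper name below is hypothetical; it reads staged paths, one per line, from standard input.

```shell
# Hypothetical helper: fails if any path on stdin lies under data/ or output/.
# Matching paths are printed so you can see what to unstage.
check_no_data_staged() {
  if grep -E '^(data|output)/'; then
    echo "data/ or output/ files are staged; unstage them before committing" >&2
    return 1
  fi
  return 0
}
```

Usage: `git diff --cached --name-only | check_no_data_staged && git commit -m "descriptive commit message"`. Offending files can be unstaged with `git restore --staged <path>`.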

OPTIONAL: Using `.gitignore` files

Once you have the two subfolders, exclude them from future commits by listing them in the .gitignore file, so that git ignores them when you stage, commit and push (this only affects files that are not already tracked). You DO NOT want to commit or push data to GitHub, and this prevents that.

# put these lines in .gitignore file
data/
output/
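If you prefer to script this step, the sketch below appends each entry only when it is not already present, so re-running it leaves a single copy of each line; `add_to_gitignore` is a hypothetical helper name.

```shell
# Hypothetical helper: append each argument to .gitignore unless an identical
# line already exists (-x whole line, -F fixed string, -q quiet).
set -eu

add_to_gitignore() {
  touch .gitignore
  # Make sure the file ends with a newline before appending new lines.
  if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
    echo >> .gitignore
  fi
  for entry in "$@"; do
    grep -qxF "$entry" .gitignore || echo "$entry" >> .gitignore
  done
}

add_to_gitignore data/ output/
```

Afterwards, `git check-ignore -v data/` should report the .gitignore rule that matches.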
