This repository contains workflows to submit data to, and verify it with, two government repositories: GDC and dbGaP.
There are two steps to submitting data to the GDC: submission and validation.
- First, use the transferToGDC workflow to transfer your files. The README at the bottom of the linked page describes each input to the workflow.
- Note that read-group level metadata can be provided either in the Terra metadata tables (use this for older samples run via Zamboni) or in a JSON file (the `read_group_metadata_json`, which should be used for newer samples run via DRAGEN).
- After samples have been submitted, their validation status can be checked by using the validateGDCStatus workflow. This workflow will check each sample's validation status within the GDC and update the Terra metadata tables with the validation status.
There are two steps to submitting data to dbGaP: submission and validation.
- First, use the transferToDbgap workflow to transfer your files. The README at the bottom of the linked page describes each input to the workflow.
- Note that read-group level metadata can be provided either in the Terra metadata tables (use this for older samples run via Zamboni) or in a JSON file (the `read_group_metadata_json`, which should be used for newer samples run via DRAGEN).
- After samples have been submitted, their validation status can be checked by using the validateDbGapStatus workflow. This workflow will check each sample's validation status within dbGaP and update the Terra metadata tables with the validation status.
Nothing is required if only the `.wdl` files are changed. Once your branch is merged to `main`, Dockstore is automatically updated with the most recent changes. In your Terra workspace, you can always verify what code is running by looking at the source code (in Terra on GCP, this can be found in the "SCRIPT" tab once you've navigated to your workflow configuration page).
If you've made a change to your Python file, most likely you'll need to recreate and push the image using the V2 Dockerfile since this is the one that contains all the Python code. You'll need to build, tag and push the docker image to this repository. Note that even though this repository is public, you'll need to be added as a collaborator in order to successfully push changes to it.
If you've updated any of the Python code, the Docker image(s) will have to be rebuilt and pushed to DockerHub. First, track down which `.wdl` file calls that Python code. Then, in that `.wdl` file, find the Docker image that's defined in the runtime attributes. This should correspond to one of the Dockerfiles located within a subdirectory of `Docker`. Once you've found the Dockerfile you need to rebuild, you can use the following commands to build the Docker images (you don't necessarily have to build all three images, but these are the commands to use in case you do):
docker build -t schaluvadi/horsefish:submissionAspera -f Docker/Aspera/Dockerfile . --platform="linux/amd64"
docker build -t schaluvadi/horsefish:submissionV2GDC -f Docker/V2/Dockerfile . --platform="linux/amd64"
docker build -t schaluvadi/horsefish:submissionV1 -f Docker/V1/Dockerfile . --platform="linux/amd64"
You'll need to add `--platform="linux/amd64"` if your machine's default platform is different.
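The lookup described before the build commands (finding which `.wdl` file calls a script, then reading the image from its runtime attributes) can be sketched with `grep`. This is a minimal sketch, not part of the repository: the function names and the `example.py` script name are hypothetical, and it assumes the runtime attribute is written as a `docker:` line.

```shell
# Hypothetical helpers; names are illustrative and not part of this repository.

# List every .wdl file under a directory (default: current directory)
# that mentions the given script name.
find_wdl_for_script() {
    script_name="$1"
    search_root="${2:-.}"
    grep -rl --include='*.wdl' -- "$script_name" "$search_root"
}

# Print, with line numbers, the "docker:" runtime attribute lines of a .wdl file.
# Assumption: the attribute is written as "docker: ..." at the start of a line.
show_docker_lines() {
    grep -nE '^[[:space:]]*docker[[:space:]]*:' "$1"
}

# Example usage (run from the repository root):
#   find_wdl_for_script example.py
#   show_docker_lines path/to/workflow.wdl
```

If the `docker:` line refers to a variable rather than a literal image name, you would still need to follow that variable to its default value in the same file.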
Once you've successfully created the Docker image, you can run `docker images` and you should see the newly created image. If you'd like to verify anything, you can open the image in an interactive shell. First run `docker images` and copy the `IMAGE ID` of your new image. Next run `docker run -it {IMAGE_ID}`. This opens an interactive shell where you can run regular Unix commands such as `cd`, `grep`, `vim`, etc.
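If you'd rather not copy the image ID by hand, the two steps above can be combined. The following is a sketch under assumptions: the function names are hypothetical, and `--rm` is an addition (not in the command above) that removes the container once you exit the shell.

```shell
# Hypothetical helpers; names are illustrative and not part of this repository.

# Print the local image ID for a repository:tag.
# Empty output means no such image exists locally.
image_id_for_tag() {
    docker images -q "$1"
}

# Look up the image ID for a tag and start an interactive container from it.
open_image_shell() {
    image_id="$(image_id_for_tag "$1")"
    if [ -z "$image_id" ]; then
        echo "no local image found for $1" >&2
        return 1
    fi
    # --rm (an addition to the README command) deletes the container on exit.
    docker run --rm -it "$image_id"
}

# Example usage:
#   open_image_shell schaluvadi/horsefish:submissionV2GDC
```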
Once you've recreated your image and verified that your changes have propagated locally, you'll need to push your new image version to this public repository. You can do so by running any of the following commands (depending on which image you have built and need to push):
docker push schaluvadi/horsefish:submissionAspera
docker push schaluvadi/horsefish:submissionV2GDC
docker push schaluvadi/horsefish:submissionV1
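To rebuild and push a single image in one go, the build and push commands above can be wrapped in a small helper. This is only a sketch: the function names are hypothetical, while the tags and Dockerfile paths are the ones listed in this README. It pushes only if the build succeeds.

```shell
# Hypothetical helpers; names are illustrative and not part of this repository.

# Map an image tag from this README to its Dockerfile path.
dockerfile_for_tag() {
    case "$1" in
        submissionAspera) echo "Docker/Aspera/Dockerfile" ;;
        submissionV2GDC)  echo "Docker/V2/Dockerfile" ;;
        submissionV1)     echo "Docker/V1/Dockerfile" ;;
        *) echo "unknown tag: $1" >&2; return 1 ;;
    esac
}

# Build one image, then push it only if the build succeeded.
# Run from the repository root, because "." is used as the build context.
rebuild_and_push() {
    tag="$1"
    dockerfile="$(dockerfile_for_tag "$tag")" || return 1
    docker build -t "schaluvadi/horsefish:$tag" -f "$dockerfile" --platform="linux/amd64" . &&
        docker push "schaluvadi/horsefish:$tag"
}

# Example usage:
#   rebuild_and_push submissionV2GDC
```

Keeping the tag-to-Dockerfile mapping in one place avoids pairing a tag with the wrong Dockerfile when typing the commands by hand.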
This guide provides instructions for creating an SSH key pair and utilizing it to establish secure connections with dbGaP.
To generate an SSH key pair, follow these steps:
- Open a terminal or command prompt.
- Use the following command:
ssh-keygen -t rsa -m PEM -f ./private.openssh
This command will generate two files in your current directory:
- `private.openssh`: Your private key.
- `private.openssh.pub`: Your public key.
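Before sending the public key anywhere, it can help to confirm that both files exist and look as expected. The helper below is a hypothetical sketch, not part of this repository. It assumes the key was generated with the `-t rsa -m PEM` options shown above, in which case the private key starts with an RSA PEM header and the public key starts with `ssh-rsa`.

```shell
# Hypothetical helper; the name is illustrative and not part of this repository.
# Sanity-check a key pair generated with: ssh-keygen -t rsa -m PEM -f <path>
check_keypair() {
    private_key="${1:-./private.openssh}"
    public_key="$private_key.pub"

    [ -f "$private_key" ] || { echo "missing $private_key" >&2; return 1; }
    [ -f "$public_key" ]  || { echo "missing $public_key" >&2; return 1; }

    # With -t rsa -m PEM, the first line of the private key is this PEM header.
    head -n 1 "$private_key" | grep -q -e '-----BEGIN RSA PRIVATE KEY-----' ||
        { echo "$private_key is not a PEM RSA private key" >&2; return 1; }

    # RSA public keys in OpenSSH format begin with the key type "ssh-rsa".
    grep -q '^ssh-rsa ' "$public_key" ||
        { echo "$public_key is not an ssh-rsa public key" >&2; return 1; }

    # Restrict the private key so only its owner can read or write it.
    chmod 600 "$private_key"
}

# Example usage (in the directory where the keys were generated):
#   check_keypair ./private.openssh && echo "key pair looks fine"
```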
Once you have generated your SSH key pair, follow these steps to link your public key:
- Send your public SSH key (`private.openssh.pub`) to [email protected].
After linking your public key, you can upload your private key to the designated workspace.
Note: Keep your private key secure and do not share it with anyone.
For any inquiries or assistance, please contact Nareh Sahakian at [email protected].
Ensure you follow your organization's security policies and guidelines when managing SSH keys and accessing workspaces.