GTEX_data

smoretti edited this page May 26, 2020 · 13 revisions

Main NCBI documentation: http://www.ncbi.nlm.nih.gov/books/NBK36439/#Download.download_using_prefetch_command

How to connect

GTEX data are hosted at NCBI, in dbGaP.

GTEX data are download-protected. You need download privileges for them, granted by the PI of the project using GTEX.

Once you have access, log in to the "Authorized Access" area at

https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?login=&page=login

Select data to download

In the "Authorized Access" tab, you have access to your current download tickets ("Downloads" sub-tab), and to project(s) you have access to ("My Requests" sub-tab).

For new requests/tickets, go to the "My Requests" sub-tab and click on "Request Files" as the Action for your chosen project.

Now, you have access to new tabs, with which you can browse files available in the project:

  • "Phenotype and Genotype Files" tab to browse annotation files
  • "SRA data (reads and reference alignments)" tab to open an advanced file selector
  • "SRA submitted files" tab to browse SRA files, from project root files

Once you have selected files, a ticket is created and dbGaP generates a command line, using Aspera Connect, to download them.

Download data

Because access is restricted to authorized users, it is simpler, faster and recommended to download files with the Aspera Connect client, freely available at http://downloads.asperasoft.com/en/downloads/8

Download and install the latest version of the Aspera Connect client.

Aspera Connect is mainly a browser plugin, but it is simpler to use the command line tool "ascp" that should have been installed with Aspera Connect in $HOME/.aspera/connect/ on Linux.
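Before using a ticket command, it can help to confirm that "ascp" is where you expect it. The following is a minimal sketch, assuming the binary sits in a bin/ subdirectory of the Aspera Connect directory (an assumption about the install layout; adjust it if your installation differs). The function name find_ascp is hypothetical.

```shell
# find_ascp DIR: print the path to ascp under DIR if it is executable.
# Assumes the layout DIR/bin/ascp, which may differ between versions.
find_ascp() {
    candidate="${1%/}/bin/ascp"
    if [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        printf 'ascp not found at %s\n' "$candidate" >&2
        return 1
    fi
}
```

Typical use would be: find_ascp "$HOME/.aspera/connect"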

Download your data using the command line provided with your ticket. Just replace %ASPERA_CONNECT_DIR% with $HOME/.aspera/connect/
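The placeholder replacement can also be done with sed. This is only a sketch: the ticket_cmd value below is a made-up stand-in, not a real dbGaP command, so paste the line from your own ticket in its place. If your ticket uses Windows-style path separators, those would need changing as well.

```shell
# Placeholder ticket command: paste the real line from your dbGaP ticket here.
ticket_cmd='"%ASPERA_CONNECT_DIR%/bin/ascp" OPTIONS_FROM_YOUR_TICKET'
aspera_dir="$HOME/.aspera/connect"

# Substitute every occurrence of the placeholder; "|" is used as the sed
# delimiter because the replacement contains slashes.
fixed_cmd=$(printf '%s\n' "$ticket_cmd" | sed "s|%ASPERA_CONNECT_DIR%|$aspera_dir|g")

# Print the result for review rather than executing it blindly.
printf '%s\n' "$fixed_cmd"
```

Printing the command first lets you check it before running it from the directory where the files should land.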

Decrypt data

Once downloaded, GTEX data may be encrypted!

To use encrypted data, or to decrypt them, you need the NCBI SRA toolkit, freely available as a binary at http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software

In order to use this toolkit, you have to configure it the right way or it will NOT work!

First, create a "vdb-passwd" file containing your dbGaP password in $HOME/.ncbi/
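Since this file holds a password, it is sensible to keep it readable by you only. A minimal sketch, assuming the file should contain the password on a single line (the source does not specify the exact format); the function name write_vdb_passwd is hypothetical.

```shell
# write_vdb_passwd DIR PASSWORD: store PASSWORD in DIR/vdb-passwd,
# readable and writable by the current user only.
write_vdb_passwd() {
    dir="$1"
    mkdir -p "$dir" || return 1
    (
        umask 077   # the new file is created without group/other permissions
        printf '%s\n' "$2" > "$dir/vdb-passwd"
    ) || return 1
    chmod 600 "$dir/vdb-passwd"
}
```

Typical use would be: write_vdb_passwd "$HOME/.ncbi" 'your-dbgap-password'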

Then, ask the dbGaP PI to provide you with the project repository key. It should be a file such as "prj_8976.ngc", where the integer part is your dbGaP project id. Put it in $HOME/.ncbi/
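A small check on the key file name can catch a wrong file before it is copied into place. This sketch assumes the naming pattern prj_<digits>.ngc described above; the function name install_ngc_key is hypothetical.

```shell
# install_ngc_key KEYFILE DIR: check that KEYFILE is named prj_<digits>.ngc,
# copy it into DIR, and print the project id taken from the file name.
install_ngc_key() {
    name=$(basename "$1")
    id=${name#prj_}
    id=${id%.ngc}
    case "$name" in
        prj_*.ngc) ;;
        *) printf 'unexpected key file name: %s\n' "$name" >&2; return 1 ;;
    esac
    case "$id" in
        ''|*[!0-9]*) printf 'no numeric project id in: %s\n' "$name" >&2; return 1 ;;
    esac
    mkdir -p "$2" && cp "$1" "$2/" || return 1
    printf '%s\n' "$id"
}
```

Typical use would be: install_ngc_key prj_8976.ngc "$HOME/.ncbi"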

See http://www.ncbi.nlm.nih.gov/Traces/sra/?view=toolkit_doc&f=dbgap_use

Import the project repository key with the SRA toolkit program "vdb-config -i"

You may then want to change the location of your project's local repository, because SRA files can be very large.

You can now decrypt files with the "vdb-decrypt" command. Remember, however, that the SRA Toolkit will only decrypt and download project files when executed from within the project's workspace directory AND using the full path to the "vdb-decrypt" command.

E.g. /opt/mine/bin/Bioinfo/decryption.2.5.2-centos_linux64/bin/vdb-decrypt dbgap-download-folder/
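The two conditions (run from inside the workspace, call the command by its full path) can be wrapped in a small helper. This is a sketch, not part of the SRA toolkit; the function name decrypt_in_workspace and the paths in the usage line are hypothetical.

```shell
# decrypt_in_workspace VDB_DECRYPT WORKSPACE TARGET
# Runs VDB_DECRYPT (must be an absolute path) on TARGET from inside WORKSPACE.
decrypt_in_workspace() {
    case "$1" in
        /*) ;;
        *) printf 'need the full path to vdb-decrypt, got: %s\n' "$1" >&2; return 1 ;;
    esac
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$2" && "$1" "$3" )
}
```

Typical use would be: decrypt_in_workspace /full/path/to/bin/vdb-decrypt /path/to/project-workspace dbgap-download-folder/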

Additional info and FAQ can be found at http://www.ncbi.nlm.nih.gov/books/NBK36439/#Download.download_using_prefetch_command

SRA toolkit is also available on the cluster. Configuration has to be done for each user independently.