2. How to Install and Set Up DRAM
DRAM's dependencies must be installed first; DRAM itself can then be installed from this repository. In the future DRAM will be available via conda.
Dependencies can be installed via conda or manually.
Install DRAM and all dependencies into a new conda environment using the provided environment.yaml file. This will install the newest version of DRAM from pip. If you want to install a development version, follow the instructions for installing DRAM from GitHub below.
wget https://raw.githubusercontent.com/WrightonLabCSU/DRAM/master/environment.yaml
conda env create -f environment.yaml -n DRAM
If this installation method is used then all further steps should be run inside the created DRAM environment. To activate the environment you can use this command:
conda activate DRAM
If you do not install via a conda environment, install the dependencies pandas, networkx, scikit-bio, prodigal, mmseqs2, hmmer and tRNAscan-SE manually.
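After a manual install it can help to confirm that the command-line tools are actually on your PATH. The following is a minimal sketch, not part of DRAM: the helper name missing_tools is hypothetical, and the executable names are assumptions about how each package is usually installed (mmseqs2 normally provides mmseqs, hmmer provides hmmsearch).

```shell
#!/usr/bin/env bash
# Sketch: report which command-line dependencies are missing from PATH.
# The executable names are assumptions, not taken from the DRAM docs.

missing_tools() {
    # Print the name of every argument that is not an executable on PATH.
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

missing=$(missing_tools prodigal mmseqs hmmsearch tRNAscan-SE)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are printed on one line.
    echo "Missing tools:" $missing
else
    echo "All command-line dependencies found."
fi
```

The Python packages (pandas, networkx, scikit-bio) are not covered by this check; they can be verified separately by importing them in Python.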
There are two ways you can install DRAM. The latest release version can be downloaded from pip or you can install the latest development version from GitHub. Installing the latest release from pip is recommended.
Install DRAM using pip
pip install DRAM-bio
Install DRAM from GitHub
Download this repository using git clone https://github.com/WrightonLabCSU/DRAM.git
Then change directory into the DRAM directory and install DRAM using pip install .
You have now installed DRAM.
To run DRAM you need to set up the databases it uses for annotation. All databases except KEGG can be downloaded and set up for use with DRAM automatically. To get KEGG annotations and take full advantage of the genome summarization capabilities of DRAM you must have access to the KEGG database. Downloading the KEGG protein files used by this annotator requires a paid subscription. If you do not have access to KEGG, your data can still be annotated with all other databases. Genome summarization will then summarize the tRNAs, peptidases and CAZymes in your data set but not the primary metabolisms.
I have access to KEGG
Then set up DRAM using the following command:
DRAM-setup.py prepare_databases --output_dir DRAM_data --kegg_loc kegg.pep
kegg.pep is the path to the amino acid FASTA file downloaded from KEGG. This can be any of the single files provided by the KEGG FTP server or a concatenated version of the multiple provided files. DRAM_data is the path to the processed databases used by DRAM. If you already have any of the databases downloaded to your server and don't want to download them again, you can give them to the prepare_databases command by using the --{db_name}_loc flags, such as --uniref_loc and --viral_loc.
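One way to combine the --{db_name}_loc flags is to add a flag only when the corresponding file is already present, leaving the rest to be downloaded. The sketch below is an illustration under assumptions: build_setup_cmd is a hypothetical helper, the file paths are placeholders, and paths containing spaces are not handled. It prints the command for review rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: assemble a prepare_databases command, adding a --{db_name}_loc
# flag only for database files that already exist locally.
# All file paths used here are hypothetical placeholders.

build_setup_cmd() {
    # $1 = output directory; remaining arguments are db_name=path pairs.
    local out_dir="$1"
    shift
    local cmd="DRAM-setup.py prepare_databases --output_dir $out_dir"
    local pair name path
    for pair in "$@"; do
        name="${pair%%=*}"   # text before the first '='
        path="${pair#*=}"    # text after the first '='
        if [ -f "$path" ]; then
            cmd="$cmd --${name}_loc $path"
        fi
    done
    echo "$cmd"
}

# Print the command instead of executing it.
build_setup_cmd DRAM_data kegg=kegg.pep uniref=downloads/uniref.fasta.gz viral=downloads/viral.faa.gz
```

Pairs whose file is missing are skipped, so the printed command falls back to the plain prepare_databases call when nothing has been downloaded yet.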
I don't have access to KEGG
Not a problem. Use this command instead:
DRAM-setup.py prepare_databases --output_dir DRAM_data
As above, you can still provide the locations of databases you have already downloaded so that they are not downloaded again.
To check that your setup worked, use the command DRAM-setup.py print_config. It shows the location of all databases provided, as well as the presence of additional annotation information.
NOTE: Setting up DRAM can take a long time (up to 5 hours) and uses a large amount of memory (512 GB) by default. To
use less memory you can use the --skip_uniref flag, which will reduce memory usage to ~64 GB if you do not provide KEGG
Genes and 128 GB if you do. Setup time depends on the number of processors you tell it to use (using the --threads
argument) and the speed of your internet connection. On a server less than 5 years old with 10 processors, it takes
about 2 hours to process the data when databases do not need to be downloaded.
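The memory figures in the note can be turned into a quick pre-flight check. This is a rough sketch only: suggest_setup_flags is a hypothetical helper, and treating the quoted 512, 128 and 64 GB figures as hard cut-offs is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: choose a setup mode from the memory figures quoted in the note.
# Treating those figures as hard cut-offs is an assumption.

suggest_setup_flags() {
    # $1 = available memory in whole GB, $2 = "yes" if KEGG is provided.
    local mem_gb="$1"
    local has_kegg="$2"
    local reduced=64
    if [ "$has_kegg" = "yes" ]; then
        reduced=128
    fi
    if [ "$mem_gb" -ge 512 ]; then
        echo "default"          # enough memory for the full setup
    elif [ "$mem_gb" -ge "$reduced" ]; then
        echo "--skip_uniref"    # enough only when UniRef is skipped
    else
        echo "insufficient"
    fi
}

# On Linux, total memory (reported in kB) can be read from /proc/meminfo.
if [ -r /proc/meminfo ]; then
    total_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    suggest_setup_flags "$total_gb" no
fi
```

For example, suggest_setup_flags 128 yes suggests --skip_uniref, while the same machine without KEGG also falls in the --skip_uniref range because it is below 512 GB.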