2. How to Install and Set Up DRAM

Rory M Flynn edited this page Nov 2, 2022 · 11 revisions

Installing DRAM

Installing DRAM takes two steps: first install its dependencies, then install DRAM itself from this repository. In the future DRAM will be available via conda.

Dependencies can be installed either via conda or manually.

Conda Installation

Install DRAM and all dependencies into a new conda environment using the provided environment.yaml file. This will install the newest version of DRAM from pip. If you want to install a development version, follow the instructions for installing DRAM from GitHub below.

wget https://raw.githubusercontent.com/WrightonLabCSU/DRAM/master/environment.yaml
conda env create -f environment.yaml -n DRAM

If this installation method is used, all further steps should be run inside the created DRAM environment. To activate the environment, use this command:

conda activate DRAM

Manual Installation

If you do not install via a conda environment, you must install the dependencies pandas, networkx, scikit-bio, prodigal, mmseqs2, hmmer and tRNAscan-SE manually.
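After a manual installation it can help to confirm that the command-line dependencies are actually on your PATH. The following is a minimal sketch, not part of DRAM: check_tools is a hypothetical helper, and the executable names (mmseqs for mmseqs2, hmmsearch for hmmer) are assumptions about how those packages name their binaries. It does not check the Python packages pandas, networkx and scikit-bio.

```shell
#!/usr/bin/env bash
# Report which command-line tools are present on or missing from PATH.
# check_tools is a hypothetical helper, not something DRAM provides.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return non-zero if at least one tool was not found.
    return "$missing"
}

# Assumed executable names for prodigal, mmseqs2, hmmer and tRNAscan-SE.
check_tools prodigal mmseqs hmmsearch tRNAscan-SE \
    || echo "Install the missing tools before installing DRAM."
```

Any tool reported as missing needs to be installed, or its location added to PATH, before you continue.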

There are two ways you can install DRAM. The latest release version can be downloaded from pip or you can install the latest development version from GitHub. Installing the latest release from pip is recommended.

Install DRAM using pip

pip install DRAM-bio

Install DRAM from GitHub

Download this repository using git clone https://github.com/WrightonLabCSU/DRAM.git

Then change directory into the DRAM directory and install DRAM using pip install .

You have now installed DRAM.

Setting up DRAM

To run DRAM you need to set up the databases it uses for annotation. All databases except KEGG can be downloaded and set up for you automatically. To get KEGG annotations and take full advantage of DRAM's genome summarization capabilities you must have access to the KEGG database; downloading the protein files used by this annotator requires a paid KEGG subscription. If you do not have access to KEGG, your data can still be annotated with all other databases. In that case genome summarization will cover the tRNAs, peptidases and CAZymes in your data set but not the primary metabolisms.

I have access to KEGG

Then set up DRAM using the following command:

DRAM-setup.py prepare_databases --output_dir DRAM_data --kegg_loc kegg.pep

kegg.pep is the path to the amino acid FASTA file downloaded from KEGG. This can be any of the single files provided by the KEGG FTP server or a concatenated version of the multiple provided files. DRAM_data is the path where the processed databases used by DRAM are stored. If you already have any of the databases downloaded to your server and don't want to download them again, you can pass them to the prepare_databases command using the --{db_name}_loc flags, such as --uniref_loc and --viral_loc.
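If you downloaded several KEGG files, one way to build the concatenated kegg.pep is sketched below. concat_fasta is a hypothetical helper, not part of DRAM, and the input file names in the usage comment are placeholders rather than real KEGG file names. The sketch assumes every input file ends with a newline, so that records from adjacent files do not run together.

```shell
#!/usr/bin/env bash
# Concatenate several amino acid FASTA files into a single file.
# Usage: concat_fasta OUTPUT INPUT [INPUT...]
# concat_fasta is a hypothetical helper, not something DRAM provides.
concat_fasta() {
    local out="$1"
    shift
    cat "$@" > "$out"
    # Count FASTA header lines as a quick sanity check.
    echo "$(grep -c '^>' "$out") sequences written to $out"
}

# Hypothetical usage; the input names are placeholders:
# concat_fasta kegg.pep first_download.pep second_download.pep
# DRAM-setup.py prepare_databases --output_dir DRAM_data --kegg_loc kegg.pep
```

The sequence count gives a quick way to confirm that the combined file contains as many records as the inputs together.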

I don't have access to KEGG

Not a problem. Use this command instead:

DRAM-setup.py prepare_databases --output_dir DRAM_data

As above, you can still provide the locations of databases you have already downloaded so that they are not downloaded again.

To check that your setup worked, run DRAM-setup.py print_config. It shows the location of every database provided as well as whether additional annotation information is present.

NOTE: Setting up DRAM can take a long time (up to 5 hours) and uses a large amount of memory (512 GB) by default. To use less memory you can pass the --skip_uniref flag, which reduces memory usage to ~64 GB if you do not provide KEGG Genes and 128 GB if you do. Setup time depends on the number of processors you tell it to use (with the --threads argument) and on the speed of your internet connection. On a server less than 5 years old with 10 processors, it takes about 2 hours to process the data when databases do not need to be downloaded.
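The memory figures in the note can be turned into a small helper that prints a suitable setup command for a given machine. This is only a sketch under stated assumptions: build_setup_cmd is a hypothetical function, not part of DRAM, it only prints the command rather than running it, and it treats the figures quoted above as hard thresholds, which they may not be in practice.

```shell
#!/usr/bin/env bash
# Print a DRAM-setup.py command suited to the memory available.
# Usage: build_setup_cmd MEM_GB THREADS [KEGG_PEP]
# build_setup_cmd is a hypothetical helper. Thresholds follow the note above:
# 512 GB by default; with --skip_uniref, about 64 GB without KEGG and 128 GB with it.
build_setup_cmd() {
    local mem_gb="$1" threads="$2" kegg="${3:-}"
    local cmd="DRAM-setup.py prepare_databases --output_dir DRAM_data --threads $threads"
    local needed=64
    if [ -n "$kegg" ]; then
        cmd="$cmd --kegg_loc $kegg"
        needed=128
    fi
    if [ "$mem_gb" -lt 512 ]; then
        if [ "$mem_gb" -lt "$needed" ]; then
            echo "warning: ${mem_gb} GB is below the roughly ${needed} GB quoted for this setup" >&2
        fi
        # Skip UniRef to reduce memory usage.
        cmd="$cmd --skip_uniref"
    fi
    echo "$cmd"
}

# Example: a 128 GB server with 10 processors and a KEGG file.
build_setup_cmd 128 10 kegg.pep
```

Printing the command instead of executing it lets you review the flags before committing to a run that may take several hours.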