
4b. Interpreting the Results of DRAM-v

Rory M Flynn edited this page Nov 2, 2022 · 3 revisions

Hopefully you have successfully run DRAM-v through the annotate and distill steps. Now you want to know what each of the MAGs or metagenomes is capable of. The recommended workflow for running DRAM and DRAM-v is to always annotate and then distill. Starting with the liquor, the most distilled part of the pipeline, then moving to the distillate and finally the raw data is the most helpful way to analyze your viral contigs.

Reading the Liquor

The DRAM-v liquor is a summary of the potential AMGs (pAMGs) that have been detected across annotated viral contigs. Like the DRAM liquor, it comes in the form of an HTML file called liquor.html, which is fully portable and can be opened in any web browser. On this heatmap the y-axis is made up of all viral contigs that were annotated and the x-axis is made up of the various functions present in at least one viral contig. The far left column is the total number of pAMGs present in that viral contig. This number is given because more cells in a row may be lit up as true, indicating that the function is present, than there are pAMGs in that viral contig. This happens because each gene may have more than one annotation, each of which could be involved in different functions, or because a single annotation is associated with multiple functions.

The subsequent sections of the heatmap represent each of the major categories of metabolism and other functions from the distillate, with each column being an individual function. Like the DRAM liquor, the heatmap is interactive: hovering over a function shows the gene on the viral contig that encoded that function, as well as the annotation assigned to that gene that made DRAM-v consider the function present.

The primary takeaways from the DRAM-v liquor are 1) how many viral contigs have at least one pAMG present, as indicated by the number of viruses in the heatmap; 2) the number of pAMGs present in each viral contig, as shown by the number in the far left column of the heatmap; 3) the functionality of the pAMGs present in a viral contig, as given by the lit up sections of the heatmap; and 4) the frequency of any pAMG across viral contigs, which could indicate how often this function is used by phages in a system.

Reading the distillate

While the heatmap can give you a quick overview of the pAMGs you have annotated, you may want more detail about those pAMGs or about the community of viruses that you have annotated. Both are available in the amg_summary.tsv and viral_genome_summary.tsv files.

AMG summary gives metabolic detail about pAMGs

The file amg_summary.tsv is a more detailed look at the pAMGs that are present across your annotated viral contigs. Each row represents a function associated with an annotation that was assigned to a gene in a viral contig. A gene can therefore be present in several rows, either because it has more than one annotation, each of which could be involved in different functions, or because a single annotation is associated with multiple functions. As a result, the total number of rows in the AMG summary table is not the total number of pAMGs you have annotated but the number of functions that those pAMGs represent.
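The difference between rows and pAMGs can be checked directly. The following is a minimal sketch, not part of DRAM-v, that counts both in a tab-separated table. The column names (gene, scaffold, category) and the sample rows are illustrative assumptions, so check them against the header of your own amg_summary.tsv.

```python
import csv
import io

# Illustrative stand-in for amg_summary.tsv.
# Column names and values are assumptions, not real DRAM-v output.
SAMPLE = (
    "gene\tscaffold\tcategory\n"
    "contig1_3\tcontig1\tEnergy\n"
    "contig1_3\tcontig1\tTransporters\n"
    "contig2_7\tcontig2\tEnergy\n"
)


def count_rows_and_genes(handle):
    """Return (number of function rows, number of distinct pAMG genes)."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    genes = {row["gene"] for row in rows}
    return len(rows), len(genes)


n_rows, n_genes = count_rows_and_genes(io.StringIO(SAMPLE))
print(f"{n_rows} function rows from {n_genes} pAMGs")
```

To run this on real output, pass an open file handle for amg_summary.tsv instead of the in-memory sample.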

The columns of the AMG summary give information about the gene, the DRAM distillate categories of the annotation, and DRAM-v information on why the gene was called a pAMG. The first two columns give the gene that is being called a pAMG and the scaffold that gene is on. The next columns give all levels of information from the DRAM distillate about the function associated with that AMG. The final two columns give the auxiliary score and AMG flags as assigned by DRAM-v. Remember that a low auxiliary score indicates a gene that is confidently viral, and that all the AMG flags are defined on the DRAM-v in detail page.
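Because a low auxiliary score indicates higher confidence that a gene is viral, one common first pass is to keep only the pAMGs at or below a chosen score. The sketch below assumes columns named gene and auxiliary_score, and the cutoff of 3 is a hypothetical value chosen for illustration, not a DRAM-v recommendation. The sample rows and flag values are made up.

```python
import csv
import io

# Illustrative stand-in for amg_summary.tsv; names and values are assumptions.
SAMPLE = (
    "gene\tscaffold\tauxiliary_score\tamg_flags\n"
    "contig1_3\tcontig1\t1\tM\n"
    "contig1_3\tcontig1\t1\tM\n"
    "contig1_9\tcontig1\t4\tMF\n"
    "contig2_7\tcontig2\t2\tMK\n"
)


def confident_pamgs(handle, max_score=3):
    """Return distinct genes whose auxiliary score is <= max_score, in file order."""
    kept = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # float() tolerates scores written as "1" or "1.0".
        if float(row["auxiliary_score"]) <= max_score:
            kept[row["gene"]] = None  # dict keys preserve order and drop repeats
    return list(kept)


print(confident_pamgs(io.StringIO(SAMPLE)))
```

Genes are deduplicated because the same gene can appear in several rows of the table.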

Viral genome summary gives details about each viral contig

In viral_genome_summary.tsv, each line is one viral contig. Each line contains all the information needed for the MIUViG. This includes VirSorter information, gene count, strand switches found, and counts of genes from each VOGDB major category, as well as other additional information.

Digging into the raw annotations

The raw annotations are the types of files returned by most genome annotators. They range from scaffold and genome feature files to tables with all the recorded annotations. These are the files you will want to dig into if your metabolism or gene function of interest is not covered by the distillate and liquor, or if you need more detail than these levels of summarization provide.

The annotations master table

The file annotations.tsv contains all annotations for all predicted open reading frames. Each row is an individual gene and all columns give annotation information. The first column gives the assigned gene name, and subsequent columns give the name of the FASTA file and the name of the scaffold that the gene was called from. Next is the gene position on the scaffold (1-end), the nucleotide start position, the nucleotide end position and the strandedness of the gene. After that is the rank of the annotation. Ranks are assigned based on the methods outlined in Daly, et al. 2016. Briefly, an annotation is given an A rank if there is a reciprocal best hit to a KEGG gene, a B if there is a reciprocal best hit to a UniRef90 gene, a C if there is a forward hit only to either KEGG or UniRef90, a D if there is only a hit to PFAM, and an E if there is no annotation to KEGG, UniRef90 or PFAM.
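The ranking rules above amount to a simple decision cascade. The function below is a sketch of that logic as described in the text, not DRAM-v's own code. The function name and its boolean parameters are hypothetical.

```python
def assign_rank(kegg_rbh=False, uniref_rbh=False,
                kegg_hit=False, uniref_hit=False, pfam_hit=False):
    """Return the annotation rank (A-E) for one gene, following the rules in the text."""
    if kegg_rbh:
        return "A"  # reciprocal best hit to a KEGG gene
    if uniref_rbh:
        return "B"  # reciprocal best hit to a UniRef90 gene
    if kegg_hit or uniref_hit:
        return "C"  # forward hit only to KEGG or UniRef90
    if pfam_hit:
        return "D"  # only a PFAM hit
    return "E"      # no annotation in KEGG, UniRef90 or PFAM


print(assign_rank(kegg_hit=True, pfam_hit=True))  # forward KEGG hit outranks PFAM
```

The checks are ordered from strongest to weakest evidence, so a gene with several kinds of hits receives the best rank it qualifies for.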

The subsequent columns give the annotation information. For databases with BLAST-style searches (done using MMseqs2), there are columns for the database hit ({database}_hit), whether the hit was a reciprocal best hit ({database}_RBH), the percent identity of the match ({database}_identity), the bitscore of the hit ({database}_bitScore) and the E-value of the hit ({database}_eVal). If the database has specific identifiers that DRAM-v pulls, then an additional column is present ({database}_id). These are the identifiers that are used in the distillation of annotations.

For databases that are searched using HMMs (using HMMER) only a list of hits is given. The hits are separated by semicolons and after each hit in square brackets is the identifier that is associated with the hit. The identifier is what is used in the distillation of the annotations.
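To work with these cells programmatically, each hit has to be separated from its bracketed identifier. The sketch below assumes the layout described above (hits separated by semicolons, each followed by an identifier in square brackets) and that hit descriptions do not themselves contain semicolons. The function name and the sample cell are made up for illustration.

```python
import re

# A hit description followed by its identifier in square brackets at the end.
HIT_PATTERN = re.compile(r"^(?P<name>.*?)\s*\[(?P<id>[^\[\]]+)\]$")


def parse_hmm_hits(cell):
    """Split one HMM hit cell into a list of (description, identifier) pairs."""
    hits = []
    for part in cell.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. after a trailing semicolon
        match = HIT_PATTERN.match(part)
        if match:
            hits.append((match.group("name"), match.group("id")))
        else:
            hits.append((part, None))  # keep hits that lack a bracketed identifier
    return hits


# Made-up cell content in the described layout.
print(parse_hmm_hits("Example domain A [ID0001]; Example domain B [ID0002]"))
```

The identifiers in the second position of each pair are the values the text describes as being used in distillation.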

After that is the MHC count (heme_regulatory_motif_count), the number of times the CXXCH motif is present in that gene. Iron-reducing microorganisms use multi-heme c-type cytochromes (MHCs) as terminal reductases for the final step of electron transfer. Note that this count is only the first step in identifying MHCs. To further validate MHC potential, users should look at the annotation (e.g. nitrate reductases should not be considered MHCs), upload the sequence to PSORTb to obtain location information, and perform sequence similarity network analysis relative to known MHCs.
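Counting the CXXCH motif in a protein sequence can be sketched with a regular expression in which each X matches any residue. This is an illustration of the idea rather than DRAM-v's implementation. In particular, counting overlapping occurrences is an assumption made here, and the function name and sequences are made up.

```python
import re

# C, any two residues, then C and H. The lookahead lets overlapping motifs be counted.
HEME_MOTIF = re.compile(r"(?=C..CH)")


def count_heme_motifs(protein_seq):
    """Count occurrences of the CXXCH motif in an amino acid sequence."""
    return len(HEME_MOTIF.findall(protein_seq.upper()))


# Made-up sequence with two separate CXXCH motifs.
print(count_heme_motifs("MKCAACHLLCGGCHK"))
```

A high count only flags a candidate, which is why the follow-up checks described above are still needed.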

The last columns in DRAM-v annotations give additional viral information. First is the VirSorter category of the gene. If the gene is not predicted in VirSorter then the cell will be left blank and if the gene is predicted by VirSorter but not given a category then the cell will contain a -. Next is the auxiliary score and the last is the flags assigned by DRAM-v. More details about these can be found here.

FASTA Files

Three FASTA files are generated by DRAM-v: scaffolds.fna, genes.fna and genes.faa. The scaffolds file has all scaffolds from all input MAGs, renamed to match the annotations table and output GFF file. genes.fna has all genes from all MAGs as nucleotides and genes.faa has all genes from all MAGs as amino acids.

Genome feature files

Annotations are given in two formats that can be used for subsequent analysis or visualization. These files contain all open reading frames with annotations, as well as tRNAs and rRNAs. The genes.gff file contains all genes from all MAGs and matches the scaffolds.fna file. The scaffolds.gbk file is a multi-GenBank file with all scaffolds included. It can be viewed with genome viewers such as IGV or Geneious.

tRNA and rRNA files

tRNAs are summarized in tRNAs.tsv and rRNAs are summarized in rRNAs.tsv.