4a. Interpreting the Results of DRAM
So you have successfully run DRAM or DRAM-v through the annotate and distill steps, and now you want to know what each of your MAGs or metagenomes is capable of. The recommended workflow for DRAM and DRAM-v is always to annotate and then distill. Analysis is most productive when you start with the liquor, the most distilled part of the pipeline, then move on to the distillate, and finally to the raw data.
Start by getting your liquor.html file to a local environment (e.g. download the file if it is on a server), then open it in a web browser. This file is fully self-contained and can be shared without needing to share any other data. When opened, it shows a heatmap in which the y-axis represents the genomes that have been analyzed and the x-axis represents the various functions genomes can be capable of. This gives an overview of the major metabolic functions that each genome you have annotated is capable of.
In addition to the liquor HTML file, a table containing the same information is also generated, called liquor.tsv.
The leftmost section of the heatmap is labelled module and gives the completion of various pathways representing common, relevant metabolisms. The color of each cell of the heatmap represents the completeness of that module. DRAM measures completeness by looking at the individual steps of a module and checking whether at least one gene is present in each step. Therefore not all genes in a module, nor even all subunits of all genes of one path through a module, need to be present for a module to be reported as 100% covered (see figure below). For more detail about the coverage of a module, jump to the distillate.
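The step-based completeness described above can be sketched in a few lines. This is only an illustration of the idea, not DRAM's own code: the function name `module_step_coverage`, the toy module, and the gene IDs are all made up for the example.

```python
# Illustrative sketch only; not DRAM's implementation.
# A module is modelled as a list of steps, and each step is the set of
# gene IDs that can satisfy it. All IDs here are hypothetical placeholders.

def module_step_coverage(steps, genome_annotations):
    """Return the fraction of steps with at least one annotated gene."""
    if not steps:
        return 0.0
    covered = sum(1 for step in steps if step & genome_annotations)
    return covered / len(steps)


toy_module = [
    {"geneA", "geneB"},  # step 1: either gene satisfies the step
    {"geneC"},           # step 2
    {"geneD", "geneE"},  # step 3
]

genome_a = {"geneA", "geneC", "geneE"}  # one gene per step, not every gene
genome_b = {"geneB", "geneD"}           # nothing for step 2

coverage_a = module_step_coverage(toy_module, genome_a)
coverage_b = module_step_coverage(toy_module, genome_b)

print(f"genome_a: {coverage_a:.0%}")
print(f"genome_b: {coverage_b:.0%}")
```

In this toy case genome_a is fully covered even though it carries only three of the five listed genes, which is why a fully colored module cell should not be read as "every gene is present".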
The next section of the liquor shows the presence of various electron transport chain complexes. This lets you rapidly assess the ability of microbes to perform aerobic respiration. As with modules, the color of each cell of the heatmap shows the coverage of that electron transport chain complex. Coverage is calculated differently from modules, though. Here complete paths through the module are considered: to be fully covered, a complete set of annotations from any one valid path through the module must be present (see figure below).
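The path-based rule can be contrasted with the step-based one using a small sketch. The text only states what full coverage requires; scoring partial coverage by the best-covered path is an assumption made here for illustration, and the function name, paths, and subunit IDs are hypothetical.

```python
# Illustrative sketch only; not DRAM's implementation.
# A complex is modelled as a list of alternative paths, each path being the
# full set of annotations it requires. IDs are hypothetical placeholders.

def best_path_coverage(paths, genome_annotations):
    """Return the coverage of the best-covered path (assumed scoring rule)."""
    best = 0.0
    for path in paths:
        if not path:
            continue
        fraction = len(path & genome_annotations) / len(path)
        best = max(best, fraction)
    return best


toy_complex = [
    {"subunit1", "subunit2", "subunit3"},  # path 1
    {"subunit1", "subunit4"},              # path 2
]

genome_x = {"subunit1", "subunit4"}  # completes path 2
genome_y = {"subunit1", "subunit2"}  # completes neither path

coverage_x = best_path_coverage(toy_complex, genome_x)
coverage_y = best_path_coverage(toy_complex, genome_y)

print(f"genome_x: {coverage_x:.0%}")
print(f"genome_y: {coverage_y:.0%}")
```

Under this reading, a complex is fully covered only when every annotation of at least one path is present, so missing a single subunit keeps the cell below 100%.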
The remainder of the heatmap is based on checking whether certain annotations are present that indicate characteristic metabolic functions. Each cell is shaded based on the presence or absence of the function for that column. Each column is backed by a list of annotation IDs that can come from any of the databases that DRAM annotates against. If any one of the annotation IDs from that list is present in a genome, then the cell is lit up to represent a true finding. For some functions the presence of just one gene is not enough to say that the function is present. These functions have multiple columns with the same name but different part numbers (e.g. Xyloglucan, pt. 1; Xyloglucan, pt. 2; and Xyloglucan, pt. 3), and all parts must be lit up to consider the genome to encode that function.

It is critical to remember that absence of evidence is not evidence of absence. If a function is absent, that may be because the gene was not part of the genome's assembly or because it is a gene that is difficult to annotate accurately. Finding and annotating the gene in a more detailed, manual way may be worth pursuing if you are particularly interested in a certain function.
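The presence/absence logic, including multi-part functions, can be sketched as follows. The function names and annotation IDs are hypothetical and chosen only for this example; the real ID lists come from DRAM's databases.

```python
# Illustrative sketch only; not DRAM's implementation.
# Each column maps to a list of annotation IDs; a cell is lit up if any one
# of them is present. All IDs below are hypothetical placeholders.

def column_present(annotation_ids, genome_annotations):
    """A single heatmap cell: true if any listed ID is annotated."""
    return any(a in genome_annotations for a in annotation_ids)


def function_encoded(parts, genome_annotations):
    """A multi-part function: true only if every part is lit up."""
    return all(column_present(ids, genome_annotations) for ids in parts.values())


xyloglucan_parts = {
    "Xyloglucan, pt. 1": ["id1", "id2"],
    "Xyloglucan, pt. 2": ["id3"],
    "Xyloglucan, pt. 3": ["id4", "id5"],
}

genome_p = {"id2", "id3"}         # pt. 3 is missing
genome_q = {"id1", "id3", "id5"}  # one hit in every part

cells_p = {name: column_present(ids, genome_p) for name, ids in xyloglucan_parts.items()}
encoded_p = function_encoded(xyloglucan_parts, genome_p)
encoded_q = function_encoded(xyloglucan_parts, genome_q)

print(cells_p)
print(encoded_p, encoded_q)
```

A false result here only means that no listed annotation was found, which is weaker than saying the genome lacks the function.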
The distillate comes in the form of two files which contain detailed information about the metabolic functions annotated in each genome and information about the quality and taxonomy of each genome.
Where the liquor gives a brief overview of the metabolic functions that each genome is capable of, the metabolism summary gives an accounting of the abundance of a wide variety of genes, including those representing common metabolisms. This table comes in the form of an Excel-formatted spreadsheet with multiple columns. The first five columns give information about the annotation, and the remaining columns give the count of genes with that annotation in each of the genomes analyzed.
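To make the table layout concrete, here is a small sketch that works on rows already read from the spreadsheet. It assumes only what is described above: five annotation columns followed by one count column per genome. The header names other than gene_id and gene_description, the genome names, and all row values are placeholders invented for the example.

```python
# Illustrative sketch only; the data below is made up.
# Assumed layout: 5 annotation columns, then one count column per genome.
N_ANNOTATION_COLUMNS = 5

header = ["gene_id", "gene_description", "col3", "col4", "col5",
          "genome_1", "genome_2"]  # col3..col5 are placeholder names
rows = [
    ["id_1", "hypothetical enzyme one", "", "", "", 2, 0],
    ["id_2", "hypothetical enzyme two", "", "", "", 0, 1],
    ["id_3", "hypothetical enzyme three", "", "", "", 1, 3],
]


def genomes_with_gene(header, rows, gene_id):
    """Return {genome: count} for genomes with a nonzero count of gene_id."""
    genome_names = header[N_ANNOTATION_COLUMNS:]
    for row in rows:
        if row[0] == gene_id:
            counts = row[N_ANNOTATION_COLUMNS:]
            return {name: count
                    for name, count in zip(genome_names, counts)
                    if count > 0}
    return {}  # gene_id not found in the table


hits_id_1 = genomes_with_gene(header, rows, "id_1")
hits_id_3 = genomes_with_gene(header, rows, "id_3")
hits_missing = genomes_with_gene(header, rows, "id_x")

print(hits_id_1)
print(hits_id_3)
print(hits_missing)
```

Splitting on the column position rather than on column names keeps the sketch independent of how many genomes were analyzed.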
The gene_id and gene_description columns give the finest level of detail about the function present.