Question KO_tsv file and plot #5
Hey Sofie, you know that I’m an advocate of genome-centric analysis. The KO-gene table is linked with the gene-genome table, and the resulting KO counts per genome are multiplied by the relative abundance of each genome to produce the KO-abundance table. Now, if you have produced a gene-abundance table (metagenome-atlas/atlas#276), you can link it to the gene-KO table and sum the KOs if you want. Yes, you can normalise by one single-copy KO or by the median of many single-copy KOs. This gives you the results as gene copies per genome.
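A minimal sketch of the genome-centric calculation described above, assuming a KO count table per MAG (MAGs as rows, KOs as columns, like the `KO.tsv` in the results folder) and a table of MAG relative abundances per sample. The function name `ko_abundance` and the nested-dict layout are hypothetical illustrations, not atlas's own code, and the numbers are invented.

```python
from collections import defaultdict


def ko_abundance(ko_counts, genome_abundance):
    """Genome-centric KO abundance per sample (hypothetical helper, not atlas code).

    ko_counts: {genome: {ko: copies of that KO in the genome}}
    genome_abundance: {sample: {genome: relative abundance of the genome}}
    Returns {sample: {ko: sum over genomes of copies * relative abundance}}.
    """
    result = {}
    for sample, abundances in genome_abundance.items():
        per_ko = defaultdict(float)
        for genome, rel_ab in abundances.items():
            # Genomes without any annotated KO contribute nothing.
            for ko, copies in ko_counts.get(genome, {}).items():
                per_ko[ko] += copies * rel_ab
        result[sample] = dict(per_ko)
    return result


# Toy data, invented for illustration only.
ko_counts = {
    "MAG1": {"K00001": 1, "K00002": 1},
    "MAG2": {"K00001": 2},
}
genome_abundance = {
    "S1": {"MAG1": 0.25, "MAG2": 0.75},
    "S2": {"MAG1": 1.0, "MAG2": 0.0},
}
print(ko_abundance(ko_counts, genome_abundance))
# {'S1': {'K00001': 1.75, 'K00002': 0.25}, 'S2': {'K00001': 1.0, 'K00002': 1.0}}
```

When the relative abundances of a sample sum to 1, each value is the abundance-weighted mean number of copies of that KO per genome in the community.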
Hi Silas,
By linking the gene-abundance table to the gene-KO table and summing the KOs, I get an 'OTU table' kind of matrix that I can use for downstream analyses, just like the OTU table you provide for the genomes, which I am already using. However, I am afraid I miss several genes that are part of 'partial' genomes; with the contigs, I come a step closer. Additionally, I complement this with something like MetAnnotate: after your QC steps and without assembly, I run FragGeneScan++ and then an HMM search for certain genes. Then I feel I have taken the most out of my shotgun dataset (for bacteria). Thanks!
Hey Sofie, my idea behind atlas was to create a consistent reference to annotate all samples of a project. You can do this either with a collection of genomes or with the gene catalog. The advantages are that you don't need to annotate the same gene multiple times, and that you quantify the same genes in all samples. You can annotate the gene catalog with whatever you want.
Once you have an annotation table (one annotation per gene), you can combine it with the table "Genecatalog/counts/median_coverage.tsv.gz" (see #276).
You simply load the counts table, normalize the counts, and select the genes for which you have annotations. You may want to sum genes that have the same annotation. I can do this easily in Python, but it should also be easy in R.
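The load, normalize, select and sum steps above can be sketched in plain Python as follows. This assumes `median_coverage.tsv.gz` is a gzip-compressed, tab-separated table with gene IDs in the first column and one column per sample; that layout, the helper names, and the choice of normalization (scaling each sample to a total of 1) are assumptions for illustration, not the documented atlas format or Silas's own script.

```python
import csv
import gzip
from collections import defaultdict


def read_coverage(path):
    """Read a gene x sample table into {gene: {sample: value}}.

    Assumed layout (hypothetical): header row with an ID column followed by
    sample names; one row per gene; tab-separated; gzip-compressed.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        samples = next(reader)[1:]
        return {row[0]: dict(zip(samples, map(float, row[1:]))) for row in reader}


def normalize(coverage):
    """Scale each sample so that its coverages sum to 1 (one possible choice)."""
    totals = defaultdict(float)
    for per_sample in coverage.values():
        for sample, value in per_sample.items():
            totals[sample] += value
    return {
        gene: {s: (v / totals[s] if totals[s] else 0.0) for s, v in per_sample.items()}
        for gene, per_sample in coverage.items()
    }


def sum_by_annotation(norm_coverage, gene_to_annotation):
    """Keep only annotated genes and sum genes that share an annotation."""
    table = defaultdict(lambda: defaultdict(float))
    for gene, annotation in gene_to_annotation.items():
        for sample, value in norm_coverage.get(gene, {}).items():
            table[annotation][sample] += value
    return {a: dict(per_sample) for a, per_sample in table.items()}


# Toy data: gene IDs and values are invented placeholders.
coverage = {
    "Gene0000001": {"S1": 2.0, "S2": 0.0},
    "Gene0000002": {"S1": 6.0, "S2": 4.0},
    "Gene0000003": {"S1": 2.0, "S2": 4.0},  # not annotated, so it is dropped
}
annotations = {"Gene0000001": "K07304", "Gene0000002": "K07304"}
ko_table = sum_by_annotation(normalize(coverage), annotations)
print(ko_table)
```

With real data, `coverage = read_coverage("Genecatalog/counts/median_coverage.tsv.gz")` would replace the toy dictionary, provided the file matches the assumed layout.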
Hi Silas! OK, I am looking, but I don't see a counts folder inside my Genecatalog folder. MetAnnotate:
Look at the gene-abundance table issue (metagenome-atlas/atlas#276) for how to get the median coverage. @jmtsuji
Hi Silas,
I have a question regarding the output written to the 'results' folder:
(1) KO.tsv
Structure:
MAG     K00001  K00002  ...
MAG1    1       1
and the file written to the Genecatalog/annotations folder:
(2) KO.tsv
Structure:
Gene0014228    K07304
Gene...
In summary.html, the last table and plot show the KEGG orthologs. I am trying to understand what I am seeing, i.e. which file you load:
Are this KEGG ortholog table and heatmap based on your file (1), so that they show KEGG orthologs per MAG per sample?
Or do you read file (2), i.e. KEGG orthologs per gene, (somehow weighted by gene abundance) per sample?
I want to do further downstream analyses with the gene catalog files, but I don't know how you translate a query gene into its abundance (occurrence) in a sample.
I was thinking: can we make graphs that express the abundance of certain genes relative to a common gene in each sample? Then we speak about ratios, which I think statistically reduces the dependency (skew) of the data on differences in sequencing throughput. For example, take rpoB as the reference gene, and suppose the query I am interested in is an oil-degrading gene. I can then say that in this groundwater monitoring well I have 5 oil-degrading targets per 10 rpoB copies (rpoB is common to all bacteria, so either half of my bacterial population is an oil degrader, or one strain carries 5 oil-degrading genes, etc.). The samples can have different numbers of reads, but that doesn't really matter then, as long as we take the ratio within each sample, right?
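A minimal sketch of this ratio idea, also covering Silas's suggestion to divide by the median of several single-copy genes instead of a single one. It assumes an abundance table `{gene_or_ko: {sample: abundance}}` such as the normalized or summed gene-catalog table; the labels `oil_degradation_gene`, `marker_2` and `marker_3` and all numbers are hypothetical placeholders.

```python
from statistics import median


def ratio_to_reference(abundance, targets, references):
    """Express target abundances relative to reference gene(s), per sample.

    abundance: {label: {sample: abundance}}
    targets: labels to report
    references: one or more single-copy marker labels (e.g. ["rpoB"]); with
        several markers, the per-sample median is used as the denominator.
    Returns {target: {sample: ratio}}, with NaN where the reference is zero.
    """
    samples = sorted({s for per_sample in abundance.values() for s in per_sample})
    ref = {s: median(abundance[r].get(s, 0.0) for r in references) for s in samples}
    return {
        t: {
            s: (abundance[t].get(s, 0.0) / ref[s] if ref[s] else float("nan"))
            for s in samples
        }
        for t in targets
    }


# Invented numbers; marker_2 and marker_3 stand for other single-copy genes.
abundance = {
    "oil_degradation_gene": {"well_A": 5.0, "well_B": 1.0},
    "rpoB": {"well_A": 10.0, "well_B": 20.0},
    "marker_2": {"well_A": 8.0, "well_B": 25.0},
    "marker_3": {"well_A": 12.0, "well_B": 30.0},
}
only_rpob = ratio_to_reference(abundance, ["oil_degradation_gene"], ["rpoB"])
median_markers = ratio_to_reference(
    abundance, ["oil_degradation_gene"], ["rpoB", "marker_2", "marker_3"]
)
print(only_rpob)
print(median_markers)
```

Because numerator and denominator come from the same sample, sequencing depth cancels out. As the question notes, the ratio alone cannot distinguish "half of the population carries one copy" from "one strain carries several copies".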
Jackson did something similar with the output of MetAnnotate, and I was wondering whether I can use the output of your gene catalog as input to his R script. I just need one extra column in the table saying which sample each gene belongs to; then I can continue :-)
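The extra sample column amounts to converting the wide gene x sample table into a long table with one row per gene per sample. A minimal sketch, assuming the same `{gene: {sample: value}}` layout as above; the column names (`gene`, `annotation`, `sample`, `abundance`) are hypothetical and would have to be matched to whatever Jackson's R script actually expects, which is not described here.

```python
import csv
import io


def to_long(coverage, gene_to_annotation=None, drop_zero=True):
    """Turn {gene: {sample: value}} into rows (gene, annotation, sample, value).

    Genes without an annotation get an empty annotation field. With
    drop_zero=True, genes not detected in a sample (value 0) are skipped.
    """
    lookup = gene_to_annotation or {}
    rows = []
    for gene, per_sample in coverage.items():
        annotation = lookup.get(gene, "")
        for sample, value in per_sample.items():
            if drop_zero and value == 0:
                continue  # gene absent from this sample
            rows.append((gene, annotation, sample, value))
    return rows


def write_tsv(rows, handle):
    """Write long-format rows as a tab-separated table with a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "annotation", "sample", "abundance"])
    writer.writerows(rows)


# Invented values; Gene0000002 is a placeholder ID.
coverage = {
    "Gene0014228": {"S1": 3.5, "S2": 0.0},
    "Gene0000002": {"S1": 1.0, "S2": 2.0},
}
annotations = {"Gene0014228": "K07304"}
out = io.StringIO()
write_tsv(to_long(coverage, annotations), out)
print(out.getvalue())
```

In R, the resulting file can be read with `read.delim()` and filtered by the `sample` column.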
Sofie