Thank you for the suggestion. @jmtsuji also asked for the MMseqs taxonomy annotation. As you proposed:
Can you send me an output file from mmseqs easy-taxonomy?
-
@Sofie8 Do you want to work on this? I can help you get going, but you would need to write the same steps as Snakemake rules, which isn't more complicated than the bash you already wrote.
-
@SilasK Yes, gladly! This is part of a citizen science project in which we sequence 10 locations along the stream each month. The samples you see are from October; I also have the data from November (December was not possible), and we are continuing now in January, until August 2021. Samples are taken from the free-flowing water body, filtered over 0.2 µm and 3 µm, and then DNA-extracted and sequenced separately. I thought this might help assembly: I should have a eukaryote-enriched fraction (3 µm) and a prokaryote fraction (0.2 µm). We are interested in (1) what's in the water, (2) the seasonal influence on biodiversity, and (3) the impact of sewage overflows. At which locations do we pick up DNA from fish or frogs, and where is it 'too' dirty? So the end result for me would be a taxonomy and function 'abundance' table. Binning would be hard unless I try something on the pool of reads merged across all locations. So let's get started :-)
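As a rough illustration of the taxonomy table mentioned above, here is a minimal bash sketch that tallies classified contigs per sample and taxon. It rests on several assumptions that are not confirmed in this thread: that each sample has a tab-separated file named `<sample>_lca.tsv` whose first four columns are contig ID, taxid, rank, and taxon name, and that taxid 0 marks unclassified contigs. The function name `tally_taxa` is hypothetical. Note that it counts contigs, not reads, so it is only a crude proxy for abundance.

```shell
#!/usr/bin/env bash
# Sketch: long-format table of contig counts per sample and taxon.
# Assumptions (not from the thread): inputs are tab-separated files named
# <sample>_lca.tsv with columns contig ID, taxid, rank, taxon name,
# and taxid 0 marks unclassified contigs.
tally_taxa() {
    awk -F'\t' '
        $2 != 0 {
            sample = FILENAME
            sub(/.*\//, "", sample)          # drop the directory part
            sub(/_lca\.tsv$/, "", sample)    # drop the assumed file suffix
            counts[sample "\t" $4]++         # one count per classified contig
        }
        END {
            for (key in counts) printf "%s\t%d\n", key, counts[key]
        }
    ' "$@" | LC_ALL=C sort
}
```

It could be called as `tally_taxa results/*_lca.tsv > taxon_counts.tsv`, where `results/` and the output name are placeholders.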
-
@SilasK Your contig classification steps for the eukaryotic genes could probably tie into what I was talking about in #455.
-
@SilasK What effect would different assemblers have on the ability to generate eukaryotic contigs? My understanding is that assemblers can perform quite differently depending on the genome. For example, there was a discontinued version of SPAdes that was designed to better handle diploid genomes. Or is metaSPAdes good enough for what we are trying to do?
-
This tool might be an interesting eukaryotic replacement for CheckM: "Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC".
-
Thank you @LeeBergstrand @jmtsuji. FYI, I thought this preprint might interest you, given that you've been looking into binning eukaryotic genomes: "Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure", https://doi.org/10.1101/2021.11.15.468626
-
Hi @SilasK,
My question is somewhat related to the RNA metatranscriptomics question in #143, but concerns DNA metagenomes. Can we use parts of the atlas pipeline for finding and annotating eukaryotic contigs/genes in metagenome datasets?
It is hard to find a workflow for analysing metagenome datasets for eukaryotes, so here we go.
Suggestion:
What I tried so far:
Download the taxonomic MMseqs2 Swiss-Prot database:
mmseqs databases UniProtKB/Swiss-Prot swissprot tmp
Run mmseqs easy-taxonomy on all contig files:
# loop over every contig file in the current directory
for file1 in *_final_contigs.fasta
do
# strip the suffix to get the sample name, then build the output prefix
out=${file1%_final_contigs.fasta}_output
mmseqs easy-taxonomy "$file1" "$databasedir/swissprot" "$resultdir/$out" "$resultdir/tmp" --search-type 2
done
That's it. I am a little bit stuck here, so I would like to hear others' suggestions!
I have DNA metagenome datasets (shallow metagenomes, 2x150 bp) from a freshwater stream available to test things out.
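One possible next step after the easy-taxonomy run is to pull out the contigs that were assigned to eukaryotes. The bash sketch below is an illustration under stated assumptions, not a tested part of atlas: it assumes a tab-separated LCA result file whose first column is the contig ID and in which some later column contains a lineage string that includes "Eukaryota" (as far as I know, MMseqs2 adds such a column when run with `--tax-lineage 1`, but please check this against your version). The function name `list_euk_contigs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the IDs of contigs whose taxonomy line mentions Eukaryota.
# Assumptions (not confirmed in the thread): the input is tab-separated,
# column 1 is the contig ID, and a later column holds the lineage string.
list_euk_contigs() {
    awk -F'\t' '{
        # scan every column after the contig ID for the word Eukaryota
        for (i = 2; i <= NF; i++) {
            if ($i ~ /Eukaryota/) { print $1; break }
        }
    }' "$1"
}
```

For example, `list_euk_contigs "$resultdir/sampleA_output_lca.tsv" > sampleA_euk_contigs.txt`, where the sample name and file names are placeholders. The resulting ID list could then be used to subset the contig FASTA with a tool of your choice.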
Cheers,
Sofie