Module completeness as stand-alone package #19
Comments
I would also like this feature. @Alxdu have you found any alternatives?
Hello both of you,

Indeed, KEMET was conceived and structured as 3 different scripts, but at the time of the first manuscript submission to a journal, one reviewer suggested bundling all functions into a single package. Because of this, the design of the main script was reworked into its present form, but lines 2444-2495 are remnants of the initial concept of Module annotation alone.

I've briefly checked the code. The script does not specifically ask for FASTA files as input, but it uses the file names of said files to keep a constant flow for all operations connected to the same MAGs/genomes. That is to say, the following should work for batch annotation:

    for f in $(ls PATH/TO/ANNOTATION-FILES/);
    do
        ./kemet.py $f -a ANNOTATION_FORMAT --skip_hmm;
    done

In the meantime, I guess another workaround could be to truncate the names of the KO annotation files with code like:

    for f in $(ls PATH/TO/ANNOTATION-FILES/);
    do
        f1="${f%%.*}";
        ./kemet.py $f1 -a ANNOTATION_FORMAT --skip_hmm;
    done

For single file annotation, instead of pointing to the FASTA file path, it is possible to point to an annotation file, provided the extension is left out.

I'll work on the solution I mentioned in this reply, to include single file and batch use cases, as soon as I'm available!

@Alxdu regarding the tool to create module definitions: it could be made available, but it would take a while longer. I already have some code for that, which was used as the backbone for most of the .kk files, but it still needs some manual curation for a minority of them. Therefore, I was figuring out a way to eliminate this manual curation in the code, and in the meantime I updated to the second-to-last KEGG version. I'll also try to do the same for the latest one in the near future.

Best,
Matteo
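For reference, the truncation in the second loop relies on the shell expansion `${f%%.*}`, which removes everything from the first dot onward, whereas `${f%.*}` removes only the last extension. The difference matters when genome names themselves contain dots. A minimal sketch of both, with hypothetical helper names and a dry-run `echo` standing in for the real `kemet.py` call (nothing here is part of KEMET itself):

```shell
# Hypothetical helpers for illustration; not part of KEMET.

# Strip the directory and everything from the FIRST dot onward,
# which is what ${f%%.*} does.
strip_all_ext() {
    local name="${1##*/}"
    printf '%s\n' "${name%%.*}"
}

# Strip the directory and only the LAST extension.
strip_last_ext() {
    local name="${1##*/}"
    printf '%s\n' "${name%.*}"
}

# Dry run: print the command that would be run for each regular file
# in a directory, instead of executing kemet.py.
dry_run_batch() {
    local dir="$1" fmt="$2" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        echo "./kemet.py $(strip_all_ext "$f") -a $fmt --skip_hmm"
    done
}
```

Calling `dry_run_batch PATH/TO/ANNOTATION-FILES ANNOTATION_FORMAT` prints one command per file, which makes it easy to check the truncated names before running anything. Globbing with `"$dir"/*` is used here instead of `$(ls ...)` so that file names containing spaces are handled as single items.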
@Matteopaluh this is great news. Would it also be possible to include some functionality that takes in something like just a list of KO ids? Something like this:

    for GENOME_ID in $(cat genomes.list);
    do
        KO_IDS=kofam_results/${GENOME_ID}.ko_ids.list
        kemet.py $KO_IDS -a ko_list > kemet_results/${GENOME_ID}.mcr.tsv
    done

If you're able to implement this functionality and add the module as a conda package, I will incorporate it into my https://github.com/jolespin/veba package. I'm working on the v2 publication right now, so your package would of course be cited and properly referenced. What would be very useful would be to give ... How difficult would it be on your end to make this type of update?
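One way to prepare the kind of per-genome KO id list described in this comment is sketched below. It assumes only that KO identifiers follow the usual pattern of `K` plus five digits; the function name is hypothetical, the annotation file layout is not assumed at all, and `-a ko_list` in the loop above is a proposed option rather than something `kemet.py` is known to support.

```shell
# Hypothetical helper: print the unique KO ids found anywhere in a file,
# one per line, sorted. -o prints only the matched text and -w requires
# the match to be a whole word, so longer tokens such as K000012 are skipped.
extract_ko_ids() {
    grep -owE 'K[0-9]{5}' "$1" | sort -u
}
```

For example, `extract_ko_ids annotation.tsv > genome.ko_ids.list` (both file names are placeholders) would write a deduplicated list regardless of which column the KO ids sit in.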
@Matteopaluh it is excellent to hear you intend to revisit and improve upon the module completeness functionality. I will have a go at your suggested code modifications as a workaround, but I also look forward to your own implementation in upcoming updates. Same goes for the module definition tooling (i.e., rebuilding .kk files).
First of all, thank you for putting together this really great package.
I find the module completeness assessment really unique, with only a few other lesser options out there (e.g., KeggDecoder). I also like the way you break down the module definitions in .kk files for improved completeness assessment. Therefore, I look forward to seeing continued support and development for this function.
In my case, I use KO annotations made within a different pipeline to assess module completeness with KEMET. In theory I would only need the annotation .txt file, but I also have to provide the genome assembly .fasta file to run the script (which is not really needed when running with the --skip_hmm and --skip_gsmm arguments).
If I could make a feature request/suggestion, it would be to separate out the module completeness functionality so that it accepts just KO annotation files (either a path to a file or a path to a folder for batch operation).
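The "file or folder" behaviour requested here could be approximated today with a small wrapper. The following is only a sketch under that assumption; the function name is hypothetical and it does not call or modify KEMET, it just resolves a path argument into the list of annotation files to process.

```shell
# Hypothetical helper: resolve a path argument into annotation files.
# A regular file yields itself; a directory yields each regular file
# directly inside it; anything else is reported as an error.
list_annotation_inputs() {
    local target="$1" f
    if [ -f "$target" ]; then
        printf '%s\n' "$target"
    elif [ -d "$target" ]; then
        for f in "$target"/*; do
            [ -f "$f" ] || continue
            printf '%s\n' "$f"
        done
    else
        echo "not a file or directory: $target" >&2
        return 1
    fi
}
```

The output can then be fed line by line to whatever per-file command is appropriate, so single-file and batch use share one code path.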
It would also be great to have a stand-alone tool to create module definition .kk files from the official KEGG module .txt files, for situations where KEMET is not continuously supported and the current .kk files become obsolete.
Thank you for giving these some consideration.