Encountering an issue when running AlleleCall #182
Comments
Dear @kamivain, Thank you for your interest in chewBBACA. Please have a look at issue #176. I note that you are using Python 3.10. Although this should not be a problem, we advise using Python 3.9, which may also result in clearer error reporting. Another potential problem is a BLAST version above 2.9. Please downgrade if necessary, because we know there are incompatibilities. If downgrading BLAST does not solve the problem, there may be problems with the file or contig names. Please look into the previous issues reported on this. Best Regards, Mario |
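As a quick way to check the two versions mentioned above, here is a minimal Python sketch. It is not part of chewBBACA: the helper names `parse_blast_version`, `blast_newer_than` and `installed_blast_version` are hypothetical, and treating "BLAST>2.9" as "major.minor above 2.9" is an assumption about what the maintainers mean.

```python
import re
import subprocess
import sys


def parse_blast_version(version_output):
    """Extract (major, minor, patch) from the text printed by `blastp -version`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def blast_newer_than(version, limit=(2, 9)):
    """Return True if the major.minor part of `version` is above `limit`.

    Assumption: 2.9.x counts as acceptable and 2.10 or later does not.
    """
    return tuple(version[:2]) > tuple(limit)


def installed_blast_version(executable="blastp"):
    """Run `<executable> -version` and parse the result (requires BLAST+ on PATH)."""
    result = subprocess.run([executable, "-version"],
                            capture_output=True, text=True, check=True)
    return parse_blast_version(result.stdout)


def python_version():
    """Return the running interpreter version as (major, minor)."""
    return sys.version_info[:2]
```

Calling `blast_newer_than(installed_blast_version())` on the machine that runs chewBBACA should indicate whether a downgrade is worth trying; `python_version()` shows which interpreter is in use.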
Dear Mario,
Thank you for your reply. I will follow your advice and try the program again.
Best Regards,
kamivain
|
I have a similar problem, but it appears to be solved if I run the command on a selection of the genomes. When I run it on the remaining genomes, the problem appears again. |
Greetings @Fla1487, Thank you for your interest in chewBBACA. Based on what you report, it might be related to issues in one or several input files (badly formatted files, special characters in the filename or sequence headers, etc). Updating to the latest version may also help, as it solves several issues in older versions. If you cannot find the cause of the issue, please share what's printed to the stdout, as it might include enough information to determine the type of issue. Kind regards, Rafael |
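One way to look for the kind of input problems described above is to scan the sequence headers of each FASTA file before running AlleleCall. The sketch below is only illustrative: the function names are hypothetical, and the `SAFE_ID` pattern is an assumed notion of a "safe" identifier, not a rule taken from chewBBACA.

```python
import re
from collections import Counter

# Assumption: IDs made only of letters, digits, "_" and "." are treated as safe.
# This is an illustrative rule, not chewBBACA's own validation.
SAFE_ID = re.compile(r"^[A-Za-z0-9_.]+$")


def fasta_ids(lines):
    """Yield the ID (first whitespace-delimited token) of every FASTA header."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            yield tokens[0] if tokens else ""


def check_fasta_ids(lines):
    """Report empty, duplicated and unusual sequence IDs found in `lines`."""
    ids = list(fasta_ids(lines))
    counts = Counter(ids)
    return {
        "empty": sum(1 for seq_id in ids if seq_id == ""),
        "duplicated": sorted(i for i, n in counts.items() if i and n > 1),
        "unusual": sorted({i for i in ids if i and not SAFE_ID.match(i)}),
    }
```

For a real file, the lines can be passed directly from an open handle, for example `with open(path) as handle: report = check_fasta_ids(handle)`.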
Edits: Hi chewBBACA developer, I encounter the exact same error, but only for a subset of my genomes. Initially, I tried to run AlleleCall on 2500 genomes, which failed with the same error. Then I ran AlleleCall separately on 4 batches of 600-700 genomes; some batches succeeded, but others failed (N = 937 genomes). This is what I have done:
Below is the code. The error file and output file of this run are attached. Could you please look into this and suggest what I should do? Thank you very much! Best, |
Hello @artmisk13, Thank you for reporting this issue. We know of more users who have encountered this bug under similar circumstances. Based on what users report, it should be related to a single or a set of input files. We never got the same issue or managed to reproduce the error even when users shared data. That is the reason why we could not look into this properly. This error is strange because chewBBACA cannot get a sequence that should be in the FASTA file. Could you share a minimal test case that leads to the same error? For example, we can use the schema, a subset of the schema loci, and a genome to find and fix the issue. Any data you share with us is handled privately; we will only use it for bug fixing (you can upload a Zip with the data to WeTransfer and send the link to [email protected]). Also, part of the problem might be related to the environment configuration. If you are using a conda environment to run chewie, can you run Lastly, the BLAST error Let us know if you can share some data and if changing the file names fixes the BLAST error. Best regards, Rafael |
Hi Rafael, Thanks for your thorough explanation and suggestions, they are really helpful!
So I'm guessing there is a problem somewhere in 1) reading the fasta files when the name only has numerical characters and 2) creating the "missing_classes.fasta" file when there is a '-' separator in the input fasta name (problem in string variable splitting?). The 2nd problem probably has been addressed in the newer chewie version. I hope this new information helps you further in debugging the AlleleCall module.
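The two naming patterns suspected above can be flagged before a run with a small script. This is a hedged sketch: `flag_input_name` is a hypothetical helper, and taking the text before the first `.` in the file name as the genome identifier is an assumption about how the inputs are named, not something confirmed in this thread.

```python
from pathlib import Path


def flag_input_name(path):
    """Return a list of warnings for one input FASTA file name.

    Assumption: the genome identifier is the file name up to the first ".".
    """
    identifier = Path(path).name.split(".")[0]
    flags = []
    if identifier.isdigit():
        flags.append("numeric-only")
    if "-" in identifier:
        flags.append("contains '-'")
    return flags


def flag_inputs(paths):
    """Map each flagged path to its warnings, skipping names with no warnings."""
    flagged = {}
    for path in paths:
        flags = flag_input_name(path)
        if flags:
            flagged[str(path)] = flags
    return flagged
```

Renaming the flagged files (for example by adding a letter prefix or replacing `-` with `_`) and rerunning is a cheap way to test whether the names are involved.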
*Once the manuscript is accepted for publication, I am happy to upload the scheme to Chewie-NS so more people can use it! Best, |
Hello @artmisk13, Thank you for sharing the details and data about the errors. It will help us a lot. We will probably change how IDs are processed internally to solve this kind of issue for good. Best regards, Rafael |
Hello @artmisk13, We released chewBBACA v3.3.9. This version includes changes to check if BLAST interprets input unique IDs as PDB chain IDs or if it modifies the IDs at all. We use Kind regards, Rafael |
Hello, I encountered an issue when running AlleleCall on my genomes. It fails with "AttributeError: 'NoneType' object has no attribute 'seq'". What could be causing this? Thank you!
$ chewBBACA.py AlleleCall -i bu_genome -g bu_schema/schema_seed/ --gl bu_result_wgMLST/cgMLST/cgMLSTschema99.txt -o bu_result251_cgMLST --cpu 2
chewBBACA version: 3.2.0
Authors: Rafael Mamede, Pedro Cerqueira, Mickael Silva, João Carriço, Mário Ramirez
Github: https://github.com/B-UMMI/chewBBACA
Documentation: https://chewbbaca.readthedocs.io/en/latest/index.html
Contacts: [email protected]
==========================
chewBBACA - AlleleCall
Started at: 2023-08-13T22:39:06
Minimum sequence length: 0
Size threshold: 0.2
Translation table: 11
BLAST Score Ratio: 0.6
Word size: 5
Window size: 5
Clustering similarity: 0.2
Prodigal training file: bu_schema/schema_seed/bu_train.trn
CPU cores: 2
BLAST path: /usr/bin
CDS input: False
Prodigal mode: single
Mode: 4
Number of inputs: 251
Number of loci: 971
== CDS prediction ==
Predicting CDS for 251 inputs...
[====================] 100%
== CDS extraction ==
Extracting predicted CDS for 251 inputs...
[====================] 100%
Extracted a total of 1694809 CDS from 251 inputs.
== CDS deduplication ==
Identifying distinct CDS...identified 603928 distinct CDS.
== CDS exact matches ==
Searching for DNA exact matches...found 194185 exact matches (matching 38271 distinct alleles).
Unclassified CDS: 565657
== CDS translation ==
Translating 565657 CDS...
[====================] 100%
Identified 3633 CDS that could not be translated.
Information about untranslatable and small sequences stored in bu_result251_cgMLST/temp/invalid_cds.txt
Unclassified CDS: 562024
== Protein deduplication ==
Identifying distinct proteins...identified 296723 distinct proteins.
== Protein exact matches ==
Searching for Protein exact matches...found 5906 exact matches (22513 distinct CDS, 30655 total CDS).
Unclassified proteins: 290823
== Clustering ==
Translating schema's representative alleles...done.
Creating minimizer index for representative alleles...done.
Created index with 81137 distinct minimizers for 971 loci.
Clustering proteins...
[====================] 100%
Clustered 290823 proteins into 984 clusters.
Clusters to BLAST: 984
[====================] 100%
Classifying clustered proteins...
[====================] 100%
Classified 11856 distinct proteins.
Unclassified proteins: 278967
== Representative determination ==
Iteration 1
Loci: 971
BLASTing loci representatives against unclassified proteins...done.
Traceback (most recent call last):
File "/home/yao/.local/bin/chewBBACA.py", line 8, in <module>
sys.exit(main())
File "/home/yao/.local/lib/python3.10/site-packages/CHEWBBACA/chewBBACA.py", line 1545, in main
functions_info[process][1]()
File "/home/yao/.local/lib/python3.10/site-packages/CHEWBBACA/utils/process_datetime.py", line 146, in wrapper
func(*args, **kwargs)
File "/home/yao/.local/lib/python3.10/site-packages/CHEWBBACA/chewBBACA.py", line 528, in allele_call
AlleleCall.main(genome_list, loci_list, args.schema_directory,
File "/home/yao/.local/lib/python3.10/site-packages/CHEWBBACA/AlleleCall/AlleleCall.py", line 2718, in main
results = allele_calling(input_files, schema_directory, temp_directory,
File "/home/yao/.local/lib/python3.10/site-packages/CHEWBBACA/AlleleCall/AlleleCall.py", line 2510, in allele_calling
locus_results = expand_matches(match_info, prot_index, dna_index,
File "/home/yao/.local/lib/python3.10/site-packages/CHEWBBACA/AlleleCall/AlleleCall.py", line 1389, in expand_matches
target_protein = str(pfasta_index.get(target_id).seq)
AttributeError: 'NoneType' object has no attribute 'seq'
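The last frame of the traceback shows the general mechanism: a `.get()` lookup returns `None` when the requested ID is not in the index, and accessing `.seq` on `None` raises the `AttributeError`. The sketch below reproduces this with a plain dictionary standing in for the FASTA index. The `Record` class and `fetch_sequence` helper are hypothetical and are not chewBBACA code; they only show how a guarded lookup can report the missing ID.

```python
class Record:
    """Minimal stand-in for an indexed FASTA record with a `seq` attribute."""

    def __init__(self, seq):
        self.seq = seq


def fetch_sequence(index, seq_id):
    """Return the sequence for `seq_id`, naming the ID if it is missing."""
    record = index.get(seq_id)
    if record is None:
        # Without this check, `record.seq` would raise
        # AttributeError: 'NoneType' object has no attribute 'seq'
        raise KeyError("sequence ID %r not found in index" % seq_id)
    return str(record.seq)


# A dictionary plays the role of the protein FASTA index in this sketch.
index = {"genome1-protein1": Record("MKT")}

try:
    index.get("genome1-protein2").seq  # unguarded lookup, as in the traceback
except AttributeError as error:
    unguarded_message = str(error)
```

Knowing which ID is missing makes it easier to tell whether the identifier was altered somewhere between the input files and the lookup.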