Assigning cgSTs - Brucella spp. cgMLST #210

Open
wjewers opened this issue Dec 3, 2024 · 1 comment

Comments

wjewers commented Dec 3, 2024

Hi,

This is not necessarily an issue with chewBBACA's implementation, but I have a question about some potentially spurious results: everything I run through the AlleleCall module comes back as a novel cgST. I've adapted the Brucella spp. scheme available from BIGSdb/PubMLST and I'm using the allele profiles available from the same location.

Before running any of my own isolates, I thought it would be useful to benchmark chewBBACA by downloading publicly available genome assemblies that already have assigned cgSTs (they are listed on PubMLST, so they should be robust). However, when I run these same assemblies through chewBBACA, I get nothing but novel cgSTs.

Any light you can shed on this issue, and why it might be happening, would be greatly appreciated.

Thanks,

Will

rfm-targa self-assigned this Dec 6, 2024
rfm-targa added the "Status: In Progress" label (Has been assigned and is being worked on.) Dec 6, 2024
rfm-targa (Contributor) commented

Hello @wjewers,

Sorry for the delay. The difference in results you're observing is expected. chewBBACA and BIGSdb/PubMLST do not use the same allele calling algorithm. chewBBACA uses Pyrodigal to predict CDSs and classifies the CDSs based on clustering and on the BLAST Score Ratio computed after aligning the translated CDSs against the schema alleles with BLASTp. I do not know how the allele caller from BIGSdb/PubMLST works, but it is different.

This means that the allele definition, that is, what counts as an allele, varies between chewBBACA and BIGSdb/PubMLST. I think you'll find the same allele for many loci, but there will be differences for some loci. A single difference is enough to change the allelic profile and, consequently, the cgST.

You should still identify the same or very similar relationships between strains based on chewBBACA's results; you'll just get a consistent identification of different alleles for some loci. If you know a strain's BIGSdb/PubMLST cgST, you can pair the allelic profile chewBBACA produces for that strain with that cgST, which tells you which BIGSdb/PubMLST cgST the chewBBACA profile corresponds to.
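To see which loci are responsible for a cgST mismatch, the two profiles for the same strain can be compared locus by locus. The following is only a minimal sketch, not part of chewBBACA or BIGSdb: the function name `compare_profiles`, the locus names, and the allele identifiers are all hypothetical, and it assumes both profiles have already been loaded as dictionaries mapping locus name to allele identifier. It also assumes the identifiers use the same numbering in both sources, which may not hold after a scheme has been adapted.

```python
def compare_profiles(profile_a, profile_b):
    """Compare two allelic profiles given as {locus: allele_id} dicts.

    Returns the sorted list of loci present in both profiles and the
    subset of those loci where the allele identifiers differ.
    Identifiers are compared as plain strings (an assumption of this
    sketch; special classifications are not interpreted).
    """
    shared = sorted(set(profile_a) & set(profile_b))
    differing = [
        locus for locus in shared
        if str(profile_a[locus]) != str(profile_b[locus])
    ]
    return shared, differing


# Hypothetical data: locus names and allele IDs are made up.
pubmlst_profile = {"BRUC_0001": "1", "BRUC_0002": "4", "BRUC_0003": "7"}
chewbbaca_profile = {"BRUC_0001": "1", "BRUC_0002": "5", "BRUC_0003": "7"}

shared, differing = compare_profiles(pubmlst_profile, chewbbaca_profile)
print(f"{len(differing)} of {len(shared)} shared loci differ: {differing}")
```

If only a small number of loci differ across many strains, that would be consistent with the explanation above: the two allele callers agree on most loci and disagree consistently on a few.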
I hope this helps answer your question. Let us know if you have further questions.

Kind regards,

Rafael
