Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove code for ignoring ambiguous spectra from MSFReader.cpp #3111

Merged
merged 14 commits into from
Jan 17, 2025

Conversation

nickshulman
Copy link
Contributor

A user had a problem that spectra which matched more than one peptide in a .pdresult file were not being put into the .blib even if "Keep ambiguous spectra" was checked.
https://skyline.ms/announcements/home/support/thread.view?rowId=64796

MSFReader has its own code for preventing ambiguous spectra from being included in the .blib which does not pay attention to the "-K" command-line parameter.
As far as I can tell, there is no reason for that code to exist, so I figured that we should probably just remove it.

@nickshulman nickshulman requested a review from chambm August 8, 2024 22:02
@chambm
Copy link
Member

chambm commented Aug 20, 2024

Could this be a performance issue with large pdResult files? Perhaps this internal way of eliminating ambiguous spectra is faster than relying on BiblioSpec's generic method?

@nickshulman
Copy link
Contributor Author

Could this be a performance issue with large pdResult files? Perhaps this internal way of eliminating ambiguous spectra is faster than relying on BiblioSpec's generic method?

I don't think it was done for performance reasons.
This code was written before the "-K" command line parameter existed, so, back then, there was no way to tell BiblioSpec to keep any of the ambiguous spectra. This code would seemed to have been written to keep only the spectrum which scored the highest.

There are a couple of problems with this code as it is:

  1. This code does not know anything about the "Confidence Level" (1, 2, 3) type of scores that most pdresult files use these days and so this code thinks usually thinks all of the ambiguous spectra have equal scores and so they all get removed.
  2. Because this code does not know about the "-K" parameter, there is no way to tell BiblioSpec to keep all of the spectra.

I am, however, thinking that the correct fix for this behavior is not to actually remove this code like in this pull request.
Instead, maybe the code should keep all PSMs that score as well as the best PSM and discard all PSMs that score worse. In this way, the scores that are tied for best place will be kept if "-K" was specified and discarded if "-K" was not specified.

I think the way I have things in this pull request might make the behavior worse in the cases where there are some PSMs that clearly scored worse compared to the best PSMs. I will investigate that.

@nickshulman
Copy link
Contributor Author

I am, however, thinking that the correct fix for this behavior is not to actually remove this code like in this pull request. Instead, maybe the code should keep all PSMs that score as well as the best PSM and discard all PSMs that score worse. In this way, the scores that are tied for best place will be kept if "-K" was specified and discarded if "-K" was not specified.

I think the way I have things in this pull request might make the behavior worse in the cases where there are some PSMs that clearly scored worse compared to the best PSMs. I will investigate that.

Actually, I think that the code changes that I have in this pull request are what we should go with.
This other idea that I was proposing here where we only keep the best scoring PSM among multiple PSMs that scored better than the cutoff would be very different behavior from what we do for other types of search results.

@brendanx67
Copy link
Contributor

brendanx67 commented Sep 9, 2024 via email

@nickshulman nickshulman merged commit abe9cec into master Jan 17, 2025
15 checks passed
@nickshulman nickshulman deleted the Skyline/work/20240617_MsfAmbiguousSpectra branch January 17, 2025 06:15
@chambm
Copy link
Member

chambm commented Jan 21, 2025

@nickshulman
Copy link
Contributor Author

tiny-msf-keep seems unreliable. Any idea why? https://teamcity.labkey.org/test/4851941575156252551?currentProjectId=ProteoWizard&orderBy=status&order=desc

It looks like the filename is getting mangled somehow, or is the weird filename part of the test?

Reading results from C:\pwiz\pwiz_tools\BiblioSpec\tests\inputs\试验_tiny.msf.
ERROR: no such table: SchemaInfo

@chambm
Copy link
Member

chambm commented Jan 21, 2025

Unless marked as unsupported, we test BiblioSpec for Unicode support by mangling the filenames internally.

@chambm
Copy link
Member

chambm commented Jan 21, 2025

Oops actually it's opt-in, not opt-out. Looks like you just copied the existing msf test which was opted in. I can't think why your changes would have anything to do with Unicode support though.

@nickshulman
Copy link
Contributor Author

nickshulman commented Jan 22, 2025

I created PR #3342 which adds some logging code in MSFReader::openFile.
The file that it's trying to open ("试验_tiny.msf") is zero bytes long.
It's supposed to be 155648 bytes long.

I imagine "tiny-msf-keep" is failing intermittently because it uses the same input files as the test that comes right before it ("tiny-msf") and something is not getting copied correctly as part of the test setup.

@chambm
Copy link
Member

chambm commented Jan 22, 2025

Or they're both trying to run in parallel and it's a race condition. Let me add a dependency and see if that solves it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants