-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature update: calculateFragments2 to include modifications #16
Conversation
feat: calculateFragments2 Provides modifications to the generated theoretical fragments
feat: tests for calculateFragments2
The function also resolves 2 of the current open issues in PSMatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I haven't looked at the code yet. This will follow.
- I am wondering whether it wouldn't be best for immediately merge both functions. Or may be as a second PR, and only momentarily keep the two variants. Either there's a single function/code, ou there could be two, that are called based on the presence of variable modifications - something along the lines of
if (variable_modifications) {
calculateFragments2(...)
} else {
calculateFragments(...)
}
To be discussed as part of the review.
@guideflandre - please also amend the latest entry in the |
Following up on the merging of We will need to consider how to handle the |
Updates based on reviewed PR
Updates according to reviewed PR
Dear @guideflandre, thanks for this contribution. Before looking at the code I am wondering whether we are really need/want this "brute-force" algorithm. I am afraid the number of combinations would explode easily. If we want to solve #14 and #15 that both ask for a modification on a specific position we may should use a different solution like (partly) supporting ProForma (instead of creating tens, hundreds or thousands of combinations to get just one specific fragment with the modification the user is interested in). Please see my PR #17 for an example implementation. |
Hi @sgibb - I think this and #17 address partly overlapping, but different use cases. The brute force approach |
@lgatto thanks for the clarification. In that case I agree that we should merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't not completely review everything (e.g. I didn't look at the calculateFragment2
integrate of the calculated combinations yet).
Co-authored-by: Sebastian Gibb <[email protected]>
Corrected .cumsumFragmentMasses
I benchmarked the current version of > result <- microbenchmark(calculateFragments("PQRAGRTKIPK", verbose = FALSE),
calculateFragments2("PQRAGRTKIPK", verbose = FALSE),
times = 1000L)
> result
Unit: milliseconds
expr min lq mean median uq max neval
calculateFragments 1.852632 1.917022 2.016044 1.949842 2.004322 14.56149 1000
calculateFragments2 1.551952 1.620542 1.751102 1.647552 1.696492 16.93496 1000
|
Change modified peptide layout into AGC[57.02]AK instead of AG[C]AK to specify the modified mass
@sgibb - are you ok with this? If so, could you approve. |
@guideflandre - once this PR is merged, I suggest you send a quick PR to replace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor thing to discuss. Otherwise I am fine with that PR.
Without doing this benchmark on my own. How could it be that the new version that does more is faster than the old one? |
@guideflandre I really like your |
In PSMatch, |
I have encountered some hurdles with the integration of my version of |
Ok, indeed dispatching takes much time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please fix the documentation:
❯ checking for code/documentation mismatches ... WARNING
Codoc mismatches from Rd file 'calculateFragments2.Rd':
calculateFragments2
Code: function(sequence, type = c("b", "y"), z = 1,
fixed_modifications = c(C = 57.02146),
variable_modifications = numeric(), max_mods = Inf,
neutralLoss = defaultNeutralLoss(), verbose = TRUE,
modifications = NULL)
Docs: function(sequence, type = c("b", "y"), z = 1,
fixed_modifications = c(C = 57.02146),
variable_modifications = NULL, max_mods = Inf,
neutralLoss = defaultNeutralLoss(), verbose = TRUE)
Argument names in code not in docs:
modifications
Mismatches in argument default values:
Name: 'variable_modifications' Code: numeric() Docs: NULL
https://github.com/rformassspectrometry/PSMatch/actions/runs/12868423898/job/35885823856?pr=16
Fixed ! Sorry for the delay |
Once again, fixed the documentation, sorry for this: ❯ checking Rd \usage sections ... WARNING
Undocumented arguments in Rd file 'calculateFragments2.Rd'
‘modifications’
Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
❯ checking R code for possible problems ... NOTE
.modificationPositions : <anonymous>: no visible global function
definition for ‘combn’
Undefined global functions or variables:
combn
Consider adding
importFrom("utils", "combn")
to your NAMESPACE file.
``` |
The current function
calculateFragments
uses modifications as fixed modifications, thus not giving all possibilities or combinations of modifications. Instead, I proposecalculateFragments2
that replaces the orginalmodification
parameter with these three parameters:fixed_modifications
applies fixed modifications regardless ofmax_mods
. Much like how the currentcalculateFragments
applies modificationsvariable_modifications
applies modifications based on all combinations possible and limited to the number ofmax_mods
. If peptideARGHKA
hasvariable_modifications = c(A = 10, G = 20), max_mods = 2
, then there will be 7 possibilities:"[A]R[G]HKA"
"[A]RGHK[A]"
"[A]RGHKA"
"AR[G]HK[A]"
"AR[G]HKA"
"ARGHK[A]"
"ARGHKA"
max_mods
sets a limit for the maximus amount of variable modifications at once on the peptide. By default, it's set at +Inf (allowing all combinations possible - no restraint on the maximum amount of mods at once).The returning value is a dataframe with the same columns and output as the original function, simply adding lines for new possibilities of modifications combinations. There is an additional column
peptide
that gives the modified sequence that the fragment belongs to (amino acids within brackets are modified, only variable modifications are shown).I have added the necessary documentation on the function.
@lgatto @sgibb