Feature update: calculateFragments2 to include modifications #16

guideflandre · 2024-12-20T10:08:07Z

The current function calculateFragments treats modifications as fixed modifications, so it does not return all possible combinations of modifications. Instead, I propose calculateFragments2, which replaces the original modifications parameter with these three parameters:

fixed_modifications applies fixed modifications regardless of max_mods, much like the current calculateFragments applies modifications.
variable_modifications applies modifications in all possible combinations, limited to at most max_mods modifications at once. If peptide ARGHKA has variable_modifications = c(A = 10, G = 20) and max_mods = 2, there are 7 possibilities: "[A]R[G]HKA" "[A]RGHK[A]" "[A]RGHKA" "AR[G]HK[A]" "AR[G]HKA" "ARGHK[A]" "ARGHKA"
max_mods sets the maximum number of variable modifications present at once on the peptide. By default, it is set to +Inf (all combinations are allowed, with no limit on the number of simultaneous modifications).
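To make the ARGHKA example concrete, here is a minimal base R sketch that enumerates the modified sequences. The helper name enumeratePeptidoforms is hypothetical and this is not the code used in calculateFragments2; it only assumes that each residue named in variable_modifications is either modified or not.

```r
## Hypothetical helper (not the PSMatch implementation): enumerate all
## modified forms of a peptide for a set of variable modifications.
enumeratePeptidoforms <- function(sequence, variable_modifications,
                                  max_mods = Inf) {
    aa <- strsplit(sequence, "", fixed = TRUE)[[1L]]
    ## positions of residues that may carry a variable modification
    pos <- which(aa %in% names(variable_modifications))
    max_mods <- min(max_mods, length(pos))
    forms <- sequence  ## the unmodified peptide is always included
    for (k in seq_len(max_mods)) {
        ## all ways of choosing k of the candidate positions
        idx <- utils::combn(length(pos), k, simplify = FALSE)
        for (i in idx) {
            modified <- aa
            modified[pos[i]] <- paste0("[", aa[pos[i]], "]")
            forms <- c(forms, paste(modified, collapse = ""))
        }
    }
    forms
}

forms <- enumeratePeptidoforms("ARGHKA", c(A = 10, G = 20), max_mods = 2)
length(forms)  ## 7, matching the list above
```

With the default max_mods = Inf, the same call would also return the fully modified "[A]R[G]HK[A]", giving 8 sequences.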

The return value is a data frame with the same columns as the original function's output, with additional rows for the new modification combinations. An additional column, peptide, gives the modified sequence that each fragment belongs to (amino acids within brackets are modified; only variable modifications are shown).

I have added the necessary documentation on the function.

@lgatto @sgibb

feat: calculateFragments2 Provides modifications to the generated theoretical fragments

feat: tests for calculateFragments2

guideflandre · 2024-12-20T10:12:32Z

The function also resolves 2 of the current open issues in PSMatch

lgatto

I haven't looked at the code yet. This will follow.
I am wondering whether it wouldn't be best to merge both functions immediately, or maybe in a second PR, keeping the two variants only momentarily. Either there is a single function, or there could be two that are called depending on the presence of variable modifications, something along the lines of

if (length(variable_modifications)) {
    calculateFragments2(...)
} else {
    calculateFragments(...)
}

To be discussed as part of the review.

DESCRIPTION

PSMatch.Rproj

R/fragments-calculate2.R

lgatto · 2024-12-24T14:09:43Z

@guideflandre - please also amend the latest entry in the NEWS.md file.

lgatto · 2024-12-24T14:15:28Z

Following up on the merging of calculateFragments() and calculateFragments2(): given that the former is a method and the latter a function, I suggest we keep the two separate at first, merge the documentation, make sure that the unit tests confirm that their behaviour is identical when calculateFragments2() has no variable modifications, and consider merging later.

We will need to consider how to handle modifications from calculateFragments() and fixed_modifications from calculateFragments2(). I would suggest keeping both for backwards compatibility and making the latter a synonym of the former; if fixed_modifications is set, modifications is ignored.
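As a rough illustration of that precedence rule, the sketch below resolves the two arguments in base R. The helper name resolveFixedModifications is an assumption made for this example, and "set" is interpreted here as "supplied by the caller" (via missing()); the default c(C = 57.02146) is the one used elsewhere in this PR.

```r
## Hypothetical helper illustrating the suggested precedence; it is not
## part of PSMatch.
resolveFixedModifications <- function(fixed_modifications, modifications) {
    if (!missing(fixed_modifications)) {
        ## the new argument wins; the legacy one is ignored
        return(fixed_modifications)
    }
    if (!missing(modifications)) {
        ## the legacy argument acts as a synonym
        return(modifications)
    }
    ## neither was supplied: fall back to the default modification
    c(C = 57.02146)
}

resolveFixedModifications(modifications = c(M = 15.995))
resolveFixedModifications(fixed_modifications = c(S = 79),
                          modifications = c(M = 15.995))
```

The first call returns the legacy value, the second returns only the fixed_modifications value.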

R/fragments-calculate2.R

tests/testthat/test_fragments2.R

Updates based on reviewed PR

Updates according to reviewed PR

sgibb · 2024-12-25T22:55:43Z

Dear @guideflandre, thanks for this contribution. Before looking at the code, I am wondering whether we really need/want this "brute-force" algorithm. I am afraid the number of combinations would explode easily.

If we want to solve #14 and #15, which both ask for a modification at a specific position, we should perhaps use a different solution, such as (partly) supporting ProForma, instead of creating tens, hundreds or thousands of combinations to get just the one specific fragment with the modification the user is interested in.

Please see my PR #17 for an example implementation.
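The growth in the number of combinations can be estimated with a short calculation. The sketch below assumes that each candidate residue either carries its single variable modification or not, so a peptide with n candidate sites and a cap of max_mods yields the sum of choose(n, k) for k from 0 to max_mods. The function name countPeptidoforms is hypothetical.

```r
## Hypothetical helper: number of modified forms of a peptide with
## n_sites candidate residues and at most max_mods modifications at once.
countPeptidoforms <- function(n_sites, max_mods = Inf) {
    k <- 0:min(n_sites, max_mods)
    sum(choose(n_sites, k))
}

countPeptidoforms(3, max_mods = 2)   ## 7, the ARGHKA example
countPeptidoforms(10)                ## 1024
countPeptidoforms(20)                ## 1048576
countPeptidoforms(20, max_mods = 3)  ## 1351
```

Without a cap the count doubles with every additional candidate site, whereas a small max_mods keeps it polynomial in the number of sites.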

lgatto · 2024-12-26T07:13:39Z

Hi @sgibb - I think this and #17 address partly overlapping, but different use cases. The brute force approach variable_modifications = c(S = 79) addresses the ProForma <S79>ARGSHKSATC syntax, but there would still be a need to compute all possible peptidoform sequences and their fragments if that's what we want. The use case we have here is really to compute all possible fragments with variable modifications.

sgibb · 2024-12-26T10:21:54Z

@lgatto thanks for the clarification. In that case I agree that we should merge calculateFragments and calculateFragments2 to avoid code duplication and keep it simple for the user (or find a better name for calculateFragments2, e.g. calculateFragmentsVariableMods ...).

sgibb

I did not review everything completely (e.g. I haven't yet looked at how calculateFragments2 integrates the calculated combinations).

R/fragments-calculate2.R

Co-authored-by: Sebastian Gibb <[email protected]>

Corrected .cumsumFragmentMasses

guideflandre · 2025-01-08T10:28:14Z

I benchmarked the current version of calculateFragments and calculateFragments2 and here are the results for a thousand runs without modifications:

> result <- microbenchmark(calculateFragments("PQRAGRTKIPK", verbose = FALSE), 
                           calculateFragments2("PQRAGRTKIPK", verbose = FALSE), 
                           times = 1000L)
> result
Unit: milliseconds
                expr      min       lq     mean   median       uq      max neval
  calculateFragments 1.852632 1.917022 2.016044 1.949842 2.004322 14.56149  1000
 calculateFragments2 1.551952 1.620542 1.751102 1.647552 1.696492 16.93496  1000

calculateFragments2 seems to perform ever-so-slightly better. The same is true when benchmarking modifications against fixed_modifications. With variable_modifications, calculateFragments2 is naturally slower.

guideflandre · 2025-01-09T09:53:06Z

In order to merge calculateFragments and calculateFragments2, two "conflicts" arise:

calculateFragments uses the parameter modifications whereas calculateFragments2 uses fixed_modifications. Their functionality is identical; only the names differ. To avoid breaking users' code when modifications is passed but not recognised by calculateFragments2, I added a warning at the start of the function saying that modifications is deprecated, and the value of modifications then replaces fixed_modifications:

calculateFragments2 <- function(sequence, 
                                fixed_modifications = c(C = 57.02146),
                                modifications = NULL, ...) {
    
    if (!is.null(modifications)) {
        warning("'modifications' is deprecated, please use 'fixed_modifications' instead.")
        fixed_modifications <- modifications
    }

This way, the code works whether or not modifications is NULL and whether or not fixed_modifications is NULL, in any combination. This is only a temporary approach; ideally, the modifications parameter is removed in the future.
I can write unit tests for these cases. Do tell me if you think another approach should be used.
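The kind of check such unit tests could perform is sketched below in base R. The stand-in resolveMods mirrors only the argument handling from the snippet above and computes no fragments; both it and catchWarning are hypothetical names introduced for this example, not PSMatch functions.

```r
## Stand-in mirroring only the argument handling shown above.
resolveMods <- function(fixed_modifications = c(C = 57.02146),
                        modifications = NULL) {
    if (!is.null(modifications)) {
        warning("'modifications' is deprecated, please use ",
                "'fixed_modifications' instead.")
        fixed_modifications <- modifications
    }
    fixed_modifications
}

## Capture a warning message (if any) together with the returned value.
catchWarning <- function(expr) {
    w <- NULL
    value <- withCallingHandlers(
        expr,
        warning = function(cond) {
            w <<- conditionMessage(cond)
            invokeRestart("muffleWarning")
        })
    list(value = value, warning = w)
}

## Legacy argument: a deprecation warning is raised and the value is used.
res_old <- catchWarning(resolveMods(modifications = c(M = 15.995)))
## New argument: same value, no warning.
res_new <- catchWarning(resolveMods(fixed_modifications = c(M = 15.995)))
```

In a testthat suite the same checks would typically be written with expect_warning() and expect_identical() against calculateFragments2 itself.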

The functions that depend on calculateFragments are addFragments and plotSpectra from the Spectra package, the latter depending on the former. When variable_modifications is used, addFragments fails due to duplicated mz values. A remedy could be this edited version of addFragments, which generates a list of labels (one element per modified peptide) when variable_modifications is used:

addFragments2 <- function (x, tolerance = 0, ppm = 20, ...) {
    stopifnot(requireNamespace("Spectra"))
    stopifnot(inherits(x, "Spectra"), length(x) == 1)
    stopifnot("sequence" %in% Spectra::spectraVariables(x))
    y <- Spectra::spectraData(x)[["sequence"]]
    x_data <- Spectra::peaksData(x)[[1L]]
    y_data <- calculateFragments2(y, verbose = FALSE, ...)
    
    if (length(unique(y_data$peptide)) > 1) { ## variable modifications were used
        y_data <- split(y_data, y_data$peptide)
        labels <- vector("list", length(y_data))
        for (i in seq_along(y_data)) {
            y_data[[i]] <- y_data[[i]][order(y_data[[i]]$mz), ]
            idx <- which(MsCoreUtils::common(x_data[, "mz"],
                                             y_data[[i]][,"mz"],
                                             tolerance = tolerance,
                                             ppm = ppm))
            idy <- which(MsCoreUtils::common(y_data[[i]][, "mz"],
                                             x_data[,"mz"], 
                                             tolerance = tolerance, 
                                             ppm = ppm))
            
            labels[[i]] <- rep(NA_character_, nrow(x_data))
            labels[[i]][idx] <- y_data[[i]][idy, "ion"]
        } 
        labels
    } else { ## behaviour of the current addFragments function
        y_data <- y_data[order(y_data$mz), ]
        idx <- which(MsCoreUtils::common(x_data[, "mz"],
                                         y_data[,"mz"],
                                         tolerance = tolerance,
                                         ppm = ppm))
        idy <- which(MsCoreUtils::common(y_data[, "mz"],
                                         x_data[,"mz"], 
                                         tolerance = tolerance, 
                                         ppm = ppm))
        
        labels <- rep(NA_character_, nrow(x_data))
        labels[idx] <- y_data[idy, "ion"]
        labels
    }
}

This way, existing code that uses modifications keeps working, and the plots are unaffected as long as variable_modifications is not used. However, when variable_modifications is used, a list of labels is generated and the plotting throws an error.

I have been working on a new plotSpectra function that would allow variable_modifications as well as improve the visualisation (including the colour-coded annotated fragments and the sequence in the plot).

I will create a PR for this new plotting function soon, but as the whole ecosystem is dependent on calculateFragments I thought it might be good to mention it here. In any case, we can start by accepting addFragments' annotated peaks from the sequence without variable modifications at first, and build on this when the new plotSpectra function is optimised to accept these modifications.

@lgatto @sgibb

Change modified peptide layout into AGC[57.02]AK instead of AG[C]AK to specify the modified mass

lgatto · 2025-01-17T15:05:28Z

@sgibb - are you ok with this? If so, could you approve.

lgatto · 2025-01-17T15:06:37Z

@guideflandre - once this PR is merged, I suggest you send a quick PR to replace calculateFragments() with the new version.

sgibb

Just a minor thing to discuss. Otherwise I am fine with that PR.

R/fragments-calculate2.R

sgibb · 2025-01-20T11:00:42Z

> I benchmarked the current version of calculateFragments and calculateFragments2 and here are the results for a thousand runs without modifications:
>
> > result <- microbenchmark(calculateFragments("PQRAGRTKIPK", verbose = FALSE), 
>                            calculateFragments2("PQRAGRTKIPK", verbose = FALSE), 
>                            times = 1000L)
> > result
> Unit: milliseconds
>                 expr      min       lq     mean   median       uq      max neval
>   calculateFragments 1.852632 1.917022 2.016044 1.949842 2.004322 14.56149  1000
>  calculateFragments2 1.551952 1.620542 1.751102 1.647552 1.696492 16.93496  1000
>
> calculateFragments2 seems to perform ever-so-slightly better. The same is true when benchmarking modifications against fixed_modifications. With variable_modifications, calculateFragments2 is naturally slower.

I have not run this benchmark myself, but how can the new version, which does more, be faster than the old one?

sgibb · 2025-01-20T11:03:24Z

@guideflandre I really like your plotSpectra method! Could you integrate the delta plot as well (#13) and create a PR for this?

guideflandre · 2025-01-20T13:00:28Z

> I have not run this benchmark myself, but how can the new version, which does more, be faster than the old one?

In PSMatch, calculateFragments is defined as a method through setMethod, whereas the version I built is a plain function. I believe this might be the cause. I intended to replace calculateFragments with calculateFragments2 in a new PR, which is why I didn't build a new method.

guideflandre · 2025-01-20T13:03:54Z

> @guideflandre I really like your plotSpectra method! Could you integrate the delta plot as well (#13) and create a PR for this?

I have encountered some hurdles with the integration of my version of plotSpectra, especially for the other functions plotSpectraMirror and plotSpectraOverlay. I will create an appropriate issue to discuss those with you, Laurent and Johannes. The delta plot is indeed an interesting touch that I will look into.

sgibb · 2025-01-20T16:02:20Z

> > I have not run this benchmark myself, but how can the new version, which does more, be faster than the old one?
>
> In PSMatch, calculateFragments is defined as a method through setMethod, whereas the version I built is a plain function. I believe this might be the cause. I intended to replace calculateFragments with calculateFragments2 in a new PR, which is why I didn't build a new method.

OK, indeed, method dispatch takes quite some time.
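The dispatch overhead can be checked independently of PSMatch with a small experiment. The sketch below exposes the same trivial computation as a plain function and as an S4 method and times repeated calls; plainLength and methodLength are hypothetical stand-ins, and the absolute timings will vary between machines and R versions.

```r
library(methods)

## Hypothetical stand-ins: the same computation as a plain function and
## as an S4 method for "character".
plainLength <- function(object) nchar(object)

setGeneric("methodLength", function(object) standardGeneric("methodLength"))
setMethod("methodLength", "character", function(object) nchar(object))

## Time many calls of each variant.
n <- 1e4
t_plain <- system.time(for (i in seq_len(n)) plainLength("PQRAGRTKIPK"))
t_method <- system.time(for (i in seq_len(n)) methodLength("PQRAGRTKIPK"))
c(plain = t_plain[["elapsed"]], method = t_method[["elapsed"]])
```

Because the work done inside both variants is identical, any difference in elapsed time comes from the call mechanism rather than from the computation.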

sgibb

Could you please fix the documentation:

❯ checking for code/documentation mismatches ... WARNING
  Codoc mismatches from Rd file 'calculateFragments2.Rd':
  calculateFragments2
    Code: function(sequence, type = c("b", "y"), z = 1,
                   fixed_modifications = c(C = 57.02146),
                   variable_modifications = numeric(), max_mods = Inf,
                   neutralLoss = defaultNeutralLoss(), verbose = TRUE,
                   modifications = NULL)
    Docs: function(sequence, type = c("b", "y"), z = 1,
                   fixed_modifications = c(C = 57.02146),
                   variable_modifications = NULL, max_mods = Inf,
                   neutralLoss = defaultNeutralLoss(), verbose = TRUE)
    Argument names in code not in docs:
      modifications
    Mismatches in argument default values:
      Name: 'variable_modifications' Code: numeric() Docs: NULL

https://github.com/rformassspectrometry/PSMatch/actions/runs/12868423898/job/35885823856?pr=16

guideflandre · 2025-01-20T17:38:51Z

> Could you please fix the documentation:

Fixed! Sorry for the delay.

guideflandre · 2025-01-20T18:42:08Z

Once again, fixed the documentation, sorry for this:

❯ checking Rd \usage sections ... WARNING
  Undocumented arguments in Rd file 'calculateFragments2.Rd'
    ‘modifications’
  
  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

❯ checking R code for possible problems ... NOTE
  .modificationPositions : <anonymous>: no visible global function
    definition for ‘combn’
  Undefined global functions or variables:
    combn
  Consider adding
    importFrom("utils", "combn")
  to your NAMESPACE file.

guideflandre and others added 7 commits December 19, 2024 16:33

Create fragments-calculate2.R

11d943c

feat: calculateFragments2 Provides modifications to the generated theoretical fragments

Create test_fragments2.R

32dd91b

feat: tests for calculateFragments2

Update fragments-calculate2.R

ab01551

Update fragments-calculate2.R

d1951bb

Update fragments-calculate2.R

d505a3d

Update documentation

b75af03

Update documentation

f3b7ba7

lgatto requested changes Dec 24, 2024

DESCRIPTION (resolved)

PSMatch.Rproj (outdated, resolved)

R/fragments-calculate2.R (outdated, resolved)

R/fragments-calculate2.R (resolved)

R/fragments-calculate2.R (outdated, resolved)

lgatto reviewed Dec 24, 2024

R/fragments-calculate2.R (outdated, resolved)

Delete PSMatch.Rproj

ea262a0

lgatto reviewed Dec 24, 2024

guideflandre added 4 commits December 24, 2024 15:42

Update NEWS.md

dfeaa2e

Update DESCRIPTION

7c8f6c9

Update fragments-calculate2.R

9d1ea08

Updates based on reviewed PR

Update test_fragments2.R

befd86a

Updates according to reviewed PR

sgibb requested changes Dec 26, 2024

R/fragments-calculate2.R (outdated, resolved)

R/fragments-calculate2.R (outdated, resolved)

R/fragments-calculate2.R (outdated, resolved)

R/fragments-calculate2.R (outdated, resolved)

guideflandre and others added 8 commits January 2, 2025 10:40

Update R/fragments-calculate2.R

046cca0

Co-authored-by: Sebastian Gibb <[email protected]>

Update fragments-calculate2.R

ebb8673

Update fragments-calculate2.R

5c4ce49

Corrected .cumsumFragmentMasses

Update DESCRIPTION

7c8afd6

git ignore Rproj file

6d25915

update .cumsumFragmentMasses

f5b8b06

update: testthat .cumsumFragmentMasses

9d5b9ff

update variable_modifications default to empty numeric

5c0344e

guideflandre mentioned this pull request Jan 9, 2025

Error in addFragments when calling multiple Spectra objects #18

Open

Update fragments-calculate2.R

a39cbee

Change modified peptide layout into AGC[57.02]AK instead of AG[C]AK to specify the modified mass

lgatto approved these changes Jan 17, 2025

Add warning for modifications parameter

0525f86

sgibb requested changes Jan 20, 2025

R/fragments-calculate2.R (outdated, resolved)

Remove redundancy in strplit(sequence)

f4a5d0f

sgibb approved these changes Jan 20, 2025

sgibb self-requested a review January 20, 2025 17:14

sgibb requested changes Jan 20, 2025

Update documentation

2526ccd

update documentation

bdd7cd8

sgibb self-requested a review January 20, 2025 21:28

sgibb approved these changes Jan 20, 2025

sgibb merged commit a046b32 into rformassspectrometry:main Jan 20, 2025
1 check passed

guideflandre mentioned this pull request Jan 21, 2025

Improve Spectra::plotSpectra and PSMatch::addFragments rformassspectrometry/Spectra#346

Open

4 tasks
