-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove_unused_fragments() disrupts the order of other dataframe attributes like nAA
#269
Comments
nAA
.nAA
Hi, can you give a minimal example which shows the bug. |
OK [GeorgWa](https://github.com/GeorgWa), I will clarify this bug in more detail. When running the following code: precursor_df = refine_precursor_df(precursor_df)
print(f"precursor_df.frag_start_idx.is_monotonic_increasing: {precursor_df.frag_start_idx.is_monotonic_increasing}\n",
f"(precursor_df.frag_start_idx.iloc[1:].values == precursor_df.frag_stop_idx.iloc[:-1].values).all(): {(precursor_df.frag_start_idx.iloc[1:].values == precursor_df.frag_stop_idx.iloc[:-1].values).all()}\n",
f"precursor_df.nAA.is_monotonic_increasing: {precursor_df.nAA.is_monotonic_increasing}.")
precursor_df, _ = remove_unused_fragments(precursor_df, [fragment_intensity_df])
print(f"precursor_df.frag_start_idx.is_monotonic_increasing: {precursor_df.frag_start_idx.is_monotonic_increasing}\n",
f"(precursor_df.frag_start_idx.iloc[1:].values == precursor_df.frag_stop_idx.iloc[:-1].values).all(): {(precursor_df.frag_start_idx.iloc[1:].values == precursor_df.frag_stop_idx.iloc[:-1].values).all()}\n",
f"precursor_df.nAA.is_monotonic_increasing: {precursor_df.nAA.is_monotonic_increasing}.") The results are as follows:
However, the desired results should be:
This is just a specific and straightforward example. The key issue is that the |
Here’s a professionally translated and clarified version of your description: The bug was initially discovered during testing of AlphaPeptDeep. When the model performs inference, there are two possible ways to store the prediction results:
For the entire block assignment to work, the following conditions must be met: (len(precursor_df) == 0) or (
(precursor_df.index.values[0] == 0)
and precursor_df.frag_start_idx.is_monotonic_increasing
and (precursor_df.frag_start_idx.iloc[1:].values == precursor_df.frag_stop_idx.iloc[:-1].values).all()
and precursor_df.nAA.is_monotonic_increasing
and np.all(np.diff(precursor_df.index.values) == 1)
) For further details, you can refer to the I hope my understanding is correct, and I apologize if my explanation has caused you any confusion.🥺 |
The BugAt the moment Instead, a dataframe can be refined by the user which allows functions to use optimizations. The function for this is Furthermore, Your SolutionFirst, The solution you provide does not make sure that See docstring: @nb.njit()
def compress_fragment_indices(frag_idx):
"""
recalculates fragment indices to remove unused fragments. Can be used to compress a fragment library.
Expects fragment indices to be ordered by increasing values (!!!).
It should be O(N) runtime with N being the number of fragment rows.
""" Lastly, we need a unittest which checks if the fragment-precursor pairing is still valid. Otherwise, we can't be sure that the fragments are still the same. Instead of using random fragment samples, for example assign Suggested SolutionLet's split this into two tasks. First, the user (or peptideep) calls We also have to change |
Thx, GeorgWa
precursor_df = precursor_df.sort_values([frag_start_col], ascending=True) This step illustrates that the input order is altered during the process.
|
# order by frag frag_start_col
# calling reset_index would be even better to make sure index is clean
precursor_df = precursor_df.sort_values([frag_start_col], ascending=True)
# ... some code
# restore old input order
precursor_df = precursor_df.sort_index()
What do you think about:
Sorry for the confusion, I hope this way we can address it and merge your PR :) |
Describe the bug
In
compress_fragment_indices()
, thefrag_start_col
has already been sorted in ascending order. Therefore, there is no need to usesort_values()
before callingcompress_fragment_indices
. Doing so may disrupt the order ofnAA
or other sequential attributes.Similarly, using
sort_index()
might compromise the ordered nature offrag_start_col
. To prevent this, it should be replaced withreset_index()
.changes
The text was updated successfully, but these errors were encountered: