Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow "unsafe" (faster) Spectrum1DCollection.from_spectra() #327

Merged
merged 9 commits into from
Nov 19, 2024

Conversation

ajjackson
Copy link
Collaborator

When filtering or otherwise manipulating spectra from an existing collection, we can be confident the axes remain consistent. The safety checks are quite expensive so this should be a useful optimisation.

When filtering or otherwise manipulating spectra from an existing
collection, we can be confident the axes remain consistent. The safety
checks are quite expensive so this should be a useful optimisation.
Copy link
Contributor

github-actions bot commented Nov 18, 2024

Test Results

    22 files  ± 0      22 suites  ±0   29m 29s ⏱️ -9s
 1 066 tests + 5   1 060 ✅ + 5   6 💤 ±0  0 ❌ ±0 
10 600 runs  +50  10 537 ✅ +50  63 💤 ±0  0 ❌ ±0 

Results for commit 78e4870. ± Comparison against base commit 23fb80b.

♻️ This comment has been updated with latest results.

While refactoring, change assert statements to explicit logic and
Errors. This avoids them being ignored in some runtime modes.
@ajjackson ajjackson marked this pull request as ready for review November 18, 2024 11:03
@ajjackson
Copy link
Collaborator Author

ajjackson commented Nov 18, 2024

Profiling results a bit odd: there is a decent speedup, but also there is a significant penalty on the first call to from_spectra, regardless of unsafe.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @line_profiler.profile
     9                                           def main():
    10
    11         1       1267.0   1267.0      0.4      ref_spectra = Spectrum1DCollection.from_json_file(ref_file)
    12         1       4973.0   4973.0      1.5      ref_spectra_list = list(ref_spectra)
    13
    14         1        154.0    154.0      0.0      dummy_quantity = Quantity(ref_spectra_list[0].y_data.magnitude, ref_spectra_list[0].y_data.units)
    15
    16         1     235027.0 235027.0     70.2      collection_safe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10)
    17
    18         1      17199.0  17199.0      5.1      collection_unsafe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10, unsafe=True)
    19
    20         1      75966.0  75966.0     22.7      collection_safe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10)

This is driven by the construction of the Quantity for output data (an unavoidable step)

Function: from_spectra at line 1235

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1235                                               @classmethod
  1236                                               @line_profiler.profile
  1237                                               def from_spectra(
  1238                                                       cls: Self, spectra: Sequence[Spectrum1D], unsafe: bool = False
  1239                                               ) -> Self:
  1240                                                   """Combine Spectrum1D to produce a new collection
  1241
  1242                                                   x_bins and x_tick_labels must be consistent (in magnitude and units)
  1243                                                   across the input spectra. If 'unsafe', this will not be checked.
  1244
  1245                                                   """
  1246         1          0.0      0.0      0.0          if len(spectra) < 1:
  1247                                                       raise IndexError("At least one spectrum is needed for collection")
  1248
  1249         1          1.0      1.0      0.0          cls._item_type_check(spectra[0])
  1250         1         56.0     56.0      0.0          x_data = spectra[0].x_data
  1251         1          0.0      0.0      0.0          x_tick_labels = spectra[0].x_tick_labels
  1252         1         74.0     74.0      0.0          y_data_length = len(spectra[0].y_data)
  1253         1          1.0      1.0      0.0          y_data_magnitude = np.empty((len(spectra), y_data_length))
  1254         1         75.0     75.0      0.0          y_data_magnitude[0, :] = spectra[0].y_data.magnitude
  1255         1         74.0     74.0      0.0          y_data_units = spectra[0].y_data.units
  1256
  1257       220         43.0      0.2      0.0          for i, spectrum in enumerate(spectra[1:]):
  1258       219          9.0      0.0      0.0              if not unsafe:
  1259       219         90.0      0.4      0.0                  cls._item_type_check(spectrum)
  1260       438      57378.0    131.0     24.6                  cls._from_spectra_data_check(
  1261       219         17.0      0.1      0.0                      spectrum, x_data, y_data_units, x_tick_labels)
  1262
  1263       219      17202.0     78.5      7.4              y_data_magnitude[i + 1, :] = spectrum.y_data.magnitude
  1264
  1265         1        597.0    597.0      0.3          metadata = cls._combine_metadata([spec.metadata for spec in spectra])
  1266         1     157958.0 157958.0     67.6          y_data = Quantity(y_data_magnitude, y_data_units)
  1267         2        109.0     54.5      0.0          return cls(x_data, y_data, x_tick_labels=x_tick_labels,
  1268         1          0.0      0.0      0.0                     metadata=metadata)

Total time: 0.240367 s

Presumably Pint is doing some setup/caching. It would be good to understand this better so the information can be separated out more clearly and we know where it might effect production calculations.

This has a surprisingly large performance impact as Pint has to
reinvent what a Quantity is. (Good luck finding the Quantity
constructor in Pint's source code!)
@ajjackson
Copy link
Collaborator Author

Have removed that bottleneck: the problem was bare use of Quantity rather than the existing euphonic.ureg.Quantity. Timing tests now seem reasonable.

File: euphonic_spectra_benchmark.py
Function: main at line 9

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           @line_profiler.profile
    10                                           def main():
    11
    12         1       1305.0   1305.0      0.7      ref_spectra = Spectrum1DCollection.from_json_file(ref_file)
    13         1       5113.0   5113.0      2.9      ref_spectra_list = list(ref_spectra)
    14
    15         1      77661.0  77661.0     43.4      collection_safe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10)
    16
    17         1      17603.0  17603.0      9.8      collection_unsafe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10, unsafe=True)
    18
    19         1      77341.0  77341.0     43.2      collection_safe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10)

@ajjackson ajjackson requested a review from oerc0122 November 18, 2024 12:11
@ajjackson
Copy link
Collaborator Author

Next I am looking at some performance improvements to item access by building a bare Spectrum with __new__ and populating with known-correct data. I think it makes sense to do that as a separate PR, targeting this branch while it is in review.

Copy link
Collaborator

@oerc0122 oerc0122 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks sensible. Just some style suggestions to think about rather than needed.

euphonic/spectra.py Outdated Show resolved Hide resolved
euphonic/spectra.py Outdated Show resolved Hide resolved
euphonic/spectra.py Outdated Show resolved Hide resolved
euphonic/spectra.py Show resolved Hide resolved
euphonic/spectra.py Outdated Show resolved Hide resolved
ajjackson and others added 3 commits November 18, 2024 17:16
A little more safety seems healthy here!

Co-authored-by: Jacob Wilkins <[email protected]>
Co-authored-by: Jacob Wilkins <[email protected]>
I keep forgetting about this, it really is much better
euphonic/spectra.py Outdated Show resolved Hide resolved
@ajjackson
Copy link
Collaborator Author

Thanks for review 😄

@ajjackson ajjackson merged commit 27c9565 into master Nov 19, 2024
10 checks passed
@ajjackson ajjackson deleted the unsafe-spectra branch December 12, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants