Allow "unsafe" (faster) Spectrum1DCollection.from_spectra() #327

ajjackson · 2024-11-18T09:05:51Z

When filtering or otherwise manipulating spectra from an existing collection, we can be confident the axes remain consistent. The safety checks are quite expensive so this should be a useful optimisation.

github-actions · 2024-11-18T09:27:30Z

Test Results

22 files ± 0 22 suites ±0 29m 29s ⏱️ -9s
1 066 tests + 5 1 060 ✅ + 5 6 💤 ±0 0 ❌ ±0
10 600 runs +50 10 537 ✅ +50 63 💤 ±0 0 ❌ ±0

Results for commit 78e4870. ± Comparison against base commit 23fb80b.

♻️ This comment has been updated with latest results.

While refactoring, change assert statements to explicit logic and Errors. This avoids them being ignored in some runtime modes.

ajjackson · 2024-11-18T11:22:58Z

Profiling results a bit odd: there is a decent speedup, but also there is a significant penalty on the first call to from_spectra, regardless of unsafe.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @line_profiler.profile
     9                                           def main():
    10
    11         1       1267.0   1267.0      0.4      ref_spectra = Spectrum1DCollection.from_json_file(ref_file)
    12         1       4973.0   4973.0      1.5      ref_spectra_list = list(ref_spectra)
    13
    14         1        154.0    154.0      0.0      dummy_quantity = Quantity(ref_spectra_list[0].y_data.magnitude, ref_spectra_list[0].y_data.units)
    15
    16         1     235027.0 235027.0     70.2      collection_safe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10)
    17
    18         1      17199.0  17199.0      5.1      collection_unsafe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10, unsafe=True)
    19
    20         1      75966.0  75966.0     22.7      collection_safe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10)

This is driven by the construction of the Quantity for output data (an unavoidable step)

Function: from_spectra at line 1235

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1235                                               @classmethod
  1236                                               @line_profiler.profile
  1237                                               def from_spectra(
  1238                                                       cls: Self, spectra: Sequence[Spectrum1D], unsafe: bool = False
  1239                                               ) -> Self:
  1240                                                   """Combine Spectrum1D to produce a new collection
  1241
  1242                                                   x_bins and x_tick_labels must be consistent (in magnitude and units)
  1243                                                   across the input spectra. If 'unsafe', this will not be checked.
  1244
  1245                                                   """
  1246         1          0.0      0.0      0.0          if len(spectra) < 1:
  1247                                                       raise IndexError("At least one spectrum is needed for collection")
  1248
  1249         1          1.0      1.0      0.0          cls._item_type_check(spectra[0])
  1250         1         56.0     56.0      0.0          x_data = spectra[0].x_data
  1251         1          0.0      0.0      0.0          x_tick_labels = spectra[0].x_tick_labels
  1252         1         74.0     74.0      0.0          y_data_length = len(spectra[0].y_data)
  1253         1          1.0      1.0      0.0          y_data_magnitude = np.empty((len(spectra), y_data_length))
  1254         1         75.0     75.0      0.0          y_data_magnitude[0, :] = spectra[0].y_data.magnitude
  1255         1         74.0     74.0      0.0          y_data_units = spectra[0].y_data.units
  1256
  1257       220         43.0      0.2      0.0          for i, spectrum in enumerate(spectra[1:]):
  1258       219          9.0      0.0      0.0              if not unsafe:
  1259       219         90.0      0.4      0.0                  cls._item_type_check(spectrum)
  1260       438      57378.0    131.0     24.6                  cls._from_spectra_data_check(
  1261       219         17.0      0.1      0.0                      spectrum, x_data, y_data_units, x_tick_labels)
  1262
  1263       219      17202.0     78.5      7.4              y_data_magnitude[i + 1, :] = spectrum.y_data.magnitude
  1264
  1265         1        597.0    597.0      0.3          metadata = cls._combine_metadata([spec.metadata for spec in spectra])
  1266         1     157958.0 157958.0     67.6          y_data = Quantity(y_data_magnitude, y_data_units)
  1267         2        109.0     54.5      0.0          return cls(x_data, y_data, x_tick_labels=x_tick_labels,
  1268         1          0.0      0.0      0.0                     metadata=metadata)

Total time: 0.240367 s

Presumably Pint is doing some setup/caching. It would be good to understand this better so the information can be separated out more clearly and we know where it might effect production calculations.

This has a surprisingly large performance impact as Pint has to reinvent what a Quantity is. (Good luck finding the Quantity constructor in Pint's source code!)

ajjackson · 2024-11-18T11:46:01Z

Have removed that bottleneck: the problem was bare use of Quantity rather than the existing euphonic.ureg.Quantity. Timing tests now seem reasonable.

File: euphonic_spectra_benchmark.py
Function: main at line 9

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           @line_profiler.profile
    10                                           def main():
    11
    12         1       1305.0   1305.0      0.7      ref_spectra = Spectrum1DCollection.from_json_file(ref_file)
    13         1       5113.0   5113.0      2.9      ref_spectra_list = list(ref_spectra)
    14
    15         1      77661.0  77661.0     43.4      collection_safe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10)
    16
    17         1      17603.0  17603.0      9.8      collection_unsafe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10, unsafe=True)
    18
    19         1      77341.0  77341.0     43.2      collection_safe = Spectrum1DCollection.from_spectra(ref_spectra_list * 10)

ajjackson · 2024-11-18T12:15:29Z

Next I am looking at some performance improvements to item access by building a bare Spectrum with __new__ and populating with known-correct data. I think it makes sense to do that as a separate PR, targeting this branch while it is in review.

oerc0122

All looks sensible. Just some style suggestions to think about rather than needed.

euphonic/spectra.py

A little more safety seems healthy here! Co-authored-by: Jacob Wilkins <[email protected]>

Co-authored-by: Jacob Wilkins <[email protected]>

I keep forgetting about this, it really is much better

euphonic/spectra.py

Co-authored-by: Jacob Wilkins <[email protected]>

ajjackson · 2024-11-19T09:17:58Z

Thanks for review 😄

Allow "unsafe" (faster) Spectrum1DCollection.from_spectra()

1028add

When filtering or otherwise manipulating spectra from an existing collection, we can be confident the axes remain consistent. The safety checks are quite expensive so this should be a useful optimisation.

ajjackson added 3 commits November 18, 2024 10:49

Implement Spectrum2DCollection.from_spectra() unsafe option

677d15e

While refactoring, change assert statements to explicit logic and Errors. This avoids them being ignored in some runtime modes.

Unit test unsafe 2D from_spectra()

2ddf008

Update CHANGELOG

d581bd8

ajjackson marked this pull request as ready for review November 18, 2024 11:03

Use ureg.Quantity in .from_spectra()

e05935c

This has a surprisingly large performance impact as Pint has to reinvent what a Quantity is. (Good luck finding the Quantity constructor in Pint's source code!)

ajjackson requested a review from oerc0122 November 18, 2024 12:11

oerc0122 approved these changes Nov 18, 2024

View reviewed changes

euphonic/spectra.py Outdated Show resolved Hide resolved

euphonic/spectra.py Outdated Show resolved Hide resolved

euphonic/spectra.py Outdated Show resolved Hide resolved

euphonic/spectra.py Show resolved Hide resolved

euphonic/spectra.py Outdated Show resolved Hide resolved

ajjackson and others added 3 commits November 18, 2024 17:16

Apply suggestions from code review: kwarg-only "unsafe"

2dec9fa

A little more safety seems healthy here! Co-authored-by: Jacob Wilkins <[email protected]>

Remove superfluous parens

9654164

Co-authored-by: Jacob Wilkins <[email protected]>

Use the enumerate start argument, cheers @oerc0122

b043098

I keep forgetting about this, it really is much better

oerc0122 reviewed Nov 19, 2024

View reviewed changes

euphonic/spectra.py Outdated Show resolved Hide resolved

Another mandatory kwarg for "unsafe"

78e4870

Co-authored-by: Jacob Wilkins <[email protected]>

oerc0122 approved these changes Nov 19, 2024

View reviewed changes

ajjackson merged commit 27c9565 into master Nov 19, 2024
10 checks passed

ajjackson deleted the unsafe-spectra branch December 12, 2024 16:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow "unsafe" (faster) Spectrum1DCollection.from_spectra() #327

Allow "unsafe" (faster) Spectrum1DCollection.from_spectra() #327

ajjackson commented Nov 18, 2024

github-actions bot commented Nov 18, 2024 •

edited

Loading

ajjackson commented Nov 18, 2024 •

edited

Loading

ajjackson commented Nov 18, 2024

ajjackson commented Nov 18, 2024

oerc0122 left a comment

ajjackson commented Nov 19, 2024

Allow "unsafe" (faster) Spectrum1DCollection.from_spectra() #327

Allow "unsafe" (faster) Spectrum1DCollection.from_spectra() #327

Conversation

ajjackson commented Nov 18, 2024

github-actions bot commented Nov 18, 2024 • edited Loading

Test Results

ajjackson commented Nov 18, 2024 • edited Loading

ajjackson commented Nov 18, 2024

ajjackson commented Nov 18, 2024

oerc0122 left a comment

Choose a reason for hiding this comment

ajjackson commented Nov 19, 2024

github-actions bot commented Nov 18, 2024 •

edited

Loading

ajjackson commented Nov 18, 2024 •

edited

Loading