Optimize computation of timeseries statistics through multiprocessing #1492
base: main-dev
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           main-dev    #1492    +/-   ##
============================================
- Coverage     78.46%   78.44%   -0.02%
============================================
  Files           139      139
  Lines         21119    21130     +11
============================================
+ Hits          16570    16576      +6
- Misses         4549     4554      +5
Flags with carried forward coverage won't be shown.
…ional with aerovaldb
…ly functional with aerovaldb" This reverts commit afabd57.
Note: afabd57 contains the changes that parallelize the computation of the heat maps and scatter plots, which were deemed outside the scope of this PR in the Pyaerocom Meeting on 2025-01-27. Parallelizing them requires parallel writing of files, which is a thorny problem we can't currently handle in aerovaldb. I'm keeping a note of the commit here in case we want to revisit this problem.
Change Summary
Previously, the timeseries were computed independently and in serial by looping over the regions. This PR parallelizes that for loop using multiprocessing. To that end, it implements a function, `_process_statistics_timeseries_single_region`, in `pyaerocom/aeroval/coldatatojson_helpers.py` to compute the timeseries for a specific region, and uses multiprocessing in `pyaerocom/aeroval/coldatatojson_engine.py` to call this function. I also corrected some type hints.
Related issue number
Partly addresses #1277
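For readers unfamiliar with the pattern in the Change Summary, here is a minimal sketch of replacing a serial loop over regions with a multiprocessing pool. It is not the code from this PR: `compute_region_stats`, `compute_all_regions`, the toy `RegionData` layout, and the statistics computed are all hypothetical stand-ins, since the real signature of `_process_statistics_timeseries_single_region` is not shown here.

```python
from functools import partial
from multiprocessing import Pool
from statistics import fmean
from typing import Optional

# Hypothetical toy input: region name -> list of (observation, model) pairs.
# The real code operates on colocated data objects; this is only a stand-in.
RegionData = dict[str, list[tuple[float, float]]]


def compute_region_stats(region: str, data: RegionData) -> tuple[str, dict[str, float]]:
    """Hypothetical per-region worker.

    It depends only on its arguments and returns its result instead of
    writing anything, so regions can be processed independently.
    """
    pairs = data[region]
    mean_obs = fmean(obs for obs, _ in pairs)
    mean_mod = fmean(mod for _, mod in pairs)
    return region, {"mean_obs": mean_obs, "mean_mod": mean_mod, "bias": mean_mod - mean_obs}


def compute_all_regions(
    data: RegionData, regions: list[str], processes: Optional[int] = None
) -> dict[str, dict[str, float]]:
    """Run the per-region worker for every region, in parallel by default."""
    # The worker must be a module-level function so it can be pickled
    # and sent to the worker processes.
    worker = partial(compute_region_stats, data=data)

    if processes == 1:
        # Serial fallback, equivalent to the original for loop.
        return dict(map(worker, regions))

    # Pool.map returns results in the same order as `regions`.
    with Pool(processes=processes) as pool:
        return dict(pool.map(worker, regions))
```

On platforms that start worker processes with "spawn", a call such as `compute_all_regions(data, regions)` needs to sit under an `if __name__ == "__main__":` guard. In this sketch the workers only return values and the parent collects them, so no process writes files concurrently.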
Checklist