Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pandas wrapper #483

Merged
merged 53 commits into from
Nov 25, 2024
Merged

Pandas wrapper #483

merged 53 commits into from
Nov 25, 2024

Conversation

nick-harder
Copy link
Member

@nick-harder nick-harder commented Nov 15, 2024

Pull Request

Related Issue

Closes #321

Description

As suggested by @maurerle this improves performance of all simulations by a factor of 2 up to 4, by replacing pandas actions with numpy where possible. For this, we are using a wrapper object FastIndex and FastSeries which wraps a numpy array so that we can access it using typical datetime accessors.

The speed up for small simulations is 2x and for large simulations 3x.

Changes Proposed

  • Shift to using special classes of FastIndex and FastSeries
  • Adjust the rest of the code to make use of this new class

Testing

Most of the tests pass and simulations work, but a more extensive testing is required to test full functionality.

Checklist

Please check all applicable items:

  • Code changes are sufficiently documented (docstrings, inline comments, doc folder updates)
  • New unit tests added for new features or bug fixes
  • Existing tests pass with the changes
  • Reinforcement learning examples are operational (for DRL-related changes)
  • Code tested with both local and Docker databases
  • Code follows project style guidelines and best practices
  • Changes are backwards compatible, or deprecation notices added
  • New dependencies added to pyproject.toml
  • A note for the release notes doc/release_notes.rst of the upcoming release is included
  • Consent to release this PR's code under the GNU Affero General Public License v3.0

Additional Notes (if applicable)

[Any additional information, concerns, or areas you want reviewers to focus on]

Screenshots (if applicable)

[Add screenshots to demonstrate visual changes]

@nick-harder
Copy link
Member Author

@maurerle all tests pass except for one in unit_operator def test_get_actual_dispatch. I don't understand why it doesn't work, could you please take a look into it?

@nick-harder nick-harder marked this pull request as draft November 15, 2024 15:22
@nick-harder nick-harder mentioned this pull request Nov 18, 2024
nick-harder and others added 2 commits November 20, 2024 09:33
the format of the unit_dfs output did change, so we need to adapt the test to it as it does not use pandas dataframe anymore
@maurerle
Copy link
Member

Somehow the other tests for RL are not yet working @nick-harder

-fix RL tests by passing learning bidding strategies during initialization
@nick-harder
Copy link
Member Author

@maurerle thanks for the fix, I have also fixed the RL tests now

Copy link

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 75.23923% with 207 lines in your changes missing coverage. Please review.

Project coverage is 76.61%. Comparing base (7d5465b) to head (445b168).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
assume/common/fast_pandas.py 69.36% 136 Missing ⚠️
assume/strategies/flexable.py 44.44% 30 Missing ⚠️
assume/common/forecasts.py 71.11% 13 Missing ⚠️
assume/strategies/flexable_storage.py 80.55% 7 Missing ⚠️
assume/units/storage.py 89.36% 5 Missing ⚠️
assume/common/outputs.py 85.00% 3 Missing ⚠️
assume/strategies/naive_strategies.py 70.00% 3 Missing ⚠️
assume/units/steel_plant.py 80.00% 3 Missing ⚠️
assume/strategies/extended.py 50.00% 2 Missing ⚠️
assume/common/base.py 96.66% 1 Missing ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #483      +/-   ##
==========================================
+ Coverage   76.40%   76.61%   +0.20%     
==========================================
  Files          49       50       +1     
  Lines        6295     6727     +432     
==========================================
+ Hits         4810     5154     +344     
- Misses       1485     1573      +88     
Flag Coverage Δ
pytest 76.61% <75.23%> (+0.20%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@nick-harder
Copy link
Member Author

@kim-mskw @maurerle I have tested all non learning examples and everythings works fine, all data is also being properly written to the database and other outputs. I have also replaced and removed pandas from the code as much as possible and renamed the typing hints and other things. Please take another look at the commit

@maurerle maurerle force-pushed the pandas_wrapper branch 2 times, most recently from 84b7c80 to a0d39b3 Compare November 22, 2024 14:05
@nick-harder
Copy link
Member Author

nick-harder commented Nov 22, 2024

@maurerle @kim-mskw @Manish-Khanra I have been testing now with the largest example we have for 2019 with two EOM and CRM markets, and the speed up is crazy, I didn't expect that. The difference is around 4.2 times. We should really all put a bit more effort and merge this as soon as we can after proper testing, this performance increase is even more what we expected...

running on pandas_wrapper
telegram-cloud-document-2-5332300261325037747

running on main
telegram-cloud-document-2-5332300261325038181

@nick-harder
Copy link
Member Author

@kim-mskw I have tested the RL examples we have. Some things had to be fixed byt everything is working and the learning in small 1 and 2 is also like 2 to 3 times faster

@maurerle
Copy link
Member

maurerle commented Nov 22, 2024

Note that this implementation also has calculate_generation_cost enabled which was commented out due to slowing down the simulation a lot.

This is fixed here as well (by using numpy and not pandas anymore) and commenting out does not have a noticable influence on the performance now.

So we have a better fix for #355 and #357 now and economic results are again available in the standard calculation

@nick-harder
Copy link
Member Author

@maurerle thanks for the review! @kim-mskw would you have time to take another look at the PR if everything is good from your side and looks fine?

@kim-mskw
Copy link
Contributor

kim-mskw commented Nov 25, 2024

the review! @kim-mskw would you have time to take another look at the PR if everything is good from your side and looks fine?

Of course :) on it! I have one minor comment left.

@nick-harder nick-harder merged commit b9cf60e into main Nov 25, 2024
8 of 9 checks passed
@nick-harder nick-harder deleted the pandas_wrapper branch November 25, 2024 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve performance by switching from pandas to numpy
3 participants