Pandas wrapper #483

nick-harder · 2024-11-15T15:20:57Z

Pull Request

Related Issue

Closes #321

Description

As suggested by @maurerle, this improves the performance of all simulations by a factor of 2 to 4 by replacing pandas operations with numpy where possible. To do this, we use two wrapper objects, FastIndex and FastSeries, which wrap a numpy array so that it can still be accessed with the usual datetime accessors.

The speedup is about 2x for small simulations and about 3x for large simulations.
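To illustrate the idea, here is a minimal sketch of such a wrapper. It is not the code from this PR: the names `MiniIndex` and `MiniSeries` are hypothetical, and the sketch assumes a regular index that can be described by a start, an end and a fixed step. Under that assumption a timestamp maps to an array position with plain arithmetic, so no pandas lookup is needed.

```python
from datetime import datetime, timedelta

import numpy as np


class MiniIndex:
    """Regular datetime index described only by start, end and step (hypothetical)."""

    def __init__(self, start: datetime, end: datetime, freq: timedelta):
        self.start, self.end, self.freq = start, end, freq
        # Number of time steps, both ends included.
        self.size = (end - start) // freq + 1

    def position(self, when: datetime) -> int:
        """Map a timestamp to an integer position with plain arithmetic."""
        steps, rest = divmod(when - self.start, self.freq)
        if rest or not 0 <= steps < self.size:
            raise KeyError(when)
        return steps


class MiniSeries:
    """Numpy array that can be read and written with datetime keys (hypothetical)."""

    def __init__(self, index: MiniIndex, value: float = 0.0):
        self.index = index
        self.data = np.full(index.size, value, dtype=float)

    def _locate(self, key):
        if isinstance(key, slice):
            lo = 0 if key.start is None else self.index.position(key.start)
            hi = self.index.size - 1 if key.stop is None else self.index.position(key.stop)
            # Datetime slices include the end point, as label slices do in pandas.
            return slice(lo, hi + 1)
        return self.index.position(key)

    def __getitem__(self, key):
        return self.data[self._locate(key)]

    def __setitem__(self, key, value):
        self.data[self._locate(key)] = value


index = MiniIndex(datetime(2019, 1, 1), datetime(2019, 1, 1, 23), timedelta(hours=1))
series = MiniSeries(index)
series[datetime(2019, 1, 1, 5)] = 42.0
window = series[datetime(2019, 1, 1, 4) : datetime(2019, 1, 1, 6)]
print(window)
```

The design choice in this sketch is that the index stores no timestamps at all, only the three values that define them, which is what keeps each lookup cheap.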

Changes Proposed

Switch to the dedicated FastIndex and FastSeries classes
Adjust the rest of the code to make use of these new classes

Testing

Most of the tests pass and the simulations work, but more extensive testing is required to verify full functionality.

nick-harder · 2024-11-15T15:22:00Z

@maurerle all tests pass except for one in unit_operator, test_get_actual_dispatch. I don't understand why it doesn't work. Could you please take a look at it?

maurerle · 2024-11-20T11:27:13Z

Somehow the other tests for RL are not yet working @nick-harder

nick-harder · 2024-11-20T12:03:44Z

@maurerle thanks for the fix, I have also fixed the RL tests now

codecov · 2024-11-20T12:24:12Z

Codecov Report

Attention: Patch coverage is 75.23923% with 207 lines in your changes missing coverage. Please review.

Project coverage is 76.61%. Comparing base (7d5465b) to head (445b168).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| assume/common/fast_pandas.py | 69.36% | 136 Missing ⚠️ |
| assume/strategies/flexable.py | 44.44% | 30 Missing ⚠️ |
| assume/common/forecasts.py | 71.11% | 13 Missing ⚠️ |
| assume/strategies/flexable_storage.py | 80.55% | 7 Missing ⚠️ |
| assume/units/storage.py | 89.36% | 5 Missing ⚠️ |
| assume/common/outputs.py | 85.00% | 3 Missing ⚠️ |
| assume/strategies/naive_strategies.py | 70.00% | 3 Missing ⚠️ |
| assume/units/steel_plant.py | 80.00% | 3 Missing ⚠️ |
| assume/strategies/extended.py | 50.00% | 2 Missing ⚠️ |
| assume/common/base.py | 96.66% | 1 Missing ⚠️ |

... and 4 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #483      +/-   ##
==========================================
+ Coverage   76.40%   76.61%   +0.20%     
==========================================
  Files          49       50       +1     
  Lines        6295     6727     +432     
==========================================
+ Hits         4810     5154     +344     
- Misses       1485     1573      +88

Flag	Coverage Δ
pytest	`76.61% <75.23%> (+0.20%)`	⬆️

nick-harder · 2024-11-22T13:02:50Z

@kim-mskw @maurerle I have tested all non-learning examples and everything works fine; all data is also written properly to the database and the other outputs. I have also removed pandas from the code as much as possible and renamed the type hints and other things accordingly. Please take another look at the commit.

nick-harder · 2024-11-22T14:45:08Z

@maurerle @kim-mskw @Manish-Khanra I have now tested with the largest example we have, for 2019 with two EOM and CRM markets, and the speedup is far larger than I expected: the run is around 4.2 times faster. We should all put in a bit more effort and merge this as soon as we can after proper testing; this performance increase is even larger than we expected.

[Screenshot: running on pandas_wrapper]

[Screenshot: running on main]
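A rough way to see where this kind of gain can come from is to time a single label lookup on a pandas Series against the equivalent position lookup on a plain numpy array. The snippet below is only a sketch under assumptions: the hourly one-year index and the variable names are made up for illustration, and the timings it prints depend on the machine and on the pandas and numpy versions, so it says nothing about the 4.2x figure itself.

```python
import timeit
from datetime import timedelta

import numpy as np
import pandas as pd

# One year of hourly values, stored once as a pandas Series and once as a bare array.
stamps = pd.date_range("2019-01-01", periods=8760, freq=pd.Timedelta(hours=1))
values = np.arange(len(stamps), dtype=float)
series = pd.Series(values, index=stamps)

start = stamps[0].to_pydatetime()
step = timedelta(hours=1)
when = stamps[5000].to_pydatetime()


def pandas_lookup():
    # Label-based access goes through the pandas index machinery.
    return series.at[when]


def numpy_lookup():
    # On a regular index the position follows from integer arithmetic alone.
    return values[(when - start) // step]


# Both paths must return the same value before comparing their speed.
assert pandas_lookup() == numpy_lookup()

for fn in (pandas_lookup, numpy_lookup):
    seconds = timeit.timeit(fn, number=2_000)
    print(f"{fn.__name__}: {seconds:.4f} s for 2000 lookups")
```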

nick-harder · 2024-11-22T15:37:58Z

@kim-mskw I have tested the RL examples we have. Some things had to be fixed, but everything is working, and the learning in small 1 and 2 is also about 2 to 3 times faster.

maurerle · 2024-11-22T15:39:53Z

Note that this implementation also has calculate_generation_cost enabled, which had been commented out because it slowed down the simulation a lot.

This is fixed here as well (by using numpy instead of pandas), and commenting it out no longer has a noticeable influence on performance.

So we now have a better fix for #355 and #357, and economic results are again available in the standard calculation.
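As a minimal sketch of why such a calculation is cheap on plain arrays: assuming, purely for illustration, that generation cost is the dispatched power multiplied by the marginal cost in each time step, the whole horizon reduces to one elementwise product. The function name `generation_cost` and the numbers are hypothetical; the actual calculate_generation_cost may be defined differently.

```python
import numpy as np


def generation_cost(power: np.ndarray, marginal_cost: np.ndarray) -> np.ndarray:
    """Cost per time step as an elementwise product (assumed definition)."""
    return power * marginal_cost


power = np.array([0.0, 50.0, 100.0, 80.0])          # dispatched power per hour, MW
marginal_cost = np.array([30.0, 30.0, 35.0, 32.0])  # marginal cost per hour, EUR/MWh

cost = generation_cost(power, marginal_cost)
total = float(cost.sum())
print(cost, total)
```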

nick-harder · 2024-11-25T08:48:06Z

@maurerle thanks for the review! @kim-mskw would you have time to take another look at the PR and confirm that everything looks fine from your side?

kim-mskw · 2024-11-25T09:54:11Z

> @maurerle thanks for the review! @kim-mskw would you have time to take another look at the PR if everything is good from your side and looks fine?

Of course :) on it! I have one minor comment left.

assume/scenario/loader_csv.py

nick-harder added 17 commits November 13, 2024 17:54

initial code for the pandas wrapper

227ea34

first functional version

c3c591d

-fixing tests

3661e40

-improved functionality of fastseries

ffc53f9

-adjusted strategies for new format -fixed tests

-small fix in flexable strategy

4d09b6d

set default save frequency to None to speed up default simulation runs

73cb253

-fix advanced strategies

b94b089

-rename FastDateTimeIndex to FastIndex

ad02294

-rename FastDateTimeSeries to FastSeries -adjust bidding strategies to use zip for better readability and speed

-small improvements

649ece2

-improve speed of fastindex class

8a8a0c2

-improve structure and performance of the fast pandas

1ce53a3

-add iloc methods

94741db

-fix dmas strategies

-fix dmas storage strategy

f92c105

-create a TensorSeries to store tensor values without conversion to numpy

b1e9ff0

-add as_datetimeindex to convert FastIndex to pandas index

7346184

-fix flexable storage strategy -adjust RL strategies to work with new format -adjust steelplant code to use new index -fix several tests

-adjust typing

5442e17

-fixing several tests

9a9afee

-fix extended strategy

nick-harder marked this pull request as draft November 15, 2024 15:22

nick-harder mentioned this pull request Nov 18, 2024

Performance tuning #472

Closed

nick-harder and others added 2 commits November 20, 2024 09:33

-remove initializing TensorFastSeries only when RL strategy is defined

9e59388

fix test_get_actual_dispatch

5685d39

The format of the unit_dfs output changed, so the test needs to be adapted, as the output is no longer a pandas DataFrame.

-fix check for learning strategy

217cfee

-fix RL tests by passing learning bidding strategies during initialization

-fix tests and a small bug

fdf2902

nick-harder and others added 3 commits November 20, 2024 14:03

-remove index from unit initialization

9ea05d6

-clean up index everywhere

move convert_forecasts_to_fast_series to csvforecast

94b8ad1

update notebook_03 to suit new usage

54cfe30

nick-harder requested review from kim-mskw and maurerle November 22, 2024 13:03

add test for parse_duration

564b7c7

maurerle force-pushed the pandas_wrapper branch 2 times, most recently from 84b7c80 to a0d39b3 Compare November 22, 2024 14:05

fix warning in dmas powerplant

259043b

maurerle force-pushed the pandas_wrapper branch from a0d39b3 to 259043b Compare November 22, 2024 14:29

-add release notes

db2db0e

nick-harder and others added 3 commits November 22, 2024 15:59

-fix learning strategies and broken config

f5f0431

reduce log level for async run status

79dd828

-adjust save_forecast function to reflect new code

efbaa8f

-add a bit clarity to examples.py and remove creating a new local db for each different example

f70c7dc

maurerle approved these changes Nov 25, 2024

Merge branch 'main' into pandas_wrapper

764ea28

kim-mskw reviewed Nov 25, 2024

assume/scenario/loader_csv.py (outdated, resolved)

maurerle and others added 7 commits November 25, 2024 11:49

revert increasing default save_frequency_hours to reduce memory usage

6e32a59

-add max size check to the output role

3cc85e0

-save dfs when size limit is reached

-rename things

1eaf736

move calculate_content_size to utils

f01fd09

use fork multiprocessing with mango 2.x

47f37c4

slicing out of range should not throw but return empty iterable

dc19710

fix output moving to utils

445b168
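The commits "-add max size check to the output role" and "-save dfs when size limit is reached" describe a buffer that is written out once it grows past a limit. A minimal sketch of that pattern follows. It assumes the buffered data are numpy arrays whose size can be read from `nbytes`; the class `SizeLimitedBuffer` and the `sink` callable are hypothetical and not taken from the PR.

```python
import numpy as np


class SizeLimitedBuffer:
    """Collects arrays and hands them to a sink once a byte limit is reached."""

    def __init__(self, max_bytes: int, sink):
        self.max_bytes = max_bytes
        self.sink = sink  # callable that persists a list of arrays
        self.chunks = []

    def size(self) -> int:
        # Total payload currently held in memory, in bytes.
        return sum(chunk.nbytes for chunk in self.chunks)

    def add(self, chunk: np.ndarray) -> None:
        self.chunks.append(chunk)
        if self.size() >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        # Write out whatever is buffered and start over with an empty buffer.
        if self.chunks:
            self.sink(self.chunks)
            self.chunks = []


written = []  # records how many chunks each write contained
buffer = SizeLimitedBuffer(max_bytes=64, sink=lambda chunks: written.append(len(chunks)))

for _ in range(5):
    buffer.add(np.zeros(3))  # three float64 values, 24 bytes per chunk

buffer.flush()  # final write at the end of the run
print(written)
```

A size check like this bounds memory use independently of how often a time-based save would fire, which is the trade-off the neighbouring save_frequency_hours commit touches on.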

nick-harder merged commit b9cf60e into main Nov 25, 2024
8 of 9 checks passed

nick-harder deleted the pandas_wrapper branch November 25, 2024 15:44
