
Improve performance v0.11.0 #1015

Closed · wants to merge 5 commits
Conversation

datejada
Member

@datejada datejada commented Feb 5, 2025

  • Using JuMP.add_to_expression! and subset when creating the constraints over clustered year (inter-temporal constraints) to improve efficiency.
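To illustrate the first change: `expr = expr + x` builds a brand-new JuMP expression on every iteration, while `JuMP.add_to_expression!` mutates the existing one in place. A minimal sketch of the pattern (the function and variable names here are illustrative, not the PR's actual code):

```julia
using JuMP

# Slow pattern: `expr + v` allocates a new affine expression each
# iteration, so building the sum is quadratic in the number of terms.
function build_slow(flow)
    expr = AffExpr(0.0)
    for v in flow
        expr = expr + v
    end
    return expr
end

# Fast pattern: mutate the same expression in place, linear work overall.
function build_fast(flow)
    expr = AffExpr(0.0)
    for v in flow
        JuMP.add_to_expression!(expr, v)
    end
    return expr
end

model = Model()
@variable(model, flow[1:1_000])
build_fast(flow)
```

Inside macros such as `@constraint` JuMP rewrites sums automatically, but hand-rolled accumulation loops like the ones in the inter-temporal constraints need the explicit `add_to_expression!` call.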

Related issues

Closes #1014

Checklist

  • I am following the contributing guidelines

  • Tests are passing

  • Lint workflow is passing

  • Docs were updated and workflow is passing

@datejada datejada added the benchmark PR only - Run benchmark on PR label Feb 5, 2025
Contributor

github-actions bot commented Feb 5, 2025

Benchmark Results

Benchmark in progress...


codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.25%. Comparing base (d7cba2a) to head (6b29178).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1015      +/-   ##
==========================================
+ Coverage   95.22%   95.25%   +0.02%     
==========================================
  Files          29       29              
  Lines        1152     1158       +6     
==========================================
+ Hits         1097     1103       +6     
  Misses         55       55              


@datejada datejada force-pushed the 1014-improve-performance-v0_11_0 branch from 48bfb7a to f3cda08 Compare February 5, 2025 11:47
@abelsiqueira
Member

I don't know what the error in the benchmark is, but it doesn't seem related to the cancellation due to exceeded time. Does the benchmark run locally?

@datejada datejada force-pushed the 1014-improve-performance-v0_11_0 branch from f3cda08 to c99898e Compare February 5, 2025 14:24
@datejada
Member Author

datejada commented Feb 5, 2025

I don't know what the error in the benchmark is, but it doesn't seem related to the cancellation due to exceeded time. Does the benchmark run locally?

I forgot to update the duration for the incoming inflows. I fixed it, and it now runs locally.

The results in v0.10.4 are here: #1013 (comment)

I expect this benchmark to take about the same time, which is good since we achieve more or less the performance we had in that version, but... 20 minutes to create the model is too much :/ Especially since the hourly problem for the whole year (a bigger problem) takes around 5 minutes to create. The bottleneck is the function add_expression_terms_over_clustered_year_constraints!, which is too slow. I wonder what else we can do to improve its performance... any ideas?

@datejada datejada requested a review from abelsiqueira February 5, 2025 14:34
@abelsiqueira
Member

That's the counterpart of https://github.com/TulipaEnergy/TulipaEnergyModel.jl/pull/987/files#diff-43e9d15ce9a457e188412a614efc15efee4860e85525d1b47ef2892074e90d43, right? Hopefully something similar to what was done there will work, i.e., change the loops to use DuckDB tables and explicitly mark some parts with types. I used Cthulhu and @profview while working on that PR to help figure out where to change things.
However, it will be more complicated here since there is also a df_map, so it's not immediately clear how to change things. I can start looking into this to figure out more.
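For context, the idea of moving the subsetting into DuckDB can be sketched as follows. This is a hypothetical example, not the PR's code; the table and column names (cons, flows, period) are made up for illustration:

```julia
using DuckDB, DataFrames

con = DBInterface.connect(DuckDB.DB)

# Hypothetical data: one table of constraint rows and one of flow rows.
DBInterface.execute(con, """
    CREATE TABLE cons AS
    SELECT * FROM (VALUES (1, 10), (2, 20)) AS t(cons_id, period)
""")
DBInterface.execute(con, """
    CREATE TABLE flows AS
    SELECT * FROM (VALUES (1, 10, 5.0), (2, 10, 3.0), (3, 20, 7.0))
        AS t(flow_id, period, value)
""")

# Instead of filtering a DataFrame inside a Julia loop for each constraint
# row, a single join yields every (constraint, flow) pair at once, and the
# expression-building loop just iterates the result.
df = DBInterface.execute(con, """
    SELECT c.cons_id, f.flow_id, f.value
    FROM cons c
    JOIN flows f USING (period)
    ORDER BY c.cons_id
""") |> DataFrame
```

The per-row `filter`/`subset` calls are what make the current function quadratic-ish; letting DuckDB produce one joined table turns that into a single scan.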

@datejada
Member Author

datejada commented Feb 5, 2025

  • JuMP.add_to_expression!

So, to recap (for myself, and to have things clear), we have identified two bottlenecks:

  1. The +=, which is solved using JuMP.add_to_expression!. Yesterday, Joaquim and Oscar commented at the JuMP energy models benchmark meeting, explaining why this += is slow and reinforcing that we should improve efficiency using JuMP.add_to_expression!.
  2. The filtering/subsetting of the dataframes, which might improve with a DuckDB version, as in the representative periods counterpart.

The first point leaves us at a similar performance level to v0.10.4 (which is still slow for the use cases we are solving now).

The second point is challenging, but it feels like the way to go. I will try to think about it too, but if you have time between the multi-year refactor and this, it would be appreciated.

Thanks!

@abelsiqueira
Member

  1. Yes
  2. Hopefully, yes. Either way, part of the refactor involves this function, so changing to DuckDB will be necessary. The rep_period counterpart was done more with the objective of refactoring, and some investigation led to it being a bit faster. That might be the case here as well.

I will try to spend some time on it

@datejada
Member Author

datejada commented Feb 6, 2025

@abelsiqueira I just realized that the benchmark would never work here, since the current main can't create the EU case study within the GitHub time limit 😆 Locally, I can see the improvements in this PR, but they won't show up here.

So, I want to try something else with the JuMP function, but I think the DuckDB part should be in another PR. We should merge these changes to have the new reference.

What do you think?

@abelsiqueira
Member

There is one problem with merging this in the current state: it will become the benchmark! I.e., the next pull request improving it won't run within the GitHub time limit!
How slow is it right now? Maybe we change the data a bit to make it run fast enough to show up, or we change the benchmark file a little to run fewer times (a less accurate estimate, but it would finish in time). I'll work overtime to try to improve this before next week. I'll see what I can produce with DuckDB, and then we can see how to merge these two things.
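Running the benchmark "fewer times" can be done through BenchmarkTools' tuning parameters. A sketch, assuming the suite layout in benchmarks/benchmark.jl looks roughly like this (the group name and the benchmarked expression are placeholders, not the real script):

```julia
using BenchmarkTools

const SUITE = BenchmarkGroup()

# Capping samples and evals trades statistical accuracy for wall-clock
# time, which can keep a long model-creation benchmark under a CI limit.
# `seconds` bounds the total time budget for this benchmark.
SUITE["energy_problem"]["create_model"] =
    @benchmarkable sum(rand(1_000)) samples = 1 evals = 1 seconds = 60
```

With samples = 1 and evals = 1 the expression runs once per trial instead of being repeated until the time budget is exhausted.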

@datejada
Member Author

datejada commented Feb 6, 2025

There is one problem with merging this in the current state: it will become the benchmark! I.e., the next pull request improving it won't run within the GitHub time limit! How slow is it right now? Maybe we change the data a bit to make it run fast enough to show up, or we change the benchmark file a little to run fewer times (a less accurate estimate, but it would finish in time). I'll work overtime to try to improve this before next week. I'll see what I can produce with DuckDB, and then we can see how to merge these two things.

I agree with your suggestion. Let's use a smaller version to benchmark here; I will create a case with 36 representative periods (36rp) and commit it for reference. Nevertheless, I would like to keep the 100 representative periods (100rp) case as the benchmark for future comparisons (after we merge this PR). We can consider merging this branch at the end, but we should do so without the commit of the EU case, by using an interactive rebase before merging.

Let’s discuss this further next week. In the meantime, I will update the data for the 36rp benchmark.

As for the slowness issue, it takes about an hour to create the model on my PC, whereas version 0.10.4 takes around 30 minutes for the same dataset.

Thanks!

@abelsiqueira
Member

I checked the logs, and the benchmark doesn't even finish the baseline (i.e., main).
It looks like we should reduce the case even more until it fits here, or we'll have to rely on local benchmarks only.

Do you have an idea of how the time depends on the rp value?

The GitHub time limit is 6 hours. The model creation is run 4 times (or 5, if it compiles by running; I'm not sure). I just realised there is an extra create_model! call in the benchmarks/benchmark.jl script that was used for non-existent next steps, so we can improve the benchmark time a bit by removing it.

@datejada
Copy link
Member Author

Closing this PR in favor of #1017

@datejada datejada closed this Feb 13, 2025
@datejada datejada deleted the 1014-improve-performance-v0_11_0 branch February 13, 2025 15:20