
Improve performance v0.11.0 #1015

Closed · wants to merge 5 commits
Conversation

datejada
Member

@datejada datejada commented Feb 5, 2025

  • Using JuMP.add_to_expression! and subset when creating the constraints over clustered year (inter-temporal constraints) to improve efficiency.
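To illustrate the first change: `expr = expr + x` builds a brand-new JuMP expression on every iteration, while `JuMP.add_to_expression!` mutates the existing one in place. A minimal sketch of the pattern (the function and variable names here are illustrative, not the PR's actual code):

```julia
using JuMP

# Slow pattern: `expr + v` allocates a new affine expression each
# iteration, so building the sum is quadratic in the number of terms.
function build_slow(flow)
    expr = AffExpr(0.0)
    for v in flow
        expr = expr + v
    end
    return expr
end

# Fast pattern: mutate the same expression in place, linear work overall.
function build_fast(flow)
    expr = AffExpr(0.0)
    for v in flow
        JuMP.add_to_expression!(expr, v)
    end
    return expr
end

model = Model()
@variable(model, flow[1:1_000])
build_fast(flow)
```

Inside macros such as `@constraint` JuMP rewrites sums automatically, but hand-rolled accumulation loops like the ones in the inter-temporal constraints need the explicit `add_to_expression!` call.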

Related issues

Closes #1014

Checklist

  • I am following the contributing guidelines

  • Tests are passing

  • Lint workflow is passing

  • Docs were updated and workflow is passing

@datejada datejada added the benchmark PR only - Run benchmark on PR label Feb 5, 2025
Contributor

github-actions bot commented Feb 5, 2025

Benchmark Results

Benchmark in progress...


codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.25%. Comparing base (d7cba2a) to head (6b29178).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1015      +/-   ##
==========================================
+ Coverage   95.22%   95.25%   +0.02%     
==========================================
  Files          29       29              
  Lines        1152     1158       +6     
==========================================
+ Hits         1097     1103       +6     
  Misses         55       55              


@datejada datejada force-pushed the 1014-improve-performance-v0_11_0 branch from 48bfb7a to f3cda08 Compare February 5, 2025 11:47
@abelsiqueira
Member

I don't know what the error in the benchmark is, but it doesn't seem related to the cancellation due to exceeded time. Does the benchmark run locally?

@datejada datejada force-pushed the 1014-improve-performance-v0_11_0 branch from f3cda08 to c99898e Compare February 5, 2025 14:24
@datejada
Member Author

datejada commented Feb 5, 2025

I don't know what the error in the benchmark is, but it doesn't seem related to the cancellation due to exceeded time. Does the benchmark run locally?

I forgot to update the duration for the incoming inflows. I fixed it, and it now runs locally.

The results in v0.10.4 are here: #1013 (comment)

I expect this benchmark to take about the same time, which is good since we achieve more or less the performance we had in that version, but... 20 minutes to create the model is too much :/ Especially since the hourly problem for the whole year (a bigger problem) takes around 5 minutes to create. The bottleneck is the function add_expression_terms_over_clustered_year_constraints!, which is too slow. I wonder what else we can do to improve its performance... any ideas?

@datejada datejada requested a review from abelsiqueira February 5, 2025 14:34
@abelsiqueira
Member

That's the counterpart of https://github.com/TulipaEnergy/TulipaEnergyModel.jl/pull/987/files#diff-43e9d15ce9a457e188412a614efc15efee4860e85525d1b47ef2892074e90d43, right? Hopefully something similar to what was done there will work, i.e., change the loops to use DuckDB tables and explicitly mark some parts with types. I used Cthulhu and @profview while working on that PR to help figure out where to change things.
However, it will be more complicated here since there is also a df_map, so it's not immediately clear how to change things. I can start looking into this to figure out more.
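For context, the idea of moving the subsetting into DuckDB can be sketched as follows. This is a hypothetical example, not the PR's code; the table and column names (cons, flows, period) are made up for illustration:

```julia
using DuckDB, DataFrames

con = DBInterface.connect(DuckDB.DB)

# Hypothetical data: one table of constraint rows and one of flow rows.
DBInterface.execute(con, """
    CREATE TABLE cons AS
    SELECT * FROM (VALUES (1, 10), (2, 20)) AS t(cons_id, period)
""")
DBInterface.execute(con, """
    CREATE TABLE flows AS
    SELECT * FROM (VALUES (1, 10, 5.0), (2, 10, 3.0), (3, 20, 7.0))
        AS t(flow_id, period, value)
""")

# Instead of filtering a DataFrame inside a Julia loop for each constraint
# row, a single join yields every (constraint, flow) pair at once, and the
# expression-building loop just iterates the result.
df = DBInterface.execute(con, """
    SELECT c.cons_id, f.flow_id, f.value
    FROM cons c
    JOIN flows f USING (period)
    ORDER BY c.cons_id
""") |> DataFrame
```

The per-row `filter`/`subset` calls are what make the current function quadratic-ish; letting DuckDB produce one joined table turns that into a single scan.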

@datejada
Member Author

datejada commented Feb 5, 2025

  • JuMP.add_to_expression!

So, to recap (for myself, and to have things clear), we have identified two bottlenecks:

  1. The +=, which is solved using JuMP.add_to_expression!. Yesterday, Joaquim and Oscar commented at the JuMP energy models benchmark meeting, explaining why this += is slow and reinforcing that we should improve efficiency using JuMP.add_to_expression!.
  2. The filtering/subsetting of the dataframes, which might improve with a DuckDB version, as in the representative periods counterpart.

The first point leaves us at a similar performance level to v0.10.4 (which is still slow for the use cases we are solving now).

The second point is challenging, but it feels like the way to go. I will try to think about it too, but if you have time between the multi-year refactor and this, it would be appreciated.

Thanks!

@abelsiqueira
Member

  1. Yes
  2. Hopefully, yes. Either way, part of the refactor involves this function, so changing to DuckDB will be necessary. The rep_period counterpart was done more with the objective of refactoring, and some investigation led to it being a bit faster. That might be the case here as well.

I will try to spend some time on it

@datejada
Member Author

datejada commented Feb 6, 2025

@abelsiqueira I just realized that the benchmark would never work here, since the current main can't create the EU case study within the GitHub time limit 😆 Locally, I can see the improvements in this PR, but they won't show up here.

So, I want to try something else with the JuMP function, but I think the DuckDB part should be in another PR. We should merge these changes to have the new reference.

What do you think?

@abelsiqueira
Member

There is one problem with merging this in the current state: it will become the benchmark! I.e., the next pull request improving it won't run within the GitHub time limit!
How slow is it right now? Maybe we change the data a bit to make it run fast enough to show up, or we change the benchmark file a little to run fewer times (a less accurate estimate, but it would finish in time). I'll work overtime to try to improve this before next week. I'll see what I can produce with DuckDB, and then we can see how to merge these two things.
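Running the benchmark "fewer times" can be done through BenchmarkTools' tuning parameters. A sketch, assuming the suite layout in benchmarks/benchmark.jl looks roughly like this (the group name and the benchmarked expression are placeholders, not the real script):

```julia
using BenchmarkTools

const SUITE = BenchmarkGroup()

# Capping samples and evals trades statistical accuracy for wall-clock
# time, which can keep a long model-creation benchmark under a CI limit.
# `seconds` bounds the total time budget for this benchmark.
SUITE["energy_problem"]["create_model"] =
    @benchmarkable sum(rand(1_000)) samples = 1 evals = 1 seconds = 60
```

With samples = 1 and evals = 1 the expression runs once per trial instead of being repeated until the time budget is exhausted.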

@datejada
Member Author

datejada commented Feb 6, 2025

There is one problem with merging this in the current state: it will become the benchmark! I.e., the next pull request improving it won't run within the GitHub time limit! How slow is it right now? Maybe we change the data a bit to make it run fast enough to show up, or we change the benchmark file a little to run fewer times (a less accurate estimate, but it would finish in time). I'll work overtime to try to improve this before next week. I'll see what I can produce with DuckDB, and then we can see how to merge these two things.

I agree with your suggestion. Let's use a smaller version to benchmark here; I will create a case with 36 representative periods (36rp) and commit it for reference. Nevertheless, I would like to keep the 100 representative periods (100rp) case as the benchmark for future comparisons (after we merge this PR). We can consider merging this branch at the end, but we should do so without the commit of the EU case, by using an interactive rebase before merging.

Let’s discuss this further next week. In the meantime, I will update the data for the 36rp benchmark.

As for the slowness issue, it takes about an hour to create the model on my PC, whereas version 0.10.4 takes around 30 minutes for the same dataset.

Thanks!

@abelsiqueira
Member

I checked the logs, and the benchmark doesn't even finish the baseline (i.e., main).
It looks like we should reduce the case even more until it fits here, or we'll have to rely on local benchmarks only.

Do you have an idea of how the time depends on the rp value?

The GitHub time limit is 6 hours. The model creation is run 4 times (or 5, if it compiles by running; I'm not sure). I just realised there is an extra create_model! call in the benchmarks/benchmark.jl script that was used for non-existent next steps, so we can improve the benchmark time a bit by removing it.

@datejada
Copy link
Member Author

Closing this PR in favor of #1017

@datejada datejada closed this Feb 13, 2025
@datejada datejada deleted the 1014-improve-performance-v0_11_0 branch February 13, 2025 15:20