
Improve performance to create constraints over clustered year #1017

Merged
abelsiqueira merged 14 commits into main from 1014-abel on Feb 14, 2025

Conversation

abelsiqueira (Member) commented Feb 7, 2025

This is built on top of #1015.

Closes #1014

Notes:

  • This is essentially the refactor of the relevant function, so the graph and other similar structures were removed in favour of DuckDB.
  • The idea is pretty much the same as in the similar rep_periods function. The main differences are:
    • The constraints are now defined over period blocks, because they are over the clustered year.
    • This means we need the rep_periods_mapping table (and friends) to join the period from the constraints with the rep_period from the variables, taking the weight into account (see the sketch after this list).
    • This creates a two-level grouping for the main loop: one for (asset, year) and, inside it, one for rep_period.
  • In addition to the expressions, the :inflows_profile_aggregation is also computed here. In the end, though, the solution I found was a completely separate query to aggregate the profile - and it probably doesn't make sense to compute this in this function (unless we need to make it faster and find a way to join them), because the inflows don't require any flow information.
  • I have no idea if this is correct!!! The tests are passing, but they were also passing for known failing intermediary states. In the last version, I compared the generated LP files as well, and they are slightly different, but as far as I can see, only on the inflow value, with an error of ~1e-12, i.e., it appears to be a normal numerical precision difference.
  • Locally, the PR head of Improve performance v0.11.0 #1015 takes around 230 seconds and this one takes around 5 seconds, compared only with @time, but I would prefer that you test it in your normal workflow.
  • The benchmarks failed with exceeded time for main. I don't remember which version of the file it uses, but I assumed it was the newest. Maybe I'm wrong, or maybe main is just much worse, or GitHub's runners are much slower. Anyway, up for discussion.
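A rough illustration of the weighted join described above (my sketch only: the table names cons and var and the exact columns are assumptions, not the schema used in this PR):

# Hypothetical sketch: join constraint rows (indexed by period) with variable
# rows (indexed by rep_period) through rep_periods_mapping, carrying the weight.
# The ORDER BY reflects the two-level grouping: (asset, year), then rep_period.
DuckDB.query(
    connection,
    "SELECT cons.id AS cons_id, var.id AS var_id, map.weight
     FROM cons
     JOIN rep_periods_mapping AS map
       ON map.year = cons.year AND map.period = cons.period
     JOIN var
       ON var.year = map.year AND var.rep_period = map.rep_period
     ORDER BY cons.asset, cons.year, var.rep_period",
)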

abelsiqueira added the 'benchmark PR only - Run benchmark on PR' label Feb 7, 2025
github-actions bot (Contributor) commented Feb 7, 2025

Benchmark Results

Benchmark in progress...

codecov bot commented Feb 8, 2025

Codecov Report

Attention: Patch coverage is 98.07692% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.27%. Comparing base (d7cba2a) to head (1cf8c1d).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/model-preparation.jl 98.07% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1017      +/-   ##
==========================================
+ Coverage   95.22%   95.27%   +0.04%     
==========================================
  Files          29       29              
  Lines        1152     1184      +32     
==========================================
+ Hits         1097     1128      +31     
- Misses         55       56       +1     


datejada (Member) left a comment

Hi @abelsiqueira, thanks for the PR. I tested it, and we get the same .lp file for 100rp when comparing this PR and PR #1015. The SQL queries can be followed, which is good, since they follow the same type of structure as the others in the code. I also like that the inflows aggregation is not an expression; it is better as coefficients (maybe the same is useful for other aggregations, like the availability and demand profile aggregations).
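A toy illustration of the coefficients idea (my sketch with made-up names, not code from the PR): because the inflows carry no flow variables, the aggregated profile can be computed as a plain number and used as a constant, instead of being carried around as a JuMP expression.

using JuMP

model = Model()
@variable(model, storage_level[1:2] >= 0)

profile = [0.3, 0.5, 0.2]   # hypothetical inflow profile values for one period
weights = [2.0, 1.0, 1.0]   # hypothetical rep_period weights

# Coefficient-based aggregation: a single Float64, computed once.
inflow_coefficient = sum(p * w for (p, w) in zip(profile, weights))

# The constant enters the balance constraint directly; no AffExpr is built for it.
@constraint(model, storage_level[2] == storage_level[1] + inflow_coefficient)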

I ran the benchmark locally with 100rp and the times to create the model are:

My main comment is that, in these changes, JuMP.add_to_expression! is gone from the code, and I think we could still gain more by changing back to it (or am I not seeing/understanding something here?).

Finally, since the inflow profiles are calculated in the coefficients, it is better to clean the function and delete the commented parts that are not used anymore (even some TODOs might be obsolete).

Thanks again, great work here!

Diego

view = true,
grouped_var_table_name = "t_grouped_$(flow.table_name)_match_on_$(from_or_to)"
if !_check_if_table_exists(connection, grouped_var_table_name)
    DuckDB.query(
abelsiqueira (Member, Author) replied:

This was not executed, so codecov is complaining. Indeed, this table should already be computed in the rp version - it's the same aggregation. It is only here in case the rp version is not called at all.
Is that possible? If yes, then we should create an issue for a test case where that happens.
If not, then we can delete this and keep only the check and an error message.

datejada (Member) replied:

Thanks! The answer to the question is, no, it is not possible. The RP version will always call for the timeframe constraints. So, the check and an error message should be the way to go here.

datejada (Member) replied:

@abelsiqueira Hi, I changed the code here to add the error check, but it looks a bit ugly, so feel free to suggest changes. Thanks!
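For concreteness, the check-and-error pattern under discussion would look roughly like this (the error text is my assumption, not the wording that landed in the PR):

grouped_var_table_name = "t_grouped_$(flow.table_name)_match_on_$(from_or_to)"
if !_check_if_table_exists(connection, grouped_var_table_name)
    # The rep_periods version always creates this table first, so reaching this
    # branch indicates a bug in the call order rather than a recoverable state.
    error("Table $grouped_var_table_name not found; the rep_periods constraints must be created first")
end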

abelsiqueira (Member, Author) commented:

Hi Diego, I've replied to your JuMP.add_to_expression! comments, but in summary, we don't need them anymore because we no longer have a workspace of expressions.

On doing the same aggregation for everyone (or the opposite, not doing it for any of them): it's a good question that should be investigated. Can you create an issue for that?

I removed the other comments that I could see from this PR page and reduced the DuckDB query to remove the LEFT JOIN on the profiles. Hopefully the code still works.

This should address all comments, but if it doesn't, feel free to pick it up while I get back to the multi-year refactor. If you do need me, feel free to ping me.

datejada (Member) commented Feb 12, 2025

> Hi Diego, I've replied to your JuMP.add_to_expression! comments, but in summary, we don't need them anymore because we no longer have a workspace of expressions.
>
> On doing the same aggregation for everyone (or the opposite, not doing it for any of them): it's a good question that should be investigated. Can you create an issue for that?
>
> I removed the other comments that I could see from this PR page and reduced the DuckDB query to remove the LEFT JOIN on the profiles. Hopefully the code still works.
>
> This should address all comments, but if it doesn't, feel free to pick it up while I get back to the multi-year refactor. If you do need me, feel free to ping me.

@abelsiqueira Thank you for the reply and explanation. My question also arose when I saw this message in the Julia console while running this code (for hourly 36rp):

┌ Warning: The addition operator has been used on JuMP expressions a large number of times. This warning is safe to ignore but may indicate that model generation is slower than necessary. For performance reasons, you should not add expressions in a loop. Instead of x += y, use add_to_expression!(x,y) to modify x in place. If y is a single variable, you may also use add_to_expression!(x, coef, y) for x += coef*y.

So, if it is not this part, then I wonder where else we have the += such that JuMP complains 🤔 Any ideas?
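For reference, this is the pattern the warning refers to (a generic JuMP sketch, not code from this repository):

using JuMP

function build_with_plus(x)
    expr = AffExpr(0.0)
    for xi in x
        expr += 2.0 * xi   # allocates a new expression each iteration; JuMP warns after many of these
    end
    return expr
end

function build_in_place(x)
    expr = AffExpr(0.0)
    for xi in x
        add_to_expression!(expr, 2.0, xi)   # mutates expr in place; no warning
    end
    return expr
end

model = Model()
@variable(model, x[1:100_000])
build_with_plus(x)   # triggers the warning
build_in_place(x)    # does not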

To reproduce the warning, use the files in the benchmark, and before creating the model, use the following command to change all the partitions to 1 hour:

DuckDB.query(connection, "UPDATE assets_rep_periods_partitions SET partition = 1")
DuckDB.query(connection, "UPDATE flows_rep_periods_partitions SET partition = 1")

Thanks!

abelsiqueira (Member, Author) commented:

No ideas, but indeed there is something somewhere. I'm probably not touching this until Monday, though.

datejada removed the 'benchmark PR only - Run benchmark on PR' label Feb 13, 2025
datejada changed the title from '1014 abel' to 'Improve performance to create constraints over clustered year' Feb 13, 2025
datejada mentioned this pull request Feb 13, 2025
datejada marked this pull request as ready for review February 13, 2025 16:28
datejada (Member) commented:

> No ideas, but indeed there is something somewhere. I'm probably not touching this until Monday, though.

Great, thanks! Since it seems to be happening in another part of the code, we can address it in another issue and merge this one first.

datejada (Member) left a comment

@abelsiqueira, the PR is approved; feel free to merge it once you have looked at the error message that I suggested adding in the code.

datejada (Member) commented:

@abelsiqueira, I forgot to mention: I am running the versions' benchmarks locally to get a full picture of the improvements. That is why I am no longer benchmarking this PR and am returning the files to the 100rp benchmark.

abelsiqueira (Member, Author) commented Feb 14, 2025

Hi @datejada, I've updated your error message a bit. I have not found any reason for the expression warning yet. I looked into the += and sum calls in that file, and they don't appear to be dealing with expressions (we do create expressions by summing a large number of variables, though; could that trigger the same warning?).
I wrote a message on Slack, in case someone has a suggestion on how to easily find out where the warning originates.
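One way to hunt down where such a warning originates (my sketch, not a suggestion from the thread; create_model(connection) is a stand-in for the actual entry point): install a logger that turns the first warning into an error, so the resulting stack trace points at the code that emitted it.

using Logging

# Hypothetical helper: a logger that raises an error on any warning, producing
# a full stack trace through the code path that triggered it.
struct WarnToError <: Logging.AbstractLogger end
Logging.min_enabled_level(::WarnToError) = Logging.Warn
Logging.shouldlog(::WarnToError, level, _module, group, id) = true
Logging.handle_message(::WarnToError, level, message, _module, group, id, file, line; kwargs...) =
    error("warning emitted at $file:$line: $message")

with_logger(WarnToError()) do
    create_model(connection)   # stand-in for the real model-creation call
end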

abelsiqueira (Member, Author) commented:

I will merge this now. Thanks for the help.

abelsiqueira merged commit d7d70b7 into main Feb 14, 2025 (7 checks passed)
abelsiqueira deleted the 1014-abel branch February 14, 2025 10:35