
Improve performance to create constraints over clustered year #1017

Merged
abelsiqueira merged 14 commits into main from 1014-abel on Feb 14, 2025

Conversation

abelsiqueira (Member) commented Feb 7, 2025

This is built on top of #1015.

Closes #1014

Notes:

  • This is essentially the refactor of the relevant function, so the graph and other similar structures were removed in favour of DuckDB.
  • The idea is pretty much the same as in the similar rep_periods function. The main differences are:
    • The constraints are now defined over period blocks, because they are over the clustered year.
    • This means we need the rep_periods_mapping table (and friends) to join the period from the constraints with the rep_period from the variables, taking the weight into account (see the sketch after this list).
    • This creates a two-level grouping for the main loop: one for (asset, year) and, inside it, one for rep_period.
  • In addition to the expressions, the :inflows_profile_aggregation is also computed here. In the end, though, the solution I found was a completely separate query to aggregate the profile - and it probably doesn't make sense to compute this in this function (unless we need to make it faster and find a way to join them), because the inflows don't require any flow information.
  • I have no idea if this is correct!!! The tests are passing, but they were also passing for known failing intermediary states. In the last version, I compared the generated LP files as well, and they are slightly different, but as far as I can see, only on the inflow value, with an error of ~1e-12, i.e., it appears to be a normal numerical precision difference.
  • Locally, the PR head of Improve performance v0.11.0 #1015 takes around 230 seconds and this one takes around 5 seconds, compared only with @time, but I would prefer that you test it in your normal workflow.
  • The benchmarks failed with exceeded time for main. I don't remember which version of the file it uses, but I assumed it was the newest. Maybe I'm wrong, or maybe main is just much worse, or GitHub's runners are much slower. Anyway, up for discussion.
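A rough illustration of the weighted join described above (my sketch only: the table names cons and var and the exact columns are assumptions, not the schema used in this PR):

# Hypothetical sketch: join constraint rows (indexed by period) with variable
# rows (indexed by rep_period) through rep_periods_mapping, carrying the weight.
# The ORDER BY reflects the two-level grouping: (asset, year), then rep_period.
DuckDB.query(
    connection,
    "SELECT cons.id AS cons_id, var.id AS var_id, map.weight
     FROM cons
     JOIN rep_periods_mapping AS map
       ON map.year = cons.year AND map.period = cons.period
     JOIN var
       ON var.year = map.year AND var.rep_period = map.rep_period
     ORDER BY cons.asset, cons.year, var.rep_period",
)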

abelsiqueira added the 'benchmark PR only - Run benchmark on PR' label Feb 7, 2025
github-actions bot (Contributor) commented Feb 7, 2025

Benchmark Results

Benchmark in progress...

codecov bot commented Feb 8, 2025

Codecov Report

Attention: Patch coverage is 98.07692% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.27%. Comparing base (d7cba2a) to head (1cf8c1d).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/model-preparation.jl 98.07% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1017      +/-   ##
==========================================
+ Coverage   95.22%   95.27%   +0.04%     
==========================================
  Files          29       29              
  Lines        1152     1184      +32     
==========================================
+ Hits         1097     1128      +31     
- Misses         55       56       +1     


datejada (Member) left a comment

Hi @abelsiqueira, thanks for the PR. I tested it, and we get the same .lp file for 100rp when comparing this PR and PR #1015. The SQL queries can be followed, which is good, since they follow the same type of structure as the others in the code. I also like that the inflows aggregation is not an expression; it is better as coefficients (maybe the same is useful for other aggregations, like the availability and demand profile aggregations).
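A toy illustration of the coefficients idea (my sketch with made-up names, not code from the PR): because the inflows carry no flow variables, the aggregated profile can be computed as a plain number and used as a constant, instead of being carried around as a JuMP expression.

using JuMP

model = Model()
@variable(model, storage_level[1:2] >= 0)

profile = [0.3, 0.5, 0.2]   # hypothetical inflow profile values for one period
weights = [2.0, 1.0, 1.0]   # hypothetical rep_period weights

# Coefficient-based aggregation: a single Float64, computed once.
inflow_coefficient = sum(p * w for (p, w) in zip(profile, weights))

# The constant enters the balance constraint directly; no AffExpr is built for it.
@constraint(model, storage_level[2] == storage_level[1] + inflow_coefficient)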

I ran the benchmark locally with 100rp and the times to create the model are:

My main comment is that, in these changes, JuMP.add_to_expression! is gone from the code, and I think we could still gain more by changing back to it (or am I not seeing/understanding something here?).

Finally, since the inflow profiles are calculated in the coefficients, it is better to clean the function and delete the commented parts that are not used anymore (even some TODOs might be obsolete).

Thanks again, great work here!

Diego

view = true,
grouped_var_table_name = "t_grouped_$(flow.table_name)_match_on_$(from_or_to)"
if !_check_if_table_exists(connection, grouped_var_table_name)
    DuckDB.query(
abelsiqueira (Member, Author) replied:

This was not executed, so codecov is complaining. Indeed, this table should already be computed in the rp version - it's the same aggregation. It is only here in case the rp version is not called at all.
Is that possible? If yes, then we should create an issue for a test case where that happens.
If not, then we can delete this and keep only the check and an error message.

datejada (Member) replied:

Thanks! The answer to the question is, no, it is not possible. The RP version will always call for the timeframe constraints. So, the check and an error message should be the way to go here.

datejada (Member) replied:

@abelsiqueira Hi, I changed the code here to add the error check, but it looks a bit ugly, so feel free to suggest changes. Thanks!
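For concreteness, the check-and-error pattern under discussion would look roughly like this (the error text is my assumption, not the wording that landed in the PR):

grouped_var_table_name = "t_grouped_$(flow.table_name)_match_on_$(from_or_to)"
if !_check_if_table_exists(connection, grouped_var_table_name)
    # The rep_periods version always creates this table first, so reaching this
    # branch indicates a bug in the call order rather than a recoverable state.
    error("Table $grouped_var_table_name not found; the rep_periods constraints must be created first")
end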

abelsiqueira (Member, Author) commented:

Hi Diego, I've replied to your JuMP.add_to_expression! comments, but in summary, we don't need them anymore because we no longer have a workspace of expressions.

On doing the same aggregation for everyone (or the opposite, not doing it for any of them): it's a good question that should be investigated. Can you create an issue for that?

I removed the other comments that I could see from this PR page and reduced the DuckDB query to remove the LEFT JOIN on the profiles. Hopefully the code still works.

This should address all comments, but if it doesn't, feel free to pick it up while I get back to the multi-year refactor. If you do need me, feel free to ping me.

datejada (Member) commented Feb 12, 2025

> Hi Diego, I've replied to your JuMP.add_to_expression! comments, but in summary, we don't need them anymore because we no longer have a workspace of expressions.
>
> On doing the same aggregation for everyone (or the opposite, not doing it for any of them): it's a good question that should be investigated. Can you create an issue for that?
>
> I removed the other comments that I could see from this PR page and reduced the DuckDB query to remove the LEFT JOIN on the profiles. Hopefully the code still works.
>
> This should address all comments, but if it doesn't, feel free to pick it up while I get back to the multi-year refactor. If you do need me, feel free to ping me.

@abelsiqueira Thank you for the reply and explanation. My question also arose when I saw this message in the Julia console while running this code (for hourly 36rp):

┌ Warning: The addition operator has been used on JuMP expressions a large number of times. This warning is safe to ignore but may indicate that model generation is slower than necessary. For performance reasons, you should not add expressions in a loop. Instead of x += y, use add_to_expression!(x,y) to modify x in place. If y is a single variable, you may also use add_to_expression!(x, coef, y) for x += coef*y.

So, if it is not this part, then I wonder where else we have the += such that JuMP complains 🤔 Any ideas?
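For reference, this is the pattern the warning refers to (a generic JuMP sketch, not code from this repository):

using JuMP

function build_with_plus(x)
    expr = AffExpr(0.0)
    for xi in x
        expr += 2.0 * xi   # allocates a new expression each iteration; JuMP warns after many of these
    end
    return expr
end

function build_in_place(x)
    expr = AffExpr(0.0)
    for xi in x
        add_to_expression!(expr, 2.0, xi)   # mutates expr in place; no warning
    end
    return expr
end

model = Model()
@variable(model, x[1:100_000])
build_with_plus(x)   # triggers the warning
build_in_place(x)    # does not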

To reproduce the warning, use the files in the benchmark, and before creating the model, use the following command to change all the partitions to 1 hour:

DuckDB.query(connection, "UPDATE assets_rep_periods_partitions SET partition = 1")
DuckDB.query(connection, "UPDATE flows_rep_periods_partitions SET partition = 1")

Thanks!

abelsiqueira (Member, Author) commented:

No ideas, but indeed there is something somewhere. I'm probably not touching this until Monday, though.

datejada removed the 'benchmark PR only - Run benchmark on PR' label Feb 13, 2025
datejada changed the title from '1014 abel' to 'Improve performance to create constraints over clustered year' Feb 13, 2025
datejada mentioned this pull request Feb 13, 2025
datejada marked this pull request as ready for review February 13, 2025 16:28
datejada (Member) commented:

> No ideas, but indeed there is something somewhere. I'm probably not touching this until Monday, though.

Great, thanks! Since it seems to be happening in another part of the code, we can address it in another issue and merge this one first.

datejada (Member) left a comment

@abelsiqueira, the PR is approved; feel free to merge it once you have looked at the error message that I suggested adding in the code.

datejada (Member) commented:

@abelsiqueira, I forgot to mention: I am running the versions' benchmarks locally to get a full picture of the improvements. That is why I am no longer benchmarking this PR and am returning the files to the 100rp benchmark.

abelsiqueira (Member, Author) commented Feb 14, 2025

Hi @datejada, I've updated your error message a bit. I have not found any reason for the expression warning yet. I looked into the += and sum calls in that file, and they don't appear to be dealing with expressions (we do create expressions by summing a large number of variables, though; could that trigger the same warning?).
I wrote a message on Slack, in case someone has a suggestion on how to easily find out where the warning originates.
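One way to hunt down where such a warning originates (my sketch, not a suggestion from the thread; create_model(connection) is a stand-in for the actual entry point): install a logger that turns the first warning into an error, so the resulting stack trace points at the code that emitted it.

using Logging

# Hypothetical helper: a logger that raises an error on any warning, producing
# a full stack trace through the code path that triggered it.
struct WarnToError <: Logging.AbstractLogger end
Logging.min_enabled_level(::WarnToError) = Logging.Warn
Logging.shouldlog(::WarnToError, level, _module, group, id) = true
Logging.handle_message(::WarnToError, level, message, _module, group, id, file, line; kwargs...) =
    error("warning emitted at $file:$line: $message")

with_logger(WarnToError()) do
    create_model(connection)   # stand-in for the real model-creation call
end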

abelsiqueira (Member, Author) commented:

I will merge this now. Thanks for the help.

abelsiqueira merged commit d7d70b7 into main Feb 14, 2025 (7 checks passed)
abelsiqueira deleted the 1014-abel branch February 14, 2025 10:35