Refactor as a package #72

gdalle · 2024-11-06T15:52:49Z

Hi @ccoffrin!
Would you be open to a PR refactoring rosetta-opf as a proper Julia package? That way people trying to use it don't have to copy-paste the code or download individual files.
Example: what if I want to benchmark ADNLPModels with an AD backend different from the default, or with a solver different from Ipopt? At the moment I have to copy-paste and modify the nlpmodels.jl file at the root of this repo.

The text was updated successfully, but these errors were encountered:

ccoffrin · 2024-11-13T17:49:53Z

I have not had a chance to think deeply on this but my initial reaction is that I would prefer not to make this a package. I think it will add maintenance burden while providing a relatively small increase in convenience. I don't particularly want this repo to be a dependency to other packages and the spirit of the repo is to be just a collection of scripts, rather than some modular infrastructure.

To achieve the objective of your "Example", my expectation would be that you use this repo as a reference for a new implementation/package that is specifically designed for benchmarking AD in ADNLPModels.

Some notes about the draft PR. I do not have a clear understanding of Julia extensions, so I am not ready to make them part of a repo that I maintain. The proposed solution does not yet address the variant models, so is not yet fully covering the scope of this repo.

gdalle · 2024-11-13T18:13:08Z

while providing a relatively small increase in convenience.

Respectfully, I strongly disagree. What's great about the work you (and others) did here is that now other people shouldn't have to redo it. If we need to go to the repo, download it, then copy-paste individual files and change multiple lines every time we need to run benchmarks, I'm not sure anyone will bother besides you. A functional API to create the model/run the benchmarks is much easier for scientists who don't want to dive into how models are encoded in each optimization framework.

This PR doesn't come from nowhere, it comes from my desire to use your benchmarking code for a research project, and my realization that it would be much more convenient with a package. Essentially, #73 replaces the manual Git cloning with automated management through Julia's Pkg.

I don't particularly want this repo to be a dependency to other packages

It probably won't be a dependency of registered, widely used packages, but it could be a dependency for a lot of academic code. Making rosetta-opf a package helps with that, as it does with clean versioning and reproducibility.
Note that this is separate from the question of whether the rosetta-opf package should be registered (part of the general registry).

I do not have a clear understanding of Julia extensions, so I am not ready to make them part of a repo that I maintain.

Extensions are a key feature of the package manager, introduced in Julia 1.9. They are critical here because they would allow users of rosetta-opf to run the code for, say, ADNLPModels without installing all of the other optimization frameworks. Some of these frameworks may go unmaintained or become irrelevant, and we don't want them to constrain the environment where we run the benchmarks by imposing outdated versions of indirect dependencies. I encountered similar problems with my package DifferentiationInterface, where I tried to pull in every single autodiff package in the test environment, and ended up with a lot of compatibility conflicts. I anticipate the same would happen with optimization modeling and solver libraries.

If you are interested in reading the documentation on extensions, I will be happy to answer any questions you have.

The proposed solution does not yet address the variant models, so is not yet fully covering the scope of this repo.

That is easily fixed (by me), if there is interest on your part.

I think it will add maintenance burden [...] the spirit of the repo is to be just a collection of scripts.

Ultimately it is a question of choosing programming paradigms. I would argue that modularity is never a bad thing, and that most Julia projects should be packages anyway, even if are not meant for wide reuse. This is also the workflow we recommend in https://modernjuliaworkflows.org/. Packages are the right way to deal with dependencies and versioning.
Besides, enabling code reuse comes at a relatively small cost here, the full PR is about 100 LOCs. With variants, maybe 200.

gdalle · 2024-11-13T18:15:35Z

I'd like to hear @odow's take on this

adrhill · 2024-11-13T18:18:57Z

Respectfully, I strongly disagree. What's great about the work you (and others) did here is that now other people shouldn't have to redo it. If we need to go to the repo, download it, then copy-paste individual files and change multiple lines every time we need to run benchmarks, I'm not sure anyone will bother besides you.

To add some background information, @gdalle and I are currently doing just this for a benchmark in an upcoming paper of ours. We have to provide written rosetta-opf installation instructions for readers/reviewers and tell them which commit to check out for future reproducibility. All of this could instead be handled by Pkg and a Manifest.toml.
So a real-world use case for a package already exists.

odow · 2024-11-13T22:45:17Z

I see both arguments.

I'll note that the license is very permissive: https://github.com/lanl-ansi/rosetta-opf/blob/main/LICENSE.md

gdalle · 2024-11-13T22:52:12Z

I didn't mean that the main obstacle to copy pasting would be the license, my considerations were more tied to practicality and ease of installation

ccoffrin · 2024-11-14T21:20:06Z

Respectfully, I strongly disagree. What's great about the work you (and others) did here is that now other people shouldn't have to redo it. If we need to go to the repo, download it, then copy-paste individual files and change multiple lines every time we need to run benchmarks, I'm not sure anyone will bother besides you. A functional API to create the model/run the benchmarks is much easier for scientists who don't want to dive into how models are encoded in each optimization framework.

This was not intended to be a benchmark. Like its namesake, rosetta code, it is first and foremost a tool for folks to compare, contrast and learn how different Julia optimization modeling layers work. It seems that you have found it to be a useful benchmark as well. That is great! But that is not my goal for this repo.

For example, this repo does not include the larger network data case files, which are required for doing an OPF benchmark study. I have my own separate collections of scripts for OPF benchmarking, those are not included in this repo. If I was going to make an OPF benchmark repo, it would surely be engineered much differently than this one.

This PR doesn't come from nowhere, it comes from my desire to use your benchmarking code for a research project, and my realization that it would be much more convenient with a package. Essentially, #73 replaces the manual Git cloning with automated management through Julia's Pkg.

I agree this brings some convenience, I just believe that the added maintenance burden does not sufficiently offset that convenience. I am on the hook for doing this maintenance, I have to be realistic about my personal bandwith to do this.

Extensions are a key feature of the package manager, introduced in Julia 1.9. They are critical here because they would allow users of rosetta-opf to run the code for, say, ADNLPModels without installing all of the other optimization frameworks. Some of these frameworks may go unmaintained or become irrelevant, and we don't want them to constrain the environment where we run the benchmarks by imposing outdated versions of indirect dependencies. I encountered similar problems with my package DifferentiationInterface, where I tried to pull in every single autodiff package in the test environment, and ended up with a lot of compatibility conflicts. I anticipate the same would happen with optimization modeling and solver libraries.

Package version clashes are a challenge faced by this repo. This is an interesting idea to help resolve this. How do these extensions interact with package requirements as they are specified in the Project.toml file?

Ultimately it is a question of choosing programming paradigms. I would argue that modularity is never a bad thing, and that most Julia projects should be packages anyway, even if are not meant for wide reuse. This is also the workflow we recommend in https://modernjuliaworkflows.org/. Packages are the right way to deal with dependencies and versioning.

Correct, I don't want to have to maintain versioning convention for this repo. That adds maintenance overhead.

My feeling is still that you want this repo to be a benchmark framework, where I want it to be a learning tool. Especially if you are working on a paper about optimization methods benchmarking, I think it would be best for you to make a different repo, maintained by you, that can be use for the reproducibility of that paper.

gdalle · 2024-11-15T11:43:10Z

This was not intended to be a benchmark. Like its namesake, rosetta code, it is first and foremost a tool for folks to compare, contrast and learn how different Julia optimization modeling layers work. It seems that you have found it to be a useful benchmark as well. That is great! But that is not my goal for this repo.

My mistake. I heard about this repo through your Discourse post, and though you specify at first that it is only an "example implementation", the rest of the conversation focuses heavily on benchmarks. Whether you like it or not, it seems that people want to use your code for benchmarking ^^ The only question is whether they can import it, or whether they have to download it and edit the files themselves.

For example, this repo does not include the larger network data case files, which are required for doing an OPF benchmark study. I have my own separate collections of scripts for OPF benchmarking, those are not included in this repo. If I was going to make an OPF benchmark repo, it would surely be engineered much differently than this one.

If you're open to that, I can also set up a pipeline with DataDeps.jl or DataToolkit that downloads the files from pglib-opf on-the-fly when the user asks for a specific instance. I've done this before for other projects, it's no trouble at all.

I agree this brings some convenience, I just believe that the added maintenance burden does not sufficiently offset that convenience. I am on the hook for doing this maintenance, I have to be realistic about my personal bandwith to do this.

I understand your concern, but I don't understand where the maintenance burden would come from. Moving code from a script to an extension is basically free, and it is a one-off operation which I have already done for you. As for your lack of familiarity with extensions, they are already ubiquitous throughout the ecosystem, so accepting them may be the only way forward for productive Julia development.

More importantly, if you agree to make this a reusable package instead of a collection of scripts, more people might chime in to contribute. That's exactly what I am doing right now, because I believe your work to be a very valuable resource and I want to help spread the word.

Package version clashes are a challenge faced by this repo. This is an interesting idea to help resolve this. How do these extensions interact with package requirements as they are specified in the Project.toml file?

The package requirements are still specified in the Project.toml, but most of them are under [weakdeps] instead of [deps] (see #73). The main advantage is that when installing a hypothetical RosettaOPFBenchmarks.jl package, you only download the basic requirements like PowerModels.jl. Then, you can install and load individual solver packages, and the corresponding code from the extension is executed automatically. Thus, you and potential users are free to use any subset of the solver family instead of having to install them all at once.

My feeling is still that you want this repo to be a benchmark framework, where I want it to be a learning tool. Especially if you are working on a paper about optimization methods benchmarking, I think it would be best for you to make a different repo, maintained by you, that can be use for the reproducibility of that paper.

Why duplicate efforts if I can contribute to this repository instead? You know a lot more about power flow models than I do, and perhaps I'm slightly more experienced in Julia package development. We could help each other out.

odow · 2024-11-19T02:22:07Z

@ccoffrin I might take a look at how we could restructure this repo as part of the HPC stuff.

@gdalle I don't necessarily think that we should register this as a package, but I understand your motivation. Ditto with the artifacts, etc.

odow · 2024-11-20T04:11:03Z

Here's my thoughts:

I like the simple scripts. There's nothing wrong with a bit of copy-paste
We don't gain much by making this into a package if it's just a collection of scripts, since people can't easily modify or parameterise the implementations. For example, Restructure as standalone package #73 would enable
```
using RosettaOPF
RosettaOPF.solve_opf(filename, Val(:JuMP))
```
which doesn't seem much better than
```
include("jump.jl")
solve_opf(filename)
```
Part of the confusion is that @ccoffrin actually has a branch with a CLI driver, but this isn't in the main branch of the repo, so it isn't really obvious how to "drive" rosetta-off.
I think we should move the CLI to main, and offer it as the one-true-way to interact with rosetta-opf.

The second motivation for making it a package is dependency management.

This is already a pain, because things like ModelingToolkit are really heavy and they're not all needed if you want to run only a subset of the implementations.
Ditto with the variants. They currently have a separate Project.toml because we want some additional packages that are not always up to date etc.

A third complication is that Carleton wants to benchmark with HSL, which is not open-source. Currently, his branch sets additional options.

I'd propose something like:

rosetta-opf/
    src/
        rosetta-opf.jl
        JuMP/
            Project.toml
            main.jl
        ExaModels/
            Project.toml
            main.jl
    /variants
        /JuMP_AmplNLWriter
            Project.toml
            main.jl

with usage something like

$ julia --project=src/JuMP src/JuMP/main.jl --instance=abc.m --output=output.jsonl

where each main.jl pulled in some common code from rosetta-opf loading the instance from PGLib.jl and parsing cmd line arguments, etc.

gdalle · 2024-11-20T06:17:42Z

We don't gain much by making this into a package if it's just a collection of scripts, since people can't easily modify or parameterise the implementations.

As for your comparison, you're missing the installation step, which is the main reason I did all this in the first place. Without #73, if you want reproducible experiments in your paper, you have to tell the reader to do something like

git clone [email protected]:lanl-ansi/rosetta-opf.git
cd rosetta-opf
git checkout 496d02671eb123368928b8e384e84f900db7cc3c
cd ..

git clone [email protected]:power-grid-lib/pglib-opf.git
cd pglib-opf
git checkout dc6be4b2f85ca0e776952ec22cbd4c22396ea5a3
cd ..

whereas #73 automates this (including the versioning) through Julia's package manager. The latest version of the PR also downloads the pglib-opf instances from the correct URL without the user even knowing about it.

Of course it was just a first POC, if you change your mind about this we can add more options to the function (like solver choice or maximum solve time).

The second motivation for making it a package is dependency management.

That's precisely what package extensions can help with, because you can have an environment with mandatory and optional parts.

I don't think endless debate is very productive at this point. My suggestion arose from trying to actually use your code in a research setting, and realizing how much easier it would be with a package. Has anyone else beyond you ever tried the same?
Judging by the Discourse thread, when SciML attempted to use your code, they ended up reimplementing it for their benchmarks instead. I chose to contribute to the original, but I understand that this contribution is not wanted. I'll leave my fork up in case you change your mind.

gdalle linked a pull request Nov 6, 2024 that will close this issue

Restructure as standalone package #73

Draft

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor as a package #72

Refactor as a package #72

gdalle commented Nov 6, 2024 •

edited

Loading

ccoffrin commented Nov 13, 2024

gdalle commented Nov 13, 2024 •

edited

Loading

gdalle commented Nov 13, 2024

adrhill commented Nov 13, 2024

odow commented Nov 13, 2024

gdalle commented Nov 13, 2024

ccoffrin commented Nov 14, 2024

gdalle commented Nov 15, 2024

odow commented Nov 19, 2024

odow commented Nov 20, 2024

gdalle commented Nov 20, 2024

Refactor as a package #72

Refactor as a package #72

Comments

gdalle commented Nov 6, 2024 • edited Loading

ccoffrin commented Nov 13, 2024

gdalle commented Nov 13, 2024 • edited Loading

gdalle commented Nov 13, 2024

adrhill commented Nov 13, 2024

odow commented Nov 13, 2024

gdalle commented Nov 13, 2024

ccoffrin commented Nov 14, 2024

gdalle commented Nov 15, 2024

odow commented Nov 19, 2024

odow commented Nov 20, 2024

gdalle commented Nov 20, 2024

gdalle commented Nov 6, 2024 •

edited

Loading

gdalle commented Nov 13, 2024 •

edited

Loading