Design simplification #728
Thanks for compiling and sharing your impressions as a new user! There is a lot of food for thought here. I will make some comments although I can't guarantee that I will reply to continued discussion on these points.
This is a matter of style and what you're comfortable with. I understand that you are familiar and comfortable with a different style.
PyClaw is not only a package itself, but is also used by other packages, and is used by people with different backgrounds and skills. Because of that, it's set up to be used in different ways. This is not optimal for you but each of the methods you mention is needed and used by someone.
Patches are an abstraction needed for adaptive mesh refinement. If you're not using AMR, you don't need to know about or work with Patches, only Grids.
This is historical and could probably be improved. Feel free to make a concrete suggestion (PR). But you can set them by name in a sense, using things like
Can you explain why? In my view,
If I recall correctly, they are separate because for large parallel runs it's essential to know in advance if you need to do an allreduce (which is required for F but not for p).
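For intuition, here is a minimal sketch of that distinction using mpi4py directly (this is not PyClaw's internal code): a pointwise derived quantity like p can be computed entirely from local data, while a global functional like F needs a reduction across all ranks.

```python
# Sketch only, not PyClaw internals: why F needs an allreduce but p does not.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
q_local = np.random.rand(100)        # this rank's portion of the solution

p_local = q_local**2                 # pointwise derived quantity: purely local work
F_local = p_local.sum()              # this rank's contribution to a global functional
F_global = comm.allreduce(F_local, op=MPI.SUM)   # communication across all ranks
```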
That's okay; you can ignore it. It is used heavily by others, myself included.
I'm not sure what you're referring to here. Of course, some of the available examples are much longer than others, and not every one of them would be a good example to start with. The basic example at https://www.clawpack.org/pyclaw/tutorial.html is about 20 lines.
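For reference, that basic 1D acoustics setup looks roughly like the following. This is reconstructed from memory rather than copied from the tutorial, and exact signatures (e.g. `Dimension`) have shifted a little between Clawpack versions, so treat it as a sketch:

```python
from clawpack import pyclaw, riemann
import numpy as np

# 1D acoustics with the classic Clawpack algorithm and periodic boundaries
solver = pyclaw.ClawSolver1D(riemann.acoustics_1D)
solver.bc_lower[0] = pyclaw.BC.periodic
solver.bc_upper[0] = pyclaw.BC.periodic

x = pyclaw.Dimension(0.0, 1.0, 100, name='x')    # 100 cells on [0, 1]
domain = pyclaw.Domain(x)
state = pyclaw.State(domain, solver.num_eqn)

# constants the acoustics Riemann solver expects
# (density, bulk modulus, impedance, sound speed)
state.problem_data['rho'] = 1.0
state.problem_data['bulk'] = 1.0
state.problem_data['zz'] = 1.0
state.problem_data['cc'] = 1.0

xc = domain.grid.x.centers
state.q[0, :] = np.exp(-100 * (xc - 0.5)**2)     # initial pressure pulse
state.q[1, :] = 0.0                              # zero initial velocity

claw = pyclaw.Controller()
claw.solution = pyclaw.Solution(state, domain)
claw.solver = solver
claw.tfinal = 1.0
claw.run()
```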
Sure, it's a different style, but it's more than that. It forces a certain helpfully explicit structure: when I see … Right now you have a lot of parameters in your …

I'm not advocating for a totally functional paradigm. It can make sense to hold data in objects and have methods operate on that data. But consider what's easier to follow in different situations. Constructors generally take parameter values to assign to object variables. Right now a lot of parameters feel hidden, buried in the source code or among a heap of others in the documentation pages, so it can be difficult to find which ones I need to set and what I need to set them to. Contrast that with function-specific docstrings that specify exactly how each input affects the behavior of that specific function: I don't have to read about parameters that are irrelevant to the call I want to make, or about cases where a parameter might be used in other ways elsewhere in the object. Confusion is less likely. I can do the thing I want to do with minimum viable information.
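As a contrived illustration of the contrast (neither class here is PyClaw's API; both are made up):

```python
# Made-up classes, not PyClaw code: "set attributes after construction" vs.
# the scikit-learn style of constructor kwargs plus one big docstring.

class SolverA:
    """Style A: required attributes are set after construction, and nothing in
    the signature tells me which ones must be set before the object is usable."""
    def __init__(self):
        self.tfinal = None
        self.bc_lower = None

class SolverB:
    """Style B: every knob is a constructor kwarg with a default.

    Parameters
    ----------
    tfinal : float
        Final simulation time.
    bc_lower : str
        Boundary condition applied at the lower edge.
    """
    def __init__(self, tfinal=1.0, bc_lower='periodic'):
        self.tfinal = tfinal
        self.bc_lower = bc_lower

a = SolverA()
a.tfinal = 2.0            # easy to forget bc_lower and end up half-configured

b = SolverB(tfinal=2.0, bc_lower='extrap')   # fully specified (or defaulted) at creation
```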
I don't think "There's a lot downstream" is in itself a sufficient reason to preserve the status quo. You can always create a new version where old functionality is deprecated and eventually phase it out altogether, and users can keep using particular older versions if they want. I believe there should be a conscious drive toward simplification every time changes are made to any code, generally. Otherwise code entropy can overtake a project. "What is the minimal set of options that covers the use cases?" If there need to be several, maybe they can be made to feel similar (like how I suggested always taking a
Can we always safely get away with using
This data is unique to each problem, right? There is a unique Riemann solver (typically in Fortran) for each problem, so it feels like the wrapping Solver (in Python) is the more natural 1:1 home for constants unique to each problem.
Is there a way to make that distinction automatically in the background somehow, so the user doesn't have to be faced with both
I was being slightly snarky there. Sorry. Shorter examples are definitely possible, but I've found myself drawn to the gallery and the example source code links there, because they're very findable. But they're not super accessible. They're often made to show off extra functionality of the library. Even so, setting up more complex problems does take a significant number of lines, on the order of 100, maybe. That's a barrier to entry. The easiest way I see to save lines is to turn …
Before I launch into my own particular thoughts, I would say that many of your points are valid, but that the realities of scientific software development can get in the way. We do not get funded to develop, maintain, and improve the software beyond specific funding efforts and the small bit of free time any one developer has. The resulting tower of cards we tend to build is a more profound problem that bears discussion, but as of now has no great solution. Now, off my soapbox:
I totally get you on the no-direct-funding, house-of-cards thing. Why has the export-as-PDF feature in Jupyter notebooks caused an HTTP 500 error for years? I buy the cathedral-vs-bazaar argument that open source is often better, but there really are cases where the logic breaks down because the incentives just aren't there. In scientific computing, where the abstruse technical details are at a maximum and money is at a minimum, where there are few eyes to make all bugs shallow and few hands to lighten the load, it can be pretty challenging, unless you get super popular because of ML hype or something.

I don't have the latitude to spend a huge portion of my time improving Clawpack, nor do I have the kind of overarching knowledge to spot all the opportunities for improvement or take care of them efficiently. But I do have a maybe usefully naive perspective as a user, and I will probably continue to need to play that role, so if I can make that incrementally better for future-me as well as others, I'm keen. I just want to make sure we're on the same page about what we can and should try to tackle, so I don't pour time into a PR that can't be merged or something. (I once did that, and I'm still bitter about it lol, still trying to eat that guy's lunch in terms of user numbers: exportify vs exportify. My app is first on Google--for now.) To that end, I appreciate you guys being responsive on this thread.

Personally, as someone who's gone through the ML side and long been disillusioned with its limitations, I think sims of this kind can help extend models' predictive and sensing power in the future. (If we ever have a call with @nathankutz, we can talk more about that vision.) If we want to position this library to take more user share over time, so our bugs can be shallow and our street cred (or even published results) as contributors can have increasing value, then earlier improvements pay greater dividends.
@pavelkomarov Thanks for your thoughts. For any PR that will require a significant amount of work, it's definitely a good idea to open an issue for discussion first to avoid wasting anyone's time. While we are certainly motivated to make the software as easy to use as we can, this is a secondary priority after enabling research. I'm not sure that "taking more user share" is a good goal for Clawpack; we don't get paid by the users, and in a sense more users just means more work that we have little time to do. But improvements that enable people to use the code effectively, with less help and fewer problems, are something I want very much, and I think that's what you're aiming for too.
I have some suggestions after reading through more of the documentation:
- A `Controller`'s `tfinal`, a `State`'s `q` values, and a `Solver`'s BCs should be arguments to the respective constructors (a rough sketch of what I mean follows this list). I don't want to have to futz with object parameters after object creation, and I don't want to be able to create an object that isn't usable because something isn't yet set. Take inspiration from scikit-learn, with those long parameter lists, mostly kwargs set to defaults, with BIG docstrings explaining how each controls the object.
- The ways to construct a `Solution` (from a file, from a frame number in memory, from a lower-level list of dimensions, etc.) constitute a lot of pretty different methods, each of which takes brain space. Give me one function or one constructor with well-documented parameters to handle the different cases. Removing support for less-used options or folding them into others would greatly shorten the code and make it easier to maintain. For example, a `Controller` can iterate over basically a `linspace` of times, or over a list of output times, or just give an output frame every `k`th iteration. If you say "Always give me a list", then that covers the first two use cases (because I can in-line a `linspace` call somewhere), and you can remove the other two paths from the code.
- `State`s take a `Domain`, `Patch`, `Grid`, or list of `Dimension`s at initialization, and then the `Solution`, which contains `State`s, also takes geometry. Why? Can't it get the geometry by asking its sub-objects for their geometries? Maybe there is a good reason, but I found this confusing and possibly redundant.
- A `Patch` "contains a reference to a nearly identical `Grid` object". If they're nearly identical, why are there two, neither inheriting from the other? Remember the Zen of Python: "There should be one-- and preferably only one --obvious way to do it."
- `problem_data` feels like it should belong to the `Solver` for a given problem, not to the `State`.
- Why both `compute_p` and `compute_F`? I get the sense `p` is defined over the whole geometry, and `F` can only be a single value. But they're both derived quantities. I say the concepts should be merged: you can either compute a derived quantity over the whole field, or you can sum out a scalar value.
- The `run_app_from_main` utility: it's easier to call `run` on a `Controller` and then call the `plot` function or do anything else I want with it afterward. I feel like trying to make a command-line utility with `htmlplot=True` and other specially-named arguments isn't necessary and again gives me an extra way to do things (it also hides where some things are actually happening), when my degrees of freedom already feel overwhelming.

Of all of them, the first is the most important.
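Here's the rough shape of what I'm picturing for that first point. The class and attribute names echo PyClaw's, but this is a made-up mock-up to show the style, not a proposal for exact signatures:

```python
# Hypothetical mock-up, not PyClaw code: everything needed to make the object
# usable is passed at construction time, with defaults where sensible.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class State:
    q: np.ndarray                                   # solution values, set at creation
    problem_data: dict = field(default_factory=dict)

@dataclass
class Controller:
    state: State
    tfinal: float = 1.0                             # no post-construction futzing
    out_times: list = field(default_factory=list)   # one way to request output

    def run(self):
        # stub: a real run() would march the solution to tfinal
        print(f"running to t = {self.tfinal} with {len(self.out_times)} output frames")

x = np.linspace(0.0, 1.0, 100)
claw = Controller(
    state=State(q=np.exp(-100 * (x - 0.5)**2),
                problem_data={'bulk': 4.0, 'rho': 1.0}),
    tfinal=2.0,
    out_times=list(np.linspace(0.0, 2.0, 11)),
)
claw.run()
```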
In my opinion, at the level of the examples, it should take far fewer than 300 lines to get going. A lot of complexity can be abstracted or deduplicated away. Seduce me as a user.
That said, I think the division of things into the various `.py` files makes pretty good sense. The fact that the Riemann solvers live in a totally different package was my biggest point of confusion. There is just a lot of code; I find myself `grep`ing to find things often.