Refactoring the viz module to a more modular design. #476
Conversation
This pull request introduces 140 alerts and fixes 5 when merging 4391711 into ea81dcd - view on LGTM.com |
This pull request introduces 140 alerts and fixes 5 when merging 471fc2c into ea81dcd - view on LGTM.com |
Codecov Report

```
@@            Coverage Diff             @@
##             main     #476      +/-   ##
==========================================
- Coverage   87.42%   87.27%   -0.16%
==========================================
  Files         376      355      -21
  Lines       48172    47680     -492
==========================================
- Hits        42113    41611     -502
- Misses       6059     6069      +10
==========================================
```
|
This pull request introduces 150 alerts and fixes 5 when merging 0b8546a into 7f1f979 - view on LGTM.com |
This pull request introduces 152 alerts and fixes 5 when merging 0b035be into 7f1f979 - view on LGTM.com |
This pull request introduces 149 alerts and fixes 5 when merging 5f40203 into 29b81cf - view on LGTM.com |
CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
This pull request introduces 149 alerts and fixes 5 when merging 597399b into 51b57c7 - view on LGTM.com
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed 🚀. For more information, please check out our post on the GitHub blog. |
This is not to push you, but what is your ETA on this? |
Hmm I think it's better not to rush it, so you can publish the release without it. Actually I was waiting for your opinion on the core ideas, because maybe there are some things that can be simplified/made more functional. Whenever you have time if you could look at the html file that I provided here with a rendered notebook that would be nice :) |
Ok, I'll have a look (hopefully this week or next)! Thanks! |
Ok, I have had a look. And all-in-all I really like your ideas! I have uploaded a new notebook with comments in it (change extension to

I say we go! :) |
Thanks for looking at it! I will go through some of your comments here:
Yes, I also thought that this would be problematic with grids. But then my thought was that if it does not make sense to store its inputs and outputs it shouldn't be a node. E.g. if you have a plot that does
No, it doesn't. This is one of the reasons I want to get rid of defining all functions as nodes (see design tweak below).
Yeah this is a bit tricky, but I think

Of course you can also consider that you can split all the functionality in a node into individual pieces that are linked together, but you wouldn't want to store checkpoints for all of them 😅

**My thoughts for a design tweak**

I didn't like the fact that there was

```python
from sisl import some_func
from sisl.viz import Node, Workflow

node_func = Node.from_func(some_func)

@Workflow.from_func
def my_workflow(...):
    ...
    a = node_func()
    ...
```

I have now an implementation (not merged here) that on

```python
from sisl import some_func
from sisl.viz import Workflow

@Workflow.from_func
def my_workflow(...):
    ...
    a = some_func()
    ...
```

and it will be exactly the same as the previous code. To me this is a great improvement because:

I do the change from function calls to node calls by getting the Abstract Syntax Tree, modifying it and recompiling. I don't know if it is too "hacky", but it seems very robust to me, and cleaner than any other implementation. Do you think it is fine or should I stick to "more normal" python trickery?

Also, if you don't need to define all functionalities as nodes I thought that this might be useful not only for

Cheers! |
I'll have to wrap my head a bit more around the node stuff, I am a bit worried it might make the code more complicated and not as intuitive. I have, however, thought about the dispatch methods since there are some shortcomings; here the nodes might be interesting. A note on the workflows: it seems to me that if you do it via |
Which part will be more complicated? I don't mean to define sisl functions as nodes, I just mean to store the
What do you mean by doing the |
Regarding (update) Just converting all workflow inputs to a

My first idea was to restrict the usage of workflows to very simple things. At the beginning I thought I would only allow assignments from function calls. E.g.:

```python
@Workflow.from_func
def my_workflow(a, b):
    val = calc_val(a, b)
    shift = calc_shift(a)
    return shift_val(val, shift)
```

I think if you do this you are forcing devs to create easy-to-understand workflows and it is very clear where the checkpoints are.

Then I thought I could just replace function calls by node calls and everything would work the same. So in my current implementation it is now valid to do:

```python
@Workflow.from_func
def my_workflow(a, b):
    return shift_val(calc_val(a, b), calc_shift(a))
```

and it produces the same workflow with the same checkpoints. My idea in doing this was that it would be much simpler to convert any given function into a workflow, which is nice. However, I don't know if I should go back to forcing the first syntax. I like it because it explicitly declares the checkpoints. And you can access checkpoints of the workflow from variable names (e.g.

What I'm thinking now is that I could support the second syntax, but a variable declaration could mean that we should create a breakpoint. This is a bit different because checkpoints would be a thing external to nodes instead of using their intrinsic functionality. One could allow:

```python
@Workflow.from_func
def my_workflow(a, b):
    # Private variables indicate that these are just temporary values and should not be checkpointed
    _val_left = calc_val(a, b)
    _val_right = calc_val(b, a)
    # I want to create a checkpoint here
    val = _val_left + _val_right
    # Another checkpoint is implicitly created for the output.
    return shift_val(val, calc_shift(a))
```

Sorry for the long text :) Any ideas on what you think is best? |
In the last idea, you wouldn't necessarily need to use

```python
@Workflow.from_func(store=["val"])
def my_workflow(a, b):
    # Temporary values that are not checkpointed
    val_left = calc_val(a, b)
    val_right = calc_val(b, a)
    # I want to create a checkpoint here
    val = val_left + val_right
    # Another checkpoint is implicitly created for the output.
    return shift_val(val, calc_shift(a))
```
|
Would all

```python
val = my_workflow.nodes.calc_val.get()
```

should be the way to get it, no? Since each node is accessible and the values themselves are |
To add to that, if a user wants a checkpoint, they should use a node! :) |
Yes, but that's what happens automatically, thanks to your advice on
Yes, exactly. Naming checkpoints by their node name has the downside that if you use the same node more than once you need to modify the names. What this PR implements right now is to add a number to the name if that's the case. I.e. there is

I thought that giving names to checkpoints was much nicer for this, and because a normal user would understand it much better without knowing the inner workings, I believe. The naming for the checkpoints would just be a local table to access the nodes' outputs; it doesn't impose anything on the node itself. So a separate workflow would be able to use the same node and map its output to another name. (This is just the normal way variables work in python, nothing particularly strange: you can have multiple variables pointing to the same object, each named differently.) |
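The numbering scheme mentioned above (adding a number when a node name repeats) could look something like this minimal sketch; the helper name and the exact `name_2` format are assumptions, not necessarily what the PR implements:

```python
def unique_name(name, existing):
    """Return `name`, or `name_2`, `name_3`, ... if `name` is taken.

    Hypothetical helper illustrating the "append a number on repeat
    use of the same node" scheme discussed above.
    """
    if name not in existing:
        return name
    i = 2
    while f"{name}_{i}" in existing:
        i += 1
    return f"{name}_{i}"

# Registering the same node name three times yields three distinct keys.
names = set()
for _ in range(3):
    names.add(unique_name("calc_val", names))
print(sorted(names))  # → ['calc_val', 'calc_val_2', 'calc_val_3']
```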
One could even argue if the variable name is more stable than the node name. You might change the way you compute a given variable of your workflow, or the function name from a third party package might simply change. |
Well, everything is up for changes ;) With this I think your idea of notifying the workflow what to expose is the best idea.

```python
@Workflow.from_func(store=["val"])
```

You could imagine a

The above approach ensures that the implementors of the workflow will document the stored variables for consistency and ease of use. |
Yes, my idea was exactly that! It would generate a copy of the workflow with this extra checkpoint.
I would say it requires the same amount of documentation as any other object, no? You need to document the variables that you expose and try not to change them, private variables need less documentation since they are not expected to be used by users, etc... |
Good.
Yes, but here it is a bit more difficult, right. Ideally the |
Yes, I think of the workflow more as an object with state instead of a function. So what I meant is that you need to document the checkpoints of your workflow as you do with the state of any other object in python. |
Yes, just so we agree that the documentation needs to follow the store values ;) |
Just an idea, would it make sense to do something like this:

```python
def inject_plot(function, base=sisl.viz.Plot):
    def dec(cls):
        d = dict(cls.__dict__)
        d["function"] = staticmethod(function)  # or possibly check if it needs to be a class call?
        return type(cls.__name__, cls.__bases__ + (base,), d)
    return dec

@inject_plot(a_cool_plot)
class CoolPlot:
    def color_like(self, color): ...
```

Or is this annoying, another complexity level? Otherwise, it looks good! :) |
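The snippet above depends on `sisl.viz.Plot` and `a_cool_plot`, so as a sanity check, here is a self-contained runnable version with stand-ins for both. Note two adjustments the idea seems to need in practice: the implicit `object` base must be filtered out to keep the MRO consistent, and the old class's `__dict__`/`__weakref__` descriptors must be dropped before re-creating the type:

```python
class Plot:
    """Stand-in for sisl.viz.Plot, only for this sketch."""
    def run(self, *args, **kwargs):
        return self.function(*args, **kwargs)

def inject_plot(function, base=Plot):
    def dec(cls):
        d = dict(cls.__dict__)
        # These descriptors belong to the old class and clash on re-creation.
        d.pop("__dict__", None)
        d.pop("__weakref__", None)
        d["function"] = staticmethod(function)
        # Drop implicit `object` so the new MRO is consistent.
        bases = tuple(b for b in cls.__bases__ if b is not object) + (base,)
        return type(cls.__name__, bases, d)
    return dec

def a_cool_plot(x):
    return 2 * x

@inject_plot(a_cool_plot)
class CoolPlot:
    def color_like(self, color):
        return f"colored {color}"

p = CoolPlot()
print(p.run(3), isinstance(p, Plot))  # → 6 True
```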
I think the

But I like the idea of the decorator as an alternative to defining the

By the way is it possible to support

```python
class CoolPlot(Plot, a_cool_plot):
    ...
```

? (I know, a bit crazy haha) Or maybe

```python
class CoolPlot(Plot[a_cool_plot]):
    ...
```

This one is possible for sure, but I don't know if it is sane, you tell me 😅 |
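For reference, the `Plot[a_cool_plot]` syntax floated above is indeed implementable with `__class_getitem__` (implicitly a classmethod since Python 3.7). A minimal sketch with stand-in names, not the module's actual API:

```python
class Plot:
    def __class_getitem__(cls, function):
        # Build a dynamic subclass with the function attached.
        return type(f"{cls.__name__}_{function.__name__}", (cls,),
                    {"function": staticmethod(function)})

    def run(self, *args, **kwargs):
        return self.function(*args, **kwargs)

def a_cool_plot(x):
    return x + 1

class CoolPlot(Plot[a_cool_plot]):
    ...

print(CoolPlot().run(41))  # → 42
```

Whether the subscript syntax is saner than a decorator is a taste question; mechanically both end up producing a dynamic subclass.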
But my decorator is having

Your first is not viable, and the second is too obscure IMHO... |
Yes I know that it has the The Maybe if we don't agree we could just leave it as it is for now. And then if someone uses it (which would already be very nice) and complains about it being too verbose we can think about it. |
I think this is too weird. I would think an index of a plot could possibly be hooked only to the backend, that wouldn't be so weird. But an indexer that subclasses with a function handle... hmm... not something I would expect.
Yes, lets do that. :)
I think here it would be better to use

Could you go through them and make the tests succeed, then I'll merge! |
Fixed the bug :) But don't merge yet, I realised I still need to add some docstrings. |
Hmm this is very strange, the test is failing with exactly the same gap value, and only for spin-orbit. And I can't reproduce the failing test locally. Can you? The test data is generated randomly, so I wonder if the CI sets a value for the random seed and that is what explains the exact same value error. I don't think it is just a numerical issue, because from 2.5 to 2.46 is too much error. That's why I don't want to add |
I get these errors:
I did use the updated one... |
But is the gap you have entered to the test the correct one? I can not imagine anything where the gap is exactly 2.5... |
Those errors are locally on your computer? In this case it is exactly 2.5 (to machine precision, which I don't think will introduce an error of 0.03) because the data is synthetically generated with that gap. |
I thought the error could be related to python 3.11 but even with 3.11 I could not reproduce it 😪 |
Yes, locally on my machine... That is weird... |
And to add more mystery, it only fails for spin-orbit, not even non-collinear spin. Hmm, I think I'll mark it as skip for now. The workflow one I also get, so I'll try to fix it. |
Ok, interesting, I can reproduce if I run the full test suite (previously I was only running the |
Sounds like a state? |
It's a bug in the generation of the synthetic data. The bands are random polynomials, and if there were band crossings in the valence band the gap was not properly set. At some point of running the tests numpy's random seed is set and that's why it's generating always exactly the same gap. |
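The failure mode described here, where some earlier test calls `np.random.seed(...)` and silently makes "random" test data deterministic, is exactly what numpy's `Generator` API avoids. A short sketch of the difference (the seed values are arbitrary):

```python
import numpy as np

# Global-state style: whatever seed some earlier test set determines
# this "random" data, so unrelated tests can interfere with each other.
np.random.seed(0)
leaky = np.random.rand(3)

# Isolated style: a dedicated Generator owns its own state.
rng = np.random.default_rng(12345)
isolated = rng.random(3)

# Even if another test resets the global seed...
np.random.seed(999)
# ...a fresh Generator with the same seed reproduces the same stream.
again = np.random.default_rng(12345).random(3)
print(bool(np.allclose(isolated, again)))  # → True
```

Generating synthetic band data from a local `default_rng` instead of the module-level `np.random` functions would make such a test independent of the rest of the suite.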
Could you rebase, then the tests should run through |
I still haven't pushed the fix, give me 30 min or something like that hehe |
Unbelievably, this is finally ready for merge :) |
Sorry, as always I was pushing a change in

By the way, I remember now that I didn't change the naming of the modules. I don't know what to rename them to :( |
Some things that need to be cleaned in the commits (probably I'll do a squash to remove everything)
I think that is more or less it. |
Everything is now based on functions that adhere to functional programming. We combine these functions to create workflows and then use the sisl.nodes framework to provide the update functionality. Plotting backends are now each implemented in a single `Figure` class that implements generic drawing methods, as well as subplots, animations and other functionality.
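The backend design summarized above (one `Figure` class per backend, all sharing generic drawing primitives) can be illustrated with a toy sketch. Everything here is hypothetical: the method names, the `RecordingFigure` backend and `plot_bands` are stand-ins, not the PR's actual classes:

```python
from abc import ABC, abstractmethod

class Figure(ABC):
    """One class per plotting backend, all exposing the same
    generic drawing primitives (names are illustrative)."""
    @abstractmethod
    def draw_line(self, x, y, **style): ...

class RecordingFigure(Figure):
    """Toy 'backend' that just records the generic draw calls."""
    def __init__(self):
        self.calls = []
    def draw_line(self, x, y, **style):
        self.calls.append(("line", list(x), list(y), style))

def plot_bands(figure, bands):
    # Plot logic only talks to the generic API, so any backend works.
    for i, band in enumerate(bands):
        figure.draw_line(range(len(band)), band, name=f"band {i}")
    return figure

fig = plot_bands(RecordingFigure(), [[0.0, 1.0], [1.0, 2.0]])
print(len(fig.calls))  # → 2
```

Swapping `RecordingFigure` for, say, a plotly-backed class would reuse the exact same `plot_bands` logic, which is the point of pushing all backend specifics into one `Figure` subclass.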
Done, sorry. Shouldn't the |
Refactoring the viz module to a more modular design.
The discussion for this PR started in #446, but the final aim is very different from what I intended at the beginning of that PR, so I prefer to open a new one.
Basically, I refactored the module to have a more modular and functional design while keeping the functionalities that I thought were important. The refactoring is still not complete. I have refactored PDOS and bands as a proof of concept and wanted to know your opinion before carrying on.
As a first introduction and to display the differences, this is now the code of `BandsPlot`, which at a high level is much easier to understand from scratch than the previous design:

The main ideas are explained in this notebook (I include the html output so that you don't need to run it), because I thought it was much easier to explain in a notebook. I tried to not go into the details and explain the framework at a high level, so I expect it will be a fast read even if it seems a bit long.
Looking forward to hearing your thoughts whenever you have time!