pyiron_workflow-independent dict containing workflow info #576
Today, I had a discussion with @jan-janssen while jogging, and he sounded like he's actually interested in extending the universal representation, so I have now thought about it for quite some time, especially for macro nodes. With this in mind, a question for @jan-janssen, but also for @liamhuber: On this page @jan-janssen defined the universal representation of the following code:

```python
def add_x_and_y(x, y):
    z = x + y
    return x, y, z


def add_x_and_y_and_z(x, y, z):
    w = x + y + z
    return w

x, y, z = add_x_and_y(x=1, y=2)
w = add_x_and_y_and_z(x=x, y=y, z=z)
```

via the following dict (I defined a `workflow_dict` here for concreteness):

```python
workflow_dict = {
"nodes_dict": {
0: add_x_and_y,
1: add_x_and_y_and_z,
2: 1,
3: 2,
},
"edges_lst": [
{'target': 0, 'targetHandle': 'x', 'source': 2, 'sourceHandle': None},
{'target': 0, 'targetHandle': 'y', 'source': 3, 'sourceHandle': None},
{'target': 1, 'targetHandle': 'x', 'source': 0, 'sourceHandle': 'x'},
{'target': 1, 'targetHandle': 'y', 'source': 0, 'sourceHandle': 'y'},
{'target': 1, 'targetHandle': 'z', 'source': 0, 'sourceHandle': 'z'},
]
}
```

Now my suggestion: how about changing it to the following?

```python
nodes_dict = {
0: add_x_and_y,
1: add_x_and_y_and_z,
}
edges_lst = [
{'target': 0, 'targetHandle': 'x', 'source': "input", 'sourceHandle': "X", "value": 1},
{'target': 0, 'targetHandle': 'y', 'source': "input", 'sourceHandle': "Y", "value": 2},
{'target': 1, 'targetHandle': 'x', 'source': 0, 'sourceHandle': 'x'},
{'target': 1, 'targetHandle': 'y', 'source': 0, 'sourceHandle': 'y'},
{'target': 1, 'targetHandle': 'z', 'source': 0, 'sourceHandle': 'z'},
{'target': "output", 'targetHandle': "W", 'source': 1, 'sourceHandle': 'w'},
]
```

The reason is fairly simple. Let's say I have the following macro node:

```python
def all_at_once(X, Y):
    x, y, z = add_x_and_y(x=X, y=Y)
    w = add_x_and_y_and_z(x=x, y=y, z=z)
    W = w
    return W
```

Then I can define:

```python
macro_dict = {
"nodes_dict": {
0: add_x_and_y,
1: add_x_and_y_and_z,
},
"edges_lst": [
{'target': 0, 'targetHandle': 'x', 'source': "input", 'sourceHandle': "X", "value": 1},
{'target': 0, 'targetHandle': 'y', 'source': "input", 'sourceHandle': "Y", "value": 2},
{'target': 1, 'targetHandle': 'x', 'source': 0, 'sourceHandle': 'x'},
{'target': 1, 'targetHandle': 'y', 'source': 0, 'sourceHandle': 'y'},
{'target': 1, 'targetHandle': 'z', 'source': 0, 'sourceHandle': 'z'},
{'target': "output", 'targetHandle': "W", 'source': 1, 'sourceHandle': 'w'},
]
}
```

Now let's say we have the following workflow:

```python
w = all_at_once(X=1, Y=2)
x, y, z = add_x_and_y(x=w, y=3)
```

Then the new definition would allow us to write something like:

```python
workflow_dict = {
"nodes_dict": {
0: macro_dict,
1: add_x_and_y_and_z,
},
"edges_lst": [
{'target': 0, 'targetHandle': 'X', 'source': "input", 'sourceHandle': None, "value": 1},
{'target': 0, 'targetHandle': 'Y', 'source': "input", 'sourceHandle': None, "value": 2},
{'target': 1, 'targetHandle': 'y', 'source': "input", 'sourceHandle': None, "value": 3},
{'target': 1, 'targetHandle': 'x', 'source': 0, 'sourceHandle': 'W'},
{'target': "output", 'targetHandle': None, 'source': 1, 'sourceHandle': 'x'},
{'target': "output", 'targetHandle': None, 'source': 1, 'sourceHandle': 'y'},
{'target': "output", 'targetHandle': None, 'source': 1, 'sourceHandle': 'z'},
]
}
```

From my point of view, this conceptually incorporates macro nodes. Besides, after the execution ...

---
I forgot to mention: if we replace the enumeration in the `nodes_dict` keys with the node labels, then with this representation we can extract all the information that I listed above except for the type hints, which we can probably also optionally return.
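For illustration, a sketch of what that could look like (the labels "add_xy"/"add_xyz" are invented here, not part of any agreed spec):

```python
# Same workflow as above, but with node labels instead of integer keys.
workflow_dict = {
    "nodes_dict": {
        "add_xy": add_x_and_y,
        "add_xyz": add_x_and_y_and_z,
    },
    "edges_lst": [
        {"target": "add_xy", "targetHandle": "x", "source": "input", "sourceHandle": "X", "value": 1},
        {"target": "add_xy", "targetHandle": "y", "source": "input", "sourceHandle": "Y", "value": 2},
        {"target": "add_xyz", "targetHandle": "x", "source": "add_xy", "sourceHandle": "x"},
        {"target": "add_xyz", "targetHandle": "y", "source": "add_xy", "sourceHandle": "y"},
        {"target": "add_xyz", "targetHandle": "z", "source": "add_xy", "sourceHandle": "z"},
        {"target": "output", "targetHandle": "W", "source": "add_xyz", "sourceHandle": "w"},
    ],
}
```

---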
I like the idea of an output handle, as previously it was not directly clear which property the user is interested in. In terms of nesting the macro dictionary inside the node dictionary, I am not so happy, as I initially wanted to separate the nodes / Python functions from the edges / logic; with the nested structure both get merged. Another suggestion would be:

```python
workflow_dict = {
"mymacro" : {
"nodes_dict": {
0: add_x_and_y,
1: add_x_and_y_and_z,
},
"edges_lst": [
{'target': 0, 'targetHandle': 'x', 'source': "input", 'sourceHandle': "X", "value": 1},
{'target': 0, 'targetHandle': 'y', 'source': "input", 'sourceHandle': "Y", "value": 2},
{'target': 1, 'targetHandle': 'x', 'source': 0, 'sourceHandle': 'x'},
{'target': 1, 'targetHandle': 'y', 'source': 0, 'sourceHandle': 'y'},
{'target': 1, 'targetHandle': 'z', 'source': 0, 'sourceHandle': 'z'},
{'target': "output", 'targetHandle': "W", 'source': 1, 'sourceHandle': 'w'},
]
},
"main": {
"nodes_dict": {
0: "mymacro",
1: add_x_and_y_and_z,
},
"edges_lst": [
{'target': 0, 'targetHandle': 'X', 'source': "input", 'sourceHandle': None, "value": 1},
{'target': 0, 'targetHandle': 'Y', 'source': "input", 'sourceHandle': None, "value": 2},
{'target': 1, 'targetHandle': 'y', 'source': "input", 'sourceHandle': None, "value": 3},
{'target': 1, 'targetHandle': 'x', 'source': 0, 'sourceHandle': 'W'},
{'target': "output", 'targetHandle': None, 'source': 1, 'sourceHandle': 'x'},
{'target': "output", 'targetHandle': None, 'source': 1, 'sourceHandle': 'y'},
{'target': "output", 'targetHandle': None, 'source': 1, 'sourceHandle': 'z'},
]
},
}
```

---
A comment on the use of values in the edges: I feel it is dangerous, as the user might not realize that two values are related to each other. Let's say I hard-code the lattice constant for aluminium in my workflow. When I use it in multiple structure generators, I might define the same value in multiple edges. For somebody using my workflow it is not clear how these values are related.
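A minimal sketch of the concern (the node labels and the 4.05 value are invented for illustration):

```python
# The same hard-coded lattice constant appears in two edges; nothing in the
# representation says that these two 4.05 values are meant to be the same quantity.
edges_lst = [
    {"target": "fcc_bulk", "targetHandle": "a", "source": "input", "sourceHandle": None, "value": 4.05},
    {"target": "fcc_surface", "targetHandle": "a", "source": "input", "sourceHandle": None, "value": 4.05},
]
```

---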
At a high level I have three pieces:

---
Given

```python
# Function node
def add_x_and_y(x, y):
    z = x + y
    return x, y, z


# Function Node
def add_x_and_y_and_z(x, y, z):
    w = x + y + z
    return w


# Macro node
def all_at_once(X, Y):
    x, y, z = add_x_and_y(x=X, y=Y)
    w = add_x_and_y_and_z(x=x, y=y, z=z)
    W = w
    return W

# Workflow
all_at_once(1, 2)
```

I would imagine a graph representation like this:

```python
my_first_node = {
"function": add_x_and_y,
"inputs": ("x", "y",), # These can be automatically scraped
"outputs": ("x", "y", "z"), # This can be scraped in pyiron_workflow
# but not in general -- nevertheless we need some way of identifying outputs,
# so I guess some meta data like this dictionary is necessary to accompany functions
}
my_second_node = {
"function": add_x_and_y_and_z,
"inputs": ("x", "y",),
"outputs": ("w",),
}
my_macro = {
"inputs": ("X", "Y",),
"outputs": ("W",),
"nodes": { # Label to callable
"xy": my_first_node,
"xyz": my_second_node,
},
"edges": (
("inputs.X", "xy.inputs.x"),
("inputs.Y", "xy.inputs.y"),
("xy.outputs.x", "xyz.inputs.x"),
("xy.outputs.y", "xyz.inputs.y"),
("xy.outputs.z", "xyz.inputs.z"),
("xy.outputs.w", "outputs.W"),
)
}
workflow = {
"inputs": ("X", "Y",),
"outputs": ("W",),
"nodes": {"m": my_macro},
"edges": (
("inputs.X", "m.inputs.X"),
("inputs.Y", "m.inputs.Y"),
("m.outputs.W", "outputs.W"),
)
}
```

Note that the IO is always explicitly defined, and thus macros are able to indicate how their IO communicates with their children, but there is no direct connection between a macro's siblings and its children. (Maybe this is already the case in @jan-janssen's example! I am just having a bit of trouble grokking it.) Then notice that here even the functions are required to have an explicit "graphy" representation dictionary. For instance, in

```python
my_first_node = {
"function": add_x_and_y,
"inputs": ("x", "y",),
"outputs": ("tup"), # Now the whole return statement is parsed as a single tuple
# Of course this flexiblity only goes so far -- we couldn't pull out "(x, y), z"
# It's all together or all apart
}
```

Ok, and finally note that we can imagine extending such a terminology to include annotations:

```python
"inputs": (("x", None), ("y", u(int, units="meters"))),
```

or

```python
"inputs": {
"x": None,
"y": u(int, units="meters"),
}
```

So coming back to a high level after a bit more percolation: I like where you guys are going with this. Excluding the fact that I've dropped values, I think we've got basically all the pieces you mentioned in your first post, @samwaseda. I think my favourite thing so far is my example above + the (label, annotation) dictionary for inputs and outputs. Until I wrote up this example, I hadn't really thought about it in this generic workflow language context, but now I see we really do need to constrain the bottom-most graph callables -- i.e. the root-most functions. We certainly can't graph-ize just anything; e.g., functions with multiple return branches with different numbers/natures of items would totally sink this approach, so we'd better constrain them.

---
The whole idea is to represent a workflow that a user has run or is going to run; that means there is no way to tell whether two values are supposed to be the same or are only accidentally the same, i.e. we could think about:

```python
output_1 = my_function(structure_1)
output_2 = my_function(structure_2)
output_3 = my_function(structure_2)
```

Let's assume

```python
structure = structure_generator(some_input)
output_1 = my_function(structure_1)
output_2 = my_function(structure)
output_3 = my_function(structure)
```

and include ...

I can live with @liamhuber's representation (not to mention also @jan-janssen's), except for the fact that I would like to have a straightforward extension to include values, i.e. instead of:

```python
workflow = {
"inputs": ("X", "Y",),
"outputs": ("W",),
"nodes": {"m": my_macro},
"edges": (
("inputs.X", "m.inputs.X"),
("inputs.Y", "m.inputs.Y"),
("m.outputs.W", "outputs.W"),
)
}
```

I would suggest:

```python
workflow = {
"inputs": ("X", "Y",),
"outputs": ("W",),
"nodes": {"m": my_macro},
"edges": (
{"input": "inputs.X", "output": "m.inputs.X"},
{"input": "inputs.Y", "output": "m.inputs.Y"},
{"input": "m.outputs.W", "output": "outputs.W"},
)
}
```

This being said, it might be slightly more difficult for @jan-janssen to accept this representation, because it would involve more parsing when it's read by the workflow manager. That being said, our suggestions are not so different from each other, so I'm gonna make an example for @liamhuber's one for now and open a PR here.

---
Sorry, actually, since one dictionary should be returned, the representation should look like this:

```python
result = {
"my_first_node": {
"function": add_x_and_y,
"inputs": ("x", "y",), # These can be automatically scraped
"outputs": ("x", "y", "z"), # This can be scraped in pyiron_workflow
# but not in general -- nevertheless we need some way of identifying outputs,
# so I guess some meta data like this dictionary is necessary to accompany functions
},
"my_second_node": {
"function": add_x_and_y_and_z,
"inputs": ("x", "y",),
"outputs": ("w",),
},
"my_macro": {
"inputs": ("X", "Y",),
"outputs": ("W",),
"edges": (
{"input": "inputs.X", "output": "my_first_node.inputs.x"},
{"input": "inputs.Y", "output": "my_first_node.inputs.y"},
{"input": "my_first_node.outputs.x", "output": "my_second_node.inputs.x"},
{"input": "my_first_node.outputs.y", "output": "my_second_node.inputs.y"},
{"input": "my_first_node.outputs.z", "output": "my_second_node.inputs.z"},
{"input": "my_second_node.outputs.w", "output": "outputs.W"},
)
},
"workflow": {
"inputs": ("X", "Y",),
"outputs": ("W",),
"edges": (
{"input": "inputs.X", "output": "my_macro.inputs.X"},
{"input": "inputs.Y", "output": "my_macro.inputs.Y"},
{"input": "my_macro.outputs.W", "output": "outputs.W"},
)
}
}
```

I also removed the `nodes` field.

---
About to go to sleep, but quickly: I think you probably still want a nodes field, otherwise it's purely implicit which node belongs to which macro/workflow, accessible only by parsing the edges. It might be fine not to require it when specifying the dictionary, but when reading the dictionary I'd still like to see it.
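As a hedged sketch (reusing the labels from the example above), re-adding an explicit nodes field might look like this:

```python
# "my_macro" entry with an explicit "nodes" field, so membership is not implicit in the edges.
my_macro = {
    "nodes": ("my_first_node", "my_second_node"),
    "inputs": ("X", "Y"),
    "outputs": ("W",),
    "edges": (
        {"input": "inputs.X", "output": "my_first_node.inputs.x"},
        {"input": "inputs.Y", "output": "my_first_node.inputs.y"},
        {"input": "my_first_node.outputs.x", "output": "my_second_node.inputs.x"},
        {"input": "my_first_node.outputs.y", "output": "my_second_node.inputs.y"},
        {"input": "my_first_node.outputs.z", "output": "my_second_node.inputs.z"},
        {"input": "my_second_node.outputs.w", "output": "outputs.W"},
    ),
}
```

---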
At least for the general workflow definition, I do not expect anybody to write this graph directly. Rather, I expect them to use one of the existing workflow frameworks and export the graph in this format. The advantage is that the current workflow frameworks all have a way to declare the expected output, even though they might not agree on doing it in the same way.

---
That's true... One possibility is to merge the ideas of you two: I changed @jan-janssen's version so that node and variable are combined:

```python
workflow_dict = {
"mymacro" : {
"nodes_dict": {
"xy": add_x_and_y,
"xyz": add_x_and_y_and_z,
},
"edges_lst": [
{'target': 'xy.inputs.x', 'source': "input.X", "value": 1},
{'target': 'xy.inputs.y', 'source': "input.Y", "value": 2},
{'target': 'xyz.inputs.x', 'source': 'xy.outputs.x'},
{'target': 'xyz.inputs.y', 'source': 'xy.outputs.y'},
{'target': 'xyz.inputs.z', 'source': 'xy.outputs.z'},
{'target': "output.W", 'source': 'xyz.outputs.w'},
]
},
"main": {
"nodes_dict": {
"mymacro": "mymacro",
"xyz": add_x_and_y_and_z,
},
"edges_lst": [
{'target': 'mymacro.inputs.X', 'source': "input", "value": 1},
{'target': 'mymacro.inputs.Y', 'source': "input", "value": 2},
{'target': 'xyz.inputs.y', 'source': "input", "value": 3},
{'target': 'xyz.inputs.x', 'source': 'mymacro.outputs.W'},
{'target': "output", 'source': 'xyz.outputs.x'},
{'target': "output", 'source': 'xyz.outputs.y'},
{'target': "output", 'source': 'xyz.outputs.z'},
]
},
}
```

And I claim this is closer to what pyiron_workflow does than your representation, because in your representation the combination of ...

---
I am not bound to a specific notation, and for me the two are equivalent: `{'target': 'xyz', 'targetHandle': 'x', 'source': 'xy', 'sourceHandle': 'x'}` and `{'target': 'xyz.inputs.x', 'source': 'xy.outputs.x'}`, so from that side I am fine with either solution. The part where I am still not sure is where to store the values. I could live with using the "value" field of the edges for optional storage of outputs, but at least for the default input values and the parameters the user is able to change, I would prefer those to be nodes.
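A sketch of what values-as-nodes could look like in the combined label-based notation (the "x_value"/"y_value" labels are invented for illustration):

```python
# User-settable parameters are themselves nodes, so changing a value means editing one
# node entry instead of hunting for it across several edge dictionaries.
nodes_dict = {
    "x_value": 1,
    "y_value": 2,
    "xy": add_x_and_y,
    "xyz": add_x_and_y_and_z,
}
edges_lst = [
    {"target": "xy.inputs.x", "source": "x_value"},
    {"target": "xy.inputs.y", "source": "y_value"},
    {"target": "xyz.inputs.x", "source": "xy.outputs.x"},
    {"target": "xyz.inputs.y", "source": "xy.outputs.y"},
    {"target": "xyz.inputs.z", "source": "xy.outputs.z"},
]
```

---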
I'm not sure if I'm understanding the problem correctly. I was thinking about writing a function like ...

---
Does it make sense to have the outside input and output variables specifically marked? I mean, when I search for a workflow which calculates a bulk modulus, I do not really want to parse all the edges to see if the bulk modulus is an output edge.
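For concreteness, a toy sketch of what "specifically marked" outputs would enable (the "bulk_modulus" label is only illustrative):

```python
def provides_output(workflow_dict, label):
    # With an explicit "outputs" field, finding a quantity of interest
    # does not require walking the edge list.
    return label in workflow_dict.get("outputs", ())


provides_output({"inputs": ("structure",), "outputs": ("bulk_modulus",), "edges": ()}, "bulk_modulus")  # True
```

---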
I guess that's definitely an extension that we can think of, but I thought the universal representation was the minimum representation needed to communicate with other nodes, and as long as there is a possibility to extract the data, we don't want to impose more requirements on the workflow managers. In other words, doesn't it make sense to agree on a minimum representation and run everything else outside?

---
@liamhuber How do I know that there's a connection between ... ?

```python
@Workflow.wrap.as_macro_node
def all_at_once(macro, X, Y):
    macro.x, macro.y, macro.z = add_x_and_y(x=X, y=Y)
    macro.w = add_x_and_y_and_z(x=x, y=y, z=z)
    W = macro.w
    return W
my_macro = all_at_once()
```

---
Sounds good to me. At any rate, I think we need to include such a function-input-output representation at this dictionary level. For instance, @samwaseda's immediately following example here is, as far as I can tell, incomplete and uncomputable, because the generic representation has no way to know that ...

---
I think this comes naturally from my last example, where "inputs" and "outputs" are explicit fields of each abstract object. In my example it was just a tuple of string names, but these fields could hold pairs of (name, annotation) or triples of (name, annotation, value). Not that these need to be triples; we could as well have ...

---
Ah yeah I see that I actually totally ignored your inline comment XD

---
So first off, you can't run that. The example is incomplete anyhow, but all children of macros need to be nodes, so I'm going to operate on the assumption that ...

Here's an operational example that I think is equivalent:

```python
import pyiron_workflow as pwf
@pwf.as_function_node
def add_x_and_y(x, y):
    z = x + y
    return x, y, z


@pwf.as_function_node
def add_x_and_y_and_z(x, y, z):
    w = x + y + z
    return w


@pwf.as_macro_node
def all_at_once(macro, X, Y):
    macro.xy = add_x_and_y(x=X, y=Y)
    macro.xyz = add_x_and_y_and_z(
        x=macro.xy.outputs.x,
        y=macro.xy.outputs.y,
        z=macro.xy.outputs.z,
    )
    W = macro.xyz
    return W

print(all_at_once.preview_io())
my_macro = all_at_once()
my_macro(1, 2)
```

We can then return to the question "How do I know that there's a connection between ...?" The answer is that at the class level, you don't. You know that ...
But at the end of the day, at the class level the macro is basically taking our word for it that we've defined the IO of the macro in a valid way. Once we move to the instance, we'll get errors when we try to create the macro instance if we've screwed things up. For instance, if we try ...

Now, finally, we get to your answer: we demand that everything coming out of the macro be channel data, because the role of a macro is to execute a subgraph; the macro's IO is merely an interface to the IO of this subgraph. We handle the relationship between the macro IO and its child IO with "value receiver" pairs. Unlike a connection, which needs to form between (input/output)-(output/input), there is only ever one value receiver and it forms (input/output)-(input/output). All it does is keep the ... Thus, in the running example above:

```python
my_macro.xyz.outputs.w.value_receiver is my_macro.outputs.W
# >>> True
my_macro.xyz.value_receiver is my_macro.outputs.W  # exploiting single-output property
# >>> True
my_macro.outputs.W in my_macro.xyz.connections
# >>> False
# Obviously, they are both outputs!
```

It would be super cool to know more of this stuff at the class level, but I don't currently see a way how. At any rate, for the generic workflow representation, I still think we don't need to know that these two have a connection; a macro is just like any other function, and the workflow only cares (and only should care) what its IO is. Then, in the macro's own generic definition, the macro knows how its IO maps to the IO of its subgraph (again, because we need explicit "input" and "output" fields in our generic workflow description!). Now the big caveat -- what about ontological information? Ideally, we want the macro output channels to somehow be able to read their subgraph and make sure that they assume the same ontological (and, for that matter, type) information as the child channels to which they are value-coupled. Right now, I don't even do this for types. I do think it's possible at the instance level, but I don't yet see how to guarantee it at the class level. So for now I would suggest that we simply demand that users go to the trouble of explicitly annotating their workflows, and that we go in and add the ...

---
Now it is my turn to feel like a Russian novelist 😝

---

Yeah and I feel like I'm the one who's drunk XD

---

wowowowow now I understand the punchline of this story. It feels like you actually declared both Raskolnikov and Sonya dead in the epilogue XD

---

But then I don't really get it: when you draw the graph, it looks like the inputs and the outputs are correctly connected. How are these arrows drawn?

---
When who draws it? When drawing a graph, the data edges for "connections" are always shown between ports of sibling nodes. It would be perfectly reasonable to then open up and resolve a macro in the same image and show lines from its input channels to the input channels of its children, and from the output channels of its children to its output channels, but (a) these lines, while meaningful, aren't true data connection edges, and (b) when we define a macro as we do currently -- by decorating some function -- we can't inspect the innards like this until it has been instantiated. Right now only the IO of the macro is machine-verifiable at the class level. I'd love to change (b), and maybe we can make some headway by making the graph constructor into a graph-constructor recipe that is available as a class method... I think such an attack should be doable, but it would take some effort and I don't know how easy it would be to pull off while still maintaining the current user-ease for defining macros. So, like, I'm open to it, but it will cost time at a minimum and likely some UX penalty.

---
Actually, talking through it, I'd really like to change (b). It's going to take some thinking about how not to make it too much more verbose for users, though.

---
ok actually it looks like I misunderstood something. Let me reformulate my question: how can I know, in the example above, that ...

---
How can you know as a human, or how does the computer know? As a human you just read the macro definition and know that return values are getting mapped to outputs, and that value receivers are the functionality we use to handle that mapping. As a computer, so far we only care about knowing at subgraph creation time (aka macro instantiation time). For the technicalities of building the mapping it's best to just go read the macro class. As a computer post facto, yes, you could iterate over macro inputs and all the child outputs and search for their value receivers (all the macro inputs will have one; most of the macro child outputs usually won't, but the fraction depends totally on your graph).

---
Ah, unless you meant my common workflow definition. There I stuck it in the edges, despite the fact I keep asserting it isn't one: ...

I also had a typo there; that should be "xyz".

---
ah now I finally got it. I don't know how I got stuck, but I was stuck for some weird reasoning not even worth explaining. I'm gonna work on the prototype and will present it probably today.

---
Now my next suggestion:

```python
workflow_dict = {
"mymacro": {
"nodes_dict": {
"xy": add_x_and_y,
"xyz": add_x_and_y_and_z,
},
"inputs": ("X", "Y"),
"outputs": ("W", ),
"edges_lst": [
{'target': 'xy.inputs.x', 'source': "input.X", "value": 1},
{'target': 'xy.inputs.y', 'source': "input.Y", "value": 2},
{'target': 'xyz.inputs.x', 'source': 'xy.outputs.x'},
{'target': 'xyz.inputs.y', 'source': 'xy.outputs.y'},
{'target': 'xyz.inputs.z', 'source': 'xy.outputs.z'},
{'target': "output.W", 'source': 'xyz.outputs.w'},
]
},
"main": {
"nodes_dict": {
"mymacro": "mymacro",
"xyz": add_x_and_y_and_z,
},
"edges_lst": [
{'target': 'mymacro.inputs.X', 'source': "input", "value": 1},
{'target': 'mymacro.inputs.Y', 'source': "input", "value": 2},
{'target': 'xyz.inputs.y', 'source': "input", "value": 3},
{'target': 'xyz.inputs.x', 'source': 'mymacro.outputs.W'},
{'target': "output", 'source': 'xyz.outputs.x'},
{'target': "output", 'source': 'xyz.outputs.y'},
{'target': "output", 'source': 'xyz.outputs.z'},
]
},
}
```

I'm still reluctant to put input and output arguments in the main workflow definition, because ...

---
Yeah, by and large I'm happy with this. I still think it's a technical criticality to wrap the underlying functions -- again, in this example I don't see a way for the machine to parse ...

```python
"xy": {
"function": add_x_and_y,
"inputs": ("x", "y"),
"outputs": ("x", "y", "z"),
}
```
At a high level, I will still push for these to be included. First, because I don't see a reason for "macros" to have a fundamentally different syntax from "workflows": we need IO for macros, so I think it improves consistency and readability to give it to workflows too. Second, I will obviously promote ... Very specifically, I'm totally content with ...

As a nit, what do you guys think of "from" and "to" instead of "source" and "target"? I also think we can truncate "nodes_dict" down to "nodes", and truncate and then specify "edges_lst" to "data_edges".

Finally, I was trying to think whether there's a way to express the I-I/O-O value coupling by exploiting the "value" field of the edge dictionary. I didn't manage to find a way that I both liked more and that carried the right logical information for the machine, but I want to bring it up in case you guys can see a more clever way to attack along that route than I do. It would be nice to have only "true" edges in the from-to edge data, and to have some really clear indication when we're supplying information for a "false" edge.
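For concreteness, a sketch of those renamings applied to a fragment of the running example (nothing here is settled, just the nit spelled out):

```python
# "nodes_dict"/"edges_lst" renamed to "nodes"/"data_edges"; "source"/"target" renamed to "from"/"to".
workflow = {
    "nodes": {
        "xy": add_x_and_y,
        "xyz": add_x_and_y_and_z,
    },
    "data_edges": (
        {"from": "xy.outputs.x", "to": "xyz.inputs.x"},
        {"from": "xy.outputs.y", "to": "xyz.inputs.y"},
        {"from": "xy.outputs.z", "to": "xyz.inputs.z"},
    ),
}
```

---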
Ok, I slept on it and some more ideas came.

First, regarding where to store values: they can live either on the inputs/outputs, or on the edges themselves. There is no difference between these in the quality of information we store, but they differ in readability and memory efficiency; storing on edges, the value gets repeated n_connections times, storing on the I/O channels, it gets repeated n_connections+1 times -- i.e. the worst case (and most typical case) is that IO storage is 2x worse. For now, let's optimize for human readability, and we can always convert our representation to/from a more efficient machine-storage representation as needed.

Next, if we are storing values on the IO itself, then the edges only need to be string references to the scoped IO labels. I realized that we can nicely distinguish "true" data edges from "fake" value-synchronizing edges by the fact that the former must be directional (output -> input) whereas the latter, in principle, is a non-directional property. Of course, in practice input-input value synchronizations always see the outer-scope value get updated and this propagates to the inner scope, and vice-versa for output-output value synchronizations, but the result is that the values are always synchronized. Thus, they can be easily distinguished by an ordered tuple for a true data edge vs. an unordered set for a value synchronization.

Finally, I extended the underlying example functions a bit to demonstrate how important the input and output fields are:

```python
from semantikon.typing import u
# Function node
def add_x_and_y(x, y):
    z = x + y
    return x, y, z


# Function Node
def add_x_and_y_and_z(x, y=4, z: u(int, "meters") = 5) -> int:
    w = x + y + z
    return w


# Macro node
def all_at_once(X, Y: int):
    x, y, z = add_x_and_y(x=X, y=Y)
    w = add_x_and_y_and_z(x=x, y=y, z=z)
    W = w
    return W

# Workflow
all_at_once(1, 2)
all_at_once_dict = {
"inputs": {
"mymacro__X": {"value": 1},
"mymacro__Y": {"value": 2},
},
"outputs": {"mymacro__W": {"value": 6}},
"nodes": {
"mymacro": {
"inputs": {
"X": {"value": 1},
"Y": {
"value": 2,
"annotation": int,
},
},
"outputs": {"W": {"value": 6}},
"nodes": {
"xy": {
"function": add_x_and_y,
"inputs": {
"x": {"value": 1},
"y": {"value": 2},
},
"outputs": {
"x": {"value": 1},
"y": {"value": 2 },
"z": {"value": 3},
}
},
"xyz": {
"function": add_x_and_y_and_z,
"inputs": {
"x": {"value": 1},
"y": {
"value": 2,
"default": 4,
},
"z": {
"value": 3,
"default": 5,
"annotation": u(int, "meters"),
},
},
"outputs": {"w": {"value": 6, "annotation": int}},
},
},
"data_edges": (
{"inputs.X", "xy.inputs.X"},
{"inputs.Y", "xy.inputs.Y"},
("xy.outputs.x", "xyz.inputs.x"),
("xy.outputs.y", "xyz.inputs.y"),
("xy.outputs.z", "xyz.inputs.z"),
{"xyz.outputs.w", "outputs.W"},
),
},
},
"data_edges": (
{"inputs.mymacro__X", "mymacro.inputs.X"},
{"inputs.mymacro__Y", "mymacro.inputs.Y"},
{"mymacro.outputs.W", "outputs.mymacro__W"},
),
}
```

I think this gives us the power to capture everything we want, and (for me at least...) it is still pretty readable given the density of information it contains. One thing I really like is that now the only difference between the "recipe" class-view and the post-run instance view is that the ...

A big change here is that everything has been defined in a nested way. This is fine, and probably preferable, when we don't reuse nodes. I could imagine breaking these into reusable parts by simply having the node/macro dictionaries defined independently in the same scope as the macro/workflow is defined. E.g.,

```python
xyz_dict = {
"function": add_x_and_y_and_z,
"inputs": {
"x": {},
"y": {"default": 4},
"z": {
"default": 5,
"annotation": u(int, "meters"),
},
}
}
all_at_once_dict = {
...
"xyz": xyz_dict,
...
}
```

Since this is now reusable, note that I've purged the values. This works beautifully for the "recipe" view, but for the instance view, I guess we'd again want to convert the dictionary to a more verbose nested view so that each occurrence of ...
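As a sketch of that recipe-to-instance conversion (the instantiate helper and its value-addressing scheme are assumptions, not an agreed API):

```python
import copy


def instantiate(recipe, values=None):
    """Deep-copy a reusable recipe dict so each occurrence can carry its own values."""
    instance = copy.deepcopy(recipe)
    for port, value in (values or {}).items():
        # e.g. port == "inputs.x" writes instance["inputs"]["x"]["value"]
        section, label = port.split(".", 1)
        instance[section].setdefault(label, {})["value"] = value
    return instance


xyz_instance = instantiate(xyz_dict, {"inputs.x": 1, "inputs.y": 2, "inputs.z": 3})
```

---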
Now I also slept on it, and I guess we can agree on your final version. Here, I try to formulate the specs:

Requirements

Optional fields

Footnotes

---
I am in favor of an optional ...

---
I support this idea, but I find it also somewhat inconvenient that the label is not included in the dictionary, because I would need it in the knowledge graph.

---
You don't mention how we pass information from the scope of one graph to another graph (i.e. where I used sets earlier). Even if we don't go for the set notation, this layer of information passing should be explicitly identified.
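One way to make that layer machine-identifiable, following the tuple-vs-set convention sketched above (the function name is just illustrative):

```python
def split_edges(data_edges):
    """Separate directional data edges (tuples) from non-directional value couplings (sets)."""
    true_edges, value_couplings = [], []
    for edge in data_edges:
        if isinstance(edge, set):
            value_couplings.append(edge)  # e.g. {"inputs.X", "xy.inputs.x"}
        else:
            true_edges.append(edge)       # e.g. ("xy.outputs.x", "xyz.inputs.x")
    return true_edges, value_couplings
```

---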
Can you clarify what you mean by "label"? From my reading the label is ...

In principle, we could add a "label" field:

```python
xyz_dict = {
"label": "xyz_dict",
"function": add_x_and_y_and_z,
"inputs": {
"x": {},
"y": {"default": 4},
"z": {
"default": 5,
"annotation": u(int, "meters"),
},
}
}
```

But at this stage it would really be a "recipe label", and IMO would anyhow get overridden to the "instance label" of ...

---
With ...

---
Note: I will work on this myself. You can just tell me what you think @liamhuber

Currently, we have `graph_as_dict`, which returns a dict containing all the information about the workflow. However, when we try to make `semantikon` work with any workflow manager, the most natural thing for me is to ask every workflow manager to provide a dictionary that does not require `semantikon` to depend on the workflow manager. In addition, that would be consistent with the effort that @jan-janssen is making here.

I was initially thinking we should extend @jan-janssen's universal representation of workflows by adding optional args required by `semantikon`, and potentially also allowing `nodes_dict` to contain sub-nodes to represent also the `Macro` nodes. However, I started realizing that this might not be the right way, because: ...

So my first suggestion is to separate the `semantikon`-dict from the universal representation dict, in order to represent the workflow including the scientific logic.

Coming to the actual content, `semantikon` would need: ...

The type hints are optional, because `semantikon` can parse them, but that could be a bit dangerous because it might fail to assign the types to the IO labels. The reason why we need the original Python function is that since this PR the function metadata is attached to `f._semantikon_metadata`, which I would not ask the workflow manager to provide.

Concerning the `Macro` nodes, I was testing the following code: ...

Then I realized that `wf.graph_as_dict` explicitly does not provide the connection between `wf.three.inputs.c` and `wf.three.one.inputs.a`, as well as between `wf.three.two.outputs.result` and `wf.three.outputs.result`. They may not be edges in the sense of workflow edges, but they still somehow have to be included in the dict.

Well, thanks for reading, but actually I'm probably the one who profited from it the most, as it helped me figure out what I actually want XD
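One way to record those couplings, borrowing the non-directional set notation from the discussion above, would be a sketch like this (channel names taken from the example in this paragraph):

```python
# Non-directional value couplings across the macro boundary, kept separate from true data edges.
value_couplings = (
    {"three.inputs.c", "three.one.inputs.a"},
    {"three.two.outputs.result", "three.outputs.result"},
)
```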