How to semantically describe a workflow #86
Comments
Only very shallow feedback, but: I feel like
is indeed redundant: it is an instance-level feature that gets captured (and indeed should and needs to be captured) at the level of the graph connection, i.e. inferred from a combination of semantic typing and graph connections. What I see missing is rather the connection across the node body, something like
This is pretty straightforward, but the idea gets more complicated to me if we start considering something like, e.g., a vacancy formation energy, which is somehow a property of multiple structures. I would also recommend that we push for examples requiring transitive behaviour, as this is going to be the hard part. This means examples of at least (and preferably at most, for now) three nodes.
I would also note that the existing implementation for transitive features is actually still missing something: we can link an output to upstream o-types with the transitive capability, but in the event of multiple outputs, I don't think we currently have the infrastructure to link a specific output to a specific upstream input. The next-generation solution should offer this level of control.
I should have clearly stated my intention in my first post above (To be honest I was just doodling because I didn't really know how to proceed). In this post, we talked about how to connect data using triples, which the case you mentioned here belongs to. The reason why I posted this issue was because I started having the feeling that the knowledge graph should know the full workflow automatically, maybe in the form of
At this point, there's only a workflow connection between inputs and outputs, like between
Here's maybe a more important question: Why am I doing this? Well, I'm still exploring a practical case, where the user has the possibility to look up something scientifically meaningful. A simple example is something like "What's the vacancy formation energy of a given element?". And a simple way of looking it up is to see whether a workflow has
In the meantime, there's a prototype that I implemented in this PR, that should already do some work. The example is the same as in this issue.

```python
from rdflib import Namespace, Graph
from pyiron_ontology.parser import get_inputs_and_outputs, get_triples
from pyiron_workflow import Workflow
from semantikon.typing import u
from ase import Atoms, build

EX = Namespace("http://example.org/")


@Workflow.wrap.as_function_node
def create_structure(
    element: u(str, triple=(EX["IsElementOf"], "outputs.output_structure"), uri=EX["element"])
) -> u(Atoms, uri=EX["ComputationalSample"]):
    output_structure = build.bulk(element, cubic=True)
    return output_structure


@Workflow.wrap.as_function_node
def get_volume(
    input_structure: u(Atoms, uri=EX["ComputationalSample"])
) -> u(float, units="angstrom**3", triple=(EX["IsCalculatedPropertyOf"], "inputs.input_structure"), uri=EX["volume"]):
    volume = input_structure.get_volume()
    return volume


wf = Workflow("my_workflow")
wf.my_structure = create_structure(element="Al")
wf.my_volume = get_volume(input_structure=wf.my_structure)
wf.run()
```

And then the knowledge graph can be retrieved via:

```python
graph = Graph()
for key, value in wf.children.items():
    data = get_inputs_and_outputs(value)
    graph += get_triples(data, EX)
```

This time I'm not gonna show the diagram anymore because it's a Persian bazaar now. The important point is that now I included workflow nodes in the knowledge graph. I also switched to @liamhuber's notation (i.e.
I talked a lot with ChatGPT over Christmas about how to semantically describe workflow steps, because in this issue I didn't address anything about nodes. I sort of came to the conclusion that what makes the most sense is to define triples for inputs and outputs for workflow nodes. Let's take the same example, where the workflow consists of two steps: structure creation for a given element, and the calculation of its energy:
My suggestion is to define:
create_structure - hasInput - create_structure.input.element
create_structure - hasOutput - create_structure.output.structure
calculate_energy - hasInput - calculate_energy.input.structure
calculate_energy - hasOutput - calculate_energy.output.energy
create_structure.output.structure - equalTo - calculate_energy.input.structure
And I think there should be some marking of the fact that `element` is the global input and `energy` is the global output. On top of this, we can obviously also append all the ontological information that I've talked about in this issue.

This being said, all the input/output definitions + `equalTo` look extremely redundant. I guess I'm gonna try to make a prototype in the coming days, but I would appreciate it if you could leave a comment if you guys have an idea.
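One possible way to get the global input/output marking without extra annotations is to derive it from the triples themselves: an input that is not the target of any `equalTo` must come from outside the workflow, and an output that is not the source of any `equalTo` is not consumed by another node. The sketch below uses plain Python tuples and the predicate names from the proposal above. It rests on the assumption that every internal connection is expressed as an `equalTo` triple, and it would miss an intermediate output that the user also cares about as a final result.

```python
# Triples from the proposal above, written as plain (subject, predicate, object) tuples.
triples = {
    ("create_structure", "hasInput", "create_structure.input.element"),
    ("create_structure", "hasOutput", "create_structure.output.structure"),
    ("calculate_energy", "hasInput", "calculate_energy.input.structure"),
    ("calculate_energy", "hasOutput", "calculate_energy.output.energy"),
    ("create_structure.output.structure", "equalTo", "calculate_energy.input.structure"),
}

inputs = {o for _, p, o in triples if p == "hasInput"}
outputs = {o for _, p, o in triples if p == "hasOutput"}

# Ports that take part in an equalTo connection are internal to the workflow.
fed_inputs = {o for _, p, o in triples if p == "equalTo"}
consumed_outputs = {s for s, p, _ in triples if p == "equalTo"}

global_inputs = inputs - fed_inputs
global_outputs = outputs - consumed_outputs

print(global_inputs)
print(global_outputs)
```

For the two-step example this leaves `create_structure.input.element` as the only global input and `calculate_energy.output.energy` as the only global output, matching the marking asked for above.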