Specs for the ontology group #85
This time (in contrast to my previous attempts), I strictly followed the text above and wrote a prototype. It is again a workflow that calculates the volume of a chemical element in two steps: element -> structure -> volume.

Parser

```python
from rdflib import Graph, Literal, RDF, URIRef, RDFS, Namespace
from semantikon.converter import parse_input_args, parse_output_args

EX = Namespace("http://example.org/")


def get_inputs_and_outputs(node):
    """
    Read input and output arguments with their type hints and return a
    dictionary containing all input and output information

    Args:
        node (pyiron_workflow.nodes.Node): node to be parsed

    Returns:
        (dict): dictionary containing input/output args, type hints, values
            and variable names
    """
    inputs = parse_input_args(node.node_function)
    outputs = parse_output_args(node.node_function)
    # A single return value is parsed as one dict; wrap it so that it can be
    # zipped with the output labels in the same way as multiple outputs
    if isinstance(outputs, dict):
        outputs = (outputs, )
    outputs = {key: out for key, out in zip(node.outputs.labels, outputs)}
    for key, value in node.inputs.to_value_dict().items():
        inputs[key]["value"] = value
        inputs[key]["var_name"] = key
    for key, value in node.outputs.to_value_dict().items():
        outputs[key]["value"] = value
        outputs[key]["var_name"] = key
    return {"input": inputs, "output": outputs}


def get_node_dict(io_dict):
    """
    Translate the dictionary returned by get_inputs_and_outputs into one
    whose keys are the labels (or the variable names whenever labels are
    not available) and whose values are the corresponding dict content
    """
    results = {}
    for io_ in ["input", "output"]:
        for key, value in io_dict[io_].items():
            # Only arguments annotated with a URI enter the graph
            if "uri" not in value:
                continue
            if value.get("label") is not None:
                results[value["label"]] = value
            else:
                results[key] = value
    return results


def node_to_knowledge_graph(node, graph=None, EX=EX):
    """Translate a node into a knowledge graph"""
    if graph is None:
        graph = Graph()
    io_dict = get_inputs_and_outputs(node)
    node_dict = get_node_dict(io_dict)
    for key, d in node_dict.items():
        label = URIRef(key)
        label_def_triple = (label, RDFS.label, Literal(key))
        # The same label (e.g. "structure") appears in connected nodes
        if label_def_triple not in graph:
            graph.add(label_def_triple)
        graph.add((label, RDF.type, d["uri"]))
        graph.add((label, EX.HasValue, Literal(d["value"])))
        if d.get("units") is not None:
            graph.add((label, EX.HasUnits, EX[d["units"]]))
        if d.get("triple") is not None:
            graph.add((label, d["triple"][0], URIRef(d["triple"][1])))
    return graph


def workflow_to_knowledge_graph(wf, graph=None, EX=EX):
    """Translate all child nodes of a workflow into one knowledge graph"""
    if graph is None:
        graph = Graph()
    for node in wf.children.values():
        graph = node_to_knowledge_graph(node=node, graph=graph, EX=EX)
    return graph
```

User definition

```python
from pyiron_workflow import Workflow
from semantikon.typing import u
from ase import Atoms, build

# EX is the Namespace defined in the parser block above


@Workflow.wrap.as_function_node
def create_structure(
    element: u(str, triple=(EX["IsElementOf"], "structure"), uri=EX["element"])
) -> u(Atoms, uri=EX["ComputationalSample"]):
    structure = build.bulk(element, cubic=True)
    return structure


@Workflow.wrap.as_function_node
def get_volume(
    structure: u(Atoms, uri=EX["ComputationalSample"])
) -> u(
    float,
    units="angstrom**3",
    triple=(EX["IsCalculatedPropertyOf"], "structure"),
    uri=EX["volume"],
):
    volume = structure.get_volume()
    return volume


wf = Workflow("my_workflow")
wf.structure = create_structure(element="Al")
wf.volume = get_volume(structure=wf.structure)
wf.run()
```

Parsing

```python
graph = workflow_to_knowledge_graph(wf)
# 3 * (None, ) == (None, None, None): match every triple in the graph
print(list(graph.triples(3 * (None, ))))
```

Output:

```
[(rdflib.term.URIRef('element'),
  rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
  rdflib.term.URIRef('http://example.org/element')),
 (rdflib.term.URIRef('volume'),
  rdflib.term.URIRef('http://example.org/HasUnits'),
  rdflib.term.URIRef('http://example.org/angstrom**3')),
 (rdflib.term.URIRef('structure'),
  rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal('structure')),
 (rdflib.term.URIRef('element'),
  rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal('element')),
 (rdflib.term.URIRef('element'),
  rdflib.term.URIRef('http://example.org/IsElementOf'),
  rdflib.term.URIRef('structure')),
 (rdflib.term.URIRef('volume'),
  rdflib.term.URIRef('http://example.org/HasValue'),
  rdflib.term.Literal('66.43012500000002', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#double'))),
 (rdflib.term.URIRef('volume'),
  rdflib.term.URIRef('http://example.org/IsCalculatedPropertyOf'),
  rdflib.term.URIRef('structure')),
 (rdflib.term.URIRef('volume'),
  rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal('volume')),
 (rdflib.term.URIRef('structure'),
  rdflib.term.URIRef('http://example.org/HasValue'),
  rdflib.term.Literal("Atoms(symbols='Al4', pbc=True, cell=[4.05, 4.05, 4.05])")),
 (rdflib.term.URIRef('volume'),
  rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
  rdflib.term.URIRef('http://example.org/volume')),
 (rdflib.term.URIRef('structure'),
  rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
  rdflib.term.URIRef('http://example.org/ComputationalSample')),
 (rdflib.term.URIRef('element'),
  rdflib.term.URIRef('http://example.org/HasValue'),
  rdflib.term.Literal('Al'))]
```
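In `graph.triples(3 * (None, ))`, `None` acts as a wildcard in each of the subject, predicate and object positions. The sketch below mimics that matching rule on plain tuples so that it runs without rdflib; the strings stand in for `URIRef` terms, only a subset of the triples above is copied, and `match_triples` is a hypothetical helper, not rdflib API.

```python
# Plain strings stand in for rdflib URIRef terms; a set of tuples stands in for the Graph.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"

triples = {
    ("element", RDF_TYPE, EX + "element"),
    ("structure", RDF_TYPE, EX + "ComputationalSample"),
    ("volume", RDF_TYPE, EX + "volume"),
    ("element", EX + "IsElementOf", "structure"),
    ("volume", EX + "IsCalculatedPropertyOf", "structure"),
}


def match_triples(triples, pattern):
    """Yield every triple that agrees with `pattern`; None matches anything."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


# Which entities point at "structure" as their object, whatever the predicate?
linked = sorted(s for s, _, _ in match_triples(triples, (None, None, "structure")))
print(linked)  # ['element', 'volume']

# Which entities are typed as ComputationalSample?
samples = [s for s, _, _ in match_triples(triples, (None, RDF_TYPE, EX + "ComputationalSample"))]
print(samples)  # ['structure']
```

Fixing some positions while leaving others as `None` is how questions like "what was calculated from this structure?" would be asked of the real graph.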
Nice work, @samwaseda. At first read it looks good to me. The only clear problem I see is that the I/O parsing needs to specify more precisely whether a reference to another object points to an input or to an output. If I'm reading this correctly, labels are scraped from the function, all the labels are lumped together, and then we look up our reference among them. Since nothing stops us from using the same label for both an input and an output, the reference is not fully defined. I reserve the right to find other complaints later 😝 but so far this is the only thing I see as definitively problematic. This is a nice, strong foundation.
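To make the ambiguity concrete: when inputs and outputs are merged into one dict keyed only by label (as `get_node_dict` does), an input and an output that share a label collapse into a single entry. A minimal sketch with a simplified copy of that merge and a hypothetical variant keyed by `(io, label)`; the toy `io_dict` is made up for illustration:

```python
def merge_by_label(io_dict):
    """Simplified get_node_dict: one dict keyed by label (or variable name)."""
    merged = {}
    for io_ in ("input", "output"):
        for key, value in io_dict[io_].items():
            merged[value.get("label") or key] = value
    return merged


def merge_by_io_and_label(io_dict):
    """Hypothetical alternative: keep the input/output side as part of the key."""
    return {
        (io_, value.get("label") or key): value
        for io_ in ("input", "output")
        for key, value in io_dict[io_].items()
    }


# A transformation whose input and output are both labelled "structure".
io_dict = {
    "input": {"structure": {"label": "structure", "value": "before"}},
    "output": {"structure": {"label": "structure", "value": "after"}},
}

print(merge_by_label(io_dict))  # {'structure': {'label': 'structure', 'value': 'after'}}
print(sorted(merge_by_io_and_label(io_dict)))  # [('input', 'structure'), ('output', 'structure')]
```

With the flat merge the output silently overwrites the input; keeping the side in the key preserves both.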
Ah, it's true that I removed the `label` arguments:

```python
@Workflow.wrap.as_function_node
def create_structure(
    element: u(str, triple=(EX["IsElementOf"], "structure"), uri=EX["element"], label="element")
) -> u(Atoms, uri=EX["ComputationalSample"], label="structure"):
    structure = build.bulk(element, cubic=True)
    return structure
```

And I'm kind of hoping that the introduction of …

```python
@Workflow.wrap.as_function_node
def some_transformation(
    structure: u(Atoms, uri=EX["ComputationalSample"], label="input.structure")
) -> u(Atoms, uri=EX["ComputationalSample"], label="output.structure"):
    ...  # transform the structure
    return structure
```
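A sketch of how a reference such as `"output.structure"` could be resolved under the `input.`/`output.` prefix convention shown above. The rule that a bare label is accepted only when it is unambiguous is an assumption, and `resolve_reference` is a hypothetical helper:

```python
def resolve_reference(ref, inputs, outputs):
    """Return (side, entry) for a label reference.

    "input.<label>" / "output.<label>" select the side explicitly; a bare
    label is looked up on both sides and must match exactly one of them.
    """
    side, _, name = ref.partition(".")
    if name and side in ("input", "output"):
        table = inputs if side == "input" else outputs
        return side, table[name]
    # Bare label: search both sides and refuse to guess if it is ambiguous.
    hits = [(s, t[ref]) for s, t in (("input", inputs), ("output", outputs)) if ref in t]
    if len(hits) != 1:
        raise KeyError(f"label {ref!r} is missing or ambiguous")
    return hits[0]


inputs = {"structure": "Al4 (before)"}
outputs = {"structure": "Al4 (after)", "volume": 66.43}

print(resolve_reference("output.structure", inputs, outputs))  # ('output', 'Al4 (after)')
print(resolve_reference("volume", inputs, outputs))  # ('output', 66.43)
```

A bare `"structure"` raises here, which is exactly the case the explicit prefix is meant to settle.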
I think we can probably make it work so that the label itself is scraped from the signature (/the …
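One way the label could be scraped from the signature, sketched with the standard library only. `u` below is a hypothetical stand-in built on `typing.Annotated` (semantikon's own `u` may store its metadata differently), and each input's label falls back to its parameter name unless one is given explicitly. Output labels are out of scope here, since return-value names are not part of a Python signature.

```python
import inspect
from typing import Annotated


def u(type_, **metadata):
    """Hypothetical stand-in for semantikon's u(): store metadata in typing.Annotated."""
    return Annotated[type_, metadata]


def input_metadata(func):
    """Map each parameter name to its metadata, defaulting "label" to that name."""
    result = {}
    for name, param in inspect.signature(func).parameters.items():
        meta = {}
        # Unannotated parameters and plain types have no __metadata__ attribute.
        for extra in getattr(param.annotation, "__metadata__", ()):
            if isinstance(extra, dict):
                meta.update(extra)
        meta.setdefault("label", name)
        result[name] = meta
    return result


def create_structure(element: u(str, uri="ex:element"), cubic: bool = True):
    ...


print(input_metadata(create_structure))
# {'element': {'uri': 'ex:element', 'label': 'element'}, 'cubic': {'label': 'cubic'}}
```

An explicit `label=...` inside `u` still wins, because `setdefault` only fills the label when it is missing.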
Dear @pyiron/ontology team, I wrote down very explicitly the parts that we can probably agree on quickly, so that the implementation can be done systematically. I'm painfully aware that I haven't included any of the tricky points discussed this Monday and in the following conversation on the discussion page, but I think it's good to have solid ground to build a house on. The list below is fairly rudimentary, but it is still not something to sneeze at: this information alone already lets us extract quite a lot of ontological data.
Anyway, I would be very happy to hear your comments. In particular, it would be nice if we could also think about how to include the points we discussed this week.
I already started writing a prototype here, but it doesn't strictly follow the points listed below, so I will rewrite it and post the revamped version in a comment below.
Basics
- `EX = Namespace(my_namespace)`
- `pyiron_workflow` is used
- `(URIRef(label), RDFS.label, Literal(label))`
- `(URIRef(label), RDF.type, uri)`
- `(URIRef(label), EX.hasValue, Literal(value))`
- `(URIRef(label), EX.hasUnit, EX[units])` (probably a different namespace for units)

All the items above (URI, label, data type, units) can be specified either by a data class or by semantikon arguments, i.e. the following two cases are equivalent:
Case I
Case II
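As a hedged illustration of what "equivalent" could mean, the sketch below reduces a hypothetical data class (Case I) and a `typing.Annotated` hint standing in for semantikon's `u` (Case II) to the same metadata dict. The class `Volume`, the field names and the helper functions are assumptions for illustration, not semantikon API:

```python
from dataclasses import dataclass, fields
from typing import Annotated, Optional, get_args

EX = "http://example.org/"


@dataclass(frozen=True)
class Volume:
    """Case I (hypothetical): the metadata is declared on a data class."""

    uri: str = EX + "volume"
    label: str = "volume"
    dtype: type = float
    units: Optional[str] = "angstrom**3"


# Case II (hypothetical): the same metadata attached to a type hint, in the
# spirit of semantikon's u(float, uri=..., label=..., units=...).
VolumeHint = Annotated[float, {"uri": EX + "volume", "label": "volume", "units": "angstrom**3"}]


def metadata_from_dataclass(cls):
    """Collect the field defaults of the data class as a plain dict."""
    return {f.name: f.default for f in fields(cls)}


def metadata_from_hint(hint):
    """Split an Annotated hint into its data type and its metadata dict."""
    dtype, *extras = get_args(hint)
    meta = {"dtype": dtype}
    for extra in extras:
        meta.update(extra)
    return meta


print(metadata_from_dataclass(Volume) == metadata_from_hint(VolumeHint))  # True
```

Once both forms are normalised like this, the parser only needs to handle one dict layout.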
Triples
- `(EX["IsElementOf"], "structure")`
- `str`
- `(EX["HasDefect"], EX["vacancy"])`
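A sketch of how the two triple forms above could be told apart: a plain `str` object (`"structure"`) refers to another input/output label, while a URI object (`EX["vacancy"]`) is used as-is. Note that rdflib's `URIRef` is itself a `str` subclass, so an `isinstance(obj, str)` test alone cannot distinguish the two; the `URI` class and `resolve_triple` below are hypothetical stand-ins.

```python
EX = "http://example.org/"


class URI(str):
    """Marks a full URI; plays the role of rdflib.URIRef, which also subclasses str."""


def resolve_triple(subject, spec, labels):
    """Turn a (predicate, object) spec attached to `subject` into a full triple.

    The object is either a URI, used as-is, or a plain str naming another
    input/output label of the same node, which must exist.
    """
    predicate, obj = spec
    # URI instances are str too, so test for URI explicitly before treating
    # the object as a label reference.
    if not isinstance(obj, URI) and obj not in labels:
        raise KeyError(f"{obj!r} is neither a URI nor a known label")
    return (subject, predicate, obj)


labels = {"element", "structure"}
print(resolve_triple("element", (URI(EX + "IsElementOf"), "structure"), labels))
print(resolve_triple("structure", (URI(EX + "HasDefect"), URI(EX + "vacancy")), labels))
```

Validating label references at parse time catches typos such as `"structrue"` before they end up as dangling nodes in the graph.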
Parsing
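A dependency-free sketch of what this step could emit for a single labelled input or output, using the predicate names from the Basics list (`hasValue`, `hasUnit`). Plain strings and a minimal `Literal` stand-in replace rdflib's `URIRef`/`Literal`, and `basic_triples` with its `meta` dict layout is a hypothetical helper:

```python
from typing import Any, NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"


class Literal(NamedTuple):
    """Minimal stand-in for rdflib.Literal."""

    value: Any


def basic_triples(meta, ex=EX):
    """Emit the Basics triples for one input/output described by `meta`.

    `meta` needs "label", "uri" and "value"; "units" is optional.
    """
    label = meta["label"]
    triples = [
        (label, RDFS_LABEL, Literal(label)),
        (label, RDF_TYPE, meta["uri"]),
        (label, ex + "hasValue", Literal(meta["value"])),
    ]
    if meta.get("units") is not None:
        # The spec notes that units probably deserve their own namespace.
        triples.append((label, ex + "hasUnit", ex + meta["units"]))
    return triples


volume = {"label": "volume", "uri": EX + "volume", "value": 66.43, "units": "angstrom**3"}
for triple in basic_triples(volume):
    print(triple)
```

An input without units (such as `element`) simply yields three triples instead of four.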