
Add parsers #90

Merged · 61 commits · Jan 21, 2025

Conversation

samwaseda (Member)

I'm starting to add pyiron_workflow parsers in order to take ontology into account, using semantikon.



codacy-production bot commented Jan 6, 2025

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +0.27% (target: -1.00%)
Diff coverage: 100.00%

Coverage variation details:

| Commit | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor (62b1337) | 452 | 411 | 90.93% |
| Head (e43109e) | 466 (+14) | 425 (+14) | 91.20% (+0.27%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#90) | 14 | 14 | 100.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


Codacy stopped sending the deprecated coverage status on June 5th, 2024.

@liamhuber (Member)

I have some concerns that we're taking too roundabout a way to accomplish this, and that we'd be better off adding semantikon as a dependency of pyiron_workflow and parsing semantikon.typing.u hints directly there. The current route has limitations, like working only on instances and only for Function nodes.

We can actually already almost do this. E.g., at the class level and with a macro:

from pyiron_workflow import as_macro_node
from semantikon.typing import u

cookie_count = u(int, units="cookies", triple=("a", "b", "c"))

@as_macro_node
def SomeMacro(
    self, 
    in1: cookie_count, 
    in2: float
) -> tuple[cookie_count, float]:
    self.int_out = 2 * in1
    self.float_out = in1 + in2
    return self.int_out, self.float_out

relevant_ontological_info = SomeMacro.preview_io()
relevant_ontological_info.update({"node": SomeMacro.__name__})
relevant_ontological_info

Gives

{'inputs': {'in1': (int, NOT_DATA), 'in2': (float, NOT_DATA)},
 'outputs': {'int_out': int, 'float_out': float},
 'node': 'SomeMacro'}

On the one hand, I'm super happy that semantikon doesn't screw up how pyiron_workflow accesses the type hint, so kudos to you! On the other hand, what we really want is for the pyiron_workflow Channel to see that it is getting something richer and be able to handle that, i.e. we want an output like

{'inputs': {'in1': (u(int, units="cookies", triple=("a", "b", "c")), NOT_DATA), 'in2': (float, NOT_DATA)},
 'outputs': {'int_out': u(int, units="cookies", triple=("a", "b", "c")), 'float_out': float},
 'node': 'SomeMacro'}

For a node-recommendation system, it's critical that we be able to work on the class-level like this. For walking an actual graph instance and looking at transitive ontological requirements, we'd need the label field as in your example. This is a difference that's super easy to treat, IMO. My main point is that building up the relevant ontological information should be able to leverage StaticNode.preview_io() directly. I believe that the place we really need to be doing most of the leg work in order to get the functionality you're pushing at here is over in pyiron_workflow, so it parses the entire annotation instead of just the type (i.e. second output instead of first).

In this way, I would imagine that pyiron_ontology knows how to handle these dictionaries, and I guess simple dictionary representations of the graph -- then pyiron_workflow depends on both semantikon (to accept and parse u hints) and pyiron_ontology (to validate u-hinted connections and find u-compatible nodes), and just packages its data in a way pyiron_ontology could read it. Thoughts?

@samwaseda samwaseda marked this pull request as draft January 6, 2025 19:00
@samwaseda (Member, Author)

samwaseda commented Jan 6, 2025

Do I understand correctly that you are talking about the implementation and not the overall functionality of get_inputs_and_outputs (i.e. the function that I added in this PR)? In that case, I guess we can definitely talk about how it should look in the end, but this PR is more like a place for brainstorming for me, so I might postpone the discussion until a prototype is indeed there.

When it comes to what does what, I can definitely see your point. I just feel I'm not really entitled to say whether it should be included in preview_io, especially as long as there's no useful product. I would therefore leave the decision to you.

@liamhuber (Member)

> Do I understand correctly that you are talking about the implementation and not the overall functionality of get_inputs_and_outputs (i.e. the function that I added in this PR)?

Yes, I agree we want a function that parses a node (or a node class, which differs from this PR) and returns a dictionary identical or very similar to what you have here. I just think it should live in pyiron_workflow and work for all (static -- i.e. non-Workflow) nodes and not just Function nodes. Then, here, I think we want to be able to parse and operate on such dictionaries.

> When it comes to what does what, I can definitely see your point. I just feel I'm not really entitled to say whether it should be included in preview_io, especially as long as there's no useful product. I would therefore leave the decision to you.

This is actually why I would push towards pyiron_ontology operating on simple dictionaries -- then we can build up the ontological processing that leverages dictionaries and semantikon.typing.u, and then pyiron_workflow merely needs to be able to represent itself this way and it can leverage the power. At the same time, anyone else who can build an "inputs+outputs+owner(class and/or instance)" dictionary can also use the power.

It's actually a very similar deal for Jan's more generic workflow representation, and these are the two dictionaries I think we want pyiron_ontology to operate on: one to define how a set of nodes are connected to each other, and a set of secondary dictionaries describing how outputs are connected to inputs for a given owner (i.e. node). Then pyiron_workflow can be one of many tools that is able to represent itself this way to leverage the ontological power (i.e. for connection validation and node recommendation).

We can and should keep pyiron_workflow in mind when defining the spec for these dictionaries, but that doesn't mean they need to leverage pyiron_workflow from the get-go. So, for instance, the graph dictionary is something like nodes (source reference and local label), edges (a set of two owner labels+channel labels), and a guarantee that the graph is non-cyclic; while the ontological dictionary is something like nodes/functions (source reference), optionally a local label (if we have an instance rather than class), and inputs and outputs (labels and semantikon hints). The ontological dictionary then lets us know how a given node/function connects output ontology to input ontology using the semantikon triples and local names, e.g.

{
    "type": "some_node_package.AllergenDetector",
    "label": "this_label_also_appears_in_the_graph_dictionary",  # Optional
    "inputs": {
        "x": u(int, units="cookies", uri="baking.Cookie"),
    },
    "outputs": {
        "chocolate_mass": u(float, units="g", triple=("chocolate", "isPropertyOf", "inputs.x"), uri="baking.Chocolate"),
        "has_nuts": u(bool, triple=("hasNuts", "isPropertyOf", "inputs.x")),
    }
}

Don't take my triples too seriously here -- I think my understanding of this concept still lags yours and Sarath's significantly. Same with what I shoved in for uri -- I'm just trying to shoehorn in a reference to some external ontology and I'm not totally clear how we do that. What I am trying to show here is that pyiron_workflow would know how to generate such a representation (assuming the channels are u-hinted), but that the type field in such a dictionary could just as well be a non-node thing, like a reference to some plain python function, as long as that function was similarly u-hinted.

In the same way, we could have a graph dictionary like

{
    "nodes":  {
        "oven": some_node_package.MakeCookie,
        "checker": some_node_package.AllergenDetector
    },
    "edges": (
        ("oven.outputs.cookie", "checker.inputs.x"),
    )
}

And again, this is super easy with pyiron_workflow, but also works with any other node objects that have both labeled input and labeled output (e.g., again, a u-hinted plain old python function). (I don't remember how Jan handled output labels in his generic workflow format, and I didn't find the code he wrote on a super quick search, but IIRC it all looked somewhat like this.)
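As a sanity check on such a graph dictionary, the acyclicity guarantee mentioned above can be verified with a few lines of standard-library Python. This is a hedged sketch under the assumed "nodes"/"edges" shape from the example; is_acyclic is a hypothetical name, not part of any existing package:

```python
# Illustrative stdlib-only sketch: check that a graph dictionary of the form
# {"nodes": {...}, "edges": (("owner.outputs.channel", "owner.inputs.channel"), ...)}
# really describes a DAG, using Kahn's algorithm.
from collections import defaultdict, deque


def is_acyclic(graph: dict) -> bool:
    owners = set(graph["nodes"])
    successors = defaultdict(set)
    for out_ref, in_ref in graph["edges"]:
        src, dst = out_ref.split(".")[0], in_ref.split(".")[0]
        if src not in owners or dst not in owners:
            raise ValueError(f"edge references unknown node: {out_ref} -> {in_ref}")
        successors[src].add(dst)
    # Kahn's algorithm: repeatedly peel off nodes with no remaining inbound edges
    indegree = {n: 0 for n in owners}
    for src in successors:
        for dst in successors[src]:
            indegree[dst] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    seen = 0
    while queue:
        node = queue.popleft()
        seen += 1
        for dst in successors[node]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    return seen == len(owners)  # every node peeled off <=> no cycle


graph = {
    "nodes": {"oven": "some_node_package.MakeCookie",
              "checker": "some_node_package.AllergenDetector"},
    "edges": (("oven.outputs.cookie", "checker.inputs.x"),),
}
```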

In this way, we can imagine taking some u-hinted channel input and searching over a corpus of type-inputs-outputs dictionaries for node recommendations, or walking over the "nodes"-"edges" graph dictionary combined with a subset of that type-inputs-outputs dictionary corpus (filtered by which "nodes" are in our graph) and augmented by an extra "label" field to generate connection recommendations.
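The "searching over a corpus of type-inputs-outputs dictionaries" idea can be sketched as a plain filter. This is only an assumption-laden illustration: metadata is modelled as plain dicts rather than u objects, recommend_nodes is a hypothetical name, and "equal units and uri" stands in for whatever real compatibility rule a reasoner would apply:

```python
# Illustrative sketch of node recommendation: given a requested input
# annotation, return the channels in a corpus of type-inputs-outputs
# dictionaries whose output metadata matches. The matching rule here
# (identical units and uri) is a deliberate oversimplification.
def recommend_nodes(requested: dict, corpus: list[dict]) -> list[str]:
    matches = []
    for entry in corpus:
        for label, out_meta in entry["outputs"].items():
            if (out_meta.get("units") == requested.get("units")
                    and out_meta.get("uri") == requested.get("uri")):
                matches.append(f'{entry["type"]}.outputs.{label}')
    return matches


corpus = [
    {"type": "some_node_package.MakeCookie",
     "inputs": {"dough": {"units": "g"}},
     "outputs": {"cookie": {"units": "cookies", "uri": "baking.Cookie"}}},
    {"type": "some_node_package.Oven",
     "inputs": {},
     "outputs": {"heat": {"units": "K"}}},
]
wanted = {"units": "cookies", "uri": "baking.Cookie"}
```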

But from what I see now, we should be able to build up pyiron_ontology just using semantikon and these graph- and ontology-dictionaries, independent of pyiron_workflow, then (hopefully easily) make pyiron_workflow capable of generating compatible representations later.

@samwaseda (Member, Author)

I think we are on the same page on this topic. I would even go a step further and say there isn't such a thing as pyiron_ontology from the conceptual point of view. The components we will need in the end are:

  • Ontology (atomRDF, PMDcore whatsoever)
  • Annotation (semantikon)
  • Interpretation (pyiron_workflow)
  • Reasoning (some external library)

There's a possibility that we would have to develop our own reasoner, but that's still very different from what I wrote for this PR, so currently there's no space for pyiron_ontology in my plan.

Maybe: Is it possible that you are thinking I'm only implementing what I presented before? If that's the case, no, that's not what I'm trying to do. I'm going more in the direction of how to represent a workflow (cf. this issue). The problem with the prototype that I had presented was that there is no connection between functions and inputs/outputs, even though according to my understanding, there should be intransitive connections for input - function - output. So I wanted to see how to incorporate it into my first prototype AND think about how to also include transitive connections (I got some ideas this week). I could have done it in pyiron_contrib, but since pyiron_ontology is not in use right now, I'm using it as a playground.

@liamhuber (Member)

> I think we are on the same page on this topic.

👍 Yep, at least insofar as I understand the topic 😂

> I would even go a step further and say there isn't such a thing as pyiron_ontology from the conceptual point of view - the components we will need in the end are: ...

I guess the role I see for pyiron_ontology is at the "interpretation" stage, where we bring together the ontological hints (which are not strictly necessary for workflows) and the transitive nature of those hints (which are not possible without workflow connections). For me, the union of these two ideas -- pure ontological hinting and their modification via transitive graph connections -- is the "interpretation" bit. I would only suggest that while we could do this directly inside pyiron_workflow, I think the community is better served if we do these operations in a separate place (e.g. "pyiron_ontology") using a more workflow-tool-agnostic representation, and have pyiron_workflow just package itself up in this agnostic representation and talk to that tool.

> There's a possibility that we would have to develop our own reasoner, but that's still very different from what I wrote for this PR, so currently there's no space for pyiron_ontology in my plan.

Yes, 💯. My current vision is that the "interpreter" combines the graph connectivity and the annotations to come up with raw (but perhaps rich and complex) ontological objects, and that we can then pass these to an off-the-shelf reasoner that only knows about the ontology itself in order to answer questions about whether or not one object is "as or more specific than" the other object from an ontological perspective.
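The "interpreter" step described above can be sketched in miniature: combine the graph dictionary's connectivity with per-node annotation dictionaries so that a connected input inherits ontological metadata from its upstream output. Everything here is illustrative (propagate_metadata is a hypothetical name, metadata is modelled as plain dicts), a sketch of the idea rather than any existing implementation:

```python
# Illustrative sketch: walk the edges of a graph dictionary and let each
# connected input inherit whatever ontological metadata its upstream output
# declares and the input itself does not. All names are hypothetical.
import copy


def propagate_metadata(graph: dict, ontology: dict) -> dict:
    """ontology maps node label -> {"inputs": {...}, "outputs": {...}}."""
    enriched = copy.deepcopy(ontology)  # leave the caller's dictionaries intact
    for out_ref, in_ref in graph["edges"]:
        src, _, out_channel = out_ref.split(".")
        dst, _, in_channel = in_ref.split(".")
        upstream = enriched[src]["outputs"].get(out_channel, {})
        downstream = enriched[dst]["inputs"].setdefault(in_channel, {})
        # upstream metadata fills in anything the input did not specify itself
        for key, value in upstream.items():
            downstream.setdefault(key, value)
    return enriched


graph = {"edges": (("oven.outputs.cookie", "checker.inputs.x"),)}
ontology = {
    "oven": {"inputs": {},
             "outputs": {"cookie": {"units": "cookies", "uri": "baking.Cookie"}}},
    "checker": {"inputs": {"x": {}}, "outputs": {"has_nuts": {}}},
}
```

A real interpreter would of course hand the enriched objects to a reasoner rather than comparing dictionary keys, but the shape of the operation (graph connectivity plus annotations in, richer ontological objects out) is the same.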

> Maybe: Is it possible that you are thinking I'm only implementing what I presented before? If that's the case, no, that's not what I'm trying to do. I'm going more in the direction of how to represent a workflow (cf. this issue). The problem with the prototype that I had presented was that there is no connection between functions and inputs/outputs, even though according to my understanding, there should be intransitive connections for input - function - output. So I wanted to see how to incorporate it to my first prototype AND think about how to also include transitive connections (I got some ideas this week). I could have done it in pyiron_contrib, but since pyiron_ontology is not in use right now, I'm using it as a playground.

Hmm, I start to get a little confused here. My own grasp of what I mean by "(in)transitive" is, honestly, pretty rough, so I can't be at all certain we're using it the same way. I certainly don't disagree with anything in this paragraph though! I guess I feel pretty confident at this stage about how we can represent (exclusively DAG) workflows in a simple dictionary-like way, and the "intransitive"(?) ontological connections between input and output across a "node" in a simple dictionary-like way, and pretty un-confident in how we can bring these together in an "interpretation" to achieve "transitive" updates to resulting "node" I/O ontological hints. In light of that lack of confidence, I look forward to hearing about your ideas! There is no agenda yet for Monday, so maybe there is space to discuss it at the regular meeting? Otherwise let's find another time for you to show off the ideas!

@samwaseda (Member, Author)

Sorry, actually transitive/intransitive was not really the focus of what I wanted to say. What I wanted to say was that the knowledge graph should contain not only data but also the workflow nodes, and that's what I wanted to achieve in this PR. I don't know how I ended up in the discussion about transitive/intransitive. Anyway, I'm pretty confident that it's the right move, especially after I realized that @jan-janssen had the same idea independently.


codacy-production bot commented Jan 10, 2025

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +0.68% (target: -1.00%)
Diff coverage: 94.44%

Coverage variation details:

| Commit | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor (62b1337) | 452 | 411 | 90.93% |
| Head (21112ec) | 560 (+108) | 513 (+102) | 91.61% (+0.68%) |


Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#90) 108 102 94.44%




@samwaseda samwaseda marked this pull request as ready for review January 20, 2025 07:14
@samwaseda samwaseda merged commit 071ac40 into main Jan 21, 2025
17 checks passed
@samwaseda samwaseda deleted the add_parsers branch January 21, 2025 06:28
3 participants