
Add parsers #90

Merged · 61 commits · Jan 21, 2025

Conversation

samwaseda (Member)

I'm starting to add pyiron_workflow parsers in order to take ontology into account, using semantikon.



codacy-production bot commented Jan 6, 2025

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +0.27% (target: -1.00%)
Diff coverage: 100.00%

Coverage variation details:

| Commit | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor (62b1337) | 452 | 411 | 90.93% |
| Head (e43109e) | 466 (+14) | 425 (+14) | 91.20% (+0.27%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#90) | 14 | 14 | 100.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


Codacy stopped sending the deprecated coverage status on June 5th, 2024.

@liamhuber (Member)

I have some concerns that we're taking too roundabout a way to accomplish this, and that we'd be better off adding semantikon as a dependency of pyiron_workflow and parsing semantikon.typing.u hints directly there. The current route has limitations, like working only on instances and only for Function nodes.

We can actually already almost do this. E.g., at the class level and with a macro:

from pyiron_workflow import as_macro_node
from semantikon.typing import u

cookie_count = u(int, units="cookies", triple=("a", "b", "c"))

@as_macro_node
def SomeMacro(
    self, 
    in1: cookie_count, 
    in2: float
) -> tuple[cookie_count, float]:
    self.int_out = 2 * in1
    self.float_out = in1 + in2
    return self.int_out, self.float_out

relevant_ontological_info = SomeMacro.preview_io()
relevant_ontological_info.update({"node": SomeMacro.__name__})
relevant_ontological_info

Gives

{'inputs': {'in1': (int, NOT_DATA), 'in2': (float, NOT_DATA)},
 'outputs': {'int_out': int, 'float_out': float},
 'node': 'SomeMacro'}

On the one hand, I'm super happy that semantikon doesn't screw up how pyiron_workflow accesses the type hint, so kudos to you! On the other hand, what we really want is for the pyiron_workflow Channel to see that it is getting something richer and be able to handle that, i.e. we want an output like

{'inputs': {'in1': (u(int, units="cookies", triple=("a", "b", "c")), NOT_DATA), 'in2': (float, NOT_DATA)},
 'outputs': {'int_out': u(int, units="cookies", triple=("a", "b", "c")), 'float_out': float},
 'node': 'SomeMacro'}

For a node-recommendation system, it's critical that we be able to work on the class-level like this. For walking an actual graph instance and looking at transitive ontological requirements, we'd need the label field as in your example. This is a difference that's super easy to treat, IMO. My main point is that building up the relevant ontological information should be able to leverage StaticNode.preview_io() directly. I believe that the place we really need to be doing most of the leg work in order to get the functionality you're pushing at here is over in pyiron_workflow, so it parses the entire annotation instead of just the type (i.e. second output instead of first).

In this way, I would imagine that pyiron_ontology knows how to handle these dictionaries, and I guess simple dictionary representations of the graph -- then pyiron_workflow depends on both semantikon (to accept and parse u hints) and pyiron_ontology (to validate u-hinted connections and find u-compatible nodes), and just packages its data in a way pyiron_ontology could read it. Thoughts?

@samwaseda samwaseda marked this pull request as draft January 6, 2025 19:00
@samwaseda (Member, Author)

samwaseda commented Jan 6, 2025

Do I understand correctly that you are talking about the implementation and not the overall functionality of get_inputs_and_outputs (i.e. the function that I added in this PR)? In that case, I guess we can definitely talk about how it should look in the end, but this PR is more like a place for brainstorming for me, so I might postpone the discussion until a prototype is indeed there.

When it comes to what does what, I can definitely see your point. I just feel I'm not really entitled to say whether it should be included in preview_io, especially as long as there's no useful product. I would therefore leave the decision to you.

@liamhuber (Member)

> Do I understand correctly that you are talking about the implementation and not the overall functionality of get_inputs_and_outputs (i.e. the function that I added in this PR)?

Yes, I agree we want a function that parses a node (or a node class, which differs from this PR) and returns a dictionary identical or very similar to what you have here. I just think it should live in pyiron_workflow and work for all (static -- i.e. non-Workflow) nodes and not just Function nodes. Then, here, I think we want to be able to parse and operate on such dictionaries.

> When it comes to what does what, I can definitely see your point. I just feel I'm not really entitled to say whether it should be included in preview_io, especially as long as there's no useful product. I would therefore leave the decision to you.

This is actually why I would push towards pyiron_ontology operating on simple dictionaries -- then we can build up the ontological processing that leverages dictionaries and semantikon.typing.u, and then pyiron_workflow merely needs to be able to represent itself this way and it can leverage the power. At the same time, anyone else who can build an "inputs+outputs+owner(class and/or instance)" dictionary can also use the power.

It's actually a very similar deal for Jan's more generic workflow representation, and these are the two dictionaries I think we want pyiron_ontology to operate on: one to define how a set of nodes are connected to each other, and a set of secondary dictionaries describing how outputs are connected to inputs for a given owner (i.e. node). Then pyiron_workflow can be one of many tools that is able to represent itself this way to leverage the ontological power (i.e. for connection validation and node recommendation).

We can and should keep pyiron_workflow in mind when defining the spec for these dictionaries, but that doesn't mean they need to leverage pyiron_workflow from the get-go. So, for instance, the graph dictionary is something like nodes (source reference and local label), edges (a set of two owner labels+channel labels), and a guarantee that the graph is non-cyclic; while the ontological dictionary is something like nodes/functions (source reference), optionally a local label (if we have an instance rather than class), and inputs and outputs (labels and semantikon hints). The ontological dictionary then lets us know how a given node/function connects output ontology to input ontology using the semantikon triples and local names, e.g.

{
    "type": "some_node_package.AllergenDetector",
    "label": "this_label_also_appears_in_the_graph_dictionary",  # Optional
    "inputs": {
        "x": u(int, units="cookies", uri="baking.Cookie"),
    },
    "outputs": {
        "chocolate_mass": u(float, units="g", triple=("chocolate", "isPropertyOf", "inputs.x"), uri="baking.Chocolate"),
        "has_nuts": u(bool, triple=("hasNuts", "isPropertyOf", "inputs.x")),
    }
}

Don't take my triples too seriously here -- I think my understanding of this concept still lags yours and Sarath's significantly. Same with what I shoved in for uri -- I'm just trying to shoehorn in a reference to some external ontology and I'm not totally clear how we do that. What I am trying to show here is that pyiron_workflow would know how to generate such a representation (assuming the channels are u-hinted), but that the type field in such a dictionary could just as well be a non-node thing, like a reference to some plain python function, as long as that function was similarly u-hinted.

In the same way, we could have a graph dictionary like

{
    "nodes":  {
        "oven": some_node_package.MakeCookie,
        "checker": some_node_package.AllergenDetector
    },
    "edges": (
        ("oven.outputs.cookie", "checker.inputs.x"),
    )
}

And again, this is super easy with pyiron_workflow, but also works with any other node objects that have both labeled input and labeled output (e.g., again, a u-hinted plain old python function). (I don't remember how Jan handled output labels in his generic workflow format, and I didn't find the code he wrote on a super quick search, but IIRC it all looked somewhat like this.)
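As a sanity check on such a graph dictionary, the acyclicity guarantee mentioned above can be verified with a few lines of standard-library Python. This is a hedged sketch under the assumed "nodes"/"edges" shape from the example; is_acyclic is a hypothetical name, not part of any existing package:

```python
# Illustrative stdlib-only sketch: check that a graph dictionary of the form
# {"nodes": {...}, "edges": (("owner.outputs.channel", "owner.inputs.channel"), ...)}
# really describes a DAG, using Kahn's algorithm.
from collections import defaultdict, deque


def is_acyclic(graph: dict) -> bool:
    owners = set(graph["nodes"])
    successors = defaultdict(set)
    for out_ref, in_ref in graph["edges"]:
        src, dst = out_ref.split(".")[0], in_ref.split(".")[0]
        if src not in owners or dst not in owners:
            raise ValueError(f"edge references unknown node: {out_ref} -> {in_ref}")
        successors[src].add(dst)
    # Kahn's algorithm: repeatedly peel off nodes with no remaining inbound edges
    indegree = {n: 0 for n in owners}
    for src in successors:
        for dst in successors[src]:
            indegree[dst] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    seen = 0
    while queue:
        node = queue.popleft()
        seen += 1
        for dst in successors[node]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    return seen == len(owners)  # every node peeled off <=> no cycle


graph = {
    "nodes": {"oven": "some_node_package.MakeCookie",
              "checker": "some_node_package.AllergenDetector"},
    "edges": (("oven.outputs.cookie", "checker.inputs.x"),),
}
```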

In this way, we can imagine taking some u-hinted channel input and searching over a corpus of type-inputs-outputs dictionaries for node recommendations, or walking over the "nodes"-"edges" graph dictionary combined with a subset of that type-inputs-outputs dictionary corpus (filtered by which "nodes" are in our graph) and augmented by an extra "label" field to generate connection recommendations.
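The "searching over a corpus of type-inputs-outputs dictionaries" idea can be sketched as a plain filter. This is only an assumption-laden illustration: metadata is modelled as plain dicts rather than u objects, recommend_nodes is a hypothetical name, and "equal units and uri" stands in for whatever real compatibility rule a reasoner would apply:

```python
# Illustrative sketch of node recommendation: given a requested input
# annotation, return the channels in a corpus of type-inputs-outputs
# dictionaries whose output metadata matches. The matching rule here
# (identical units and uri) is a deliberate oversimplification.
def recommend_nodes(requested: dict, corpus: list[dict]) -> list[str]:
    matches = []
    for entry in corpus:
        for label, out_meta in entry["outputs"].items():
            if (out_meta.get("units") == requested.get("units")
                    and out_meta.get("uri") == requested.get("uri")):
                matches.append(f'{entry["type"]}.outputs.{label}')
    return matches


corpus = [
    {"type": "some_node_package.MakeCookie",
     "inputs": {"dough": {"units": "g"}},
     "outputs": {"cookie": {"units": "cookies", "uri": "baking.Cookie"}}},
    {"type": "some_node_package.Oven",
     "inputs": {},
     "outputs": {"heat": {"units": "K"}}},
]
wanted = {"units": "cookies", "uri": "baking.Cookie"}
```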

But from what I see now, we should be able to build up pyiron_ontology just using semantikon and these graph- and ontology-dictionaries, independent of pyiron_workflow, then (hopefully easily) make pyiron_workflow capable of generating compatible representations later.

@samwaseda (Member, Author)

I think we are on the same page on this topic. I would even go a step further and say there isn't such a thing as pyiron_ontology from the conceptual point of view. The components we will need in the end are:

  • Ontology (atomRDF, PMDcore whatsoever)
  • Annotation (semantikon)
  • Interpretation (pyiron_workflow)
  • Reasoning (some external library)

There's a possibility that we would have to develop our own reasoner, but that's still very different from what I wrote for this PR, so currently there's no space for pyiron_ontology in my plan.

Maybe: Is it possible that you are thinking I'm only implementing what I presented before? If that's the case, no, that's not what I'm trying to do. I'm going more in the direction of how to represent a workflow (cf. this issue). The problem with the prototype that I had presented was that there is no connection between functions and inputs/outputs, even though according to my understanding, there should be intransitive connections for input - function - output. So I wanted to see how to incorporate it into my first prototype AND think about how to also include transitive connections (I got some ideas this week). I could have done it in pyiron_contrib, but since pyiron_ontology is not in use right now, I'm using it as a playground.

@liamhuber (Member)

> I think we are on the same page on this topic.

👍 Yep, at least insofar as I understand the topic 😂

> I would even go a step further and say there isn't such a thing as pyiron_ontology from the conceptual point of view - the components we will need in the end are: ...

I guess the role I see for pyiron_ontology is at the "interpretation" stage, where we bring together the ontological hints (which are not strictly necessary for workflows) and the transitive nature of those hints (which are not possible without workflow connections). For me, the union of these two ideas -- pure ontological hinting and their modification via transitive graph connections -- is the "interpretation" bit. I would only suggest that while we could do this directly inside pyiron_workflow, I think the community is better served if we do these operations in a separate place (e.g. "pyiron_ontology") using a more workflow-tool-agnostic representation, and have pyiron_workflow just package itself up in this agnostic representation and talk to that tool.

> There's a possibility that we would have to develop our own reasoner, but that's still very different from what I wrote for this PR, so currently there's no space for pyiron_ontology in my plan.

Yes, 💯. My current vision is that the "interpreter" combines the graph connectivity and the annotations to come up with raw (but perhaps rich and complex) ontological objects, and that we can then pass these to an off-the-shelf reasoner that only knows about the ontology itself in order to answer questions about whether or not one object is "as or more specific than" the other object from an ontological perspective.
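The "interpreter" step described above can be sketched in miniature: combine the graph dictionary's connectivity with per-node annotation dictionaries so that a connected input inherits ontological metadata from its upstream output. Everything here is illustrative (propagate_metadata is a hypothetical name, metadata is modelled as plain dicts), a sketch of the idea rather than any existing implementation:

```python
# Illustrative sketch: walk the edges of a graph dictionary and let each
# connected input inherit whatever ontological metadata its upstream output
# declares and the input itself does not. All names are hypothetical.
import copy


def propagate_metadata(graph: dict, ontology: dict) -> dict:
    """ontology maps node label -> {"inputs": {...}, "outputs": {...}}."""
    enriched = copy.deepcopy(ontology)  # leave the caller's dictionaries intact
    for out_ref, in_ref in graph["edges"]:
        src, _, out_channel = out_ref.split(".")
        dst, _, in_channel = in_ref.split(".")
        upstream = enriched[src]["outputs"].get(out_channel, {})
        downstream = enriched[dst]["inputs"].setdefault(in_channel, {})
        # upstream metadata fills in anything the input did not specify itself
        for key, value in upstream.items():
            downstream.setdefault(key, value)
    return enriched


graph = {"edges": (("oven.outputs.cookie", "checker.inputs.x"),)}
ontology = {
    "oven": {"inputs": {},
             "outputs": {"cookie": {"units": "cookies", "uri": "baking.Cookie"}}},
    "checker": {"inputs": {"x": {}}, "outputs": {"has_nuts": {}}},
}
```

A real interpreter would of course hand the enriched objects to a reasoner rather than comparing dictionary keys, but the shape of the operation (graph connectivity plus annotations in, richer ontological objects out) is the same.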

> Maybe: Is it possible that you are thinking I'm only implementing what I presented before? If that's the case, no, that's not what I'm trying to do. I'm going more in the direction of how to represent a workflow (cf. this issue). The problem with the prototype that I had presented was that there is no connection between functions and inputs/outputs, even though according to my understanding, there should be intransitive connections for input - function - output. So I wanted to see how to incorporate it to my first prototype AND think about how to also include transitive connections (I got some ideas this week). I could have done it in pyiron_contrib, but since pyiron_ontology is not in use right now, I'm using it as a playground.

Hmm, I start to get a little confused here. My own grasp of what I mean by "(in)transitive" is, honestly, pretty rough, so I can't be at all certain we're using it the same way. I certainly don't disagree with anything in this paragraph though! I guess I feel pretty confident at this stage about how we can represent (exclusively DAG) workflows in a simple dictionary-like way, and the "intransitive"(?) ontological connections between input and output across a "node" in a simple dictionary-like way, and pretty un-confident in how we can bring these together in an "interpretation" to achieve "transitive" updates to resulting "node" I/O ontological hints. In light of that lack of confidence, I look forward to hearing about your ideas! There is no agenda yet for Monday, so maybe there is space to discuss it at the regular meeting? Otherwise let's find another time for you to show off the ideas!

@samwaseda (Member, Author)

Sorry, actually transitive/intransitive was not really the focus of what I wanted to say. What I wanted to say was that the knowledge graph should contain not only data but also the workflow nodes, and that's what I wanted to achieve in this PR. I don't know how I ended up in the discussion about transitive/intransitive. Anyway, I'm pretty confident that it's the right move, especially after I realized that @jan-janssen had the same idea independently.


codacy-production bot commented Jan 10, 2025

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +0.68% (target: -1.00%)
Diff coverage: 94.44%

Coverage variation details:

| Commit | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor (62b1337) | 452 | 411 | 90.93% |
| Head (21112ec) | 560 (+108) | 513 (+102) | 91.61% (+0.68%) |


Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#90) 108 102 94.44%




@samwaseda samwaseda marked this pull request as ready for review January 20, 2025 07:14
@samwaseda samwaseda merged commit 071ac40 into main Jan 21, 2025
17 checks passed
@samwaseda samwaseda deleted the add_parsers branch January 21, 2025 06:28
3 participants