Federation extension: how to ensure consistent process version across backends? #548

Closed
soxofaan opened this issue Nov 7, 2024 · 7 comments

@soxofaan
Member

soxofaan commented Nov 7, 2024

This came up in dev-telco yesterday:

We now have version 1.x and 2.x of openeo-processes, which are somewhat incompatible (e.g. schema differences regarding raster/vector versus generic data cubes). When participating backends in a federation use different versions, the aggregator merges these process definitions, and possibly ends up with a hybrid v1-v2 mix of processes, which can cause trouble in different places (e.g. clients getting confused or flagging inconsistent wiring of process graphs).

One possible solution is to make the merging in the aggregator "smarter" in some way to produce a consistent result. However, this doesn't seem like a sustainable solution to me, because this kind of automatic conflict resolution can get quite complex and/or could require tedious ad-hoc configuration and hacks. It might even involve rewriting process graphs, which could make things quite confusing.

I think it's more sustainable to require a certain version of openeo-processes to participate in a federation, which makes it clearer to all parties what to target.

I'm not sure yet how to tackle this, but want to open the discussion here.
Should this just be something in the federation extension? Or make this more general and make sure the process version is explicitly stated in the capabilities in some way?
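
To make the "require a certain version" idea concrete, here is a minimal sketch of such a check on the aggregator side. It assumes each backend's capabilities document carries a `processes_version` field. That field name is purely hypothetical (how to expose the processes version is exactly the open question), as are the function names and backend ids.

```python
from typing import Dict, List


def major_version(version: str) -> int:
    """Return the major component of a version string such as '2.0.0-rc.1'."""
    return int(version.split(".")[0])


def find_version_violations(
    capabilities_by_backend: Dict[str, dict], required_major: int
) -> List[str]:
    """List backends that do not match the major version the federation requires."""
    violations = []
    for backend_id, capabilities in sorted(capabilities_by_backend.items()):
        # "processes_version" is a hypothetical field, not part of the openEO API.
        declared = capabilities.get("processes_version")
        if declared is None:
            violations.append(f"{backend_id}: no processes version declared")
        elif major_version(declared) != required_major:
            violations.append(
                f"{backend_id}: declares {declared}, federation requires {required_major}.x"
            )
    return violations


# Made-up capabilities documents for three backends.
example = {
    "backend-a": {"api_version": "1.2.0", "processes_version": "2.0.0-rc.1"},
    "backend-b": {"api_version": "1.1.0", "processes_version": "1.2.0"},
    "backend-c": {"api_version": "1.2.0"},
}
violations = find_version_violations(example, required_major=2)
```

With a check like this, the aggregator could refuse or flag a backend up front instead of trying to reconcile mixed process definitions afterwards.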

thoughts?

cc @m-mohr @ValentinaHutter

@soxofaan
Member Author

soxofaan commented Nov 7, 2024

@m-mohr
Member

m-mohr commented Nov 7, 2024

You are describing two separate issues here, which are not necessarily strictly dependent on each other:

  1. Expose process version (has nothing to do with federations)

    There is a separate issue for this: "How to expose which version of openeo-processes is available/targeted" (#517)

  2. "Combining" processes from backends that are implementing different versions of the specification

    I'm not sure whether this is an actual API issue. It feels like this is more an issue of the federation to ensure they have a contract in place that defines the base versions (both for API + processes). Small differences can also occur in the API. The federation also doesn't really cater yet for differences that occur due to different process implementations, completely independent of the versions (right?). Examples: resample_spatial implements different resampling methods, different options available in ard_surface_reflectance, ...
    With regard to the process schemas, I think mixing raster-cube and datacube should be handled by the clients. If it hadn't become an issue in the Web Editor recently, probably no one would have noticed it at all. The datacube type is a direct successor, which can be translated 1:1. raster-cube => datacube with dimensions x and y, vector-cube => datacube with dimension geometry.
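
The 1:1 translation described above could be sketched on the client side roughly as follows. This is an illustration only: the function and constant names are made up, and the exact layout of the `dimensions` entries is my reading of the 2.x schema convention and should be checked against the openeo-processes specification.

```python
import copy

# Legacy subtype -> dimensions implied by the translation described above.
# The structure of these entries is an assumption about the 2.x schema layout.
LEGACY_SUBTYPES = {
    "raster-cube": [{"type": "spatial", "axis": ["x", "y"]}],
    "vector-cube": [{"type": "geometry"}],
}


def upgrade_schema(schema: dict) -> dict:
    """Return a copy of a parameter/return schema with legacy cube subtypes translated."""
    upgraded = copy.deepcopy(schema)
    dimensions = LEGACY_SUBTYPES.get(upgraded.get("subtype"))
    if dimensions is not None:
        upgraded["subtype"] = "datacube"
        # Keep any dimensions the schema already declares.
        upgraded.setdefault("dimensions", copy.deepcopy(dimensions))
    return upgraded


legacy = {"type": "object", "subtype": "raster-cube"}
modern = upgrade_schema(legacy)
```

Schemas with any other subtype pass through unchanged, so a client could apply this to every schema it reads before comparing them.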

@ValentinaHutter

The problem only came up when the Web Editor explicitly checked whether the output of one process matches the input of the following process, and raised a warning when the first output was declared as raster-cube but the following input as datacube. So to me it makes sense that clients accept raster-cubes and vector-cubes as datacubes (or also add a check for the dimensions).
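
A lenient version of such a client-side check might look like the sketch below. It is not the Web Editor's actual logic; all names are hypothetical, and the implied dimension types for the legacy subtypes follow the mapping mentioned earlier in this thread.

```python
# Dimension types implied by the legacy subtypes (assumption based on this thread).
LEGACY_DIMENSION_TYPES = {
    "raster-cube": {"spatial"},
    "vector-cube": {"geometry"},
}


def canonical_subtype(schema: dict):
    """Treat the legacy cube subtypes as 'datacube'; leave everything else as-is."""
    subtype = schema.get("subtype")
    return "datacube" if subtype in LEGACY_DIMENSION_TYPES else subtype


def dimension_types(schema: dict) -> set:
    """Dimension types a schema declares, explicitly or via a legacy subtype."""
    subtype = schema.get("subtype")
    if subtype in LEGACY_DIMENSION_TYPES:
        return set(LEGACY_DIMENSION_TYPES[subtype])
    return {d["type"] for d in schema.get("dimensions", []) if "type" in d}


def is_compatible(output_schema: dict, input_schema: dict) -> bool:
    """Check whether an output can feed an input, accepting legacy cubes as datacubes."""
    if canonical_subtype(output_schema) != canonical_subtype(input_schema):
        return False
    provided = dimension_types(output_schema)
    required = dimension_types(input_schema)
    # Only reject when both sides declare dimensions and a required one is missing.
    return not provided or not required or required <= provided
```

With this, a raster-cube output feeding a generic datacube input no longer triggers a warning, while a raster-cube output feeding a vector-cube input is still flagged because of the dimension check.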

I think I missed the discussion about supporting API 1.1.0 with processes 1.0 and API 1.2 with processes 2.0 when it first came up, so we only have one /processes endpoint in the EODC API at the moment.
It seems to be quite some effort to update this to support both /processes endpoints. So, I am now wondering if there is a general plan to move towards favoring v2.0 for the processes? Or is there a specific reason to still support v1.0?

@m-mohr
Member

m-mohr commented Nov 8, 2024

I think I missed the discussion about supporting API 1.1.0 with processes 1.0 and API 1.2 with processes 2.0 when it first came up,

That was never a discussion, that was just a decision from VITO how they want to do it. There's no direct linkage between API versions and process versions in openEO (except that only API 1.2 officially has a full schema for the 2.0 additions to the parameter/return value schemas). But in principle any openEO API version can implement any openEO processes version (except 0.x versions).

So, I am now wondering if there is a general plan to move towards favoring v2.0 for the processes? Or is there a specific reason to still support v1.0?

More a note than anything else: openEO processes are only in v2.0.0-rc.1 - The latest stable release is still 1.2.0.

@soxofaan
Member Author

soxofaan commented Nov 8, 2024

I'm not sure whether this is an actual API issue. It feels like this is more an issue of the federation to ensure they have a contract in place that defines the base versions (both for API + processes).

Indeed, this would be something to express in the federation extension (which is managed under the openeo-api project, so that's why I created this issue here).
Also, I think it would be cleaner if the federation extension can express this in terms of openEO API pointers: e.g. the API version is part of the capabilities doc GET /, but for the openEO processes version there is no such thing yet (indeed the topic of #517, thanks for that pointer).

Small differences can also occur in the API. The federation also doesn't really cater yet for differences that occur due to different process implementations, completely independent of the versions (right?). Examples: resample_spatial implements different resampling methods, different options available in ard_surface_reflectance, ...

Indeed, the process specification merging does not go arbitrarily deep yet for all possible "conflicts". This is hard to do generically, as you can imagine. In the past we've done some ad-hoc tweaks for certain use cases.

With regard to the process schemas, I think mixing raster-cube and datacube should be handled by the clients. If it hadn't become an issue in the Web Editor recently, probably no one would have noticed it at all.

True about translating cube schemas. But there are also other differences between v1 and v2 of the processes, like process renames, changes in default values, ... which cannot be handled with simple client-side tricks. The whole point of major-version bumps like this is to allow for backward-incompatible changes, so we have to be explicit about the process versions in the right places. This is especially true in federated situations.

That was never a discussion, that was just a decision from VITO how they want to do it. There's no direct linkage between API versions and process versions in openEO (except that only API 1.2 officially has a full schema for the 2.0 additions to the parameter/return value schemas). But in principle any openEO API version can implement any openEO processes version (except 0.x versions).

Indeed, this is a pragmatic choice from VITO:

  • endpoints ending in /openeo/1.0/ and /openeo/1.1/ assume/support v1.x of openeo-processes
  • endpoints ending in /openeo/1.2/ assume/support v2.x of openeo-processes

We cannot just roll out v2 of the processes everywhere, because a lot of our users have built their use case against v1, e.g. in the form of Python scripts or static JSON dumps of process graphs (e.g. UDPs). We've experienced a lot of trouble when trying to push v2 too aggressively (e.g. because of process/argument renames).
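
As an illustration of the endpoint convention listed above, a client or aggregator could derive the assumed processes major version from the endpoint URL like this. The helper name and the example host are made up, and the mapping only reflects the VITO convention as stated in this comment, not anything defined by the openEO API itself.

```python
import re
from typing import Optional

# API version path segment -> assumed openeo-processes major version
# (the pragmatic VITO convention described above).
PROCESSES_MAJOR_BY_API_VERSION = {"1.0": 1, "1.1": 1, "1.2": 2}


def processes_major_for_url(url: str) -> Optional[int]:
    """Return the assumed processes major version for an endpoint URL, if known."""
    match = re.search(r"/openeo/(\d+\.\d+)/?$", url)
    if match is None:
        return None
    return PROCESSES_MAJOR_BY_API_VERSION.get(match.group(1))


v1 = processes_major_for_url("https://backend.example/openeo/1.1/")
v2 = processes_major_for_url("https://backend.example/openeo/1.2")
```

URLs that do not follow the convention, or API versions outside the table, yield `None` rather than a guess.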

I think we also considered a construction where users can mix and match API version and process version with something like /openeo/{api_version}/{process_version}/, but that felt a bit like overkill, and it was also not compatible with the /.well-known/openeo specification.

@m-mohr
Member

m-mohr commented Nov 15, 2024

My point was that I don't think the API can ensure consistent process versions across backends, that's part of a federation contract (which the federation extension is not).
The API should probably only report which version is implemented, which is discussed in #517.
So I think from an API perspective this issue can be closed. There might be more discussion needed on individual points, but none of them really is something we'd specify in the API (except for #517).

Having different process names can't easily be solved, but the processes with different names are usually different enough that we shouldn't align them anyway. In this case it's just handled the same way as normal missing processes for other backends.
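
A rough sketch of what "handled the same way as missing processes" can mean in practice: the aggregator merges process listings by id, remembers which backends offer each process, and then only routes a process graph to backends that offer everything it uses. This is not the actual aggregator code; the function names, backend ids and the two renamed process ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def merge_process_listings(listings: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Map each process id to the backends whose listing contains it."""
    backends_by_process = defaultdict(list)
    for backend_id in sorted(listings):
        for process in listings[backend_id]:
            backends_by_process[process["id"]].append(backend_id)
    return dict(backends_by_process)


def supporting_backends(merged: Dict[str, List[str]], process_ids: Iterable[str]) -> List[str]:
    """Backends that offer every process id used in a process graph."""
    candidates = None
    for process_id in process_ids:
        offering = set(merged.get(process_id, []))
        candidates = offering if candidates is None else candidates & offering
    return sorted(candidates or [])


# Made-up listings: backend-a still uses an old name, backend-b the new one.
listings = {
    "backend-a": [{"id": "load_collection"}, {"id": "legacy_name"}],
    "backend-b": [{"id": "load_collection"}, {"id": "renamed_process"}],
}
merged = merge_process_listings(listings)
targets = supporting_backends(merged, ["load_collection", "legacy_name"])
```

A renamed process simply shows up as two separate entries, each available on a subset of the backends, so no alignment logic is needed.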

@soxofaan
Member Author

OK, makes sense. The federation extension is indeed an API between a federation and the end user. What this ticket is/was for is settling on something between the federation and the participating backends. #517/#549 should be enough to cover that at the API level, so indeed time to close.
