
Machine Learning for openEO #441

Open · wants to merge 9 commits into base: draft
Conversation

Member

@m-mohr m-mohr commented May 15, 2023

Potentially interesting for "bring your own model": https://onnx.ai/

@m-mohr m-mohr added the ML label May 15, 2023
@m-mohr m-mohr added this to the 2.1.0 milestone May 15, 2023
Member Author

m-mohr commented May 15, 2023

Variants discussed in the ML meeting:

datacube = load_collection("s2", temporal_extent = ..., spatial_extent = ..., bands = ...)
model = load_ml_model("my-model-id")

# variant 1
function fn(data) {
	return predict_ml_model(data, model)
}
datacube2 = reduce_dimension(datacube, dimension = "bands", reducer = fn)

function fn2(data) {
	return predict_ml_model_probabilities(data, model)
}
datacube3 = apply_dimension(datacube, dimension = "bands", target_dimension = "probabilities", process = fn2)

# variant 2
datacube2 = predict_ml_model(datacube, dimension = "bands", model = model)
datacube3 = predict_ml_model_probabilities(datacube, dimension = "bands", model = model)
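A hedged sketch of how variant 2 might be serialized as a raw openEO process-graph node in JSON (built here as a Python dict). The process name `predict_ml_model` and the `dimension` argument come from the pseudocode above; the node ids `load1` and `loadmodel1` are hypothetical placeholders, not part of any spec.

```python
# Hypothetical process-graph node for variant 2. Node ids and the exact
# argument names are illustrative assumptions, not finalized spec.
predict_node = {
    "process_id": "predict_ml_model",
    "arguments": {
        "data": {"from_node": "load1"},        # the load_collection result
        "model": {"from_node": "loadmodel1"},  # the load_ml_model result
        "dimension": "bands",
    },
}
```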

@m-mohr m-mohr marked this pull request as draft May 16, 2023 09:08
Member Author

m-mohr commented May 16, 2023

Some things from the ML meeting:

(two images from the ML meeting omitted)

Regularization may consist of (and is mapped to openEO processes):

  • resample resolution -> resample_spatial
  • temporal aggregation -> aggregate_temporal_period
  • filter extent -> filter_bbox/spatial
  • cloud removal -> cloud_detection or masking based on cloud bands

-> combine these into a new openEO process, with commonly used arguments and reasonable defaults
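The combined process suggested above could be sketched as an ordered pipeline of the existing processes from the list. All defaults below (10 m resolution, dekadal aggregation, mean reducer) are purely illustrative assumptions, not spec:

```python
# Hedged sketch of a possible combined "regularization" pipeline.
# Every argument name and default value here is an assumption for illustration.
def regularization_steps(resolution=10, period="dekad", reducer="mean"):
    """Return ordered (process_id, arguments) pairs of a hypothetical
    regularization pipeline with commonly used defaults."""
    return [
        ("filter_bbox", {}),                               # filter extent
        ("resample_spatial", {"resolution": resolution}),  # resample resolution
        ("aggregate_temporal_period", {"period": period,   # temporal aggregation
                                       "reducer": reducer}),
        ("mask", {}),                                      # cloud removal via masking
    ]
```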

@m-mohr m-mohr force-pushed the ml branch 2 times, most recently from 22be7a9 to e854271 Compare May 16, 2023 14:52
Member Author

m-mohr commented May 22, 2023

datacube = load_collection("s2", temporal_extent = ..., spatial_extent = ..., bands = ...)
model = load_ml_model("my-model-id")

# variant 1 (does NOT work with the current definition in this PR)
function fn(data) {
	let values = ml_predict(data, model)
	return array_element(values, 0)
}
datacube2 = reduce_dimension(datacube, dimension = "bands", reducer = fn)

function fn2(data) {
	return ml_predict(data, model)
}
datacube3 = apply_dimension(datacube, dimension = "bands", target_dimension = "predictions", process = fn2)

# variant 2 (works with the current definition in this PR)
datacube2 = ml_predict(datacube, model)
datacube2 = drop_dimension(datacube2, "predictions")

datacube3 = ml_predict(datacube, model)
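To make the dimension bookkeeping of variant 2 explicit, here is a pure-Python sketch (not client code) of how the cube's dimensions would change, assuming `ml_predict` replaces the `bands` dimension with a `predictions` dimension, as the variant-1 `apply_dimension` call suggests:

```python
# Assumption: ml_predict consumes the "bands" dimension and emits "predictions".
def ml_predict_dims(dims):
    return [d for d in dims if d != "bands"] + ["predictions"]

# drop_dimension removes a dimension (only valid if it has a single label).
def drop_dimension_dims(dims, name):
    return [d for d in dims if d != name]

dims = ml_predict_dims(["x", "y", "t", "bands"])   # -> x, y, t, predictions
dims = drop_dimension_dims(dims, "predictions")    # -> x, y, t
```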

Member

PondiB commented Jun 12, 2023

@m-mohr, is there a reason why the keyword fit is used in the naming convention instead of train?
I will work on data cube regularization in a separate PR.

Member Author

m-mohr commented Jun 12, 2023

Just to align with fit_curve, I guess. Train is also fine...

Member

PondiB commented Jun 12, 2023

Just to align with fit_curve, I guess. Train is also fine...

Cool

@PondiB PondiB marked this pull request as ready for review August 23, 2023 09:03
Member

@PondiB PondiB left a comment

Looks good. Approving this now. We can continuously improve during implementations if need be.

Member

PondiB commented Sep 19, 2023

@m-mohr, is renaming these two to use an "ml" prefix a possible alternative?
load_ml_model() -> ml_load_model()
save_ml_model() -> ml_save_model()

Member Author

m-mohr commented Sep 19, 2023

Why? The current proposal follows the load_* and save_result schema.

Member

PondiB commented Sep 20, 2023

Why? The current proposal follows the load_* and save_result schema.

Cool, it makes sense to follow the previous schema. My initial thought was that it would be good for a general user if most of the ML operations started with the "ml_" prefix.

Member Author

m-mohr commented Dec 6, 2023

The STAC ML Model extension may get deprecated in favor of https://github.com/crim-ca/dlm-extension
@PondiB I think it would be great to get in touch with those folks so that we can make sure it also works for openEO.

Member

PondiB commented Dec 6, 2023

The STAC ML Model extension may get deprecated in favor of https://github.com/crim-ca/dlm-extension @PondiB I think it would be great to get in touch with those folks so that we can make sure it also works for openEO.

Sure, thanks. I just saw the notification about it; I will follow up with them.

{
"id": "save_ml_model",
"summary": "Save an ML model",
"description": "Saves a machine learning model as part of a batch job.\n\nThe model will be accompanied by a separate STAC Item that implements the [ml-model extension](https://github.com/stac-extensions/ml-model).",
Member

The model will be accompanied by a separate STAC Item that

What does "accompanied" practically mean? Should there be an additional job result asset? Or should this be a job result link item?

The reason I'm asking is that we want to streamline the detection of the model's URL at the client side.

e.g. see Open-EO/openeo-python-client#576 where we currently have a highly implementation-specific hack:

ml_model_metadata_url = [
    link 
    for link in links if 'ml_model_metadata.json' in link['href']
][0]['href']
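A slightly more defensive version of that hack could look like the sketch below. It is still implementation-specific (it assumes the backend links an `ml_model_metadata.json` in the job results, which is exactly the behavior under discussion), but it avoids the IndexError when no such link exists. The example links are made up:

```python
def find_ml_model_metadata_url(links):
    """Return the href of the first result link pointing at
    ml_model_metadata.json, or None if the backend exposed none."""
    for link in links:
        if "ml_model_metadata.json" in link.get("href", ""):
            return link["href"]
    return None

# Example with made-up result links:
links = [
    {"rel": "self", "href": "https://example.test/jobs/123/results"},
    {"rel": "item", "href": "https://example.test/jobs/123/results/ml_model_metadata.json"},
]
```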

Member Author

Good question, I guess we should clarify that.

On the other hand, please note that this PR is implicitly outdated as the ML Model extension in STAC is likely going to be replaced by another extension. So this generally needs more work (which I have no plans to do anytime soon).

Member

ML Model extension in STAC is likely going to be replaced by another extension.

Can you point to the new one, @m-mohr?


Successfully merging this pull request may close these issues:

  • Move ML processes to 2.1.0
  • predict_class and predict_probabilities