
Add machine-learning Processes #91

Open
wants to merge 7 commits into
base: develop

MichaelBrueggemann
Contributor

MichaelBrueggemann commented Feb 9, 2024

What?

New Processes

The two new Processes, train_model and apply_prediction, form the basis for machine learning in openeocubes. train_model trains models supported by the caret package and stores the trained models as .rds files in the openeocubes workspace.
apply_prediction uses these models and the apply_pixel() function from the gdalcubes package to apply the machine-learning model to each pixel of a datacube built with openeocubes.
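
To illustrate the per-pixel idea, here is a minimal sketch in base R. It is not the code from this PR: it uses lm() as a stand-in for a model trained with caret, and a plain data frame (one row per pixel, one column per band) as a stand-in for a gdalcubes datacube. The band names B04 and B08 and the helper predict_pixel() are hypothetical.

```r
# Stand-in for a trained model: base R lm() instead of a caret model,
# so the sketch runs without extra packages.
set.seed(42)
training <- data.frame(B04 = runif(50), B08 = runif(50))
training$target <- 2 * training$B08 - training$B04 + rnorm(50, sd = 0.01)
model <- lm(target ~ B04 + B08, data = training)

# Hypothetical "datacube" flattened to one row per pixel, one column per band.
pixels <- data.frame(
  B04 = c(0.1, 0.2, 0.3, 0.4),
  B08 = c(0.5, 0.6, 0.7, 0.8)
)

# Hypothetical helper: takes the named band values of a single pixel
# and returns the model's prediction for that pixel.
predict_pixel <- function(band_values, model) {
  newdata <- as.data.frame(as.list(band_values))
  unname(predict(model, newdata = newdata))
}

# Apply the model pixel by pixel (row by row).
predicted <- apply(pixels, 1, predict_pixel, model = model)
print(predicted)
```

In the actual Process, the row-wise apply() would presumably be replaced by gdalcubes' apply_pixel(), which evaluates a function on the band values of each pixel of the cube.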

New File Formats

This PR also adds support for saving openeocubes outputs as .rds files. These can be trained models, but the "proxy datacubes" returned by gdalcubes can also be saved this way.
The trained models can be downloaded with any openEO client.
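
As a rough sketch of the .rds round trip a user could do after downloading a model, assuming only base R: saveRDS() serializes a single R object to a file and readRDS() restores it. The lm() model on the built-in cars dataset and the temporary file path are stand-ins, not the models or workspace paths used by openeocubes.

```r
# Stand-in model; in openeocubes this would be a model trained via train_model.
model <- lm(dist ~ speed, data = cars)

# Write the model to an .rds file (a temporary path is used for illustration).
path <- tempfile(fileext = ".rds")
saveRDS(model, file = path)

# Later, e.g. on the user's local device, restore the model and reuse it.
restored <- readRDS(path)
prediction <- predict(restored, newdata = data.frame(speed = 10))
print(prediction)
```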

New Code Layout

These changes also propose a new way of organizing the source code in this repository. Instead of storing all processes in one file (e.g. processes.R), each Process now has its own .R file. Every Process is also split into two parts: the operation and the Process. The operation is the actual R function that is called by the openeocubes Process-Manager. I suggest suffixing the function name with "_opp" to indicate that it is a function belonging to an openeocubes Process and shouldn't be used on its own.
The operation is then attached to the process object, so that each process works the same way as in older versions of openeocubes.
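
The split described above could look roughly like the following sketch. It is an assumption for illustration only: the process object is modeled as a plain list rather than the real openeocubes Process class, and the names add_offset and add_offset_opp are hypothetical.

```r
# Operation: a plain R function, suffixed with "_opp".
# Because it has no dependency on the process object, it can be unit tested directly.
add_offset_opp <- function(data, offset = 1) {
  data + offset
}

# Process: metadata plus the attached operation.
# A plain list stands in for the real process object here.
add_offset <- list(
  id = "add_offset",
  description = "Adds a constant offset to all values.",
  operation = add_offset_opp
)

# A process manager would look up the process and call its operation.
result <- add_offset$operation(c(1, 2, 3), offset = 10)
print(result)
```

Keeping the operation as a standalone function is what makes formal unit tests straightforward, since a test can call add_offset_opp() without constructing a process or a running backend.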

Why?

Currently, openeocubes doesn't support machine learning on datacubes. This PR proposes a first draft for the Random Forest algorithm and a preliminary design that should be refined for further machine-learning algorithms. It also adds support for downloading .rds files, so that users can access the trained models on their local device and reuse them.

It also proposes a new code layout that makes the code more readable and easier to test with formal testing methods (unit tests).
