Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Julia BRT Pipeline and Priority Map #99

Closed
wants to merge 62 commits into from
Closed

Conversation

gottacatchenall
Copy link
Contributor

@gottacatchenall gottacatchenall commented Oct 1, 2023

This pipeline (./pipelines/julia_sdm.json) runs on my machine, few notes before trying to run:

  • This requires rebuilding the Docker image to update to Julia v1.9.3 with docker compose build
  • Currently this doesn't mask out open-water, which should probably be fixed because the SDM assigns positive probability to some open-water regions, though it fits (or overfits?) much closer to the occurrence points this happens now

@jmlord
Copy link
Contributor

jmlord commented Oct 2, 2023

About masking open water, could the BinaryLayer script be useful?

Copy link
Contributor

@jmlord jmlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok on my side with these minor changes.
We'd need an SDM person to review now...

runners/julia-dockerfile Outdated Show resolved Hide resolved
scripts/SDM/julia_sdms/generateBackground.yml Show resolved Hide resolved
scripts/SDM/julia_sdms/fitBRT.yml Outdated Show resolved Hide resolved
scripts/SDM/julia_sdms/generateBackground.yml Outdated Show resolved Hide resolved
scripts/SDM/julia_sdms/loadCHELSA.yml Show resolved Hide resolved
scripts/SDM/julia_sdms/loadCHELSA.yml Show resolved Hide resolved
pipelines/julia_sdm.json Outdated Show resolved Hide resolved
pipelines/julia_sdm.json Outdated Show resolved Hide resolved
pipelines/julia_sdm.json Outdated Show resolved Hide resolved
pipelines/julia_sdm.json Outdated Show resolved Hide resolved
@gottacatchenall gottacatchenall requested a review from glaroc October 2, 2023 15:51
@glaroc
Copy link
Contributor

glaroc commented Oct 2, 2023

I think it would be important to have spatial resolution as a pipeline input for all SDMs.

@gottacatchenall
Copy link
Contributor Author

gottacatchenall commented Oct 2, 2023

I think it would be important to have spatial resolution as a pipeline input for all SDMs.

I agree, and this also further highlights why we need a pipeline that is only about validating inputs (i.e. hypothetically the user thinks the spatial res is defined in meters when its in degrees so it goes around the earth several times)

@frousseu
Copy link
Contributor

The BRT pipeline can now be run on my Windows computer. For now, the two main things to improve would be:
1 - The ability to choose the species and the predictors
2 - There are duplicated inputs in the Input form (mask, weight matrix) and it is not clear why the weight matrix is there. Shouldn't it be in the pipeline for the priority map?
Otherwise, it would be nice to connect the julia brt with the createBackground function to get a biased background.

@frousseu
Copy link
Contributor

Otherwise, I'm getting this error from the block3.json in the priority map pipeline:

$ids
[1] "esacci-lc-2020"

Error: HTTP status '400'. Invalid parameters provided 1 validation error for SearchPostRequest
bbox
  Bounding box must be within (-180, -90, 180, 90) (type=value_error)
No traceback available 
Parent job is Cancelling: killing server process...
Parent job is Cancelling: done.

@gottacatchenall
Copy link
Contributor Author

Otherwise, I'm getting this error from the block3.json in the priority map pipeline:

$ids
[1] "esacci-lc-2020"

Error: HTTP status '400'. Invalid parameters provided 1 validation error for SearchPostRequest
bbox
  Bounding box must be within (-180, -90, 180, 90) (type=value_error)
No traceback available 
Parent job is Cancelling: killing server process...
Parent job is Cancelling: done.

This is because the bounding box has to in long/lat coordinates for Julia right now as everything is in WGS84 (between -180 and 180 long and -85,85 lat)

@jmlord
Copy link
Contributor

jmlord commented Feb 1, 2024

The BRT pipeline can now be run on my Windows computer. For now, the two main things to improve would be: 1 - The ability to choose the species and the predictors 2 - There are duplicated inputs in the Input form (mask, weight matrix) and it is not clear why the weight matrix is there. Shouldn't it be in the pipeline for the priority map? Otherwise, it would be nice to connect the julia brt with the createBackground function to get a biased background.

I think Guillaume's recent modifications should have fixed these.

Copy link
Contributor

@jmlord jmlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gottacatchenall A few clarifications in documentation please before we merge.

Also, most important: the priority map was empty when I ran it.

@@ -0,0 +1,22 @@
script: pcaLayers.jl
name: PCA Layers
description: "PCAs the layers"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lacking documentation

"label": "Bounding box",
"description": "Bounding coordinates of the extent in the order xmin, ymin, xmax, ymax, in degrees",
"type": "float[]",
"example": ["-78", "43", "-68", "65"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
The default bounding box is a bit weird.

},
"pipeline@6": {
"label": "collections_items (for climate uniqueness)",
"description": "Vector of strings, collection name followed by '|' followed by item id",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain a bit how we are supposed to choose these.

},
"pipeline@43": {
"label": "k",
"description": "the value of k for the k-means algorithm",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explains what this means, how to choose it, what impacts it has.

"example": 5
},
"Block3>weightLayers.yml@44|uncertainty": {
"description": "uncertainty in the inference of some value across space",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More doc here and in the script. I know this is meant to be generic, but maybe giving an example : an indicator we want to optimize for?
Also, what effect in the priority map?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in the folder along with the other climate layers. There should be an explanation differentiating it with Juan's work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use the R version?

type: image/tiff;application=geotiff
fit_stats:
label: fit_stats
description: JSON of model fit stats and threshold
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guidance for interpretation of the fit statistics would be appreciated.

type: image/tiff;application=geotiff
sdm_uncertainty:
label: sdm_uncertainty
description: map of relative uncertainty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guidance for interpretation: 1 uncertain and 0 certain?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I ran this, the output priority map was empty.

Other comments were added to GitHub PR.
@jmlord
Copy link
Contributor

jmlord commented Apr 3, 2024

@gottacatchenall Can you investigate why this produces no result?
See also documentation clarification requests in review above.

@gottacatchenall
Copy link
Contributor Author

Once PoisotLab/SpeciesDistributionToolkit.jl#227 is merged, I will refactor this to be merged.

@tpoisot
Copy link

tpoisot commented May 7, 2024

Once

yup

@jmlord
Copy link
Contributor

jmlord commented May 28, 2024

Hi folks,
Any update on this?
Laura's lab intends to integrate a new bias correction algorithm and could use a BRT script in their pipeline.

@tpoisot
Copy link

tpoisot commented May 28, 2024

I'm close but no ETA yet -- if all goes well, by the end of June? I'd need to see with @gottacatchenall how much work is needed to port his existing pipeline

@glaroc
Copy link
Contributor

glaroc commented May 29, 2024

@gottacatchenall Do you have a description somewhere of the procedure you use to obtain the uncertainty layer for the BRT ? Thanks.

@gottacatchenall
Copy link
Contributor Author

I've added a description of the BRT methodology (including how uncertainty is computed) here .

A few things I noticed on revisiting this:

  • Crossvalidation splits are included in the BRT script, which isn't ideal because we would like different CV methods to be different scripts that can be used an inputs to different SDMs
  • This probably necessitates a subpipeline for repeated CV on a given model using a single CV method
  • Computation of fit metrics is inevitably going to be duplicated so should be moved somewhere else

@gottacatchenall
Copy link
Contributor Author

replaced with #170

@tpoisot
Copy link

tpoisot commented Sep 19, 2024

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants