-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
152 lines (106 loc) · 6.5 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
---
output:
md_document:
variant: markdown_github
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
[![Build Status](https://travis-ci.org/boyanangelov/sdmbench.svg?branch=master)](https://travis-ci.org/boyanangelov/sdmbench)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1436376.svg)](https://doi.org/10.5281/zenodo.1436376)
[![DOI](http://joss.theoj.org/papers/10.21105/joss.00847/status.svg)](https://doi.org/10.21105/joss.00847)
# sdmbench
![](logo.png)
Species Distribution Modeling (SDM) is a field of increasing importance in ecology<sup>[1](#footnote1)</sup>. Several popular applications of SDMs are understanding climate change effects on species<sup>[2](#footnote2)</sup>, natural reserve planning<sup>[3](#footnote3)</sup> and invasive species monitoring<sup>[4](#footnote4)</sup>. The `sdmbench` package solves several issues related to the development and evaluation of those models by providing a consistent benchmarking workflow:
* consistent species occurence data acquisition and preprocessing
* consistent environmental data acquisition and preprocessing (both current data and future projections)
* consistent spatial data partitioning
* integration of a wide variety of machine learning models
* graphical user interface for non/semi-technical users
The end result of a `sdmbench` SDM analysis is to determine the model - data processing combination that results in the highest predictive power for the species of interest. Such an analysis is useful to researchers who want to avoid issues of model selection and evaluation, and want to rapidly test prototypes of species distribution models.
## Installation
```{r eval=FALSE}
# add build_vignettes = TRUE if you want the package vignette
devtools::install_github("boyanangelov/sdmbench")
```
There are several additional packages you need to install if you want to access the complete sdmbench functionality. First Tensorflow. You can use the `keras` package to install that (it is installed by the previous command). Note that this step requires a working Python installation on your system. Most modern operating systems have Python pre-installed, but if you are not sure you can check the [official website](https://www.python.org/).
```{r eval=FALSE}
# consult the keras documentation if you want GPU support
keras::install_keras(tensorflow = "default")
```
Additionally you will need MaxEnt. Installation instructions are available [here](https://www.rdocumentation.org/packages/dismo/versions/1.1-4/topics/maxent). Note that this requires Java which you can get get from [here](http://www.oracle.com/technetwork/java/javase/downloads/index.html).
## Examples
Here are several examples of what you can do with `sdmbench`. Downloading and prepare benchmarking data:
```{r example}
library(sdmbench)
benchmarking_data <- get_benchmarking_data("Loxodonta africana", limit = 1200, climate_resolution = 10)
head(benchmarking_data$df_data)
```
Preparing data for benchmarking (i.e. add a spatial partitioning method):
```{r}
data("wrld_simpl", package = "maptools")
benchmarking_data$df_data <- partition_data(dataset_raster = benchmarking_data$raster_data,
dataset = benchmarking_data$df_data,
env = benchmarking_data$raster_data$climate_variables,
method = "block")
learners <- list(mlr::makeLearner("classif.randomForest", predict.type = "prob"),
mlr::makeLearner("classif.logreg", predict.type = "prob"),
mlr::makeLearner("classif.rpart", predict.type = "prob"),
mlr::makeLearner("classif.ksvm", predict.type = "prob"))
benchmarking_data$df_data <- na.omit(benchmarking_data$df_data)
```
Benchmarking machine learning models on parsed species occurence data:
```{r message=FALSE, warning=FALSE}
bmr <- benchmark_sdm(benchmarking_data$df_data,
learners = learners,
dataset_type = "block",
sample = FALSE)
best_results <- get_best_model_results(bmr)
best_results
```
Plot best model results:
```{r}
bmr_models <- mlr::getBMRModels(bmr)
plot_sdm_map(raster_data = benchmarking_data$raster_data,
bmr_models = bmr_models,
model_id = best_results$learner.id[1],
model_iteration = best_results$iter[1],
map_type = "static") +
raster::plot(wrld_simpl,
add = TRUE,
border = "darkgrey")
```
## Using custom data
If you are interested in bringing your own data (rather than using GBIF) you can toggle the checkmark in the sidebar and upload it. The required format is as follows:
| bio1 | bio2 | bio3 | bio4 | bio [...] | bio 19 | label |
|--- |--- |--- |--- |--- |--- |--- |
| | | | | | | |
| | | | | | | |
where `label` is `0/1`. At the moment custom data is supported only for `General Models`, and has no mapping capability.
A good starting point to discover the full package functionality is to start the GUI with `run_sdmbench()`. Here are some screenshots:
![](vignettes/gui_screenshots/screenshot_1.png)
<br>
<br>
![](vignettes/gui_screenshots/screenshot_2.png)
## Vignette
A thorough introduction to the package is available as a vignette in the package, and [online](https://boyanangelov.com/materials/sdmbench_vignette.html).
```r
# open vignette
vignette("sdmbench")
```
## Tests
The package tests are in the `tests` directory, and can be run by using `devtools::test` or Ctrl/Cmd + Shift + T within RStudio.
## Contributors
Contributions are welcome and guidelines are stated in `CONTRIBUTING.md`.
## License
MIT (`LICENSE.md`)
## References
<a name="footnote1">1</a>. Elith, J. & Leathwick, J. R. Species Distribution Models: Ecological Explanation and Prediction Across Space and Time. Annu. Rev. Ecol. Evol. Syst. 40, 677–697 (2009).
<a name="footnote2">2</a>. Austin, M. P. & Van Niel, K. P. Improving species distribution models for climate change studies: Variable selection and scale. J. Biogeogr. 38, 1–8 (2011).
<a name="footnote3">3</a>. Guisan, A. et al. Predicting species distributions for conservation decisions. Ecol. Lett. 16, 1424–1435 (2013).
<a name="footnote4">4</a>. Descombes, P. et al. Monitoring and distribution modelling of invasive species along riverine habitats at very high resolution. Biol. Invasions 18, 3665–3679 (2016).