-
Notifications
You must be signed in to change notification settings - Fork 1
/
austraits_package.qmd
307 lines (207 loc) · 13.7 KB
/
austraits_package.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
# The `austraits` package
```{r, include = FALSE}
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE,
comment = "#>",
fig.path = "figures/"
)
library(dplyr)
library(stringr)
library(austraits)
austraits <- austraits:::austraits_5.0.0_lite
```
The `austraits` package was initially designed to aid users in accessing data from [AusTraits](https://zenodo.org/record/5112001), a curated plant trait database for the Australian flora. This package contains several core functions to explore, wrangle and visualise data.
In 2024 the package was generalised to support all databases built using the [traits.build workflow](https://github.com/traitecoevo/traits.build), new functions were added, and existing functions were re-worked. The structure of AusTraits evolved from its release in 2021 until present and the version 3.0 of the austraits package only supports AusTraits versions from 5.0 onwards. If you are working with AusTraits version 4.2.0 or earlier, you need to install an [old version of austraits]()
Below, we include a tutorial to illustrate how to use these functions.
**Note the examples shown us a subset of AusTraits release 5.0.0, but the code can be run using any traits.build database.**
## Getting started
`austraits` is still under development and not yet on Cran. To install the current version from GitHub:
```{r setup, eval = FALSE}
#install.packages("remotes")
#remotes::install_github("traitecoevo/austraits", dependencies = TRUE, upgrade = "ask")
## Load the austraits package
library(austraits)
```
### Loading AusTraits database
`load_austraits` is the one `austraits` function that is specific to the AusTraits database. By default, `load_austraits` will download AusTraits to a specified path e.g. `data/austraits` and will reload it from this location in the future. You can set `update = TRUE` so the austrait versions are downloaded fresh from [Zenodo](https://zenodo.org/record/3568429). Note that `load_austraits` accepts a version number or the DOI of a particular version.
If you are new to using AusTraits we recommend you download the most recent release, while you may want to download an older version to reproduce a previous analysis.
```{r, load_data, eval = FALSE}
austraits <- load_austraits(version = "6.0.0", path = "data/austraits")
```
You can check out the different versions of Austraits and their associated DOI by using:
```{r, eval = FALSE}
get_versions(path = "data/austraits")
```
The traits.build object is a very *long* list with various of elements. If you are not familiar with working with lists in R, we recommend having a quick look at this [tutorial](https://www.tutorialspoint.com/r/r_lists.htm). To learn more about the structure of a traits.build database, check out the [structure of the database](https://traitecoevo.github.io/austraits/articles/structure.html).
```{r}
austraits
```
## Descriptive summaries of traits and taxa
The perfect way to begin exploring a traits.build database is to learn which traits are included and how much data exists for various traits and taxa.
Interested in a specific trait? `lookup_trait`, `lookup_location_property` and `lookup_context_property` let you find terms based on exact and partial string matches.
```{r}
lookup_trait(database = austraits, term = "leaf") %>% head()
lookup_location_property(database = austraits, term = "temperature") %>% head()
lookup_context_property(database = austraits, term = "fire") %>% head()
```
Alternatively, have a look how much data a traits.build database has for specific traits or taxa. *This function only summarises by `trait_name`, `genus` or `family`.*
```{r}
summarise_database(database = austraits, var = "trait_name") %>% head()
summarise_database(database = austraits, var = "family") %>% head()
summarise_database(database = austraits, var = "genus") %>% head()
```
All traits.build databases include definitions for all traits. Check out the dictionary if you want to learn more about trait's that have been output by a `lookup_` or `summarise_` query.
```{r}
austraits$definitions %>% head()
austraits$definitions[["leaf_area"]] %>% convert_list_to_df1
```
## Extracting data
In most cases, users would like to extract a subset of a traits.build database for their own research purposes.`extract_dataset` subsets by dataset(s), `extract_trait`subsets by trait, and `extract_taxa` subsets by taxon_name, genus or family. In addition, the function `extract_data` extracts data based on a specified value(s) from any column of any table within a traits.build database.
Note that the other tables and elements of the AusTraits data are extracted too, not just the main traits table, retaining the database's original structure. See `?extract_data` and `?extract_trait` for more details.
### Extracting by study
Filtering **one particular study** and assigning it to an object
```{r, extract_study}
subset_data <- extract_dataset(database = austraits, dataset_id = "Falster_2005_2")
subset_data$traits %>% head()
```
Filtering **multiple studies by two different lead authors** and assigning it to an object
```{r, extract_studies}
subset_multi_studies <- extract_dataset(database = austraits,
dataset_id = c("Thompson_2001","Ilic_2000"))
subset_multi_studies$traits %>% distinct(dataset_id)
```
Filtering **multiple studies by same lead author** (e.g. Falster) and assigning it to an object.
```{r, extract_Falster_studies}
data_falster_studies <- extract_dataset(austraits, "Falster")
data_falster_studies$traits %>% distinct(dataset_id)
```
### Extracting by taxonomic level
Filtering
```{r extract_taxa}
# By family
proteaceae <- extract_taxa(austraits, family = "Proteaceae")
# Checking that only taxa in Proteaceae have been extracted
proteaceae$taxa$family %>% unique()
# By genus
acacia <- extract_taxa(austraits, genus = "Acacia")
# Checking that only taxa in Acacia have been extracted
acacia$traits$taxon_name %>% unique() %>% head()
```
### Extracting by trait
Filtering **one trait** and assigning it to an object
```{r, extract_trait}
data_wood_dens <- extract_trait(austraits, "wood_density")
head(data_wood_dens$traits)
```
Using `extract_trait` to extract data for **all traits with 'leaf' in the trait name** and assigning it to an object.
```{r}
data_leaf <- extract_trait(austraits, "leaf")
unique(data_leaf$traits$trait_name)
```
Using `extract_trait` to extract data for **all traits with 'leaf_N' or 'leaf_P' in the trait name** and assigning it to an object.
```{r}
data_vector_extraction <- extract_trait(austraits, c("leaf_photosyn", "huber_value"))
unique(data_vector_extraction$traits$trait_name)
```
The function `extract_data` offers the flexibility of subsetting a database based on any combination of data table, column within the table, and column value.
The database tables you can subset on are: 'traits', 'locations', 'contexts', 'methods', 'contributors', 'taxa', and 'taxonomic_updates'
```{r}
observations_with_soil_data <- extract_data(
database = austraits,
table = "locations",
col = "location_property",
col_value = "soil")
# If you are unsure about column names, check with:
names(austraits$methods)
```
Any of the extract functions can be linked together to output a more precise subset of data.
For instance, to return data on 'leaf mass per area' and 'wood density' for 'adult' plants of any species in the genus Acacia:
```{r}
Acacias_specific_traits <- austraits %>%
extract_taxa(genus = "Acacia") %>%
extract_trait(trait_name = c("leaf_mass_per_area", "wood_density")) %>%
extract_data(table = "traits", col = "life_stage", col_value = "adult")
```
## Join data from other tables and elements
Once users have extracted the data they want, they may want to merge the study metadata stored in various relational tables into the main `traits` dataframe for their analyses. For example, users may require additional taxonomic information for a phylogenetic analysis, location coordinates to plot data, or context properties to understand variation in trait values for a single taxon. This is where the `join_` functions come in.
There are seven `join_` functions in total, each designed to append specific information from other tables and elements in the `austraits`the ancillary data tables in a `traits.build` object. Their suffixes refer to the type of information that is joined, e.g. `join_taxonomy` appends taxonomic information to the `traits` dataframe. Each function lets you select which data columns you wish to add and the output format.
```{r, join_}
# Join location coordinates
(data_leaf %>% join_location_coordinates)$traits %>% head()
# Join location properties using defaults
(data_leaf %>% join_location_properties)$traits %>% head()
# Join location properties, with each location property pertaining to soil added as a separate column.
temperature_properties <- lookup_location_property(data_leaf, "temperature")
(data_leaf %>% join_location_properties(format = "many_columns", vars = temperature_properties))$traits %>% head()
# Join context properties using defaults
(data_leaf %>% join_context_properties)$traits %>% head()
# Join context properties, with each context property pertaining to fire added as a separate column.
fire_properties <- lookup_context_property(data_leaf, "fire")
(austraits %>% join_context_properties(format = "many_columns", vars = fire_properties, include_description = TRUE))$traits %>% head()
# Join methodological information using defaults
(data_leaf %>% join_methods)$traits %>% head()
# Join taxonomic information using defaults
(data_leaf %>% join_taxa)$traits %>% head()
# Join taxonomic updates information using defaults
(data_leaf %>% join_taxonomic_updates)$traits %>% head()
# Join contributors information using defaults
(data_leaf %>% join_contributors)$traits %>% head()
```
All data tables can be merged with `flatten_database`, which calls each of the `join_` functions.
```{r}
# Flatten database using defaults
all_joined <- data_leaf %>% flatten_database()
# Flatten database also accepts a list of vectors to specify which columns to include for each table
data_leaf %>% flatten_database(
format = "single_column_json",
vars = list(
location = "all",
context = "all",
contributors = "all",
taxonomy = c("family"),
taxonomic_updates = c("aligned_name"),
methods = c("methods"))
)
```
## Binding extracted databases
The function `bind_databases` allows you to bind together two subsetted databases you have created by filtering (`extract_` functions) based on two critera.
```{r}
extracted_wood <- austraits %>% extract_trait("wood_density")
extracted_Rutaceae <- austraits %>% extract_taxa(family = "Rutaceae")
merged <- bind_databases(extracted_wood, extracted_Rutaceae)
```
## Visualising data by site
`plot_locations` graphically summarises where trait data were collected and how much data is available. The legend refers to the number of neighbouring points: the warmer the colour, the more data that are available for a particular location. This function only includes data that are geo-referenced. Users must first use `join_location_coordinates` to append latitude and longitude information into the trait dataframe before plotting
```{r, site_plot, fig.align = "center", fig.width=5, fig.height=5}
data_wood_dens <- data_wood_dens %>% join_location_coordinates()
plot_locations(data_wood_dens$traits)
```
## Visualising data distribution and variance
`plot_trait_distribution_beeswarm` creates histograms and [beeswarm plots](https://github.com/eclarke/ggbeeswarm) for continuous traits to help users visualise the data. Users can specify whether to create separate beeswarm plots based on any column in the traits table (e.g. `dataset_id` or `life_stage`) or at the level of `genus` or `family`.
```{r, beeswarm, fig.align = "center", fig.width=6, fig.height=7}
austraits %>% plot_trait_distribution_beeswarm("wood_density", "family")
austraits %>% plot_trait_distribution_beeswarm("wood_density", "dataset_id")
```
## Pivotting from long to wide format
The table of traits in AusTraits comes in **long** format, where data for all trait information are denoted by two columns called `trait_name` and `value`. You can convert this to wide format, where each trait is in a separate column, using the function `trait_pivot_wider`. Note, that the informtion in the columns `unit`, `replicates`, `measurement_remarks`, and `basis_of_value` is lost when this function pivots.
```{r, pivot}
data_wide_bound <- data_falster_studies %>% # Joining multiple obs with `--`
trait_pivot_wider()
```
If there are multiple measurements linked to the same observation_id, such as when both a minimum and maximum are recorded, this information is retained as separate rows when using `trait_pivot_wider`. Instead, you can use `bind_trait_values` first, which merges multiple entries for `value`, `value_type`, `basis_of_value`, and `replicates` into a single row, with values delimited by "--".
```{r}
bound_values <- (austraits %>% extract_dataset("ABRS_1981"))$traits %>%
bind_trait_values()
bounded_wider <- bound_values %>% trait_pivot_wider
```
If you would like to revert the bounded trait values, you have to use `separate_trait_values`. Note, this function does not always recreate the original table as the delimitor "--" is also used for value_type `bin` and `range`, which should not necessarily be split. *This function is on the list to rework for future `austraits` versions.*
```{r}
bound_values2 <- (austraits %>%
extract_dataset("ABRS_1981") %>%
extract_data(table = "traits", col = "value_type", col_value = c("minimum", "maximum"))
)$traits %>%
bind_trait_values()
bound_values2 %>%
separate_trait_values(., austraits$definitions)
```