-
Notifications
You must be signed in to change notification settings - Fork 1
/
data_common_issues.qmd
80 lines (55 loc) · 3.1 KB
/
data_common_issues.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
# Common issues
Note, this chapter is a work in progress. It will be expanded over time.
## Unsupported trait values
This error occurs when, for a categorical trait, the value in [data.csv]{style="color:blue;"} is different to the value in the traits dictionary ([config/traits.yml]{style="color:blue;"}).
```{r, eval=FALSE}
table <- my_database$excluded_data %>%
filter(dataset_id == current_study) %>%
filter(error == "Unsupported trait value") %>%
select(dataset_id, trait_name, value) %>%
distinct()
```
You can individually add substitutions to [metadata.yml]{style="color:blue;"} using the function [metadata_add_substitution]{style="color:blue;"}
```{r, eval=FALSE}
metadata_add_substitution(
dataset_id = current_study,
trait_name = "plant_growth_form",
find = "T",
replace = "tree"
)
```
Or, you can add an additional column to the table output (code above) and read it into [metadata.yml]{style="color:blue;"} using the function [metadata_add_substitutions_table]{style="color:blue;"}
The table read in must have the columns [dataset_id]{style="color:blue;"}, [trait_name]{style="color:blue;"}, [find]{style="color:blue;"}, and [replace]{style="color:blue;"}.
This is a hypothetical example for a table that contains 5 rows with plant_growth_form value that need updating.
```{r, eval=FALSE}
table <- table %>%
rename(find = value) %>%
mutate(replace = c("tree", "mallee", "shrub", "graminoid", "herb"))
metadata_add_substitutions_table(
table,
dataset_id = dataset_id,
trait_name = trait_name,
find = find,
replace = replace
)
```
You can of course also write the table to a csv file, edit it in Excel or a text editor, then read it back into R.
```{r, eval=FALSE}
write_csv(table, "data/dataset_id/raw/substitutions_required.csv")
...edit outside of R
table <- read_csv("data/dataset_id/raw/substitutions_required.csv")
```
## Dataset can't pivot wider {#cannot_pivot}
In order to convert a traits.build database into a wide format, the traits.build$traits table must be able to pivot wider.
One of the tests run during `dataset_test` is the function `traits.build::check_pivot_wider()`. This function checks that each row in the `traits` table has a unique combination of a particular 7 columns:
`dataset_id`, `trait_name`, `observation_id`, `value_type`, `repeat_measurements_id`, `method_id`, `method_context_id`
There are two likely explanations -- and solutions -- to this error:
1. If your dataset combines individual (or population) level measurements with species-level measurements, the same species-level measurement may be read in many times. To solve this problem, you need to retain only the first instance of each species-level measurement, by including the following `custom_R_code`, where `taxon_name` is the column that contains taxon names and `column 1`, `column 2`, etc is a vector of the columns with categorical traits that require `de-duplicating`.
```{r, eval=FALSE}
data %>%
group_by(taxon_name) %>%
mutate(across(c("column 1", "column 2", "column 3"), replace_duplicates_with_NA))
ungroup()
```
2. Rows of data that represent measurements made at different times,
... TBC