Commit

Address potential data mistakes flagged by ANU students (#790)
Fixes issue #775 

* I collapsed some duplicate context property names
* Removed a special character from traits.yml, but @ehwenk, I'm not sure if this is allowed with the new APD workflow
* ABRS_1981 measurements will apparently be removed anyway with @ehwenk's work cleaning duplicates
* Fixed issues and documented in another column here:

https://github.com/traitecoevo/austraits.build/files/13519251/AusTraits.potential.errors.-.Sheet1.csv
yangsophieee authored May 13, 2024
1 parent dcc9276 commit bf9d41e
Showing 21 changed files with 142 additions and 70 deletions.
4 changes: 2 additions & 2 deletions config/traits.yml
@@ -4362,7 +4362,7 @@ traits:
(at dispersal, mature); 'air dried' (at local ambient conditions); 'seed bank
air dried' (to 15% relative humidity); and 'oven dried' (>100 deg C for a
set number of hours; e.g. seed bank standard is 103 deg C for 17 hours). It
is expected that some observations in AusTraits mapped onto ‘seed_dry_mass'
is expected that some observations in AusTraits mapped onto 'seed_dry_mass'
will actually include both the seed and some dispersal tissue, if the two
cannot easily be separated; these should be mapped to 'diaspore_dry_mass'.
type: numeric
@@ -6741,7 +6741,7 @@ traits:
their host plant. The trait `root_structures` includes specialised root structures
type: categorical
allowed_values_levels:
saprophyte: Plant that acquires nutrients from decaying biomass.
saprophyte: Plant that acquires nutrients from decaying biomass.
carnivorous: Plant that acquires some or most of its nutrients from animals or protozoa.
nutrient_mining: Plant that uses a specialised root structure, such as cluster roots, to mine nutrients inaccessible to other plant taxa.
entity_URI: https://w3id.org/APD/traits/trait_0030017
22 changes: 11 additions & 11 deletions data/ABRS_1981/data.csv
@@ -13086,7 +13086,7 @@ Aloe parvibracteata,leaf length maximum,cm,40
Aloe saponaria,leaf length maximum,cm,16
Aloe vera,leaf length maximum,cm,60
Alpinia arctiflora,leaf length maximum,cm,50
Alpinia arundelliana,leaf length maximum,cm,25
Alpinia arundelliana,leaf length maximum,mm,25
Alpinia coerulea,leaf length maximum,cm,40
Alpinia hylandii,leaf length maximum,cm,16
Alpinia modesta,leaf length maximum,cm,30
@@ -13603,7 +13603,7 @@ Choretrum glomeratum var. glomeratum,leaf length maximum,,
Choretrum lateriflorum,leaf length maximum,mm,2
Choretrum pauciflorum,leaf length maximum,mm,1
Choretrum pritzelii,leaf length maximum,mm,1.5
Choretrum spicatum,leaf length maximum,cm,2.5
Choretrum spicatum,leaf length maximum,mm,2.5
Cinnamomum baileyanum,leaf length maximum,cm,13
Cinnamomum laubatii,leaf length maximum,cm,14.5
Cinnamomum oliveri,leaf length maximum,cm,17
@@ -13794,7 +13794,7 @@ Coopernookia chisholmii,leaf length maximum,cm,9
Coopernookia georgei,leaf length maximum,cm,5
Coopernookia polygalacea,leaf length maximum,cm,30
Coopernookia scabridiuscula,leaf length maximum,cm,8
Coopernookia strophiolata,leaf length maximum,cm,35
Coopernookia strophiolata,leaf length maximum,mm,35
Cordyline cannifolia,leaf length maximum,cm,50
Cordyline congesta,leaf length maximum,cm,40
Cordyline fruticosa,leaf length maximum,cm,80
@@ -14573,14 +14573,14 @@ Gonocarpus humilis,leaf length maximum,mm,18
Gonocarpus implexus,leaf length maximum,mm,11
Gonocarpus intricatus,leaf length maximum,cm,1.3
Gonocarpus leptothecus,leaf length maximum,mm,30
Gonocarpus longifolius,leaf length maximum,cm,25
Gonocarpus longifolius,leaf length maximum,mm,25
Gonocarpus mezianus,leaf length maximum,mm,17
Gonocarpus micranthus,leaf length maximum,mm,13
Gonocarpus micranthus subsp. micranthus,leaf length maximum,mm,7
Gonocarpus micranthus subsp. ramosissimus,leaf length maximum,mm,13
Gonocarpus montanus,leaf length maximum,mm,6
Gonocarpus nodulosus,leaf length maximum,mm,6
Gonocarpus oreophilus,leaf length maximum,cm,35
Gonocarpus oreophilus,leaf length maximum,mm,35
Gonocarpus paniculatus,leaf length maximum,cm,5
Gonocarpus pithyoides,leaf length maximum,mm,15
Gonocarpus pusillus,leaf length maximum,mm,8
@@ -17130,7 +17130,7 @@ Viscum whitei,leaf length maximum,cm,5.5
Viscum whitei subsp. flexicaule,leaf length maximum,cm,5
Viscum whitei subsp. whitei,leaf length maximum,cm,4
Watsonia versfeldii var. alba,leaf length maximum,cm,120
Wikstroemia indica,leaf length maximum,cm,80
Wikstroemia indica,leaf length maximum,mm,80
Wilkiea angustifolia,leaf length maximum,cm,21
Wilkiea austroqueenslandica,leaf length maximum,cm,21
Wilkiea cordata,leaf length maximum,cm,26
@@ -17389,7 +17389,7 @@ Aloe parvibracteata,leaf length minimum,cm,30
Aloe saponaria,leaf length minimum,cm,12
Aloe vera,leaf length minimum,cm,40
Alpinia arctiflora,leaf length minimum,,
Alpinia arundelliana,leaf length minimum,cm,12
Alpinia arundelliana,leaf length minimum,mm,12
Alpinia coerulea,leaf length minimum,,
Alpinia hylandii,leaf length minimum,cm,10
Alpinia modesta,leaf length minimum,cm,10
@@ -17906,7 +17906,7 @@ Choretrum glomeratum var. glomeratum,leaf length minimum,,
Choretrum lateriflorum,leaf length minimum,mm,1
Choretrum pauciflorum,leaf length minimum,mm,0.5
Choretrum pritzelii,leaf length minimum,,
Choretrum spicatum,leaf length minimum,mm,15
Choretrum spicatum,leaf length minimum,mm,1.5
Cinnamomum baileyanum,leaf length minimum,cm,5
Cinnamomum laubatii,leaf length minimum,cm,8
Cinnamomum oliveri,leaf length minimum,cm,8.5
@@ -18097,7 +18097,7 @@ Coopernookia chisholmii,leaf length minimum,cm,4
Coopernookia georgei,leaf length minimum,cm,2
Coopernookia polygalacea,leaf length minimum,cm,12
Coopernookia scabridiuscula,leaf length minimum,cm,4
Coopernookia strophiolata,leaf length minimum,cm,10
Coopernookia strophiolata,leaf length minimum,mm,10
Cordyline cannifolia,leaf length minimum,cm,20
Cordyline congesta,leaf length minimum,cm,30
Cordyline fruticosa,leaf length minimum,cm,25
@@ -18876,14 +18876,14 @@ Gonocarpus humilis,leaf length minimum,mm,11
Gonocarpus implexus,leaf length minimum,mm,6
Gonocarpus intricatus,leaf length minimum,cm,1
Gonocarpus leptothecus,leaf length minimum,mm,25
Gonocarpus longifolius,leaf length minimum,cm,15
Gonocarpus longifolius,leaf length minimum,mm,15
Gonocarpus mezianus,leaf length minimum,mm,7
Gonocarpus micranthus,leaf length minimum,mm,3
Gonocarpus micranthus subsp. micranthus,leaf length minimum,mm,3
Gonocarpus micranthus subsp. ramosissimus,leaf length minimum,mm,10
Gonocarpus montanus,leaf length minimum,mm,3.5
Gonocarpus nodulosus,leaf length minimum,mm,3
Gonocarpus oreophilus,leaf length minimum,cm,10
Gonocarpus oreophilus,leaf length minimum,mm,10
Gonocarpus paniculatus,leaf length minimum,cm,2
Gonocarpus pithyoides,leaf length minimum,mm,10
Gonocarpus pusillus,leaf length minimum,mm,7
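
To see the size of the unit corrections in the data.csv hunks above, here is a minimal base R sketch that converts three of the changed "leaf length maximum" rows to millimetres under the old and the new unit. The three values and units are copied from the diff; the data frame layout and the `to_mm` lookup are illustrative assumptions, not part of the AusTraits build code.

```r
# Rows copied from the "leaf length maximum" hunks above; only the unit changed.
rows <- data.frame(
  species  = c("Alpinia arundelliana", "Gonocarpus longifolius", "Wikstroemia indica"),
  value    = c(25, 25, 80),
  unit_old = c("cm", "cm", "cm"),
  unit_new = c("mm", "mm", "mm"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup of conversion factors to millimetres.
to_mm <- c(mm = 1, cm = 10)

# Length in millimetres implied by the old and by the corrected unit.
rows$mm_old <- unname(rows$value * to_mm[rows$unit_old])
rows$mm_new <- unname(rows$value * to_mm[rows$unit_new])

print(rows[, c("species", "mm_old", "mm_new")])
```

Each correction shrinks the implied leaf length by a factor of ten, which is why a single wrong unit is enough to produce an outlier.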
13 changes: 9 additions & 4 deletions data/ABRS_1981/metadata.yml
@@ -17,7 +17,7 @@ contributors:
dataset_curators: Rachael Gallagher
dataset:
data_is_long_format: yes
custom_R_code: '
custom_R_code: '
data$i <- seq_len(nrow(data));
data_numeric <- data %>%
@@ -82,12 +82,14 @@ dataset:
value = ifelse(species_name %in% c("Piper hederaceum var. hederaceum") & trait == "lifeform","climber_vine_woody",value),
value = ifelse(species_name %in% c("Stephania japonica var. discolor") & trait == "lifeform","vine",value),
value = ifelse(species_name %in% c("Caesalpinia subtropica") & trait == "lifeform","climber_woody",value),
trait = ifelse(value %in% c("HE", "A", "E"), "plant_growth_substrate", trait)
trait = ifelse(value %in% c("HE", "A", "E"), "plant_growth_substrate", trait),
value = ifelse(stringr::str_detect(species_name, "^Xanthorrhoea") &
stringr::str_detect(trait, "^leaf length"), NA, value)
) %>%
group_by(species_name, trait) %>%
distinct(value, .keep_all = TRUE) %>%
ungroup()
'
'
collection_date: unknown/2015
taxon_name: species_name
trait_name: trait
@@ -98,7 +100,10 @@ dataset:
sampling_strategy: herbarium specimens
original_file: STU
notes: Request ackowledgment "Data was sourced from the Flora of Australia with
permission from the Australian Biological Resources Study"
permission from the Australian Biological Resources Study"; Choretrum spicatum
leaf_length in data.csv corrected to 15 to 1.5, leaf_length units corrected for
Alpinia arundelliana, Choretrum spicatum, Coopernookia strophiolata, Gonocarpus
longifolius, Gonocarpus oreophilus, Wikstroemia indica, due to likely typos.
locations: .na
contexts: .na
traits:
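
The new `str_detect()` lines in `custom_R_code` blank out every leaf length value for taxa whose name starts with "Xanthorrhoea". A rough base R equivalent, using `grepl()` in place of `stringr::str_detect()`, is sketched below. The three rows and their values are invented for illustration and are not taken from data.csv.

```r
# Invented rows in the long format used by ABRS_1981 (values are placeholders).
data <- data.frame(
  species_name = c("Xanthorrhoea australis", "Xanthorrhoea australis", "Aloe vera"),
  trait        = c("leaf length maximum", "lifeform", "leaf length maximum"),
  value        = c("999", "T", "60"),
  stringsAsFactors = FALSE
)

# Both conditions must hold: genus matches AND the trait is a leaf length.
is_target <- grepl("^Xanthorrhoea", data$species_name) &
  grepl("^leaf length", data$trait)

# Set only the targeted values to NA; other traits for the genus are kept.
data$value[is_target] <- NA

print(data)
```

Anchoring both patterns with `^` keeps the rule narrow: other Xanthorrhoea traits and other genera are untouched.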
5 changes: 3 additions & 2 deletions data/ABRS_2023/data.csv
@@ -152130,7 +152130,8 @@ Flora_of_Australia,Thysanotus rectantherus,seed_breadth,seeds,1,,mm,maximum
Flora_of_Australia,Thysanotus sabulosus,seed_breadth,seeds,1,,mm,maximum
Flora_of_Australia,Thysanotus scaber,seed_breadth,seeds,1.5,,mm,maximum
Flora_of_Australia,Thysanotus sparteus,seed_breadth,seeds,1,,mm,maximum
Flora_of_Australia,Thysanotus tenellus,seed_breadth,seeds,0.5,,mm,maximum
Flora_of_Australia,Thysanotus tenellus,seed_breadth,seeds,1.5,,mm,maximum
Flora_of_Australia,Thysanotus tenellus,seed_length,seeds,1.5,,mm,maximum
Flora_of_Australia,Thysanotus tenuis,seed_breadth,seeds,1,,mm,maximum
Flora_of_Australia,Thysanotus thyrsoideus,seed_breadth,seeds,1,,mm,maximum
Flora_of_Australia,Thysanotus triandrus,seed_breadth,seeds,1,,mm,maximum
@@ -169008,7 +169009,7 @@ Flora_of_Australia,Thysanotus sabulosus,seed_width,seeds,1,,mm,maximum
Flora_of_Australia,Thysanotus scaber,seed_width,seeds,1.5,,mm,maximum
Flora_of_Australia,Thysanotus sparteus,seed_width,seeds,1,,mm,maximum
Flora_of_Australia,Thysanotus speckii,seed_width,seeds,1.5,,mm,maximum
Flora_of_Australia,Thysanotus tenellus,seed_width,seeds,0.5,,mm,maximum
Flora_of_Australia,Thysanotus tenellus,seed_width,seeds,1.5,,mm,maximum
Flora_of_Australia,Thysanotus tenuis,seed_width,seeds,1,,mm,maximum
Flora_of_Australia,Thysanotus thyrsoideus,seed_width,seeds,1,,mm,maximum
Flora_of_Australia,Thysanotus triandrus,seed_width,seeds,1,,mm,maximum
2 changes: 1 addition & 1 deletion data/ABRS_2023/metadata.yml
@@ -136,7 +136,7 @@ contexts:
description: Trait value inferred from the value of a different trait.
- value: inferred_from_taxonomy
description: Trait value inferred from a higher level taxon description.
- context_property: entity_measured
- context_property: entity measured
category: method_context
var_in: entity_measured
- context_property: replicate observations
11 changes: 9 additions & 2 deletions data/Briggs_2010/metadata.yml
@@ -26,7 +26,13 @@ contributors:
dataset_curators: Rachael Gallagher
dataset:
data_is_long_format: no
custom_R_code: data %>% mutate(site_name = "Terrick Terrick National Park")
custom_R_code: '
data %>%
mutate(
site_name = "Terrick Terrick National Park",
`Seed mass (mg)` = `Seed mass (mg)`/50
)
'
collection_date: 2007/2007
taxon_name: Scientific Name
location_name: site_name
@@ -41,7 +47,8 @@ dataset:
database.xls" extracted. Original copy of the excel file located in Google Drive
in the folder "Morgan_2011_1 Morgan_2011_2 Morgan_2014 Angevin_2010 Briggs_2010
Cross_2011 Lunt_2012 Roberts_2006 Scott_2010"
notes: none
notes: Divided all seed mass measurements by 50 (the number of replicates) as otherwise
the measurements are outliers (seem to be off by 50 times).
locations:
Terrick Terrick National Park:
latitude (deg): -36.1738
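
A minimal sketch of the Briggs_2010 rescaling, written in base R rather than the dplyr pipeline used in `custom_R_code`. The taxon labels and masses are made-up illustration values; the only things taken from the commit are the column name, the site name and the divisor of 50.

```r
# Hypothetical stand-in for the raw Briggs_2010 table; values are invented.
data <- data.frame(
  `Scientific Name` = c("Species A", "Species B"),
  `Seed mass (mg)`  = c(25, 110),
  check.names = FALSE
)

# Same two steps as the commit: add the site name, then treat each recorded
# mass as a 50-replicate total and convert it to a per-seed mass.
data$site_name <- "Terrick Terrick National Park"
data[["Seed mass (mg)"]] <- data[["Seed mass (mg)"]] / 50

print(data)
```

Because the division happens in `custom_R_code`, the raw data.csv stays as supplied and the correction is visible alongside the note in metadata.yml.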
58 changes: 46 additions & 12 deletions data/Kew_2019_1/metadata.yml
@@ -16,7 +16,7 @@ contributors:
dataset_curators: Elizabeth Wenk
dataset:
data_is_long_format: no
custom_R_code: '
custom_R_code: '
data %>%
mutate(
ref_year = stringr::str_extract(refshort, "(?<!\\d)\\d{4}(?!\\d)"),
@@ -46,21 +46,32 @@ dataset:
"REFERENCE: ", original_dataset_id
),
measurement_remarks = stringr::str_replace(
measurement_remarks,
measurement_remarks,
"MATERIAL: NA; ", ""),
measurement_remarks = stringr::str_replace(
measurement_remarks,
"NOTES: NA; ", "")
) %>%
filter(!original_dataset_id %in% c("Jurado_1991", "Moles_2000", "Milberg_1998")) %>%
arrange(taxa, materialweigheddesc, thousandseedweight, original_dataset_id) %>%
measurement_remarks,
"NOTES: NA; ", ""),
seed_mass_method = case_when(
stringr::str_detect(seedweightnotes, "wet mass") ~ "Weight refers to wet mass of mature seed.",
stringr::str_detect(seedweightnotes, "air-dry|air dry|Air-dry") ~ "Weight refers to the air-dried mass of mature seed.",
stringr::str_detect(seedweightnotes, "oven dry|dried at|dried to|dried in|dried for") ~ "Weight refers to the oven-dried mass of mature seed.",
stringr::str_detect(seedweightnotes, "fresh|Fresh") ~ "Weight refers to fresh mass of mature seed.",
stringr::str_detect(seedweightnotes, "mean weight of 20 dry seeds|Dired seed") ~ "Weight refers to dry mass of mature seed.",
stringr::str_detect(seedweightnotes, "dry weight") & stringr::str_detect(seedweightnotes, "SE=") ~ "Weight refers to dry mass of mature seed.",
seedweightnotes %in% c("Dry weight", "Dry weight.", "dry", "Weight refers to seed dry mass", "Seed dry-weight.") ~ "Weight refers to dry mass of mature seed.",
TRUE ~ "Weight probably refers to dry mass (see measurement remarks)."),
thousandseedweight = ifelse(taxa == "Angophora bakeri", thousandseedweight/1000, thousandseedweight)
) %>%
filter(!original_dataset_id %in% c("Jurado_1991", "Moles_2000", "Milberg_1998")) %>%
arrange(taxa, materialweigheddesc, thousandseedweight, original_dataset_id) %>%
group_by(taxa, materialweigheddesc, thousandseedweight, seedweightnotes, weightprecision) %>%
mutate(merged_ref = ifelse(length(unique(original_dataset_id)) == 1, original_dataset_id, paste0(original_dataset_id, collapse = "; "))) %>%
ungroup() %>%
ungroup() %>%
group_by(taxa, merged_ref) %>%
mutate(across(c(thousandseedweight, materialweigheddesc), replace_duplicates_with_NA)) %>%
mutate(across(c(thousandseedweight, materialweigheddesc), replace_duplicates_with_NA)) %>%
ungroup()
'
'
collection_date: unknown/2019
taxon_name: taxa
source_id: original_dataset_id
@@ -72,9 +83,32 @@ dataset:
original_file: in raw data folder
measurement_remarks: merged_ref
notes: Data provided for inclusion in AusTraits and subsequent reuse under a Creative
Commons license. Data not to be used for Commercial Purposes.
Commons license. Data not to be used for Commercial Purposes. Seed weight for Angophora
bakeri divided by 1000 in custom_R_code due to extreme outlier.
locations: .na
contexts: .na
contexts:
- context_property: seed mass method
category: method_context
var_in: seed_mass_method
values:
- find: Weight probably refers to dry mass (see measurement remarks).
value: dry mass (assumed)
description: Weight probably refers to dry mass (see measurement remarks).
- find: Weight refers to the air-dried mass of mature seed.
value: air-dried mass
description: Weight refers to the air-dried mass of mature seed.
- find: Weight refers to the oven-dried mass of mature seed.
value: oven-dried mass
description: Weight refers to the oven-dried mass of mature seed.
- find: Weight refers to dry mass of mature seed.
value: dry mass
description: Weight refers to dry mass of mature seed.
- find: Weight refers to fresh mass of mature seed.
value: fresh mass
description: Weight refers to fresh mass of mature seed.
- find: Weight refers to wet mass of mature seed.
value: wet mass
description: Weight refers to wet mass of mature seed.
traits:
- var_in: thousandseedweight
unit_in: mg
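
The `case_when()` block added to Kew_2019_1 assigns the first matching label, so rule order matters when a note mentions more than one condition. The sketch below imitates that first-match behaviour in base R for four of the patterns plus the fallback. The helper name `classify_seed_mass_method` and the example notes are hypothetical, and the remaining dry-mass rules from the diff are left out for brevity.

```r
# Hypothetical helper: base R stand-in for part of the case_when() block.
classify_seed_mass_method <- function(notes) {
  # Patterns and labels copied from the diff, in the same order.
  rules <- c(
    "wet mass" = "Weight refers to wet mass of mature seed.",
    "air-dry|air dry|Air-dry" = "Weight refers to the air-dried mass of mature seed.",
    "oven dry|dried at|dried to|dried in|dried for" = "Weight refers to the oven-dried mass of mature seed.",
    "fresh|Fresh" = "Weight refers to fresh mass of mature seed."
  )
  # Fallback label, corresponding to the TRUE ~ ... branch.
  out <- rep("Weight probably refers to dry mass (see measurement remarks).",
             length(notes))
  assigned <- rep(FALSE, length(notes))
  for (pattern in names(rules)) {
    # Only rows not yet labelled can be claimed, so earlier rules win.
    hit <- !assigned & !is.na(notes) & grepl(pattern, notes)
    out[hit] <- rules[[pattern]]
    assigned <- assigned | hit
  }
  out
}

# Invented example notes, including one that matches two patterns and one NA.
notes <- c("air-dry, fresh seed", "dried at 103 deg C for 17 hours",
           "no method given", NA)
print(classify_seed_mass_method(notes))
```

The first note matches both the air-dry and the fresh pattern but is labelled air-dried because that rule comes first; missing notes fall through to the assumed dry mass label.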
11 changes: 8 additions & 3 deletions data/Kooyman_2011/metadata.yml
@@ -21,7 +21,7 @@ contributors:
dataset_curators: Rachael Gallagher
dataset:
data_is_long_format: no
custom_R_code: '
custom_R_code: '
data %>%
group_by(`Species_reconciled with updates and additions`) %>%
mutate(across(c(`seed length maximum (mm)`,`seed length minimum (mm)`,
@@ -38,7 +38,11 @@ dataset:
`leaf width minimum (mm)` = ifelse(is.na(leaf_width_min_duplicate),`leaf width minimum (mm)`,NA),
`leaf width maximum (mm)` = ifelse(is.na(leaf_width_max_duplicate),`leaf width maximum (mm)`,NA),
`leaf length minimum (mm)` = ifelse(is.na(leaf_length_min_duplicate),`leaf length minimum (mm)`,NA),
`leaf length maximum (mm)` = ifelse(is.na(leaf_length_max_duplicate),`leaf length maximum (mm)`,NA)
`leaf length maximum (mm)` = ifelse(is.na(leaf_length_max_duplicate),`leaf length maximum (mm)`,NA),
`seed length minimum (mm)` = ifelse(`Species_reconciled with updates and additions` == "Rhizophora mucronata", NA, `seed length minimum (mm)`),
`seed length maximum (mm)` = ifelse(`Species_reconciled with updates and additions` == "Rhizophora mucronata", NA, `seed length maximum (mm)`),
`seed width minimum (mm)` = ifelse(`Species_reconciled with updates and additions` == "Rhizophora mucronata", NA, `seed width minimum (mm)`),
`seed width maximum (mm)` = ifelse(`Species_reconciled with updates and additions` == "Rhizophora mucronata", NA, `seed width maximum (mm)`)
) %>%
select(-`seed size categorical: <10mm`,-`seed size categorical: >10mm`,
`wood density genus`, `wood density family`, -seed_width_min_duplicate, - seed_width_max_duplicate,
@@ -126,7 +130,8 @@ dataset:
of the five areas; Cape York (CY) andWet Tropics (WT) in Qld, and Nightcap-Border
Ranges (NB), Dorrigo (DO) and Washpool (WA) in NSW.
original_file: Aust_RF_Traits_Kooyman_original.xls
notes: none
notes: Excluded seed widths and lengths for Rhizophora mucronata as seed length
seems to be for hypocotyl according to Flora of Australia.
locations: .na
contexts: .na
traits:
4 changes: 2 additions & 2 deletions data/Laxton_2005/metadata.yml
@@ -22,10 +22,10 @@ contributors:
dataset_curators: Rachael Gallagher
dataset:
data_is_long_format: no
custom_R_code: '
custom_R_code: '
data %>%
mutate(SITE_ID = ifelse(is.na(SITE_ID),"unknown", SITE_ID))
'
'
collection_date: 2002/2004
taxon_name: species_name
location_name: SITE_ID
4 changes: 2 additions & 2 deletions data/Leishman_2007/metadata.yml
@@ -24,14 +24,14 @@ contributors:
dataset:
data_is_long_format: no
custom_R_code: '
data %>%
data %>%
mutate(
across(c(`gs (molm-2s-1)`), ~na_if(.x, 0)),
entity_to_use = ifelse(is.na(Replicate),"population","individual"),
value_type_to_use = ifelse(is.na(Replicate),"mean","raw"),
replicates_to_use = ifelse(is.na(Replicate),"3","1")
)
'
'
collection_date: unknown/2007
taxon_name: Species
location_name: site
17 changes: 10 additions & 7 deletions data/McGlone_2015/metadata.yml
@@ -22,22 +22,24 @@ contributors:
dataset_curators: Elizabeth Wenk
dataset:
data_is_long_format: no
custom_R_code: '
data %>%
custom_R_code: '
data %>%
mutate(
woodiness = "woody",
plant_growth_form = habit,
parasitic = ifelse(habit %in% c("parasite", "mistletoe"), "parasitic", NA),
parasitic = ifelse(habit %in% c("parasite", "mistletoe"), "parasitic", NA),
plant_growth_substrate = ifelse(habit %in% c("mistletoe"), "epiphyte", NA),
leaf_type = NA
) %>%
leaf_type = NA,
`ht (m)` = ifelse(`species name` == "Acacia mitchellii" & `ht (m)` == 30, NA, `ht (m)`),
`ht (m)` = ifelse(`species name` == "Gyrostemon australasicus" & `ht (m)` == 10, NA, `ht (m)`)
) %>%
distinct(`species name`, `ht (m)`, `ll (mm)`, `lw (mm)`, `leaf form`, `habit`, .keep_all = TRUE) %>%
move_values_to_new_trait(
"leaf form", "leaf_type",
"scale", "scale", ""
) %>%
mutate(across(c(`leaf form`), ~na_if(.x,"")))
'
'
collection_date: 2014/2014
taxon_name: species name
location_name: sites
@@ -56,7 +58,8 @@ dataset:
Walsh, N.G. & Entwistle, T.J. 1997. Flora of Victoria Volume 4. Chatswood. Enkata
Press.
original_file: Woody plants in Tasmania and Victoria.xls
notes: none
notes: plant_height values for Acacia mitchellii and Gyrostemon australasicus have
been removed as outliers, using `custom_R_code`
locations:
state not specified:
latitude (deg): .na.real