From 128be6344170f624aca3599d8ff684d4d034bce9 Mon Sep 17 00:00:00 2001
From: ehwenk
Date: Wed, 1 Nov 2023 14:29:03 +1100
Subject: [PATCH 1/2] renaming taxon fields to match APCalign

`cleaned_name` -> `aligned_name` (and similar uses of `cleaned_` to `aligned_`)
`taxonomic_reference` -> `taxonomic_dataset`
---
 inst/support/austraits.build_schema.yml   | 10 +++----
 inst/support/report_dataset.Rmd           | 18 ++++++------
 remake.yml                                | 35 ++++++++++++++---------
 scripts/dictionary.Rmd                    |  2 +-
 tests/testthat/config/taxon_list-orig.csv |  2 +-
 tests/testthat/test-setup.R               |  2 +-
 6 files changed, 38 insertions(+), 31 deletions(-)

diff --git a/inst/support/austraits.build_schema.yml b/inst/support/austraits.build_schema.yml
index 17dd5c342..e7982ba77 100644
--- a/inst/support/austraits.build_schema.yml
+++ b/inst/support/austraits.build_schema.yml
@@ -153,11 +153,11 @@ austraits:
       elements:
         dataset_id: *dataset_id
         original_name: *original_name
-        cleaned_name: The taxon name without authorship after implementing automated syntax standardisation and spelling changes as well as manually encoded syntax alignments for this taxon in the metadata file for the corresponding `dataset_id`. This name has not yet been matched to the currently accepted (botanical) or valid (zoological) taxon name in cases where there are taxonomic synonyms, isonyms, orthographic variants, etc.
+        aligned_name: The taxon name without authorship after implementing automated syntax standardisation and spelling changes as well as manually encoded syntax alignments for this taxon in the metadata file for the corresponding `dataset_id`. This name has not yet been matched to the currently accepted (botanical) or valid (zoological) taxon name in cases where there are taxonomic synonyms, isonyms, orthographic variants, etc.
         taxonomic_resolution: &taxonomic_resolution The rank of the most specific taxon name (or scientific name) to which a submitted orignal name resolves.
-        cleaned_scientific_name_id: An identifier for the cleaned name before it is updated to the currently accepted name usage. This may be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.
-        cleaned_name_taxonomic_status: The status of the use of the `cleaned_name` as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority then are used to define the taxonomic status of the nomenclature contained in that scope, combined with the experts opinion. It must be linked to a specific taxonomic reference that defines the concept.
-        cleaned_name_alternative_taxonomic_status: The taxonomic status of alternative taxonomic records with `cleaned_name` as the accepted (botanical) or valid (zoological) taxon name.
+        aligned_scientific_name_id: An identifier for the cleaned name before it is updated to the currently accepted name usage. This may be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.
+        aligned_name_taxonomic_status: The status of the use of the `aligned_name` as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority then are used to define the taxonomic status of the nomenclature contained in that scope, combined with the experts opinion. It must be linked to a specific taxonomic reference that defines the concept.
+        aligned_name_alternative_taxonomic_status: The taxonomic status of alternative taxonomic records with `aligned_name` as the accepted (botanical) or valid (zoological) taxon name.
         taxon_id: &taxon_id An identifier for the set of taxon information (data associated with the taxon class). May be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.
         taxon_name: *taxon_name
     taxa:
@@ -165,7 +165,7 @@ austraits:
       type: table
       elements:
         taxon_name: *taxon_name
-        taxonomic_reference: Name of the taxonomy (tree) that contains this concept. ie. APC, AusMoss etc.
+        taxonomic_dataset: Name of the taxonomy (tree) that contains this concept. ie. APC, AusMoss etc.
         taxon_rank: The taxonomic rank of the most specific name in the scientific name.
         trinomial: The infraspecific taxon name match for an original name. This column is assigned `na` for taxon name that are at a broader taxonomic_resolution.
         binomial: The species-level taxon name match for an original name. This column is assigned `na` for taxon name that are at a broader taxonomic_resolution.
diff --git a/inst/support/report_dataset.Rmd b/inst/support/report_dataset.Rmd
index 81ebbd52e..9fcfd11c7 100644
--- a/inst/support/report_dataset.Rmd
+++ b/inst/support/report_dataset.Rmd
@@ -738,23 +738,23 @@ if(nrow(tmp) < 10000 ){
 To create the list of aligned taxa, we needed to make some taxonomic changes. This involved two stages.

-* **Stage 1**: Where possible (i.e. there was no or only a few characters difference), the name you supplied was matched automatically with a known name in APC or APNI. In other cases we may have aligned the taxa by searching for an appropriate match. Such changes are documented in the study metadata file. The variable `cleaned_name` shows the updated name. The variable `d1` shows the number of characters difference between the `original_name` and `cleaned_name`.
-* **Stage 2**: Once aligned with a known name, we used the APC to update the `cleaned_name` to an accepted name. The taxonomic status of the cleaned name is indicated in column `status cleaned name`. If accepted, no change was made. If it is a synonym or otherwise, the name was changed according to the recommendation given in the APC. Where they existed, we preferred to take the accepted status of an `cleaned_name`, if it existed. Alternative status values are indicated in brackets. This indicate if alternative uses of the name were ever applied.
+* **Stage 1**: Where possible (i.e. there was no or only a few characters difference), the name you supplied was matched automatically with a known name in APC or APNI. In other cases we may have aligned the taxa by searching for an appropriate match. Such changes are documented in the study metadata file. The variable `aligned_name` shows the updated name. The variable `d1` shows the number of characters difference between the `original_name` and `aligned_name`.
+* **Stage 2**: Once aligned with a known name, we used the APC to update the `aligned_name` to an accepted name. The taxonomic status of the cleaned name is indicated in column `status cleaned name`. If accepted, no change was made. If it is a synonym or otherwise, the name was changed according to the recommendation given in the APC. Where they existed, we preferred to take the accepted status of an `aligned_name`, if it existed. Alternative status values are indicated in brackets. This indicate if alternative uses of the name were ever applied.

-Links on `cleaned_name` and `taxon_name` take you to the APC or APNI record for that name.
+Links on `aligned_name` and `taxon_name` take you to the APC or APNI record for that name.

 ```{r}
 data_study$taxonomic_updates %>%
   select(-dataset_id) %>%
   filter(original_name !=taxon_name) %>%
   mutate(
-    d1 = purrr::map2_dbl(original_name, cleaned_name, ~adist(.x, .y)) %>% as.character() %>% str_replace("0", ""),
-    d2 = purrr::map2_dbl(taxon_name, cleaned_name, ~adist(.x, .y)) %>% as.character() %>% str_replace("0", ""),
-    cleaned_name = as_link(cleaned_scientific_name_id, cleaned_name),
+    d1 = purrr::map2_dbl(original_name, aligned_name, ~adist(.x, .y)) %>% as.character() %>% str_replace("0", ""),
+    d2 = purrr::map2_dbl(taxon_name, aligned_name, ~adist(.x, .y)) %>% as.character() %>% str_replace("0", ""),
+    aligned_name = as_link(aligned_scientific_name_id, aligned_name),
     taxon_name = as_link(taxon_id, taxon_name),
-    cleaned_name_alternative_taxonomic_status = replace_na(cleaned_name_alternative_taxonomic_status, ""),
-    `status cleaned name` = sprintf("%s %s", cleaned_name_taxonomic_status, ifelse(cleaned_name_alternative_taxonomic_status=="", "", paste("(", cleaned_name_alternative_taxonomic_status, ")")))
+    aligned_name_alternative_taxonomic_status = replace_na(aligned_name_alternative_taxonomic_status, ""),
+    `status cleaned name` = sprintf("%s %s", aligned_name_taxonomic_status, ifelse(aligned_name_alternative_taxonomic_status=="", "", paste("(", aligned_name_alternative_taxonomic_status, ")")))
   ) %>%
-  select(original_name, d1, cleaned_name, d2, taxon_name, `status cleaned name`) %>%
+  select(original_name, d1, aligned_name, d2, taxon_name, `status cleaned name`) %>%
   mutate_all(replace_na, "") %>%
   my_kable_styling()
 ```
diff --git a/remake.yml b/remake.yml
index 9087e0d09..d03dc1015 100644
--- a/remake.yml
+++ b/remake.yml
@@ -1,27 +1,19 @@
-# This file is automatically generated from traits.build
-# package, via the file remake.yml.whisker:
+# This file is automatically generated from traits.build
+# package, via the file remake.yml.whisker:
 # edit the file there (or the files that it includes).

 packages:
   - traits.build

 targets:
+
+# Define generic targets
   all:
     depends:
       - austraits
       - export/data/curr/austraits.rds

-  export/data/curr/austraits.rds:
-    command: saveRDS(austraits, target_name)
-
-  version_number:
-    command: util_get_version("config/metadata.yml")
-
-  git_SHA:
-    command: util_get_SHA()
-    depends:
-      - .git/index
-
+# Load data resources
   schema:
     command: get_schema()

@@ -37,6 +29,7 @@ targets:
   taxon_list:
     command: read_csv_char("config/taxon_list.csv")

+# Build each source
   ABRS_1981_config:
     command: >
       dataset_configure("data/ABRS_1981/metadata.yml",
@@ -7302,6 +7295,20 @@ targets:
       Zieminska_2013,
       Zieminska_2015,
       NULL)
-
+
+# Version information
+  version_number:
+    command: util_get_version("config/metadata.yml")
+
+  git_SHA:
+    command: util_get_SHA()
+    depends:
+      - .git/index
+
+# Combine all the source into one resource
   austraits:
     command: build_add_version(austraits_raw, version_number, git_SHA)
+
+# Save to file
+  export/data/curr/austraits.rds:
+    command: saveRDS(austraits, target_name)
diff --git a/scripts/dictionary.Rmd b/scripts/dictionary.Rmd
index d9a8d089e..aa99ba594 100644
--- a/scripts/dictionary.Rmd
+++ b/scripts/dictionary.Rmd
@@ -184,7 +184,7 @@ AusTraits does not include intra-individual observations made at a single point

 ## Taxonomy

-Version `r austraits$build_info$version` of AusTraits contains records for `r austraits$taxa %>% nrow()` different taxa. We have aligned taxa with known taxonomic units in the [`Australian Plant Census` (APC)](https://biodiversity.org.au/nsl/services/apc) and/or the [`Australian Plant Names Index` (APNI)](https://biodiversity.org.au/nsl/services/APNI). Of the `r austraits$taxa %>% nrow()` taxa included, `r austraits$taxa %>% filter(!is.na(taxonomic_reference)) %>% nrow()` are aligned with known taxa.
+Version `r austraits$build_info$version` of AusTraits contains records for `r austraits$taxa %>% nrow()` different taxa. We have aligned taxa with known taxonomic units in the [`Australian Plant Census` (APC)](https://biodiversity.org.au/nsl/services/apc) and/or the [`Australian Plant Names Index` (APNI)](https://biodiversity.org.au/nsl/services/APNI). Of the `r austraits$taxa %>% nrow()` taxa included, `r austraits$taxa %>% filter(!is.na(taxonomic_dataset)) %>% nrow()` are aligned with known taxa.

 The `traits` table reports both the original and the updated taxon name alongside each trait record.

diff --git a/tests/testthat/config/taxon_list-orig.csv b/tests/testthat/config/taxon_list-orig.csv
index cd2622d43..fd33fee44 100644
--- a/tests/testthat/config/taxon_list-orig.csv
+++ b/tests/testthat/config/taxon_list-orig.csv
@@ -1,4 +1,4 @@
-cleaned_name,taxonomic_reference,cleaned_scientific_name_id,cleaned_name_taxonomic_status,cleaned_name_alternative_taxonomic_status,taxon_name,taxon_id,scientific_name_authorship,taxon_rank,taxonomic_status,family,taxon_distribution,establishment_means,scientific_name,scientific_name_id
+aligned_name,taxonomic_dataset,aligned_scientific_name_id,aligned_name_taxonomic_status,aligned_name_alternative_taxonomic_status,taxon_name,taxon_id,scientific_name_authorship,taxon_rank,taxonomic_status,family,taxon_distribution,establishment_means,scientific_name,scientific_name_id
 Acacia celsa,APC,https://id.biodiversity.org.au/name/apni/154988,accepted,NA,Acacia celsa,https://id.biodiversity.org.au/taxon/apni/51436506,Tindale,Species,accepted,Fabaceae,Qld,native,Acacia celsa Tindale,https://id.biodiversity.org.au/name/apni/154988
 Acronychia acidula,APC,https://id.biodiversity.org.au/name/apni/73794,accepted,NA,Acronychia acidula,https://id.biodiversity.org.au/node/apni/2913200,F.Muell.,Species,accepted,Rutaceae,Qld,native,Acronychia acidula F.Muell.,https://id.biodiversity.org.au/name/apni/73794
 Alphitonia petriei,APC,https://id.biodiversity.org.au/name/apni/82935,accepted,NA,Alphitonia petriei,https://id.biodiversity.org.au/node/apni/2887911,Braid & C.T.White,Species,accepted,Rhamnaceae,"Qld, NSW",native,Alphitonia petriei Braid & C.T.White,https://id.biodiversity.org.au/name/apni/82935
diff --git a/tests/testthat/test-setup.R b/tests/testthat/test-setup.R
index 381bb52cd..c830d3386 100644
--- a/tests/testthat/test-setup.R
+++ b/tests/testthat/test-setup.R
@@ -169,7 +169,7 @@ test_that("test build_setup_pipeline is working",{

   expect_true(file.exists("config/taxon_list.csv"))
   expect_silent(taxa1 <- read_csv_char("config/taxon_list.csv"))
-  vars <- c('cleaned_name', 'taxonomic_reference', 'cleaned_scientific_name_id', 'cleaned_name_taxonomic_status', 'cleaned_name_alternative_taxonomic_status', 'taxon_name', 'taxon_id', 'scientific_name_authorship', 'taxon_rank', 'taxonomic_status', 'family', 'taxon_distribution', 'establishment_means', 'scientific_name', 'scientific_name_id')
+  vars <- c('aligned_name', 'taxonomic_dataset', 'aligned_scientific_name_id', 'aligned_name_taxonomic_status', 'aligned_name_alternative_taxonomic_status', 'taxon_name', 'taxon_id', 'scientific_name_authorship', 'taxon_rank', 'taxonomic_status', 'family', 'taxon_distribution', 'establishment_means', 'scientific_name', 'scientific_name_id')
   expect_named(taxa1, vars)
   expect_length(taxa1, 15)

From 5fc520fca412c6f3eb514a18730204c4db9d169a Mon Sep 17 00:00:00 2001
From: ehwenk
Date: Wed, 1 Nov 2023 21:32:26 +1100
Subject: [PATCH 2/2] Update taxon_list.csv

forgot to rename columns in the current taxon_list.csv file (because I'm working off a different one)
---
 config/taxon_list.csv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config/taxon_list.csv b/config/taxon_list.csv
index 2d158aeb0..5f04c8557 100644
--- a/config/taxon_list.csv
+++ b/config/taxon_list.csv
@@ -1,4 +1,4 @@
-cleaned_name,taxonomic_reference,cleaned_scientific_name_id,cleaned_name_taxonomic_status,cleaned_name_alternative_taxonomic_status,taxon_name,taxon_id,scientific_name_authorship,taxon_rank,taxonomic_status,family,taxon_distribution,establishment_means,scientific_name,scientific_name_id
+aligned_name,taxonomic_reference,aligned_scientific_name_id,aligned_name_taxonomic_status,aligned_name_alternative_taxonomic_status,taxon_name,taxon_id,scientific_name_authorship,taxon_rank,taxonomic_status,family,taxon_distribution,establishment_means,scientific_name,scientific_name_id
 (Dockrillia pugioniformis x Dockrillia striolata) x Dockrillia pugioniformis,APC,https://id.biodiversity.org.au/name/apni/51342600,accepted,NA,(Dockrillia pugioniformis x Dockrillia striolata) x Dockrillia pugioniformis,https://id.biodiversity.org.au/taxon/apni/51404554,(A.Cunn.) Rauschert x Dockrillia striolata (Rchb.f.) Rauschert) x Dockrillia pugioniformis (A.Cunn.) Rauschert,Species,accepted,Orchidaceae,NSW,native,(Dockrillia pugioniformis (A.Cunn.) Rauschert x Dockrillia striolata (Rchb.f.) Rauschert) x Dockrillia pugioniformis (A.Cunn.) Rauschert,https://id.biodiversity.org.au/name/apni/51342600
 Abelia,APC,https://id.biodiversity.org.au/name/apni/147345,accepted,NA,Abelia,https://id.biodiversity.org.au/taxon/apni/51432946,R.Br.,Genus,accepted,Caprifoliaceae,NSW (naturalised),naturalised,Abelia R.Br.,https://id.biodiversity.org.au/name/apni/147345
 Abelia x grandiflora,APC,https://id.biodiversity.org.au/name/apni/190758,accepted,NA,Abelia x grandiflora,https://id.biodiversity.org.au/taxon/apni/51432945,(Rovelli ex André) Rehder,Species,accepted,Caprifoliaceae,NSW (naturalised),naturalised,Abelia x grandiflora (Rovelli ex André) Rehder,https://id.biodiversity.org.au/name/apni/190758