renaming taxon fields to match APCalign #778

Merged: 2 commits, Nov 1, 2023
Changes from 1 commit
10 changes: 5 additions & 5 deletions inst/support/austraits.build_schema.yml
@@ -153,19 +153,19 @@ austraits:
elements:
dataset_id: *dataset_id
original_name: *original_name
cleaned_name: The taxon name without authorship after implementing automated syntax standardisation and spelling changes as well as manually encoded syntax alignments for this taxon in the metadata file for the corresponding `dataset_id`. This name has not yet been matched to the currently accepted (botanical) or valid (zoological) taxon name in cases where there are taxonomic synonyms, isonyms, orthographic variants, etc.
aligned_name: The taxon name without authorship after implementing automated syntax standardisation and spelling changes as well as manually encoded syntax alignments for this taxon in the metadata file for the corresponding `dataset_id`. This name has not yet been matched to the currently accepted (botanical) or valid (zoological) taxon name in cases where there are taxonomic synonyms, isonyms, orthographic variants, etc.
taxonomic_resolution: &taxonomic_resolution The rank of the most specific taxon name (or scientific name) to which a submitted original name resolves.
cleaned_scientific_name_id: An identifier for the cleaned name before it is updated to the currently accepted name usage. This may be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.
cleaned_name_taxonomic_status: The status of the use of the `cleaned_name` as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority are then used to define the taxonomic status of the nomenclature contained in that scope, combined with the expert's opinion. It must be linked to a specific taxonomic reference that defines the concept.
cleaned_name_alternative_taxonomic_status: The taxonomic status of alternative taxonomic records with `cleaned_name` as the accepted (botanical) or valid (zoological) taxon name.
aligned_scientific_name_id: An identifier for the aligned name before it is updated to the currently accepted name usage. This may be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.
aligned_name_taxonomic_status: The status of the use of the `aligned_name` as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority are then used to define the taxonomic status of the nomenclature contained in that scope, combined with the expert's opinion. It must be linked to a specific taxonomic reference that defines the concept.
aligned_name_alternative_taxonomic_status: The taxonomic status of alternative taxonomic records with `aligned_name` as the accepted (botanical) or valid (zoological) taxon name.
taxon_id: &taxon_id An identifier for the set of taxon information (data associated with the taxon class). May be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.
taxon_name: *taxon_name
taxa:
description: A table containing details on taxa associated with information in `traits`. Whenever possible, this information is sourced from curated taxon lists that include identifiers for each taxon. The information compiled in this table is released under a CC-BY3 license. Cross-referencing between the two dataframes is possible using the variable `taxon_name`.
type: table
elements:
taxon_name: *taxon_name
taxonomic_reference: Name of the taxonomy (tree) that contains this concept, e.g. APC, AusMoss, etc.
taxonomic_dataset: Name of the taxonomy (tree) that contains this concept, e.g. APC, AusMoss, etc.
taxon_rank: The taxonomic rank of the most specific name in the scientific name.
trinomial: The infraspecific taxon name match for an original name. This column is assigned `na` for taxon names that are at a broader `taxonomic_resolution`.
binomial: The species-level taxon name match for an original name. This column is assigned `na` for taxon names that are at a broader `taxonomic_resolution`.
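For downstream scripts that still read the old column names, the five renames in this hunk (also visible in the `taxon_list-orig.csv` header change) can be applied with a small lookup table. This is a minimal base-R sketch; `taxon_field_renames` and `rename_taxon_fields()` are hypothetical names, not functions provided by traits.build.

```r
# Hypothetical lookup (not part of traits.build): old column name -> new name,
# taken from the schema and taxon_list.csv changes in this PR.
taxon_field_renames <- c(
  cleaned_name = "aligned_name",
  cleaned_scientific_name_id = "aligned_scientific_name_id",
  cleaned_name_taxonomic_status = "aligned_name_taxonomic_status",
  cleaned_name_alternative_taxonomic_status = "aligned_name_alternative_taxonomic_status",
  taxonomic_reference = "taxonomic_dataset"
)

# Rename any old-style columns present in `df`; other columns are left untouched.
rename_taxon_fields <- function(df) {
  old <- names(df)
  hit <- old %in% names(taxon_field_renames)
  names(df)[hit] <- unname(taxon_field_renames[old[hit]])
  df
}

# Example: a one-row table in the old layout
old_tbl <- data.frame(
  cleaned_name = "Acacia celsa",
  taxonomic_reference = "APC",
  taxon_name = "Acacia celsa"
)
new_tbl <- rename_taxon_fields(old_tbl)
print(names(new_tbl))  # "aligned_name" "taxonomic_dataset" "taxon_name"
```

Keeping the mapping in one named vector means it can be reused across tables. With dplyr, the same mapping inverted to `new = "old"` form could instead be passed to `rename(df, any_of(lookup))`.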
18 changes: 9 additions & 9 deletions inst/support/report_dataset.Rmd
@@ -738,23 +738,23 @@ if(nrow(tmp) < 10000 ){

To create the list of aligned taxa, we needed to make some taxonomic changes. This involved two stages.

* **Stage 1**: Where possible (i.e. there was no difference, or a difference of only a few characters), the name you supplied was matched automatically with a known name in APC or APNI. In other cases we may have aligned the taxa by searching for an appropriate match. Such changes are documented in the study metadata file. The variable `cleaned_name` shows the updated name. The variable `d1` shows the number of character differences between `original_name` and `cleaned_name`.
* **Stage 2**: Once aligned with a known name, we used the APC to update the `cleaned_name` to an accepted name. The taxonomic status of the cleaned name is indicated in column `status cleaned name`. If accepted, no change was made. If it is a synonym or has another status, the name was changed according to the recommendation given in the APC. Where it existed, we preferred the accepted status of a `cleaned_name`. Alternative status values are indicated in brackets; these indicate whether alternative uses of the name were ever applied.
* **Stage 1**: Where possible (i.e. there was no difference, or a difference of only a few characters), the name you supplied was matched automatically with a known name in APC or APNI. In other cases we may have aligned the taxa by searching for an appropriate match. Such changes are documented in the study metadata file. The variable `aligned_name` shows the updated name. The variable `d1` shows the number of character differences between `original_name` and `aligned_name`.
* **Stage 2**: Once aligned with a known name, we used the APC to update the `aligned_name` to an accepted name. The taxonomic status of the cleaned name is indicated in column `status cleaned name`. If accepted, no change was made. If it is a synonym or has another status, the name was changed according to the recommendation given in the APC. Where it existed, we preferred the accepted status of an `aligned_name`. Alternative status values are indicated in brackets; these indicate whether alternative uses of the name were ever applied.

Links on `cleaned_name` and `taxon_name` take you to the APC or APNI record for that name.
Links on `aligned_name` and `taxon_name` take you to the APC or APNI record for that name.

```{r}
data_study$taxonomic_updates %>% select(-dataset_id) %>%
filter(original_name !=taxon_name) %>%
mutate(
d1 = purrr::map2_dbl(original_name, cleaned_name, ~adist(.x, .y)) %>% as.character() %>% str_replace("^0$", ""),
d2 = purrr::map2_dbl(taxon_name, cleaned_name, ~adist(.x, .y)) %>% as.character() %>% str_replace("^0$", ""),
cleaned_name = as_link(cleaned_scientific_name_id, cleaned_name),
d1 = purrr::map2_dbl(original_name, aligned_name, ~adist(.x, .y)) %>% as.character() %>% str_replace("^0$", ""),
d2 = purrr::map2_dbl(taxon_name, aligned_name, ~adist(.x, .y)) %>% as.character() %>% str_replace("^0$", ""),
aligned_name = as_link(aligned_scientific_name_id, aligned_name),
taxon_name = as_link(taxon_id, taxon_name),
cleaned_name_alternative_taxonomic_status = replace_na(cleaned_name_alternative_taxonomic_status, ""),
`status cleaned name` = sprintf("%s %s", cleaned_name_taxonomic_status, ifelse(cleaned_name_alternative_taxonomic_status=="", "", paste("(", cleaned_name_alternative_taxonomic_status, ")")))
aligned_name_alternative_taxonomic_status = replace_na(aligned_name_alternative_taxonomic_status, ""),
`status cleaned name` = sprintf("%s %s", aligned_name_taxonomic_status, ifelse(aligned_name_alternative_taxonomic_status=="", "", paste("(", aligned_name_alternative_taxonomic_status, ")")))
) %>%
select(original_name, d1, cleaned_name, d2, taxon_name, `status cleaned name`) %>%
select(original_name, d1, aligned_name, d2, taxon_name, `status cleaned name`) %>%
mutate_all(replace_na, "") %>%
my_kable_styling()
```
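The two derived columns in this chunk can be reproduced in isolation. The sketch below uses base R only, with hypothetical helper names `name_distance_label()` and `status_label()`: it computes the character difference with `adist()` (Levenshtein distance, as the chunk does) and builds the status label with any alternative status in brackets. It blanks only distances that are exactly zero, because an unanchored pattern such as `str_replace("0", "")` also strips the zero from a distance like 10.

```r
# Character difference between two names (Levenshtein distance via utils::adist),
# formatted for the report: exact zeros become "", other distances stay as text.
name_distance_label <- function(from, to) {
  d <- mapply(function(x, y) adist(x, y)[1, 1], from, to, USE.NAMES = FALSE)
  ifelse(d == 0, "", as.character(d))
}

# Taxonomic status with any alternative status appended in brackets.
status_label <- function(status, alternative) {
  alternative[is.na(alternative)] <- ""
  ifelse(alternative == "", status, sprintf("%s (%s)", status, alternative))
}

# Why the zero must be matched exactly: an unanchored pattern edits "10" too.
print(sub("0", "", "10"))    # "1"
print(sub("^0$", "", "10"))  # "10"

print(name_distance_label(c("Acacia celsa", "Acacia celssa"),
                          c("Acacia celsa", "Acacia celsa")))  # "" "1"
print(name_distance_label("abcdefghij", ""))                   # "10"
print(status_label(c("accepted", "accepted"), c(NA, "synonym")))
```

Using `sprintf("%s (%s)", ...)` also avoids the padded `( synonym )` form and the trailing space that `paste("(", x, ")")` and `sprintf("%s %s", ...)` produce in the chunk above.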
35 changes: 21 additions & 14 deletions remake.yml
@@ -1,27 +1,19 @@
# This file is automatically generated from traits.build
# package, via the file remake.yml.whisker:
# This file is automatically generated from traits.build
# package, via the file remake.yml.whisker:
# edit the file there (or the files that it includes).

packages:
- traits.build

targets:

# Define generic targets
all:
depends:
- austraits
- export/data/curr/austraits.rds

export/data/curr/austraits.rds:
command: saveRDS(austraits, target_name)

version_number:
command: util_get_version("config/metadata.yml")

git_SHA:
command: util_get_SHA()
depends:
- .git/index

# Load data resources
schema:
command: get_schema()

@@ -37,6 +29,7 @@ targets:
taxon_list:
command: read_csv_char("config/taxon_list.csv")

# Build each source
ABRS_1981_config:
command: >
dataset_configure("data/ABRS_1981/metadata.yml",
@@ -7302,6 +7295,20 @@ targets:
Zieminska_2013,
Zieminska_2015,
NULL)


# Version information
version_number:
command: util_get_version("config/metadata.yml")

git_SHA:
command: util_get_SHA()
depends:
- .git/index

# Combine all the sources into one resource
austraits:
command: build_add_version(austraits_raw, version_number, git_SHA)

# Save to file
export/data/curr/austraits.rds:
command: saveRDS(austraits, target_name)
2 changes: 1 addition & 1 deletion scripts/dictionary.Rmd
@@ -184,7 +184,7 @@ AusTraits does not include intra-individual observations made at a single point

## Taxonomy

Version `r austraits$build_info$version` of AusTraits contains records for `r austraits$taxa %>% nrow()` different taxa. We have aligned taxa with known taxonomic units in the [`Australian Plant Census` (APC)](https://biodiversity.org.au/nsl/services/apc) and/or the [`Australian Plant Names Index` (APNI)](https://biodiversity.org.au/nsl/services/APNI). Of the `r austraits$taxa %>% nrow()` taxa included, `r austraits$taxa %>% filter(!is.na(taxonomic_reference)) %>% nrow()` are aligned with known taxa.
Version `r austraits$build_info$version` of AusTraits contains records for `r austraits$taxa %>% nrow()` different taxa. We have aligned taxa with known taxonomic units in the [`Australian Plant Census` (APC)](https://biodiversity.org.au/nsl/services/apc) and/or the [`Australian Plant Names Index` (APNI)](https://biodiversity.org.au/nsl/services/APNI). Of the `r austraits$taxa %>% nrow()` taxa included, `r austraits$taxa %>% filter(!is.na(taxonomic_dataset)) %>% nrow()` are aligned with known taxa.

The `traits` table reports both the original and the updated taxon name alongside each trait record.

2 changes: 1 addition & 1 deletion tests/testthat/config/taxon_list-orig.csv
@@ -1,4 +1,4 @@
cleaned_name,taxonomic_reference,cleaned_scientific_name_id,cleaned_name_taxonomic_status,cleaned_name_alternative_taxonomic_status,taxon_name,taxon_id,scientific_name_authorship,taxon_rank,taxonomic_status,family,taxon_distribution,establishment_means,scientific_name,scientific_name_id
aligned_name,taxonomic_dataset,aligned_scientific_name_id,aligned_name_taxonomic_status,aligned_name_alternative_taxonomic_status,taxon_name,taxon_id,scientific_name_authorship,taxon_rank,taxonomic_status,family,taxon_distribution,establishment_means,scientific_name,scientific_name_id
Acacia celsa,APC,https://id.biodiversity.org.au/name/apni/154988,accepted,NA,Acacia celsa,https://id.biodiversity.org.au/taxon/apni/51436506,Tindale,Species,accepted,Fabaceae,Qld,native,Acacia celsa Tindale,https://id.biodiversity.org.au/name/apni/154988
Acronychia acidula,APC,https://id.biodiversity.org.au/name/apni/73794,accepted,NA,Acronychia acidula,https://id.biodiversity.org.au/node/apni/2913200,F.Muell.,Species,accepted,Rutaceae,Qld,native,Acronychia acidula F.Muell.,https://id.biodiversity.org.au/name/apni/73794
Alphitonia petriei,APC,https://id.biodiversity.org.au/name/apni/82935,accepted,NA,Alphitonia petriei,https://id.biodiversity.org.au/node/apni/2887911,Braid & C.T.White,Species,accepted,Rhamnaceae,"Qld, NSW",native,Alphitonia petriei Braid & C.T.White,https://id.biodiversity.org.au/name/apni/82935
2 changes: 1 addition & 1 deletion tests/testthat/test-setup.R
@@ -169,7 +169,7 @@ test_that("test build_setup_pipeline is working",{
expect_true(file.exists("config/taxon_list.csv"))
expect_silent(taxa1 <- read_csv_char("config/taxon_list.csv"))

vars <- c('cleaned_name', 'taxonomic_reference', 'cleaned_scientific_name_id', 'cleaned_name_taxonomic_status', 'cleaned_name_alternative_taxonomic_status', 'taxon_name', 'taxon_id', 'scientific_name_authorship', 'taxon_rank', 'taxonomic_status', 'family', 'taxon_distribution', 'establishment_means', 'scientific_name', 'scientific_name_id')
vars <- c('aligned_name', 'taxonomic_dataset', 'aligned_scientific_name_id', 'aligned_name_taxonomic_status', 'aligned_name_alternative_taxonomic_status', 'taxon_name', 'taxon_id', 'scientific_name_authorship', 'taxon_rank', 'taxonomic_status', 'family', 'taxon_distribution', 'establishment_means', 'scientific_name', 'scientific_name_id')

expect_named(taxa1, vars)
expect_length(taxa1, 15)