work on APCalign vignettes/articles & function descriptions (#175)
-  add vignette with more details about each function, including some example code
- ensure 1-sentence function names are similar to longer descriptions
- ensure longer function descriptions identical in manuscript, in function_notes file, and in individual R files for functions
- more work formatting tables for vignettes
- add missing info into vignettes; delete an unused vignette
ehwenk authored Jan 23, 2024
1 parent 3173a85 commit 2aaaa2e
Showing 28 changed files with 385 additions and 97 deletions.
15 changes: 9 additions & 6 deletions R/align_taxa.R
@@ -1,10 +1,13 @@
#' Find taxonomic alignments for a list of names to a version of the Australian Plant Census (APC) through standardizing formatting and checking for spelling issues
#' For a list of Australian plant names, find taxonomic or scientific name alignments to the APC or APNI through standardizing formatting and fixing spelling errors
#'
#' This function uses Australian Plant Census (APC) & the Australian Plant Name Index (APNI) to find taxonomic alignments for a list of names.
#' It uses the internal function `match_taxa` to attempt to match input strings to taxon names in the APC/APNI.
#' It sequentially searches for matches against more than 20 different string patterns, prioritising exact matches (to accepted names as well as synonyms, orthographic variants)
#' over fuzzy matches. It prioritises matches to taxa in the APC over names in the APNI.
#' It identifies string patterns in input names that suggest a name can only be aligned to a genus (hybrids that are not in the APC/ANI; graded species; taxa not identified to species), and indicates these names only have a genus-rank match.
#' This function finds taxonomic alignments in APC or scientific name alignments in APNI.
#' It uses the internal function `match_taxa` to attempt to match input strings to taxon names in the APC/APNI.
#' It sequentially searches for matches against more than 20 different string patterns,
#' prioritising exact matches (to accepted names as well as synonyms, orthographic variants) over fuzzy matches.
#' It prioritises matches to taxa in the APC over names in the APNI.
#' It identifies string patterns in input names that suggest a name can only be aligned to a genus
#' (hybrids that are not in the APC/APNI; graded species; taxa not identified to species),
#' and indicates these names only have a genus-rank match.
#'
#' @param original_name A list of names to query for taxonomic alignments.
#' @param output (optional) The name of the file to save the results to.
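The "exact matches before fuzzy matches" ordering described above can be illustrated with a minimal base-R sketch. This is not the package's `match_taxa` implementation: `reference_names`, `align_one()` and the `max_dist` threshold are hypothetical names and values chosen for illustration, and edit distance via `utils::adist()` stands in for whatever fuzzy matcher the package actually uses.

```r
# Hypothetical reference list standing in for APC/APNI taxon names.
reference_names <- c("Banksia serrata", "Banksia aurantia", "Acacia aneura")

# Hypothetical helper: try an exact match first, and only fall back to a
# fuzzy match (smallest edit distance, at most max_dist edits) if that fails.
align_one <- function(name, reference, max_dist = 2) {
  if (name %in% reference) {
    return(data.frame(original_name = name, aligned_name = name,
                      match_type = "exact", stringsAsFactors = FALSE))
  }
  d <- utils::adist(name, reference)[1, ]
  best <- which.min(d)
  if (d[best] <= max_dist) {
    return(data.frame(original_name = name, aligned_name = reference[best],
                      match_type = "fuzzy", stringsAsFactors = FALSE))
  }
  data.frame(original_name = name, aligned_name = NA_character_,
             match_type = "no match", stringsAsFactors = FALSE)
}

# One row per input name, recording which kind of match succeeded.
result <- do.call(rbind, lapply(
  c("Banksia serrata", "Banksia serrate", "Eucalyptus sp."),
  align_one, reference = reference_names
))
print(result)
```

Running the exact check first means a correctly spelled name is never replaced by a near neighbour, which is presumably why the function description stresses the ordering.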
5 changes: 3 additions & 2 deletions R/create_species_state_origin_matrix.R
@@ -1,6 +1,7 @@
#' Process geographic data and return state level species origin and diversity counts
#' Use the taxon distribution data from the APC to determine state level native and introduced origin status
#'
#' This function processes the geographic data available in the current or any version of the Australian Plant Census and returns state level diversity for native, introduced and more complicated species origins.
#' This function processes the geographic data available in the APC and
#' returns the state-level origin status (native, introduced, or more complicated) for all taxa.
#'
#'
#' @family diversity methods
6 changes: 4 additions & 2 deletions R/create_taxonomic_update_lookup.R
@@ -1,6 +1,8 @@
#' Create a lookup table to help fix the taxonomy for a list of Australian plant species
#' Create a lookup table with the best-possible scientific name match for a list of Australian plant names
#'
#' This function takes a list of Australian plant species that needs to be reconciled with current taxonomy and generates a lookup table to help fix the taxonomy. The lookup table contains the original species names, the aligned species names, and additional taxonomic information such as taxon IDs and genera.
#' This function takes a list of Australian plant names that need to be reconciled with current taxonomy and
#' generates a lookup table of the best-possible scientific name match for each input name.
#' It first calls the function `align_taxa`, then the function `update_taxonomy`, to produce the output.
#'
#' @family taxonomic alignment functions
#'
4 changes: 3 additions & 1 deletion R/load_taxonomic_resources.R
@@ -1,6 +1,8 @@
#' Load taxonomic resources from either stable or current versions of APC and APNI
#'
#' Loads taxonomic resources into the global environment. This function accesses taxonomic data from a dataset using the provided version number or the default version. The loaded data contains two lists: APC and APNI, which contain taxonomic information about plant species in Australia. The function creates several tibbles by filtering and selecting data from the loaded lists.
#' This function loads two taxonomic datasets for Australia's vascular plants, the APC and APNI, into the global environment.
#' It accesses taxonomic data from a dataset using the provided version number or the default version.
#' The function creates several data frames by filtering and selecting data from the loaded lists.
#'
#' @param stable_or_current_data Type of dataset to access. The default is "stable", which loads the
#' dataset from a github archived file. If set to "current", the dataset will be loaded from
6 changes: 4 additions & 2 deletions R/match_taxa.R
@@ -852,8 +852,9 @@ match_taxa <- function(
return(taxa)
}

# match_09a: `genus aff. species` taxa
# match_09a: `genus aff. species` and `genus cf. species` taxa
# Exact match to APC-accepted or APC-known genus for names where "aff" indicates the taxon has an affinity to another taxon, but isn't the other taxon.
# Similarly, "cf" indicates that a comparison should be made between the specific taxon and another taxon, but again, isn't the other taxon.
# Taxon names fitting this pattern that are not APC-accepted, APC-known, or APNI-listed species are automatically aligned to genus,
# since this is the highest taxon rank that can be attached to the plant name.
# This alignment can only be made after exact matches of complete taxon names to APC/APNI + fuzzy matches to APC are complete,
@@ -862,7 +863,8 @@ match_taxa <- function(
i <-
(
stringr::str_detect(taxa$tocheck$cleaned_name, "[Aa]ff[\\.\\s]") |
stringr::str_detect(taxa$tocheck$cleaned_name, " affinis ")
stringr::str_detect(taxa$tocheck$cleaned_name, " affinis ") |
stringr::str_detect(taxa$tocheck$cleaned_name, " cf[\\.\\s]")
) &
taxa$tocheck$genus %in% resources$genera_all2$canonical_name
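
The detection logic in this hunk can be tried in isolation. The sketch below reuses the three patterns from the diff but swaps `stringr::str_detect()` for base-R `grepl(..., perl = TRUE)` so it runs without extra packages; `cleaned_name` and `known_genera` are made-up stand-ins for `taxa$tocheck$cleaned_name` and `resources$genera_all2$canonical_name`.

```r
# Made-up inputs; the real function works on taxa$tocheck$cleaned_name.
cleaned_name <- c(
  "Acacia aff. aneura",
  "Acacia affinis aneura",
  "Banksia cf. serrata",
  "Banksia cf serrata",
  "Banksia serrata",
  "Fakegenus aff. nova"
)

# Made-up stand-in for the list of genera known to APC/APNI.
known_genera <- c("Acacia", "Banksia")

# The first word of each name is treated as the genus.
genus <- sub(" .*", "", cleaned_name)

# Same three patterns as in the diff, evaluated with base R.
has_aff_or_cf <- grepl("[Aa]ff[\\.\\s]", cleaned_name, perl = TRUE) |
  grepl(" affinis ", cleaned_name, fixed = TRUE) |
  grepl(" cf[\\.\\s]", cleaned_name, perl = TRUE)

# A name is aligned to genus only if it carries a qualifier AND the genus is known.
i <- has_aff_or_cf & genus %in% known_genera
print(data.frame(cleaned_name, genus, i))
```

The leading space in `" cf[\\.\\s]"` keeps the pattern from firing on words that merely end in "cf", and the genus check excludes names whose first word is not a recognised genus.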

9 changes: 4 additions & 5 deletions R/native_anywhere_in_australia.R
@@ -1,9 +1,8 @@
#' Check if a vector of species are native anywhere in Australia
#' For a vector of taxon names aligned to the APC, check if the species are native anywhere in Australia
#'
#' This function checks if the given species is native anywhere in Australia according to the loaded version of the Australian Plant Census (APC).
#' It creates a lookup table from taxonomic resources, and checks if the species
#' is listed as native in that table. Note that this will not detect within Australia invasions,
#' e.g. if a species is from Western Australia and is invasive on the east coast. And recent invasions are unlikely to be documented yet in APC.
#' This function checks if the given species is native anywhere in Australia according to the APC.
#' Note that this will not detect within-Australia introductions, e.g. if a species is from Western Australia and is invasive on the east coast.
#' And recent invasions are unlikely to be documented yet in APC.
#' For the complete matrix of species by states that also represents within-Australia invasions,
#' use \link{create_species_state_origin_matrix}. For spelling checks and taxonomy updates please see \link{create_taxonomic_update_lookup}.
#'
13 changes: 5 additions & 8 deletions R/standardise_names.R
@@ -1,13 +1,10 @@

#' Standardise Taxon Names
#' Standardises taxon names by performing a series of text substitutions to remove common inconsistencies in taxonomic nomenclature.
#'
#' This function standardises taxon names by performing a series of text
#' substitutions to remove common inconsistencies in taxonomic nomenclature.
#' The function takes a character vector of taxon names as input and returns a
#' character vector of taxon names using standardised taxonomic syntax as output. In particular it standardises
#' the abbreviations used to document infraspecific taxon ranks (subsp., var., f.),
#' as people use many variants of these terms. It also standardises or removes a few additional filler
#' words used within taxon names (affinis becomes aff.; s.l. and s.s. are removed).
#' The function takes a character vector of taxon names as input and
#' returns a character vector of taxon names using standardised taxonomic syntax as output.
#' In particular it standardises taxon rank abbreviations and qualifiers (subsp., var., f.), as people use many variants of these terms.
#' It also standardises or removes a few additional filler words used within taxon names (affinis becomes aff.; s.l. and s.s. are removed).
#'
#' @param taxon_names A character vector of taxon names that need to be standardised.
#'
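To make the kind of substitution described above concrete, here is a small base-R sketch. It is not the body of `standardise_names`: the function name `standardise_sketch()` and the specific variant spellings it handles are assumptions chosen for illustration, and the real function covers more cases.

```r
# Hypothetical, much-reduced version of the substitutions described above.
standardise_sketch <- function(taxon_names) {
  x <- taxon_names
  # Map a few variants of infraspecific rank terms to one abbreviation each.
  x <- gsub("\\s(ssp|subsp|subspecies)\\.?\\s", " subsp. ", x, perl = TRUE)
  x <- gsub("\\s(var|variety)\\.?\\s", " var. ", x, perl = TRUE)
  # "affinis" becomes "aff."
  x <- gsub("\\saffinis\\s", " aff. ", x, perl = TRUE)
  # "s.l." and "s.s." are removed.
  x <- gsub("\\ss\\.(l|s)\\.", "", x, perl = TRUE)
  # Collapse repeated whitespace and trim the ends.
  trimws(gsub("\\s+", " ", x))
}

examples <- c(
  "Acacia aneura ssp. intermedia",
  "Acacia aneura variety intermedia",
  "Acacia affinis aneura",
  "Banksia serrata s.l."
)
standardised <- standardise_sketch(examples)
print(standardised)
```

Each substitution is anchored on surrounding whitespace so that, for example, "var" inside a longer epithet is left alone.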
5 changes: 3 additions & 2 deletions R/state_diversity_counts.R
@@ -1,6 +1,7 @@
#' Calculate Australian plant state-level diversity for native, introduced, and more complicated species origins
#' For Australian states and territories, use data from the APC to calculate state-level diversity for native, introduced, and more complicated species origins
#'
#' This function calculates state-level diversity for native, introduced, and more complicated species origins based on the geographic data available in the current Australian Plant Census.
#' This function calculates state-level diversity for native, introduced, and more complicated species origins
#' based on the geographic data available in the APC.
#'
#' @family diversity methods
#' @param state A character string indicating the Australian state or territory to calculate the diversity for. Possible values are "NSW", "NT", "Qld", "WA", "ChI", "SA", "Vic", "Tas", "ACT", "NI", "LHI", "MI", "HI", "MDI", "CoI", "CSI", and "AR".
7 changes: 3 additions & 4 deletions R/strip_names.R
@@ -1,4 +1,4 @@
#' Strip taxonomic names of subtaxa designations and special characters
#' Strip taxonomic names of taxon rank abbreviations and qualifiers and special characters
#'
#' Given a vector of taxonomic names, this function removes subtaxa designations ("subsp.", "var.", "f.", and "ser"),
#' special characters (e.g., "-", ".", "(", ")", "?"), and extra whitespace. The resulting vector
@@ -34,10 +34,10 @@ strip_names <- function(taxon_names) {
tolower()
}

#' Strip taxonomic names of subtaxa designations, filled words and special characters
#' Strip taxonomic names of taxon rank abbreviations and qualifiers, filler words and special characters
#'
#' Given a vector of taxonomic names, this function removes subtaxa designations ("subsp.", "var.", "f.", and "ser"),
#' additional filler words and characters (" x " for hybrid taxa, "sp.", "cf"),
#' additional filler words and characters (" x " for hybrid taxa, "sp."),
#' special characters (e.g., "-", ".", "(", ")", "?"), and extra whitespace. The resulting vector
#' of names is also converted to lowercase.
#'
@@ -69,7 +69,6 @@ strip_names_2 <- function(taxon_names) {
stringr::str_replace_all(" sp ", " ") %>%
stringr::str_replace_all(" sp1", " 1") %>%
stringr::str_replace_all(" sp2", " 2") %>%
stringr::str_replace_all(" cf | cf$", " ") %>%
stringr::str_replace_all("\\=", " ") %>%
stringr::str_replace_all("  ", " ") %>%
stringr::str_squish() %>%
9 changes: 5 additions & 4 deletions R/update_taxonomy.R
@@ -1,8 +1,9 @@
#' Use APC and APNI to update taxonomy, replacing synonyms to current taxa where relevant
#' For a list of taxon names aligned to the APC, update the name to an accepted taxon concept per the APC and add scientific name and taxon concept metadata to names aligned to either the APC or APNI.
#'
#' This function uses the Australia's Virtual Herbarium's taxonomic resources, specifically the Australian Plant
#' Census (APC) and the Australian Plant Name Index (APNI), to update taxonomy of plant species, replacing any synonyms
#' to their current accepted name.
#' This function uses the APC to update the taxonomy of names aligned to a taxon concept listed in the APC to the currently accepted name for the taxon concept.
#' The aligned_data data frame that is input must contain 5 columns,
#' `original_name`, `aligned_name`, `taxon_rank`, `taxonomic_dataset`, and `aligned_reason`.
#' The aligned name is a plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function.
#'
#' @family taxonomic alignment functions
#'
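The five-column requirement on `aligned_data` stated above lends itself to an up-front check. The sketch below is an illustration only: `check_aligned_data()` is a hypothetical helper, not part of the package, and the values in the example data frame are placeholders.

```r
# Column names taken from the function description above.
required_cols <- c("original_name", "aligned_name", "taxon_rank",
                   "taxonomic_dataset", "aligned_reason")

# Hypothetical helper: stop early with a clear message if columns are missing.
check_aligned_data <- function(aligned_data) {
  missing_cols <- setdiff(required_cols, names(aligned_data))
  if (length(missing_cols) > 0) {
    stop("aligned_data is missing columns: ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Placeholder row; the values are illustrative, not real align_taxa output.
aligned_data <- data.frame(
  original_name = "Banksia serrate",
  aligned_name = "Banksia serrata",
  taxon_rank = "species",
  taxonomic_dataset = "APC",
  aligned_reason = "illustrative placeholder",
  stringsAsFactors = FALSE
)

ok <- check_aligned_data(aligned_data)

# Dropping columns triggers the error, captured here instead of halting.
bad <- tryCatch(check_aligned_data(aligned_data[, 1:3]),
                error = function(e) conditionMessage(e))
print(bad)
```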
6 changes: 3 additions & 3 deletions _pkgdown.yml
@@ -19,9 +19,9 @@ navbar:
- text: "Data providers"
- text: APC and APNI
href: articles/data-providers.html
- text: "Data caching"
- text: How is APC/APNI stored in APCalign?
href: 'articles/caching.html'
- text: "Functions"
- text: Details on the 10 exported functions, including examples of usage
href: function_notes.html
- text: -------
- text: "Taxon matching"
- text: Our fuzzy matching algorithm
4 changes: 2 additions & 2 deletions inst/extdata/APCalign_outputs_documentation.csv
@@ -2,7 +2,7 @@ variable,returned by,description
original_name,default,The original plant name.
aligned_name,default,The input plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function.
accepted_name,default,The APC-accepted plant name when available.
suggested_name,default,The suggested plant name to use. Identical to the accepted_name when an accepted_name exists; otherwise the the suggested_name is the aligned_name.
suggested_name,default,The suggested plant name to use. Identical to the accepted_name when an accepted_name exists; otherwise the suggested_name is the aligned_name or the aligned name with an outdated genus updated.
genus,default,The genus of the accepted (or suggested) name; only APC-accepted genus names are filled in.
family,full,The family of the accepted (or suggested) name; only APC-accepted family names are filled in.
taxon_rank,default,The taxonomic rank of the suggested (and accepted) name.
@@ -18,4 +18,4 @@ taxon_ID_genus,full,An identifier for the genus; only filled in if an APC-accept
scientific_name_ID,full,An identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names.
taxonomic_status_aligned,full,The taxonomic status of the aligned name before any taxonomic updates have been applied.
row_number,full,The row number of a specific original_name in the input.
number_of_collapsed_taxa,default,The number of possible taxon names that have been collapsed when taxonomic_splits == "collapse_to_higher_taxon".
number_of_collapsed_taxa,default,"The number of possible taxon names that have been collapsed when taxonomic_splits == ""collapse_to_higher_taxon""."
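
The `suggested_name` rule documented in this file can be written as a one-line fallback. The sketch below covers only the basic case (use `accepted_name` when present, otherwise `aligned_name`); it deliberately omits the "outdated genus updated" branch, and the example rows are illustrative values rather than real package output.

```r
# Illustrative lookup rows; NA marks a name with no APC-accepted name.
lookup <- data.frame(
  aligned_name = c("Dryandra aurantia", "Banksia sp."),
  accepted_name = c("Banksia aurantia", NA),
  stringsAsFactors = FALSE
)

# Prefer the accepted name; fall back to the aligned name when it is missing.
lookup$suggested_name <- ifelse(
  is.na(lookup$accepted_name),
  lookup$aligned_name,
  lookup$accepted_name
)
print(lookup)
```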
8 changes: 4 additions & 4 deletions inst/extdata/match_taxa_documentation.csv
@@ -7,12 +7,12 @@ match_03a,"Detect ` -- `, `--` (intergrade taxa) and align to genus","first word
match_03b,"Detect ` -- `, `--` (intergrade taxa) and align to genus","first word (""genus"")",fuzzy,APC accepted taxon concepts,genus,
match_03c,"Detect ` -- `, `--` (intergrade taxa) and align to genus","first word (""genus"")",fuzzy,other APC taxon concepts,genus,
match_03d,"Detect ` -- `, `--` (intergrade taxa) and align to genus","first word (""genus"")",fuzzy,APNI,genus,
match_03e,"Detect ` -- `, `--` (intergrade taxa), but fail to align to genus",NA,no match,NA,genus,
match_03e,"Detect ` -- `, `--` (intergrade taxa), but fail to align to genus",NA,no match,NA,NA,
match_04a,Detect ` \` (indecision between taxa) and align to genus.,"first word (""genus"")",exact,"APC accepted taxon concepts, other APC taxon concepts, APNI",genus,Next find strings that indicate a name reflects a data collector's indecision about which of two (or more) taxa is the appropriate taxon. These names can only be aligned to a genus.
match_04b,Detect ` \` (indecision between taxa) and align to genus.,"first word (""genus"")",fuzzy,APC accepted taxon concepts,genus,
match_04c,Detect ` \` (indecision between taxa) and align to genus.,"first word (""genus"")",fuzzy,other APC taxon concepts,genus,
match_04d,Detect ` \` (indecision between taxa) and align to genus.,"first word (""genus"")",fuzzy,APNI,genus,
match_04e,"Detect ` \` (indecision between taxa), but fail to align to genus",NA,no match,NA,genus,
match_04e,"Detect ` \` (indecision between taxa), but fail to align to genus",NA,no match,NA,NA,
match_05a,"Detect scientific names, including authorship",original_name,exact,APC accepted taxon concepts,species/infraspecific,"Check if strings are full scientific names, including authorship."
match_05b,"Detect scientific names, including authorship",original_name,exact,other APC taxon concepts,species/infraspecific,
match_06a,"Detect canonical names, lacking authorship",cleaned_name,exact,APC accepted taxon concepts,species/infraspecific,"Check if strings are taxon names, lacking authorship."
@@ -24,14 +24,14 @@ match_09a,"Detect `aff`, `affinis` (affinity to) and align to genus","first word
match_09b,"Detect `aff`, `affinis` (affinity to) and align to genus","first word (""genus"")",fuzzy,APC accepted taxon concepts,genus,
match_09c,"Detect `aff`, `affinis` (affinity to) and align to genus","first word (""genus"")",fuzzy,other APC taxon concepts,genus,
match_09d,"Detect `aff`, `affinis` (affinity to) and align to genus","first word (""genus"")",fuzzy,APNI,genus,
match_09e,"Detect `aff`, `affinis` (affinity to), but fail to align to genus",NA,no match,NA,genus,
match_09e,"Detect `aff`, `affinis` (affinity to), but fail to align to genus",NA,no match,NA,NA,
match_10a,"Detect canonical names, lacking authorship",stripped_name,imprecise fuzzy,APC accepted taxon concepts,species/infraspecific,"Further checks if strings are taxon names, lacking authorship, now with imprecise fuzzy matching"
match_10b,"Detect canonical names, lacking authorship",stripped_name,imprecise fuzzy,other APC taxon concepts,species/infraspecific,
match_11a,Detect ` x ` (hybrid taxon) and align to genus,"first word (""genus"")",exact,"APC accepted taxon concepts, other APC taxon concepts, APNI",genus,"Find strings that indicate a name that is a hybrid between two taxa. Such names, unless documented in APC (i.e. matches 6, 7 above) can only be aligned to genus."
match_11b,Detect ` x ` (hybrid taxon) and align to genus,"first word (""genus"")",fuzzy,APC accepted taxon concepts,genus,
match_11c,Detect ` x ` (hybrid taxon) and align to genus,"first word (""genus"")",fuzzy,other APC taxon concepts,genus,
match_11d,Detect ` x ` (hybrid taxon) and align to genus,"first word (""genus"")",fuzzy,APNI,genus,
match_11e,"Detect ` x ` (hybrid taxon), but fail to align to genus",NA,no match,NA,genus,
match_11e,"Detect ` x ` (hybrid taxon), but fail to align to genus",NA,no match,NA,NA,
match_12a,"Detect canonical names, by checking first three words in string",trinomial (from stripped_name_2),exact,APC accepted taxon concepts,species/infraspecific,"Check if the first three words in the name string match with a taxon name, allowing notes to be discarded. Also useful for aligning phrase names."
match_12b,"Detect canonical names, by checking first three words in string",trinomial (from stripped_name_2),exact,other APC taxon concepts,species/infraspecific,
match_13a,"Detect canonical names, by checking first three words in string",trinomial (from stripped_name_2),fuzzy,APC accepted taxon concepts,species/infraspecific,
33 changes: 33 additions & 0 deletions inst/extdata/test_taxa.csv
@@ -0,0 +1,33 @@
original_name
Banksia serrata
Banksia serrate
Banksee serrate
Banksia cerrata
Banksia sp.
Dryandra sp.
Argyrodendron (Whyanbeel)
Argyrodendron ssp. (Whyanbeel BH 1106RFK)
Argyrodendron Whyanbeel
Argyrodendron sp. (Whyanbeel BH 1106RFK)
Argyrodendron sp. Whyanbeel (B.P.Hyland RFK 1106)
Argyrodendron sp. Whyanbeel (B.P.Hyland RFK1106)
Dryandra aurantia
Banksia aurantia
Dryandra blechnifolia
Banksia pellaeifolia
Dryandra idiogenes
Banksia idiogenes
Dryandra lindleyana
Banksia dallanneyi
Acacia aneura
Acacia minyura
Acacia paraneura
Racosperma aneurum
Acacia aneura var. intermedia
Banksia (has long pink leaves)
Dryandra (has long pink leaves)
Acacia minyura / Acacia paraneura
Acacia aphanoclada x Acacia pyrifolia var. pyrifolia
Acacia minyura x Acacia paraneura
"no clue, a monocot"
Orchidaceae (epiphtye)
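
One detail of this test file is the quoted row `"no clue, a monocot"`: without the quotes, the comma would split it into two fields. The sketch below reads a three-row excerpt from an inline string to show that the quoted entry survives as a single name; the excerpt is copied from the file above, and reading from `text =` rather than from the installed file is a simplification for illustration.

```r
# Excerpt of the test file, including the quoted entry containing a comma.
csv_text <- paste(
  "original_name",
  "Banksia serrata",
  "\"no clue, a monocot\"",
  "Orchidaceae (epiphtye)",
  sep = "\n"
)

test_taxa <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(test_taxa$original_name)
```

With the package installed, the full file would presumably be located with `system.file("extdata", "test_taxa.csv", package = "APCalign")`, given its `inst/extdata` path in this commit.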