Bug fix, re-number match algorithms #179

Merged: 7 commits, Feb 6, 2024
2 changes: 1 addition & 1 deletion R/APCalign-package.R
@@ -53,7 +53,7 @@ utils::globalVariables(
"family",
"fuzzy_match_genus",
"fuzzy_match_genus_APNI",
- "fuzzy_match_genus_known",
+ "fuzzy_match_genus_synonym",
"genus",
"genus_accepted",
"known",
20 changes: 10 additions & 10 deletions R/align_taxa.R
@@ -35,16 +35,16 @@
#' - binomial: the first two words in `stripped_name2`, required for matches that ignore all other text in the original_name; improves phrase name matches.
#' - genus: the first word in `cleaned_name`; required for genus-rank matches and reprocessing of genus-rank names.
#' - fuzzy_match_genus: fuzzy match of genus column to best match among APC-accepted names; required for fuzzy matches of genus-rank names.
- #' - fuzzy_match_genus_known: fuzzy match of genus column to best match among APC-known names, only considering different matches to those documented under APC-accepted genera; required for fuzzy matches of genus-rank names.
+ #' - fuzzy_match_genus_synonym: fuzzy match of genus column to best match among APC-known names, only considering different matches to those documented under APC-accepted genera; required for fuzzy matches of genus-rank names.
#' - fuzzy_match_genus_APNI: fuzzy match of genus column to best match among APNI names, only considering different matches to those documented under APC-accepted and APC-known genera; required for fuzzy matches of genus-rank names.
#' - fuzzy_match_cleaned_APC: fuzzy match of stripped_name to APC-accepted names; created for yet-to-be-aligned names at the match step 07a in the function `match_taxa`.
- #' - fuzzy_match_cleaned_APC_known: fuzzy match of stripped_name to APC-known names; created for yet-to-be-aligned names at the match step 07b in the function `match_taxa`.
+ #' - fuzzy_match_cleaned_APC_synonym: fuzzy match of stripped_name to APC-known names; created for yet-to-be-aligned names at the match step 07b in the function `match_taxa`.
#' - fuzzy_match_cleaned_APC_imprecise: imprecise fuzzy match of stripped_name to APC-accepted names; created for yet-to-be-aligned names at the match step 10a in the function `match_taxa`.
- #' - fuzzy_match_cleaned_APC_known_imprecise: imprecise fuzzy match of stripped_name to APC-accepted names; created for yet-to-be-aligned names at the match step 10b in the function `match_taxa`.
+ #' - fuzzy_match_cleaned_APC_synonym_imprecise: imprecise fuzzy match of stripped_name to APC-accepted names; created for yet-to-be-aligned names at the match step 10b in the function `match_taxa`.
#' - fuzzy_match_binomial: fuzzy match of binomial column to best match among APC-accepted names; created for yet-to-be-aligned names at match step 15a in the function `match_taxa`.
- #' - fuzzy_match_binomial_APC_known: fuzzy match of binomial column to best match among APC-known names; created for yet-to-be-aligned names at match step 15a in the function `match_taxa`.
+ #' - fuzzy_match_binomial_APC_synonym: fuzzy match of binomial column to best match among APC-known names; created for yet-to-be-aligned names at match step 15a in the function `match_taxa`.
#' - fuzzy_match_trinomial: fuzzy match of trinomial column to best match among APC-accepted names; created for yet-to-be-aligned names at match step 16a in the function `match_taxa`.
- #' - fuzzy_match_trinomial_known: fuzzy match of trinomial column to best match among APC-known names; created for yet-to-be-aligned names at match step 16b in the function `match_taxa`.
+ #' - fuzzy_match_trinomial_synonym: fuzzy match of trinomial column to best match among APC-known names; created for yet-to-be-aligned names at match step 16b in the function `match_taxa`.
#' - fuzzy_match_cleaned_APNI: fuzzy match of stripped_name to APNI names; created for yet-to-be-aligned names at the match step 16a in the function `match_taxa`.
#' - fuzzy_match_cleaned_APNI_imprecise: imprecise fuzzy match of stripped_name to APNI names; created for yet-to-be-aligned names at the match step 17a in the function `match_taxa`.
#'
@@ -127,17 +127,17 @@ align_taxa <- function(original_name,
aligned_name = NA_character_,
aligned_reason = NA_character_,
fuzzy_match_genus = NA_character_,
- fuzzy_match_genus_known = NA_character_,
+ fuzzy_match_genus_synonym = NA_character_,
fuzzy_match_genus_APNI = NA_character_,
fuzzy_match_binomial = NA_character_,
- fuzzy_match_binomial_APC_known = NA_character_,
+ fuzzy_match_binomial_APC_synonym = NA_character_,
fuzzy_match_trinomial = NA_character_,
- fuzzy_match_trinomial_known = NA_character_,
+ fuzzy_match_trinomial_synonym = NA_character_,
fuzzy_match_cleaned_APC = NA_character_,
- fuzzy_match_cleaned_APC_known = NA_character_,
+ fuzzy_match_cleaned_APC_synonym = NA_character_,
fuzzy_match_cleaned_APNI = NA_character_,
fuzzy_match_cleaned_APC_imprecise = NA_character_,
- fuzzy_match_cleaned_APC_known_imprecise = NA_character_,
+ fuzzy_match_cleaned_APC_synonym_imprecise = NA_character_,
fuzzy_match_cleaned_APNI_imprecise = NA_character_,
taxonomic_dataset = NA_character_,
taxon_rank = NA_character_,
2 changes: 1 addition & 1 deletion R/create_taxonomic_update_lookup.R
@@ -31,7 +31,7 @@
#' - update_reason: the explanation of a specific taxon name update (from an aligned name to an accepted or suggested name).
#' - subclass: the subclass of the accepted name.
#' - taxon_distribution: the distribution of the accepted name; only filled in if an APC accepted_name is available.
- #' - scientific_name_authorship: the authorship information for the accepted (or known) name; available for both APC and APNI names.
+ #' - scientific_name_authorship: the authorship information for the accepted (or synonymous) name; available for both APC and APNI names.
#' - taxon_ID: the unique taxon concept identifier for the accepted_name; only filled in if an APC accepted_name is available.
#' - taxon_ID_genus: an identifier for the genus; only filled in if an APC-accepted genus name is available.
#' - scientific_name_ID: an identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names.
7 changes: 5 additions & 2 deletions R/load_taxonomic_resources.R
@@ -195,9 +195,10 @@ load_taxonomic_resources <-
genus
) %>%
dplyr::filter(taxon_rank %in% c("genus"), taxonomic_status == "accepted") %>%
+ dplyr::filter(!stringr::str_detect(stringr::word(genus, 1), "aceae$")) %>%
dplyr::mutate(taxonomic_dataset = "APC")

- taxonomic_resources[["genera_known"]] <-
+ taxonomic_resources[["genera_synonym"]] <-
taxonomic_resources$APC %>%
dplyr::select(
canonical_name,
@@ -213,6 +214,7 @@ load_taxonomic_resources <-
) %>%
dplyr::filter(taxon_rank %in% c("genus")) %>%
dplyr::filter(!canonical_name %in% taxonomic_resources$genera_accepted$canonical_name) %>%
+ dplyr::filter(!stringr::str_detect(stringr::word(genus, 1), "aceae$")) %>%
dplyr::mutate(taxonomic_dataset = "APC") %>%
dplyr::distinct(canonical_name, .keep_all = TRUE)

@@ -229,13 +231,14 @@ load_taxonomic_resources <-
) %>%
dplyr::filter(taxon_rank %in% c("genus")) %>%
dplyr::filter(!canonical_name %in% taxonomic_resources$APC$canonical_name) %>%
+ dplyr::filter(!stringr::str_detect(stringr::word(genus, 1), "aceae$")) %>%
dplyr::mutate(taxonomic_dataset = "APNI") %>%
dplyr::distinct(canonical_name, .keep_all = TRUE)

taxonomic_resources[["genera_all"]] <-
dplyr::bind_rows(
taxonomic_resources$genera_accepted,
- taxonomic_resources$genera_known,
+ taxonomic_resources$genera_synonym,
taxonomic_resources$genera_APNI
) %>%
dplyr::mutate(
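Reviewer note: the new `"aceae$"` filter added to the three genus tables in `R/load_taxonomic_resources.R` appears intended to drop rows whose first word in `genus` looks like a family name. A minimal sketch of that behaviour follows, assuming a toy table invented for illustration (not APC data) and using base R (`sub`, `grepl`) as a stand-in for the `stringr::word` and `stringr::str_detect` calls in the diff.

```r
# Toy genus table; the rows are invented for illustration and are not APC data.
genera <- data.frame(
  canonical_name = c("Acacia", "Myrtaceae", "Eucalyptus", "Fabaceae"),
  genus = c("Acacia", "Myrtaceae", "Eucalyptus", "Fabaceae"),
  taxon_rank = "genus",
  stringsAsFactors = FALSE
)

# First word of `genus`: a base R stand-in for stringr::word(genus, 1).
first_word <- sub("\\s.*$", "", genera$genus)

# TRUE where the first word ends in "aceae", i.e. looks like a family name.
looks_like_family <- grepl("aceae$", first_word)

# Keep only the rows that do not look like a family name.
genera_filtered <- genera[!looks_like_family, ]
print(genera_filtered$canonical_name)
```

The `$` anchor restricts the match to the end of the word, so a genus name that merely contains "aceae" mid-word would be kept.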