Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract_data not extracting #139

Open
fontikar opened this issue Dec 5, 2024 · 6 comments · May be fixed by #147
Open

extract_data not extracting #139

fontikar opened this issue Dec 5, 2024 · 6 comments · May be fixed by #147
Assignees

Comments

@fontikar
Copy link
Collaborator

fontikar commented Dec 5, 2024

Describe the bug
extract_data not extracting from not standard columns e.g. basis_of_record/lifestage

To Reproduce
Steps to reproduce the behavior:

library(austraits)
#> Loading required package: RefManageR
#> Thanks for showing interest in `austraits`! Please consider citing this package
#> - citation('austraits')

# Load data
austraits <- load_austraits(version = "6.0.0", path = "../austraits/ignore/data/austraits")
#> Downloading AusTraits to '../austraits/ignore/data/austraits'
#> Loading data from '../austraits/ignore/data/austraits/austraits-6.0.0.rds'

austraits |> extract_data(table = "trait", col = "lifestage", col_value = "sapling")
#> Error in UseMethod("slice"): no applicable method for 'slice' applied to an object of class "NULL"

Created on 2024-12-05 with reprex v2.1.1

Expected behavior

Extracted database by lifestage where lifestage==sapling

@fontikar
Copy link
Collaborator Author

fontikar commented Dec 8, 2024

austraits |> extract_data(table = "traits", col = "life_stage", col_value = "sapling")

need checking messages

@fontikar
Copy link
Collaborator Author

Just brainstorm about the problem here.

Currently extract_data accepts both the database object and the traits table.
There are 3-4 types of checks that we need to do:

  • Are values for table, col, col_value supplied? (missingness checks)
  • If there are supplied, do they exist? If not, prompt user to double check spelling and try again

It is relatively straight forward to check the inputs of the database but when a user supplies database = database$traits things get a little complicated because checks created for the database doesn't work if the database is actually a table.

One way is to pass database to table if(is.na(table) table <- database and then proceed with regular checks

I won't implement this at ESA but I can see functions for checking missingness in each argument and the checks for each level (table, col, col_value). Best to write these and call inside extract_data so the main script is not all flooded with checking code.

@fontikar fontikar self-assigned this Dec 16, 2024
fontikar added a commit that referenced this issue Dec 19, 2024
fontikar added a commit that referenced this issue Dec 19, 2024
@fontikar
Copy link
Collaborator Author

Input from @ehwenk

col_value accepts multiple values e.g. c("Acacia", "Banksia", "Cookie").
extract_data should still proceed with extraction for col_values that exist in the column and print a warning that particular value doesn't exist

@fontikar
Copy link
Collaborator Author

Currently stuck on checking on col_value, as currently...extract_data accepts partial matches... so how do we find absolute non matches after partial matching. e.g.

extract_data(database = subset_by_dataset_id3, table = "contexts", col = "context_property", col_value = c("age", "apples"))

age will match to 'leaf age' and apples is a non match...

@ehwenk
Copy link
Collaborator

ehwenk commented Dec 19, 2024

I think it would be ok to swap to exact matches for 'col'. It doesn't seem like that needs to include partial matches.

@fontikar
Copy link
Collaborator Author

fontikar commented Dec 19, 2024

I think it would be ok to swap to exact matches for 'col'. It doesn't seem like that needs to include partial matches.

@ehwenk I think you might be misunderstanding me, the checks are exact matches for col and table which looks like this:

# Check if table name exists 
# Note that users can supply the $traits table to database, for that reason table can be NA
# First check if table is NA

check_table_name_exists <- function(database, table){
  if(!is.na(table) & !tibble::is_tibble(database)){ # database is NOT $traits and `table` is supplied
    if(! names(database) %in% table |> any()){ # Does any names of the tables in database contain `table` 
      cli::cli_abort(
        c(
          "x" = "`{table}` is not a valid table name",
          "i" = "Check `names(database)` and try again!"
        )
      )
    } 
  } 
}

# Check if col exists in specified table when database is traits.build object
check_col_exists_in_table <- function(database, table, col){
  if(! names(database[[table]]) %in% col |> any()){ # Does any names in table contain `col`
    cli::cli_abort(c(
      "x" = "`{col}` is not a valid column name in the `{table}` table",
      "i" = "Check `names(database${table})` and try again!"
    )
    )
  }
}

using the %in% is exacting...

but I need to find the absolute non matches AFTER allowing for partial matching of col_value...

I think it might just be a long day and maybe fresh eyes will be good tomorrow! :)

fontikar added a commit that referenced this issue Jan 5, 2025
…t_data only assigns class if full database supplied #139
fontikar added a commit that referenced this issue Jan 6, 2025
@fontikar fontikar linked a pull request Jan 6, 2025 that will close this issue
@fontikar fontikar linked a pull request Jan 6, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

2 participants