Replies: 12 comments 15 replies
-
Here are some of the major steps that can be converted into issues for migrating functions that interact with database outputs into
2. Feature: Re-export functions in traits.build
-
From @ehwenk:
-
We could make loading an older version possible, but then print a message saying that this version is not supported by most functions in the package?
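A minimal sketch of how such a message could work, using only base R. The helper name `check_version_supported` and the `min_supported = "5.0.0"` cutoff are assumptions made for illustration; neither comes from the package.

```r
# Hypothetical helper: allow an old version to load, but tell the user
# that most functions may not support it.
# `min_supported` is an assumed cutoff, not a value taken from the package.
check_version_supported <- function(version, min_supported = "5.0.0") {
  # package_version() compares version strings component by component
  supported <- package_version(version) >= package_version(min_supported)
  if (!supported) {
    message(
      "Version ", version, " was loaded, but most functions in this package ",
      "expect version ", min_supported, " or later."
    )
  }
  invisible(supported)
}

old_ok <- check_version_supported("4.2.0") # prints the message
new_ok <- check_version_supported("6.0.0") # silent
```

Using `message()` rather than `stop()` keeps the older version loadable, which is the behaviour proposed above.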
-
Just starting a discussion thread here to put our thoughts as we review
This discussion was sparked by
-
Relational vs wide table: which one should we work with the majority of the time?
-
Generalise, take one: filter the specified table and then trim the other tables. The user needs to specify the table name, the col name and the value of the col name OR two taxon
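A rough sketch of the "filter one table, then trim the others" idea on a toy database. Everything here is an assumption for illustration: the function name `extract_sketch`, its arguments, the toy tables, and the choice of `dataset_id` and `location_id` as key columns. It is not the package's implementation.

```r
library(dplyr)
library(stringr)

# Toy stand-in for a relational database: a named list of tables
db <- list(
  locations = tibble(
    dataset_id = c("A", "A", "B"),
    location_id = c("01", "02", "01"),
    location_name = c("Cairns, Queensland", "Perth", "Hobart")
  ),
  traits = tibble(
    dataset_id = c("A", "A", "B", "B"),
    location_id = c("01", "02", "01", "02"),
    trait_value = c("1.0", "2.0", "3.0", "4.0")
  )
)

# Hypothetical sketch: filter `table` where `col` matches `col_value`,
# then trim every other table to the rows that share its key columns.
extract_sketch <- function(database, table, col, col_value,
                           key_cols = c("dataset_id", "location_id")) {
  ret <- database
  ret[[table]] <- database[[table]] |>
    filter(str_detect(.data[[col]], col_value))
  for (other in setdiff(names(database), table)) {
    # Only join on key columns that exist in both tables
    keys <- intersect(
      key_cols,
      intersect(names(database[[other]]), names(ret[[table]]))
    )
    if (length(keys) == 0) next # nothing to join on, leave table untouched
    ret[[other]] <- semi_join(database[[other]], ret[[table]], by = keys)
  }
  ret
}

trimmed <- extract_sketch(db, "locations", "location_name", "Queensland")
```

In this toy case both `trimmed$locations` and `trimmed$traits` are reduced to the single row for dataset `A`, location `01`.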
-
Summary of what is left to do for austraits 3.0.0:
- load_austraits
- ~~combine_tables argument for simplified and full table~~ ✅
- ~~Review old issues to see if they are still relevant~~ ✅
-
After talking with Daniel about packing and joining I propose that:
Overall, I'm concerned ecologists don't want to deal with JSON-compacted information.
-
Playing with join_locations and join_contexts - some more thoughts about how to make these more useful for people.
-
@dfalster and @ehwenk, can I get your opinion about something? When we extract or subset the relational database, what approach do we want to use to trim the other tables? I think our old approach using
Thoughts?
# Workflow for generalising extract function
library(tidyverse)
library(austraits)
#> Loading required package: RefManageR
#> Thanks for showing interest in `austraits`! Please consider citing this package - citation('austraits')
# Example
austraits = load_austraits(version = "6.0.0", path = "ignore/data/austraits/")
#> Downloading AusTraits to 'ignore/data/austraits/'
#> Loading data from 'ignore/data/austraits//austraits-6.0.0.rds'
table = "locations"
col = "location_name"
value = "Queensland"
# Empty list
ret <- list()
# Cookie cutters
locations_cc <- c("dataset_id", "location_id")
# Allowed columns
c(setdiff(names(austraits$locations), locations_cc))
#> [1] "location_name" "location_property" "value"
# Trim locations, pattern matching
found_indices <- austraits[[table]][[col]] |> stringr::str_which(pattern = value)
ret[[table]] <- austraits[[table]] |>
  dplyr::slice(found_indices)
# Trim traits using key columns (if reference table is locations)
cc_traits <- ret[[table]] |>
  dplyr::select(dataset_id, location_id) |>
  dplyr::distinct()
# What do we want to keep in the other tables?
## Do we want to keep the rows in the other tables that match the cookie cutter columns from the reference table,
## e.g. dataset_id and location_id from the locations table?
# Filtering join
## This quite literally cookie-cuts the traits table: a row is kept only if its key columns match a row in cc_traits
ret[["traits"]] <- austraits$traits |>
  dplyr::semi_join(cc_traits)
#> Joining with `by = join_by(dataset_id, location_id)`
# Double-check that it works:
# the location_ids in cc_traits should be the only ones left in the traits table
austraits$traits |>
  dplyr::semi_join(cc_traits) |>
  dplyr::filter(dataset_id == "Choat_2012") |>
  dplyr::count(location_id)
#> Joining with `by = join_by(dataset_id, location_id)`
#> # A tibble: 4 × 2
#> location_id n
#> <chr> <int>
#> 1 05 20
#> 2 07 3
#> 3 10 3
#> 4 11 5
cc_traits |>
  print(n = 100)
#> # A tibble: 37 × 2
#> dataset_id location_id
#> <chr> <chr>
#> 1 ANBG_2019 0039
#> 2 Choat_2012 05
#> 3 Choat_2012 07
#> 4 Choat_2012 10
#> 5 Choat_2012 11
#> 6 Grubb_1996 01
#> 7 Jagdish_2020 23
#> 8 Jagdish_2020 24
#> 9 Jagdish_2020 25
#> 10 Jagdish_2020 26
#> 11 Jagdish_2020 27
#> 12 Jagdish_2020 28
#> 13 Jagdish_2020 29
#> 14 Jagdish_2020 30
#> 15 Jagdish_2020 31
#> 16 Jagdish_2020 32
#> 17 Jagdish_2020 33
#> 18 Jagdish_2020 34
#> 19 Jagdish_2020 35
#> 20 Jagdish_2020 36
#> 21 Jagdish_2020 37
#> 22 Jagdish_2020 38
#> 23 Jagdish_2020 39
#> 24 Jagdish_2020 40
#> 25 Jordan_2015 01
#> 26 Jordan_2015 02
#> 27 Jordan_2015 09
#> 28 Jordan_2015 25
#> 29 Jordan_2015 33
#> 30 Reynolds_2018 01
#> 31 Richards_2008 32
#> 32 Richards_2008 33
#> 33 Richards_2008 34
#> 34 Tomlinson_2013 09
#> 35 Tomlinson_2013 13
#> 36 Tomlinson_2019 06
#> 37 Tomlinson_2019 09
# The method below is how we have always done it in the past: we keep any rows
# whose values for the nominated columns (e.g. dataset_id AND location_id) appear in cc_traits.
# Problem with this approach: the two conditions are checked independently, so we also get
# records whose dataset_id and location_id each appear in cc_traits but never together as a pair
austraits$traits |>
  dplyr::filter(dataset_id %in% cc_traits$dataset_id,
                location_id %in% cc_traits$location_id)
#> # A tibble: 4,569 × 26
#> dataset_id taxon_name observation_id trait_name value unit entity_type
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 ANBG_2019 Brachychiton co… 0725 seed_germ… imbi… <NA> individual
#> 2 ANBG_2019 Brachychiton co… 0726 seed_dry_… 63.7… mg individual
#> 3 Choat_2012 Acacia cyclops 01 water_pot… -0.91 MPa population
#> 4 Choat_2012 Agonis flexuosa 03 water_pot… -2.36 MPa population
#> 5 Choat_2012 Allocasuarina c… 04 hydraulic… -4.04 MPa population
#> 6 Choat_2012 Allocasuarina c… 04 hydraulic… 1.5 MPa population
#> 7 Choat_2012 Allocasuarina c… 04 water_pot… -2.96 MPa population
#> 8 Choat_2012 Allocasuarina c… 04 water_pot… -8.5 MPa population
#> 9 Choat_2012 Allocasuarina c… 04 water_pot… -7 MPa population
#> 10 Choat_2012 Alphitonia exce… 05 hydraulic… 1.09 MPa population
#> # ℹ 4,559 more rows
#> # ℹ 19 more variables: value_type <chr>, basis_of_value <chr>,
#> # replicates <chr>, basis_of_record <chr>, life_stage <chr>,
#> # population_id <chr>, individual_id <chr>, repeat_measurements_id <chr>,
#> # temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> # entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>,
#> # collection_date <chr>, measurement_remarks <chr>, method_id <chr>, …
austraits$traits |>
  dplyr::filter(dataset_id %in% cc_traits$dataset_id,
                location_id %in% cc_traits$location_id) |>
  dplyr::count(dataset_id, location_id) |>
  print(n = Inf)
#> # A tibble: 106 × 3
#> dataset_id location_id n
#> <chr> <chr> <int>
#> 1 ANBG_2019 0039 2
#> 2 Choat_2012 01 35
#> 3 Choat_2012 02 1
#> 4 Choat_2012 05 20
#> 5 Choat_2012 06 1
#> 6 Choat_2012 07 3
#> 7 Choat_2012 09 5
#> 8 Choat_2012 10 3
#> 9 Choat_2012 11 5
#> 10 Choat_2012 13 10
#> 11 Grubb_1996 01 133
#> 12 Jagdish_2020 01 37
#> 13 Jagdish_2020 02 59
#> 14 Jagdish_2020 05 4
#> 15 Jagdish_2020 06 4
#> 16 Jagdish_2020 07 4
#> 17 Jagdish_2020 09 14
#> 18 Jagdish_2020 10 4
#> 19 Jagdish_2020 11 19
#> 20 Jagdish_2020 13 4
#> 21 Jagdish_2020 23 7
#> 22 Jagdish_2020 24 13
#> 23 Jagdish_2020 25 4
#> 24 Jagdish_2020 26 7
#> 25 Jagdish_2020 27 10
#> 26 Jagdish_2020 28 7
#> 27 Jagdish_2020 29 4
#> 28 Jagdish_2020 30 7
#> 29 Jagdish_2020 31 4
#> 30 Jagdish_2020 32 4
#> 31 Jagdish_2020 33 7
#> 32 Jagdish_2020 34 7
#> 33 Jagdish_2020 35 11
#> 34 Jagdish_2020 36 13
#> 35 Jagdish_2020 37 7
#> 36 Jagdish_2020 38 4
#> 37 Jagdish_2020 39 8
#> 38 Jagdish_2020 40 18
#> 39 Jordan_2015 01 2
#> 40 Jordan_2015 02 2
#> 41 Jordan_2015 05 2
#> 42 Jordan_2015 06 2
#> 43 Jordan_2015 07 2
#> 44 Jordan_2015 09 18
#> 45 Jordan_2015 10 4
#> 46 Jordan_2015 11 2
#> 47 Jordan_2015 23 2
#> 48 Jordan_2015 25 2
#> 49 Jordan_2015 26 10
#> 50 Jordan_2015 27 2
#> 51 Jordan_2015 28 10
#> 52 Jordan_2015 29 4
#> 53 Jordan_2015 30 2
#> 54 Jordan_2015 31 4
#> 55 Jordan_2015 32 4
#> 56 Jordan_2015 33 4
#> 57 Jordan_2015 34 2
#> 58 Jordan_2015 35 4
#> 59 Jordan_2015 36 2
#> 60 Jordan_2015 37 4
#> 61 Jordan_2015 38 2
#> 62 Jordan_2015 39 2
#> 63 Jordan_2015 40 2
#> 64 Reynolds_2018 01 551
#> 65 Richards_2008 01 28
#> 66 Richards_2008 02 50
#> 67 Richards_2008 05 4
#> 68 Richards_2008 06 28
#> 69 Richards_2008 07 18
#> 70 Richards_2008 09 20
#> 71 Richards_2008 10 20
#> 72 Richards_2008 11 9
#> 73 Richards_2008 13 2
#> 74 Richards_2008 23 6
#> 75 Richards_2008 24 6
#> 76 Richards_2008 25 16
#> 77 Richards_2008 26 15
#> 78 Richards_2008 27 2
#> 79 Richards_2008 28 2
#> 80 Richards_2008 29 2
#> 81 Richards_2008 30 2
#> 82 Richards_2008 31 24
#> 83 Richards_2008 32 20
#> 84 Richards_2008 33 5
#> 85 Richards_2008 34 5
#> 86 Richards_2008 35 18
#> 87 Richards_2008 36 15
#> 88 Richards_2008 37 12
#> 89 Richards_2008 38 18
#> 90 Richards_2008 39 25
#> 91 Tomlinson_2013 01 18
#> 92 Tomlinson_2013 02 36
#> 93 Tomlinson_2013 05 18
#> 94 Tomlinson_2013 06 13
#> 95 Tomlinson_2013 07 18
#> 96 Tomlinson_2013 09 18
#> 97 Tomlinson_2013 11 18
#> 98 Tomlinson_2013 13 18
#> 99 Tomlinson_2019 01 361
#> 100 Tomlinson_2019 02 569
#> 101 Tomlinson_2019 05 384
#> 102 Tomlinson_2019 06 396
#> 103 Tomlinson_2019 07 345
#> 104 Tomlinson_2019 09 387
#> 105 Tomlinson_2019 10 216
#> 106 Tomlinson_2019 11 226
cc_traits |>
  select(dataset_id) |>
  distinct()
#> # A tibble: 9 × 1
#> dataset_id
#> <chr>
#> 1 ANBG_2019
#> 2 Choat_2012
#> 3 Grubb_1996
#> 4 Jagdish_2020
#> 5 Jordan_2015
#> 6 Reynolds_2018
#> 7 Richards_2008
#> 8 Tomlinson_2013
#> 9 Tomlinson_2019
cc_traits |>
  select(location_id) |>
  distinct() |>
  pull() |>
  sort()
#> [1] "0039" "01" "02" "05" "06" "07" "09" "10" "11" "13"
#> [11] "23" "24" "25" "26" "27" "28" "29" "30" "31" "32"
#> [21] "33" "34" "35" "36" "37" "38" "39" "40"
Created on 2024-10-29 with reprex v2.1.0
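The difference between the two trimming approaches in the reprex can be reproduced on a tiny made-up table, which may make the 37 vs 106 combinations easier to reason about. The toy `traits` and `cc` tables below are invented for illustration.

```r
library(dplyr)

# Toy traits table with every combination of two datasets and two locations
traits <- tibble(
  dataset_id  = c("A", "A", "B", "B"),
  location_id = c("01", "02", "01", "02")
)

# Reference pairs we want to keep: (A, 01) and (B, 02)
cc <- tibble(dataset_id = c("A", "B"), location_id = c("01", "02"))

# Filtering join: keeps only rows whose (dataset_id, location_id) pair is in cc
paired <- semi_join(traits, cc, by = c("dataset_id", "location_id"))

# Independent %in% tests: each column is checked on its own, so every
# combination of the listed values passes, including (A, 02) and (B, 01)
independent <- filter(
  traits,
  dataset_id %in% cc$dataset_id,
  location_id %in% cc$location_id
)

nrow(paired)      # 2
nrow(independent) # 4
```

Passing `by` explicitly also silences the "Joining with `by = join_by(...)`" message that appears in the reprex output.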
-
Separate cleaning branch
- ~~austraits object or traits.build object~~ ✅
  Currently a mix of both so the print function is not working for bind_databases.
- ~~Which pipe to use %>% (final search and replace)~~ ✅

Separate bind branch
- ~~Bind-databases~~ ✅
- ~~Testing for Bind-databases before investigating TODO.~~ ✅
-
I think it's safe to close this discussion, @dfalster and @ehwenk. Thoughts? What remains:
-
Here are the details of a thought exercise on what life would look like if we put all the functions that interact with database outputs into {austraits}, e.g. the plot_trait_beeswarm and combine_table functions.
@ehwenk and I white-boarded this problem and noticed a few things (see photo):
- traits.build::db_trait_pivot_* is integral for the database development process.
- austraits::trait_pivot_ but there is possible scope to leave traits.build::db_trait_pivot_* as its own function in traits.build. Thoughts?
- traits.build::combine_table will be moved to austraits
- as_wide_table will still be maintained.
- Once traits.build::combine_table has been migrated, we should discuss the need for both "wide" functions!
- traits.build::plot_trait_beeswarm will be removed from traits.build
- austraits::plot_trait_beeswarm in reports.R and this has not broken anything 🎉