Prepare for v5.0.0 #787

Merged
merged 105 commits into from
Nov 19, 2023

dfalster
Member

No description provided.

dfalster and others added 30 commits January 30, 2023 16:43
* update definitions vignette to source current trait definitions

* Add traits.yml & metadata.yml to vignettes/config

* change `definitions.html` to `trait_definitions.html`

* Rebuild austraits.build website

[skip ci]
* update syntax to dplyr 1.1.0

Various changes to comply with dplyr 1.1.0, including fixes for code that already broke and for code that will break once deprecated functions and capabilities are fully dropped:

* changed `~na_if(., 0)` to `~na_if(.x, 0)`
* moved `~na_if()` statements that were being piped into `summarise()`
* `na_if()` didn't seem to work on numeric columns, so converted the relevant columns with `as.character()` before running `na_if()`
* replaced all uses of `vars()` with `c()`
* replaced all uses of `mutate_at()` with `mutate(across())`

Of all studies, only Kooyman_2011 changed: about 50 rows were added, for reasons not yet clear.

The only remaining code that behaves differently than before is the function `move_values_to_new_trait` in custom.R. Its final line, which converts blank cells to NAs, no longer triggers. Instead, this line now has to be added to the custom R code of each dataset that uses the function. I've fixed all datasets where many blanks were created, but in a few datasets I'm leaving the blank cells as they are (they appear in `excluded_data`).
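The blank-to-NA step that now has to live in each dataset's custom R code could look roughly like the base-R sketch below. `blanks_to_na()` is a hypothetical helper, not the line that was removed from `move_values_to_new_trait`, and it assumes "blank" means an empty or whitespace-only string.

```r
# Hypothetical helper: convert empty or whitespace-only strings to NA in
# every character column, leaving other columns untouched
blanks_to_na <- function(data) {
  data[] <- lapply(data, function(col) {
    if (is.character(col)) {
      col[!is.na(col) & trimws(col) == ""] <- NA
    }
    col
  })
  data
}
```

Applied at the end of a dataset's custom R code, this keeps blank cells out of the trait table without changing numeric columns.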
Add new dataset Gosper_2022

- fire response data from WA, including data on time from fire to fruit 
- study has been reviewed by contributor
- contributor also reviewed definitions of several traits associated with this study to ensure they matched the data

In addition:
- minor rewording changes to traits.yml
- minor edits to reports.Rmd
When the units were updated, some changes to allowable ranges weren't merged in, causing ~4000 values to be mistakenly placed in the `excluded_data` table. These are traits for which some contributors report "inverted" values (for example, positive water potentials when the real values are negative); their units had been `neg_MPa`, indicating that the value in AusTraits was the negative of the real number. When the units were changed to `MPa`, the allowable values for a few of these traits weren't updated.
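Converting a range from `neg_MPa` to `MPa` is not just a relabel: both bounds are negated and therefore swap places. The sketch below illustrates this with hypothetical helper names and an illustrative range (0 to 10 in `neg_MPa`); it does not reproduce the actual bounds in traits.yml.

```r
# Hypothetical helper: convert an allowable range recorded for `neg_MPa`
# (stored value = -1 * real value) into the equivalent range in `MPa`.
# Negating both bounds reverses their order, so min and max swap.
neg_mpa_range_to_mpa <- function(min_neg, max_neg) {
  stopifnot(min_neg <= max_neg)
  c(min = -max_neg, max = -min_neg)
}

# Hypothetical check of whether values fall inside an allowed range
in_allowed_range <- function(x, range) {
  x >= range[["min"]] & x <= range[["max"]]
}

mpa_range <- neg_mpa_range_to_mpa(0, 10)
```

With a stale `neg_MPa`-style range of 0 to 10 left in place, a real water potential such as -2.5 MPa would fall outside it and be excluded, which is the kind of error described above.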

Also fixed two other minor substitution issues introduced at the same time.
Add two TERN datasets submitted by Greg Guerin, which contain Eucalypt SLA and leaf shrinkage data

These datasets are published on the TERN website with metadata and methods and have their own DOIs, but are not part of other publications.
---------

Co-authored-by: yangsophieee <[email protected]>
Co-authored-by: Sophie Yang <[email protected]>
Co-authored-by: Daniel Falster <[email protected]>
Add new study from Lynda Prior with Tasmanian fire response data

Report reviewed by contributor; everything looks good.

This is the first study we've entered with near-continuous context properties; these may be edited in the future. See also issue #673.
* add new study
* report reviewed by AusTraits team & contributor

The contributor reviewed the report and made a number of suggestions, including splitting the data into `pollination syndrome` and `pollination system`.
Added Mesaglio_2022, a flowering time dataset for taxa along Duck River in suburban Sydney. Required year as a temporal metadata field to identify repeat measurements on the same taxon.

Also fixed bugs in reports script
* flowering_time & fruiting_time have to be excluded from the trait summaries, since they are categorical traits without defined lists of terms
* substitution scripts can only be run if substitutions exist; otherwise a blank tibble needs to be created
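A sketch of the "blank table when there are no substitutions" pattern from the report fixes above. It assumes each substitution is a list with `trait_name`, `find` and `replace` fields and that a dataset without substitutions stores `NA` (or nothing); `get_substitutions()` is hypothetical, and it uses a base-R data frame instead of a tibble to stay dependency-free.

```r
# Hypothetical helper: return a dataset's substitutions as a table, or a
# zero-row table with the same columns when the dataset has none
get_substitutions <- function(metadata) {
  subs <- metadata$substitutions
  empty <- data.frame(trait_name = character(), find = character(),
                      replace = character(), stringsAsFactors = FALSE)
  if (is.null(subs) || length(subs) == 0 ||
      (is.atomic(subs) && all(is.na(subs)))) {
    return(empty)
  }
  rows <- lapply(subs, function(s) {
    data.frame(trait_name = s$trait_name, find = s$find,
               replace = s$replace, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

Because the empty case has the same columns as the populated case, downstream summary code can run unconditionally.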

---------

Co-authored-by: ehwenk <[email protected]>
add Harvey_2017, a fire response dataset for WA submitted by Carl Gosper
A study on regeneration following fire, juvenile period and flowering phenology in the Northern Sandplain Kwongan, WA. The study adds values for `fire_response` for 88 new taxa! Note that for flowering time, Jan to Mar flowering was not recorded.

---------

Co-authored-by: ehwenk <[email protected]>

Added new traits related to fire response: `post_fire_flowering`, `plant_influence_on_fire_risk`, `plant_tolerance_fire`, `life_history_ephemeral_class`. Revised definitions for `clonal_spread_mechanism`, `storage_organ` and `bud_bank_location` to accommodate new datasets.
 
These are traits that Lizzy and I have been talking about over the past several months while I've been scoring fire response traits from the floras. 

Hopefully this flora scraping has improved fire response coverage in AusTraits by a large margin!

---------

Co-authored-by: ehwenk <[email protected]>
Co-authored-by: Daniel Falster <[email protected]>
* edits to trait definitions including:
  - standardising syntax for categorical trait values (how synonyms are designated; adding "." throughout)
  - edits made as the AusTraits team read through the entire APD
  - edits following Belinda Medlyn and David Ellsworth's review of the APD
  - minor edits to seed traits and leaf traits to capture a broader range of trait values in scraped flora data; mostly adding synonyms, but also a few trait values for traits that previously had low coverage in AusTraits
  - a small amount of work on fire-response traits
  - add trait `seed_surface_reflectivity` to capture the many instances of `dull` and `shiny` in seed descriptions
  - add `flowering_cues` to trait definitions

And then a major change to trait descriptions, to synchronise those in the AusTraits traits.yml file with those in the APD. This included reading and rewriting the entire YAML file, which introduced many line-break changes.

And a small change: there were NAs in one of the Prior_2022 context properties, which was causing the austraits join functions to break.
* add 2 new studies: SinghRamesh_2019 & SinghRamesh_2023 
* the leaf P values for SinghRamesh_2023 are (currently) the highest in AusTraits; they have been reviewed by the contributor and are correct
---------

Co-authored-by: yangsophieee <[email protected]>
* add study: Wenk_2023_2 (manually scored rain cued flowering data from floras)

following Sophie's checks: 
* fixed some rain scorings
* add entity_context column to capture taxa where there are region-specific flowering cues

---------

Co-authored-by: yangsophieee <[email protected]>
* add new study: Ellsworth_2015
* report reviewed by contributor & AusTraits team
testthat 3.1.8 prints NAs without quotes rather than with quotes. Changed the representation of NA in various tests to accommodate this change, as the R CMD check was failing on pull requests.

closes issue #688
* add new study: Pfautsch_2016
* data entirely public, but report reviewed/endorsed by contributor
---------
Co-authored-by: Sophie Yang <[email protected]>
* small experimental dataset with all traits reported in the manuscript, so we won't contact the authors
* detailed fire response data from a collection of Western Australia sites, with a focus on time until reproductive maturity or secondary reproductive maturity post-fire
* the distinction between the reproductive maturity traits and the time from fire to flowering/fruiting traits will likely be revised after the fire workshops, and the trait names being mapped may then be adjusted
---------
Co-authored-by: yangsophieee <[email protected]>
We have not yet heard back from the author but we are merging this in because individual-level data were publicly available.

---------

Co-authored-by: ehwenk <[email protected]>
* data set documenting different fire response traits for 1000+ WA taxa

* compiled by Neil Burrows and originally available on NatureMap, but data acquired through Carl Gosper.

* includes 2 new fire response traits, the time from fire to peak flowering and the time from fire to a decline in flower production/flowering levels
---------

Co-authored-by: yangsophieee <[email protected]>
- Need to fill in remaining information about the reference once it is published
- In the future the author may submit more detailed information about the sex type of non-dioecious taxa

---------

Co-authored-by: ehwenk <[email protected]>
Study on fire response, root morphology and root storage traits for 37 alpine plant species. Data were transcribed from tables in the article.

---------

Co-authored-by: ehwenk <[email protected]>
Add study compiling data on fire response and bud bank location of grasses (original study was global, so taxa were subsetted to Australian species based on the location column in their raw data). 

Also filtered out records already in AusTraits from the original data sources (Pekin_2010, Moore_2018).

---------

Co-authored-by: ehwenk <[email protected]>
A study examining vegetation regeneration after burning in dry sclerophyll communities near Canberra. Data were transcribed from original manuscript.

---------

Co-authored-by: ehwenk <[email protected]>
A study on hydraulic vulnerability in dominant species of a Tasmanian dry sclerophyll woodland community.

---------

Co-authored-by: ehwenk <[email protected]>
ehwenk and others added 29 commits September 11, 2023 15:30
* fixes to Canham, from a branch to be deleted
* fix Taseski_2017 units
* fix Yang entity types
more fixes to pivoting problems:
- minor changes copied from traits.build back to austraits.build process.R file
- had to add locations to Yang_2023
- Witkowski_1991 would only recognise plant ages when I wrote them out in words, after trying many other tricks: I tried '10' and '21'; I checked `metadata_check_custom_R_code("Witkowski_1991")` to see how the data were read in; I tried retyping the cells; I tried assigning it to a different context category; I tried adding a `mutate()` to `as.character()` through `custom_R_code`; nothing worked. All rows in the column have a value, so it isn't being read in strangely.
- Maybe just a Windows issue
- Special characters in Canham_2023, Grigg_2008 and SinghRamesh_2019
Remove duplicate taxon names, to allow austraits to pivot
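Pivoting long trait data to wide format needs exactly one value per identifying key; duplicated taxon names can produce repeated keys and block the pivot. The base-R check below is a hypothetical sketch: the function name and the assumed key columns (`dataset_id`, `observation_id`, `trait_name`) are illustrative, not the austraits package's own checks.

```r
# Hypothetical check: rows whose identifying columns are not unique, which
# would make a long-to-wide pivot ambiguous (several values per cell)
find_pivot_conflicts <- function(traits,
                                 id_cols = c("dataset_id", "observation_id",
                                             "trait_name")) {
  # Build one string key per row from the identifying columns
  key <- do.call(paste, c(unname(as.list(traits[id_cols])), sep = "|"))
  # Return every row that shares its key with at least one other row
  traits[key %in% key[duplicated(key)], , drop = FALSE]
}
```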
* add study: Wenk_2023, near-complete life history dataset for Australian plant taxa
- Add `custom_R_code` to replace special characters
- Uses Unicode escapes, as some characters aren't recognised correctly on Windows
- Warning will still remain for vanderMoezel_1987 as that will only be fixed once the traits.build package is used
- The vanderMoezel_1987 bin values that need unit conversions are lost during building but this has been happening previously
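A sketch of replacing special characters through Unicode escapes, as in the `custom_R_code` change above. Writing `"\u2013"` instead of a literal en dash keeps the R source ASCII-only, which avoids the file being misread under a non-UTF-8 locale. The function name and the characters listed are illustrative assumptions, not the replacements actually used in those datasets.

```r
# Hypothetical helper: swap a few common special characters for ASCII.
# The escapes keep this source file ASCII-only.
replace_special_chars <- function(x) {
  replacements <- c("\u2013" = "-",  # en dash
                    "\u2014" = "-",  # em dash
                    "\u00a0" = " ",  # non-breaking space
                    "\u2019" = "'")  # right single quotation mark
  for (ch in names(replacements)) {
    x <- gsub(ch, replacements[[ch]], x, fixed = TRUE)
  }
  x
}
```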
* swap to using the traits.build package to build austraits.build
* move script with functions used in custom_R_code - it was scripts/custom.R; now R/custom_R_code.R
* edits to metadata files to switch method_id to method_context_id & add proper method_id
---------
Co-authored-by: Elizabeth Wenk <[email protected]>
Co-authored-by: Sophie Yang <[email protected]>
Co-authored-by: yangsophieee <[email protected]>
Remove values from metadata[["context"]] if there are neither descriptions nor a find -> value replacement. The traits.build workflow now automatically reads in the list of unique values from the specified column if no values are given.  (Possible following the completion of traitecoevo/traits.build#15)
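The pruning rule above can be sketched as follows. The structure is an assumption: each context value is taken to be a list that may contain `find`, `value` and `description`; `drop_uninformative_context_values()` is a hypothetical helper that only shows which entries become redundant once traits.build reads unique values from the column itself.

```r
# Hypothetical sketch: keep a context value only if it carries a
# description or a genuine find -> value replacement
drop_uninformative_context_values <- function(context) {
  keep <- vapply(context$values, function(v) {
    has_description <- !is.null(v$description) && !is.na(v$description)
    has_replacement <- !is.null(v$find) && !is.null(v$value) &&
      !identical(v$find, v$value)
    has_description || has_replacement
  }, logical(1))
  context$values <- context$values[keep]
  # With nothing left, drop the field so values are read from the column
  if (length(context$values) == 0) context$values <- NULL
  context
}
```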
Update austraits remake file to accompany changes to units conversions on traits.build
change `austraits_curators` to `dataset_curators` - following corresponding change in traits.build (traitecoevo/traits.build#79)

Co-authored-by: Daniel Falster <[email protected]>
As part of adding the word "context" to all context IDs (in traits.build), the context categories in the austraits.build metadata files needed to be updated:

* `category: plot` -> `category: plot_context`
* `category: treatment` -> `category: treatment_context`
* `category: temporal` -> `category: temporal_context`

Also changed these categories and the names of context IDs in the vignettes.

matches corresponding change in traits.build (traitecoevo/traits.build#78)
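A sketch of the category rename above, applied to context entries already read into R. It assumes each entry is a list with a `category` field; `rename_context_categories()` is hypothetical. Only the three old names are mapped, so re-running it on updated files changes nothing.

```r
# Hypothetical sketch: `contexts` is a list of context entries, each a list
# with a `category` field, as read from a metadata file
rename_context_categories <- function(contexts) {
  lookup <- c(plot = "plot_context",
              treatment = "treatment_context",
              temporal = "temporal_context")
  lapply(contexts, function(ctx) {
    # Only rename old-style categories; already-renamed ones pass through
    if (!is.null(ctx$category) && ctx$category %in% names(lookup)) {
      ctx$category <- unname(lookup[ctx$category])
    }
    ctx
  })
}
```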

Co-authored-by: Daniel Falster <[email protected]>
…vignettes (#776)

* Remove vignettes, docs and website, now that traits.build has more developed documentation.
* Enhance Readme & contributing page

---------

Co-authored-by: ehwenk <[email protected]>
* Sort taxonomic updates in alphabetical order
* Convert all taxonomic_resolution values to lowercase
* Convert all taxonomic_resolution values to English (not Latin) versions

Code used for re-sorting and converting to lowercase:
```
library(tidyverse)
library(traits.build)

# Read the metadata file for every dataset folder in data/
ids <- dir("data")
metadata <- map(ids, ~ read_metadata(sprintf("data/%s/metadata.yml", .x)))

# Update a single metadata list: sort taxonomic_updates by `find` and
# lowercase `taxonomic_resolution` (skipped when taxonomic_updates is NA)
f <- function(m) {
  if (!is.na(m$taxonomic_updates[1]))
    m$taxonomic_updates <-
      m$taxonomic_updates %>%
      util_list_to_df2() %>%
      arrange(find) %>%
      mutate(taxonomic_resolution = tolower(taxonomic_resolution))
  m
}

# Apply the update to every dataset
metadata_updated <- map(metadata, f)

# Write each updated metadata list back to its file
walk2(metadata_updated, ids, ~ write_metadata(.x, sprintf("data/%s/metadata.yml", .y)))
```
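The snippet above covers sorting and lowercasing; the Latin-to-English step from the same commit can be sketched as a lookup. The mapping shown (`varietas` to `variety`, `forma` to `form`) is an illustrative assumption, not the full set of values that were converted, and `to_english_rank()` is hypothetical.

```r
# Hypothetical lookup: the Latin rank names and English equivalents here
# are illustrative, not the full set used in the metadata files
to_english_rank <- function(x) {
  lookup <- c(varietas = "variety", forma = "form")
  x <- tolower(x)
  hit <- !is.na(x) & x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}
```

Inside `f()` this could be applied to the `taxonomic_resolution` column in place of `tolower()`, since it lowercases as well.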
---------

Co-authored-by: ehwenk <[email protected]>
* renaming taxon fields to match APCalign

`cleaned_name` -> `aligned_name` (and similar uses of `cleaned_` to `aligned_`)
`taxonomic_reference` -> `taxonomic_dataset`

* Update column names in taxon_list.csv
…ist.csv (#779)

New function to rebuild the taxon_list:
- uses APCalign::update_taxonomy() for the bulk of its functionality
- defaults to binding new rows to bottom; but option to overwrite list to update from new NSL files
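The append-versus-overwrite behaviour described above can be sketched in base R. The real function relies on `APCalign::update_taxonomy()` for the alignment itself; this hypothetical `update_taxon_list()` only illustrates the two modes, keyed on `aligned_name`, with `replace` named after the `replace = TRUE` option referred to in this PR.

```r
# Hypothetical sketch: `existing` and `new` are data frames with the same
# columns, keyed on `aligned_name`
update_taxon_list <- function(existing, new, replace = FALSE) {
  if (replace) {
    # Overwrite: keep only names in the new build (drops unneeded names)
    out <- new
  } else {
    # Default: keep existing rows as-is and append names not yet listed
    additions <- new[!new$aligned_name %in% existing$aligned_name, , drop = FALSE]
    out <- rbind(existing, additions)
  }
  rownames(out) <- NULL
  out
}
```

Appending by default keeps diffs of taxon_list.csv small and readable, while `replace = TRUE` is the way to clear out names no longer used.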

New function to align taxon names and add taxonomic_updates to metadata files (build_align_taxon_names.R)

Lots of taxonomic updates to bring all datasets up to date again

Also reworking Nano_2011 taxon names

* Nano_2011 had a mix of scientific names with and without authorship, but not always in standard syntax
* there would have been thousands of taxonomic updates added, so the taxon names were first manipulated using stringr matches, such that all but ~300 ended up as exact matches to APC/APNI canonical names
* merged these names into main data spreadsheet, together with alignments, reasons, etc for the remaining names
* just the ~300 taxa requiring actual alignments added to metadata file

Standardising which taxa are excluded from observations

Checking to make sure that if a taxon is flagged as `non-native, non-naturalised` (or for some other reason for exclusion), it is excluded from all datasets.
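The consistency check above can be sketched as: collect every name that any dataset excludes, then report datasets that contain one of those names without excluding it. The inputs (named lists keyed by dataset ID) and the function name are hypothetical.

```r
# Hypothetical sketch:
# taxa_by_dataset: named list, dataset_id -> original names in that dataset
# excluded_by_dataset: named list, dataset_id -> names that dataset excludes
find_missing_exclusions <- function(taxa_by_dataset, excluded_by_dataset) {
  # Every name excluded anywhere should be excluded everywhere it occurs
  flagged <- unique(unlist(excluded_by_dataset))
  rows <- lapply(names(taxa_by_dataset), function(id) {
    present <- intersect(taxa_by_dataset[[id]], flagged)
    missing <- setdiff(present, excluded_by_dataset[[id]])
    if (length(missing) > 0) {
      data.frame(dataset_id = id, taxon_name = missing, stringsAsFactors = FALSE)
    }
  })
  # NULL when every dataset is consistent
  do.call(rbind, rows)
}
```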
These tests are no longer needed as functions and tests have been migrated to https://github.com/traitecoevo/traits.build

Closes #780
* Update Cheesman_2020 basis_of_record field

* recode `entity_context` as `method_context` for some datasets

* fixes based on running all dataset tests
Update taxon_list.csv

- new taxon list - created using new function.
- Confirmed that changes now show up in a readable manner, and that unneeded names are removed when `replace = TRUE`.
- Cyclically rebuilt AusTraits, then taxon_list, then AusTraits again, several times and confirmed nothing is changing.

* another search for non-native, non-naturalised taxa
* ongoing minor fixes to non-APC names in taxonomic_updates
* changes to rebuilding taxon script that add family, genus info for `unknown` species (phrase names, new names)
`build_update_taxonomy` -> `dataset_update_taxonomy` to be consistent, since this function now occurs on each dataset.
* remove file not required in austraits.build

* Recover deleted files

* Update to run

---------

Co-authored-by: Daniel Falster <[email protected]>
* Bump version
* Update DESCRIPTION
* use furrr for the build instead of remake; change dependencies accordingly
A number of new tests in traits.build caught metadata inconsistencies, which led to fixes in austraits.build metadata files and also a few refinements to the tests.

* lots of taxonomic_updates that were no longer being used - a mix of duplicates and "artifacts" from long ago, when 1) additional updates were added for a secondary fix, rather than overwriting the first one; 2) a time a while ago when fixes were made to standardised names, not truly original names; and 3) for some of the flora scraping studies the same list of taxonomic updates was used for all studies from the same flora, although some taxa didn't have any entries in one of the 2-3 datasets.
* changed process.R in traits.build such that `excluded_observations` looks for `original_name` not the standardised name, and changed those accordingly
* found a few commas delimiting values, where there should have been spaces (in basis_of_record)
* removed duplicate substitutions in ANBG_2019
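Why `excluded_observations` should match on `original_name` can be shown with a small sketch: if the submitted spelling differs from the aligned name, matching on the standardised `taxon_name` silently misses the row. The helper and data below are hypothetical.

```r
# Hypothetical sketch: drop rows whose *submitted* name is listed for
# exclusion. Matching on the standardised `taxon_name` would miss rows
# whose original spelling differs from the aligned name.
exclude_by_original_name <- function(traits, excluded_names) {
  traits[!traits$original_name %in% excluded_names, , drop = FALSE]
}

traits <- data.frame(original_name = c("Taxon a.", "Taxon B"),
                     taxon_name = c("Taxon A", "Taxon B"),
                     stringsAsFactors = FALSE)
kept <- exclude_by_original_name(traits, "Taxon a.")
```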

In the process of removing duplicate taxonomic_updates, kept rebuilding taxon_list & checking taxonomic_updates tibble. Taxon_list changed minimally and in expected ways.

At the end:
* all original names in taxonomic_updates now match to a taxon name (first time ever!)
* `combined_table` had same number of rows as `austraits$traits` (i.e. no duplication)
* all tests pass - this also means all datasets pivot
* looked carefully at alignments in taxa table and they look good - various filtering, against different `taxon_rank`, `taxonomic_dataset` values
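The "same number of rows" check above guards against joins that duplicate rows when the lookup table repeats a key. The sketch below uses base `merge()` as a stand-in for the actual join that builds `combined_table`; the function name and columns are hypothetical.

```r
# Hypothetical check: does joining taxon information onto the trait table
# change the row count? Repeated keys in `taxa` multiply matching rows.
join_inflates_rows <- function(traits, taxa, key = "taxon_name") {
  combined <- merge(traits, taxa, by = key, all.x = TRUE)
  nrow(combined) != nrow(traits)
}
```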
---------

Co-authored-by: yangsophieee <[email protected]>
dfalster merged commit 21bd155 into master on Nov 19, 2023
2 checks passed