diff --git a/_pkgdown.yml b/_pkgdown.yml
index 824dca7c4..aa4def5ea 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -139,7 +139,7 @@ navbar:
    - text: Adding new datasets into `AusTraits`
      href: articles/adding_data.html
    - text: Docker for reproducible compute environment
-      href: articles/Docker.html
+      href: articles/docker.html
    - text: Functions
      icon: fa-list
      href: reference/index.html

The same `Docker.html` -> `docker.html` navbar link fix is repeated in the regenerated docs pages (docs/404.html, docs/CODE_OF_CONDUCT.html, docs/CONTRIBUTING.html, docs/ISSUE_TEMPLATE.html and docs/LICENSE-text.html); only that one navigation line changes in each of those generated files.
diff --git a/docs/articles/adding_data.html b/docs/articles/adding_data.html
index b1c010811..bb6d1cabd 100644
--- a/docs/articles/adding_data.html
+++ b/docs/articles/adding_data.html

Besides the same navbar link fix, the changes in this page reflow the vignette prose and code onto single lines without altering the content. The vignette reads as follows.

Adding new datasets into `AusTraits`
2023-02-09

This vignette explains the protocol for adding a new study to AusTraits. Before starting this, you should read more about

It is important that all steps are followed so that our automated workflow proceeds without problems.
An overview of the main steps

1. Clone the austraits.build repository from github.
2. Create a new branch in the repo, named for the new dataset_id in author_year format, e.g. Gallagher_2014.
3. Create a new folder within the folder data with the name dataset_id, e.g. Gallagher_2014.
4. Prepare the file data.csv and place it within the new folder (details here).
5. Prepare the file metadata.yml and place it within the new folder (details here).
6. Add the new study into the build framework and rebuild AusTraits, by running build_setup_pipeline().

This step updates the file remake.yml with appropriate rules for the new dataset; similarly, if you remove datasets, do the same. (At this stage, remake offers no looping constructs, so for now we generate the remake file using whisker.)

You can then rebuild AusTraits, including your dataset.

7. Run tests and quality checks on the newly added dataset and correct the data.csv and metadata.yml files as necessary (details here).
8. Generate and proofread a report on the data. In particular, check that numeric trait values fall within a logical range relative to other studies, and that individual trait observations are not unnecessarily excluded because their trait values are unsupported.
9. Return to step 6 if changes are made to the data.csv or metadata.yml files.
10. Push the GitHub branch.

It may help to download one of the existing datasets to use as a template for your own files and a guide on required content. You should look at the files in the config folder, particularly the definitions file for the list of traits we cover and the supported trait values for each trait. The GitHub repository also hosts a compiled trait definitions table.

The remainder of this vignette provides incredibly detailed instructions for steps 4-8 above. It is intended for anyone wishing to add datasets to either AusTraits itself or to use the austraits.build workflow to create a separate database.
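As a rough illustration of step 6 above (a sketch only; it assumes the function's default arguments suit your setup), the call from within the repository is simply:

library(austraits.build)

# Regenerate the build rules (remake.yml) so the newly added dataset is included
build_setup_pipeline()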

Getting started

The austraits.build repository includes a selection of functions that help build the repository. To use these, you'll need to make them available.

The easiest way to load the functions into your workspace is to run the following (from within the repository)

source("scripts/custom.R")            # source functions written for use within custom_R_code
library(austraits.build)              # open the austraits.build package that provides the pipeline to build AusTraits

Inputting data

Add a new folder

Add a new folder within the data folder. Its name should be the study's dataset_id, the core organising unit behind AusTraits.

The preferred format for dataset_id is the surname of the first author of any corresponding publication, followed by the year, as surname_year. E.g. Falster_2005. Wherever there are multiple studies with the same id, we add a suffix _2, _3 etc. E.g. Falster_2005, Falster_2005_2.
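For example, assuming a hypothetical dataset_id of Yang_2028 (any surname_year id works the same way), the folder can be created straight from R:

# Create the folder that will hold this study's data.csv and metadata.yml
dir.create("data/Yang_2028")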

Constructing the data.csv file

All data for a study (dataset_id) must be merged into a single spreadsheet: data.csv. All accompanying metadata is read in through the metadata.yml file. Some information must be input explicitly through the data.csv or metadata.yml file, while other information can be entered via either file; this is explicitly indicated for each element.

1. Required columns: Columns within the data.csv file must include taxon name, location_name (if there are multiple locations), contexts (if appropriate), and collection_date (if appropriate). The data.csv file can either be in a wide format (1 column for each trait, with trait name as the column header) or long format (a single column for all trait values and additional columns for trait name and units).

• For all field studies, ensure there is a column for location_name. If all measurements were made at a single location, a location_name column can easily be mutated using custom_R_code within the metadata.yml file. See sections adding locations and adding contexts below for more information on compiling location and context data.

• If available, be sure to include a column with collection date. If possible, provide it in yyyy-mm-dd (e.g. 2020-03-05) format or, if the day of the month isn't known, as yyyy-mm (e.g. 2020-03). However, any format is allowed and the column can be parsed to the proper yyyy-mm-dd format using custom_R_code. If the same collection date applies to the entire study it can be added directly into the metadata.yml file.

• If applicable, ensure there are columns for any context properties, including experimental treatments, specific differences in method, a stratified sampling scheme within a plot, or sampling season. Additional context columns could be added through custom_R_code or keyed in where traits are added, but it is best to include a column in the data.csv file whenever possible. The protocol for adding context properties to the metadata file is under adding contexts.

2. Summarising data: Data submitted by a contributor should be in the rawest form possible; always request data with individual measurements over location/species means. Some studies make replicate measurements on an individual at a single point in time. For these studies, individual means need to be calculated, as AusTraits does not include multiple measurements per individual. The raw values are preserved in the contributor's raw data files. Be sure to calculate the number of replicates that contributed to each mean value.

When there is just a single row of values to summarise, use:

read_csv("data/dataset_id/raw/raw_data.csv") %>%
  mutate(leaf_area_replicates = 1) %>%
  group_by(individual, `species name`, location, context, etc) %>%
  summarise(
    leaf_area_mean = mean(leaf_area),
    leaf_area_replicates = sum(leaf_area_replicates)
  ) %>%
  ungroup()

(Make sure you group_by all categorical variables you want to retain, for only columns that are grouping variables will be kept)

When you want to take the mean of a series of continuous variables, use:

    -read_csv("data/dataset_id/raw/raw_data.csv") %>%
    -  mutate(replicates = 1) %>%
    -  group_by(individual, `species name`, location, context, etc) %>%
    -  summarise(
    -    across(
    -      c(leaf_area, `leaf N`), .fns = mean,
    -      c(replicates), .fns = sum,
    -      c(growth_form, `photosynthetic pathway`), .fns = first
    -    )
    -  ) %>%
    -  ungroup()
    +read_csv("data/dataset_id/raw/raw_data.csv") %>% + mutate(replicates = 1) %>% + group_by(individual, `species name`, location, context, etc) %>% + summarise( + across( + c(leaf_area, `leaf N`), .fns = mean, + c(replicates), .fns = sum, + c(growth_form, `photosynthetic pathway`), .fns = first + ) + ) %>% + ungroup()
3. Merging multiple spreadsheets: If multiple spreadsheets of data are submitted these must be merged together.

• If the spreadsheets include different trait measurements made on the same individual (or location means for the same species), they are best merged using full_join, specifying all conditions that need to be matched across spreadsheets (e.g. individual, species, location, context). Ensure the column names are identical between spreadsheets or specify columns that need to be matched.

read_csv("data/dataset_id/raw/data_file_1.csv") -> data_1
read_csv("data/dataset_id/raw/data_file_2.csv") -> data_2
data_1 %>% full_join(data_2, by = c("Individual", "Taxon", "Location", "Context"))

• If the spreadsheets include trait measurements for different individuals (or possibly data at different scales - such as individual level data for some traits and species means for other traits), they are best merged using bind_rows. Ensure the column names for taxon name, location name, context, individual, and collection date are identical between spreadsheets. If there are data for the same traits in both spreadsheets, make sure those column headers are identical as well.

read_csv("data/dataset_id/raw/data_file_1.csv") -> data_1
read_csv("data/dataset_id/raw/data_file_2.csv") -> data_2
data_1 %>% bind_rows(data_2)

4. Taxon names: Taxon names need to be complete names. If the main data file includes code names, with a key as a separate file, they need to be merged:

read_csv("data/dataset_id/raw/species_key.csv") -> species_key
read_csv("data/dataset_id/raw/data_file.csv") %>%
  left_join(species_key, by = "code")

Unexpected hangups

By default, read_csv guesses each column's type from the first 1000 rows, which can misclassify a column whose early rows are empty; raising guess_max helps:

read_csv("data/dataset_id/raw/raw_data.csv", guess_max = 10000)

This checks 10,000 rows of data before declaring the column is non-numeric. The value can be set even higher…
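An alternative is to declare column types up front instead of relying on type guessing; this is standard readr usage rather than anything specific to austraits.build (the column name is illustrative):

# Read every column as character (safest when in doubt), or declare specific columns
read_csv("data/dataset_id/raw/raw_data.csv", col_types = cols(.default = col_character()))
read_csv("data/dataset_id/raw/raw_data.csv", col_types = cols(`leaf area (mm2)` = col_double()))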

Constructing the metadata.yml file

One way to construct the metadata.yml file is to use one of the existing files and modify yours to follow the same format. As a start, check out some examples from existing studies in AusTraits, e.g. Angevin_2010 or Wright_2009.

Note, when editing the metadata.yml, edits should be made in a proper text editor (Microsoft Word tends to mess up the formatting). For example, RStudio, TextMate, Sublime Text, and Visual Studio Code are all good editors.

To assist you in constructing the metadata.yml file, we have developed functions to help fill in the different sections of the file. You can then manually edit the file further to fill in missing details.

First run the following to make the functions available

library(austraits.build)

The functions for populating the metadata file all begin with metadata_. A list of the available functions is automatically generated within the man/ folder within the austraits.build directory.
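To list them from your R session (plain base R; assumes the package has been attached with library(austraits.build)):

# Show all metadata_ helper functions exported by austraits.build
ls("package:austraits.build", pattern = "^metadata_")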

Creating a template

Create a basic template for the metadata.yml file for your study. Note, it requires you to have already created a file data.csv in the folder data/your_dataset_id.

Let's imagine you're entering a study called Yang_2028

current_study <- "Yang_2028"

metadata_create_template(current_study)

# or simply

metadata_create_template("Yang_2028")

The function will ask a series of questions and then create a relatively empty file data/your_dataset_id/metadata.yml. The key questions are:

If your data.csv file does not yet have a location_name column, this information can later be added manually.

Adding a source

Three functions are available to help with entering citation details for the source data.

The function metadata_create_template creates a template for the primary source with default fields for a journal article, which you can then edit manually.

If you have a doi for your study, use the function:

metadata_add_source_doi(dataset_id = current_study, doi = "doi")

and the different elements within the source will automatically be generated. Double check the information added to ensure:
1. The title is in sentence case
2. Overall, the information isn't in all caps (information from a few journals is read in like this)
3. Page numbers are present and added as, for example, 123 -- 134; note the -- between page numbers

By default, details are added as the primary source. If multiple sources are linked to a single dataset_id, you can specify a source as secondary. Attempting to add a second primary source will overwrite the information already input.

metadata_add_source_doi(dataset_id, doi, type = "secondary")

Alternatively, if you have reference details saved in a bibtex file called myref.bib you can use the function

metadata_add_source_bibtex(dataset_id, file = "myref.bib")

(These options require the packages rcrossref and RefManageR to be installed.)

For a book, the proper format is:

source:
  primary:

If you manually add information, note that if there is a colon (:) or apostrophe (') in a reference, the text for that line must be in quotes (").

Adding contributors

The skeletal metadata.yml file created by the function metadata_create_template includes a template for entering details about data contributors. Edit this manually, duplicating if details for multiple people are required.

For example, in Roderick_2002

Custom R code

For many studies there are changes we want to make to a dataset before the data.csv file is read into AusTraits. These most often include applying a function to transform data, a function to filter data, or a function to replace a contributor's "measurement missing" placeholder symbol with NA. In each case it is appropriate to leave the rawer data in data.csv.

Background

In each case we want to make some custom modifications to a particular dataset before the common pipeline of operations gets applied. To make this possible, the workflow allows for some custom R code to be run as a first step in the processing pipeline. That pipeline (the function process_custom_code called within dataset_process) looks like this:

data <-
  read_csv(filename_data_raw, col_types = cols(), guess_max = 1e5) %>%
  process_custom_code(metadata[["dataset"]][["custom_R_code"]])() %>%
  process_parse_data(dataset_id, metadata)

Note the line calling process_custom_code. This is where the custom code gets applied, right after the file is loaded.

Summary

Examples of appropriate use

1. Most sources from herbaria record flowering_time and fruiting_time as a span of months, while AusTraits codes these variables as a sequence of 12 N's and Y's for the 12 months. A series of functions make this conversion in custom_R_code. These include:

• format_flowering_months (Create flowering times from start to end pair)
• convert_month_range_string_to_binary (Converts flowering and fruiting month ranges to 12 element character strings of binary data)
• convert_month_range_vec_to_binary (Convert vectors of month range to 12 element character strings of binary data)
• collapse_multirow_phenology_data_to_binary_vec (Converts multirow phenology data to a 12 digit binary string)

2. Many datasets from herbaria record traits like leaf_length, leaf_width, seed_length, etc. as a range (e.g. 2-8). The function separate_range separates this data into a pair of columns with minimum and maximum values, required to properly align units.

3. Duplicate values within a study need to be filtered out.

If a species-level measurement has been entered for all within-location replicates, you need to filter out the duplicates. This is true for both numeric and categorical values.

data %>%
  group_by(Species) %>%
  mutate(
    across(c(`leaf_percentN`, `plant growth form`), replace_duplicates_with_NA)
  ) %>%
  ungroup()

Note: You would use group_by(Species, Location) if there are unique values at the species x location level.
4. Values that were sourced from a different study need to be filtered out. See Duplicates between studies below - functions to automate this process are in progress.

5. Author has represented missing data values with a symbol, such as 0:

data %>% mutate(across(c(`height (cm)`, `leaf area (mm2)`), ~ na_if(., 0)))

6. If a subset of data in a column are also values for a second trait in AusTraits, some data values can be duplicated in a second temporary column. In the example below, some data in the contributor's fruit_type column also apply to the trait fruit_fleshiness in AusTraits:

data %>% mutate(fruit_fleshiness = ifelse(`fruit type` == "pome", "fleshy", NA))

7. If a subset of data in a column are instead values for a second trait in AusTraits, some data values can be moved to a second column (second trait), using the function move_values_to_new_trait. In the example below, some data in the contributor's growth_form column only apply to the trait parasitic in AusTraits. Note you need to create a blank variable to move the trait values to.

data %>%
  mutate(parasitic = NA_character_) %>%
  move_values_to_new_trait(
    original_trait = "growth form",
    new_trait = "parasitic",
    original_values = "parasitic",
    values_for_new_trait = "parasitic",
    values_to_keep = "NA")

or

data %>%
  mutate(dispersal_appendage = NA_character_) %>%
  move_values_to_new_trait(
    "fruits", "dispersal_appendage",
    c("dry & winged", "enclosed in aril"),
    c("wings", "aril"),
    c("NA", "enclosed")
  )

8. If the data.csv file includes raw data that you want to manipulate into a trait, or the contributor presents the data in a different formulation than AusTraits:

data %>% mutate(root_mass_fraction = `root mass` / (`root mass` + `shoot mass`))

9. You can do manipulations, such as adding a column with locations or manipulating location names. This is only recommended for studies with a single (or few) location, where manually adding the location data to the metadata.yml file is fast, since it precludes automatically propagating location data into metadata (see Adding location details). As an example, see Blackman_2010:

data %>%
  mutate(
    location_name = ifelse(location_name == "Mt Field" & habitat == "Montane rainforest", "Mt Field_wet", location_name),
    location_name = ifelse(location_name == "Mt Field" & habitat == "Dry sclerophyll", "Mt Field_dry", location_name)
  )

10. You can generate observation_numbers for sequential measurements on the same individual:

data %>%
  group_by(Tree) %>%
  mutate(observation_number = row_number()) %>%
  ungroup()

11. You can generate measurement_remarks from more cryptic notes:

data %>%
  mutate(
    measurement_remarks = ifelse(material == "FRESH", "fresh leaves (indicating amount of leaf moisture)", NA),
    measurement_remarks = ifelse(material == "DRIED", "dry leaves (indicating amount of leaf moisture)", measurement_remarks),
    measurement_remarks = ifelse(material == "SENESCED", "senesced leaves (indicating amount of leaf moisture)", measurement_remarks)
  )

12. You can reformat collection_dates supplied into the yyyy-mm-dd format, or add a date column.

Converting from any mdy format to yyyy-mm-dd (e.g. Dec 3 2015 to 2015-12-03):

data %>% mutate(Date = Date %>% mdy())

Converting from any dmy format to yyyy-mm-dd (e.g. 3-12-2015 to 2015-12-03):

data %>% mutate(Date = Date %>% dmy())

Converting from a mmm-yyyy (string) format to yyyy-mm (e.g. Dec 2015 to 2015-12):

data %>% mutate(Date = parse_date_time(Date, orders = "my") %>% format.Date("%Y-%m"))

Converting from a mdy format to yyyy-mm (e.g. Excel has reinterpreted the data as full dates 12-01-2015 but the resolution should be "month" 2015-12):

data %>% mutate(Date = parse_date_time(Date, orders = "mdy") %>% format.Date("%Y-%m"))

A particularly complicated example where some dates are presented as yyyy-mm and others as yyyy-mm-dd:

data %>%
  mutate(
    weird_date = ifelse(str_detect(gathering_date, "^[0-9]{4}"), gathering_date, NA),
    gathering_date = gathering_date %>% mdy(quiet = T) %>% as.character(),
    gathering_date = coalesce(gathering_date, weird_date)
  ) %>%
  select(-weird_date)
    -

    After you’ve added the custom R code to a file, check that it has -completed the intended data frame manipulation:

    +

    After you’ve added the custom R code to a file, check that it has completed the intended data frame manipulation:

    -

    You could alternatively read the data.csv file into R and run the -code line by line.

    +metadata_check_custom_R_code("Blackman_2010")
    +

    You could alternatively read the data.csv file into R and run the code line by line.

Fill in metadata$dataset

The dataset section is a mix of fields that are filled in automatically during metadata_create_template() and fields that need to be manually filled in.

WARNING: If you have an entry individual_id: unknown this assigns all rows of data to an individual named "unknown" and the entire dataset will be assumed to be from a single individual. This is why it is essential to omit this field if there isn't an actual row of data being read in.

There are also fields that will only be used for a subset of datasets:

Add traits

Begin by automatically adding all traits to your skeletal metadata.yml file:

metadata_add_traits(current_study)

You will be asked to indicate the columns you wish to keep as distinct traits. Include all columns with trait data.

This automatically propagates each trait selected into metadata.yml as follows, where var_in is the name of a column in the data.csv file (for wide datasets) or a unique trait name value in the trait_name column (for a long dataset):

- var_in: leaf area (mm2)
  unit_in: .na
  trait_name: .na

• units: fill in the units specified by the author - such as mm2. If you're uncertain about the syntax/format used for some more complex units, look through the traits definition file (config/traits.yml) or the file showing unit conversions (config/unit_conversions.csv). For categorical variables, leave this as .na.

• trait_name: This is the appropriate trait name from config/traits.yml. If no appropriate trait exists in AusTraits, a new trait can often be added - just ensure it is a trait where data will be comparable across studies and has been measured for a fair number (~>50) of species. For currently unsupported traits, we leave this as .na but then fill in the rest of the data and flag this study as having a potential new trait. Then in the future, when this trait is added to the traits.yml file, the data can be read into AusTraits by simply replacing the .na with a trait name.

• entity_type: Entity types indicate the taxonomic/ecological hierarchical level corresponding to the trait value. Entity types can be individual, population, species, genus, family or order. Metapopulation-level measurements are coded as population and infraspecific taxon-level measurements are coded as species. See the top of system.file("support", "austraits.build_schema.yml", package = "austraits.build") for definitions of these accepted entity types. Note: entity_type is about the hierarchical level to which the trait measurement refers; this is separate from the taxonomic resolution of the entity's name.

• value_type: Allowable value types are mean, minimum, maximum, mode, range, raw, and bin. See the top of system.file("support", "austraits.build_schema.yml", package = "austraits.build") for definitions of these accepted value types. All categorical traits are generally scored as being a mode, the most commonly observed value. Note that for values that are bins, the two numbers are separated by a double-hyphen, 1 -- 10.

• basis_of_value: Basis of value indicates how a value was determined. Allowable terms are measurement, expert_score, model_derived, and literature. See the top of system.file("support", "austraits.build_schema.yml", package = "austraits.build") for definitions of these accepted value types, but in general most categorical traits are values that have been scored by an expert (expert_score) and most numeric trait values are measurements.

• replicates: Fill in with the appropriate value. For categorical variables, leave this as .na. If there is a column that specifies replicate number, you can list the column name in the field.

• methods: This information can usually be copied verbatim from a manuscript. In general, methods sections extracted from pdfs include "special characters" (non-UTF-8 characters). Non-English alphabet characters are recognised (e.g. é, ö) and should remain unchanged. Other characters will be re-formatted during the study input process, so double check that degree symbols (º), en-dashes (–), em-dashes (—), and curly quotes (‘, ’, “, ”) have been maintained or reformatted with a suitable alternative. Greek letters and some other characters are replaced with their Unicode equivalent (e.g. <U+03A8> replaces Psi (Ψ)); for these it is best to replace the symbol with an interpretable English-character equivalent.

• Note with methods, if the identical methods apply to a string of traits, for the first trait use the following syntax, where the &leaf_length_method notation assigns the remaining text in the field as the leaf_length_method.

  methods: &leaf_length_method All measurements were from dry herbarium collections, with leaf and bracteole measurements taken from the largest of these structures on each specimen.

Then for the next trait that uses this method you can just include the following. At the end of processing you can read/write the yml file and this will fill in the assigned text throughout.

  methods: *leaf_length_method

In addition to the automatically propagated fields, there are a number of optional fields you can add if appropriate.

• life_stage: If all measurements in a dataset were made on plants of the same life stage, a global value should be entered under metadata$dataset. However, if different traits were measured at different life stages or different rows of data represent measurements at different life stages, you can specify a unique life stage for each trait or indicate a column where this information is stored.

• basis_of_record: If all measurements in a dataset represent the same basis_of_record, a global value should be entered under metadata$dataset. However, if different traits have different basis_of_record values or different rows of data represent different basis_of_record values, you can specify a unique basis_of_record value for each trait or indicate a column where this information is stored.

• measurement_remarks: Measurement remarks is a field to indicate miscellaneous comments. If these comments only apply to specific trait(s), this field should be specified with those traits' metadata sections. This is meant to be information that is not captured by "methods" (which is fixed to a single value for a trait).

• method_context: If different columns in a wide data.csv file indicate measurements on the same trait using different methods, this needs to be designated. At the bottom of the trait's metadata, add a method_context_name field (e.g. method_context works well). Write a word or short phrase that indicates which method context applies to that trait (data column). For instance, one trait might have method_context: fully expanded leaves and a second entry with the same trait name and method might have method_context: leaves still expanding. The method context details must also be added to the contexts section.

• temporal_context: If different columns in a wide data.csv file indicate measurements on the same trait, on the same individuals at different points in time, this needs to be designated. At the bottom of the trait's metadata, add a temporal_context_name field (e.g. temporal_context works well). Write a word or short phrase that indicates which temporal context applies to that trait (data column). For instance, one trait might have temporal_context: dry season and a second entry with the same trait name and method might have temporal_context: after rain. The temporal context details must also be added to the contexts section.

    Adding location details

    Location data includes location names, latitude/longitude coordinates, verbal location descriptions, and any additional abiotic/biotic location variables provided by the contributor (or in the accompanying manuscript). For studies with more than a few locations, it is most efficient to create a table of this data that is automatically read into the metadata.yml file.

    1. Location names must be identical (including syntax, case) to those in data.csv

    2. Column headers for latitude and longitude data must read latitude (deg) and longitude (deg)

    3. Latitude and longitude must be in decimal degrees (e.g. -46.5832). There are many online converters to convert from degrees, minutes, seconds format or from UTM. Or use the following formula: decimal_degrees = degrees + (minutes/60) + (seconds/3600)

    4. If there is a column with a general vegetation description (e.g. rainforest, coastal heath), it should be titled description

    5. Although location properties are not restricted to a controlled vocabulary, newly added studies should use the same location property syntax as others whenever possible, to allow future discoverability. To generate a list of values already used under location_property, use:

    austraits$locations %>% distinct(location_property)

    A few contributors provide a standalone file of all location data. Otherwise, the following sequence works well:

    1. Identify all location names in the data.csv file. The following code extracts a list of location names and any other columns in the data file that include location-specific information:
    read_csv("data/dataset_id/data.csv") %>%
      distinct(location, .keep_all = TRUE) %>% # the argument `.keep_all` ensures columns aren't dropped
      select(location, rainfall, lat, lon) %>% # list of relevant columns to keep
      rename(`latitude (deg)` = lat, `longitude (deg)` = lon) %>% # rename columns to how you want them to appear in the metadata file; faster to do it once here than repeatedly in the metadata file
      write_csv("data/dataset_id/raw/location_data.csv")

    2. Open the spreadsheet in Excel (or any editor of your choice) and manually add any additional data from the manuscript. Save as a .csv file.

    3. Open in R:

    read_csv("data/dataset_id/raw/location_data.csv") -> location_data

    As an example of what the location table should look like:

    4. This location data can then be read into metadata.yml:
    metadata_add_locations(current_study, location_data)

    You are first prompted to identify the column with the location name and then to list all columns that contain location data. This automatically fills in the location component of the metadata file.


    It is possible that you will want to specify life_stage or basis_of_record at the location level. You can later manually add these fields to some or all locations, for example:
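
    A rough, hypothetical sketch of a single location entry with these fields added by hand (the location name and all values are placeholders, and the surrounding structure should match whatever metadata_add_locations has already written):

    locations:
      Site A:                    # hypothetical location name
        latitude (deg): -33.61
        longitude (deg): 150.72
        description: coastal heath
        life_stage: seedling     # added manually; applies only to this location
        basis_of_record: field   # added manually; value is a placeholder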


    (During processing, location_ids are automatically generated and paired with each location_name.)

    Context details

    The dictionary definition of a context is the situation within which something exists or happens, and that can help explain it. This is exactly what context_properties are in AusTraits: ancillary information that is important for explaining and understanding a trait value.


    AusTraits recognises 5 categories of contexts:

    • treatment contexts are experimental treatments applied to individuals, such as soil nutrient manipulations, growing temperatures, or CO2 enhancement.
    • plot contexts are either blocks/plots within an experimental design or a variable that has been measured within a location and across which measurements have been stratified. Topographic position within a location is an example of this.
    • temporal contexts relate to repeat measurements on the same entity (individual, population, or species) across time. They may simply be numbered observations or might be explicitly linked to growing season or time of day.
    • method contexts indicate that the same trait has been measured on the same entity (individual, population or species) using multiple methods. These might be samples from different canopy light environments, different leaf ages, or sapwood samples from different branch diameters.
    • entity_contexts capture ancillary information about the entity (individual, population or species) that helps explain the measured trait values. This might be the entity's sex, caste (for social insects), or host plant (for insects).


    Context properties are not restricted to a controlled vocabulary. However, newly added studies should use the same context property syntax as others whenever possible, to allow future discoverability. To generate a list of terms already used under context_property, use:

    austraits$contexts %>% distinct(context_property)

    The AusTraits workflow can handle as many context properties as required. These are most easily read in with the dedicated function:

    metadata_add_contexts(dataset_id)

    The function first displays a list of all data columns (from the data.csv file) and prompts you to select those that are context properties. For each column you are asked to indicate its category (those described above). You are shown a list of unique values present in the data column and asked if these require any substitutions. This function adds the following information to the section metadata$contexts (example from Crous_2013):

    - context_property: unknown
       category: temporal
       var_in: month

    You must then manually fill in the fields designated as unknown. You are permitted to omit the description field if the context_property value itself provides sufficient description.


    If there are additional context properties that were designated in the traits section, these will have to be added manually, as this information is not captured in a column. A final output might be:

    - context_property: sampling season
       category: temporal
       var_in: month

    Using substitutions

    It is very unlikely that a contributor will use categorical trait values that are entirely identical to those in the traits.yml file. You need to add substitutions for values that do not exactly align, so that they match the wording and syntax supported by AusTraits. Combinations of multiple trait values are allowed; simply list them, space delimited (e.g. shrub tree for a species whose growth form includes both).

    Single substitutions can be added by running:

    metadata_add_substitution(current_study, "trait_name", "find", "replace")


    where trait_name is the AusTraits-defined trait name, find is the trait value used in the data.csv file, and replace is the trait value supported by AusTraits.
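
    For reference, the entry this writes into the metadata.yml file looks roughly like the following (the trait name and values here are hypothetical):

    substitutions:
    - trait_name: plant_growth_form
      find: shrub/tree
      replace: shrub tree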


    If you have many substitutions to add, the following may be more efficient:

    • Add a single substitution via the function and then copy and paste the lines many times in the metadata.yml file, changing the relevant fields

    • Create a spreadsheet with a list of all trait_name by trait_value combinations requiring substitutions. The spreadsheet would have four columns with headers dataset_id, trait_name, find and replace. This table can be read directly into the metadata.yml file using the function metadata_add_substitutions_table. This is described below under Adding many substitutions.

    Excluded data

    This section of the metadata.yml file provides the capacity to explicitly exclude specific trait values or taxon names. These are values that are in the data.csv file but should be excluded from AusTraits.

    It includes three elements:
    - variable: A variable from the traits table, typically taxon_name, location_name or context_name
    - find: Value of variable to remove
    - reason: Records why the data was removed, e.g. exotic

    Multiple, comma-delimited values can be added under find.

    For example, in Munroe_2019:

    exclude_observations:
    - variable: taxon_name
      find: Campylopus introflexus, Dicranoloma menziesii, Philonotis tenuis, Polytrichastrum
        alpinum, Polytrichum juniperinum, Sphagnum cristatum
      reason: moss (E Wenk, 2020.06.18)
    - variable: taxon_name
      find: Xanthoparmelia semiviridis
      reason: lichen (E Wenk, 2020.06.18)
    Questions

    The final section of the metadata.yml file is titled questions. This is a location to:

    1. Ask the data contributor targeted questions about their study. When you generate a report (described below) these questions will appear at the top of the report.
      • Preface the first question you have with contributor: (indented once), and additional questions with question2:, etc.
      • Ask contributors about missing metadata
      • Point contributors' attention to odd data distributions, to make sure they look at those traits extra carefully.
      • Let contributors know if you’re uncertain about their units or if you transformed the data in a fairly major way.
      • Ask the contributors if you’re uncertain you aligned their trait names correctly.
    2. This is a place to list any trait data for traits that are not yet supported by AusTraits. Use the following syntax, indented once: additional_traits:, followed by a list of traits (see the sketch after this list).
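
    A hypothetical sketch of a filled-in questions section (contributor:, question2: and additional_traits: are the key names described above; the question wording and traits listed are placeholders):

    questions:
      contributor: Can you confirm the units for seed mass? The values seem larger than in comparable studies.
      question2: Were the 2016 and 2018 measurements made on the same individuals?
      additional_traits: bark thickness, leaf area per sapwood area
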
    Hooray! You now have a fully propagated metadata.yml file!

    Next is making sure it has captured all the data exactly as you’ve intended.


    Quality checks

    Before starting the quality checks, it is helpful to assign a variable, current_study:

    current_study <- "Wright_2001"

    This lets you keep a list of tests that you run for each study; you then just reassign a new dataset_id to current_study.

    It is best to run tests and fix formatting first.

    Clear formatting

    The clear formatting code below reads and re-writes the yaml file. This is the same process that is repeated when running functions that automatically add substitutions or check taxonomy. Running it first ensures that any formatting issues introduced (or fixed) during the read/write process are identified and solved first.


    For instance, the write_metadata function inserts line breaks every 80 characters and reworks other line breaks (except in custom_R_code). It also reformats special characters in the text, substituting in its accepted format for degree symbols, en-dashes, em-dashes and quotes, and substituting in Unicode codes for more obscure symbols.

    f <- file.path("data", current_study, "metadata.yml")
    read_metadata(f) %>% write_metadata(f)

    Running tests

    Begin by running some automated tests to ensure the dataset meets the required setup. The tests run through a collection of pre-specified checks on the files for each study. The output alerts you to possible issues needing to be fixed, by comparing the data in the files with the expected structure and allowed values, as specified in the schema and definitions.


    Certain special characters may show up as errors and need to be manually adjusted in the metadata.yml file.


    The tests also identify mismatches between the location names in the data.csv file vs. metadata.yml file (same for context), unsupported trait names, etc.


    To run the tests, the variable dataset_ids must be defined in the global namespace, containing a vector of ids to check. For example:

    # load relevant functions
    library(austraits.build)

    # run tests on one study
    dataset_ids <- "Bragg_2002"
    dataset_test(dataset_ids)

    # run tests on all studies
    dataset_ids <- dir("data")
    dataset_test(dataset_ids)

    Fix as many errors as you can and then rerun dataset_test() repeatedly until no errors remain.

    See below for suggestions on how to implement large numbers of trait value substitutions.


    Rebuild AusTraits
    build_setup_pipeline()
    austraits <- remake::make("austraits")

    Check excluded data

    AusTraits automatically excludes data for a number of reasons. These are available in the data frame excluded_data.


    When you are finished running quality checks, no data should be excluded due to Missing unit conversion or Unsupported trait.


    A few values may be legitimately excluded due to other errors, but check each entry.

    The best way to view excluded data for a study is:

    austraits$excluded_data %>%
      filter(
        dataset_id == current_study,
        error != "Observation excluded in metadata"
      ) %>%
      View()

    Missing values (blank cells, cells with NA) are not included in the excluded_data table, because they are assumed to be legitimate blanks. If you want to confirm this, you need to temporarily change the default arguments for the internal function dataset_process where it is called within the remake.yml file. For instance, the default,

          dataset_process("data/Ahrens_2019/data.csv",
                      Ahrens_2019_config,
                      schema
                     )

    needs to be changed to:

          dataset_process("data/Ahrens_2019/data.csv",
                      Ahrens_2019_config,
                      schema,
                      filter_missing_values = FALSE
                     )

    Reasons for data to be excluded

    Possible reasons for excluding a trait value include:

    • Missing species name: Species name is missing from data.csv file for a given row of data. This usually occurs when there are stray characters in the data.csv file below the data – delete these rows.

    • Missing unit conversion: Value was present but appropriate unit conversion was missing. This requires that you add a new unit conversion to the file config/unit_conversions.csv. Add additional conversions near similar unit conversions already in the file for easier searching in the future.

    • Observation excluded in metadata: Specific values, usually certain taxon names, can be excluded in the metadata. This is generally used when a study includes a number of non-native and non-naturalised species that need to be excluded. These should be intentional exclusions, as they have been added by you.

    • Time contains non-number: Indicates a problem with the value entered into the traits flowering_time and fruiting_time. (Note to AusTraits custodians: this error should no longer appear; it is retained for now as a placeholder.)

    • Unsupported trait: trait_name not listed in config/traits.yml, under traits. Double check you have used the correct spelling/exact syntax for the trait_name, adding a new trait to the traits.yml file if appropriate. If there is a trait that is currently unsupported by AusTraits, leave trait_name: .na. Do not fill in an arbitrary name.

    • Unsupported trait value: This error, referencing categorical traits, means that the value for a trait is not included in the list of supported trait values for that trait in config/traits.yml. See adding many substitutions if there are many trait values requiring substitutions. If appropriate, add another trait value to the traits.yml file, but confer with other curators, as the lists of trait values have been carefully agreed upon through workshop sessions.

    • Value does not convert to numeric: Is there a strange character in the file preventing easy conversion? This error is rare and generally justified.

    • Value out of allowable range: This error, referencing numeric traits, means that the trait value, after unit conversions, falls outside of the allowable range specified for that trait in config/traits.yml. Sometimes the AusTraits range is too narrow and other times the author's value is truly an outlier that should be excluded. Look closely at these and adjust the range in config/traits.yml if justified. Generally, don't change the range until you've created a report for the study and confirmed that the general cloud of data aligns with other studies as expected. Most frequently, it is the units or unit conversion that is incorrect.

    You can also ask how many of each error type are present for a study:

    austraits$excluded_data %>%
      filter(dataset_id == "Cheal_2017") %>%
      pull(error) %>%
      table()
    #> < table of extent 0 >

    Or produce a table of error type by trait:

    austraits$excluded_data %>%
      filter(
        dataset_id == "Cheal_2017"
      ) %>%
      select(trait_name, error) %>%
      table()
    #> < table of extent 0 x 0 >

    Note, most studies have no excluded data. This study is an extreme example!

    Adding many substitutions

    For categorical traits, if you want to create a list of all values that require substitutions:

    austraits$excluded_data %>%
      filter(
        dataset_id == current_study,
        error == "Unsupported trait value"
      ) %>%
      distinct(dataset_id, trait_name, value) %>%
      rename(find = value) %>%
      select(-dataset_id) %>%
      write_csv("data/dataset_id/raw/substitutions_required.csv")

    For studies with a small number of substitutions, add them individually using:

    metadata_add_substitution(dataset_id, trait_name, find, replace)

    For studies with a large number of substitutions required, you can add an additional column, replace, to this table and fill in all the correct trait values. Then read the list of substitutions directly into the metadata file:

    substitutions_to_add <-
      read_csv("data/dataset_id/raw/substitutions_required_after_editing.csv")

    metadata_add_substitutions_list(dataset_id, substitutions_to_add)

    Add taxonomic updates

    The function metadata_add_taxonomic_change allows you to manually align submitted taxon names (the original_name) with the taxon names in the taxonomic resource:

      metadata_add_taxonomic_change <- function(dataset_id, find, replace, reason, taxonomic_resolution)

    • find is the name in the taxon name column in the dataset
    • replace is the equivalent taxon name in the taxonomic resource
    • reason provides information about why the taxonomic update is required
    • taxonomic_resolution indicates the most specific taxon rank that the name in replace aligns to

    As examples:

    A simple fix correcting a minor typo to align with an accepted taxon name:

    taxonomic_updates:
    - find: Drummondita rubroviridis
      replace: Drummondita rubriviridis
      reason: match_07_fuzzy. Fuzzy alignment with accepted canonical name in APC (2022-11-21)
      taxonomic_resolution: Species

    An example of a taxon name that can only be aligned to genus. The taxonomic_resolution is therefore specified as genus. The portion of the name that can be aligned to the taxonomic resource must be before the square brackets. Any information within the square brackets is important for uniquely identifying this entry within AusTraits, but does not provide additional taxonomic information.

    - find: Acacia ancistrophylla/sclerophylla
      replace: Acacia sp. [Acacia ancistrophylla/sclerophylla; White_2020]
      reason: match_04. Rewording taxon where `/` indicates uncertain species identification
        to align with `APC accepted` genus (2022-11-10)
      taxonomic_resolution: genus

    A taxonomic update that aligns a name to the most similar taxon_name within a taxonomic resource (the APC), but this is a taxonomic synonym and the austraits workflow will update it to its currently accepted name (since this is documented within the taxon_list.csv file):

    - find: Polyalthia (Wyvur)
      replace: Polyalthia sp. (Wyvuri B.P.Hyland RFK2632)
      reason: match_15_fuzzy. Fuzzy match alignment with species-level canonical name
        in `APC known` when everything except first 2 words ignored (2022-11-10)
      taxonomic_resolution: Species

    Check if AusTraits pivots wider

    AusTraits users want to be able to “pivot” between long and wide formats. Each row of data should have a unique combination of the following fields: trait_name, dataset_id, observation_id, source_id, taxon_name, population_id, individual_id, temporal_id, method_id, value_type, and original_name.


    Therefore, the dataset should be able to pivot wider, and the following code should produce a 1 in every cell.

    austraits$traits %>%
      select(dataset_id, trait_name, value, observation_id, source_id, taxon_name, population_id, individual_id, temporal_id, method_id, value_type, original_name) %>%
      pivot_wider(names_from = trait_name, values_from = value, values_fn = length) %>% View()

    If AusTraits fails to pivot_wider, likely problems are:

    • Not all context information has been captured. For instance, is it possible that you have two columns with data for the same trait, measured using different methods? In this case you need to add a method_context to both the relevant traits and to the contexts section.
    • There are multiple observations per entity. A number of large studies which, in theory, include a single observation per species have a few scattered instances of a second row of trait values with the same taxon name. They might be true duplicates and can be removed, or perhaps they are indeed alternate values. In this case the following custom_R_code works:

    ' data %>%
        group_by(taxon_name) %>%
          mutate(observation_number = dplyr::row_number()) %>%
        ungroup()'

    Then add observation_number as a context with category: temporal, for example:
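
    Following the format of the contexts entries shown above, the added entry might look roughly like this (the context_property wording is a suggestion):

    - context_property: observation number
      category: temporal
      var_in: observation_number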

    Check for duplicates

    AusTraits strives to have no duplicate entries for numeric (continuous) trait measurements. That is, each value in AusTraits should represent a unique measurement, rather than a measurement sourced from another study.


    When you receive/solicit a dataset, ask the data contributor if all data submitted was collected for the specific study and if they suspect other studies from their lab/colleagues may also have contributed any of this data.


    In addition, there are tests to check for duplicates within and across dataset_ids.

    To check for duplicates:

    austraits_deduped <- remove_suspected_duplicates(austraits)
    duplicates_for_dataset_id <-
      austraits_deduped$excluded_data %>%
      filter(
        dataset_id == current_study
      )

    Duplicates within the study

    1. First sort duplicates_for_dataset_id by the column error and scan for duplicates within the study (these will be entries under error that begin with the same dataset_id as the dataset being processed)

    2. For legitimately identical measurements, do nothing. For instance, if %N has been measured on 50 replicates of a species and is reported to the nearest 0.01%, it is quite likely there will be a few identical values within the study.

    3. If a species-level measurement has been entered for all within-location replicates, you need to filter out the duplicates. This is true for both numeric and categorical values. Enter the following code as custom_R_code in the dataset's metadata file:

    data %>%
      group_by(Species) %>%
      mutate(
        across(
          c(leaf_percentN, `plant growth form`), replace_duplicates_with_NA
        )
      ) %>%
      ungroup()

    Note: Using custom R code instead of filtering the values in the data.csv file itself ensures the relevant trait values are still associated with each line of data in the data.csv file, but only read into AusTraits a single time. Note: You would use group_by(Species, Location) if there are unique values at the species x location level.

    Duplicates between studies

    AusTraits does not attempt to filter out duplicates in categorical traits between studies. The commonly duplicated traits like life_form, plant_growth_form, photosynthetic_pathway, fire_response, etc. are legitimately duplicated and if the occasional study reported a different plant_growth_form or fire_response it would be important to have documented that one trait value was much more common than another. Such categorical trait values may have been sourced from reference material or measured/identified by this research team.


    Identifying duplicates in numeric traits between studies can be difficult, but it is essential that we attempt to filter out all duplicate occurrences of the same measurement. Some common patterns of duplication include:

    1. For a single trait, if there are a large number of values duplicated in a specific other dataset_id (i.e. the error repeatedly starts with the same dataset_id), be suspicious. Before contacting the author, check the metadata for the two datasets, especially authors and study locations, to see if it is likely these are data values that have been jointly collected and shared across studies. Similar location names/locations, identical university affiliations, or similar lists of traits being measured are good clues.

    2. plant_height, leaf_length, leaf_width, seed_length, seed_width and seed_mass are the numeric variables that are most frequently sourced from reference material (e.g. floras, herbarium collections, reference books, Kew seed database, etc.)

    3. The following datasets are flagged in AusTraits as reference studies and are the source of most duplicates for the variables listed above: Kew_2019_1, Kew_2019_2, Kew_2019_3, Kew_2019_4, Kew_2019_5, Kew_2019_6, ANBG_2019, GrassBase_2014, CPBR_2002, NTH_2014, RBGK_2014, NHNSW_2016, RBGSYD__2014_2, RBGSYD_2014, TMAG_2009, WAH_1998, WAH_2016, Brock_1993, Barlow_1981, Hyland_2003, Cooper_2013

    Data from these studies are assumed to be the source, and the other study with the value is assumed to have sourced it from the above study. We recognise this is not always accurate, especially for compilations within Kew_2019_1, Kew's seed mass database. Whenever we input a raw dataset that is also part of the Kew compilation, we filter that contributor's data from Kew_2019_1.

    4. Data for wood_density is also often sourced from other studies, most commonly Ilic_2000 or Zanne_2009.

    5. Data from a number of studies from Leishman and Wright have been extensively shared within the trait ecology community, especially through TRY.

    If the dataset you are processing has a number of numeric trait duplicates that follow one of the patterns of duplication listed, the duplicates should be filtered out. Any other data explicitly indicated in the manuscript as sourced should also be filtered out. Most difficult are studies that have partially sourced data, often from many small studies, and partially collected new data, but have not identified the source of each value.

    Filtering duplicate data is a three-step process. In brief:

    1. Identify traits and studies with duplicates you believe should be removed.
    2. Add additional columns to data.csv, identifying certain trait_values as duplicates.
    3. Add custom R code that filters out identified duplicates when the study is merged into AusTraits.
    Identify traits and studies
    1. Either in R or Excel, manipulate duplicates_for_dataset_id to remove rows that you believe are legitimate duplicates, including duplicate values due to replicate measurements within a single study and stray duplicates across studies that are likely true, incidental duplicates. Carefully consider which datasets and traits to include/exclude from the filter.

    As an example:

    # Note, this code will be replaced by a function in the future.
    duplicates_to_filter <-
      duplicates_for_dataset_id %>%
      mutate(
        dataset_with_duplicate =
          error %>%
            gsub("Duplicate of ", "", .) %>%
            gsub("[[:alnum:]]$", "", .) %>%
            gsub("[[:punct:]]$", "", .)
      ) %>%
      filter(dataset_with_duplicate %in% c("Ilic_2000", "Zanne_2009", "Kew_2019_1", "Barlow_1981", "NTH_2014")) %>%
      filter(trait_name %in% c("wood_density", "seed_mass", "leaf_length", "leaf_width"))
    2. Use the following code to add columns to data.csv that identify specific values as duplicates:
    # Note, this code will be replaced by a function in the future.
    wood_density_duplicates <-
      duplicates_to_filter %>%
      filter(trait_name == "wood_density") %>%
      select(error, original_name) %>%
      rename(wood_density_duplicate = error)

    seed_mass_duplicates <-
      duplicates_to_filter %>%
      filter(trait_name == "seed_mass") %>%
      select(error, original_name) %>%
      rename(seed_mass_duplicate = error)

    leaf_width_min_duplicates <-
      duplicates_to_filter %>%
      filter(trait_name == "leaf_width", value_type == "expert_min") %>%
      select(error, original_name) %>%
      rename(leaf_width_min_duplicate = error)

    leaf_width_max_duplicates <-
      duplicates_to_filter %>%
      filter(trait_name == "leaf_width", value_type == "expert_max") %>%
      select(error, original_name) %>%
      rename(leaf_width_max_duplicate = error)

    leaf_length_min_duplicates <-
      duplicates_to_filter %>%
      filter(trait_name == "leaf_length", value_type == "expert_min") %>%
      select(error, original_name) %>%
      rename(leaf_length_min_duplicate = error)

    leaf_length_max_duplicates <-
      duplicates_to_filter %>%
      filter(trait_name == "leaf_length", value_type == "expert_max") %>%
      select(error, original_name) %>%
      rename(leaf_length_max_duplicate = error)

    read_csv("data/dataset_id/data.csv") %>%
      left_join(wood_density_duplicates, by = c("column_with_taxon_name" = "original_name")) %>%
      left_join(seed_mass_duplicates, by = c("column_with_taxon_name" = "original_name")) %>%
      left_join(leaf_width_min_duplicates, by = c("column_with_taxon_name" = "original_name")) %>%
      left_join(leaf_width_max_duplicates, by = c("column_with_taxon_name" = "original_name")) %>%
      left_join(leaf_length_min_duplicates, by = c("column_with_taxon_name" = "original_name")) %>%
      left_join(leaf_length_max_duplicates, by = c("column_with_taxon_name" = "original_name")) %>%
      write_csv("data/dataset_id/data.csv")
For the above example, add the following code as custom R code, removing the duplicate values from the data columns (by setting them as NA) as the dataset is read into AusTraits.
    -data %>%
    -  mutate(
    -    `wood density` = ifelse(is.na(wood_density_duplicate), `wood density`, NA),
    -    `seed mass (mg)` = ifelse(is.na(seed_mass_duplicate), `seed mass (mg)`, NA),
    -    `leaf width minimum (mm)` = ifelse(is.na(leaf_width_min_duplicate), `leaf width minimum (mm)`, NA),
    -    `leaf width maximum (mm)` = ifelse(is.na(leaf_width_max_duplicate), `leaf width maximum (mm)`, NA),
    -    `leaf length minimum (mm)` = ifelse(is.na(leaf_length_min_duplicate), `leaf length minimum (mm)`, NA),
    -    `leaf length maximum (mm)` = ifelse(is.na(leaf_length_max_duplicate), `leaf length maximum (mm)`, NA)
    -  )
    +data %>% + mutate( + `wood density` = ifelse(is.na(wood_density_duplicate), `wood density`, NA), + `seed mass (mg)` = ifelse(is.na(seed_mass_duplicate), `seed mass (mg)`, NA), + `leaf width minimum (mm)` = ifelse(is.na(leaf_width_min_duplicate), `leaf width minimum (mm)`, NA), + `leaf width maximum (mm)` = ifelse(is.na(leaf_width_max_duplicate), `leaf width maximum (mm)`, NA), + `leaf length minimum (mm)` = ifelse(is.na(leaf_length_min_duplicate), `leaf length minimum (mm)`, NA), + `leaf length maximum (mm)` = ifelse(is.na(leaf_length_max_duplicate), `leaf length maximum (mm)`, NA) + )

    Difficulties:

• This method only identifies values as duplicates if they have the same number of significant figures.

• More complex matching may reveal further duplicates (see the sketch below). For seed mass in particular, some studies likely source values from the Kew database and then round these values. They may similarly source several values from Kew and then include the mean in their dataset. If their methods or correspondence with the contributor suggests the values were sourced from Kew (or another lab, papers, etc.), it is best to filter out all values, EXCEPT for species that are not yet represented in AusTraits for the trait in question.
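One rough way to catch duplicates that were rounded before being re-reported is to compare values after truncating them to fewer significant figures. The sketch below is only illustrative: the object names (new_data, existing_austraits_values) and the choice of 2 significant figures are assumptions, not part of the AusTraits workflow.

# Illustrative only: flag likely duplicates that differ only by rounding.
# `new_data` and `existing_austraits_values` are assumed data frames with
# columns taxon_name, trait_name and a numeric `value` column.
library(dplyr)

possible_duplicates <-
  new_data %>%
  mutate(value_rounded = signif(value, 2)) %>%
  inner_join(
    existing_austraits_values %>%
      mutate(value_rounded = signif(value, 2)),
    by = c("taxon_name", "trait_name", "value_rounded"),
    suffix = c("_new", "_existing")
  )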
    @@ -1851,180 +1132,118 @@

    Build study report
    -f <- file.path("data", current_study, "metadata.yml")
    -read_metadata(f) %>% write_metadata(f)
    -
    -dataset_ids <- current_study
    -austraits_run_tests()
    -
    -austraits <- remake::make("austraits")
    -dataset_report(current_study, overwrite = TRUE)
f <- file.path("data", current_study, "metadata.yml")
read_metadata(f) %>% write_metadata(f)

dataset_ids <- current_study
austraits_run_tests()

austraits <- remake::make("austraits")
dataset_report(current_study, overwrite = TRUE)

    To generate a report for a collection of studies:

    -dataset_reports(c("Falster_2005_1", "Wright_2002"), overwrite = TRUE)
    +dataset_reports(c("Falster_2005_1", "Wright_2002"), overwrite = TRUE)

    Or for all studies:

    -dataset_reports(overwrite = TRUE)
    -

    Add the argument overwrite=TRUE if you already have a -copy of a specific report stored in your computer and want to replace it -with a newer version.

    -

    (Reports are written in Rmarkdown and generated -via the knitr package. -The template is stored in scripts/report_study.html).

dataset_reports(overwrite = TRUE)

Add the argument overwrite=TRUE if you already have a copy of a specific report stored on your computer and want to replace it with a newer version.

    +

    (Reports are written in Rmarkdown and generated via the knitr package. The template is stored in scripts/report_study.html).

    Working with our GitHub repository

    -

    By far our preferred way of contributing is for you to contribute -files directly into the repository and then send a pull -request with your input. You can do this by

    +

    By far our preferred way of contributing is for you to contribute files directly into the repository and then send a pull request with your input. You can do this by

      -
    • (for approved maintainers of austraits.build) Creating a branch, -or
    • +
    • (for approved maintainers of austraits.build) Creating a branch, or
    • (for others) forking the database in github

    In short,

1. Create a Git branch for your new work, either within the AusTraits repo (if you are an approved contributor) or as a fork of the repo.
2. Make commits and push these up onto the branch.
3. Make sure everything runs fine before you send a pull request.
4. When you’re ready to merge in the new features, send a pull request.
    -

    Before you make a substantial pull request, you should always file an -issue and make sure someone from the team agrees that it’s worth -pursuing the problem. If you’ve found a bug, create an associated issue -and illustrate the bug with a minimal reprex illustrating -the issue.

    -

    If this is not possible, you could email the relevant files (see -above) to the AusTraits email:

    +

    Before you make a substantial pull request, you should always file an issue and make sure someone from the team agrees that it’s worth pursuing the problem. If you’ve found a bug, create an associated issue and illustrate the bug with a minimal reprex illustrating the issue.

    +

    If this is not possible, you could email the relevant files (see above) to the AusTraits email: austraits.database@gmail.com

    Merging a pull request

    -

    There are multiple ways to merge a pull request, including using -GitHub’s built-in options for merging and squashing. When merging a PR, -we ideally want

    +

    There are multiple ways to merge a pull request, including using GitHub’s built-in options for merging and squashing. When merging a PR, we ideally want

    • a single commit
    • to attribute the work to the original author
    • to run various checks along the way
    -

    There are two ways to do this. For both, you need to be an approved -maintainer.

    +

    There are two ways to do this. For both, you need to be an approved maintainer.

    Merging in your own PR

    -

    You can merge in your own PR after you’ve had someone else review -it.

    +

    You can merge in your own PR after you’ve had someone else review it.

    1. Send the PR
    2. Tag someone to review
3. Once ready, merge into main choosing “Squash & Merge”, using an informative commit message.

    Merging someone else’s PR

    -

    When merging in someone else’s PR, the built-in options aren’t ideal, -as they either take all of the commits on a branch (ugh, messy), OR make -the commit under the name of the person merging the request.

    -

    The workflow below describes how to merge a pull request from the -command line, with a single commit & attributing the work to the -original author. Lets assume a branch of name -Smith_1995.

    +

    When merging in someone else’s PR, the built-in options aren’t ideal, as they either take all of the commits on a branch (ugh, messy), OR make the commit under the name of the person merging the request.

    +

The workflow below describes how to merge a pull request from the command line, with a single commit & attributing the work to the original author. Let’s assume a branch named Smith_1995.

    First, from the master branch in the repo, run the following:

    git merge --squash origin/Smith_1995

    Then in R
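For example (the exact checks run at this point may vary), rebuild AusTraits and re-run the tests:

# Rebuild AusTraits and run the tests before committing the squashed merge
austraits <- remake::make("austraits")
austraits_run_tests()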

    Now back in the terminal

    git add .
     git commit
    -

    Add a commit message, referencing relevant pull requests and issues, -e.g.

    +

    Add a commit message, referencing relevant pull requests and issues, e.g.

    Smith_1995: Import new data
     
     For #224, closes #286
    -

    And finally, amend the commit author, to reference the person who did -all the work!

    +

    And finally, amend the commit author, to reference the person who did all the work!

    git commit --amend --author "XXX <XXX@gmail.com>"

    Commit messages

    -

    Informative commit messages are ideal. Where possible, these should -reference the issue being addressed. They should clearly describe the -work done and value added to AusTraits in a few, clear, bulleted -points.

    +

    Informative commit messages are ideal. Where possible, these should reference the issue being addressed. They should clearly describe the work done and value added to AusTraits in a few, clear, bulleted points.

    Version updating & Making a new release

    -

    Releases of the dataset are snapshots that are archived and available -for use.

    -

    We use semantic versioning to label our versions. As discussed in Falster et al 2019, -semantic versioning can apply to datasets as well as code.

    -

    The version number will have 3 components for actual releases, and 4 -for development versions. The structure is -major.minor.patch.dev, where dev is at least -9000. The dev component provides a visual signal that this -is a development version. So, if the current version is 0.9.1.9000, the -release be 0.9.2, 0.10.0 or 1.0.0.

    +

    Releases of the dataset are snapshots that are archived and available for use.

    +

    We use semantic versioning to label our versions. As discussed in Falster et al 2019, semantic versioning can apply to datasets as well as code.

    +

The version number will have 3 components for actual releases, and 4 for development versions. The structure is major.minor.patch.dev, where dev is at least 9000. The dev component provides a visual signal that this is a development version. So, if the current version is 0.9.1.9000, the next release would be 0.9.2, 0.10.0 or 1.0.0.

    Our approach to incrementing version numbers is

    • -major: increment when you make changes to the structure -that are likely incompatible with any code written to work with previous -versions.
    • +major: increment when you make changes to the structure that are likely incompatible with any code written to work with previous versions.
    • -minor: increment to communicate any changes to the -structure that are likely to be compatible with any code written to work -with the previous versions (i.e., allows code to run without error). -Such changes might involve adding new data within the existing -structure, so that the previous dataset version exists as a subset of -the new version. For tabular data, this includes adding columns or rows. -On the other hand, removing data should constitute a major version -because records previously relied on may no longer exist.
    • +minor: increment to communicate any changes to the structure that are likely to be compatible with any code written to work with the previous versions (i.e., allows code to run without error). Such changes might involve adding new data within the existing structure, so that the previous dataset version exists as a subset of the new version. For tabular data, this includes adding columns or rows. On the other hand, removing data should constitute a major version because records previously relied on may no longer exist.
    • -patch: Increment to communicate correction of errors in -the actual data, without any changes to the structure. Such changes are -unlikely to break or change analyses written with the previous version -in a substantial way.
    • +patch: Increment to communicate correction of errors in the actual data, without any changes to the structure. Such changes are unlikely to break or change analyses written with the previous version in a substantial way.

    -

    Figure: Semantic versioning communicates to users -the types of changes that have occurred between successive versions of -an evolving dataset, using a tri-digit label where increments in a -number indicate major, minor, and patch-level changes, respectively. -From Falster et al -2019, (CC-BY).

    -

    The process of making a release is as follows. Note that -corresponding releases and versions are needed in both -austraits and austraits.build:

    +

    Figure: Semantic versioning communicates to users the types of changes that have occurred between successive versions of an evolving dataset, using a tri-digit label where increments in a number indicate major, minor, and patch-level changes, respectively. From Falster et al 2019, (CC-BY).

    +

    The process of making a release is as follows. Note that corresponding releases and versions are needed in both austraits and austraits.build:

1. Update the version number in the DESCRIPTION file, using `
2. Compile austraits.build.
3. Update the documentation.
4. Commit and push to github.
5. Make a release on github, adding the version number.
6. Prepare for the next version by updating version numbers.
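As an illustration only (the helper actually used in step 1 is not shown above), the usethis package provides one way to update the version recorded in the DESCRIPTION file:

# Illustration: bump the version in DESCRIPTION
# (the repository's own helper for this step may differ)
usethis::use_version("minor")   # or "patch", "major", "dev"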

    @@ -2037,44 +1256,26 @@

    File types

    CSV

    -

    A comma-separated values (CSV) file is a delimited text file that -uses a comma to separate values. Each line of the file is a data record. -Each record consists of one or more fields, separated by commas. This is -a comma format for storing tables of data in a simple text file. You can -edit it an Excel or in a text editor. For more, see here.

    +

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. This is a common format for storing tables of data in a simple text file. You can edit it in Excel or in a text editor. For more, see here.

    YAML files

    -

    The yml file extension (pronounced “YAML”) is a type structured data -file, that is both human and machine readable. You can edit it in -any text editor, or in Rstudio. Generally, yml is used in situations -where a table does not suit because of variable lengths and/or nested -structures. It has the advantage over a spreadsheet in that the nested -“headers” can have variable numbers of categories. The data under each -of the hierarchical headings are easily extracted by R.

    +

The yml file extension (pronounced “YAML”) is a type of structured data file that is both human and machine readable. You can edit it in any text editor, or in RStudio. Generally, yml is used in situations where a table does not suit because of variable lengths and/or nested structures. It has the advantage over a spreadsheet in that the nested “headers” can have variable numbers of categories. The data under each of the hierarchical headings are easily extracted by R.

    Extracting data from PDF tables

    -

    If you encounter a PDF table of data and need to extract values, this -can be achieved with the tabula-java -tool. There’s actually an R wrapper (called tabulizer), -but we haven’t succeeded in getting this running. However, it’s easy -enough to run the java tool at the command line on OSX.

    +

    If you encounter a PDF table of data and need to extract values, this can be achieved with the tabula-java tool. There’s actually an R wrapper (called tabulizer), but we haven’t succeeded in getting this running. However, it’s easy enough to run the java tool at the command line on OSX.

1. Download the latest release of tabula-java and save the file in your path.

2. Run
    java -jar tabula-1.0.3-jar-with-dependencies.jar my_table.pdf -o my_data.csv
This should output the data from the table in my_table.pdf into the csv my_data.csv.

3. Clean up in Excel. Check especially that the locations of white spaces are correct.
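Alternatively, stray white space in the extracted CSV can be tidied from R; a minimal sketch (not part of the documented workflow, and file names follow the example above):

library(readr)
library(dplyr)
library(stringr)

read_csv("my_data.csv") %>%
  # collapse repeated spaces and trim leading/trailing white space
  mutate(across(where(is.character), str_squish)) %>%
  write_csv("my_data_clean.csv")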
    diff --git a/docs/articles/austraits.build.html b/docs/articles/austraits.build.html index 318a11e30..58a095db9 100644 --- a/docs/articles/austraits.build.html +++ b/docs/articles/austraits.build.html @@ -80,7 +80,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -127,18 +127,9 @@

    2023-02-09

    What type of repository is this and who is it for?

    -

    The main purpose of this repo is to build AusTraits, a -curated database of traits for Australian flora. It contains code and -data to build the harmonised AusTraits database outputs. The intended -audiences are those who are interested in building AusTraits from -scratch, or contributing data to AusTraits.

    -

    Those interested in simply using data from AusTraits should visit and -download the compiled resource from the versioned releases archived on -Zenodo at doi: 10.5281/zenodo.3568417.

    -

    The repo is partway between a compendium and an R package. It is -structured like an R package, and contains code that can be installed as -a package. This allows us to use some of R’s package management -tools:

    +

    The main purpose of this repo is to build AusTraits, a curated database of traits for Australian flora. It contains code and data to build the harmonised AusTraits database outputs. The intended audiences are those who are interested in building AusTraits from scratch, or contributing data to AusTraits.

    +

    Those interested in simply using data from AusTraits should visit and download the compiled resource from the versioned releases archived on Zenodo at doi: 10.5281/zenodo.3568417.

    +

    The repo is partway between a compendium and an R package. It is structured like an R package, and contains code that can be installed as a package. This allows us to use some of R’s package management tools:

    • use a DESCRIPTION file to document our dependencies
    • create our documentation using the package pkgdown
    • @@ -146,143 +137,91 @@

      What type of reposito
    • manage and build vignettes
    • use the devtools function to load functions.
    -

    It also contains data for rebuilding AusTraits. A key goal for us was -to make the process of harmonising different datasets as transparent as -possible. Our workflow is, therefore, fully-reproducible and open, -meaning it exposes the decisions made in the processing of data into a -harmonised and curated dataset (Figure 1); and can also be rerun by -others.

    +

It also contains data for rebuilding AusTraits. A key goal for us was to make the process of harmonising different datasets as transparent as possible. Our workflow is, therefore, fully reproducible and open, meaning it exposes the decisions made in the processing of data into a harmonised and curated dataset (Figure 1), and can also be rerun by others.

    Ways to contribute

    -

    We envision AusTraits as an ongoing collaborative community resource -that:

    +

    We envision AusTraits as an ongoing collaborative community resource that:

    1. Increases our collective understanding of the Australian flora
    2. Facilitates the accumulation and sharing of trait data
    3. Builds a sense of community among contributors and users
4. Aspires to be fully transparent and reproducible research of the highest standard.

    Below are some of the ways you can contribute.

    -

    *Please note that the AusTraits project is released with a Contributor -Code of Conduct. By contributing to this project you agree to abide -by its terms.

    +

Please note that the AusTraits project is released with a Contributor Code of Conduct. By contributing to this project you agree to abide by its terms.

    Contributing new data

    -

    We gladly accept new data contributions to AusTraits. If you would -like to contribute data, the requirements are:

    +

    We gladly accept new data contributions to AusTraits. If you would like to contribute data, the requirements are:

• Data was collected for Australian plant species growing in Australia

• You collected data on one of the traits listed in the trait definitions table

• You are willing to release the data under an open license for reuse by the scientific community

• You make it as easy as possible for us to incorporate your data by following the instructions.
    -

    If you want to contribute data, please review the instructions here -on how to contribute data.

    +

    If you want to contribute data, please review the instructions here on how to contribute data.

    Reporting errors and improving documentation

    -

    Data contributors and data users who are less familiar with the -AusTraits format and code than the custodians may determine that -important descriptions or steps are omitted from this documentation -file. We welcome additions and edits that make using the existing data -or adding new data easier for the community.

    -

    If you notice a possible error in AusTraits, please post an -issue on GitHub. If you can, please provide code illustrating the -problem.

    -

    If you would like to value-add to AusTraits in some other way, please -get in contact with an idea or offer of time.

    +

    Data contributors and data users who are less familiar with the AusTraits format and code than the custodians may determine that important descriptions or steps are omitted from this documentation file. We welcome additions and edits that make using the existing data or adding new data easier for the community.

    +

    If you notice a possible error in AusTraits, please post an issue on GitHub. If you can, please provide code illustrating the problem.

    +

    If you would like to value-add to AusTraits in some other way, please get in contact with an idea or offer of time.

    Improving data quality

    -

    A core initiative of AusTraits from 2021-2023 is to refine and better -document the trait names, definitions, and values that are the direct -link from each contributor’s dataset to the harmonised database. This -effort is funded by an Australian Research Data Commons (ARDC) grant -through their Australian Data Partnerships program. It includes both a -review of definitions by the core AusTraits team and a series of -workshops to discuss clusters of related trait definitions.

    -

    The goal is to link as many trait names as possible to established, -published definitions (e.g. in the traits handbook, a review paper on a -method or manuscripts regularly cited as the standard for a specific -trait). In addition, the list of allowable values for each categorical -trait will be reviewed and revised.

    -

    If you are interested in contributing expertise to the revision of a -given trait (or cluster of related traits), please contact us.

    +

    A core initiative of AusTraits from 2021-2023 is to refine and better document the trait names, definitions, and values that are the direct link from each contributor’s dataset to the harmonised database. This effort is funded by an Australian Research Data Commons (ARDC) grant through their Australian Data Partnerships program. It includes both a review of definitions by the core AusTraits team and a series of workshops to discuss clusters of related trait definitions.

    +

    The goal is to link as many trait names as possible to established, published definitions (e.g. in the traits handbook, a review paper on a method or manuscripts regularly cited as the standard for a specific trait). In addition, the list of allowable values for each categorical trait will be reviewed and revised.

    +

    If you are interested in contributing expertise to the revision of a given trait (or cluster of related traits), please contact us.

    Compiling AusTraits

    -

    In this section, we describe how to build the harmonised dataset. By -“compiling” we mean transforming data from all the different studies -into a harmonised common format. As described above, and depicted in -Figure 1, AusTraits is built so that the database can be rebuilt from -its parts at any time. This means that decisions made along the way (in -how data is transformed or encoded) can be inspected and modified, and -new data can be easily incorporated.

    -

    The first step to compile AusTraits is to download a copy of the austraits.build -repository from Github. Then open the Rstudio project, or open R into -the right repo directory.

    +

    In this section, we describe how to build the harmonised dataset. By “compiling” we mean transforming data from all the different studies into a harmonised common format. As described above, and depicted in Figure 1, AusTraits is built so that the database can be rebuilt from its parts at any time. This means that decisions made along the way (in how data is transformed or encoded) can be inspected and modified, and new data can be easily incorporated.

    +

    The first step to compile AusTraits is to download a copy of the austraits.build repository from Github. Then open the Rstudio project, or open R into the right repo directory.

    Dependencies

    -

    To check you have the right packages installed, you can use the devtools package to -run:

    +

    To check you have the right packages installed, you can use the devtools package to run:

    -#install.packages("devtools")  # install devtools if needed
    -devtools::install(quick=TRUE)
# install.packages("devtools")  # install devtools if needed
devtools::install(quick = TRUE)

    Source functions

    To successfully compile AusTraits you need to load the package

    -

    and source some custom functions written explicitly for this -database:

    +library("austraits.build")
    +

    and source some custom functions written explicitly for this database:

    -source("scripts/custom.R")        # functions used in custom_R_code
    +source("scripts/custom.R") # functions used in custom_R_code

    Compile via remake

    -

    One of the packages that will be installed with the above is remake. This -package manages the compiling, and also helps streamline the amount of -recompiling needed when new sources are added.

    -

    Running the following command will rebuild AusTraits and save the -assembled database into an RDS file located in -export/data/curr/austraits.rds.

    +

    One of the packages that will be installed with the above is remake. This package manages the compiling, and also helps streamline the amount of recompiling needed when new sources are added.

    +

    Running the following command will rebuild AusTraits and save the assembled database into an RDS file located in export/data/curr/austraits.rds.

    -remake::make()
    -

    Remake can also load the compiled dataset directly into R by -calling:

    +remake::make()
    +

    Remake can also load the compiled dataset directly into R by calling:

    -austraits <- remake::make("austraits")
    +austraits <- remake::make("austraits")

    Developing AusTraits

    -

    For those working with AusTraits code base or data, you may want to -read about

    +

For those working with the AusTraits code base or data, you may want to read about

    diff --git a/docs/articles/austraits_database_structure.html b/docs/articles/austraits_database_structure.html index 48bfd2052..8d4644b9c 100644 --- a/docs/articles/austraits_database_structure.html +++ b/docs/articles/austraits_database_structure.html @@ -80,7 +80,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -123,14 +123,9 @@

    2023-02-09

    -

    This document describes the structure of the AusTraits compilation, -corresponding to Version 3.0.2 of the dataset.

    -

    Note that the information provided below is based on the information -provided within the file austraits.build_schema.yml, which -can be accessed by running get_schema or -system.file("support", "austraits.build_schema.yml", package = "austraits.build").

    -

    AusTraits is essentially a series of linked components, which cross -link against each other::

    +

    This document describes the structure of the AusTraits compilation, corresponding to Version 3.0.2 of the dataset.

    +

    Note that the information provided below is based on the information provided within the file austraits.build_schema.yml, which can be accessed by running get_schema or system.file("support", "austraits.build_schema.yml", package = "austraits.build").

    +

AusTraits is essentially a series of linked components, which cross-link against each other:

    austraits
     ├── traits
     ├── locations
    @@ -145,8 +140,7 @@ 

    2023-02-09

├── schema
├── metadata
└── build_info
    -

    These include all the data and contextual information submitted with -each contributed dataset.

    +

    These include all the data and contextual information submitted with each contributed dataset.
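Each component can then be accessed as an element of the compiled object; for example (a minimal sketch, assuming AusTraits has already been compiled):

# Load the compiled database, then pull out individual components
austraits <- remake::make("austraits")

names(austraits)       # the components listed above
austraits$traits       # trait measurements
austraits$locations    # location/site characteristics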

    Components

    @@ -154,8 +148,7 @@

    Components

    traits

    -

    Description: A table containing measurements of -traits.

    +

    Description: A table containing measurements of traits.

    Content:

    @@ -172,10 +165,7 @@

    traits dataset_id

    @@ -183,10 +173,7 @@

    traits taxon_name

    @@ -194,13 +181,7 @@

    traits observation_id

    @@ -208,8 +189,7 @@

    traits trait_name

    @@ -225,8 +205,7 @@

    traits unit

    @@ -234,8 +213,7 @@

    traits entity_type

    @@ -243,8 +221,7 @@

    traits value_type

    @@ -260,14 +237,7 @@

    traits replicates

    @@ -275,8 +245,7 @@

    traits basis_of_record

    @@ -284,9 +253,7 @@

    traits life_stage

    @@ -294,10 +261,7 @@

    traits population_id

    @@ -305,12 +269,7 @@

    traits individual_id

    @@ -318,9 +277,7 @@

    traits temporal_id

    @@ -328,8 +285,7 @@

    traits source_id

    @@ -337,9 +293,7 @@

    traits location_id

    @@ -347,9 +301,7 @@

    traits entity_context_id

    @@ -357,10 +309,7 @@

    traits plot_id

    @@ -368,9 +317,7 @@

    traits treatment_id

    @@ -378,11 +325,7 @@

    traits collection_date

    @@ -398,11 +341,7 @@

    traits method_id

    @@ -419,19 +358,11 @@

    traits
    Entity type
    -

    An entity is the feature of interest, indicating what a -trait value applies to. While an entity can be just a component of an -organism, within the scope of AusTraits, an individual is -the finest scale entity that can be documented. The same study might -measure some traits at a population-level -(entity = population) and others at an individual-level -(entity = individual).

    +

    An entity is the feature of interest, indicating what a trait value applies to. While an entity can be just a component of an organism, within the scope of AusTraits, an individual is the finest scale entity that can be documented. The same study might measure some traits at a population-level (entity = population) and others at an individual-level (entity = individual).

    In detail:

    • -entity_type is a categorical variable specifying the -entity corresponding to the trait values recorded. Possible values -are:
    • +entity_type is a categorical variable specifying the entity corresponding to the trait values recorded. Possible values are:

    -Primary identifier for each study contributed to AusTraits; most often -these are scientific papers, books, or online resources. By default this -should be the name of the first author and year of publication, -e.g. Falster_2005. +Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default this should be the name of the first author and year of publication, e.g. Falster_2005.
    -Scientific name of the taxon on which traits were sampled, without -authorship. When possible, this is the currently accepted (botanical) or -valid (zoological) scientific name, but might also be a higher taxonomic -level. +Scientific name of the taxon on which traits were sampled, without authorship. When possible, this is the currently accepted (botanical) or valid (zoological) scientific name, but might also be a higher taxonomic level.
    -A unique integral identifier for the observation, where an observation -is all measurements made on an individual at a single point in time. It -is important for joining traits coming from the same -observation_id. Within each dataset, observation_id’s are -unique combinations of taxon_name, -population_id, individual_id, and -temporal_id. +A unique integral identifier for the observation, where an observation is all measurements made on an individual at a single point in time. It is important for joining traits coming from the same observation_id. Within each dataset, observation_id’s are unique combinations of taxon_name, population_id, individual_id, and temporal_id.
    -Name of the trait sampled. Allowable values specified in the table -definitions. +Name of the trait sampled. Allowable values specified in the table definitions.
    -Units of the sampled trait value after aligning with AusTraits -standards. +Units of the sampled trait value after aligning with AusTraits standards.
    -A categorical variable specifying the entity corresponding to the trait -values recorded. +A categorical variable specifying the entity corresponding to the trait values recorded.
    -A categorical variable describing the statistical nature of the trait -value recorded. +A categorical variable describing the statistical nature of the trait value recorded.
    -Number of replicate measurements that comprise a recorded trait -measurement. A numeric value (or range) is ideal and appropriate if the -value type is a mean, median, min -or max. For these value types, if replication is unknown -the entry should be unknown. If the value type is -raw_value the replicate value should be 1. If the trait is -categorical or the value indicates a measurement for an entire species -(or other taxon) replicate value should be .na. +Number of replicate measurements that comprise a recorded trait measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the trait is categorical or the value indicates a measurement for an entire species (or other taxon) replicate value should be .na.
    -A categorical variable specifying from which kind of specimen traits -were recorded. +A categorical variable specifying from which kind of specimen traits were recorded.
    -A field to indicate the life stage or age class of the entity measured. -Standard values are adult, sapling, -seedling and juvenile. +A field to indicate the life stage or age class of the entity measured. Standard values are adult, sapling, seedling and juvenile.
    -A unique integer identifier for a population, where a population is -defined as individuals growing in the same location (location_id -/location_name) and plot (plot_id, a context category) and being -subjected to the same treatment (treatment_id, a context category). +A unique integer identifier for a population, where a population is defined as individuals growing in the same location (location_id /location_name) and plot (plot_id, a context category) and being subjected to the same treatment (treatment_id, a context category).
    -A unique integer identifier for an individual, with individuals numbered -sequentially within each dataset by taxon by population grouping. Most -often each row of data represents an individual, but in some datasets -trait data collected on a single individual is presented across multiple -rows of data, such as if the same trait is measured using different -methods or the same individual is measured repeatedly across time. +A unique integer identifier for an individual, with individuals numbered sequentially within each dataset by taxon by population grouping. Most often each row of data represents an individual, but in some datasets trait data collected on a single individual is presented across multiple rows of data, such as if the same trait is measured using different methods or the same individual is measured repeatedly across time.
    -A unique integer identifier assigned where repeat observations are made -on the same individual (or population, or taxon) across time. The -identifier links to specific information in the context table. +A unique integer identifier assigned where repeat observations are made on the same individual (or population, or taxon) across time. The identifier links to specific information in the context table.
    -For datasets that are compilations, an identifier for the original data -source. +For datasets that are compilations, an identifier for the original data source.
    -A unique integer identifier for a location, with locations numbered -sequentially within a dataset. The identifier links to specific -information in the location table. +A unique integer identifier for a location, with locations numbered sequentially within a dataset. The identifier links to specific information in the location table.
    -A unique integer identifier indicating specific contextual properties of -an individual, possibly including the individual’s sex or caste (for -social insects). +A unique integer identifier indicating specific contextual properties of an individual, possibly including the individual’s sex or caste (for social insects).
    -A unique integer identifier for a plot, where a plot is a distinct -collection of organisms within a single geographic location, such as -plants growing on different aspects or blocks in an experiment. The -identifier links to specific information in the context table. +A unique integer identifier for a plot, where a plot is a distinct collection of organisms within a single geographic location, such as plants growing on different aspects or blocks in an experiment. The identifier links to specific information in the context table.
    -A unique integer identifier for a treatment, where a treatment is any -experimental manipulation to an organism’s growing/living conditions. -The identifier links to specific information in the context table. +A unique integer identifier for a treatment, where a treatment is any experimental manipulation to an organism’s growing/living conditions. The identifier links to specific information in the context table.
Date sample was taken, in the format yyyy-mm-dd, yyyy-mm or yyyy, depending on the resolution specified. Alternatively an overall range for the study can be indicated, with the starting and ending sample dates separated by a /, as in 2010-10/2011-03
    -A unique integer identifier indicating a trait is measured multiple -times on the same entity, with different methods used for each entry. -This field is only used if a single trait is measured using multiple -methods within the same dataset. The identifier links to specific -information in the context table. +A unique integer identifier indicating a trait is measured multiple times on the same entity, with different methods used for each entry. This field is only used if a single trait is measured using multiple methods within the same dataset. The identifier links to specific information in the context table.
    @@ -456,8 +387,7 @@
    Entity type -Value represents a summary statistic from multiple individuals at a -single location. +Value represents a summary statistic from multiple individuals at a single location.
    @@ -465,8 +395,7 @@
    Entity type -Value represents a summary statistic from individuals of the taxon -across multiple locations. +Value represents a summary statistic from individuals of the taxon across multiple locations.
    @@ -474,12 +403,7 @@
    Entity type -Value represents a summary statistic for a species or infraspecific -taxon across its range or as estimated by an expert based on their -knowledge of the taxon. Data fitting this category include estimates -from reference books that represent a taxon’s entire range and values -for categorical variables obtained from a reference book or identified -by an expert. +Value represents a summary statistic for a species or infraspecific taxon across its range or as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from reference books that represent a taxon’s entire range and values for categorical variables obtained from a reference book or identified by an expert.
    @@ -487,8 +411,7 @@
    Entity type -Value represents a summary statistic or expert score for an entire -genus. +Value represents a summary statistic or expert score for an entire genus.
    @@ -496,8 +419,7 @@
    Entity type -Value represents a summary statistic or expert score for an entire -family. +Value represents a summary statistic or expert score for an entire family.
    @@ -505,8 +427,7 @@
    Entity type -Value represents a summary statistic or expert score for an entire -order. +Value represents a summary statistic or expert score for an entire order.
    @@ -515,84 +436,29 @@
    Entity type
    Identifiers
    -

    The traits table includes 12 identifiers, dataset_id, -observation_id, taxon_name, -population_id, individual_id, -temporal_id, source_id, -location_id, entity_context_id, -plot_id, treatment_id, and -method_id.

    -

    dataset_id, source_id and -taxon_name have easy-to-interpret values. The others are -simply integral identifiers that link groups of measurements and are -automatically generated through the AusTraits workflow -(individual_id can be assigned in the metadata file or -automatically generated.)

    +

    The traits table includes 12 identifiers, dataset_id, observation_id, taxon_name, population_id, individual_id, temporal_id, source_id, location_id, entity_context_id, plot_id, treatment_id, and method_id.

    +

    dataset_id, source_id and taxon_name have easy-to-interpret values. The others are simply integral identifiers that link groups of measurements and are automatically generated through the AusTraits workflow (individual_id can be assigned in the metadata file or automatically generated.)

    To expand on the definitions provided above,

• observation_id links measurements made on the same entity (individual, population, or species) at a single point in time.

• population_id indicates entities that share a common location_id, plot_id, and treatment_id. It is used to align measurements and observation_id’s for individuals versus populations (i.e. distinct entity_types) that share a common population_id. It is numbered sequentially within a dataset.

• individual_id indicates a unique organism. It is numbered sequentially within a dataset by population. Multiple observations on the same organism across time (with distinct observation_id values) share a common individual_id.

• temporal_id indicates a distinct point in time and is used only if there are repeat measurements on a population or individual across time. The identifier links to context properties (& their associated information) in the contexts table for context properties of type temporal.

• source_id is applied if not all data within a single dataset (dataset_id) is from the same source, such as when a dataset represents a compilation for a meta-analysis.

• location_id links to a distinct location_name and associated location_properties in the location table.

• entity_context_id links to information in the contexts table for context properties (& associated values/descriptions) with category entity_context. Entity contexts include organism sex, organism caste and any other features of an entity that need to be documented.

• plot_id links to information in the contexts table for context properties (& associated values/descriptions) with category plot. Plot contexts include both blocks/plots within an experimental design as well as any stratified variation within a location that needs to be documented (e.g. slope position).

• treatment_id links to information in the contexts table for context properties (& associated values/descriptions) with category treatment. Treatment contexts are experimental manipulations applied to groups of individuals.

• method_id links to information in the contexts table for context properties (& associated values/descriptions) with category method. A method context indicates that the same trait was measured on or across individuals using different methods.

    -

    As well, measurement_remarks is used to document brief -comments or notes accompanying the trait measurement.

    +

    As well, measurement_remarks is used to document brief comments or notes accompanying the trait measurement.
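These identifiers can be used to join the component tables back together; a minimal sketch (assuming the compiled austraits object and the dplyr package):

library(dplyr)

# Attach the location properties recorded for each trait measurement
# (each location property becomes its own row in the joined table)
traits_with_locations <-
  austraits$traits %>%
  left_join(austraits$locations, by = c("dataset_id", "location_id"))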

    Life stage, basis of record
• life_stage is a field to indicate the life stage or age class of the entity measured. Standard values are adult, sapling, seedling and juvenile.

• basis_of_record is a categorical variable specifying from which kind of specimen traits were recorded. Possible values are:

    @@ -617,8 +483,7 @@
    Life stage, basis of record
    @@ -626,8 +491,7 @@
    Life stage, basis of record
    @@ -635,8 +499,7 @@
    Life stage, basis of record
    @@ -644,8 +507,7 @@
    Life stage, basis of record
    @@ -653,8 +515,7 @@
    Life stage, basis of record
    @@ -663,25 +524,10 @@
    Life stage, basis of record
    Values, Value types, basis of value
    -

    Each record in the table of trait data has an associated -value, value_type, and -basis_of_value.

    -

    Value type: A trait’s value_type is either -numeric or categorical. - For traits with -numerical values, the recorded value has been converted into -standardised units and the AusTraits workflow has confirmed the value -can be converted into a number and lies within the allowable range. - -For categorical variables, records have been aligned through -substitutions to values listed as allowable values (terms) in a trait’s -definition. * we use _ for multi-word terms, -e.g. semi_deciduous
    -* we use a space for situations where two values co-occur for the same -entity. For instance, a flora might indicate that a plant species can be -either annual or biennial, in which case the trait is scored as -annual biennial.

    -

    Each trait measurement has an associated value_type, -which is a categorical variable describing the statistical nature of the -trait value recorded. Possible values are:

    +

    Each record in the table of trait data has an associated value, value_type, and basis_of_value.

    +

Value type: A trait’s value_type is either numeric or categorical.

• For traits with numerical values, the recorded value has been converted into standardised units and the AusTraits workflow has confirmed the value can be converted into a number and lies within the allowable range.

• For categorical variables, records have been aligned through substitutions to values listed as allowable values (terms) in a trait’s definition:
  • we use _ for multi-word terms, e.g. semi_deciduous
  • we use a space for situations where two values co-occur for the same entity. For instance, a flora might indicate that a plant species can be either annual or biennial, in which case the trait is scored as annual biennial.
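When analysing such space-separated categorical values, they can be split into one value per row; a minimal sketch (assuming the compiled austraits object and the tidyr package; the trait name shown is illustrative):

library(dplyr)
library(tidyr)

# Split space-separated categorical values (e.g. "annual biennial")
# into one value per row; "life_history" is an illustrative trait name
austraits$traits %>%
  filter(trait_name == "life_history") %>%
  separate_rows(value, sep = " ")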

    +

    Each trait measurement has an associated value_type, which is a categorical variable describing the statistical nature of the trait value recorded. Possible values are:

    -Traits were recorded on entities living under experimentally manipulated -conditions in the field. +Traits were recorded on entities living under experimentally manipulated conditions in the field.
    -Traits were recorded on entities living in a common garden, arboretum, -or botanical or zoological garden. +Traits were recorded on entities living in a common garden, arboretum, or botanical or zoological garden.
    -Traits were recorded on entities growing in a lab, glasshouse or growth -chamber. +Traits were recorded on entities growing in a lab, glasshouse or growth chamber.
    -Traits were recorded from specimens preserved in a collection, eg. -herbarium or museum +Traits were recorded from specimens preserved in a collection, eg. herbarium or museum
    -Traits were sourced from values reported in the literature, and where -the basis of record is not otherwise known. +Traits were sourced from values reported in the literature, and where the basis of record is not otherwise known.
    @@ -767,9 +612,7 @@
    Values, Value types, basis of value
    @@ -737,8 +583,7 @@
    Values, Value types, basis of value
    -Value is the mode of values recorded for an entity. This is the -appropriate value type for a categorical trait value. +Value is the mode of values recorded for an entity. This is the appropriate value type for a categorical trait value.
    -

    Each trait measurement also has an associated -basis_of_value, which is a categorical variable describing -how the trait value was obtained. Possible values are:

    +

    Each trait measurement also has an associated basis_of_value, which is a categorical variable describing how the trait value was obtained. Possible values are:

    @@ -815,21 +657,13 @@
    Values, Value types, basis of value
    @@ -793,8 +636,7 @@
    Values, Value types, basis of value
    -Value has been estimated by an expert based on their knowledge of the -entity. +Value has been estimated by an expert based on their knowledge of the entity.
    -

    AusTraits does not include intra-individual observations made at a -single point in time. When multiple measurements per individual are -submitted to AusTraits, we take the mean of the values and record the -value_type as mean and indicate under replicates the number -of measurements made.

    +

    AusTraits does not include intra-individual observations made at a single point in time. When multiple measurements per individual are submitted to AusTraits, we take the mean of the values and record the value_type as mean and indicate under replicates the number of measurements made.

    locations

    -

    Description: A table containing observations of -location/site characteristics associated with information in -traits. Cross referencing between the two dataframes is -possible using combinations of the variables dataset_id, -location_name.

    +

    Description: A table containing observations of location/site characteristics associated with information in traits. Cross referencing between the two dataframes is possible using combinations of the variables dataset_id, location_name.

    Content:

    @@ -846,10 +680,7 @@

    locations -Primary identifier for each study contributed to AusTraits; most often -these are scientific papers, books, or online resources. By default this -should be the name of the first author and year of publication, -e.g. Falster_2005. +Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default this should be the name of the first author and year of publication, e.g. Falster_2005.

    @@ -857,9 +688,7 @@

    locations -A unique integer identifier for a location, with locations numbered -sequentially within a dataset. The identifier links to specific -information in the location table. +A unique integer identifier for a location, with locations numbered sequentially within a dataset. The identifier links to specific information in the location table.

    @@ -875,11 +704,7 @@

    locations -The location characteristic being recorded. The name should include -units of measurement, e.g. MAT (C). Ideally we have at -least the following variables for each location, -longitude (deg), latitude (deg), -description. +The location characteristic being recorded. The name should include units of measurement, e.g. MAT (C). Ideally we have at least the following variables for each location, longitude (deg), latitude (deg), description.

    @@ -896,11 +721,7 @@

    locations

    contexts

    -

    Description: A table containing observations of -contextual characteristics associated with information in -traits. Cross referencing between the two dataframes is -possible using combinations of the variables dataset_id, -link_id, and link_vals.

    +

    Description: A table containing observations of contextual characteristics associated with information in traits. Cross referencing between the two dataframes is possible using combinations of the variables dataset_id, link_id, and link_vals.
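
A sketch of this cross referencing for a single context category (assumptions: link_id takes values such as "temporal_id" and link_vals holds one identifier per row; real datasets may need extra handling, such as separating multiple values):

library(dplyr)

# keep contexts that link via the temporal_id column of the traits table
temporal_contexts <- austraits$contexts %>%
  filter(link_id == "temporal_id") %>%
  mutate(temporal_id = as.character(link_vals))

austraits$traits %>%
  mutate(temporal_id = as.character(temporal_id)) %>%
  left_join(temporal_contexts, by = c("dataset_id", "temporal_id"))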

    Content:

    @@ -917,10 +738,7 @@

    contexts

    @@ -928,8 +746,7 @@

    contexts

    @@ -937,9 +754,7 @@

    contexts

    @@ -963,8 +778,7 @@

    contexts

    @@ -972,9 +786,7 @@

    contexts

    @@ -983,11 +795,7 @@

    contexts

    methods

    -

    Description: A table containing details on methods -with which data were collected, including time frame and source. Cross -referencing with the traits table is possible using -combinations of the variables dataset_id, -trait_name.

    +

    Description: A table containing details on methods with which data were collected, including time frame and source. Cross referencing with the traits table is possible using combinations of the variables dataset_id, trait_name.
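
For example, methods can be attached to the trait records using those two variables (a minimal sketch):

library(dplyr)

austraits$traits %>%
  left_join(austraits$methods, by = c("dataset_id", "trait_name"))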

    Content:

    -Primary identifier for each study contributed to AusTraits; most often -these are scientific papers, books, or online resources. By default this -should be the name of the first author and year of publication, -e.g. Falster_2005. +Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default this should be the name of the first author and year of publication, e.g. Falster_2005.
    -The contextual characteristic being recorded. If applicable, name should -include units of measurement, e.g. CO2 concentration (ppm). +The contextual characteristic being recorded. If applicable, name should include units of measurement, e.g. CO2 concentration (ppm).
    -The category of context property, with options being plot, -treatment, individual_context, -temporal and method. +The category of context property, with options being plot, treatment, individual_context, temporal and method.
    -Variable indicating which identifier column in the traits table contains -the specified link_vals. +Variable indicating which identifier column in the traits table contains the specified link_vals.
    -Unique integer identifiers that link between identifier columns in the -traits table and the contextual properties/values in the -contexts table. +Unique integer identifiers that link between identifier columns in the traits table and the contextual properties/values in the contexts table.
    @@ -1004,10 +812,7 @@

    methods

    @@ -1015,8 +820,7 @@

    methods

    @@ -1024,11 +828,7 @@

    methods

    @@ -1044,11 +844,7 @@

    methods

    @@ -1056,8 +852,7 @@

    methods

    @@ -1065,8 +860,7 @@

    methods

    @@ -1074,8 +868,7 @@

    methods

    @@ -1083,8 +876,7 @@

    methods

    @@ -1092,8 +884,7 @@

    methods

    @@ -1101,8 +892,7 @@

    methods

    @@ -1118,8 +908,7 @@

    methods

    @@ -1127,8 +916,7 @@

    methods

    @@ -1137,13 +925,7 @@

    methods

    exluded_data

    -

    Description: A table of data that did not pass -quality tests and so were excluded from the master dataset. The -structure is identical to that presented in the traits -table, only with an extra column called error indicating -why the record was excluded. Common reasons are -missing_unit_conversions, missing_value, and -unsupported_trait_value.

    +

    Description: A table of data that did not pass quality tests and so were excluded from the master dataset. The structure is identical to that presented in the traits table, only with an extra column called error indicating why the record was excluded. Common reasons are missing_unit_conversions, missing_value, and unsupported_trait_value.
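
A quick way to summarise why records were excluded (a sketch, assuming the element is stored as austraits$excluded_data):

library(dplyr)

austraits$excluded_data %>%
  count(dataset_id, error, sort = TRUE)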

    Content:

    -Primary identifier for each study contributed to AusTraits; most often -these are scientific papers, books, or online resources. By default this -should be the name of the first author and year of publication, -e.g. Falster_2005. +Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default this should be the name of the first author and year of publication, e.g. Falster_2005.
    -Name of the trait sampled. Allowable values specified in the table -definitions. +Name of the trait sampled. Allowable values specified in the table definitions.
    -A textual description of the methods used to collect the trait data. -Whenever available, methods are taken near-verbatim from the referenced -source. Methods can include descriptions such as ‘measured on botanical -collections’, ‘data from the literature’, or a detailed description of -the field or lab methods used to collect the data. +A textual description of the methods used to collect the trait data. Whenever available, methods are taken near-verbatim from the referenced source. Methods can include descriptions such as ‘measured on botanical collections’, ‘data from the literature’, or a detailed description of the field or lab methods used to collect the data.
    -A written description of how study locations were selected and how study -individuals were selected. When available, this information is lifted -verbatim from a published manuscript. For preserved specimens, this -field ideally indicates which records were ‘sampled’ to measure a -specific trait. +A written description of how study locations were selected and how study individuals were selected. When available, this information is lifted verbatim from a published manuscript. For preserved specimens, this field ideally indicates which records were ‘sampled’ to measure a specific trait.
    -Citation key for the primary source in sources. The key is -typically formatted as Surname_year. +Citation key for the primary source in sources. The key is typically formatted as Surname_year.
    -Citation for the primary source. This detail is generated from the -primary source in the metadata. +Citation for the primary source. This detail is generated from the primary source in the metadata.
    -Citation key for the secondary source in sources. The key -is typically formatted as Surname_year. +Citation key for the secondary source in sources. The key is typically formatted as Surname_year.
    -Citations for the secondary source. This detail is generated from the -secondary source in the metadata. +Citations for the secondary source. This detail is generated from the secondary source in the metadata.
    -Citation key for the original dataset_id in sources; for compilations. -The key is typically formatted as Surname_year. +Citation key for the original dataset_id in sources; for compilations. The key is typically formatted as Surname_year.
-Citations for the original dataset_id in sources; for compilationse. -This detail is generated from the original source in the metadata. +Citations for the original dataset_id in sources; for compilations. This detail is generated from the original source in the metadata.
    -Names of additional people who played a more minor role in data -collection for the study. +Names of additional people who played a more minor role in data collection for the study.
    -Names of AusTraits team member(s) who contacted the data collectors and -added the study to the AusTraits repository. +Names of AusTraits team member(s) who contacted the data collectors and added the study to the AusTraits repository.
    @@ -1160,8 +942,7 @@

    exluded_data -Indicating why the record was excluded. Common reasons are -missing_unit_conversions, missing_value, and unsupported_trait_value. +Indicating why the record was excluded. Common reasons are missing_unit_conversions, missing_value, and unsupported_trait_value.

    @@ -1169,10 +950,7 @@

    exluded_data -Primary identifier for each study contributed to AusTraits; most often -these are scientific papers, books, or online resources. By default this -should be the name of the first author and year of publication, -e.g. Falster_2005. +Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default this should be the name of the first author and year of publication, e.g. Falster_2005.

    @@ -1180,10 +958,7 @@

    exluded_data -Scientific name of the taxon on which traits were sampled, without -authorship. When possible, this is the currently accepted (botanical) or -valid (zoological) scientific name, but might also be a higher taxonomic -level. +Scientific name of the taxon on which traits were sampled, without authorship. When possible, this is the currently accepted (botanical) or valid (zoological) scientific name, but might also be a higher taxonomic level.

    @@ -1191,13 +966,7 @@

exluded_data -A unique integral identifier for the observation, where an observation -is all measurements made on an individual at a single point in time. It -is important for joining traits coming from the same -observation_id. Within each dataset, observation_id’s are -unique combinations of taxon_name, -population_id, individual_id, and -temporal_id. +A unique integer identifier for the observation, where an observation is all measurements made on an individual at a single point in time. It is important for joining traits coming from the same observation_id. Within each dataset, observation_id’s are unique combinations of taxon_name, population_id, individual_id, and temporal_id.

    @@ -1205,8 +974,7 @@

    exluded_data -Name of the trait sampled. Allowable values specified in the table -definitions. +Name of the trait sampled. Allowable values specified in the table definitions.

    @@ -1222,8 +990,7 @@

    exluded_data -Units of the sampled trait value after aligning with AusTraits -standards. +Units of the sampled trait value after aligning with AusTraits standards.

    @@ -1231,8 +998,7 @@

    exluded_data -A categorical variable specifying the entity corresponding to the trait -values recorded. +A categorical variable specifying the entity corresponding to the trait values recorded.

    @@ -1240,8 +1006,7 @@

    exluded_data -A categorical variable describing the statistical nature of the trait -value recorded. +A categorical variable describing the statistical nature of the trait value recorded.

    @@ -1257,14 +1022,7 @@

exluded_data -Number of replicate measurements that comprise a recorded trait -measurement. A numeric value (or range) is ideal and appropriate if the -value type is a mean, median, min -or max. For these value types, if replication is unknown -the entry should be unknown. If the value type is -raw_value the replicate value should be 1. If the trait is -categorical or the value indicates a measurement for an entire species -(or other taxon) replicate value should be .na. +Number of replicate measurements that comprise a recorded trait measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the trait is categorical or the value indicates a measurement for an entire species (or other taxon), the replicate value should be .na.

    @@ -1272,8 +1030,7 @@

    exluded_data -A categorical variable specifying from which kind of specimen traits -were recorded. +A categorical variable specifying from which kind of specimen traits were recorded.

    @@ -1281,9 +1038,7 @@

    exluded_data -A field to indicate the life stage or age class of the entity measured. -Standard values are adult, sapling, -seedling and juvenile. +A field to indicate the life stage or age class of the entity measured. Standard values are adult, sapling, seedling and juvenile.

    @@ -1291,10 +1046,7 @@

exluded_data -A unique integer identifier for a population, where a population is -defined as individuals growing in the same location (location_id -/location_name) and plot (plot_id, a context category) and being -subjected to the same treatment (treatment_id, a context category). +A unique integer identifier for a population, where a population is defined as individuals growing in the same location (location_id/location_name) and plot (plot_id, a context category) and being subjected to the same treatment (treatment_id, a context category).

    @@ -1302,12 +1054,7 @@

    exluded_data -A unique integer identifier for an individual, with individuals numbered -sequentially within each dataset by taxon by population grouping. Most -often each row of data represents an individual, but in some datasets -trait data collected on a single individual is presented across multiple -rows of data, such as if the same trait is measured using different -methods or the same individual is measured repeatedly across time. +A unique integer identifier for an individual, with individuals numbered sequentially within each dataset by taxon by population grouping. Most often each row of data represents an individual, but in some datasets trait data collected on a single individual is presented across multiple rows of data, such as if the same trait is measured using different methods or the same individual is measured repeatedly across time.

    @@ -1315,9 +1062,7 @@

    exluded_data -A unique integer identifier assigned where repeat observations are made -on the same individual (or population, or taxon) across time. The -identifier links to specific information in the context table. +A unique integer identifier assigned where repeat observations are made on the same individual (or population, or taxon) across time. The identifier links to specific information in the context table.

    @@ -1325,8 +1070,7 @@

    exluded_data -For datasets that are compilations, an identifier for the original data -source. +For datasets that are compilations, an identifier for the original data source.

    @@ -1334,9 +1078,7 @@

    exluded_data -A unique integer identifier for a location, with locations numbered -sequentially within a dataset. The identifier links to specific -information in the location table. +A unique integer identifier for a location, with locations numbered sequentially within a dataset. The identifier links to specific information in the location table.

    @@ -1344,9 +1086,7 @@

    exluded_data -A unique integer identifier indicating specific contextual properties of -an individual, possibly including the individual’s sex or caste (for -social insects). +A unique integer identifier indicating specific contextual properties of an individual, possibly including the individual’s sex or caste (for social insects).

    @@ -1354,10 +1094,7 @@

    exluded_data -A unique integer identifier for a plot, where a plot is a distinct -collection of organisms within a single geographic location, such as -plants growing on different aspects or blocks in an experiment. The -identifier links to specific information in the context table. +A unique integer identifier for a plot, where a plot is a distinct collection of organisms within a single geographic location, such as plants growing on different aspects or blocks in an experiment. The identifier links to specific information in the context table.

    @@ -1365,9 +1102,7 @@

    exluded_data -A unique integer identifier for a treatment, where a treatment is any -experimental manipulation to an organism’s growing/living conditions. -The identifier links to specific information in the context table. +A unique integer identifier for a treatment, where a treatment is any experimental manipulation to an organism’s growing/living conditions. The identifier links to specific information in the context table.

    @@ -1375,11 +1110,7 @@

exluded_data -Date sample was taken, in the format yyyy-mm-dd, -yyyy-mm or yyyy, depending on the resoluton -specified. Alternatively an overall range for the study can be -indicating, with the starting and ending sample date sepatated by a -/, as in 2010-10/2011-03 +Date sample was taken, in the format yyyy-mm-dd, yyyy-mm or yyyy, depending on the resolution specified. Alternatively an overall range for the study can be indicated, with the starting and ending sample dates separated by a /, as in 2010-10/2011-03.

    @@ -1395,11 +1126,7 @@

    exluded_data -A unique integer identifier indicating a trait is measured multiple -times on the same entity, with different methods used for each entry. -This field is only used if a single trait is measured using multiple -methods within the same dataset. The identifier links to specific -information in the context table. +A unique integer identifier indicating a trait is measured multiple times on the same entity, with different methods used for each entry. This field is only used if a single trait is measured using multiple methods within the same dataset. The identifier links to specific information in the context table.

    @@ -1416,15 +1143,8 @@

    exluded_data

    taxa

    -

    Description: A table containing details on taxa that -are included in the table traits. We -have attempted to align species names with known taxonomic units in the -Australian Plant Census -(APC) and/or the Australian Plant Names Index -(APNI); the sourced information is released under a CC-BY3 -license.

    -

    Version 0.9.0 of AusTraits contains records for 1245 different -taxa.

    +

    Description: A table containing details on taxa that are included in the table traits. We have attempted to align species names with known taxonomic units in the Australian Plant Census (APC) and/or the Australian Plant Names Index (APNI); the sourced information is released under a CC-BY3 license.

    +

    Version 0.9.0 of AusTraits contains records for 1255 different taxa.
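
Taxonomic details can be attached to trait records by taxon name, for example (illustrative only):

library(dplyr)

austraits$traits %>%
  left_join(austraits$taxa, by = "taxon_name")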

    Content:

    @@ -1441,10 +1161,7 @@

    taxa taxon_name

    @@ -1452,8 +1169,7 @@

    taxa taxonomic_reference

    @@ -1469,9 +1185,7 @@

    taxa trinomial

    @@ -1479,9 +1193,7 @@

    taxa binomial

    @@ -1513,9 +1225,7 @@

    taxa establishment_means

    @@ -1523,10 +1233,7 @@

    taxa taxonomic_status

    @@ -1542,8 +1249,7 @@

    taxa scientific_name_authorship

    @@ -1551,9 +1257,7 @@

    taxa taxon_id

    @@ -1561,9 +1265,7 @@

    taxa scientific_name_id

    @@ -1572,13 +1274,7 @@

    taxa

    taxonomic_updates

    -

    Description: A table of all taxonomic changes -implemented in the construction of AusTraits. Changes are determined by -comparing the originally submitted taxon name against the taxonomic -names listed in the taxonomic reference files, best placed in a -subfolder in the config folder . Cross referencing with the -traits table is possible using combinations of the -variables dataset_id and taxon_name.

    +

Description: A table of all taxonomic changes implemented in the construction of AusTraits. Changes are determined by comparing the originally submitted taxon name against the taxonomic names listed in the taxonomic reference files, best placed in a subfolder in the config folder. Cross referencing with the traits table is possible using combinations of the variables dataset_id and taxon_name.

    Content:

    -Scientific name of the taxon on which traits were sampled, without -authorship. When possible, this is the currently accepted (botanical) or -valid (zoological) scientific name, but might also be a higher taxonomic -level. +Scientific name of the taxon on which traits were sampled, without authorship. When possible, this is the currently accepted (botanical) or valid (zoological) scientific name, but might also be a higher taxonomic level.
-Name of the taxonomy (tree) that contains this concept. ie. APC, AusMoss -etc. +Name of the taxonomy (tree) that contains this concept, i.e. APC, AusMoss, etc.
-The infraspecific taxon name match for an original name. This column is -assigned na for taxon name that are at a broader -taxonomic_resolution. +The infraspecific taxon name match for an original name. This column is assigned na for taxon names that are at a broader taxonomic_resolution.
-The species-level taxon name match for an original name. This column is -assigned na for taxon name that are at a broader -taxonomic_resolution. +The species-level taxon name match for an original name. This column is assigned na for taxon names that are at a broader taxonomic_resolution.
    -Statement about whether an organism or organisms have been introduced to -a given place and time through the direct or indirect activity of modern -humans. +Statement about whether an organism or organisms have been introduced to a given place and time through the direct or indirect activity of modern humans.
    -The status of the use of the scientificName as a label for the taxon in -regard to the ‘accepted (or valid) taxonomy’. The assigned taxonomic -status must be linked to a specific taxonomic reference that defines the -concept. +The status of the use of the scientificName as a label for the taxon in regard to the ‘accepted (or valid) taxonomy’. The assigned taxonomic status must be linked to a specific taxonomic reference that defines the concept.
-The authorship information for the scientific name formatted according -to the conventions of the applicable. +The authorship information for the scientific name, formatted according to the conventions of the applicable nomenclatural code.
    -An identifier for the set of taxon information (data associated with the -taxon class). May be a global unique identifier or an identifier -specific to the data set. Must be resolvable within this dataset. +An identifier for the set of taxon information (data associated with the taxon class). May be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.
    -An identifier for the set of taxon information (data associated with the -taxon class). May be a global unique identifier or an identifier -specific to the data set. Must be resolvable within this dataset. +An identifier for the set of taxon information (data associated with the taxon class). May be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.
    @@ -1595,10 +1291,7 @@

    taxonomic_updates -Primary identifier for each study contributed to AusTraits; most often -these are scientific papers, books, or online resources. By default this -should be the name of the first author and year of publication, -e.g. Falster_2005. +Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default this should be the name of the first author and year of publication, e.g. Falster_2005.

    @@ -1614,12 +1307,7 @@

    taxonomic_updates -The taxon name without authorship after implementing automated syntax -standardisation and spelling changes as well as manually encoded syntax -alignments for this taxon in the metadata file for the corresponding -dataset_id. This name has not yet been matched to the -currently accepted (botanical) or valid (zoological) taxon name in cases -where there are taxonomic synonyms, isonyms, orthographic variants, etc. +The taxon name without authorship after implementing automated syntax standardisation and spelling changes as well as manually encoded syntax alignments for this taxon in the metadata file for the corresponding dataset_id. This name has not yet been matched to the currently accepted (botanical) or valid (zoological) taxon name in cases where there are taxonomic synonyms, isonyms, orthographic variants, etc.

    @@ -1627,8 +1315,7 @@

taxonomic_updates -The rank of the most specific taxon name (or scientific name) to which a -submitted orignal name resolves. +The rank of the most specific taxon name (or scientific name) to which a submitted original name resolves.

    @@ -1636,10 +1323,7 @@

    taxonomic_updates -An identifier for the cleaned name before it is updated to the currently -accepted name usage. This may be a global unique identifier or an -identifier specific to the data set. Must be resolvable within this -dataset. +An identifier for the cleaned name before it is updated to the currently accepted name usage. This may be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.

    @@ -1647,12 +1331,7 @@

taxonomic_updates -The status of the use of the cleaned_name as a label for a -taxon. Requires taxonomic opinion to define the scope of a taxon. Rules -of priority then are used to define the taxonomic status of the -nomenclature contained in that scope, combined with the experts opinion. -It must be linked to a specific taxonomic reference that defines the -concept. +The status of the use of the cleaned_name as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority are then used to define the taxonomic status of the nomenclature contained in that scope, combined with the expert's opinion. It must be linked to a specific taxonomic reference that defines the concept.

    @@ -1660,9 +1339,7 @@

    taxonomic_updates -The taxonomic status of alternative taxonomic records with -cleaned_name as the accepted (botanical) or valid -(zoological) taxon name. +The taxonomic status of alternative taxonomic records with cleaned_name as the accepted (botanical) or valid (zoological) taxon name.

    @@ -1670,9 +1347,7 @@

    taxonomic_updates -An identifier for the set of taxon information (data associated with the -taxon class). May be a global unique identifier or an identifier -specific to the data set. Must be resolvable within this dataset. +An identifier for the set of taxon information (data associated with the taxon class). May be a global unique identifier or an identifier specific to the data set. Must be resolvable within this dataset.

    @@ -1680,10 +1355,7 @@

    taxonomic_updates -Scientific name of the taxon on which traits were sampled, without -authorship. When possible, this is the currently accepted (botanical) or -valid (zoological) scientific name, but might also be a higher taxonomic -level. +Scientific name of the taxon on which traits were sampled, without authorship. When possible, this is the currently accepted (botanical) or valid (zoological) scientific name, but might also be a higher taxonomic level.

    @@ -1693,14 +1365,8 @@

    taxonomic_updates

    definitions

    -

    Description: A copy of the definitions for all -tables and terms. Information included here was used to process data and -generate any documentation for the study.

    -

    Details on trait definitions: The allowable trait -names and trait values are defined in the definitions file. Each trait -is labelled as either numeric or categorical. -An example of each type is as follows. For the full list, see the Trait -definitions vignette.

    +

    Description: A copy of the definitions for all tables and terms. Information included here was used to process data and generate any documentation for the study.

    +

    Details on trait definitions: The allowable trait names and trait values are defined in the definitions file. Each trait is labelled as either numeric or categorical. An example of each type is as follows. For the full list, see the Trait definitions vignette.

    leaf_mass_per_area

    • number of records: 0
    • @@ -1715,8 +1381,7 @@

      definitions

      contributors

      -

      Description: A table of people contributing to each -study.

      +

      Description: A table of people contributing to each study.

      Content:

    @@ -1733,10 +1398,7 @@

    contributors -Primary identifier for each study contributed to AusTraits; most often -these are scientific papers, books, or online resources. By default this -should be the name of the first author and year of publication, -e.g. Falster_2005. +Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default this should be the name of the first author and year of publication, e.g. Falster_2005.

    @@ -1785,29 +1447,23 @@

    contributors

    sources

    -

    For each dataset in the compilation there is the option to list -primary and secondary citations. The primary citation is defined as, -The secondary citation is defined as,

    -

    The element sources includes bibtex versions of all -sources which can be imported into your reference library:

    -
    RefManageR::WriteBib(austraits$sources, "refs.bib") # write all sources to file
    -RefManageR::WriteBib(austraits$sources["Falster_2005_1"], "refs.bib") # write a single reference to a file
    +

    For each dataset in the compilation there is the option to list primary and secondary citations. The primary citation is defined as, The secondary citation is defined as,

    +

    The element sources includes bibtex versions of all sources which can be imported into your reference library:

    +
    RefManageR::WriteBib(austraits$sources, "refs.bib") # write all sources to file
    +RefManageR::WriteBib(austraits$sources["Falster_2005_1"], "refs.bib") # write a single reference to a file

    Or individually viewed:

    -
    austraits$sources["Falster_2005_1"]
    +
    austraits$sources["Falster_2005_1"]

    A formatted version of the sources also exists within the table methods.
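
Entries can also be printed as formatted references or converted to a data frame for inspection, for example (a sketch using standard RefManageR functionality):

RefManageR::PrintBibliography(austraits$sources["Falster_2005_1"])
as.data.frame(austraits$sources["Falster_2005_1"])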

    metadata

    -

    Description: Metadata associated with the dataset, -including title, creators, license, subject, funding sources.

    +

    Description: Metadata associated with the dataset, including title, creators, license, subject, funding sources.

    build_info

    -

    Description: A description of the computing -environment used to create this version of the dataset, including -version number, git commit and R session_info.

    +

    Description: A description of the computing environment used to create this version of the dataset, including version number, git commit and R session_info.

    diff --git a/docs/articles/austraits_file_structure.html b/docs/articles/austraits_file_structure.html index 11ed045f8..d248ffff8 100644 --- a/docs/articles/austraits_file_structure.html +++ b/docs/articles/austraits_file_structure.html @@ -80,7 +80,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -127,9 +127,7 @@

    2023-02-09

    File structure

    -

    The main directory for the austraits.build -repository contains the following files and folders, with purpose as -indicated.

    +

    The main directory for the austraits.build repository contains the following files and folders, with purpose as indicated.

    R project file
    @@ -174,26 +172,13 @@

    Details on files used for da

    Configuration

    -

    The folder config contains three files which govern the -building of the dataset.

    +

    The folder config contains three files which govern the building of the dataset.

    config
     ├── traits.yml
     ├── taxon_list.csv
     └── unit_conversions.csv
    -

    The file traits.yml provides the trait definitions used -to compile AusTraits, including allowable trait values. The trait -definitions are fully described in an additional vignette. A -.yml file is a structured data file where information is -presented in a hierarchical format (see appendix for -details).

    -

    The file taxon_list.csv is our master list of known -taxa. Each species is listed once, with links to species’ identifiers -provided by the Australian Plant Name Index (APNI). The file -taxon_list.csv is added to if a study includes taxa not -previously represented in AusTraits. These can be names included in -either the APC/APNI, compilations of taxonomic concepts (APC) or names -(APNI) for plants that are either native to or naturalised in Australia, -or taxa without recognised names.

    +

    The file traits.yml provides the trait definitions used to compile AusTraits, including allowable trait values. The trait definitions are fully described in an additional vignette. A .yml file is a structured data file where information is presented in a hierarchical format (see appendix for details).

    +

    The file taxon_list.csv is our master list of known taxa. Each species is listed once, with links to species’ identifiers provided by the Australian Plant Name Index (APNI). The file taxon_list.csv is added to if a study includes taxa not previously represented in AusTraits. These can be names included in either the APC/APNI, compilations of taxonomic concepts (APC) or names (APNI) for plants that are either native to or naturalised in Australia, or taxa without recognised names.
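
These configuration files can be inspected directly from the repository root, for example (a sketch; it assumes the yaml and readr packages are installed, and column specifications may need adjusting):

trait_definitions <- yaml::read_yaml("config/traits.yml")
taxon_list        <- readr::read_csv("config/taxon_list.csv")
unit_conversions  <- readr::read_csv("config/unit_conversions.csv")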

    @@ -418,9 +403,7 @@

    Configuration

    @@ -550,24 +533,15 @@

    Configuration

    Data

    -

    The folder data contains the raw data from individual -studies included in AusTraits.

    -

    Records within the data folder are organised as coming -from a particular study, defined by the dataset_id. Data -from each study are organised into a separate folder, with two -files:

    +

    The folder data contains the raw data from individual studies included in AusTraits.

    +

    Records within the data folder are organised as coming from a particular study, defined by the dataset_id. Data from each study are organised into a separate folder, with two files:

    • -data.csv: a table containing the actual trait -data.
    • +data.csv: a table containing the actual trait data.
    • -metadata.yml: a file that contains study metadata -(source, methods, locations, and context), maps trait names and units -onto standard types, and lists any substitutions applied to the data in -processing.
    • +metadata.yml: a file that contains study metadata (source, methods, locations, and context), maps trait names and units onto standard types, and lists any substitutions applied to the data in processing.
    -

    The folder data thus contains a long list of folders, -one for each study and each containing two files:

    +

    The folder data thus contains a long list of folders, one for each study and each containing two files:

    data
     ├── Angevin_2010
     │   ├── data.csv
    @@ -580,89 +554,36 @@ 

    Data │ └── metadata.yml ├── ....

    -

    where Angevin_2010, Barlow_1981, & -Bean_1997 are each a unique dataset_id in the -final dataset.

    +

    where Angevin_2010, Barlow_1981, & Bean_1997 are each a unique dataset_id in the final dataset.

    Data.csv

    -

    The file data.csv contains raw measurements and can be -in either long or wide format.

    -

    Required columns include the taxon name, the trait name (column in -long format, header in wide format), units (column in long format, part -of header in wide format), location (if applicable), context (if -applicable), date (if available), and trait values.

    -

    It is important that all trait measurements made on the same -individual or that are the mean of a species’ measurements from the same -location are kept linked.

    +

    The file data.csv contains raw measurements and can be in either long or wide format.

    +

    Required columns include the taxon name, the trait name (column in long format, header in wide format), units (column in long format, part of header in wide format), location (if applicable), context (if applicable), date (if available), and trait values.

    +

    It is important that all trait measurements made on the same individual or that are the mean of a species’ measurements from the same location are kept linked.

      -
    • If the data is in wide format, each row should include -measurements made on a single individual at a single point in time or a -single species-by-location mean, with different trait values as -consecutive columns.

    • -
    • If the data is in long format, an additional column, -individual_id, is required to ensure multiple trait -measurements made on the same individual, or the mean of a species’ -measurements from the same location, are linked. If the data is in wide -format and there are multiple rows of data for the same individual, an -individual_id column should be included. These -individual_id columns ensure that related data values -remain linked.

    • +
    • If the data is in wide format, each row should include measurements made on a single individual at a single point in time or a single species-by-location mean, with different trait values as consecutive columns.

    • +
    • If the data is in long format, an additional column, individual_id, is required to ensure multiple trait measurements made on the same individual, or the mean of a species’ measurements from the same location, are linked. If the data is in wide format and there are multiple rows of data for the same individual, an individual_id column should be included. These individual_id columns ensure that related data values remain linked.

    -

    We aim to keep the data file in the rawest form possible (i.e. with -as few changes as possible) but it must be a single csv file. Additional -custom R code may be required to make the file exactly compatible with -the AusTraits format, but these changes should be executed as AusTraits -is compiled and should be in the metadata.yml file under -dataset/custom_R_code (see below). Any files used to create -the submitted data.csv file (e.g. Excel …) should be -archived in a sub-folder within the study folder named -raw.

    +

    We aim to keep the data file in the rawest form possible (i.e. with as few changes as possible) but it must be a single csv file. Additional custom R code may be required to make the file exactly compatible with the AusTraits format, but these changes should be executed as AusTraits is compiled and should be in the metadata.yml file under dataset/custom_R_code (see below). Any files used to create the submitted data.csv file (e.g. Excel …) should be archived in a sub-folder within the study folder named raw.
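
To look at a single study's files from the repository root (illustrative only; Angevin_2010 is used here simply because it appears in the folder listing above):

study    <- "Angevin_2010"
data     <- readr::read_csv(file.path("data", study, "data.csv"))
metadata <- yaml::read_yaml(file.path("data", study, "metadata.yml"))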

    Metadata.yml

    -

    The metadata is compiled in a .yml file, a structured -data file where information is presented in a hierarchical format (see -Appendix for details). There are 10 values at the -top hierarchical level: source, contributors, dataset, locations, -contexts, traits, substitutions, taxonomic_updates, -exclude_observations, questions. These are each described below.

    -

    As a start, you may want to check out some examples from existing -studies in Austraits, e.g. Angevin_2010 -or Wright_2009.

    +

    The metadata is compiled in a .yml file, a structured data file where information is presented in a hierarchical format (see Appendix for details). There are 10 values at the top hierarchical level: source, contributors, dataset, locations, contexts, traits, substitutions, taxonomic_updates, exclude_observations, questions. These are each described below.

    +

As a start, you may want to check out some examples from existing studies in AusTraits, e.g. Angevin_2010 or Wright_2009.

    Source
    -

    This section provides citation details for the original source(s) for -the data, whether it is a published journal article, book, website, or -thesis. In general we aim to reference the primary source. References -are written in structured yml format, under the category -source and then under sub-groupings primary, -secondary, and original. A reference is -designated as secondary if it is a second publication by -the data collector that analyses the data. When the primary -reference is a compilation of multiple sources for a meta-analysis, the -original references are designated as original.

    +

    This section provides citation details for the original source(s) for the data, whether it is a published journal article, book, website, or thesis. In general we aim to reference the primary source. References are written in structured yml format, under the category source and then under sub-groupings primary, secondary, and original. A reference is designated as secondary if it is a second publication by the data collector that analyses the data. When the primary reference is a compilation of multiple sources for a meta-analysis, the original references are designated as original.

    General guidelines for describing a source include:

    • A maximum of one primary source allowed.
• Elements are named as in bibtex format.
    • -
    • Keys should be named in the format Surname_year and the -primary source is almost always identical to the name given to the -dataset folder. A second instance of the identical Surname_year should -have the key Surname_year_2.
    • -
    • One or more secondary source may be included if traits from a single -dataset were presented in two different manuscripts. Multiple sources -are also appropriate if an author has compiled data from a number of -sources, which are not individually in AusTraits, for a published or -unpublished compilation.
    • -
    • If your data is from an unpublished study, only include the elements -that are applicable.
    • -
    • If someone has transcribed a published source, the primary source -will be the published work and the person who has completed the -transcription will be acknowledged as the contributor of -the dataset.
    • +
• Keys should be named in the format Surname_year, and the key for the primary source is almost always identical to the name given to the dataset folder. A second instance of the identical Surname_year should have the key Surname_year_2.
    • +
• One or more secondary sources may be included if traits from a single dataset were presented in two different manuscripts. Multiple sources are also appropriate if an author has compiled data from a number of sources, which are not individually in AusTraits, for a published or unpublished compilation.
    • +
    • If your data is from an unpublished study, only include the elements that are applicable.
    • +
    • If someone has transcribed a published source, the primary source will be the published work and the person who has completed the transcription will be acknowledged as the contributor of the dataset.

    An example of a primary source that is a journal article is:

    source:
    @@ -707,9 +628,7 @@ 
    Source
    Contributors
    -

    This section provides a list of contributors to the study, their -respective affiliations, roles in the study, and orcids. The following -information is recorded for each data contributor:

    +

    This section provides a list of contributors to the study, their respective affiliations, roles in the study, and orcids. The following information is recorded for each data contributor:

    @@ -766,9 +684,7 @@
    Contributors -Any additional roles the data collector had in the study, a field most -frequently used to identify which data contributor is the contact person -for the dataset. +Any additional roles the data collector had in the study, a field most frequently used to identify which data contributor is the contact person for the dataset.
    @@ -785,119 +701,60 @@
    Contributors
    Dataset
    -

    This section includes study details, including format of the data, -custom r code applied to data, and various descriptors. the value -entered for each element can be either a header for a column within the -data.csv file or the actual value to be used.

    -

    The following elements are included under the element -dataset:

    +

This section includes study details, including the format of the data, custom R code applied to the data, and various descriptors. The value entered for each element can be either a header for a column within the data.csv file or the actual value to be used.

    +

    The following elements are included under the element dataset:

    • -data_is_long_format: Indicates if the data -spreadsheet has a vertical (long) or horizontal (wide) configuration -with yes or no terminology.
    • +data_is_long_format: Indicates if the data spreadsheet has a vertical (long) or horizontal (wide) configuration with yes or no terminology.
    • -custom_R_code: A field where additional R code can -be included. This allows for custom manipulation of the data in the -submitted spreadsheet into a different format for easy integration with -AusTraits. .na indicates no custom R code was used.
    • +custom_R_code: A field where additional R code can be included. This allows for custom manipulation of the data in the submitted spreadsheet into a different format for easy integration with AusTraits. .na indicates no custom R code was used.
    • -collection_date: Date sample was taken, in the -format yyyy-mm-dd, yyyy-mm or -yyyy, depending on the resoluton specified. Alternatively -an overall range for the study can be indicating, with the starting and -ending sample date sepatated by a /, as in -2010-10/2011-03
• +collection_date: Date sample was taken, in the format yyyy-mm-dd, yyyy-mm or yyyy, depending on the resolution specified. Alternatively an overall range for the study can be indicated, with the starting and ending sample dates separated by a /, as in 2010-10/2011-03.
    • -taxon_name: Scientific name of the taxon on which -traits were sampled, without authorship. When possible, this is the -currently accepted (botanical) or valid (zoological) scientific name, -but might also be a higher taxonomic level.
    • +taxon_name: Scientific name of the taxon on which traits were sampled, without authorship. When possible, this is the currently accepted (botanical) or valid (zoological) scientific name, but might also be a higher taxonomic level.
    • location_name: location name
    • -source_id: For datasets that are compilations, an -identifier for the original data source.
    • +source_id: For datasets that are compilations, an identifier for the original data source.
    • -individual_id: A unique integer identifier for an -individual, with individuals numbered sequentially within each dataset -by taxon by population grouping. Most often each row of data represents -an individual, but in some datasets trait data collected on a single -individual is presented across multiple rows of data, such as if the -same trait is measured using different methods or the same individual is -measured repeatedly across time.
    • +individual_id: A unique integer identifier for an individual, with individuals numbered sequentially within each dataset by taxon by population grouping. Most often each row of data represents an individual, but in some datasets trait data collected on a single individual is presented across multiple rows of data, such as if the same trait is measured using different methods or the same individual is measured repeatedly across time.
    • -trait_name: Element required for long datasets to -specify the column indicating the trait name associated with each row of -data.
    • +trait_name: Element required for long datasets to specify the column indicating the trait name associated with each row of data.
    • value: The measured value of a trait.
    • -description: A 1-2 sentence description of the -purpose of the study.
    • +description: A 1-2 sentence description of the purpose of the study.
    • -basis_of_record: A categorical variable specifying -from which kind of specimen traits were recorded.
    • +basis_of_record: A categorical variable specifying from which kind of specimen traits were recorded.
    • -life_stage: A field to indicate the life stage or -age class of the entity measured. Standard values are -adult, sapling, seedling and -juvenile.
    • +life_stage: A field to indicate the life stage or age class of the entity measured. Standard values are adult, sapling, seedling and juvenile.
    • -sampling_strategy: A written description of how -study locations were selected and how study individuals were selected. -When available, this information is lifted verbatim from a published -manuscript. For preserved specimens, this field ideally indicates which -records were ‘sampled’ to measure a specific trait.
    • +sampling_strategy: A written description of how study locations were selected and how study individuals were selected. When available, this information is lifted verbatim from a published manuscript. For preserved specimens, this field ideally indicates which records were ‘sampled’ to measure a specific trait.
    • -measurement_remarks: Brief comments or notes -accompanying the trait measurement.
    • +measurement_remarks: Brief comments or notes accompanying the trait measurement.
    • -original_file: The name of the file initially -submitted to AusTraits.
    • +original_file: The name of the file initially submitted to AusTraits.
    • -notes: Generic notes about the study and processing -of data.
    • +notes: Generic notes about the study and processing of data.
    -

    Of these, the fields collection_date, -life_stage, basis_of_record, and -measurement_remarks can all be specified at the dataset -level or the traits level (which overrides a dataset-level entry) or -location level (which also overrides a dataset-level entry). In each -case, they can be a fixed text value or indicate a column within the -data.csv file (or generated through custom_R_code) that -includes the relevant information.

    +

    Of these, the fields collection_date, life_stage, basis_of_record, and measurement_remarks can all be specified at the dataset level or the traits level (which overrides a dataset-level entry) or location level (which also overrides a dataset-level entry). In each case, they can be a fixed text value or indicate a column within the data.csv file (or generated through custom_R_code) that includes the relevant information.

      -
    • life_stage, basis_of_record, and -collection_date are usually included under -metadata$dataset unless they vary by trait.

    • -
    • entity_type, replicates, -basis_of_value, and value_type are usually -different across traits and are usually mapped under the -metadata$traits section (see below), but are allowed to be -specified for the entire dataset in this section.

    • -
    • traits and value are only specified in -metadata$dataset for long-format datasets.

    • -
    • measurement_remarks and individual_id -are only included if required. They are absent from the majority of -datasets.

    • +
    • life_stage, basis_of_record, and collection_date are usually included under metadata$dataset unless they vary by trait.

    • +
    • entity_type, replicates, basis_of_value, and value_type are usually different across traits and are usually mapped under the metadata$traits section (see below), but are allowed to be specified for the entire dataset in this section.

    • +
    • traits and value are only specified in metadata$dataset for long-format datasets.

    • +
    • measurement_remarks and individual_id are only included if required. They are absent from the majority of datasets.

    An example is as follows:

      data_is_long_format: no
    @@ -930,32 +787,14 @@ 
    Dataset
    -

    A common use of the custom_R_code is to automate the -conversion of a verbal description of flowering or fruiting periods into -the supported trait values. It might also be used if values for a single -trait are expressed across multiple columns and need to be merged. See -Catford_2014 as an example of this. The adding -data vignette provides additional examples of code regularly -implemented in custom_R_code, including functions -specifically that were developed for AusTraits data manipulations and -are in the file scripts\custom.R.

    +

A common use of the custom_R_code is to automate the conversion of a verbal description of flowering or fruiting periods into the supported trait values. It might also be used if values for a single trait are expressed across multiple columns and need to be merged. See Catford_2014 as an example of this. The adding data vignette provides additional examples of code regularly implemented in custom_R_code, including functions specifically developed for AusTraits data manipulations that are in the file scripts\custom.R.
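
A purely hypothetical sketch of the second use case (merging one trait spread across two columns); code placed under custom_R_code operates on the study's data table, and the column names below are invented for illustration:

library(dplyr)

# toy stand-in for a study's data.csv with one trait split across two columns
data <- tibble::tibble(
  `leaf area (field)`      = c(12.1, NA),
  `leaf area (glasshouse)` = c(NA, 10.4)
)

data %>%
  mutate(`leaf area` = coalesce(`leaf area (field)`, `leaf area (glasshouse)`))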

    locations
    -

    This section provides a list of study locations (sites) and -information about each of the study locations where data were collected. -Each should include at least three variables - -latitude (deg), longitude (deg) and -description. Additional variables can be included where -available. Set to .na for botanical collections and field -studies where data values are a mean across many locations.

    -

    Although the properties listed under each location are not part of a -controlled vocabulary, it is best practice to align with in-use -properties whenever possible. These can be identified by running -austraits$locations %>% distinct(location_property).

    -

    An example of how a location and its properties, and the value of -each property are listed (modified from Vesk_2019), is:

    +

    This section provides a list of study locations (sites) and information about each of the study locations where data were collected. Each should include at least three variables - latitude (deg), longitude (deg) and description. Additional variables can be included where available. Set to .na for botanical collections and field studies where data values are a mean across many locations.

    +

    Although the properties listed under each location are not part of a controlled vocabulary, it is best practice to align with in-use properties whenever possible. These can be identified by running austraits$locations %>% distinct(location_property).

    +

    An example of how a location and its properties, and the value of each property are listed (modified from Vesk_2019), is:

      Round Hill-Nombinnie Nature Reserve:
         latitude (deg): -32.965
         longitude (deg): 146.161
    @@ -970,54 +809,25 @@ 
    locations
    Contexts
    -

    This section provides contextual characteristics associated with -information in traits.

    -

    Within the context section is a list of contextual properties, each -encapsulating information read in through a different column or created -through custom_R_code or as elements within specific -traits (see below).

    +

    This section provides contextual characteristics associated with information in traits.

    +

    Within the context section is a list of contextual properties, each encapsulating information read in through a different column or created through custom_R_code or as elements within specific traits (see below).

    • -context_property: The context property represented -by the data in the column specified by var_in.
    • +context_property: The context property represented by the data in the column specified by var_in.
    • -category: The category of contextual data. Options -are plot (a distinct collection of organisms within a -single geographic location, such as plants growing on different aspects -or blocks in an experiment), treatment (an experimental -treatment), entity_context (contextual information to -record about the entity the isn’t documented elsewhere, including the -entity’s sex, caste), temporal (indicating when repeat -observations are made on the same individual (or population, or taxon) -across time) and method (indicating the same trait was -measured on the same individual (or population, or taxon) using multiple -methods).
• +category: The category of contextual data. Options are plot (a distinct collection of organisms within a single geographic location, such as plants growing on different aspects or blocks in an experiment), treatment (an experimental treatment), entity_context (contextual information to record about the entity that isn’t documented elsewhere, including the entity’s sex or caste), temporal (indicating when repeat observations are made on the same individual (or population, or taxon) across time) and method (indicating the same trait was measured on the same individual (or population, or taxon) using multiple methods).
    • -var_in: Name of column with contextual data in the -original data submitted.
    • +var_in: Name of column with contextual data in the original data submitted.
    • -find: The contextual values in the original data -submitted (optional)
    • +find: The contextual values in the original data submitted (optional)
    • -value: The standardised contextual values, aligning -syntax and wording with other studies.
    • +value: The standardised contextual values, aligning syntax and wording with other studies.
    • -description: A description of the contextual -values.
    • +description: A description of the contextual values.
    -

    If the contextual values read in are appropriate and no substitutions -are required, the field find can be omitted, with the -values from the data.csv column entered under the field -value. The field description can likewise be -omitted if it is redundant; for instance, if the values are simply -sequential observation numbers, times of day, or taxon names -(e.g. insect host plants).

    -

    As with location, the context properties are not part of a controlled -vocabulary, but it is best practice to align syntax with in-use -properties whenever possible. These can be identified by running -austraits$contexts %>% distinct(context_property).

    -

    An example of how the contexts for a study are formatted (modified -from Crous_2013), is:

    +

    If the contextual values read in are appropriate and no substitutions are required, the field find can be omitted, with the values from the data.csv column entered under the field value. The field description can likewise be omitted if it is redundant; for instance, if the values are simply sequential observation numbers, times of day, or taxon names (e.g. insect host plants).

    +

    As with location, the context properties are not part of a controlled vocabulary, but it is best practice to align syntax with in-use properties whenever possible. These can be identified by running austraits$contexts %>% distinct(context_property).

    +

    An example of how the contexts for a study are formatted (modified from Crous_2013), is:

    contexts:
     - context_property: sampling season
       category: temporal
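The Crous_2013 example above is truncated on this page. Purely as an illustrative sketch (the column names, values and descriptions below are hypothetical), an entry using the full set of fields described above, including a find/value substitution, might look like:

contexts:
- context_property: sampling season
  category: temporal
  var_in: season                         # hypothetical column name in data.csv
  find: 1                                # hypothetical original value
  value: summer
  description: traits measured during the summer sampling campaign
- context_property: measurement method
  category: method
  var_in: method
  find: A
  value: gas exchange
  description: photosynthesis measured with a portable gas analyser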
    @@ -1065,79 +875,34 @@ 
    Contexts
    Traits
This section provides a translation table, mapping traits and units from a contributed study onto corresponding variables in AusTraits. The methods used to collect the data are also specified here.

For each trait submitted to AusTraits, there is the following information:

• var_in: Name of the trait in the original data submitted.
• unit_in: Units of the trait in the original data submitted.
• trait_name: Name of the trait sampled. Allowable values are specified in the table definitions.
• entity_type: A categorical variable specifying the entity corresponding to the trait values recorded.
• value_type: A categorical variable describing the statistical nature of the trait value recorded.
• basis_of_record: A categorical variable specifying from which kind of specimen traits were recorded.
• basis_of_value: A categorical variable describing how the trait value was obtained.
• replicates: Number of replicate measurements that comprise a recorded trait measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the trait is categorical or the value indicates a measurement for an entire species (or other taxon), the replicate value should be .na.
• measurement_remarks: Brief comments or notes accompanying the trait measurement.
• methods: A textual description of the methods used to collect the trait data. Whenever available, methods are taken near-verbatim from the referenced source. Methods can include descriptions such as ‘measured on botanical collections’, ‘data from the literature’, or a detailed description of the field or lab methods used to collect the data.
• life_stage: A field to indicate the life stage or age class of the entity measured. Standard values are adult, sapling, seedling and juvenile.

The elements trait_name, entity_type, value_type, basis_of_record, and basis_of_value are controlled vocabularies; the values for these elements must be from the list of allowable values. Those for traits are listed in the traits.yml file or vignette. For the other elements, see the database structure vignette.

The fields replicates, basis_of_value, value_type, life_stage, basis_of_record, and measurement_remarks can all be specified at the dataset level or the traits level (a trait-level entry overrides a dataset-level entry). In each case, they can be a fixed text value or indicate a column (within the data.csv file or generated through custom_R_code) that includes the relevant information. In addition, fields can be added to specify a specific context (most commonly a method context, but occasionally a temporal context). If such a field is added, the same name must appear in both the contexts section and for some (or all) of the traits.

    Two examples are as follows:

    - var_in: LeafP.m
       unit_in: mg/g
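The two examples referred to above are truncated on this page. Purely as an illustrative sketch (the column name, methods text and choice of values below are hypothetical, not drawn from a real dataset), a categorical trait entry that combines the fields and standard values described above might look like:

- var_in: growth_form          # hypothetical column name in data.csv
  unit_in: .na                 # assuming .na is used where a trait has no units
  trait_name: plant_growth_form
  entity_type: species
  value_type: mode
  basis_of_value: expert_score
  basis_of_record: field
  replicates: .na              # .na because the trait is categorical
  life_stage: adult
  methods: Growth form assigned for each species during field surveys.   # hypothetical methods text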
    @@ -1172,62 +937,42 @@ 
    Traits
    Substitutions
This section provides a list of any “find and replace” substitutions needed to get the data into the right format.

Substitutions are required whenever the exact word(s) used to describe a categorical trait value in AusTraits is different from the vocabulary used by the author in the data.csv file. It is preferable to align vocabulary using substitutions rather than changing the data.csv file. The trait definitions file provides a list of supported values for each trait.

    Each substitution is documented using the following elements:

• trait_name: Trait where substitutions are required.
• find: Contributor’s trait value that needs to be changed.
• replace: AusTraits supported replacement value.

    An example is as follows:

substitutions:
- trait_name: life_history
  find: p
  replace: perennial
- trait_name: plant_growth_form
  find: s
  replace: shrub
- ...
    Taxonomic updates
This section provides a table of taxonomic name changes needed to align original names in the dataset with taxon names in the chosen taxonomic reference(s).

    Each substitution is documented using the following elements:

• find: Name given to taxon in the original data supplied by the authors.
• replace: Scientific name of the taxon on which traits were sampled, without authorship. When possible, this is the currently accepted (botanical) or valid (zoological) scientific name, but might also be a higher taxonomic level.
• reason: Records why the change was implemented, e.g. typos, taxonomic synonyms, and standardising spellings.

Algorithms within AusTraits automatically align outdated taxonomy and taxonomic synonyms to their currently accepted scientific name, so such adjustments are not documented as substitutions.

    Some examples of taxonomic updates are as follows:

    taxonomic_updates:
     - find: Drummondita rubroviridis
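The example above is truncated on this page and is not completed here. Purely as an illustrative sketch (the names below are hypothetical and not from this dataset), an entry combining the three fields defined above might look like:

taxonomic_updates:
- find: Eucalyptus viminales          # hypothetical misspelt name as supplied by a contributor
  replace: Eucalyptus viminalis
  reason: correct typo in species epithet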
    @@ -1248,10 +993,7 @@ 
    Taxonomic updates
    Questions
This section provides a place to record any queries we have about the dataset (recorded as a named array), including notes on any additional traits that may have been collected in the study but have not been incorporated into AusTraits.

    An example is as follows:

    questions:
       questions for author: Triglochin procera has very different seed masses in the main traits spreadsheet and the field seeds worksheet. Which is correct? There are a number of species with values in the field leaves worksheet that are absent in the main traits worksheet - we have included this data into Austraits; please advise if this was inappropriate.
    @@ -1268,86 +1010,54 @@ 

    File types

    CSV

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. This is a common format for storing tables of data in a simple text file. You can edit it in Excel or in a text editor. For more, see here.

    YAML files

The yml file extension (pronounced “YAML”) denotes a type of structured data file that is both human and machine readable. You can edit it in any text editor, or in RStudio. Generally, yml is used in situations where a table is not suitable because of variable lengths and/or nested structures. It has the advantage over a spreadsheet in that the nested “headers” can have variable numbers of categories. The data under each of the hierarchical headings are easily extracted by R.
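For instance, in a generic, hypothetical sketch (not taken from an AusTraits file), each nested heading can hold a different number of entries, something a flat spreadsheet cannot represent cleanly:

study_A:
  people:
    - Alice                 # one entry under this heading
  notes: collected in spring
study_B:
  people:                   # three entries under the same heading
    - Bob
    - Carol
    - Dana
  locations:
    coastal site:
      latitude (deg): -33.6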

    Adding custom R code into metadata.yml

Occasionally all the changes we want to make to a dataset may not fit into the prescribed workflow used in AusTraits. For example, we assume that each trait has a single unit. But there are a few datasets where data on different rows have different units. So we want to make some custom modifications to a particular dataset before the common pipeline of operations gets applied. To make this possible, the workflow allows for some custom R code to be run as a first step in the processing pipeline. That pipeline (in the function read_data_study) looks like:

data <-
  read_csv(filename_data_raw, col_types = cols()) %>%
  process_custom_code(metadata[["dataset"]][["custom_R_code"]])() %>%
  process_parse_data(dataset_id, metadata) %>%
  ...()

Note the second step in the pipeline, where process_custom_code applies any custom R code stored in the metadata.

    Example problem

As an example, for Blackman_2010 we want to combine two columns to create an appropriate location variable. Here is the code that was included in data/Blackman_2010/metadata.yml under custom_R_code.

data %>% mutate(
  location = ifelse(location == "Mt Field" & habitat == "Montane rainforest", "Mt Field_wet", location),
  location = ifelse(location == "Mt Field" & habitat == "Dry sclerophyll", "Mt Field_dry", location)
)

    This is the finished solution, but to get there we did as follows:

    Generally, this code should

• assume a single object called data, and apply whatever fixes are needed
• use dplyr functions like mutate, rename, etc.
• use pipes to weave together a single statement, if possible (otherwise you’ll need a semicolon ; at the end of each statement)
• be fully self-contained (we’re not going to use any of the other remake machinery here)

    First, load an object called data:

library(readr)
library(yaml)

data <- read_csv(file.path("data", "Blackman_2010", "data.csv"), col_types = cols(.default = "c"))
data

Second, write your code to manipulate data, like the example above.


    Third, once you have some working code, you then want to add it into your yml file under dataset -> custom_R_code.
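As a minimal sketch of how that looks in the metadata file (assuming the code is stored as a multi-line YAML literal block, which is one way to embed it; check an existing metadata.yml for the exact convention used), the snippet from above sits under dataset -> custom_R_code roughly like this:

dataset:
  custom_R_code: |            # literal block assumed for readability
    data %>% mutate(
      location = ifelse(location == "Mt Field" & habitat == "Montane rainforest", "Mt Field_wet", location),
      location = ifelse(location == "Mt Field" & habitat == "Dry sclerophyll", "Mt Field_dry", location)
    )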


    Finally, check it works. Let’s assume you added it in. The function metadata_check_custom_R_code loads the data and applies the custom R code:

metadata_check_custom_R_code("Blackman_2010")

diff --git a/docs/articles/austraits_overview.html b/docs/articles/austraits_overview.html index cc26fd614..4807a28bc 100644 --- a/docs/articles/austraits_overview.html +++ b/docs/articles/austraits_overview.html @@ -80,7 +80,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -123,100 +123,45 @@

    2023-02-09

AusTraits is an open-source, harmonised database of Australian plant trait data. Traits vary in scope, including morphological attributes (e.g. leaf area, seed mass, plant height), physiological measures of performance (e.g. photosynthetic gas exchange, water-use efficiency), tissue biochemical composition (e.g. leaf nitrogen content, leaf chlorophyll content), and life history traits (e.g. seed bank location, plant growth form, salinity tolerance).

This vignette provides an overview of our workflow, to demonstrate our commitment to creating a reliable, reproducible resource for anyone interested in plant traits.

    AusTraits workflow

    Data sources

The data in AusTraits is derived from nearly 300 distinct sources, each contributed by an individual researcher, government entity (e.g. herbaria), or NGO. Each reflects the research agenda of the individual/organisation who contributed the data - the species selected, traits measured, manipulative treatments performed, and locations sampled encompass the diversity of research interests present in Australia throughout past decades. To obtain data, the AusTraits data curators have reached out to as many researchers as time permitted. This was done without explicitly soliciting datasets with specific traits; therefore, the spotty data coverage by trait or location simply represents what has been merged into AusTraits at this time.

These datasets use different trait names, variable names, units and methods, and have different data structures.

    Standardising and harmonising data

To create a single database for distribution to the research community, we developed a reproducible and transparent workflow in R for merging each dataset into AusTraits. The pipeline ensures the following information is standardised across all datasets in AusTraits. A metadata file for each study documents how the data tables submitted by an individual contributor are translated into the standardised terms used in the AusTraits database.

• taxonomic nomenclature follows the Australian Plant Census (APC), with a pipeline to update outdated taxonomy, correct minor spelling mistakes, and align with a known genus when a full species name isn’t provided.
• trait names are defined in our traits.yml file and only data for traits included in this file can be merged into AusTraits. The trait names used in the incoming dataset are mapped onto the appropriate AusTraits trait name.
• For numeric traits the traits.yml file includes units and the allowable range of values. All incoming data are converted to the appropriate units and data outside the range of allowable values are removed from the main AusTraits data table.
• For categorical traits the traits.yml file includes a list of allowable values, the allowed terms for the trait. Each categorical trait value is defined in the traits.yml file. Lists of substitutions translate the exact syntax and terms in a submitted dataset into the values allowed by AusTraits. This ensures that for a certain trait the same value has an identical meaning throughout the AusTraits database. (A rough sketch of such trait definitions follows this list.)
• Site locations are recorded in decimal degrees.
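As a rough sketch of what such trait definitions can look like (the field names, range limits and definition text below are illustrative assumptions only and may not match the actual schema; consult config/traits.yml for the real structure), a numeric and a categorical entry might be laid out along these lines:

leaf_area:                       # field names below are illustrative, not the verified schema
  type: numeric
  units: mm2
  allowed_values_min: 0.1        # hypothetical range limits
  allowed_values_max: 10000000
plant_growth_form:
  type: categorical
  allowed_values_levels:
    herb: a non-woody plant                          # hypothetical definition text
    shrub: a woody plant with multiple stems
    tree: a woody plant, typically with a single trunk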

    Referencing sources and recording methods

The metadata file also includes all metadata associated with the study:

• The source information for each dataset is recorded. Most frequently, these are the primary publications derived from the dataset.
• People associated with the collection of the data are listed, including their role in the project.
• Collection methods are included.
• Fields capture value type (mean, min, max, mode, range, bin) and associated replicate numbers, basis of value (measurement, expert_score, model_derived), entity type (species, population, individual), life stage (adult, juvenile, sapling, seedling), basis of record (field, field_experiment, preserved_specimen, captive_cultivated, lab, literature), and any additional measurement remarks.
• Available data on location properties are recorded.
• Available data on plot and treatment contextual properties are recorded.
• A context field, temporal_id, indicates if repeat measures were made on the same individual over time.
• A context field, method_id, indicates if the same trait was measured using multiple methods.
• Collection date is recorded.
    @@ -224,45 +169,35 @@

    Referencing sources and recor

    Error checking

• The AusTraits data curator runs a series of tests on each data set, detailed in the adding data vignette.
• These tests identify misaligned units, unrecognised taxon names, and unsupported categorical trait values.
• These tests also identify and eliminate most duplicate data - instances where the same numeric trait data is submitted by multiple people.
• Each dataset is then compiled into a report which summarises metadata and plots trait values in comparison to other measurements of that trait in AusTraits. The report is reviewed by the data contributor to ensure metadata is complete and data values are as expected.
• A second member of the AusTraits team double-checks each dataset before it is merged into the main repository.
    diff --git a/docs/articles/contributing_data.html b/docs/articles/contributing_data.html index 580f192d1..49b2dba1b 100644 --- a/docs/articles/contributing_data.html +++ b/docs/articles/contributing_data.html @@ -80,7 +80,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -123,122 +123,42 @@

    2023-02-09

AusTraits is an open-source harmonised database of Australian plant trait data. It exists because hundreds of researchers across Australia (and beyond) have contributed their datasets to this endeavour. Each dataset we receive incrementally broadens trait coverage for the Australian flora and, in turn, makes the database a little better at addressing your research questions.

As such, we welcome all data contributions to AusTraits, including recently collected trait data, legacy trait data from your file archives, transcribed reference works, and transcribed datasets from the literature.

The AusTraits data-entry team then merges each dataset into AusTraits. AusTraits is a harmonised database: for each study we carefully check that units are accurate, continuous trait values fall in the expected range, categorical trait values map onto sensible terms, location data are accurate, taxon names are aligned to current standards, and all metadata are recorded.

After completing a series of quality checks, we will send you a report to review that summarises the data and metadata. The reports include plots for each continuous trait, comparing values in your submission to those already in AusTraits. The report plots your study locations (sites) on a map, summarises your metadata and indicates the taxonomic alignments made. It includes both targeted questions (sometimes) and automated questions, acting as prompts to review aspects of the report. Reviewing your report should not take long, and confirms the transparent, thorough process used to build AusTraits.

As a first step, all we really require is a Data Spreadsheet and a copy of your Manuscript.

    Data

Your dataset, preferably in a spreadsheet format.

Traits: Make sure the trait names used in your dataset are easy to interpret or, alternatively, provide a brief definition.

Units: Please make sure the units for each trait are provided as part of the trait name or in a separate spreadsheet/worksheet.

Value type: We prefer to incorporate raw values (or individual means) in AusTraits, but can use population or multi-site means if that is what is available. For mean values, please provide sample size.

Location: For field studies, please provide location details (see more below).

Context: Optional, but AusTraits can read in one (or more) column(s) with contextual information, such as canopy position, experimental manipulation, dry vs. wet season, etc.

Collection date: Optional, but AusTraits can read in a column with sampling date (in any format).

Species/taxa: Please provide complete species names or a look-up table to match species codes. Out-dated taxonomy is fine – we have name-matching algorithms.

    Metadata

The AusTraits structure has fields to input all metadata associated with your study, including methods, location details, and context. In detail:

Methods: For published studies the necessary methods and study information can be extracted from a publication; just attach a copy of the manuscript or the DOI.

• The only commonly missing information is the general sampling period, such as ‘October-December 2020’; this is only required if your data file doesn’t have a date column.
• For unpublished studies, provide brief methods for how each trait was measured; you can simply refer to a standard published protocol.

Study locations: Whenever possible, AusTraits includes location names, location coordinates (latitude/longitude), and any other location properties you have measured/recorded (vegetation description, soil chemistry, climate data, etc.). This information can be provided as a second spreadsheet or as additional columns in the main data spreadsheet. Just make sure the location name is the same in both spreadsheets.

Context: If your study includes contextual variables, make sure the context values are included as columns in the data spreadsheet. Also, please make sure the contextual values are self-explanatory or provide the necessary explanation.

Authors: Authorship is extended to anyone who played a key intellectual role in the experimental design and data collection. Most studies have 1-3 authors. For each author, please provide a name, institutional affiliation, email address, and their ORCID (if available). Please nominate a single contributor to be the dataset’s point of contact; this person’s email will not be listed in the metadata file, but is the person future AusTraits users are likely to seek out if they have questions. Additional field assistants can be listed.

Source: The published manuscript is generally the source. If different traits or observations from a single dataset were published separately, please provide both references. If the dataset you are submitting is a compilation from many sources, please provide a complete list of sources and indicate which rows of data are attributable to which source.

    Most common hang-ups

Categorical trait values: If you have categorical traits, please define any trait values (i.e. entries for that trait) that are not self-explanatory. A copy of our definitions file, including allowable values for each trait is available here. The definitions file is a work-in-progress and additional trait values can be added if needed to capture the exact meaning you intended.

Data sourced from others: For numerical traits, AusTraits strives to only include data collected by you for this project, to avoid having multiple entries of the same measurement/observation. If you have certain trait values that were sourced from the literature, an online database, or colleagues, please indicate that clearly. If trait values for some species were collected by you and others were sourced, it is very helpful if you could add a column to your spreadsheet that indicates the source for different rows of data.

    diff --git a/docs/articles/Docker.html b/docs/articles/docker.html similarity index 85% rename from docs/articles/Docker.html rename to docs/articles/docker.html index e6c587f72..86b44a5bb 100644 --- a/docs/articles/Docker.html +++ b/docs/articles/docker.html @@ -80,7 +80,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -116,56 +116,39 @@

    Docker for reproducible compute environment

    2023-02-09

- Source: vignettes/Docker.Rmd + Source: vignettes/docker.Rmd

As the R compute environment and packages change over time, we have created a Docker container to ensure all our builds are reproducible into the future.

If you have Docker installed, you can recreate the compute environment as follows. For more instructions on running docker, see the info from the R docker project rocker. Our docker container is built off the rocker/verse container. This includes:

    • R version
    • rstudio
    • tidyverse & devtools
    • tex & publishing-related packages
On top of that, we install everything needed to build AusTraits and the reports.

    Running via Docker

    First fetch the container:

    docker pull traitecoevo/austraits.build:latest
(Instead of latest, you can indicate a specific tag, e.g. 3.6.1 or 4.1.2.)

    Then launch it via:

    docker run --user root -v $(pwd):/home/rstudio/ -p 8787:8787 -e DISABLE_AUTH=true traitecoevo/austraits.build:latest
Adding a -d into the command above will cause the image to run in the background.

The code above initialises a docker container running an RStudio session, which you can access by pointing your browser to localhost:8787.

Note, this container does not contain the actual github repo, only the software environment. If you run the above command from within your downloaded repo, the repo directory is mounted as the working directory inside the docker container.

    Building Docker container

The recipe used to build the docker container is included in the Dockerfile in this repo. Our image builds off the rocker/verse container via the following command, run in a terminal from within the downloaded repo:

    docker build -t traitecoevo/austraits.build:4.1.2 .

    Images are pushed to dockerhub (here):

    docker push traitecoevo/austraits.build:4.1.2
    diff --git a/docs/articles/index.html b/docs/articles/index.html index 4797ffc32..af82a40b9 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -106,7 +106,7 @@

    All vignettes

    Contributing data to AusTraits
Docker for reproducible compute environment
    Definitions for traits in AusTraits
    diff --git a/docs/articles/Trait_definitions.html b/docs/articles/trait_definitions.html similarity index 70% rename from docs/articles/Trait_definitions.html rename to docs/articles/trait_definitions.html index 127ae6a23..bf4526aa7 100644 --- a/docs/articles/Trait_definitions.html +++ b/docs/articles/trait_definitions.html @@ -80,7 +80,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -123,17 +123,13 @@

    2023-02-09

This document provides a full list of trait definitions used in AusTraits version 4.1.0.9000, as defined in the configuration file config/traits.yml.

    accessory_cost_fraction

bib_print(
  bib,
  .opts = list(first.inits = TRUE, max.names = 1000, style = "markdown")
)
    diff --git a/docs/reference/build_add_version.html b/docs/reference/build_add_version.html index 011a8a8a6..f7faebcbd 100644 --- a/docs/reference/build_add_version.html +++ b/docs/reference/build_add_version.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Add version information to AusTraits

build_add_version(austraits, version, git_sha)
    diff --git a/docs/reference/build_combine.html b/docs/reference/build_combine.html index 546b7ba5e..3b25568b8 100644 --- a/docs/reference/build_combine.html +++ b/docs/reference/build_combine.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Combine all the AusTraits studies into the compiled AusTraits database

build_combine(..., d = list(...))
    diff --git a/docs/reference/build_find_taxon.html b/docs/reference/build_find_taxon.html index 60247d83e..8be30dc65 100644 --- a/docs/reference/build_find_taxon.html +++ b/docs/reference/build_find_taxon.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Find list of all unique taxa within compilation

build_find_taxon(taxon_name, austraits, original_name = FALSE)
    diff --git a/docs/reference/build_setup_pipeline.html b/docs/reference/build_setup_pipeline.html index 09e61104d..d32411731 100644 --- a/docs/reference/build_setup_pipeline.html +++ b/docs/reference/build_setup_pipeline.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,12 +99,12 @@

    Update the remake.yml file with new studies

build_setup_pipeline(
  template = readLines(system.file("support", "remake.yml.whisker", package =
    "austraits.build")),
  path = "data",
  dataset_ids = dir(path)
)
    diff --git a/docs/reference/build_update_taxonomy.html b/docs/reference/build_update_taxonomy.html index 088ba035d..ec825e6f9 100644 --- a/docs/reference/build_update_taxonomy.html +++ b/docs/reference/build_update_taxonomy.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Apply taxonomic updates to austraits_raw

build_update_taxonomy(austraits_raw, taxa)
    diff --git a/docs/reference/create_tree_branch.html b/docs/reference/create_tree_branch.html index e3669e165..2f7eca06d 100644 --- a/docs/reference/create_tree_branch.html +++ b/docs/reference/create_tree_branch.html @@ -65,7 +65,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -103,7 +103,7 @@

    Format a tree structure from a vector

create_tree_branch(x, title, prefix = "")
    diff --git a/docs/reference/dataset_configure.html b/docs/reference/dataset_configure.html index 293de56ea..52bf2872f 100644 --- a/docs/reference/dataset_configure.html +++ b/docs/reference/dataset_configure.html @@ -67,7 +67,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -107,7 +107,7 @@

    Configure AusTraits database object

dataset_configure(filename_metadata, definitions, unit_conversion_functions)
    @@ -133,10 +133,10 @@

    Value

    Examples

if (FALSE) {
dataset_configure("data/Falster_2003/metadata.yml", read_yaml("config/traits.yml"),
get_unit_conversions("config/unit_conversions.csv"))
}
     
    diff --git a/docs/reference/dataset_process.html b/docs/reference/dataset_process.html index da3781218..3b8446c16 100644 --- a/docs/reference/dataset_process.html +++ b/docs/reference/dataset_process.html @@ -64,7 +64,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -101,13 +101,13 @@

    Load Dataset

dataset_process(
  filename_data_raw,
  config_for_dataset,
  schema,
  resource_metadata,
  filter_missing_values = TRUE
)
    @@ -138,11 +138,11 @@

    Value

    Examples

if (FALSE) {
dataset_process("data/Falster_2003/data.csv", dataset_configure("data/Falster_2003/metadata.yml",
read_yaml("config/traits.yml"), get_unit_conversions("config/unit_conversions.csv")),
get_schema())
}
     
    diff --git a/docs/reference/dataset_report.html b/docs/reference/dataset_report.html index 2051e4ab5..3bb4dc041 100644 --- a/docs/reference/dataset_report.html +++ b/docs/reference/dataset_report.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,15 +99,15 @@

    Build reports for listed datasets

dataset_report(
  dataset_id,
  austraits,
  overwrite = FALSE,
  output_path = "export/reports",
  input_file = system.file("support", "report_dataset.Rmd", package = "austraits.build"),
  quiet = TRUE,
  keep = FALSE
)
    diff --git a/docs/reference/dataset_test.html b/docs/reference/dataset_test.html index 48019a7fc..7bb5f22ef 100644 --- a/docs/reference/dataset_test.html +++ b/docs/reference/dataset_test.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,12 +97,12 @@

    Test whether specified dataset_id has the correct setup

dataset_test(
  dataset_ids,
  path_config = "config",
  path_data = "data",
  reporter = testthat::default_reporter()
)
    diff --git a/docs/reference/dataset_test_worker.html b/docs/reference/dataset_test_worker.html index 4e7bc5e23..d520bd14f 100644 --- a/docs/reference/dataset_test_worker.html +++ b/docs/reference/dataset_test_worker.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,13 +97,13 @@

    Test whether specified dataset_id has the correct setup

dataset_test_worker(
  test_dataset_ids,
  path_config = "config",
  path_data = "data",
  schema = get_schema(),
  definitions = get_schema(file.path(path_config, "traits.yml"), I("traits"))
)
    diff --git a/docs/reference/get_schema.html b/docs/reference/get_schema.html index ffdebffdc..9efc0a3ba 100644 --- a/docs/reference/get_schema.html +++ b/docs/reference/get_schema.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,11 +97,11 @@

    Load schema for an austraits.build data compilation (excluding traits)

get_schema(
  path = system.file("support", "austraits.build_schema.yml", package =
    "austraits.build"),
  subsection = NULL
)
    @@ -123,10 +123,10 @@

    Value

    Examples

    -
    {
    -
    -schema <- get_schema()
    -}
    +    
    {
    +
    +schema <- get_schema()
    +}
     
    diff --git a/docs/reference/get_unit_conversions.html b/docs/reference/get_unit_conversions.html index 4c063859f..ae5ab5ea5 100644 --- a/docs/reference/get_unit_conversions.html +++ b/docs/reference/get_unit_conversions.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Make unit conversion functions

  • -
    get_unit_conversions(filename)
    +
    get_unit_conversions(filename)
    @@ -115,9 +115,9 @@

    Value

    Examples

    -
    if (FALSE) {
    -get_unit_conversions("config/unit_conversions.csv")
    -}
    +    
    if (FALSE) {
    +get_unit_conversions("config/unit_conversions.csv")
    +}
     
    diff --git a/docs/reference/index.html b/docs/reference/index.html index 8f16455cb..37a5c6604 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • diff --git a/docs/reference/metadata_add_contexts.html b/docs/reference/metadata_add_contexts.html index de71da3fd..2d78014e1 100644 --- a/docs/reference/metadata_add_contexts.html +++ b/docs/reference/metadata_add_contexts.html @@ -65,7 +65,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -103,7 +103,7 @@

    For specified `dataset_id` import context data from a dataframe

  • -
    metadata_add_contexts(dataset_id, overwrite = FALSE)
    +
    metadata_add_contexts(dataset_id, overwrite = FALSE)
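A minimal usage sketch; the dataset_id is illustrative, and the interactive selection of context columns is an assumption:

# Sketch only: add context fields to data/Falster_2003/metadata.yml
metadata_add_contexts("Falster_2003", overwrite = FALSE)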
    diff --git a/docs/reference/metadata_add_locations.html b/docs/reference/metadata_add_locations.html index 14f8abc79..e8ebb7f88 100644 --- a/docs/reference/metadata_add_locations.html +++ b/docs/reference/metadata_add_locations.html @@ -65,7 +65,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -103,7 +103,7 @@

    For specified `dataset_id` import location data from a dataframe

  • -
    metadata_add_locations(dataset_id, location_data)
    +
    metadata_add_locations(dataset_id, location_data)
    @@ -119,11 +119,11 @@

    Arguments

    Examples

    -
    if (FALSE) {
    -austraits$locations %>% dplyr::filter(dataset_id == "Falster_2005_1") %>% 
    -select(-dataset_id) %>% spread(location_property, value) %>% type_convert()-> location_data
    -metadata_add_locations("Falster_2005_1", location_data)
    -}
    +    
    if (FALSE) {
    +austraits$locations %>% dplyr::filter(dataset_id == "Falster_2005_1") %>% 
    +select(-dataset_id) %>% spread(location_property, value) %>% type_convert()-> location_data
    +metadata_add_locations("Falster_2005_1", location_data)
    +}
     
    diff --git a/docs/reference/metadata_add_source_bibtex.html b/docs/reference/metadata_add_source_bibtex.html index 193c44060..de6737b8f 100644 --- a/docs/reference/metadata_add_source_bibtex.html +++ b/docs/reference/metadata_add_source_bibtex.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,13 +97,13 @@

Adds citation details to a metadata file for a given study


  • -
    metadata_add_source_bibtex(
    -  dataset_id,
    -  file,
    -  type = "primary",
    -  key = dataset_id,
    -  drop = c("dateobj", "month")
    -)
    +
    metadata_add_source_bibtex(
    +  dataset_id,
    +  file,
    +  type = "primary",
    +  key = dataset_id,
    +  drop = c("dateobj", "month")
    +)
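A minimal usage sketch; the dataset_id and the .bib path are illustrative:

# Sketch only: add the primary citation for a study from a BibTeX file
metadata_add_source_bibtex("Falster_2003", "data/Falster_2003/reference.bib", type = "primary")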
    diff --git a/docs/reference/metadata_add_source_doi.html b/docs/reference/metadata_add_source_doi.html index b3fe7ca66..3b7f72d9f 100644 --- a/docs/reference/metadata_add_source_doi.html +++ b/docs/reference/metadata_add_source_doi.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Adds citation details from a doi to a metadata file for a dataset_id.

  • -
    metadata_add_source_doi(..., doi, bib = NULL)
    +
    metadata_add_source_doi(..., doi, bib = NULL)
    diff --git a/docs/reference/metadata_add_substitution.html b/docs/reference/metadata_add_substitution.html index 4d4820d18..33da1da34 100644 --- a/docs/reference/metadata_add_substitution.html +++ b/docs/reference/metadata_add_substitution.html @@ -64,7 +64,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -101,7 +101,7 @@

Add a categorical trait value substitution into a metadata file for a dataset_id

  • -
    metadata_add_substitution(dataset_id, trait_name, find, replace)
    +
    metadata_add_substitution(dataset_id, trait_name, find, replace)
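A minimal usage sketch; the trait name and values shown are illustrative, not taken from the diff:

# Sketch only: map a raw value in the source data onto a replacement value
metadata_add_substitution("Falster_2003", "dispersal_syndrome", find = "wind", replace = "anemochory")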
    diff --git a/docs/reference/metadata_add_substitutions_list.html b/docs/reference/metadata_add_substitutions_list.html index f2af2fa8c..c72567217 100644 --- a/docs/reference/metadata_add_substitutions_list.html +++ b/docs/reference/metadata_add_substitutions_list.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

Add a dataframe of trait value substitutions into a metadata file for a dataset_id

  • -
    metadata_add_substitutions_list(dataset_id, substitutions)
    +
    metadata_add_substitutions_list(dataset_id, substitutions)
    diff --git a/docs/reference/metadata_add_substitutions_table.html b/docs/reference/metadata_add_substitutions_table.html index f8e195656..caf1c28b5 100644 --- a/docs/reference/metadata_add_substitutions_table.html +++ b/docs/reference/metadata_add_substitutions_table.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,13 +99,13 @@

    Substitutions from csv

  • -
    metadata_add_substitutions_table(
    -  dataframe_of_substitutions,
    -  dataset_id,
    -  trait_name,
    -  find,
    -  replace
    -)
    +
    metadata_add_substitutions_table(
    +  dataframe_of_substitutions,
    +  dataset_id,
    +  trait_name,
    +  find,
    +  replace
    +)
    @@ -139,12 +139,12 @@

    Value

    Examples

    -
    if (FALSE) {
    -read_csv("export/dispersal_syndrome_substitutions.csv") %>%
    -  select(-extra) %>%
    -  filter(dataset_id == "Angevin_2011") -> dataframe_of_substitutions
    -metadata_add_substitutions_table(dataframe_of_substitutions, dataset_id, trait_name, find, replace)
    -}
    +    
    if (FALSE) {
    +read_csv("export/dispersal_syndrome_substitutions.csv") %>%
    +  select(-extra) %>%
    +  filter(dataset_id == "Angevin_2011") -> dataframe_of_substitutions
    +metadata_add_substitutions_table(dataframe_of_substitutions, dataset_id, trait_name, find, replace)
    +}
     
    diff --git a/docs/reference/metadata_add_taxonomic_change.html b/docs/reference/metadata_add_taxonomic_change.html index f557aeeb9..71e359e4a 100644 --- a/docs/reference/metadata_add_taxonomic_change.html +++ b/docs/reference/metadata_add_taxonomic_change.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,13 +97,13 @@

    Add a taxonomic change into the metadata yaml file for a dataset_id

  • -
    metadata_add_taxonomic_change(
    -  dataset_id,
    -  find,
    -  replace,
    -  reason,
    -  taxonomic_resolution
    -)
    +
    metadata_add_taxonomic_change(
    +  dataset_id,
    +  find,
    +  replace,
    +  reason,
    +  taxonomic_resolution
    +)
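A minimal usage sketch; the taxon names, reason, and resolution shown are illustrative:

# Sketch only: record that a misspelled name should be aligned to another taxon name
metadata_add_taxonomic_change("Falster_2003", find = "Banksia Seratta",
  replace = "Banksia serrata", reason = "typo in original data",
  taxonomic_resolution = "Species")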
    diff --git a/docs/reference/metadata_add_taxonomic_changes_list.html b/docs/reference/metadata_add_taxonomic_changes_list.html index f1a1d75a0..5fee6b11e 100644 --- a/docs/reference/metadata_add_taxonomic_changes_list.html +++ b/docs/reference/metadata_add_taxonomic_changes_list.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Add a list of taxonomic updates into a metadata file for a dataset_id

  • -
    metadata_add_taxonomic_changes_list(dataset_id, taxonomic_updates)
    +
    metadata_add_taxonomic_changes_list(dataset_id, taxonomic_updates)
    diff --git a/docs/reference/metadata_add_traits.html b/docs/reference/metadata_add_traits.html index c33b44cc4..f5d1cccd3 100644 --- a/docs/reference/metadata_add_traits.html +++ b/docs/reference/metadata_add_traits.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    For specified `dataset_id`, populate columns for traits into metadata

  • -
    metadata_add_traits(dataset_id)
    +
    metadata_add_traits(dataset_id)
    diff --git a/docs/reference/metadata_check_custom_R_code.html b/docs/reference/metadata_check_custom_R_code.html index ef61cfd89..b8f6bfcba 100644 --- a/docs/reference/metadata_check_custom_R_code.html +++ b/docs/reference/metadata_check_custom_R_code.html @@ -67,7 +67,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -106,7 +106,7 @@

Check the output of running `custom_R_code` specified in the metadata file for a `dataset_id`

  • -
    metadata_check_custom_R_code(dataset_id)
    +
    metadata_check_custom_R_code(dataset_id)
    diff --git a/docs/reference/metadata_create_template.html b/docs/reference/metadata_create_template.html index 7efc3e63e..1febd917b 100644 --- a/docs/reference/metadata_create_template.html +++ b/docs/reference/metadata_create_template.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,11 +97,11 @@

    Create a template of file `metadata.yml` for specified `dataset_id`

  • -
    metadata_create_template(
    -  dataset_id,
    -  path = file.path("data", dataset_id),
    -  skip_manual = FALSE
    -)
    +
    metadata_create_template(
    +  dataset_id,
    +  path = file.path("data", dataset_id),
    +  skip_manual = FALSE
    +)
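A minimal usage sketch; the dataset_id is a placeholder, and skipping the interactive prompts via skip_manual is an assumption:

# Sketch only: scaffold data/<dataset_id>/metadata.yml for a new study
metadata_create_template("Smith_2020", skip_manual = TRUE)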
    diff --git a/docs/reference/metadata_exclude_observations.html b/docs/reference/metadata_exclude_observations.html index e40594570..724186976 100644 --- a/docs/reference/metadata_exclude_observations.html +++ b/docs/reference/metadata_exclude_observations.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Exclude observations in a yaml file for a dataset_id

  • -
    metadata_exclude_observations(dataset_id, variable, find, reason)
    +
    metadata_exclude_observations(dataset_id, variable, find, reason)
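A minimal usage sketch; the variable, value, and reason shown are illustrative:

# Sketch only: flag observations of a given taxon so they are excluded from the build
metadata_exclude_observations("Falster_2003", variable = "taxon_name",
  find = "Banksia Seratta", reason = "unverified record")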
    diff --git a/docs/reference/metadata_find_taxonomic_change.html b/docs/reference/metadata_find_taxonomic_change.html index 937850007..dd00d255d 100644 --- a/docs/reference/metadata_find_taxonomic_change.html +++ b/docs/reference/metadata_find_taxonomic_change.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Find taxonomic changes within the metadata yml files

  • -
    metadata_find_taxonomic_change(find, replace = NULL, studies = NULL)
    +
    metadata_find_taxonomic_change(find, replace = NULL, studies = NULL)
    diff --git a/docs/reference/metadata_path_dataset_id.html b/docs/reference/metadata_path_dataset_id.html index efdc4016c..45b6e2042 100644 --- a/docs/reference/metadata_path_dataset_id.html +++ b/docs/reference/metadata_path_dataset_id.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Path to the `metadata.yml` file for specified `dataset_id`

  • -
    metadata_path_dataset_id(dataset_id)
    +
    metadata_path_dataset_id(dataset_id)
    diff --git a/docs/reference/metadata_remove_taxonomic_change.html b/docs/reference/metadata_remove_taxonomic_change.html index a69019bf6..3c5b1378a 100644 --- a/docs/reference/metadata_remove_taxonomic_change.html +++ b/docs/reference/metadata_remove_taxonomic_change.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Remove a taxonomic change from a yaml file for a dataset_id

  • -
    metadata_remove_taxonomic_change(dataset_id, find, replace = NULL)
    +
    metadata_remove_taxonomic_change(dataset_id, find, replace = NULL)
    diff --git a/docs/reference/metadata_update_taxonomic_change.html b/docs/reference/metadata_update_taxonomic_change.html index 483de1633..9e61d8552 100644 --- a/docs/reference/metadata_update_taxonomic_change.html +++ b/docs/reference/metadata_update_taxonomic_change.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,13 +97,13 @@

    Update a taxonomic change into a yaml file for a dataset_id

  • -
    metadata_update_taxonomic_change(
    -  dataset_id,
    -  find,
    -  replace,
    -  reason,
    -  taxonomic_resolution
    -)
    +
    metadata_update_taxonomic_change(
    +  dataset_id,
    +  find,
    +  replace,
    +  reason,
    +  taxonomic_resolution
    +)
    diff --git a/docs/reference/metadata_user_select_column.html b/docs/reference/metadata_user_select_column.html index e03822130..f2dce611a 100644 --- a/docs/reference/metadata_user_select_column.html +++ b/docs/reference/metadata_user_select_column.html @@ -65,7 +65,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -103,7 +103,7 @@

    Select column by user

  • -
    metadata_user_select_column(column, choices)
    +
    metadata_user_select_column(column, choices)
    diff --git a/docs/reference/metadata_user_select_names.html b/docs/reference/metadata_user_select_names.html index 5f71c28c7..b627e5184 100644 --- a/docs/reference/metadata_user_select_names.html +++ b/docs/reference/metadata_user_select_names.html @@ -64,7 +64,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -101,7 +101,7 @@

    Select variable names by user

  • -
    metadata_user_select_names(title, vars)
    +
    metadata_user_select_names(title, vars)
    diff --git a/docs/reference/notes_random_string.html b/docs/reference/notes_random_string.html index a082d87e7..bc7116934 100644 --- a/docs/reference/notes_random_string.html +++ b/docs/reference/notes_random_string.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Create a string of random letters

  • -
    notes_random_string(n = 8)
    +
    notes_random_string(n = 8)
    diff --git a/docs/reference/notetaker_add_note.html b/docs/reference/notetaker_add_note.html index d4e9db092..79d3c0061 100644 --- a/docs/reference/notetaker_add_note.html +++ b/docs/reference/notetaker_add_note.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Add a note to the note recorder as a new row

  • -
    notetaker_add_note(notes, new_note)
    +
    notetaker_add_note(notes, new_note)
    diff --git a/docs/reference/notetaker_as_note.html b/docs/reference/notetaker_as_note.html index 062a90300..45ba06232 100644 --- a/docs/reference/notetaker_as_note.html +++ b/docs/reference/notetaker_as_note.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Create a tibble with two columns with note and link

  • -
    notetaker_as_note(note, link = NA_character_)
    +
    notetaker_as_note(note, link = NA_character_)
    diff --git a/docs/reference/notetaker_get_note.html b/docs/reference/notetaker_get_note.html index d48dd2200..1158eec24 100644 --- a/docs/reference/notetaker_get_note.html +++ b/docs/reference/notetaker_get_note.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Return a specific row from notes

  • -
    notetaker_get_note(notes, i = nrow(notes))
    +
    notetaker_get_note(notes, i = nrow(notes))
    diff --git a/docs/reference/notetaker_print_all.html b/docs/reference/notetaker_print_all.html index 17447014e..ebc67abe6 100644 --- a/docs/reference/notetaker_print_all.html +++ b/docs/reference/notetaker_print_all.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Print all notes

  • -
    notetaker_print_all(notes, ..., numbered = TRUE)
    +
    notetaker_print_all(notes, ..., numbered = TRUE)
    diff --git a/docs/reference/notetaker_print_note.html b/docs/reference/notetaker_print_note.html index 40a3e34c5..054ce9666 100644 --- a/docs/reference/notetaker_print_note.html +++ b/docs/reference/notetaker_print_note.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,12 +97,12 @@

    Print note (needs review?)

  • -
    notetaker_print_note(
    -  note,
    -  as_anchor = FALSE,
    -  anchor_text = "",
    -  link_text = "link"
    -)
    +
    notetaker_print_note(
    +  note,
    +  as_anchor = FALSE,
    +  anchor_text = "",
    +  link_text = "link"
    +)
    diff --git a/docs/reference/notetaker_print_notes.html b/docs/reference/notetaker_print_notes.html index 2cde701aa..fed736286 100644 --- a/docs/reference/notetaker_print_notes.html +++ b/docs/reference/notetaker_print_notes.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Print a specific row from notes

  • -
    notetaker_print_notes(notes, i = nrow(notes), ...)
    +
    notetaker_print_notes(notes, i = nrow(notes), ...)
    diff --git a/docs/reference/notetaker_start.html b/docs/reference/notetaker_start.html index cde73014c..35ca52b4a 100644 --- a/docs/reference/notetaker_start.html +++ b/docs/reference/notetaker_start.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Start note recorder (needs review?)

  • -
    notetaker_start()
    +
    notetaker_start()
    diff --git a/docs/reference/pipe.html b/docs/reference/pipe.html index 6facdcaba..ed4768d07 100644 --- a/docs/reference/pipe.html +++ b/docs/reference/pipe.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Pipe operator

  • -
    lhs %>% rhs
    +
    lhs %>% rhs
    diff --git a/docs/reference/process_add_all_columns.html b/docs/reference/process_add_all_columns.html index efd87ea16..6846cf67b 100644 --- a/docs/reference/process_add_all_columns.html +++ b/docs/reference/process_add_all_columns.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Add or remove columns of data

  • -
    process_add_all_columns(data, vars, add_error_column = TRUE)
    +
    process_add_all_columns(data, vars, add_error_column = TRUE)
    diff --git a/docs/reference/process_convert_units.html b/docs/reference/process_convert_units.html index 8ce96a314..f0ea65337 100644 --- a/docs/reference/process_convert_units.html +++ b/docs/reference/process_convert_units.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Convert units to desired type

  • -
    process_convert_units(data, definitions, unit_conversion_functions)
    +
    process_convert_units(data, definitions, unit_conversion_functions)
    diff --git a/docs/reference/process_create_observation_id.html b/docs/reference/process_create_observation_id.html index cd1265e4b..b8f1c256c 100644 --- a/docs/reference/process_create_observation_id.html +++ b/docs/reference/process_create_observation_id.html @@ -64,7 +64,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -101,7 +101,7 @@

    Create entity id

  • -
    process_create_observation_id(data)
    +
    process_create_observation_id(data)
    diff --git a/docs/reference/process_custom_code.html b/docs/reference/process_custom_code.html index 2206c87ce..8e668dea5 100644 --- a/docs/reference/process_custom_code.html +++ b/docs/reference/process_custom_code.html @@ -64,7 +64,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -101,7 +101,7 @@

    Apply custom data manipulations

  • -
    process_custom_code(txt)
    +
    process_custom_code(txt)
    diff --git a/docs/reference/process_flag_excluded_observations.html b/docs/reference/process_flag_excluded_observations.html index cda7e6748..fe7232cd0 100644 --- a/docs/reference/process_flag_excluded_observations.html +++ b/docs/reference/process_flag_excluded_observations.html @@ -64,7 +64,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -101,7 +101,7 @@

    Flag any excluded observations

  • -
    process_flag_excluded_observations(data, metadata)
    +
    process_flag_excluded_observations(data, metadata)
    diff --git a/docs/reference/process_flag_unsupported_traits.html b/docs/reference/process_flag_unsupported_traits.html index e41ba35c0..cb05de20f 100644 --- a/docs/reference/process_flag_unsupported_traits.html +++ b/docs/reference/process_flag_unsupported_traits.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Flag any unrecognised traits

  • -
    process_flag_unsupported_traits(data, definitions)
    +
    process_flag_unsupported_traits(data, definitions)
    diff --git a/docs/reference/process_flag_unsupported_values.html b/docs/reference/process_flag_unsupported_values.html index 20261f8b5..5fa2d79d4 100644 --- a/docs/reference/process_flag_unsupported_values.html +++ b/docs/reference/process_flag_unsupported_values.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Flag values outside of allowable range

  • -
    process_flag_unsupported_values(data, definitions)
    +
    process_flag_unsupported_values(data, definitions)
    diff --git a/docs/reference/process_format_contexts.html b/docs/reference/process_format_contexts.html index 3240ce1c9..f5f23c37a 100644 --- a/docs/reference/process_format_contexts.html +++ b/docs/reference/process_format_contexts.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Format context data from list to tibble

  • -
    process_format_contexts(my_list, dataset_id)
    +
    process_format_contexts(my_list, dataset_id)
    @@ -119,9 +119,9 @@

    Value

    Examples

    -
    if (FALSE) {
    -process_format_contexts(read_metadata("data/Apgaua_2017/metadata.yml")$context)
    -}
    +    
    if (FALSE) {
    +process_format_contexts(read_metadata("data/Apgaua_2017/metadata.yml")$context)
    +}
     
    diff --git a/docs/reference/process_format_contributors.html b/docs/reference/process_format_contributors.html index 4442b161d..fbe507bdd 100644 --- a/docs/reference/process_format_contributors.html +++ b/docs/reference/process_format_contributors.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Format contributors from list into tibble

  • -
    process_format_contributors(my_list, dataset_id, schema)
    +
    process_format_contributors(my_list, dataset_id, schema)
    @@ -123,9 +123,9 @@

    Value

    Examples

    -
    if (FALSE) {
    -process_format_contributors(read_metadata("data/Falster_2003/metadata.yml")$contributors)
    -}
    +    
    if (FALSE) {
    +process_format_contributors(read_metadata("data/Falster_2003/metadata.yml")$contributors)
    +}
     
    diff --git a/docs/reference/process_format_locations.html b/docs/reference/process_format_locations.html index 0dfabb69d..d8a07acd9 100644 --- a/docs/reference/process_format_locations.html +++ b/docs/reference/process_format_locations.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Format location data from list to tibble

  • -
    process_format_locations(my_list, dataset_id, schema)
    +
    process_format_locations(my_list, dataset_id, schema)
    @@ -123,9 +123,9 @@

    Value

    Examples

    -
    if (FALSE) {
    -process_format_locations(read_metadata("data/Falster_2003/metadata.yml")$locations, "Falster_2003")
    -}
    +    
    if (FALSE) {
    +process_format_locations(read_metadata("data/Falster_2003/metadata.yml")$locations, "Falster_2003")
    +}
     
    diff --git a/docs/reference/process_generate_id.html b/docs/reference/process_generate_id.html index 1c1928a75..eb335a396 100644 --- a/docs/reference/process_generate_id.html +++ b/docs/reference/process_generate_id.html @@ -65,7 +65,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -102,7 +102,7 @@

Function to generate a sequence of integer ids from a vector of names

  • -
    process_generate_id(x, prefix, sort = FALSE)
    +
    process_generate_id(x, prefix, sort = FALSE)
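A minimal usage sketch; the input vector and prefix are illustrative:

# Sketch only: generate integer ids from a vector of names
process_generate_id(c("site B", "site A", "site B"), prefix = "loc_")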
    diff --git a/docs/reference/process_parse_data.html b/docs/reference/process_parse_data.html index 115d41f8a..82c7f73d1 100644 --- a/docs/reference/process_parse_data.html +++ b/docs/reference/process_parse_data.html @@ -65,7 +65,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -103,7 +103,7 @@

    Process a single dataset

  • -
    process_parse_data(data, dataset_id, metadata, contexts)
    +
    process_parse_data(data, dataset_id, metadata, contexts)
    diff --git a/docs/reference/process_standardise_names.html b/docs/reference/process_standardise_names.html index 3004c6857..c04b4b0b8 100644 --- a/docs/reference/process_standardise_names.html +++ b/docs/reference/process_standardise_names.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Standardise species names

  • -
    process_standardise_names(x)
    +
    process_standardise_names(x)
    diff --git a/docs/reference/process_taxonomic_updates.html b/docs/reference/process_taxonomic_updates.html index 24d2f90f2..e99f775e9 100644 --- a/docs/reference/process_taxonomic_updates.html +++ b/docs/reference/process_taxonomic_updates.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Apply taxonomic updates

  • -
    process_taxonomic_updates(data, metadata)
    +
    process_taxonomic_updates(data, metadata)
    diff --git a/docs/reference/process_unit_conversion_name.html b/docs/reference/process_unit_conversion_name.html index 9ecc1f6b6..efc4713d3 100644 --- a/docs/reference/process_unit_conversion_name.html +++ b/docs/reference/process_unit_conversion_name.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Generate unit conversion name

  • -
    process_unit_conversion_name(from, to)
    +
    process_unit_conversion_name(from, to)
    diff --git a/docs/reference/read_csv_char.html b/docs/reference/read_csv_char.html index d4b86a1f7..f81de1d23 100644 --- a/docs/reference/read_csv_char.html +++ b/docs/reference/read_csv_char.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Read in a csv as a tibble with column types as characters

  • -
    read_csv_char(...)
    +
    read_csv_char(...)
    diff --git a/docs/reference/read_metadata.html b/docs/reference/read_metadata.html index 3f6339f02..96b3d4a41 100644 --- a/docs/reference/read_metadata.html +++ b/docs/reference/read_metadata.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Read in a metadata.yml file for a study

  • -
    read_metadata(path)
    +
    read_metadata(path)
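A minimal usage sketch, using a path of the form shown elsewhere in these docs:

# Sketch only: read a study's metadata.yml into a named list
metadata <- read_metadata("data/Falster_2003/metadata.yml")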
    diff --git a/docs/reference/read_metadata_dataset.html b/docs/reference/read_metadata_dataset.html index a98059a28..2fe7be89a 100644 --- a/docs/reference/read_metadata_dataset.html +++ b/docs/reference/read_metadata_dataset.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Read the `metadata.yml` file for specified `dataset_id`

  • -
    read_metadata_dataset(dataset_id)
    +
    read_metadata_dataset(dataset_id)
    diff --git a/docs/reference/read_yaml.html b/docs/reference/read_yaml.html index 1a531a85d..d9de5e3fc 100644 --- a/docs/reference/read_yaml.html +++ b/docs/reference/read_yaml.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • diff --git a/docs/reference/util_append_to_list.html b/docs/reference/util_append_to_list.html index 89bfa63b8..e37674343 100644 --- a/docs/reference/util_append_to_list.html +++ b/docs/reference/util_append_to_list.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Add an item to the end of a list

  • -
    util_append_to_list(my_list, to_append)
    +
    util_append_to_list(my_list, to_append)
    @@ -119,9 +119,9 @@

    Value

    Examples

    -
     if (FALSE) {
    -util_append_to_list(as.list(iris)[c(1,2)], as.list(iris)[c(3,4)])
    -}
    +    
     if (FALSE) {
    +util_append_to_list(as.list(iris)[c(1,2)], as.list(iris)[c(3,4)])
    +}
     
    diff --git a/docs/reference/util_bib_to_list.html b/docs/reference/util_bib_to_list.html index d3dfb540f..420d66d55 100644 --- a/docs/reference/util_bib_to_list.html +++ b/docs/reference/util_bib_to_list.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Convert BibEntry object to a list

  • -
    util_bib_to_list(bib)
    +
    util_bib_to_list(bib)
    diff --git a/docs/reference/util_check_all_values_in.html b/docs/reference/util_check_all_values_in.html index 808413bdc..3a2f7e41e 100644 --- a/docs/reference/util_check_all_values_in.html +++ b/docs/reference/util_check_all_values_in.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Check values in one vector against values in another vector

  • -
    util_check_all_values_in(x, y, sep = " ")
    +
    util_check_all_values_in(x, y, sep = " ")
    diff --git a/docs/reference/util_df_convert_character.html b/docs/reference/util_df_convert_character.html index 85fb1bd97..97f001d03 100644 --- a/docs/reference/util_df_convert_character.html +++ b/docs/reference/util_df_convert_character.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Convert all columns in data frame to character

  • -
    util_df_convert_character(df)
    +
    util_df_convert_character(df)
    @@ -115,7 +115,7 @@

    Value

    Examples

    -
    lapply(austraits.build:::util_df_convert_character(iris), class) 
    +    
    lapply(austraits.build:::util_df_convert_character(iris), class) 
     #> $Sepal.Length
     #> [1] "character"
     #> 
    diff --git a/docs/reference/util_df_to_list.html b/docs/reference/util_df_to_list.html
    index 8e65f01e1..db09ee9ec 100644
    --- a/docs/reference/util_df_to_list.html
    +++ b/docs/reference/util_df_to_list.html
    @@ -62,7 +62,7 @@
           Adding new datasets into `AusTraits`
         
         
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Convert dataframe to list

  • -
    util_df_to_list(df)
    +
    util_df_to_list(df)
    @@ -115,7 +115,7 @@

    Value

    Examples

    -
    util_df_to_list(iris)
    +    
    util_df_to_list(iris)
     #> [[1]]
     #> [[1]]$Sepal.Length
     #> [1] 5.1
    diff --git a/docs/reference/util_extract_list_element.html b/docs/reference/util_extract_list_element.html
    index 8031acb04..e0bceb208 100644
    --- a/docs/reference/util_extract_list_element.html
    +++ b/docs/reference/util_extract_list_element.html
    @@ -62,7 +62,7 @@
           Adding new datasets into `AusTraits`
         
         
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Extract a trait element from the definitions$traits$elements

  • -
    util_extract_list_element(i, my_list, var)
    +
    util_extract_list_element(i, my_list, var)
    @@ -123,9 +123,9 @@

    Value

    Examples

    -
    if (FALSE) {
    -util_extract_list_element(1, definitions$traits$elements, "units")
    -}
    +    
    if (FALSE) {
    +util_extract_list_element(1, definitions$traits$elements, "units")
    +}
     
    diff --git a/docs/reference/util_get_SHA.html b/docs/reference/util_get_SHA.html index 661a731b1..d88be3678 100644 --- a/docs/reference/util_get_SHA.html +++ b/docs/reference/util_get_SHA.html @@ -64,7 +64,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -101,7 +101,7 @@

    Get SHA string from Github repository for latest commit

  • -
    util_get_SHA(path = ".")
    +
    util_get_SHA(path = ".")
    diff --git a/docs/reference/util_get_version.html b/docs/reference/util_get_version.html index 9712cf8a7..f6ab9fdfa 100644 --- a/docs/reference/util_get_version.html +++ b/docs/reference/util_get_version.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Retrieve version for compilation from definitions

  • -
    util_get_version(path = "config/metadata.yml")
    +
    util_get_version(path = "config/metadata.yml")
    diff --git a/docs/reference/util_kable_styling_html.html b/docs/reference/util_kable_styling_html.html index 9af310bbb..161df71e0 100644 --- a/docs/reference/util_kable_styling_html.html +++ b/docs/reference/util_kable_styling_html.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Format table with kable and default styling for html

  • -
    util_kable_styling_html(...)
    +
    util_kable_styling_html(...)
    diff --git a/docs/reference/util_list_to_bib.html b/docs/reference/util_list_to_bib.html index 44b790371..c722674ef 100644 --- a/docs/reference/util_list_to_bib.html +++ b/docs/reference/util_list_to_bib.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Convert a list of elements into a BibEntry object

  • -
    util_list_to_bib(ref)
    +
    util_list_to_bib(ref)
    diff --git a/docs/reference/util_list_to_df1.html b/docs/reference/util_list_to_df1.html index c87b505bf..307122dfa 100644 --- a/docs/reference/util_list_to_df1.html +++ b/docs/reference/util_list_to_df1.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Convert a list with single entries to dataframe

  • -
    util_list_to_df1(my_list)
    +
    util_list_to_df1(my_list)
    @@ -115,9 +115,9 @@

    Value

    Examples

    -
    if (FALSE) {
    -util_list_to_df1(as.list(iris)[2])
    -}
    +    
    if (FALSE) {
    +util_list_to_df1(as.list(iris)[2])
    +}
     
    diff --git a/docs/reference/util_list_to_df2.html b/docs/reference/util_list_to_df2.html index 92057d5cd..082f0901c 100644 --- a/docs/reference/util_list_to_df2.html +++ b/docs/reference/util_list_to_df2.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Convert a list of lists to dataframe

  • -
    util_list_to_df2(my_list, as_character = TRUE, on_empty = NA)
    +
    util_list_to_df2(my_list, as_character = TRUE, on_empty = NA)
    @@ -117,7 +117,7 @@

    Arguments

    Examples

    -
    util_list_to_df2(util_df_to_list(iris))
    +    
    util_list_to_df2(util_df_to_list(iris))
     #> # A tibble: 150 × 5
     #>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
     #>    <chr>        <chr>       <chr>        <chr>       <chr>  
    diff --git a/docs/reference/util_replace_null.html b/docs/reference/util_replace_null.html
    index 73f40d3fc..cf6de53f0 100644
    --- a/docs/reference/util_replace_null.html
    +++ b/docs/reference/util_replace_null.html
    @@ -63,7 +63,7 @@
           Adding new datasets into `AusTraits`
         
         
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Convert NULL values to a different value

  • -
    util_replace_null(x, val = NA)
    +
    util_replace_null(x, val = NA)
    @@ -121,9 +121,9 @@

    Value

    Examples

    -
    if (FALSE) {
    -util_replace_null(NULL)
    -}
    +    
    if (FALSE) {
    +util_replace_null(NULL)
    +}
     
    diff --git a/docs/reference/util_separate_and_sort.html b/docs/reference/util_separate_and_sort.html index d8f810ee8..60ffc644f 100644 --- a/docs/reference/util_separate_and_sort.html +++ b/docs/reference/util_separate_and_sort.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Split and sort cells with multiple values

  • -
    util_separate_and_sort(x, sep = " ")
    +
    util_separate_and_sort(x, sep = " ")
    @@ -121,7 +121,7 @@

    Value

    Examples

    -
    if (FALSE) util_separate_and_sort("z y x")
    +    
    if (FALSE) util_separate_and_sort("z y x")
     
    diff --git a/docs/reference/util_standardise_doi.html b/docs/reference/util_standardise_doi.html index 691bfc80c..8a2fceb64 100644 --- a/docs/reference/util_standardise_doi.html +++ b/docs/reference/util_standardise_doi.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

Standardise doi into form https://doi.org/XXX

  • -
    util_standardise_doi(doi)
    +
    util_standardise_doi(doi)
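A minimal usage sketch; the DOI shown is a placeholder:

# Sketch only: standardise a bare DOI into the https://doi.org/... form
util_standardise_doi("10.1000/xyz123")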
    diff --git a/docs/reference/util_strip_taxon_names.html b/docs/reference/util_strip_taxon_names.html index f74a5d6cc..1281f8a0b 100644 --- a/docs/reference/util_strip_taxon_names.html +++ b/docs/reference/util_strip_taxon_names.html @@ -63,7 +63,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -99,7 +99,7 @@

    Strip scientific names of formatting and abbreviations

  • -
    util_strip_taxon_names(x)
    +
    util_strip_taxon_names(x)
    @@ -117,7 +117,7 @@

    Value

    Examples

    -
    c("Bankisa_serrata", "bankisa  serrata", "Banksia Seratta") %>% util_strip_taxon_names()
    +    
    c("Bankisa_serrata", "bankisa  serrata", "Banksia Seratta") %>% util_strip_taxon_names()
     #> [1] "bankisa serrata" "bankisa serrata" "banksia seratta"
     
    diff --git a/docs/reference/write_metadata.html b/docs/reference/write_metadata.html index 66d9e2637..f337043e3 100644 --- a/docs/reference/write_metadata.html +++ b/docs/reference/write_metadata.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Write metadata.yml for a study

  • -
    write_metadata(data, path, style_code = FALSE)
    +
    write_metadata(data, path, style_code = FALSE)
    @@ -117,11 +117,11 @@

    Arguments

    Examples

    -
    if (FALSE) {
    -f <- "data/Falster_2003/metadata.yml"
    -data <- read_metadata(f)
    -write_metadata(data, f)
    -}
    +    
    if (FALSE) {
    +f <- "data/Falster_2003/metadata.yml"
    +data <- read_metadata(f)
    +write_metadata(data, f)
    +}
     
    diff --git a/docs/reference/write_metadata_dataset.html b/docs/reference/write_metadata_dataset.html index 10123ce9d..d5f043ad9 100644 --- a/docs/reference/write_metadata_dataset.html +++ b/docs/reference/write_metadata_dataset.html @@ -65,7 +65,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -102,7 +102,7 @@

Write the YAML representation of metadata.yml for specified `dataset_id` to file

  • -
    write_metadata_dataset(metadata, dataset_id)
    +
    write_metadata_dataset(metadata, dataset_id)
    diff --git a/docs/reference/write_plaintext.html b/docs/reference/write_plaintext.html index 85b3ad82a..93b4b9bd8 100644 --- a/docs/reference/write_plaintext.html +++ b/docs/reference/write_plaintext.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • @@ -97,7 +97,7 @@

    Export AusTraits version as plain text

  • -
    write_plaintext(austraits, path)
    +
    write_plaintext(austraits, path)
    diff --git a/docs/reference/write_yaml.html b/docs/reference/write_yaml.html index a295046dd..aec5819fb 100644 --- a/docs/reference/write_yaml.html +++ b/docs/reference/write_yaml.html @@ -62,7 +62,7 @@ Adding new datasets into `AusTraits`
  • - Docker for reproducible compute environment + Docker for reproducible compute environment
  • diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 393c7c305..820c2b8f4 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -3,6 +3,18 @@ http://traitecoevo.github.io/austraits.build/404.html + + http://traitecoevo.github.io/austraits.build/CODE_OF_CONDUCT.html + + + http://traitecoevo.github.io/austraits.build/CONTRIBUTING.html + + + http://traitecoevo.github.io/austraits.build/ISSUE_TEMPLATE.html + + + http://traitecoevo.github.io/austraits.build/LICENSE-text.html + http://traitecoevo.github.io/austraits.build/articles/adding_data.html @@ -22,7 +34,7 @@ http://traitecoevo.github.io/austraits.build/articles/contributing_data.html - http://traitecoevo.github.io/austraits.build/articles/Docker.html + http://traitecoevo.github.io/austraits.build/articles/docker.html http://traitecoevo.github.io/austraits.build/articles/index.html @@ -33,21 +45,9 @@ http://traitecoevo.github.io/austraits.build/authors.html - - http://traitecoevo.github.io/austraits.build/CODE_OF_CONDUCT.html - - - http://traitecoevo.github.io/austraits.build/CONTRIBUTING.html - http://traitecoevo.github.io/austraits.build/index.html - - http://traitecoevo.github.io/austraits.build/ISSUE_TEMPLATE.html - - - http://traitecoevo.github.io/austraits.build/LICENSE-text.html - http://traitecoevo.github.io/austraits.build/news/index.html
  • @@ -749,8 +668,7 @@
    Contributors -ORCID ID (Open Researcher and Contributor ID) for the data collector, if -available. +ORCID ID (Open Researcher and Contributor ID) for the data collector, if available.