forcis: An R client to access the FORCIS database #660

Open
17 of 29 tasks
ahasverus opened this issue Sep 28, 2024 · 10 comments
@ahasverus

ahasverus commented Sep 28, 2024

Submitting Author Name: Nicolas Casajus
Submitting Author Github Handle: @ahasverus
Other Package Authors Github handles: (comma separated, delete if none) @MatGreco90, @ChaabaneS, @xgiraud
Repository: https://github.com/frbcesab/forcis
Version submitted: 0.1.0
Submission type: Standard
Editor: @beatrizmilz
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: forcis
Type: Package
Title: An R Client to Access the FORCIS Database
Version: 0.1.0
Authors@R: c(
    person(given   = "Nicolas",
           family  = "Casajus",
           role    = c("aut", "cre", "cph"),
           email   = "[email protected]",
           comment = c(ORCID = "0000-0002-5537-5294")),
    person(given   = "Mattia",
           family  = "Greco",
           role    = "aut",
           email   = "[email protected]",
           comment = c(ORCID = "0000-0003-2416-6235")),
    person(given   = "Sonia",
           family  = "Chaabane",
           role    = "aut",
           email   = "[email protected]",
           comment = c(ORCID = "0000-0002-4653-8610")),
    person(given   = "Xavier",
           family  = "Giraud",
           role    = "aut",
           email   = "[email protected]",
           comment = c(ORCID = "0000-0001-5067-8176")),
    person(given   = "Thibault",
           family  = "de Garidel-Thoron",
           role    = "aut",
           email   = "[email protected]",
           comment = c(ORCID = "0000-0001-8983-9571")),
    person(given   = "Khalil",
           family  = "Hammami",
           role    = "ctb",
           email   = "[email protected]"))
Description: Provides an interface to the FORCIS database 
    (<https://zenodo.org/doi/10.5281/zenodo.7390791>) on global foraminifera
    distribution. This package allows to download and to handle FORCIS data.
    It is part of the FRB-CESAB working group FORCIS.
    <https://www.fondationbiodiversite.fr/en/the-frb-in-action/programs-and-projects/le-cesab/forcis/>.
URL: https://frbcesab.github.io/forcis
BugReports: https://github.com/FRBCesab/forcis/issues
License: GPL (>= 2)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Depends: 
    R (>= 2.10)
Imports: 
    dplyr,
    ggplot2,
    jsonlite,
    rlang,
    sf,
    tidyr,
    utils,
    vroom
Suggests: 
    fs,
    knitr,
    rmarkdown,
    testthat (>= 3.0.0),
    withr
Config/testthat/edition: 3

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

This package is designed to download the FORCIS data files hosted on Zenodo. It includes functions to download (data retrieval), select, filter, reshape, and visualize data (data munging).

  • Who is the target audience and what are scientific applications of this package?

This package should be of interest to scientists working on Foraminifera species distribution and interested in the FORCIS database (spatial analyses, time series analyses, etc.). The package has been developed to facilitate data wrangling, avoid some pitfalls, and easily get data ready to be analyzed or visualized.

No other package exists to handle the FORCIS database. Note that we are authors of the database and already published a Data paper describing the database.

Not applicable.

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Pre-submission inquiry: #655
Editor: @adamhsparks

  • Explain reasons for any pkgcheck items which your package is unable to pass.

The function pkgcheck::pkgcheck() returns the following report:

── forcis 0.1.0 ────────────────────────────────────────────

✔ Package name is available
✔ has a 'codemeta.json' file.
✔ has a 'contributing' file.
✔ uses 'roxygen2'.
✔ 'DESCRIPTION' has a URL field.
✔ 'DESCRIPTION' has a BugReports field.
✔ Package has at least one HTML vignette
✔ All functions have examples.
✔ Package has continuous integration checks.
✔ Package coverage is 97.4%.
✔ R CMD check found no errors.
✔ R CMD check found no warnings.

ℹ Current status:
✔ This package may be submitted.

The package goodpractice returns warnings:

  • Write unit tests: some functions are difficult to test (HTTP requests)
  • Avoid calling setwd(): this function is used in unit tests in combination with withr::defer()

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@ropensci-review-bot
Collaborator

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for forcis (v0.1.0)

git hash: e80b91c5

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 97.4%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Package License: GPL (>= 2)


1. Package Dependencies

Details of Package Dependency Usage

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

| type | package | ncalls |
|---|---|---|
| internal | base | 105 |
| internal | forcis | 66 |
| internal | graphics | 3 |
| imports | utils | 56 |
| imports | sf | 12 |
| imports | vroom | 10 |
| imports | jsonlite | 3 |
| imports | dplyr | NA |
| imports | ggplot2 | NA |
| imports | rlang | NA |
| imports | tidyr | NA |
| suggests | fs | NA |
| suggests | knitr | NA |
| suggests | rmarkdown | NA |
| suggests | testthat | NA |
| suggests | withr | NA |
| linking_to | NA | NA |

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

file.path (11), c (7), which (7), data.frame (6), for (6), seq_len (6), suppressWarnings (6), length (5), list.files (5), rbind (5), unique (5), as.numeric (4), colnames (3), tryCatch (3), url (3), drop (2), file (2), format (2), lapply (2), months (2), paste0 (2), strsplit (2), unlist (2), as.Date (1), gsub (1), nrow (1), options (1), readline (1), readLines (1), which.max (1)

forcis

get_species_names (11), data_to_sf (5), species_list (5), get_available_versions (3), get_metadata (3), cpr_north_filename (2), cpr_south_filename (2), get_current_version (2), get_latest_version (2), add_data_type (1), check_field_in_data (1), check_if_character (1), check_if_df (1), check_if_path_exists (1), check_if_valid_taxonomy (1), check_required_columns (1), check_unique_taxonomy (1), check_version (1), compute_abundances (1), compute_concentrations (1), compute_frequencies (1), convert_to_long_format (1), crs_robinson (1), data_types (1), date_format (1), download_file (1), download_forcis_db (1), filter_by_bbox (1), filter_by_month (1), filter_by_ocean (1), filter_by_polygon (1), filter_by_species (1), filter_by_year (1), geom_basemap (1), get_data_type (1), get_required_columns (1), get_version_metadata (1), plankton_net_filename (1), pump_filename (1), sediment_trap_filename (1)

utils

data (55), download.file (1)

sf

st_intersects (6), st_bbox (3), st_crs (2), st_as_sf (1)

vroom

vroom (10)

graphics

polygon (3)

jsonlite

read_json (3)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties

The package has:

  • code in R (100% in 33 files) and
  • 5 authors
  • 6 vignettes
  • no internal data file
  • 8 imported packages
  • 31 exported functions (median 27 lines of code)
  • 81 non-exported functions in R (median 16 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 33 | 90.1 | |
| files_vignettes | 6 | 96.8 | |
| files_tests | 35 | 98.0 | |
| loc_R | 1549 | 76.5 | |
| loc_vignettes | 418 | 71.2 | |
| loc_tests | 1200 | 86.4 | |
| num_vignettes | 6 | 97.7 | TRUE |
| n_fns_r | 112 | 77.5 | |
| n_fns_r_exported | 31 | 78.2 | |
| n_fns_r_not_exported | 81 | 77.6 | |
| n_fns_per_file_r | 2 | 31.7 | |
| num_params_per_fn | 2 | 8.2 | |
| loc_per_fn_r | 19 | 57.7 | |
| loc_per_fn_r_exp | 27 | 58.8 | |
| loc_per_fn_r_not_exp | 16 | 53.2 | |
| rel_whitespace_R | 48 | 92.5 | |
| rel_whitespace_vignettes | 68 | 89.5 | |
| rel_whitespace_tests | 55 | 95.6 | TRUE |
| doclines_per_fn_exp | 38 | 46.8 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 157 | 84.7 | |



3. goodpractice and other checks

Details of goodpractice checks

3a. Continuous Integration Badges

R-CMD-check.yaml
pkgdown.yaml

GitHub Workflow Results

| id | name | conclusion | sha | run_number | date |
|---|---|---|---|---|---|
| 11082922089 | pages build and deployment | success | 702763 | 164 | 2024-09-28 |
| 11082912111 | pkgdown | success | e80b91 | 199 | 2024-09-28 |
| 11082809930 | R CMD Check | success | e80b91 | 198 | 2024-09-28 |
| 11082809923 | Test coverage | success | e80b91 | 94 | 2024-09-28 |
| 11082912112 | Update CITATION.cff | success | e80b91 | 23 | 2024-09-28 |

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fail:

  1. no_import_package_as_a_whole

Test coverage with covr

Package coverage: 97.38

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found the following 7 potential issues:

| message | number of times |
|---|---|
| Avoid changing the working directory, or restore it in on.exit | 2 |
| Avoid library() and require() calls in packages | 5 |
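
The first lintr message refers to the usual pattern of restoring the working directory with `on.exit()`. The sketch below illustrates that pattern in base R only. The helper `run_in_dir()` is hypothetical and is not part of forcis, whose tests (per the submission) pair `setwd()` with `withr::defer()` instead.

```r
# Hypothetical helper (not from forcis): call `fn` with `dir` as the working
# directory, and restore the previous directory even if `fn` fails.
run_in_dir <- function(dir, fn) {
  old <- setwd(dir)                 # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)   # restore on normal exit or on error
  fn()
}

before <- getwd()

# Work in a throwaway directory so nothing is written to the project.
tmp <- tempfile("wd-demo-")
dir.create(tmp)

files_seen <- run_in_dir(tmp, function() {
  file.create("example.csv")
  list.files()
})

after <- getwd()
unlink(tmp, recursive = TRUE)
```

After the call, `before` and `after` are the same path, which is the property the lint rule is asking for.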


Package Versions

| package | version |
|---|---|
| pkgstats | 0.1.6.17 |
| pkgcheck | 0.1.2.58 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

@adamhsparks
Member

@ropensci-review-bot assign @beatrizmilz as editor

@ropensci-review-bot
Collaborator

Assigned! @beatrizmilz is now the editor

@beatrizmilz

Hi @ahasverus!
I'm Beatriz, and I'll be the editor for your submission. 👋

@beatrizmilz

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
    • Is the case for the package well made?
    • Is the reference index page clear (grouped by topic if necessary)?
    • Are vignettes readable, sufficiently detailed and not just perfunctory?
  • Fit: The package meets criteria for fit and overlap.
  • Installation instructions: Are installation instructions clear enough for human users?
  • Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
  • Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
  • License: The package has a CRAN or OSI accepted license.
  • Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled? (there are no open issues)

Editor comments

Hi @ahasverus! Congratulations to you and the team on this great package, and also on the [publication of the data paper on Nature](https://www.nature.com/articles/s41597-023-02264-2).

The comments below are related to the Editor Checks above.

A good practice to ensure that the user works with the latest version of the database might be to add this line at the beginning of the script:

download_forcis_db(version = NULL, ...)

  • (2) In the vignette Select, reshape, and filter data: it's not clear to me what the required columns mean. I saw the page for the function get_required_columns(), and there is a list of columns. But could it be possible to describe a bit more? Why are they required? Is that because these columns are the most important for basic analysis?

  • (3) In the vignettes (for example Select, reshape, and filter data): The names of the sections are the names of functions. I think it's best to name sections with a description of the task.
    You can see some examples in vignettes of other packages in rOpenSci: magick, tabulapdf, etc.

For example:

How it is now:

On this page

    Setup
    select_taxonomy()
    select_forcis_columns()
    filter_by_month()
    filter_by_year()
    filter_by_bbox()
    filter_by_ocean()
    filter_by_polygon()
    filter_by_species()
    convert_to_long_format() 

Idea: (this is just an example)


    Setup
    Selecting columns
       Selecting columns by taxonomy
       Selecting required columns 
    Filtering rows
       Filter by month of data collection
       Filter by year of data collection
       Filter by location (bounding box)
       Filter by ocean
       Filter by polygon
       Filter by species
    Reshaping
       Convert to long format

  • (4) About tests for functions that create plots, the dev guide says:

For testing your functions creating plots, we suggest using vdiffr, an extension of the testthat package that relies on testthat snapshot tests.

From what I checked in some tests (eg. test-plot_record_by_month.R), the tests verify the class of the plot created.
Could you improve the tests for the functions that create plots using vdiffr?
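
As a rough illustration of the request above, a vdiffr-based test could look like the sketch below. The helper `plot_counts_by_month()`, its columns, and the `records` data frame are hypothetical stand-ins, not forcis functions or FORCIS data. Only `vdiffr::expect_doppelganger()` and the testthat functions are real API.

```r
library(testthat)

# Hypothetical plotting helper standing in for a package plot function;
# the name and the `month` / `n` columns are illustrative only.
plot_counts_by_month <- function(data) {
  ggplot2::ggplot(data, ggplot2::aes(x = month, y = n)) +
    ggplot2::geom_col()
}

# Small made-up dataset, used only to draw the example plot.
records <- data.frame(
  month = factor(c("Jan", "Feb", "Mar"), levels = c("Jan", "Feb", "Mar")),
  n     = c(12, 7, 19)
)

test_that("monthly record plot matches its reference snapshot", {
  skip_if_not_installed("ggplot2")
  skip_if_not_installed("vdiffr")

  p <- plot_counts_by_month(records)

  # Class check, similar to what the current tests verify.
  expect_true(inherits(p, "ggplot"))

  # Snapshot check: vdiffr renders the plot to SVG and, when run through
  # the package test suite, compares it with the stored reference figure.
  vdiffr::expect_doppelganger("records by month", p)
})
```

When run as part of the package test suite, the first run records the reference SVG and later runs fail if the rendered figure changes, which a class check alone would not detect.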


  • (5) About testing the download functions: I looked at the tests for the download function, but I'm not sure whether they follow best practices. I need some time to read more about testing functions that access resources on the web (e.g. the dev guide recommends the book HTTP testing in R). As you wrote in the submission, "some functions are difficult to test (HTTP requests)". I'll be back with that feedback soon!

  • (6) This is more of a question. The package webpage says that the package has been developed for the Centre for the Synthesis and Analysis of Biodiversity. Does this mean that they are the funder? If so, they can be added as fnd in the authors list. This post can be useful to understand the three-letter codes used in the authors list in DESCRIPTION.
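
On comment (5), one generic way to test download code without network access is to pass the downloader in as an argument and substitute a fake in tests. The sketch below uses base R only. It is not the approach prescribed by the dev guide or by HTTP testing in R, and `fetch_file()`, `fake_downloader()`, and the example URL are hypothetical, not forcis code.

```r
# Hypothetical wrapper: the real downloader is the default, but tests can
# inject a replacement through the `downloader` argument.
fetch_file <- function(url, destfile, downloader = utils::download.file) {
  status <- tryCatch(
    downloader(url, destfile, quiet = TRUE),
    error = function(e) {
      stop("Download failed for ", url, ": ", conditionMessage(e),
           call. = FALSE)
    }
  )
  # utils::download.file() returns 0 on success.
  if (!identical(as.integer(status), 0L)) {
    stop("Download returned a non-zero status for ", url, call. = FALSE)
  }
  invisible(destfile)
}

# Fake downloader for tests: writes a tiny CSV instead of using the network.
fake_downloader <- function(url, destfile, ...) {
  writeLines(c("id,value", "1,10"), destfile)
  0L
}

# Fake downloader that simulates a network failure.
failing_downloader <- function(url, destfile, ...) {
  stop("network unreachable")
}

dest <- tempfile(fileext = ".csv")
fetch_file("https://example.org/data.csv", dest, downloader = fake_downloader)
content <- readLines(dest)

err <- tryCatch(
  fetch_file("https://example.org/data.csv", tempfile(),
             downloader = failing_downloader),
  error = function(e) conditionMessage(e)
)
```

The trade-off is that the fake never exercises the real HTTP layer, so recorded-response tools such as those covered in HTTP testing in R remain the more thorough option.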

@ahasverus
Author

Hi @beatrizmilz!

Thank you for this first round of comments. I will start looking into them in the next few days and come back to you as soon as possible.

@beatrizmilz

> Hi @beatrizmilz!
>
> Thank you for this first round of comments. I will start looking into them in the next few days and come back to you as soon as possible.

Hi! I hope the comments are helpful.
Most of the comments are suggestions, so feel free to work on the ones that make sense to you.

The comments about testing are the most important, since they are recommendations from the dev guide!

@ahasverus
Author

Hi @beatrizmilz!

No, all your comments are very useful.
I have started to work on some of them.


> A good practice to ensure that the user works with the latest version of the database might be to add this line at the beginning of the script:
>
> download_forcis_db(version = NULL, ...)

Answer: Thanks for reporting the lack of clarity of this section. I have added a few sentences to clarify this paragraph in the vignette Database versions (commit 3ab5baa). I hope it's clearer.


> (2) In the vignette Select, reshape, and filter data: it's not clear to me what the required columns mean. I saw the page for the function get_required_columns(), and there is a list of columns. But could it be possible to describe a bit more? Why are they required? Is that because these columns are the most important for basic analysis?

Answer: I have added a few sentences to explain why these columns are required in the vignette Select, reshape, and filter data and in the documentation of the function get_required_columns() (commit 06a053b).


> (3) In the vignettes (for example Select, reshape, and filter data): The names of the sections are the names of functions. I think it's best to name sections with a description of the task.
> You can see some examples in vignettes of other packages in rOpenSci: magick, tabulapdf, etc.

Answer: Thanks for this suggestion. I have modified the section names in the vignettes Select, reshape, and filter data and Data visualization (commit 81db6ae).


> (6) This is more of a question. In the package webpage, it says that the package has been developed for the Centre for the Synthesis and Analysis of Biodiversity. Does this mean that they are the funder? If so, they can be added as fnd in the authors list. This post can be useful to understand these three-letter-code used in the authors list on DESCRIPTION.

Answer: Indeed, the research group FORCIS has been funded by the FRB-CESAB. I have added it to the DESCRIPTION file with the role fnd (commit 6bd6e2b).
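
For readers unfamiliar with the `fnd` role, a funder entry in `Authors@R` is an ordinary `person()` call. The sketch below shows what such an entry might look like. The exact wording used in commit 6bd6e2b is not shown in this thread, so the `given` value here is an assumption.

```r
# Assumption: the funder is recorded as an organisation name in `given`
# with the MARC relator code "fnd" (funder). The real DESCRIPTION entry
# may differ.
funder <- utils::person(given = "FRB-CESAB", role = "fnd")

# Inspect the entry as R would store it in Authors@R.
funder_role <- funder$role
funder_name <- funder$given
print(funder)
```

In a DESCRIPTION file, this call would sit as one more element of the existing `c(person(...), ...)` vector in `Authors@R`.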


Regarding your comments (4) and (5), I need to read more about the packages vdiffr and httptest to improve and implement unit tests for plotting functions and HTTP requests.

I will come back to you very soon.
Thanks again.
