Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

galamm: Generalized Additive Latent and Mixed Models #615

Open
12 of 20 tasks
osorensen opened this issue Oct 20, 2023 · 53 comments
Open
12 of 20 tasks

galamm: Generalized Additive Latent and Mixed Models #615

osorensen opened this issue Oct 20, 2023 · 53 comments

Comments

@osorensen
Copy link

osorensen commented Oct 20, 2023

Submitting Author Name: Øystein Sørensen
Submitting Author Github Handle: @osorensen
Repository: https://github.com/LCBC-UiO/galamm
Version submitted: 0.1.1.9000
Submission type: Stats
Badge grade: gold
Editor: @noamross
Reviewers: @nicholasjclark, @eric-pedersen

Due date for @nicholasjclark: 2024-11-10

Due date for @eric-pedersen: 2024-11-23
Archive: TBD
Version accepted: TBD
Language: en

  • Paste the full DESCRIPTION file inside a code block below:
Package: galamm
Title: Generalized Additive Latent and Mixed Models
Version: 0.1.1.9000
Authors@R: c(
    person(given = "Øystein",
           family = "Sørensen",
           role = c("aut", "cre"),
           email = "[email protected]",
           comment = c(ORCID = "0000-0003-0724-3542")),
    person(given = "Douglas", family = "Bates", role = "ctb"),       
    person(given = "Ben", family = "Bolker", role = "ctb"),
    person(given = "Martin", family = "Maechler", role = "ctb"),
    person(given = "Allan", family = "Leal", role = "ctb"),
    person(given = "Fabian", family = "Scheipl", role = "ctb"),
    person(given = "Steven", family = "Walker", role = "ctb"),
    person(given = "Simon", family = "Wood", role = "ctb")
           )
Description: Estimates generalized additive latent and
    mixed models using maximum marginal likelihood, 
    as defined in Sorensen et al. (2023) 
    <doi:10.1007/s11336-023-09910-z>, which is an extension of Rabe-Hesketh and
    Skrondal (2004)'s unifying framework for multilevel latent variable 
    modeling <doi:10.1007/BF02295939>. Efficient computation is done using sparse 
    matrix methods, Laplace approximation, and automatic differentiation. The 
    framework includes generalized multilevel models with heteroscedastic 
    residuals, mixed response types, factor loadings, smoothing splines, 
    crossed random effects, and combinations thereof. Syntax for model 
    formulation is close to 'lme4' (Bates et al. (2015) 
    <doi:10.18637/jss.v067.i01>) and 'PLmixed' (Rockwood and Jeon (2019) 
    <doi:10.1080/00273171.2018.1516541>).
License: GPL (>= 3)
URL: https://github.com/LCBC-UiO/galamm, https://lcbc-uio.github.io/galamm/
BugReports: https://github.com/LCBC-UiO/galamm/issues
Encoding: UTF-8
Imports: 
    lme4,
    Matrix,
    memoise,
    methods,
    mgcv,
    nlme,
    Rcpp,
    Rdpack,
    stats
Depends:
    R (>= 3.5.0)
LinkingTo:
    Rcpp,
    RcppEigen
LazyData: true
Roxygen: list(markdown = TRUE, roclets = c ("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.2.3
Suggests:
    covr,
    gamm4,
    knitr,
    PLmixed,
    rmarkdown,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
VignetteBuilder: knitr, rmarkdown
RdMacros: Rdpack
NeedsCompilation: yes
SystemRequirements: C++17

Scope

  • Please indicate which of our statistical package categories this package falls under. (Please check one appropriate box below):

    Statistical Packages

    • Bayesian and Monte Carlo Routines
    • Dimensionality Reduction, Clustering, and Unsupervised Learning
    • Machine Learning
    • Regression and Supervised Learning
    • Exploratory Data Analysis (EDA) and Summary Statistics
    • Spatial Analyses
    • Time Series Analyses

Pre-submission Inquiry

  • A pre-submission inquiry has been approved in issue 614

General Information

  • Who is the target audience and what are scientific applications of this package?
    The target audience is applied statisticians and quantitative scientists, particularly those working on the social sciences. The package is motivated by longitudinal studies in cognitive neuroscience, but it is applicable wherever a measurement model (of factor analysis type) needs to be combined with hierarchical modeling.

  • Paste your responses to our General Standard G1.1 here, describing whether your software is:

This is the first implementation of the algorithm developed in Sørensen, Fjell, and Walhovd (2023).

Badging

  1. "Compliance with a good number of standards beyond those identified as minimally necessary.": I have attempted to comply with all standards for regression software outlined in the Online Book for Statistical Software. I have used srr to point out which parts of the code I think address each of the standards.
  2. "Demonstrating excellence in compliance with multiple standards from at least two broad sub-categories.": I have tried to comply with all the standards in 6.1.1 - 6.1.5 of the Standards Chapter.
  3. "Have a demonstrated generality of usage beyond one single envisioned use case.": The software supports generality of usage, and the vignettes describe several such use cases.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?
    The package is on CRAN. I am aware that rOpenSci recommends waiting with submitting to CRAN, but the package has some users already, and having pre-compiled binaries on CRAN makes it easier for them to install it, rather than having to set up a toolchain required for install from source. I hence opted to send it to CRAN.
  • Do you intend for this package to go on Bioconductor?

Code of conduct

@ropensci-review-bot
Copy link
Collaborator

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

@ropensci-review-bot
Copy link
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Copy link
Collaborator

Checks for galamm (v0.1.1.9000)

git hash: 26cfad15

  • ✔️ Package is already on CRAN.
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 98.4%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.
  • 👀 Function names are duplicated in other packages

(Checks marked with 👀 may be optionally addressed.)

Package License: GPL (>= 3)


1. rOpenSci Statistical Standards (srr package)

✔️ All applicable standards [v0.2.0] have been documented in this package (217 complied with; 0 N/A standards)


2. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 439
internal galamm 76
internal utils 30
internal graphics 6
imports stats 79
imports lme4 11
imports Matrix 11
imports mgcv 8
imports methods 3
imports nlme 3
imports memoise 1
imports Rcpp NA
imports Rdpack NA
suggests covr NA
suggests gamm4 NA
suggests knitr NA
suggests PLmixed NA
suggests rmarkdown NA
suggests testthat NA
linking_to Rcpp NA
linking_to RcppEigen NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (31), seq_along (22), for (19), lapply (19), length (18), c (17), names (15), ncol (15), attr (14), seq_len (14), vapply (14), if (13), drop (11), rep (11), as.numeric (10), is.null (9), nrow (9), integer (8), unlist (8), factor (7), paste (7), qr (7), diff (6), max (6), seq (6), all.vars (5), any (5), matrix (5), numeric (5), cbind (4), colnames (4), logical (4), sqrt (4), beta (3), eval (3), grepl (3), levels (3), Map (3), match.call (3), Reduce (3), return (3), row.names (3), scale (3), unique (3), by (2), col (2), data.frame (2), diag (2), do.call (2), ifelse (2), lengths (2), order (2), parent.frame (2), qr.R (2), rank (2), rbind (2), abs (1), array (1), as.character (1), as.integer (1), as.logical (1), as.matrix (1), assign (1), backsolve (1), deparse (1), deparse1 (1), dim (1), environment (1), getOption (1), inherits (1), intersect (1), is.infinite (1), is.nan (1), min (1), parse (1), pmax (1), qr.qty (1), regexpr (1), rep.int (1), rowSums (1), setdiff (1), split (1), sum (1), t (1), tabulate (1), which (1)

stats

deviance (9), pf (8), formula (6), as.formula (4), BIC (4), family (4), logLik (4), model.matrix (4), weights (4), quantile (3), terms (3), AIC (2), nobs (2), rf (2), terms.formula (2), contrasts (1), D (1), delete.response (1), df (1), gaussian (1), getCall (1), model.frame (1), model.response (1), na.action (1), optim (1), pchisq (1), pnorm (1), qnorm (1), reformulate (1), smooth (1), start (1), update (1), vcov (1)

galamm

extractor (3), factor_finder (3), find_parm_inds (3), fn (3), gr (3), mlwrapper (3), define_factor_mappings (2), extend_lambda (2), extract_name (2), find_k (2), gam.setup (2), gamm4 (2), gamm4.setup (2), interpret.gam0 (2), set_initial_values (2), setup_factor (2), anova.galamm (1), coef.galamm (1), confint.galamm (1), deviance.galamm (1), extract_optim_parameters (1), extract_optim_parameters.galamm (1), factor_loadings (1), factor_loadings.galamm (1), family.galamm (1), fitted.galamm (1), fixef.galamm (1), formula.galamm (1), galamm (1), galamm_control (1), gam.side (1), gamm4.wrapup (1), llikAIC (1), logLik.galamm (1), mappingunwrapping (1), marginal_likelihood (1), new_galamm_control (1), nobs.galamm (1), plot_smooth (1), plot_smooth.galamm (1), plot.galamm (1), predict.galamm (1), print.summary.galamm (1), print.VarCorr.galamm (1), ranef.galamm (1), release_questions (1), residuals.galamm (1), setup_family (1), setup_response_object (1), sl (1), squeeze_mappings (1), t2l (1), VarCorr.galamm (1), variable.summary (1)

utils

data (30)

lme4

findbars (3), nobars (3), lFormula (2), mkReTrms (2), .prt.VC (1)

Matrix

t (4), chol (2), Matrix (2), solve (2), Diagonal (1)

mgcv

new.name (2), smooth2random (2), Rrank (1), s (1), smoothCon (1), t2 (1)

graphics

par (3), abline (2), text (1)

methods

as (3)

nlme

fixef (1), ranef (1), VarCorr (1)

memoise

memoise (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


3. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in C++ (4% in 2 files), C/C++ Header (66% in 18 files) and R (29% in 30 files)
  • 1 authors
  • 9 vignettes
  • 8 internal data files
  • 9 imported packages
  • 31 exported functions (median 6 lines of code)
  • 81 non-exported functions in R (median 16 lines of code)
  • 618 C/C++ functions (median 4 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 30 89.3
files_src 2 79.1
files_inst 18 99.6
files_vignettes 9 99.2
files_tests 10 90.7
loc_R 1777 81.8
loc_src 252 31.9
loc_inst 4014 86.1
loc_vignettes 1732 96.3 TRUE
loc_tests 2479 95.4 TRUE
num_vignettes 9 99.6 TRUE
data_size_total 265405 88.8
data_size_median 13688 80.9
n_fns_r 112 79.1
n_fns_r_exported 31 79.2
n_fns_r_not_exported 81 79.5
n_fns_src 618 96.1 TRUE
n_fns_per_file_r 2 39.7
n_fns_per_file_src 24 95.1 TRUE
num_params_per_fn 2 11.9
loc_per_fn_r 12 36.1
loc_per_fn_r_exp 6 10.5
loc_per_fn_r_not_exp 16 52.7
loc_per_fn_src 4 2.0 TRUE
rel_whitespace_R 18 80.9
rel_whitespace_src 14 29.1
rel_whitespace_inst 24 85.7
rel_whitespace_vignettes 51 99.2 TRUE
rel_whitespace_tests 11 88.9
doclines_per_fn_exp 42 52.8
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 1302 98.5 TRUE

3a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


4. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check.yaml

GitHub Workflow Results

id name conclusion sha run_number date
6584930507 lint success 26cfad 495 2023-10-20
6584967181 pages build and deployment success 2fba91 144 2023-10-20
6584930514 pkgdown success 26cfad 350 2023-10-20
6584930510 R-CMD-check success 26cfad 558 2023-10-20
6584930523 test-coverage success 26cfad 248 2023-10-20

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following note:

  1. checking installed package size ... NOTE
    installed size is 34.0Mb
    sub-directories of 1Mb or more:
    doc 2.0Mb
    libs 30.7Mb

R CMD check generated the following check_fail:

  1. rcmdcheck_reasonable_installed_size

Test coverage with covr

Package coverage: 98.35

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
galamm 45
gam.setup 44
gamm4.wrapup 44
interpret.gam0 29
define_factor_mappings 17
galamm_control 17

Static code analyses with lintr

lintr found the following 296 potential issues:

message number of times
Avoid library() and require() calls in packages 10
Lines should not be more than 80 characters. 286


5. Other Checks

Details of other checks (click to open)

✖️ The following 2 function names are duplicated in other packages:

    • plot_smooth from itsadug
    • sl from reinsureR


Package Versions

package version
pkgstats 0.1.3.9
pkgcheck 0.1.2.10
srr 0.0.1.194


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

@osorensen
Copy link
Author

👋 @noamross, I just wanted to ask: what's the status of this submission? Is rOpenSci interested in reviewing it?

@noamross
Copy link
Contributor

@osorensen, thanks for following up. My apologies, I think this package fell between cracks in our editor hand-off. I'll follow up later today.

@mpadge mpadge added the stats label Mar 20, 2024
@mpadge
Copy link
Member

mpadge commented Apr 25, 2024

@osorensen Apologies once again, with recent organisational changes this once again fell through the cracks. We are now finally on it. How are you positioned if we finally get the process started now?

@osorensen
Copy link
Author

osorensen commented Apr 25, 2024

@mpadge, a paper describing the package is currently under review for a journal, so I think my best option now is the withdraw the submission to ropensci. I can maybe just to that by closing this issue?

@mpadge
Copy link
Member

mpadge commented Apr 25, 2024

@osorensen We'd still like to work with you to get this through our review process. How about one of the following options:

  1. We put the submission on hold here, until you let us know when your paper has passed through review. We'll then re-start the official review process straight after then. Or,
  2. We start the review process anyway. Our reviews are generally completed within just a few months at most. Especially given our slow response thus far, we'd ensure that our process would be completed well before a typical manuscript review. (Some good examples for recent stats submissions are Submission - melt: Multiple Empirical Likelihood Tests #550 and waywiser: Ergonomic Methods for Assessing Spatial Models #571, both done in around 2 months.) All changes and software improvements during our review could be used to support your responses to manuscript reviews.

Note that if your submission is to Journal of Statistical Software, then our system has been developed in collaboration with their processes, and they would likely welcome you using the results of a review here to support their own process.

@osorensen
Copy link
Author

Thanks @mpadge, I go for option 1 then, and will ping you here once I've got a final decision on the paper.

@markean
Copy link
Member

markean commented May 3, 2024

@osorensen We'd still like to work with you to get this through our review process. How about one of the following options:

  1. We put the submission on hold here, until you let us know when your paper has passed through review. We'll then re-start the official review process straight after then. Or,
  2. We start the review process anyway. Our reviews are generally completed within just a few months at most. Especially given our slow response thus far, we'd ensure that our process would be completed well before a typical manuscript review. (Some good examples for recent stats submissions are Submission - melt: Multiple Empirical Likelihood Tests #550 and waywiser: Ergonomic Methods for Assessing Spatial Models #571, both done in around 2 months.) All changes and software improvements during our review could be used to support your responses to manuscript reviews.

Note that if your submission is to Journal of Statistical Software, then our system has been developed in collaboration with their processes, and they would likely welcome you using the results of a review here to support their own process.

@mpadge, I just stumbled upon this notification. Thank you for mentioning my package review case. The peer review process has significantly enhanced the package's quality in a very short period of time, and I also believe this has expedited its review for publication in the Journal of Statistical Software.

@osorensen
Copy link
Author

@mpadge, the software paper is now published in Multivariate Behavioral Research, https://doi.org/10.1080/00273171.2024.2385336. We can therefore start this review now.

@adamhsparks
Copy link
Member

@ropensci-review-bot check srr

1 similar comment
@adamhsparks
Copy link
Member

@ropensci-review-bot check srr

@ropensci ropensci deleted a comment from ropensci-review-bot Aug 26, 2024
@maelle
Copy link
Member

maelle commented Aug 26, 2024

@ropensci-review-bot check srr

@adamhsparks
Copy link
Member

@ropensci-review-bot check srr

@mpadge
Copy link
Member

mpadge commented Aug 26, 2024

@adamhsparks @osorensen Sorry for any inconvenience. The non-responsive bot was a bug on our side that has now been fixed. It should all work now when you call check srr again.

@maelle
Copy link
Member

maelle commented Aug 26, 2024

@ropensci-review-bot check srr

@ropensci-review-bot
Copy link
Collaborator

✖️ Error: Package documents compliance only with general standards. Statistical packages must document compliance with at least one set of category-specific standards as well.

The following standards are missing:

General standards:

G5.2, G5.2a, G5.2b, G5.3, G5.4a, G5.4b, G5.5, G5.6

All standards must be documented prior to submission

@osorensen
Copy link
Author

osorensen commented Aug 26, 2024

@maelle, the above seems to be a false positive. The mentioned standards are documented here:

https://github.com/LCBC-UiO/galamm/blob/578b52c8a7db11401d98f12a2c21b7fdccc3e37f/tests/testthat.R#L1-L25

@adamhsparks
Copy link
Member

@osorensen, unfortunately, that's not a false positive. They are not properly documented so that the bot can find the standards.

The documentation for ROxygen2 needs to be in /R, not /tests for it to be valid, e.g., https://github.com/giovsaraceno/QuadratiK-package/blob/master/R/srr-stats-standards.R

If you just move it, and redocument it should be fine.

@osorensen
Copy link
Author

@ropensci-review-bot check srr

@mpadge
Copy link
Member

mpadge commented Aug 27, 2024

@osorensen Sorry, bot's not quite ready for your case. Yours is the first package we've had with standards in tests/testthat.R. That is of course perfectly okay, but we simply hadn't encountered it, and so didn't realise that it caused a failure on our side. I'll ping here when our system has been rebuilt with the fix.

(And FYI @adamhsparks, "srrstats" tags can go pretty much anywhere at all, including README, tests, src, and R code.)

@ropensci-review-bot
Copy link
Collaborator

✖️ Error: Package documents compliance only with general standards. Statistical packages must document compliance with at least one set of category-specific standards as well.

'srr' standards compliance:

  • Complied with: 68 / 68 = 100% (general: 68 / 68)
  • Not complied with: / = % (

✔️ This package complies with > 50% of all standads and may be submitted.

@osorensen
Copy link
Author

Thanks @mpadge, there was a typo in there, which should be fixed now. I tried running check srr above but not sure if anything happens.

@adamhsparks
Copy link
Member

@ropensci-review-bot check srr

1 similar comment
@mpadge
Copy link
Member

mpadge commented Sep 17, 2024

@ropensci-review-bot check srr

@ropensci-review-bot
Copy link
Collaborator

'srr' standards compliance:

  • Complied with: 117 / 117 = 100% (general: 68 / 68; regression: 49 / 49)

✔️ This package complies with > 50% of all standards and may be submitted.

@mpadge
Copy link
Member

mpadge commented Sep 17, 2024

@osorensen @adamhsparks The previous calls seem to have issued during our weekly system rebuild. Just rotten timing, but all should work fine from here on. And nice work @osorensen - 100% compliance!!

@adamhsparks
Copy link
Member

@ropensci-review-bot assign @noamross as editor

@ropensci-review-bot
Copy link
Collaborator

Assigned! @noamross is now the editor

@noamross
Copy link
Contributor

@ropensci-review-bot seeking reviewers

@ropensci-review-bot
Copy link
Collaborator

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/615_status.svg)](https://github.com/ropensci/software-review/issues/615)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

@noamross
Copy link
Contributor

@ropensci-review-bot assign @nicholasjclark as reviewer

@ropensci-review-bot
Copy link
Collaborator

@nicholasjclark added to the reviewers list. Review due date is 2024-10-31. Thanks @nicholasjclark for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@ropensci-review-bot
Copy link
Collaborator

@nicholasjclark: If you haven't done so, please fill this form for us to update our reviewers records.

@ropensci ropensci deleted a comment from ropensci-review-bot Oct 10, 2024
@noamross
Copy link
Contributor

@ropensci-review-bot set due date for @nicholasjclark to 2024-11-10

@ropensci-review-bot
Copy link
Collaborator

Review due date for @nicholasjclark is now 10-November-2024

@noamross
Copy link
Contributor

@ropensci-review-bot assign @eric-pedersen as reviewer

@ropensci-review-bot
Copy link
Collaborator

@eric-pedersen added to the reviewers list. Review due date is 2024-11-16. Thanks @eric-pedersen for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@noamross
Copy link
Contributor

@ropensci-review-bot set due date for @eric-pedersen to 2024-11-23

@ropensci-review-bot
Copy link
Collaborator

Review due date for @eric-pedersen is now 23-November-2024

@nicholasjclark
Copy link

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Compliance with Standards

  • This package complies with a sufficient number of standards for a gold badge
  • This grade of badge is the same as what the authors wanted to achieve

The following standards currently deemed non-applicable (through tags of @srrstatsNA) could potentially be applied to future versions of this software: (Please specify)

  • No @srrstatsNA tags were issued, and I agree that all standards apply if the authors wish to achieve a gold badge

Please also comment on any standards which you consider either particularly well, or insufficiently, documented.

  • I find the adherence to standards to be very high throughout

For packages aiming for silver or gold badges:

  • This package extends beyond minimal compliance with standards in the following ways: (please describe)

  • It is very nice to see the explicit recognition of authors / maintainers of crucial packages on which this package is built as contributors in the Description. This is a great way to champion open source software

  • In line with the above, there is extensive referencing throughout the package that will be very helpful for users to better navigate and understand the modeling landscape

  • Function documentation is extensive and has clearly been written to help novel users get started

  • The code follows a strict tidyverse style guide throughout, which makes navigating easy and consistent. The same is also applied to internal functions, which are extremely easy to navigate and understand

  • It is easy to find relevant examples that showcase all arguments from the primary functions, particularly galamm()

  • The vignettes do an excellent job of introducing the problems that this package attempts to address, and of showcasing how users can work through the package

  • Running goodpractice::gp() only returns some warnings about avoiding code lines that are longer than 80 characters. In my view these are not essential


General Review

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

  • The opening paragraphs of the README, and the information in ?galamm-package, do a good job of explaining the problem being addressed. I think that this R package indeed solves a challenging problem, so very well done

  • Installation instructions: for the development version of package and any non-standard dependencies in README

  • I had no issues with installation

  • Community guidelines including contribution guidelines in the README or CONTRIBUTING

  • There is adequate information in CONTRIBUTING.md for how to contribute, which is much appreciated

  • The documentation is sufficient to enable general use of the package beyond one specific use case

  • There is a very comprehensive set of vignettes available already, and the examples in the various functions are also well documented. I was able to quickly fit some models to my own data, which I take as a good sign that the documentation can help users adapt these models for their own use cases

Algorithms

  • The .cpp functions are all very well documented. I am no expert in these algorithms so cannot comment on their design and efficiency, but the level of documentation would make that very straightforward for others that may wish to help drive any possible improvements in future

  • The tests of parameter recovery, and of the scaling of the precision of this recovery with data size, are valuable and informative

Testing

  • All tests passed on my Windows machine, once I installed the necessary PLmixed package. In addition, all examples also passed. Code coverage is very high, which is impressive

  • I confirmed that error messages from mgcv were passed on to the user (in my case I purposely set k too large). But explicit tests that key warnings or errors are passed on would potentially be valuable to help users design and build relevant models

  • When erroring on the dimensions of lambda, I believe the message "lambda matrix must contain one row for each element in load.var" could be changed to "lambda matrix must contain one row for each unique element in load.var"

Visualisation (where appropriate)

  • The visualisations are all appropriate and in line with standard practice for regression packages that use semiparametric functional predictors. However as I mentioned above, I do feel that consideration of making objects of class galamm available for use in gratia, marginaleffects or other popular post-processing packages would greatly extend the user's ability to interact with and interrogate their models

Package Design

  • The design of the package is very familiar to users of popular regression modeling packages such as lme4 or mgcv, so I feel that it won't be challenging for new users to understand how to navigate the functional landscape. It is exceptionally helpful to have common methods such as anova(), ranef(), deviance() and predict(). Other methods to perhaps consider in future would be update() or tidy(), depending on the directions the package may be going

  • I find it confusing that the weights argument in galamm() is specified as a formula. This goes against convention for most regression modeling packages in R, where weights must be supplied as a vector. Perhaps the name of this argument should be changed to avoid confusion?

  • Similarly I find the argument load.var to be inconsistent with the snake_case used throughout the rest of the galamm() function. Perhaps this should be changed to load_var instead?

  • The family_mapping argument description does not specify that this needs to be a vector of class integer, so it is a bit confusing about how this should be set up. For example, I ran:

dat <- head(subset(cognition, domain == 1 & timepoint == 1), 100)
loading_matrix <- matrix(c(1, NA, NA), ncol = 1)

mod <- galamm(
  formula = y ~ 0 + item + sl(x, k = 5, factor = "loading"),
  data = dat,
  load.var = "item",
  lambda = loading_matrix,
  family = c(gaussian, poisson),
  family_mapping = sample(c('a', 'b'), size = NROW(dat), replace = TRUE),
  factor = "loading"
)

And received the following feedback:

Error in galamm(formula = y ~ 0 + item + sl(x, k = 5, factor = "loading"),  : 
  family_mapping must contain a unique index for each element in family_list.
In addition: Warning message:
In galamm(formula = y ~ 0 + item + sl(x, k = 5, factor = "loading"),  :
  NAs introduced by coercion

It is not clear what family_list is here, and the NAs are introduced when trying to convert my string using family_mapping <- as.integer(family_mapping). It may be useful to catch this type of behaviour earlier in the galamm() function

  • Considering inter-operability, it may also be helpful to add methods (or request them via a PR) for the popular gratia and marginaleffects packages. It seems this would be easy enough given that an object of class galamm contains a gam object for the smooth outputs and that it already has a useful predict function. But I believe these additions would be helpful as both gratia and marginaleffects have support for generating ggplot outputs from many other objects that contain mgcv-style smooth outputs, as well as for testing particular hypotheses via model-based predictions

  • I feel that more comprehensive sets of related functions could be listed in the @seealso tags, particularly for the primary function galamm(). As a user of new modeling packages I find it very helpful if these tags provide a general overview of some of the directions I might take after fitting a model. Obvious functions worth listing would be summary.galamm(), plot.galamm(), factor_loadings() and anova.galamm().

  • It may also be helpful to consider a help / MAN page that is dedicated to modeling strategies. This is an extremely flexible modeling package and the vignettes are great, but rather than working through specific examples you may want to give some more general overview of what the package can do without going into too many details. Have a look at ?mgcv::gam.models as an example, and perhaps consider including guidance on how to maintain identifiability of the factors (i.e. are there recommendations about how many slots to fix and how many can be freely estimated?)

  • I realize this may be too optimistic or possibly irrelevant, but the mgcv package can handle list datasets for setting up linear functional terms (i.e. distributed lags, for example). I could envisage examples where users may want to model these types of nonlinear functions as well within the galamm framework, so it may be worth investigating whether they could also be supported


  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

  • The guidelines are strictly adhered to throughout the package

Estimated hours spent reviewing: 5

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

@osorensen
Copy link
Author

Thanks for your review @nicholasjclark. @noamross, may I start addressing the points immediately, or is it better to wait until the second reviewer has completed his review also?

@noamross
Copy link
Contributor

Thanks for your review, @nicholasjclark! @osorensen, you can address things as you want, but I would suggest waiting for @eric-pedersen's review in case there's anything to be reconciled between the two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants