Major update that improves support for formulas specification #582

Open
wants to merge 39 commits into
base: master
4cf4953
Major update that improves support for formulas specification
stefvanbuuren Sep 11, 2023
ea84be3
Convert documentation Rd tags to markdown tags for roxygen2
stefvanbuuren Sep 11, 2023
5c6bee2
Add a data argument to nimp() to calculate number of imputations per …
stefvanbuuren Sep 12, 2023
755c23a
Restore classic predictorMatrix behaviour that sets predictorMatrix[j…
stefvanbuuren Sep 13, 2023
c2da03c
Clean up source, indicate that there is still a problem with edit.s…
stefvanbuuren Sep 13, 2023
28821a6
Create a make.nest(), n2b() and b2n() function for working with nest …
stefvanbuuren Sep 13, 2023
731bf25
Insist that predictorMatrix has a zero diagonal
stefvanbuuren Sep 13, 2023
8f92307
- Prevention of NA propagation
stefvanbuuren Sep 18, 2023
772c876
Add exit checks on mids object
stefvanbuuren Sep 18, 2023
465bd5c
Add test for zero predictorMatrix row if method == "", deal with rela…
stefvanbuuren Sep 18, 2023
c8ed335
Update news
stefvanbuuren Sep 18, 2023
05a0209
Update documentation for mice() arguments
stefvanbuuren Sep 18, 2023
6033fc6
Update list of builtin imputation methods
stefvanbuuren Sep 18, 2023
29fee22
Reorder sequence of mice() arguments
stefvanbuuren Sep 18, 2023
fef881b
Reorder nest in data sequence
stefvanbuuren Sep 19, 2023
ba383eb
Use lowercase 'b' and 'f' for automatic naming of blocks and formulas
stefvanbuuren Sep 19, 2023
4175534
Update error message in mpmm
stefvanbuuren Sep 19, 2023
0166992
Sort terms both for pred and formulas
stefvanbuuren Sep 19, 2023
35b6084
Create a mechanism to inform check.method() of the set of variables t…
stefvanbuuren Sep 21, 2023
65f544f
Introduce NA types in initialize.imp()
stefvanbuuren Sep 21, 2023
d9c6fa6
Update nest printing in print.mids()
stefvanbuuren Sep 21, 2023
b9e398e
Add support for blots to multivariate imputation models
stefvanbuuren Sep 21, 2023
0345ec3
Rename `nest` to `parcel`
stefvanbuuren Sep 21, 2023
07a79e9
Use lower case default block names
stefvanbuuren Sep 21, 2023
53916f4
Rename `blots` to `dots`
stefvanbuuren Sep 21, 2023
3c09055
Rename files from blots/nest to dots/parcel
stefvanbuuren Sep 21, 2023
3cebc30
Add deprecation support for make.blots()
stefvanbuuren Sep 21, 2023
7b7a17c
Implement autoremove in check.predictorMatrix() and check.formulas()
stefvanbuuren Sep 21, 2023
8c4bb38
Write one loggedEvent for each removed variable
stefvanbuuren Sep 22, 2023
24688b1
Abort mice when user specifies mixes of `formulas` and `predictorMatr…
stefvanbuuren Sep 22, 2023
e1c475f
Update NEWS.md
stefvanbuuren Sep 22, 2023
da6396b
Reorder mice() arguments into a clusters of operations
stefvanbuuren Oct 2, 2023
db5caf6
Remove superfluous construct.parcel(), make remove.rhs.variables() in…
stefvanbuuren Oct 2, 2023
f5d5c99
Add MICE 4 Syntax Documentation CONCEPT as a vignette
stefvanbuuren Oct 2, 2023
6edcd71
Rebuild site to include article mice4syntax
stefvanbuuren Oct 2, 2023
232a0b6
Add test for character variable (#601)
stefvanbuuren Apr 17, 2024
09e58ea
Merge main and support_blocks into new branch mice4 (still failing so…
stefvanbuuren Apr 17, 2024
15321b4
Merging update
stefvanbuuren Apr 17, 2024
deac372
Update support_blocks with master
stefvanbuuren Nov 22, 2024
5 changes: 5 additions & 0 deletions NAMESPACE
@@ -75,6 +75,7 @@ export(convergence)
export(densityplot)
export(estimice)
export(extractBS)
export(f2p)
export(fico)
export(filter)
export(fix.coef)
@@ -96,8 +97,10 @@ export(is.mitml.result)
export(lm.mids)
export(make.blocks)
export(make.blots)
export(make.dots)
export(make.formulas)
export(make.method)
export(make.parcel)
export(make.post)
export(make.predictorMatrix)
export(make.visitSequence)
@@ -154,6 +157,7 @@ export(nelsonaalen)
export(nic)
export(nimp)
export(norm.draw)
export(p2f)
export(parlmice)
export(pool)
export(pool.compare)
@@ -262,6 +266,7 @@ importFrom(stats,spline)
importFrom(stats,summary.glm)
importFrom(stats,terms)
importFrom(stats,update)
importFrom(stats,update.formula)
importFrom(stats,var)
importFrom(stats,vcov)
importFrom(tidyr,complete)
45 changes: 45 additions & 0 deletions NEWS.md
@@ -1,3 +1,48 @@
# mice 4 dev

## New behaviours and features

1. TWO SEPARATE INTERFACES FOR MODEL SPECIFICATION: This version promotes two interfaces to specify imputation models: predictor (`predictorMatrix` + `parcel` + `method`) and formula (`formulas` + `method`). This version no longer accepts mixes of `predictorMatrix` and `formulas` arguments in the call to `mice()`.

2. NA-PROPAGATION PREVENTION: This version detects when a predictor contains missing values that are not imputed. To prevent NA propagation, `mice()` can follow two strategies: "Autoremove" (remove the incomplete predictor(s) from the RHS, set `method` to `""`, adapt `predictorMatrix`, `formulas` and `blocks`, and write to loggedEvents), or "Autoimpute" (impute the incomplete predictor and adapt `method`, `predictorMatrix`, `formulas`, and so on). "Autoremove" is implemented and is the current default. Use `mice(..., autoremove = FALSE)` to revert to the old behavior (NA propagation).

3. SUBMODELS: The `predictorMatrix` input can be a square submatrix of the full `predictorMatrix` when its dimensions are named. `mice()` will augment the tiny `predictorMatrix` to the full matrix and always return a p * p named matrix corresponding to the p columns in the data. Unmentioned variables are not imputed, and the `predictorMatrix`, `formulas` and `method` are adapted accordingly.

4. DROP NON-SQUARE PREDICTOR MATRIX: Version 3.0 introduced non-square predictor matrices, but their interpretation turned out to be complex and ambiguous. For clarity, this update works with a predictor matrix that is square, with both dimensions identically named after the variables in the data. Variable groups are now specified through the `parcel` argument.

5. NEW PARCEL ARGUMENT. There is a new `parcel` argument that is easier to use. The print of the `mids` object shows `parcel` when it is different from the default.
`parcel` can take over the role of `blocks` in specification. `blocks` is soft-deprecated, but still widely used within the program code.

6. NEW DOTS ARGUMENT. The `blots` argument is renamed to `dots`.

7. EXIT VALIDATION: Adds a new `validate.mids()` function that checks the `mids` object before exit.
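
The submodel behaviour in item 3 can be pictured with a small base-R sketch. It is only an illustration under stated assumptions: `augment_pred()` is a hypothetical helper, not a function in `mice`, and the actual augmentation code in this PR may differ.

```r
# Hypothetical helper, not part of mice: expand a named square submatrix
# to a full p * p predictor matrix for the p columns of `data`.
augment_pred <- function(sub, data) {
  vars <- colnames(data)
  stopifnot(
    is.matrix(sub),
    nrow(sub) == ncol(sub),
    !is.null(rownames(sub)),
    identical(rownames(sub), colnames(sub)),
    all(rownames(sub) %in% vars)
  )
  # Start from an all-zero matrix named after the data columns
  full <- matrix(0, nrow = length(vars), ncol = length(vars),
                 dimnames = list(vars, vars))
  # Copy the user-specified entries into their named positions
  full[rownames(sub), colnames(sub)] <- sub
  # A variable never predicts itself
  diag(full) <- 0
  full
}

data <- data.frame(age = c(1, 2, NA), bmi = c(NA, 22, 25),
                   hyp = c(1, NA, 2), chl = c(180, NA, 200))
sub <- matrix(c(0, 1, 1, 0), nrow = 2,
              dimnames = list(c("bmi", "chl"), c("bmi", "chl")))
full <- augment_pred(sub, data)
print(full)
```

In this sketch the rows for `age` and `hyp` stay all zero, which matches the statement that unmentioned variables are not imputed.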


## Changes

- Adds functions to convert between `predictorMatrix` and `formulas` specification
- Adds support to pass down user-specified options to multivariate imputation methods
- Now uses lowercase default block names
- The `predictorMatrix` input may be unnamed if its size is p * p. An unnamed matrix of any other size generates an error.
- Performs stricter checks on zero rows in predictorMatrix under empty imputation method
- Adds new function `remove.rhs.variables()`
- Removes code designed to work specifically with a non-square `predictorMatrix`
- Generates an error if `predictorMatrix` has fewer rows than length of `blocks`
- Better initialization using typed `NA`s in `initialize.imp()`
- Rewrites the documentation of all `mice()` arguments to be precise and consistent
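
The first bullet above mentions converting between the `predictorMatrix` and `formulas` specifications. The base-R sketch below shows the idea for the simplest case (one variable per formula, main effects only). The helpers `pm_to_formulas()` and `formulas_to_pm()` are hypothetical names chosen for illustration; they are not the `p2f()` and `f2p()` functions exported by this PR, whose signatures and behaviour are not shown here.

```r
# Hypothetical helpers for illustration only; not the mice implementation.

# Each row of the predictor matrix becomes one formula: row name ~ predictors.
pm_to_formulas <- function(pred) {
  stopifnot(is.matrix(pred), identical(rownames(pred), colnames(pred)))
  out <- lapply(rownames(pred), function(y) {
    x <- colnames(pred)[pred[y, ] != 0]
    if (length(x) == 0) x <- "1"  # intercept-only model for empty rows
    stats::reformulate(x, response = y)
  })
  names(out) <- rownames(pred)
  out
}

# Each formula fills one row: the response selects the row, the RHS the columns.
formulas_to_pm <- function(formulas, vars) {
  pred <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
  for (f in formulas) {
    y <- all.vars(f[[2]])
    x <- intersect(all.vars(f[[3]]), vars)
    if (length(x) > 0) pred[y, x] <- 1
  }
  diag(pred) <- 0
  pred
}

vars <- c("age", "bmi", "chl")
pred <- matrix(c(0, 0, 0,
                 1, 0, 1,
                 1, 1, 0),
               nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
f <- pm_to_formulas(pred)
back <- formulas_to_pm(f, vars)
print(f$bmi)
```

The round trip only recovers the original matrix for 0/1 entries; other codes in a predictor matrix (for example cluster indicators) would need extra handling that this sketch does not attempt.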

## New exit checks

- `rownames(predictorMatrix)` must match `colnames(data)`
- length of `formulas` and `blocks` must be equal
- length of `formulas` and `method` must be equal
- length of `dots` and `method` must be equal
- length of `method` vector cannot exceed number of variables
- length of `imp` and number of variables must be equal
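
The checks listed above can be expressed as simple consistency tests. The following is a minimal base-R sketch, assuming the specification objects are plain named lists and vectors; `check_spec()` is a hypothetical name and this is not the code of `validate.mids()`.

```r
# Hypothetical illustration only; this is not the code of validate.mids().
# Returns a character vector of problems (empty when all checks pass).
check_spec <- function(data, predictorMatrix, method, formulas, blocks, dots) {
  problems <- character(0)
  add <- function(ok, msg) if (!ok) problems <<- c(problems, msg)

  add(identical(rownames(predictorMatrix), colnames(data)),
      "rownames(predictorMatrix) must match colnames(data)")
  add(length(formulas) == length(blocks),
      "length of formulas and blocks must be equal")
  add(length(formulas) == length(method),
      "length of formulas and method must be equal")
  add(length(dots) == length(method),
      "length of dots and method must be equal")
  add(length(method) <= ncol(data),
      "length of method cannot exceed number of variables")
  problems
}

data <- data.frame(age = c(1, 2, 3), bmi = c(NA, 22, 25))
vars <- colnames(data)
pred <- matrix(c(0, 1, 1, 0), 2, dimnames = list(vars, vars))
method <- c(age = "", bmi = "pmm")
formulas <- list(age = age ~ 1, bmi = bmi ~ age)
blocks <- list(age = "age", bmi = "bmi")
dots <- list(age = list(), bmi = list())

ok <- check_spec(data, pred, method, formulas, blocks, dots)
# Reordering the matrix rows breaks the first check
bad <- check_spec(data, pred[2:1, 2:1], method, formulas, blocks, dots)
print(bad)
```

Collecting all problems before reporting, rather than stopping at the first one, is a design choice of this sketch; the PR may abort on the first failed check instead.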

## -----------------------------------------------------------


# mice 3.16.16

* Prevent `as.mids()` from filling the `imp` object for complete variables
20 changes: 10 additions & 10 deletions R/D1.R
@@ -2,25 +2,25 @@
#'
#' The D1-statistics is the multivariate Wald test.
#'
#' @param fit1 An object of class \code{mira}, produced by \code{with()}.
#' @param fit0 An object of class \code{mira}, produced by \code{with()}. The
#' model in \code{fit0} is a nested within \code{fit1}. The default null
#' model \code{fit0 = NULL} compares \code{fit1} to the intercept-only model.
#' @param fit1 An object of class `mira`, produced by `with()`.
#' @param fit0 An object of class `mira`, produced by `with()`. The
#' model in `fit0` is a nested within `fit1`. The default null
#' model `fit0 = NULL` compares `fit1` to the intercept-only model.
#' @param dfcom A single number denoting the
#' complete-data degrees of freedom of model \code{fit1}. If not specified,
#' it is set equal to \code{df.residual} of model \code{fit1}. If that cannot
#' complete-data degrees of freedom of model `fit1`. If not specified,
#' it is set equal to `df.residual` of model `fit1`. If that cannot
#' be done, the procedure assumes (perhaps incorrectly) a large sample.
#' @param df.com Deprecated
#' @note Warning: `D1()` assumes that the order of the variables is the
#' same in different models. See
#' \url{https://github.com/amices/mice/issues/420} for details.
#' <https://github.com/amices/mice/issues/420> for details.
#' @references
#' Li, K. H., T. E. Raghunathan, and D. B. Rubin. 1991.
#' Large-Sample Significance Levels from Multiply Imputed Data Using
#' Moment-Based Statistics and an F Reference Distribution.
#' \emph{Journal of the American Statistical Association}, 86(416): 1065–73.
#' *Journal of the American Statistical Association*, 86(416): 1065–73.
#'
#' \url{https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:wald}
#' <https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:wald>
#' @examples
#' # Compare two linear models:
#' imp <- mice(nhanes2, seed = 51009, print = FALSE)
@@ -34,7 +34,7 @@
#' fit0 <- with(imp, glm(gen > levels(gen)[1] ~ hgt + hc, family = binomial))
#' D1(fit1, fit0)
#' }
#' @seealso \code{\link[mitml]{testModels}}
#' @seealso [mitml::testModels()]
#' @export
D1 <- function(fit1, fit0 = NULL, dfcom = NULL, df.com = NULL) {
install.on.demand("mitml")
8 changes: 4 additions & 4 deletions R/D2.R
@@ -7,13 +7,13 @@
#' @inheritParams mitml::testModels
#' @note Warning: `D2()` assumes that the order of the variables is the
#' same in different models. See
#' \url{https://github.com/amices/mice/issues/420} for details.
#' <https://github.com/amices/mice/issues/420> for details.
#' @references
#' Li, K. H., X. L. Meng, T. E. Raghunathan, and D. B. Rubin. 1991.
#' Significance Levels from Repeated p-Values with Multiply-Imputed Data.
#' \emph{Statistica Sinica} 1 (1): 65–92.
#' *Statistica Sinica* 1 (1): 65–92.
#'
#' \url{https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:chi}
#' <https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:chi>
#' @examples
#' # Compare two linear models:
#' imp <- mice(nhanes2, seed = 51009, print = FALSE)
@@ -27,7 +27,7 @@
#' fit0 <- with(imp, glm(gen > levels(gen)[1] ~ hgt + hc, family = binomial))
#' D2(fit1, fit0)
#' }
#' @seealso \code{\link[mitml]{testModels}}
#' @seealso [mitml::testModels()]
#' @export
D2 <- function(fit1, fit0 = NULL, use = "wald") {
install.on.demand("mitml")
26 changes: 13 additions & 13 deletions R/D3.R
@@ -3,34 +3,34 @@
#' The D3-statistic is a likelihood-ratio test statistic.
#'
#' @details
#' The \code{D3()} function implement the LR-method by
#' The `D3()` function implement the LR-method by
#' Meng and Rubin (1992). The implementation of the method relies
#' on the \code{broom} package, the standard \code{update} mechanism
#' for statistical models in \code{R} and the \code{offset} function.
#' on the `broom` package, the standard `update` mechanism
#' for statistical models in `R` and the `offset` function.
#'
#' The function calculates \code{m} repetitions of the full
#' The function calculates `m` repetitions of the full
#' (or null) models, calculates the mean of the estimates of the
#' (fixed) parameter coefficients \eqn{\beta}. For each imputed
#' imputed dataset, it calculates the likelihood for the model with
#' the parameters constrained to \eqn{\beta}.
#'
#' The \code{mitml::testModels()} function offers similar functionality
#' for a subset of statistical models. Results of \code{mice::D3()} and
#' \code{mitml::testModels()} differ in multilevel models because the
#' \code{testModels()} also constrains the variance components parameters.
#' The `mitml::testModels()` function offers similar functionality
#' for a subset of statistical models. Results of `mice::D3()` and
#' `mitml::testModels()` differ in multilevel models because the
#' `testModels()` also constrains the variance components parameters.
#' For more details on
#'
#' @seealso \code{\link{fix.coef}}
#' @seealso [fix.coef()]
#' @inheritParams D1
#' @return An object of class \code{mice.anova}
#' @return An object of class `mice.anova`
#' @references
#' Meng, X. L., and D. B. Rubin. 1992.
#' Performing Likelihood Ratio Tests with Multiply-Imputed Data Sets.
#' \emph{Biometrika}, 79 (1): 103–11.
#' *Biometrika*, 79 (1): 103–11.
#'
#' \url{https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:likelihoodratio}
#' <https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:likelihoodratio>
#'
#' \url{http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#setting-residual-variances-to-a-fixed-value-zero-or-other}
#' <http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#setting-residual-variances-to-a-fixed-value-zero-or-other>
#' @examples
#' # Compare two linear models:
#' imp <- mice(nhanes2, seed = 51009, print = FALSE)
56 changes: 28 additions & 28 deletions R/ampute.R
@@ -2,11 +2,11 @@
#'
#' This function generates multivariate missing data under a MCAR, MAR or MNAR
#' missing data mechanism. Imputation of data sets containing missing values can
#' be performed with \code{\link{mice}}.
#' be performed with [mice()].
#'
#' This function generates missing values in complete data sets. Amputation of complete
#' data sets is useful for the evaluation of imputation techniques, such as multiple
#' imputation (performed with function \code{\link{mice}} in this package).
#' imputation (performed with function [mice()] in this package).
#'
#' The basic strategy underlying multivariate imputation was suggested by
#' Don Rubin during discussions in the 90's. Brand (1997) created one particular
@@ -21,13 +21,13 @@
#' With the univariate approach, it is difficult to relate the missingness on one
#' variable to the missingness on another variable. A multivariate amputation procedure
#' solves this issue and moreover, it does justice to the multivariate nature of
#' data sets. Hence, \code{ampute} is developed to perform multivariate amputation.
#' data sets. Hence, `ampute` is developed to perform multivariate amputation.
#'
#' The idea behind the function is the specification of several missingness
#' patterns. Each pattern is a combination of variables with and without missing
#' values (denoted by \code{0} and \code{1} respectively). For example, one might
#' values (denoted by `0` and `1` respectively). For example, one might
#' want to create two missingness patterns on a data set with four variables. The
#' patterns could be something like: \code{0,0,1,1} and \code{1,0,1,0}.
#' patterns could be something like: `0,0,1,1` and `1,0,1,0`.
#' Each combination of zeros and ones may occur.
#'
#' Furthermore, the researcher specifies the proportion of missingness, either the
@@ -43,14 +43,14 @@
#' For a discussion on how missingness mechanisms are related to the observed data,
#' we refer to \doi{10.1177/0049124118799376}.
#'
#' When the user specifies the missingness mechanism to be \code{"MCAR"}, the candidates
#' have an equal probability of becoming incomplete. For a \code{"MAR"} or \code{"MNAR"} mechanism,
#' When the user specifies the missingness mechanism to be `"MCAR"`, the candidates
#' have an equal probability of becoming incomplete. For a `"MAR"` or `"MNAR"` mechanism,
#' weighted sum scores are calculated. These scores are a linear combination of the
#' variables.
#'
#' In order to calculate the weighted sum scores, the data is standardized. For this reason,
#' the data has to be numeric. Second, for each case, the values in
#' the data set are multiplied with the weights, specified by argument \code{weights}.
#' the data set are multiplied with the weights, specified by argument `weights`.
#' These weighted scores will be summed, resulting in a weighted sum score for each case.
#'
#' The weights may differ between patterns and they may be negative or zero as well.
@@ -93,19 +93,19 @@
#' @param prop A scalar specifying the proportion of missingness. Should be a value
#' between 0 and 1. Default is a missingness proportion of 0.5.
#' @param patterns A matrix or data frame of size #patterns by #variables where
#' \code{0} indicates that a variable should have missing values and \code{1} indicates
#' `0` indicates that a variable should have missing values and `1` indicates
#' that a variable should remain complete. The user may specify as many patterns as
#' desired. One pattern (a vector) is possible as well. Default
#' is a square matrix of size #variables where each pattern has missingness on one
#' variable only (created with \code{\link{ampute.default.patterns}}). After the
#' amputation procedure, \code{\link{md.pattern}} can be used to investigate the
#' variable only (created with [ampute.default.patterns()]). After the
#' amputation procedure, [md.pattern()] can be used to investigate the
#' missing data patterns in the data.
#' @param freq A vector of length #patterns containing the relative frequency with
#' which the patterns should occur. For example, for three missing data patterns,
#' the vector could be \code{c(0.4, 0.4, 0.2)}, meaning that of all cases with
#' the vector could be `c(0.4, 0.4, 0.2)`, meaning that of all cases with
#' missing values, 40 percent should have pattern 1, 40 percent pattern 2 and 20
#' percent pattern 3. The vector should sum to 1. Default is an equal probability
#' for each pattern, created with \code{\link{ampute.default.freq}}.
#' for each pattern, created with [ampute.default.freq()].
#' @param mech A string specifying the missingness mechanism, either "MCAR"
#' (Missing Completely At Random), "MAR" (Missing At Random) or "MNAR" (Missing Not At
#' Random). Default is a MAR missingness mechanism.
@@ -115,50 +115,50 @@
#' zero. For a MNAR mechanism, these weights could have any possible value. Furthermore,
#' the weights may differ between patterns and between variables. They may be negative
#' as well. Within each pattern, the relative size of the values are of importance.
#' The default weights matrix is made with \code{\link{ampute.default.weights}} and
#' The default weights matrix is made with [ampute.default.weights()] and
#' returns a matrix with equal weights for all variables. In case of MAR, variables
#' that will be amputed will be weighted with \code{0}. For MNAR, variables
#' that will be observed will be weighted with \code{0}. If the mechanism is MCAR, the
#' that will be amputed will be weighted with `0`. For MNAR, variables
#' that will be observed will be weighted with `0`. If the mechanism is MCAR, the
#' weights matrix will not be used.
#' @param std Logical. Whether the weighted sum scores should be calculated with
#' standardized data or with non-standardized data. The latter is especially advised when
#' making use of train and test sets in order to prevent leakage.
#' @param cont Logical. Whether the probabilities should be based on a continuous
#' or a discrete distribution. If TRUE, the probabilities of being missing are based
#' on a continuous logistic distribution function. \code{\link{ampute.continuous}}
#' on a continuous logistic distribution function. [ampute.continuous()]
#' will be used to calculate and assign the probabilities. These probabilities will then
#' be based on the argument \code{type}. If FALSE, the probabilities of being missing are
#' based on a discrete distribution (\code{\link{ampute.discrete}}) based on the \code{odds}
#' be based on the argument `type`. If FALSE, the probabilities of being missing are
#' based on a discrete distribution ([ampute.discrete()]) based on the `odds`
#' argument. Default is TRUE.
#' @param type A string or vector of strings containing the type of missingness for each
#' pattern. Either \code{"LEFT"}, \code{"MID"}, \code{"TAIL"} or '\code{"RIGHT"}.
#' pattern. Either `"LEFT"`, `"MID"`, `"TAIL"` or '`"RIGHT"`.
#' If a single missingness type is given, all patterns will be created with the same
#' type. If the missingness types should differ between patterns, a vector of missingness
#' types should be given. Default is RIGHT for all patterns and is the result of
#' \code{\link{ampute.default.type}}.
#' [ampute.default.type()].
#' @param odds A matrix where #patterns defines the #rows. Each row should contain
#' the odds of being missing for the corresponding pattern. The number of odds values
#' defines in how many quantiles the sum scores will be divided. The odds values are
#' relative probabilities: a quantile with odds value 4 will have a probability of
#' being missing that is four times higher than a quantile with odds 1. The
#' number of quantiles may differ between the patterns, specify NA for cells remaining empty.
#' Default is 4 quantiles with odds values 1, 2, 3 and 4 and is created by
#' \code{\link{ampute.default.odds}}.
#' [ampute.default.odds()].
#' @param bycases Logical. If TRUE, the proportion of missingness is defined in
#' terms of cases. If FALSE, the proportion of missingness is defined in terms of
#' cells. Default is TRUE.
#' @param run Logical. If TRUE, the amputations are implemented. If FALSE, the
#' return object will contain everything except for the amputed data set.
#'
#' @return Returns an S3 object of class \code{\link{mads-class}} (multivariate
#' @return Returns an S3 object of class [mads-class()] (multivariate
#' amputed data set)
#' @author Rianne Schouten [aut, cre], Gerko Vink [aut], Peter Lugtig [ctb], 2016
#' @seealso \code{\link{mads-class}}, \code{\link{bwplot}}, \code{\link{xyplot}},
#' \code{\link{mice}}
#' @author Rianne Schouten (aut, cre), Gerko Vink (aut), Peter Lugtig (ctb), 2016
#' @seealso [mads-class()], [bwplot()], [xyplot()],
#' [mice()]
#'
#' @references Brand, J.P.L. (1999) \emph{Development, implementation and
#' @references Brand, J.P.L. (1999) *Development, implementation and
#' evaluation of multiple imputation strategies for the statistical analysis of
#' incomplete data sets.} pp. 110-113. Dissertation. Rotterdam: Erasmus University.
#' incomplete data sets.* pp. 110-113. Dissertation. Rotterdam: Erasmus University.
#'
#' Schouten, R.M., Lugtig, P and Vink, G. (2018)
#' Generating missing values for simulation purposes: A multivariate