amices · stefvanbuuren · Sep 11, 2023 · Sep 11, 2023 · Sep 12, 2023 · Sep 13, 2023
diff --git a/NAMESPACE b/NAMESPACE
@@ -75,6 +75,7 @@ export(convergence)
 export(densityplot)
 export(estimice)
 export(extractBS)
+export(f2p)
 export(fico)
 export(filter)
 export(fix.coef)
@@ -96,8 +97,10 @@ export(is.mitml.result)
 export(lm.mids)
 export(make.blocks)
 export(make.blots)
+export(make.dots)
 export(make.formulas)
 export(make.method)
+export(make.parcel)
 export(make.post)
 export(make.predictorMatrix)
 export(make.visitSequence)
@@ -154,6 +157,7 @@ export(nelsonaalen)
 export(nic)
 export(nimp)
 export(norm.draw)
+export(p2f)
 export(parlmice)
 export(pool)
 export(pool.compare)
@@ -262,6 +266,7 @@ importFrom(stats,spline)
 importFrom(stats,summary.glm)
 importFrom(stats,terms)
 importFrom(stats,update)
+importFrom(stats,update.formula)
 importFrom(stats,var)
 importFrom(stats,vcov)
 importFrom(tidyr,complete)

diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,48 @@
+# mice 4 dev
+
+## New behaviours and features
+
+1. TWO SEPARATE INTERFACES FOR MODEL SPECIFICATION: This version promotes two interfaces to specify imputation models: predictor (`predictorMatrix` + `parcel` + `method`) and formula (`formulas` + `method`). This version no longer accepts mixes of `predictorMatrix` and `formulas` arguments in the call to `mice()`.
+
+2. NA-PROPAGATION PREVENTION: This version detects when a predictor contains missing values that are not imputed. To prevent NA propagation, `mice()` can follow two strategies: "Autoremove" (remove incomplete predictor(s) from the RHS, set `method` to `""`, adapt `predictorMatrix`, `formulas` and `blocks`, write to `loggedEvents`), or "Autoimpute" (impute the incomplete predictor and adapt `method`, `predictorMatrix`, `formulas`, and so on). "Autoremove" is implemented and is the current default. Use `mice(..., autoremove = FALSE)` to revert to the old behavior (NA propagation).
+
+3. SUBMODELS: The `predictorMatrix` input can be a square submatrix of the full `predictorMatrix` when its dimensions are named. `mice()` will augment the submatrix to the full matrix and always return a p * p named matrix corresponding to the p columns in the data. Unmentioned variables will not be imputed, and the `predictorMatrix`, `formulas` and `method` are adapted accordingly.
+
+4. DROP NON-SQUARE PREDICTOR MATRIX: Version 3.0 introduced non-square predictor matrices, but their interpretation turned out to be complex and ambiguous. For clarity, this update works with a square predictor matrix whose rows and columns are both named after the variables in the data. Variable groups are now specified through the `parcel` argument.
+
+5. NEW PARCEL ARGUMENT. There is a new `parcel` argument that is easier to use. The print of the `mids` object shows `parcel` when it is different from the default. 
+`parcel` can take over the role of `blocks` in specification. `blocks` is soft-deprecated, but still widely used within the program code.
+
+6. NEW DOTS ARGUMENT: The `blots` argument is renamed to `dots`.
+
+7. EXIT VALIDATION: Adds a new `validate.mids()` function that checks the `mids` object before exit.
+
+
+## Changes 
+
+- Adds functions to convert between `predictorMatrix` and `formulas` specification
+- Adds support to pass down user-specified options to multivariate imputation methods
+- Now uses lowercase default block names
+- The `predictorMatrix` input may be unnamed if its size is p * p. For any other size, an unnamed matrix generates an error.
+- Performs stricter checks on zero rows in predictorMatrix under empty imputation method
+- Adds new function `remove.rhs.variables()`
+- Removes code designed to work specifically with a non-square `predictorMatrix`
+- Generates an error if `predictorMatrix` has fewer rows than length of `blocks`
+- Better initialization using typed `NA`s in `initialize.imp()`
+- Rewrites the documentation of all `mice()` arguments to be precise and consistent
+
+## New exit checks
+
+- `rownames(predictorMatrix)` must match `colnames(data)`
+- length of `formulas` and `blocks` must be equal
+- length of `formulas` and `method` must be equal
+- length of `dots` and `method` must be equal
+- length of `method` vector cannot exceed number of variables
+- length of `imp` and number of variables must be equal
+
+## -----------------------------------------------------------
+
+
 # mice 3.16.16
 
 * Prevent `as.mids()` from filling the `imp` object for complete variables
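
Item 3 of the new NEWS entry (SUBMODELS) describes expanding a small, named, square `predictorMatrix` to the full p * p matrix. The base-R sketch below illustrates that idea only; it is not the code in this PR. The helper name `augment_predictor_matrix()` and the toy data are hypothetical, and the sketch assumes that variables absent from the submatrix get all-zero rows and columns.

```r
# Hypothetical helper (not part of mice): expand a small named, square
# predictor matrix to a full p x p matrix over all columns of `data`.
augment_predictor_matrix <- function(small, data) {
  vars <- colnames(data)
  stopifnot(
    !is.null(rownames(small)),
    !is.null(colnames(small)),
    identical(rownames(small), colnames(small)),  # square, identically named
    all(rownames(small) %in% vars)                # only known variables
  )
  # Start from an all-zero p x p matrix named after the data columns
  full <- matrix(0, nrow = length(vars), ncol = length(vars),
                 dimnames = list(vars, vars))
  # Copy the user-specified entries into place by name
  full[rownames(small), colnames(small)] <- small
  full
}

# Toy data with four variables; the submodel mentions only bmi and chl
data <- data.frame(age = c(1, 2, 3), bmi = c(NA, 22.5, 30.1),
                   hyp = c(1, NA, 2), chl = c(187, NA, 199))
small <- matrix(c(0, 1, 1, 0), nrow = 2,
                dimnames = list(c("bmi", "chl"), c("bmi", "chl")))
full <- augment_predictor_matrix(small, data)
```

Under these assumptions, `full` has the same row and column names as `colnames(data)`, and the rows for `age` and `hyp` are all zero, which corresponds to the statement that unmentioned variables are not imputed.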

diff --git a/R/D1.R b/R/D1.R
@@ -2,25 +2,25 @@
 #'
 #' The D1-statistics is the multivariate Wald test.
 #'
-#' @param fit1 An object of class \code{mira}, produced by \code{with()}.
-#' @param fit0 An object of class \code{mira}, produced by \code{with()}. The
-#' model in \code{fit0} is a nested within \code{fit1}. The default null
-#' model \code{fit0 = NULL} compares \code{fit1} to the intercept-only model.
+#' @param fit1 An object of class `mira`, produced by `with()`.
+#' @param fit0 An object of class `mira`, produced by `with()`. The
+#' model in `fit0` is nested within `fit1`. The default null
+#' model `fit0 = NULL` compares `fit1` to the intercept-only model.
 #' @param dfcom A single number denoting the
-#' complete-data degrees of freedom of model \code{fit1}. If not specified,
-#' it is set equal to \code{df.residual} of model \code{fit1}. If that cannot
+#' complete-data degrees of freedom of model `fit1`. If not specified,
+#' it is set equal to `df.residual` of model `fit1`. If that cannot
 #' be done, the procedure assumes (perhaps incorrectly) a large sample.
 #' @param df.com Deprecated
 #' @note Warning: `D1()` assumes that the order of the variables is the
 #' same in different models. See
-#' \url{https://github.com/amices/mice/issues/420} for details.
+#' <https://github.com/amices/mice/issues/420> for details.
 #' @references
 #' Li, K. H., T. E. Raghunathan, and D. B. Rubin. 1991.
 #' Large-Sample Significance Levels from Multiply Imputed Data Using
 #' Moment-Based Statistics and an F Reference Distribution.
-#' \emph{Journal of the American Statistical Association}, 86(416): 1065–73.
+#' *Journal of the American Statistical Association*, 86(416): 1065–73.
 #'
-#' \url{https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:wald}
+#' <https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:wald>
 #' @examples
 #' # Compare two linear models:
 #' imp <- mice(nhanes2, seed = 51009, print = FALSE)
@@ -34,7 +34,7 @@
 #' fit0 <- with(imp, glm(gen > levels(gen)[1] ~ hgt + hc, family = binomial))
 #' D1(fit1, fit0)
 #' }
-#' @seealso \code{\link[mitml]{testModels}}
+#' @seealso [mitml::testModels()]
 #' @export
 D1 <- function(fit1, fit0 = NULL, dfcom = NULL, df.com = NULL) {
   install.on.demand("mitml")

diff --git a/R/D2.R b/R/D2.R
@@ -7,13 +7,13 @@
 #' @inheritParams mitml::testModels
 #' @note Warning: `D2()` assumes that the order of the variables is the
 #' same in different models. See
-#' \url{https://github.com/amices/mice/issues/420} for details.
+#' <https://github.com/amices/mice/issues/420> for details.
 #' @references
 #' Li, K. H., X. L. Meng, T. E. Raghunathan, and D. B. Rubin. 1991.
 #' Significance Levels from Repeated p-Values with Multiply-Imputed Data.
-#' \emph{Statistica Sinica} 1 (1): 65–92.
+#' *Statistica Sinica* 1 (1): 65–92.
 #'
-#' \url{https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:chi}
+#' <https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:chi>
 #' @examples
 #' # Compare two linear models:
 #' imp <- mice(nhanes2, seed = 51009, print = FALSE)
@@ -27,7 +27,7 @@
 #' fit0 <- with(imp, glm(gen > levels(gen)[1] ~ hgt + hc, family = binomial))
 #' D2(fit1, fit0)
 #' }
-#' @seealso \code{\link[mitml]{testModels}}
+#' @seealso [mitml::testModels()]
 #' @export
 D2 <- function(fit1, fit0 = NULL, use = "wald") {
   install.on.demand("mitml")

diff --git a/R/D3.R b/R/D3.R
@@ -3,34 +3,34 @@
 #' The D3-statistic is a likelihood-ratio test statistic.
 #'
 #' @details
-#' The \code{D3()} function implement the LR-method by
+#' The `D3()` function implements the LR-method by
 #' Meng and Rubin (1992). The implementation of the method relies
-#' on the \code{broom} package, the standard \code{update} mechanism
-#' for statistical models in \code{R} and the \code{offset} function.
+#' on the `broom` package, the standard `update` mechanism
+#' for statistical models in `R` and the `offset` function.
 #'
-#' The function calculates \code{m} repetitions of the full
+#' The function calculates `m` repetitions of the full
 #' (or null) models, calculates the mean of the estimates of the
 #' (fixed) parameter coefficients \eqn{\beta}. For each imputed
 #' imputed dataset, it calculates the likelihood for the model with
 #' the parameters constrained to \eqn{\beta}.
 #'
-#' The \code{mitml::testModels()} function offers similar functionality
-#' for a subset of statistical models. Results of \code{mice::D3()} and
-#' \code{mitml::testModels()} differ in multilevel models because the
-#' \code{testModels()} also constrains the variance components parameters.
+#' The `mitml::testModels()` function offers similar functionality
+#' for a subset of statistical models. Results of `mice::D3()` and
+#' `mitml::testModels()` differ in multilevel models because the
+#' `testModels()` also constrains the variance components parameters.
 #' For more details on
 #'
-#' @seealso \code{\link{fix.coef}}
+#' @seealso [fix.coef()]
 #' @inheritParams D1
-#' @return An object of class \code{mice.anova}
+#' @return An object of class `mice.anova`
 #' @references
 #' Meng, X. L., and D. B. Rubin. 1992.
 #' Performing Likelihood Ratio Tests with Multiply-Imputed Data Sets.
-#' \emph{Biometrika}, 79 (1): 103–11.
+#' *Biometrika*, 79 (1): 103–11.
 #'
-#' \url{https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:likelihoodratio}
+#' <https://stefvanbuuren.name/fimd/sec-multiparameter.html#sec:likelihoodratio>
 #'
-#' \url{http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#setting-residual-variances-to-a-fixed-value-zero-or-other}
+#' <http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#setting-residual-variances-to-a-fixed-value-zero-or-other>
 #' @examples
 #' # Compare two linear models:
 #' imp <- mice(nhanes2, seed = 51009, print = FALSE)

diff --git a/R/ampute.R b/R/ampute.R
@@ -2,11 +2,11 @@
 #'
 #' This function generates multivariate missing data under a MCAR, MAR or MNAR
 #' missing data mechanism. Imputation of data sets containing missing values can
-#' be performed with \code{\link{mice}}.
+#' be performed with [mice()].
 #'
 #' This function generates missing values in complete data sets. Amputation of complete
 #' data sets is useful for the evaluation of imputation techniques, such as multiple
-#' imputation (performed with function \code{\link{mice}} in this package).
+#' imputation (performed with function [mice()] in this package).
 #'
 #' The basic strategy underlying multivariate imputation was suggested by
 #' Don Rubin during discussions in the 90's. Brand (1997) created one particular
@@ -21,13 +21,13 @@
 #' With the univariate approach, it is difficult to relate the missingness on one
 #' variable to the missingness on another variable. A multivariate amputation procedure
 #' solves this issue and moreover, it does justice to the multivariate nature of
-#' data sets. Hence, \code{ampute} is developed to perform multivariate amputation.
+#' data sets. Hence, `ampute` is developed to perform multivariate amputation.
 #'
 #' The idea behind the function is the specification of several missingness
 #' patterns. Each pattern is a combination of variables with and without missing
-#' values (denoted by \code{0} and \code{1} respectively). For example, one might
+#' values (denoted by `0` and `1` respectively). For example, one might
 #' want to create two missingness patterns on a data set with four variables. The
-#' patterns could be something like: \code{0,0,1,1} and \code{1,0,1,0}.
+#' patterns could be something like: `0,0,1,1` and `1,0,1,0`.
 #' Each combination of zeros and ones may occur.
 #'
 #' Furthermore, the researcher specifies the proportion of missingness, either the
@@ -43,14 +43,14 @@
 #' For a discussion on how missingness mechanisms are related to the observed data,
 #' we refer to \doi{10.1177/0049124118799376}.
 #'
-#' When the user specifies the missingness mechanism to be \code{"MCAR"}, the candidates
-#' have an equal probability of becoming incomplete. For a \code{"MAR"} or \code{"MNAR"} mechanism,
+#' When the user specifies the missingness mechanism to be `"MCAR"`, the candidates
+#' have an equal probability of becoming incomplete. For a `"MAR"` or `"MNAR"` mechanism,
 #' weighted sum scores are calculated. These scores are a linear combination of the
 #' variables.
 #'
 #' In order to calculate the weighted sum scores, the data is standardized. For this reason,
 #' the data has to be numeric. Second, for each case, the values in
-#' the data set are multiplied with the weights, specified by argument \code{weights}.
+#' the data set are multiplied with the weights, specified by argument `weights`.
 #' These weighted scores will be summed, resulting in a weighted sum score for each case.
 #'
 #' The weights may differ between patterns and they may be negative or zero as well.
@@ -93,19 +93,19 @@
 #' @param prop A scalar specifying the proportion of missingness. Should be a value
 #' between 0 and 1. Default is a missingness proportion of 0.5.
 #' @param patterns A matrix or data frame of size #patterns by #variables where
-#' \code{0} indicates that a variable should have missing values and \code{1} indicates
+#' `0` indicates that a variable should have missing values and `1` indicates
 #' that a variable should remain complete. The user may specify as many patterns as
 #' desired. One pattern (a vector) is possible as well. Default
 #' is a square matrix of size #variables where each pattern has missingness on one
-#' variable only (created with \code{\link{ampute.default.patterns}}). After the
-#' amputation procedure, \code{\link{md.pattern}} can be used to investigate the
+#' variable only (created with [ampute.default.patterns()]). After the
+#' amputation procedure, [md.pattern()] can be used to investigate the
 #' missing data patterns in the data.
 #' @param freq A vector of length #patterns containing the relative frequency with
 #' which the patterns should occur. For example, for three missing data patterns,
-#' the vector could be \code{c(0.4, 0.4, 0.2)}, meaning that of all cases with
+#' the vector could be `c(0.4, 0.4, 0.2)`, meaning that of all cases with
 #' missing values, 40 percent should have pattern 1, 40 percent pattern 2 and 20
 #' percent pattern 3. The vector should sum to 1. Default is an equal probability
-#' for each pattern, created with \code{\link{ampute.default.freq}}.
+#' for each pattern, created with [ampute.default.freq()].
 #' @param mech A string specifying the missingness mechanism, either "MCAR"
 #' (Missing Completely At Random), "MAR" (Missing At Random) or "MNAR" (Missing Not At
 #' Random). Default is a MAR missingness mechanism.
@@ -115,50 +115,50 @@
 #' zero. For a MNAR mechanism, these weights could have any possible value. Furthermore,
 #' the weights may differ between patterns and between variables. They may be negative
 #' as well. Within each pattern, the relative size of the values are of importance.
-#' The default weights matrix is made with \code{\link{ampute.default.weights}} and
+#' The default weights matrix is made with [ampute.default.weights()] and
 #' returns a matrix with equal weights for all variables. In case of MAR, variables
-#' that will be amputed will be weighted with \code{0}. For MNAR, variables
-#' that will be observed will be weighted with \code{0}. If the mechanism is MCAR, the
+#' that will be amputed will be weighted with `0`. For MNAR, variables
+#' that will be observed will be weighted with `0`. If the mechanism is MCAR, the
 #' weights matrix will not be used.
 #' @param std Logical. Whether the weighted sum scores should be calculated with
 #' standardized data or with non-standardized data. The latter is especially advised when
 #' making use of train and test sets in order to prevent leakage.
 #' @param cont Logical. Whether the probabilities should be based on a continuous
 #' or a discrete distribution. If TRUE, the probabilities of being missing are based
-#' on a continuous logistic distribution function. \code{\link{ampute.continuous}}
+#' on a continuous logistic distribution function. [ampute.continuous()]
 #' will be used to calculate and assign the probabilities. These probabilities will then
-#' be based on the argument \code{type}. If FALSE, the probabilities of being missing are
-#' based on a discrete distribution (\code{\link{ampute.discrete}}) based on the \code{odds}
+#' be based on the argument `type`. If FALSE, the probabilities of being missing are
+#' based on a discrete distribution ([ampute.discrete()]) based on the `odds`
 #' argument. Default is TRUE.
 #' @param type A string or vector of strings containing the type of missingness for each
-#' pattern. Either \code{"LEFT"}, \code{"MID"}, \code{"TAIL"} or '\code{"RIGHT"}.
+#' pattern. Either `"LEFT"`, `"MID"`, `"TAIL"` or `"RIGHT"`.
 #' If a single missingness type is given, all patterns will be created with the same
 #' type. If the missingness types should differ between patterns, a vector of missingness
 #' types should be given. Default is RIGHT for all patterns and is the result of
-#' \code{\link{ampute.default.type}}.
+#' [ampute.default.type()].
 #' @param odds A matrix where #patterns defines the #rows. Each row should contain
 #' the odds of being missing for the corresponding pattern. The number of odds values
 #' defines in how many quantiles the sum scores will be divided. The odds values are
 #' relative probabilities: a quantile with odds value 4 will have a probability of
 #' being missing that is four times higher than a quantile with odds 1. The
 #' number of quantiles may differ between the patterns, specify NA for cells remaining empty.
 #' Default is 4 quantiles with odds values 1, 2, 3 and 4 and is created by
-#' \code{\link{ampute.default.odds}}.
+#' [ampute.default.odds()].
 #' @param bycases Logical. If TRUE, the proportion of missingness is defined in
 #' terms of cases. If FALSE, the proportion of missingness is defined in terms of
 #' cells. Default is TRUE.
 #' @param run Logical. If TRUE, the amputations are implemented. If FALSE, the
 #' return object will contain everything except for the amputed data set.
 #'
-#' @return Returns an S3 object of class \code{\link{mads-class}} (multivariate
+#' @return Returns an S3 object of class [mads-class] (multivariate
 #' amputed data set)
-#' @author Rianne Schouten [aut, cre], Gerko Vink [aut], Peter Lugtig [ctb], 2016
-#' @seealso \code{\link{mads-class}}, \code{\link{bwplot}}, \code{\link{xyplot}},
-#' \code{\link{mice}}
+#' @author Rianne Schouten (aut, cre), Gerko Vink (aut), Peter Lugtig (ctb), 2016
+#' @seealso [mads-class], [bwplot()], [xyplot()],
+#' [mice()]
 #'
-#' @references Brand, J.P.L. (1999) \emph{Development, implementation and
+#' @references Brand, J.P.L. (1999) *Development, implementation and
 #' evaluation of multiple imputation strategies for the statistical analysis of
-#' incomplete data sets.} pp. 110-113. Dissertation. Rotterdam: Erasmus University.
+#' incomplete data sets.* pp. 110-113. Dissertation. Rotterdam: Erasmus University.
 #'
 #' Schouten, R.M., Lugtig, P and Vink, G. (2018)
 #' Generating missing values for simulation purposes: A multivariate