Feature request: Model/data anonymization function #755

Open
billdenney opened this issue Aug 6, 2024 · 1 comment

Comments

@billdenney
Contributor

A perennial problem when reporting nlmixr2 issues is anonymizing the model and/or data.

I've written the function below, which covers almost all of the anonymization. The things I see that are not (or may not be) covered, and which may need to be resolved before inclusion, are:

  1. If a fixed value is set in the model (e.g. `model({mw <- 150000})`), neither the name `mw` nor the fixed value is modified. How can I find a list of all left-hand-side names that are assigned this way, i.e. names that are not a model parameter, covariate, residual error value, output, or compartment name, and that are not special lines in the model such as derivatives?
  2. How can I remove meta-information?
  3. It uses the data simplification function from nlmixr2targets to minimize the dataset width. (We may want to move that function to the same package where this lives.)
  4. I need to test that it will anonymize eta values.

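For item 1, one rough approach that does not depend on rxode2 internals is to walk the parsed model body and collect every plain symbol on the left-hand side of `<-` or `=`, skipping derivative lines such as `d/dt(depot)`. Anything left after removing the names that are already renamed is a candidate for renaming. This is only a base-R sketch: `findAssignedNames()` is a hypothetical helper, the example model body is made up, and how you pull the body out of a real model (e.g. from the function passed to `model()`) is left as an assumption.

```r
# Hypothetical helper: collect every plain symbol assigned with `<-` or `=`
# anywhere in a quoted model body.  Non-symbol left-hand sides such as
# d/dt(depot) (derivatives) are skipped; `~` residual error lines are ignored.
findAssignedNames <- function(modelBody) {
  found <- character(0)
  walk <- function(expr) {
    if (!is.call(expr)) return(invisible(NULL))
    fn <- expr[[1]]
    isAssign <- identical(fn, as.name("<-")) || identical(fn, as.name("="))
    if (isAssign && is.name(expr[[2]])) {
      found <<- c(found, as.character(expr[[2]]))
    }
    # Recurse into the arguments so assignments inside if/else blocks are found
    for (arg in as.list(expr)[-1]) walk(arg)
    invisible(NULL)
  }
  walk(modelBody)
  unique(found)
}

# Made-up model body for illustration
modelBody <- quote({
  mw <- 150000
  ka <- exp(tka + eta.ka)
  d/dt(depot) <- -ka * depot
  cp = center / v
  cp ~ add(add.sd)
})
assigned <- findAssignedNames(modelBody)
assigned
#> [1] "mw" "ka" "cp"

# Assumed list of names that are already renamed (in practice these would
# come from the model's parameter, covariate, output, and compartment lists)
alreadyRenamed <- c("cp", "depot", "center")
setdiff(assigned, alreadyRenamed)
#> [1] "mw" "ka"
```

Note that this finds intermediate variables like `ka` as well as fixed values like `mw`; both are left-hand-side names that the renaming step would otherwise leave untouched.
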
If generally acceptable, where do you think this should live? I think rxode2, but I can imagine

```r
#' Anonymize an rxode2/nlmixr2 model and dataset to assist with issue reporting
#'
#' It is your responsibility to inspect any information that you are sharing.
#' Do not share confidential data.  This function is designed to help you avoid
#' sharing anything that may be confidential, but final responsibility remains
#' with the person sharing the data.
#'
#' When parameters and DV values are anonymized, they maintain the approximate
#' order of magnitude and the sign.  This is done by multiplying each value by
#' a uniform random number between 0.5 and 1.5.
#'
#' The changes made by this function may affect whether the issue still
#' occurs.  Please retest the issue with the anonymized model and dataset to
#' confirm that it recurs.
#'
#' @param model The model to anonymize
#' @param data The dataset to minimize and anonymize
#' @param anonParam Anonymize the initial conditions of the parameters (except
#'   values that are exactly 0 or 1)?
#' @param anonParam1 Also anonymize the initial conditions of the parameters
#'   that are exactly 1?  This only has an effect if `anonParam = TRUE`.
#' @param anonDv Anonymize the DV values in the dataset?
#' @param nId Number of subject identifiers to include in the dataset.  The
#'   first `nId` `"id"` column values will be included to minimize dataset size
#'   (or all `"id"` values if `nId` is greater than the number in the dataset).
#'   Set to `Inf` to include all data.
#'
#' @returns A list with two components, `"model"` and `"data"`, containing the
#'   anonymized model and dataset, respectively
#' @export
nlmixr2Anonymize <- function(model, data, anonParam = TRUE, anonParam1 = FALSE, anonDv = TRUE, nId = 5) {
  modelUi <- rxode2::as.rxUi(model)
  # Drop all meta-data (FIXME: how do I do this?  It says it should not be overwritten)
  #modelUi$meta <- NULL
  # Drop all labels
  modelUi$iniDf$label <- NA_character_

  # Keep only the dataset columns that the model needs
  dataSimple <- nlmixr2targets::nlmixr_data_simplify(data = data, object = modelUi)
  # Find everything that needs to be renamed
  covariateRename <- modelUi$allCovs
  paramRename <- unique(unlist(modelUi$params[c("pop", "resid", "group", "cmt", "output")]))
  allRename <- c(covariateRename, paramRename)
  # Name each original name with its replacement ("anon1", "anon2", ...)
  allRename <- stats::setNames(allRename, paste0("anon", seq_along(allRename)))
  # Build the rxRename() arguments as new_name = old_name pairs
  argsRename <- list(.data = modelUi)
  for (idx in seq_along(allRename)) {
    currentRename <- stats::setNames(list(as.name(allRename[idx])), names(allRename[idx]))
    argsRename <- append(argsRename, currentRename)
  }
  # And rename them all in the model
  modelUiRenamed <- do.call(rxode2::rxRename, argsRename)

  # Anonymize initial conditions while maintaining order of magnitude and sign
  if (anonParam) {
    newIniDf <- modelUiRenamed$iniDf
    currentFactor <- stats::runif(n = nrow(newIniDf), min = 0.5, max = 1.5)
    if (!anonParam1) {
      # Keep values of 1 as they are not likely to be informative
      currentFactor[newIniDf$est == 1] <- 1
    }
    # Scale each row's bounds by the same positive factor as its estimate so
    # that lower <= est <= upper still holds (infinite bounds stay infinite)
    newIniDf$lower <- newIniDf$lower * currentFactor
    newIniDf$est <- newIniDf$est * currentFactor
    newIniDf$upper <- newIniDf$upper * currentFactor
    rxode2::ini(modelUiRenamed) <- newIniDf
  }

  # Then, rename all covariates in the dataset
  oldNames <- names(dataSimple)
  newNames <- oldNames
  for (currentCovariate in covariateRename) {
    newNames[oldNames %in% currentCovariate] <-
      names(allRename[allRename %in% currentCovariate])
  }
  dataRenamed <- stats::setNames(dataSimple, nm = newNames)

  # Rename compartments in the dataset
  for (oldCmt in modelUi$params$cmt) {
    newCmt <- names(allRename[allRename %in% oldCmt])
    mask <- dataRenamed$cmt %in% oldCmt
    if (any(mask)) {
      dataRenamed$cmt[mask] <- newCmt
    }
  }

  # Anonymize the ids in the dataset (replace them with 1, 2, 3, ...)
  dataRenamed$id <- as.integer(factor(dataRenamed$id))

  # Anonymize DV in the dataset
  if (anonDv && "dv" %in% names(dataRenamed)) {
    dataRenamed$dv <- dataRenamed$dv * stats::runif(n = nrow(dataRenamed), min = 0.5, max = 1.5)
  }

  # Reduce the dataset size, if desired
  if (is.finite(nId)) {
    allId <- unique(dataRenamed$id)
    if (nId < length(allId)) {
      dataRenamed <- dataRenamed[dataRenamed$id %in% allId[seq_len(nId)], ]
    }
  }

  list(
    model = as.function(modelUiRenamed),
    data = dataRenamed
  )
}
```
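
Related to item 4 (eta values): as written, the `anonParam` step gives every row of `iniDf` its own independent factor. If the model has correlated etas, the off-diagonal (covariance) rows get factors unrelated to the variances they belong to, and the resulting omega matrix can stop being positive definite. One way to avoid this, sketched below in base R, is to rescale omega as D Ω D with D = diag(sqrt(f)): each variance is multiplied by its factor f_i, each covariance by sqrt(f_i * f_j), and the correlations are unchanged. `scaleOmega()` is a hypothetical helper operating on a plain matrix; mapping it back onto `iniDf` (presumably via its eta index columns) is an assumption that would need checking against rxode2.

```r
# Hypothetical helper: rescale a covariance matrix by per-eta factors while
# keeping it a valid (positive definite) covariance matrix.
# new = D %*% omega %*% D with D = diag(sqrt(factors)), so variance i is
# multiplied by factors[i] and covariance (i, j) by sqrt(factors[i] * factors[j]).
scaleOmega <- function(omega, factors) {
  stopifnot(nrow(omega) == length(factors), all(factors > 0))
  d <- diag(sqrt(factors), nrow = length(factors))
  out <- d %*% omega %*% d
  dimnames(out) <- dimnames(omega)
  out
}

etaNames <- c("eta.ka", "eta.cl")
omega <- matrix(c(1, 0.9, 0.9, 1), nrow = 2, dimnames = list(etaNames, etaNames))

scaled <- scaleOmega(omega, factors = c(0.5, 1.5))
eigen(scaled, symmetric = TRUE)$values      # both positive
all.equal(cov2cor(scaled), cov2cor(omega))  # TRUE: correlations unchanged

# By contrast, scaling each element independently (variance 1 by 0.5, the
# covariance by 1.5, variance 2 by 1.5) gives a matrix with a negative
# eigenvalue, i.e. not a valid covariance matrix.
naive <- omega * matrix(c(0.5, 1.5, 1.5, 1.5), nrow = 2)
eigen(naive, symmetric = TRUE)$values
```
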
@mattfidler
Member

Probably in rxode2 with appropriate tests.
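
As a starting point for those tests, the random-scaling step can be checked in isolation for the properties the documentation promises: signs (including exact zeros) are kept, and each value ends up between 0.5 and 1.5 times its original. `anonymizeValues()` is a hypothetical helper that mirrors the DV step in `nlmixr2Anonymize()`; in the package these checks would presumably become testthat expectations rather than `stopifnot()` calls.

```r
# Hypothetical helper mirroring the DV step: multiply each value by an
# independent uniform factor between min and max.
anonymizeValues <- function(x, min = 0.5, max = 1.5) {
  x * stats::runif(length(x), min = min, max = max)
}

set.seed(42)
original <- c(-12.5, 0, 0.003, 4.2, 1500)
anon <- anonymizeValues(original)

# Signs are preserved, and exact zeros stay zero
stopifnot(identical(sign(anon), sign(original)))
stopifnot(all(anon[original == 0] == 0))

# Every nonzero value is between 0.5 and 1.5 times its original value
nonzero <- original != 0
ratio <- anon[nonzero] / original[nonzero]
stopifnot(all(ratio >= 0.5 & ratio <= 1.5))
```
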

@mattfidler mattfidler transferred this issue from nlmixr2/nlmixr2 Aug 6, 2024