Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

closes #2481 bug the result of derive param tte depends on the sort order of the input #2569

Open
wants to merge 24 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
8e23388
Added order arguments to censor_source and event_source. Also added s…
ProfessorP-beep Nov 18, 2024
cd52801
Added order argument to tte_source as part of development and error f…
ProfessorP-beep Nov 18, 2024
2727736
Fixed previous erros but still need to address failed tests for Test …
ProfessorP-beep Nov 18, 2024
9e86217
added check_type arg_match to derive_param_tte so user has to input a…
ProfessorP-beep Nov 18, 2024
d97377c
Changed position of signal_duplicate_records function in derive_param…
ProfessorP-beep Nov 18, 2024
fa49a51
lintr changes by removing whitespace.
ProfessorP-beep Nov 18, 2024
01e8f5a
styler fix.
ProfessorP-beep Nov 18, 2024
53457c2
updated NEWS.md with changes to derive_param_tte,. Removed Test 17 fr…
ProfessorP-beep Nov 19, 2024
020c9d7
Merge branch 'main' into 2481-bug-the-result-of-derive_param_tte-depe…
ProfessorP-beep Nov 19, 2024
dccdbe1
changed the signal_duplicate_records within derive_parame_tte to hand…
ProfessorP-beep Nov 19, 2024
8006891
Merge branch '2481-bug-the-result-of-derive_param_tte-depends-on-the-…
ProfessorP-beep Nov 19, 2024
4c95243
added a tryCatch() to filter_date_sources to catch duplicates to addr…
ProfessorP-beep Nov 21, 2024
087c0f3
Moved duplication check to filter_date_sources in tryCatch() and rewr…
ProfessorP-beep Nov 24, 2024
4405868
1. Moved updates in News section to admiral dev section
ProfessorP-beep Dec 3, 2024
21b5a00
Ran styler, lintr fixes, and devtools check.
ProfessorP-beep Dec 3, 2024
ce07ad1
styler changes
ProfessorP-beep Dec 3, 2024
1d4e6b7
accepted snapshots from testthat and addressed bds_tte.Rmd error for …
ProfessorP-beep Dec 3, 2024
22f3f2d
added documentation for order and check_type arguments added to funct…
ProfessorP-beep Dec 3, 2024
47637a5
requested updates to documentation and test script for derive_param_tte
ProfessorP-beep Dec 16, 2024
e882758
corrected documentation and removed rlang from bds_tte.Rmd
ProfessorP-beep Dec 17, 2024
e5c28fc
updated derive_param_tte documentation and added test to derive_param…
ProfessorP-beep Dec 20, 2024
404c949
fixed spelling error
ProfessorP-beep Dec 20, 2024
ae70492
updates to derive_param_tte documentation and test examples.
ProfessorP-beep Dec 23, 2024
34d2fb3
Update NEWS.md
ProfessorP-beep Jan 8, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,12 @@
- Update `ADEG` template to flag `ABLFL` and `ANL01FL` based on `DTYPE == "AVERAGE"` records. (#2561)

## Updates of Existing Functions
- added `message` as option for `check_type` argument in `filter_extreme()` function. (#2481)
ProfessorP-beep marked this conversation as resolved.
Show resolved Hide resolved

- Users can now specify how duplicate records are handled in `derive_param_tte()` using the `check_type` argument, with options including `"error"`, `"warning"`, `"message"`, or `"none"`, allowing for greater flexibility in managing duplicate data scenarios. (#2481)

- `order` argument has been added to `event_source()` and `censor_source()` and
defaulted to `NULL` to allow specifying variables in addition to the date variable. This can be used to ensure the uniqueness of the select records if there is more than one record per date. (#2481)

- The `keep_nas` argument of `derive_param_computed()` was enhanced such that it
is now possible to specify a list of variables for which `NA`s are acceptable.
Expand Down
8 changes: 4 additions & 4 deletions R/derive_merged.R
Original file line number Diff line number Diff line change
Expand Up @@ -112,15 +112,15 @@
#'
#' @param check_type Check uniqueness?
#'
#' If `"warning"` or `"error"` is specified, the specified message is issued
#' If `"warning"`, `"message"`, or `"error"` is specified, the specified message is issued
#' if the observations of the (restricted) additional dataset are not unique
#' with respect to the by variables and the order.
#'
#' If the `order` argument is not specified, the `check_type` argument is ignored:
#' if the observations of the (restricted) additional dataset are not unique with respect
#' to the by variables, an error is issued.
#' if the observations of the (restricted) additional dataset are not unique with respect
#' to the by variables, an error is issued.
#'
#' *Permitted Values*: `"none"`, `"warning"`, `"error"`
#' *Permitted Values*: `"none"`, `"message"`,`"warning"`, `"error"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' *Permitted Values*: `"none"`, `"message"`,`"warning"`, `"error"`
#' *Permitted Values*: `"none"`, `"message"`, `"warning"`, `"error"`

#'
#' @param duplicate_msg Message of unique check
#'
Expand Down
121 changes: 96 additions & 25 deletions R/derive_param_tte.R
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,14 @@
#'
#' A list of symbols created using `exprs()` is expected.
#'
#' @param check_type Check uniqueness
#'
#' If `"warning"`, `"message"`, or `"error"` is specified, the specified message is issued
#' if the observations of the input dataset are not unique with respect to the
#' by variables and the order.
Comment on lines +66 to +67
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' if the observations of the input dataset are not unique with respect to the
#' by variables and the order.
#' if the observations of the source datasets are not unique with respect to the
#' by variables and the date and order specified in the `event_source()` and
#' `censor_source()` objects.

#'
#' Default: `"none"`
ProfessorP-beep marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' Default: `"none"`
#' *Permitted Values*: `"none"`, `"message"`, `"warning"`, `"error"`

#'
#' @details The following steps are performed to create the observations of the
#' new parameter:
#'
Expand Down Expand Up @@ -322,8 +330,14 @@ derive_param_tte <- function(dataset = NULL,
censor_conditions,
create_datetime = FALSE,
set_values_to,
subject_keys = get_admiral_option("subject_keys")) {
subject_keys = get_admiral_option("subject_keys"),
check_type = "warning") {
# checking and quoting #
check_type <- assert_character_scalar(
check_type,
values = c("warning", "message", "error", "none"),
case_sensitive = FALSE
)
assert_data_frame(dataset, optional = TRUE)
assert_vars(by_vars, optional = TRUE)
start_date <- assert_symbol(enexpr(start_date))
Expand Down Expand Up @@ -373,16 +387,17 @@ derive_param_tte <- function(dataset = NULL,
by_vars = by_vars
)
}

tmp_event <- get_new_tmp_var(dataset)

# determine events #
event_data <- filter_date_sources(
sources = event_conditions,
source_datasets = source_datasets,
by_vars = by_vars,
create_datetime = create_datetime,
subject_keys = subject_keys,
mode = "first"
mode = "first",
check_type = check_type
) %>%
mutate(!!tmp_event := 1L)

Expand All @@ -393,7 +408,8 @@ derive_param_tte <- function(dataset = NULL,
by_vars = by_vars,
create_datetime = create_datetime,
subject_keys = subject_keys,
mode = "last"
mode = "last",
check_type = check_type
) %>%
mutate(!!tmp_event := 0L)

Expand Down Expand Up @@ -436,7 +452,8 @@ derive_param_tte <- function(dataset = NULL,
bind_rows(event_data, censor_data),
by_vars = expr_c(subject_keys, by_vars),
order = exprs(!!tmp_event),
mode = "last"
mode = "last",
check_type = check_type
) %>%
inner_join(
adsl,
Expand Down Expand Up @@ -505,6 +522,16 @@ derive_param_tte <- function(dataset = NULL,
#'
#' Permitted Values: `"first"`, `"last"`
#'
#' @param check_type Check uniqueness
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' @param check_type Check uniqueness
#' @param check_type Check uniqueness

(If the tag is indented, it is rendered as text.)

#'
#' If `"warning"`, `"message"`, or `"error"` is specified, the specified message is issued
#' if the observations of the input dataset are not unique with respect to the
#' by variables and the order.
Comment on lines +572 to +573
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' if the observations of the input dataset are not unique with respect to the
#' by variables and the order.
#' if the observations of the source datasets are not unique with respect to the
#' by variables and the date and order specified in the `tte_source()` objects.

#'
#' Default: `"none"`
#'
Comment on lines +575 to +576
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' Default: `"none"`
#'

#' Permitted Values: `"none"`, `"warning"`, `"error"`, `"message"`
#'
ProfessorP-beep marked this conversation as resolved.
Show resolved Hide resolved
#' @details The following steps are performed to create the output dataset:
#'
#' \enumerate{ \item For each source dataset the observations as specified by
Expand All @@ -529,7 +556,7 @@ derive_param_tte <- function(dataset = NULL,
#' @return A dataset with one observation per subject as described in the
#' "Details" section.
#'
#' @noRd
#' @keywords internal
ProfessorP-beep marked this conversation as resolved.
Show resolved Hide resolved
#'
#' @examples
#' library(tibble)
Expand Down Expand Up @@ -571,14 +598,16 @@ derive_param_tte <- function(dataset = NULL,
#' by_vars = exprs(AEDECOD),
#' create_datetime = FALSE,
#' subject_keys = get_admiral_option("subject_keys"),
#' mode = "first"
#' mode = "first",
#' check_type = "none"
#' )
filter_date_sources <- function(sources,
source_datasets,
by_vars,
create_datetime = FALSE,
subject_keys,
mode) {
mode,
check_type = "none") {
ProfessorP-beep marked this conversation as resolved.
Show resolved Hide resolved
assert_list_of(sources, "tte_source")
assert_list_of(source_datasets, "data.frame")
assert_logical_scalar(create_datetime)
Expand Down Expand Up @@ -613,17 +642,49 @@ filter_date_sources <- function(sources,
var = !!source_date_var,
dataset_name = sources[[i]]$dataset_name
)
data[[i]] <- source_dataset %>%
filter_if(sources[[i]]$filter) %>%
filter_extreme(
order = exprs(!!source_date_var),
by_vars = expr_c(subject_keys, by_vars),
mode = mode,
check_type = "none"
)

# wrap filter_extreme in tryCatch to catch duplicate records and create a message
data[[i]] <- rlang::try_fetch(
{
source_dataset %>%
filter_if(sources[[i]]$filter) %>%
filter_extreme(
order = expr_c(exprs(!!source_date_var), sources[[i]]$order),
by_vars = expr_c(subject_keys, by_vars),
mode = mode,
check_type = check_type
ProfessorP-beep marked this conversation as resolved.
Show resolved Hide resolved
)
},
warning = function(cnd) {
# Handle warnings
if (grepl("duplicate records", conditionMessage(cnd))) {
cli::cli_warn(c(
"Dataset '{.val {sources[[i]]$dataset_name}}' contains duplicate records.",
"i Duplicates were identified based on variables:
{.val {paste(c(subject_keys, by_vars, source_date_var), collapse = ', ')}}."
))
}
source_dataset %>%
filter_if(sources[[i]]$filter) %>%
arrange(!!!sources[[i]]$order) # Return filtered dataset even if a warning occurred
},
error = function(err) {
cli::cli_abort(c(
"Duplicate records detected during processing.",
"x Duplicate records were found in dataset {.val {sources[[i]]$dataset_name}}.",
"i The duplicates were identified based on the following variables:
{.val {paste(c(subject_keys, by_vars, source_date_var), collapse = ', ')}}.",
"i Consider reviewing your `by_vars` or `order` argument to ensure uniqueness."
))
},
message = function(msg) {
cli::cli_inform(c(
"Processing dataset '{.val {sources[[i]]$dataset_name}}'...",
"i Filter and order criteria: {.val {paste(c(subject_keys, by_vars,
sources[[i]]$order), collapse = ', ')}}."
))
}
Comment on lines +701 to +729
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once #2592 is merged this could be simplified:

      duplicate_records = function(cnd) {
        cnd_funs <- list(message = cli_inform, warning = cli_warn, error = cli_abort)
        cnd_funs[[check_type]](
          paste(
            "Dataset {.val {sources[[i]]$dataset_name}} contains duplicate records with respect to",
            "{.var {cnd$by_vars}}"
          ),
          class = class(cnd))
        cnd_muffle(cnd)
        zap()
      }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ProfessorP-beep , #2592 is merged now.

)
# add date variable and accompanying variables

if (create_datetime) {
date_derv <- exprs(!!date_var := as_datetime(!!source_date_var))
} else {
Expand All @@ -649,7 +710,7 @@ filter_date_sources <- function(sources,
by_vars = expr_c(subject_keys, by_vars),
order = exprs(!!date_var),
mode = mode,
check_type = "none"
check_type = check_type
)
}

Expand Down Expand Up @@ -782,6 +843,10 @@ extend_source_datasets <- function(source_datasets,
#' SRCDOM = "ADSL", SRCVAR = "DTHDT")`. The values must be a symbol, a
#' character string, a numeric value, an expression, or `NA`.
#'
#' @param order Sort order
#'
#' If the argument is set to a non-null value, for each by group the first or
#' last observation
Copy link
Collaborator

@rossfarrugia rossfarrugia Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ProfessorP-beep from a user perspective i find this argument description confusing. I'm unclear whether i need to use exprs() like for set_values_to explained above and i don't really understand the description here. I thought it would be worded more something like:

An optional named list returned by exprs() defining additional variables that the input dataset is sorted on after date.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bundfussr any thoughts on this suggestion? just in case i'm missing anything.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rossfarrugia , I agree that the description should be updated. I would move the technical details to "Permitted Values". For example

Additional variables that the source dataset is sorted on after date.

Permitted Values: list of variables created by exprs() e.g. exprs(ASEQ)

#'
#' @keywords source_specifications
#' @family source_specifications
Expand All @@ -793,7 +858,8 @@ tte_source <- function(dataset_name,
filter = NULL,
date,
censor = 0,
set_values_to = NULL) {
set_values_to = NULL,
order = order) {
out <- list(
dataset_name = assert_character_scalar(dataset_name),
filter = assert_filter_cond(enexpr(filter), optional = TRUE),
Expand All @@ -803,7 +869,8 @@ tte_source <- function(dataset_name,
set_values_to,
named = TRUE,
optional = TRUE
)
),
order = order
)
class(out) <- c("tte_source", "source", "list")
out
Expand Down Expand Up @@ -844,13 +911,15 @@ tte_source <- function(dataset_name,
event_source <- function(dataset_name,
filter = NULL,
date,
set_values_to = NULL) {
set_values_to = NULL,
order = NULL) {
out <- tte_source(
dataset_name = assert_character_scalar(dataset_name),
filter = !!enexpr(filter),
date = !!assert_expr(enexpr(date)),
censor = 0,
set_values_to = set_values_to
set_values_to = set_values_to,
order = order
)
class(out) <- c("event_source", class(out))
out
Expand Down Expand Up @@ -891,13 +960,15 @@ censor_source <- function(dataset_name,
filter = NULL,
date,
censor = 1,
set_values_to = NULL) {
set_values_to = NULL,
order = NULL) {
out <- tte_source(
dataset_name = assert_character_scalar(dataset_name),
filter = !!enexpr(filter),
date = !!assert_expr(enexpr(date)),
censor = assert_integer_scalar(censor, subset = "positive"),
set_values_to = set_values_to
set_values_to = set_values_to,
order = order
)
class(out) <- c("censor_source", class(out))
out
Expand Down
2 changes: 1 addition & 1 deletion R/filter_extreme.R
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,7 @@ filter_extreme <- function(dataset,
check_type <-
assert_character_scalar(
check_type,
values = c("none", "warning", "error"),
values = c("none", "warning", "error", "message"),
case_sensitive = FALSE
)
assert_data_frame(dataset, required_vars = by_vars)
Expand Down
1 change: 1 addition & 0 deletions inst/WORDLIST
Original file line number Diff line number Diff line change
Expand Up @@ -304,6 +304,7 @@ msec
nd
occds
onwards
param
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not necessary. You can check by calling spelling::update_wordlist().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Will push soon.

parttime
pharmaverse
pharmaverseadam
Expand Down
8 changes: 7 additions & 1 deletion man/censor_source.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 10 additions & 1 deletion man/derive_param_tte.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions man/derive_vars_merged.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions man/derive_vars_merged_lookup.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading