Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data argument to setup_bootstrap_run and remove nm_join requirement #707

Merged
merged 35 commits into from
Sep 19, 2024

Conversation

barrettk
Copy link
Collaborator

@barrettk barrettk commented Jun 26, 2024

We use nm_join() under the hood to create the filtered starting dataset we resample from. There was previously no method for passing a .join_col, so the default NUM would always be used under the hood.

The workaround was to add a NUM column to the original model and resubmit before attempting to bootstrap it, though this is obviously inconvenient and shouldn't be the behavior. Note that a support ticket has already been filed for this issue, so this PR was initially in response to that.

Rather than providing a new .join_col argument, this PR removes the nm_join() dependency by parsing the IGNORE and ACCEPT options in a $DATA record, returning a filtered dataset.

Since we had to touch this function anyways, I reached out to @kylebaron about adding a data argument as well. There have been previously discussions about this, though mostly offline. I followed the same approach used for nmsim models (save new data to the output directory after some checks) with some minor differences.

@barrettk barrettk added bug Something isn't working bootstrap Bootstrap development labels Jun 26, 2024
@barrettk barrettk force-pushed the bootstrap/starting-data branch from d58a0ac to 8c4b5aa Compare June 28, 2024 16:52
@barrettk barrettk requested review from kylebaron and seth127 June 28, 2024 18:30
@barrettk
Copy link
Collaborator Author

@kylebaron requested your review from more of a conceptual and documentation perspective (though feel free to make any additional comments). Im planning on discussing the technical details with @seth127 upon his return for a more formal review

@barrettk
Copy link
Collaborator Author

Local Tests
> devtools::test(filter = "boot")
ℹ Testing bbr| F W  S  OK | Context|        115 | testing bootstrap functionality and running bbi [115.8s]                                                              

══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 115.8 s

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 115 ]

 - get_input_columns has to utilize the based_on model instead of .boot_run, as the referenced dataset will no longer exist when overwriting

 - If a dataset was previously passed, but not when overwriting, we need to revert the data path back to the one specified in the original control stream

 - add regression tests for passing in a .join_col or user provided dataset
@barrettk barrettk force-pushed the bootstrap/starting-data branch from 49d9f9b to ed2c258 Compare June 28, 2024 20:22
barrettk added 10 commits July 2, 2024 17:12
 - Extracts and formats IGNORE and ACCEPT record options, transforming them into `dplyr::filter()` expressions
 - After some internal discussion, it was suggested that the ability to create a full NM-TRAN dataset via parsing the $DATA and $INPUT records could be valuable.

 - This commit adds the dropping of columns via `DROP` and `SKIP` options, handles the renaming of columns, and null mapping (changing null values to a particular character string).

 - nm_data_filter was renamed from filter_nm_data. All the helper function names may change, but was trying to be consistent across the new helper functions at the minimum.

 - minor bug fix to nm_data_filter: can handle `=` IGNORE/ACCEPT options
 - read_data_record and get_records() can only be used for reading the records, not overwriting. This slipped past me in a previous commit.

 - dont apply `#` filter to all columns - only the first
 - pull out the individual expression translation into a separate function

 - move all related functions to this PR to a new script
 - (list) type expressions must be split up first. This wasn't necessary before the refactor, but is now

 - add examples and documentation
 - removed other NM-TRAN setup functions. They were incomplete/not fully accurate, and we can just use NM-TRAN directly to create that dataset

 - Hook up filter_nm_data to setup_bootstrap_run and update tests
 - pulled out additional logic for parsing the IGNORE and ACCEPT options

 - fix: adjusted `@` IGNORE filter option to only apply to the first column
@barrettk barrettk changed the title Add .join_col and data arguments to setup_bootstrap_run data argument to setup_bootstrap_run and remove nm_join requirement Jul 11, 2024
barrettk added 6 commits July 11, 2024 11:28
 - add tests for all NONMEM operators
 - adjust map call placement to improve clarity of helper functions

 - abort if both IGNORE and ACCEPT expressions are found

 - now return the `type` as part of `get_data_filter_exprs()`, which is then passed to `translate_nm_expr()`

 - inform user of how many records are dropped as part of `nm_data(filter = TRUE)`

 - adjust documentation and arguments
 - doesnt work by default for internal functions. I imagine there is a workaround for this, though the inclusion of these examples are not important and only for developer purposes
 - dont look for digits
 - only look for one character
barrettk added 14 commits July 18, 2024 11:15
 - extra `\\` are added, which are escaped when parse() is called down the line
 - Some of this was a bit out of scope for this PR, but would be ok with the scope of the parent PR (updating bootstrap functionality). The out of scope portions included capitalizing parameter documentation for all bootstrap related functions

 - Added examples and improved regex for IGNORE/ACCEPT list parsing
If parsing the NONMEM filtering expressions fails, we now instruct users to provide a starting dataset instead of proceeeding with an unfiltered `nm_data(mod)` dataset

 - filter_nm_data no longer returns NULL on failure
- If the based_on model has run, check the number of records in `starting_data` to make sure the filtering went ok. This introduces a `.bbi_args` argument passed to `model_summary()`, similar to the behavior in `nm_join()`.
 - This will read in the _starting_ dataset (i.e. does not take any resampling for each sub-model into account)

 - This is done ahead of the next commit, which will change the control stream template that's used for parsing NONMEM filter expressions
…pressions

 - previously used the parent model. However we want any changes made in the bootstrap control stream template to be reflected during setup_bootstrap_run

 - Add a warning if the parent model hasn't been submitted (and `data` is not provided, as this prevents us from being able to perform certain checks (number of expected records).
 - move data path fix to above parsing the control stream. This previously wasnt necessary since we parsed the parent model
Parse $DATA record for the purpose of filtering the input data
 - This may have a use down the road within `nm_join()`, but removing it from `bbr` for now since it's no longer utilized.

 - It's not used anymore because we now use `nm_data(mod, filter = TRUE)` instead of `nm_join()` within `setup_bootstrap_run()`
R/nm-file.R Show resolved Hide resolved
Copy link
Contributor

@kylebaron kylebaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one request for less verbose information.

@kylebaron kylebaron self-requested a review September 19, 2024 22:33
@barrettk barrettk merged commit 03bec34 into main Sep 19, 2024
8 checks passed
@barrettk barrettk deleted the bootstrap/starting-data branch September 19, 2024 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bootstrap Bootstrap development bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants