Commit
fix: styler, build, test, check all happy
dsweber2 committed Sep 5, 2023
1 parent c4408b3 commit f7a46b3
Showing 9 changed files with 161 additions and 73 deletions.
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -28,6 +28,7 @@ Imports:
checkmate,
cli,
httr,
glue,
jsonlite,
magrittr,
MMWRweek,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -45,6 +45,7 @@ export(pvt_twitter)
export(set_cache)
export(wiki)
import(cachem)
import(glue)
import(openssl)
importFrom(MMWRweek, MMWRweek2Date)
importFrom(checkmate, assert)
116 changes: 83 additions & 33 deletions R/cache.R
@@ -1,21 +1,32 @@
# IMPORTANT DEV NOTE:
# make sure to @include cache.R in the Roxygen docs of any function referencing this environment, so this file is loaded first
# make sure to @include cache.R in the Roxygen docs of any function referencing this environment, so this file is loaded
# first
cache_environ <- new.env(parent = emptyenv())
cache_environ$use_cache <- NULL
cache_environ$epidatr_cache <- NULL
#' create or renew a cache for this session
#'
#' @aliases set_cache
#' @description
#' By default, epidatr re-requests data from the API on every call of `fetch`. In case you find yourself repeatedly calling the same data, you can enable the cache using either this function for a given session, or environmental variables for a persistent cache.
#' The typical recommended workflow for using the cache is to set the environmental variables `EPIDATR_USE_CACHE=TRUE` and `EPIDATR_CACHE_DIRECTORY="/your/directory/here"`in your `.Renviron`, for example by calling `usethis::edit_r_environ()`.
#' By default, epidatr re-requests data from the API on every call of `fetch`. In case you find yourself repeatedly
#' calling the same data, you can enable the cache using either this function for a given session, or environmental
#' variables for a persistent cache.
#' The typical recommended workflow for using the cache is to set the environmental variables `EPIDATR_USE_CACHE=TRUE`
#' and `EPIDATR_CACHE_DIRECTORY="/your/directory/here"`in your `.Renviron`, for example by calling
#' `usethis::edit_r_environ()`.
#' See the parameters below for some more configurables if you're so inclined.
#'
#' `set_cache` (re)defines the cache to use in a particular R session. This does not clear existing data at any previous location, but instead creates a handle to the new cache using [cachem](https://cachem.r-lib.org/index.html) that seamlessly handles caching for you.
#' Say your cache is normally stored in some default directory, but for the current session you want to save your results in `~/my/temporary/savedirectory`, then you would call `set_cache(dir = "~/my/temporary/savedirectory")`.
#' Or if you know the data from 2 days ago is wrong, you could call `set_cache(days = 1)` to clear older data whenever the cache is referenced.
#' In both cases, these changes would only last for a single session (though the deleted data would be gone permanently!).
#' `set_cache` (re)defines the cache to use in a particular R session. This does not clear existing data at any previous
#' location, but instead creates a handle to the new cache using [cachem](https://cachem.r-lib.org/index.html) that
#' seamlessly handles caching for you.
#' Say your cache is normally stored in some default directory, but for the current session you want to save your
#' results in `~/my/temporary/savedirectory`, then you would call `set_cache(dir = "~/my/temporary/savedirectory")`.
#' Or if you know the data from 2 days ago is wrong, you could call `set_cache(days = 1)` to clear older data whenever
#' the cache is referenced.
#' In both cases, these changes would only last for a single session (though the deleted data would be gone
#' permanently!).
#'
#' An important feature of the caching in this package is that only calls which specify either `issues` before a certain date, or `as_of` before a certain date will actually cache. For example the call
#' An important feature of the caching in this package is that only calls which specify either `issues` before a certain
#' date, or `as_of` before a certain date will actually cache. For example the call
#' ```
#' covidcast(
#' source = "jhu-csse",
@@ -26,7 +37,8 @@ cache_environ$epidatr_cache <- NULL
#' time_values = epirange(20200601, 20230801)
#' )
#' ```
#' *won't* cache, since it is possible for the cache to be invalidated by new releases with no warning. On the other hand, the call
#' *won't* cache, since it is possible for the cache to be invalidated by new releases with no warning. On the other
#' hand, the call
#' ```
#' covidcast(
#' source = "jhu-csse",
@@ -38,9 +50,13 @@ cache_environ$epidatr_cache <- NULL
#' as_of = "2023-08-01"
#' )
#' ```
#' *will* cache, since normal new versions of data can't invalidate it (since they would be `as_of` a later date). It is still possible that Delphi may patch such data, but the frequency is on the order of months rather than days. We are working on creating a public channel to communicate such updates. While specifying `issues` will usually cache, a call with `issues="*"` won't cache, since its subject to cache invalidation by normal versioning.
#' *will* cache, since normal new versions of data can't invalidate it (since they would be `as_of` a later date). It is
#' still possible that Delphi may patch such data, but the frequency is on the order of months rather than days. We
#' are working on creating a public channel to communicate such updates. While specifying `issues` will usually cache,
#' a call with `issues="*"` won't cache, since its subject to cache invalidation by normal versioning.
#'
#' On the backend, the cache uses cachem, with filenames generated using an md5 encoding of the call url. Each file corresponds to a unique epidata-API call.
#' On the backend, the cache uses cachem, with filenames generated using an md5 encoding of the call url. Each file
#' corresponds to a unique epidata-API call.
#' @examples
#' \dontrun{
#' set_cache(
@@ -52,15 +68,26 @@ cache_environ$epidatr_cache <- NULL
#' )
#' }
#'
#' @param cache_dir the directory in which the cache is stored. By default, this is `tools::R_user_dir()` if on R 4.0+, but must be specified for earlier versions of R. The path can be either relative or absolute. The environmental variable is `EPIDATR_CACHE_DIR`.
#' @param days the maximum length of time in days to keep any particular cached call. By default this is `1`. The environmental variable is `EPIDATR_CACHE_MAX_AGE_DAYS`.
#' @param max_size the size of the entire cache, in MB, at which to start pruning entries. By default this is `1024`, or 1GB. The environmental variable is `EPIDATR_CACHE_MAX_SIZE_MB`.
#' @param logfile where cachem's log of transactions is stored, relative to the cache directory. By default, it is `"logfile.txt"`. The environmental variable is `EPIDATR_CACHE_LOGFILE`.
#' @param prune_rate how many calls to go between checking if any cache elements are too old or if the cache overall is too large. Defaults to `2000L`. Since cachem fixes the max time between prune checks to 5 seconds, there's little reason to actually change this parameter. Doesn't have a corresponding environmental variable.
#' @param confirm whether to confirm directory creation. default is `TRUE`; should only be set in non-interactive scripts
#' @seealso [clear_cache] to delete the old cache while making a new one, [disable_cache] to disable without deleting, and [cache_info]
#' @param cache_dir the directory in which the cache is stored. By default, this is `tools::R_user_dir()` if on R 4.0+,
#' but must be specified for earlier versions of R. The path can be either relative or absolute. The environmental
#' variable is `EPIDATR_CACHE_DIR`.
#' @param days the maximum length of time in days to keep any particular cached call. By default this is `1`. The
#' environmental variable is `EPIDATR_CACHE_MAX_AGE_DAYS`.
#' @param max_size the size of the entire cache, in MB, at which to start pruning entries. By default this is `1024`, or
#' 1GB. The environmental variable is `EPIDATR_CACHE_MAX_SIZE_MB`.
#' @param logfile where cachem's log of transactions is stored, relative to the cache directory. By default, it is
#' `"logfile.txt"`. The environmental variable is `EPIDATR_CACHE_LOGFILE`.
#' @param prune_rate how many calls to go between checking if any cache elements are too old or if the cache overall is
#' too large. Defaults to `2000L`. Since cachem fixes the max time between prune checks to 5 seconds, there's little
#' reason to actually change this parameter. Doesn't have a corresponding environmental variable.
#' @param confirm whether to confirm directory creation. default is `TRUE`; should only be set in non-interactive
#' scripts
#' @seealso [clear_cache] to delete the old cache while making a new one, [disable_cache] to disable without deleting,
#' and [cache_info]
#' @export
#' @import cachem
#' @import glue
#' @importFrom utils sessionInfo
set_cache <- function(cache_dir = NULL,
days = NULL,
max_size = NULL,
@@ -72,7 +99,7 @@ set_cache <- function(cache_dir = NULL,
} else if (is.null(cache_dir)) {
# earlier version, so no tools
cache_dir <- Sys.getenv("EPIDATR_CACHE_DIR")
if (cach_dir == "") {
if (cache_dir == "") {
cli::cli_abort("no valid EPIDATR_CACHE_DIR", class = "epidatr_cache_error")
}
}
@@ -94,9 +121,13 @@ set_cache <- function(cache_dir = NULL,
cache_usable <- file.access(cache_dir, mode = 6) == 0
if (!(cache_exists)) {
if (confirm) {
user_input <- readline(glue::glue("there is no directory at {cache_dir}; the cache will be turned off until a viable directory has been set. Create one? (yes|no) "))
user_input <- readline(glue::glue(
"there is no directory at {cache_dir}; the cache will be turned off until a ",
"viable directory has been set. Create one? (yes|no(default)) "
))
repeat {
valid_user_input <- ifelse(grepl("yes|no", user_input), sub(".*(yes|no).*", "\\1", user_input), NA)
valid_user_input <- ifelse(grepl("", user_input), "", NA)
if (!is.na(valid_user_input)) {
break
}
@@ -114,7 +145,10 @@ set_cache <- function(cache_dir = NULL,


if (!cache_usable) {
print(glue::glue("The directory at {cache_dir} is not accessible; check permissions and/or use a different directory for the cache (see the `set_cache` documentation)."))
print(glue::glue(
"The directory at {cache_dir} is not accessible; check permissions and/or use a different ",
"directory for the cache (see the `set_cache` documentation)."
))
} else if (cache_exists) {
cache_environ$epidatr_cache <- cachem::cache_disk(
dir = cache_dir,
@@ -128,7 +162,9 @@ set_cache <- function(cache_dir = NULL,

#' manually reset the cache, deleting all currently saved data and starting afresh
#' @description
#' deletes the current cache and resets a new cache. Deletes local data! If you are using a session unique cache, you will have to pass the arguments you used for `set_cache` earlier, otherwise the system-wide `.Renviron`-based defaults will be used.
#' deletes the current cache and resets a new cache. Deletes local data! If you are using a session unique cache, you
#' will have to pass the arguments you used for `set_cache` earlier, otherwise the system-wide `.Renviron`-based
#' defaults will be used.
#' @examples
#' \dontrun{
#' clear_cache(
@@ -140,22 +176,29 @@ set_cache <- function(cache_dir = NULL,
#' )
#' }
#' @param disable instead of setting a new cache, disable caching entirely; defaults to `FALSE`
#' @param ... see the `set_cache` arguments below
#' @inheritParams set_cache
#' @seealso [set_cache] to start a new cache (and general caching info), [disable_cache] to only disable without deleting, and [cache_info]
#' @seealso [set_cache] to start a new cache (and general caching info), [disable_cache] to only disable without
#' deleting, and [cache_info]
#' @export
#' @import cachem
clear_cache <- function(disable = FALSE, ...) {
cache_environ$epidatr_cache$destroy()
if (!disable) {
set_cache(...)
} else {
cache_environ$epidatr_cache <- NULL
}
}

#' turn off the caching for this session
#' @description
#' Disable caching until you call `set_cache` or restart R. The files defining the cache are untouched. If you are looking to disable the caching more permanently, set `EPIDATR_USE_CACHE=FALSE` as environmental variable in your `.Renviron`.
#' Disable caching until you call `set_cache` or restart R. The files defining the cache are untouched. If you are
#' looking to disable the caching more permanently, set `EPIDATR_USE_CACHE=FALSE` as environmental variable in your
#' `.Renviron`.
#' @export
#' @seealso [set_cache] to start a new cache (and general caching info), [clear_cache] to delete the cache and set a new one, and [cache_info]
#' @seealso [set_cache] to start a new cache (and general caching info), [clear_cache] to delete the cache and set a new
#' one, and [cache_info]
#' @import cachem
disable_cache <- function() {
cache_environ$epidatr_cache <- NULL
@@ -164,12 +207,16 @@ disable_cache <- function() {
#' describe current cache
#' @description
#' Print out the information about the cache (as would be returned by cachem's `info()` method)
#' @examples
#' cache_info()
#' @seealso [set_cache] to start a new cache (and general caching info), [clear_cache] to delete the cache and set a new one, and [disable_cache] to disable without deleting
#' @seealso [set_cache] to start a new cache (and general caching info),
#' [clear_cache] to delete the cache and set a new one, and [disable_cache] to
#' disable without deleting
#' @export
cache_info <- function() {
cache_environ$epidatr_cache$info()
if (is.null(cache_environ$epidatr_cache)) {
return("there is no cache")
} else {
return(cache_environ$epidatr_cache$info())
}
}

#' dispatch caching
@@ -190,9 +237,12 @@ cache_epidata_call <- function(epidata_call, fetch_args = fetch_args_list()) {
as_of_recent <- check_is_recent(epidata_call$params$as_of, 7)
issues_recent <- check_is_recent(epidata_call$params$issues, 7)
if (as_of_recent || issues_recent) {
cli::cli_warn("using cached results with `as_of` within the past week (or the future!). This will likely result in an invalid cache. Consider
1. disabling the cache for this session with `disable_cache` or permanently with environmental variable `EPIDATR_USE_CACHE=FALSE`
2. setting `EPIDATR_CACHE_MAX_AGE_DAYS={Sys.getenv('EPIDATR_CACHE_MAX_AGE_DAYS', unset = 1)}` to e.g. `3/24` (3 hours).",
cli::cli_warn("using cached results with `as_of` within the past week (or the future!). This will likely result ",
"in an invalid cache. Consider\n",
"1. disabling the cache for this session with `disable_cache` or permanently with environmental ",
"variable `EPIDATR_USE_CACHE=FALSE`\n",
"2. setting `EPIDATR_CACHE_MAX_AGE_DAYS={Sys.getenv('EPIDATR_CACHE_MAX_AGE_DAYS', unset = 1)}` to e.g. `3/24` ",
"(3 hours).",
.frequency = "regularly",
.frequency_id = "cache timing issues",
class = "cache_recent_data"
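The caching rule described in the `set_cache` documentation above can be approximated in base R. This is a sketch under stated assumptions: dates are assumed to arrive as `"YYYY-MM-DD"` strings or `Date` objects, and `is_cache_eligible()` and `is_recent()` are hypothetical helpers written for illustration. epidatr's own checks (`cache_epidata_call()`, `check_is_recent()`) are only partly visible in this diff and may differ.

```r
# Hypothetical helpers sketching the caching rule documented in `set_cache`;
# these are NOT epidatr functions. Dates are assumed to be "YYYY-MM-DD"
# strings or Date objects.

# A call is cache-eligible only if it pins `as_of`, or pins specific `issues`
# (the wildcard `issues = "*"` tracks new releases, so it never caches).
is_cache_eligible <- function(params) {
  if (!is.null(params$as_of)) {
    return(TRUE)
  }
  !is.null(params$issues) && !identical(params$issues, "*")
}

# TRUE if any supplied date falls within the last `days` days or in the future,
# the situation the diff's `cli_warn` flags as a likely-invalid cache.
is_recent <- function(dates, days = 7, today = Sys.Date()) {
  if (is.null(dates)) {
    return(FALSE)
  }
  any(as.Date(dates) >= today - days)
}

params <- list(as_of = "2023-08-01")
is_cache_eligible(params) # TRUE
is_recent(params$as_of, today = as.Date("2023-09-05")) # FALSE
```

Judging from the warning text, a recent `as_of` appears to still be cached, with a `cache_recent_data` warning; the sketch keeps eligibility and recency as two separate questions for clarity.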
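The new directory-creation prompt labels `no` as the default for a blank reply. The hypothetical base-R helper `parse_yes_no()` below is not part of epidatr. It shows one way to normalise the answer: blank input maps to `"no"`, the same regex used in the diff extracts `yes`/`no`, and anything else returns `NA` so that a caller's `repeat` loop can ask again.

```r
# Hypothetical helper (not epidatr code): normalise a reply to the prompt
# "Create one? (yes|no(default))". Returns "yes", "no", or NA for "ask again".
parse_yes_no <- function(user_input) {
  answer <- tolower(trimws(user_input))
  if (answer == "") {
    return("no") # blank reply takes the advertised default
  }
  if (grepl("yes|no", answer)) {
    # same extraction regex as the diff; the greedy ".*" keeps the last yes/no token
    return(sub(".*(yes|no).*", "\\1", answer))
  }
  NA_character_
}

parse_yes_no("") # "no"
parse_yes_no(" Yes ") # "yes"
parse_yes_no("maybe") # NA
```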
2 changes: 1 addition & 1 deletion R/epidatr-package.R
@@ -4,7 +4,7 @@

.onLoad <- function(libname, pkgname) {
cache_environ$use_cache <- Sys.getenv("EPIDATR_USE_CACHE", unset = FALSE)
cache_environ$use_cache <- cache_environ$use_cache == "TRUE"
cache_environ$use_cache <- (cache_environ$use_cache == "TRUE")
if (cache_environ$use_cache) {
set_cache()
}
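`.onLoad` turns the cache on only when `EPIDATR_USE_CACHE` is exactly the string `"TRUE"`. An unset variable, or a value such as `true` or `1`, therefore leaves caching off. If a more forgiving parse were wanted, a hypothetical helper like `env_flag()` could accept common spellings. This is a design alternative sketched for illustration, not epidatr's behavior.

```r
# Hypothetical alternative to the `.onLoad` check (not epidatr's behavior):
# read a boolean flag from an environment variable, accepting common spellings.
env_flag <- function(name, default = FALSE) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || value == "") {
    return(default) # unset or empty: fall back to the default
  }
  toupper(value) %in% c("TRUE", "T", "1", "YES")
}

Sys.setenv(EPIDATR_DEMO_FLAG = "true")
env_flag("EPIDATR_DEMO_FLAG") # TRUE; a `== "TRUE"` comparison would give FALSE
Sys.unsetenv("EPIDATR_DEMO_FLAG")
```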
7 changes: 3 additions & 4 deletions man/cache_info.Rd

Some generated files are not rendered by default.

9 changes: 7 additions & 2 deletions man/clear_cache.Rd

Some generated files are not rendered by default.

7 changes: 5 additions & 2 deletions man/disable_cache.Rd

Some generated files are not rendered by default.
