I have a large data frame that is built by doing computationally expensive joins of other data frames, and a function that returns it, which looks like this:
data_df <- function(options) {
  df <- mustashe::stash(
    "stash_dataDF",
    {
      # lots of expensive joins here
      df <- data.frame()
    },
    depends_on = c(getOption("DFs")),
    functional = TRUE
  )
  if (missing(options)) {
    return(df)
  } else {
    # do some post-processing of df based on options
    return(df)
  }
}
When I run this and it looks up stash_dataDF and finds that it can load the stashed object, it still has to read stash_dataDF.qs from disk. In my case, this tends to take between 0.25 and 0.5 seconds, which isn't much but adds up when data_df() is called multiple times within a larger workflow. Would it be possible to add an option to keep stash_dataDF as an in-memory object and then use the same hash checking as is currently done? stash_dataDF would still need to be either built fresh or loaded from stash_dataDF.qs at the beginning of each session, but after that, this should be a lot faster than reading it from disk each time.
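In the meantime, here is a rough user-level sketch of the idea (not part of mustashe's API): keep the stashed value in a session-level environment keyed by a hash of the same dependencies, so the .qs file is only read when the dependencies actually change. It assumes, as in the code above, that getOption("DFs") is a character vector naming data frames in the global environment, and it uses rlang::hash purely as a convenient hashing helper.

.session_cache <- new.env(parent = emptyenv())

data_df <- function(options) {
  # Hash the current dependency values; any change invalidates the in-memory copy.
  key <- rlang::hash(mget(getOption("DFs"), envir = globalenv()))
  if (identical(.session_cache$key, key)) {
    df <- .session_cache$value  # in-memory hit: no disk read this session
  } else {
    df <- mustashe::stash(
      "stash_dataDF",
      {
        # lots of expensive joins here
        data.frame()
      },
      depends_on = getOption("DFs"),
      functional = TRUE
    )
    .session_cache$key <- key
    .session_cache$value <- df
  }
  if (missing(options)) return(df)
  # do some post-processing of df based on options
  df
}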
As an alternative, you could consider implementing support for cachem, which supports both disk and memory-backed key-value stores, plus additional features such as pruning of old values.
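For reference, a minimal sketch of how a layered cachem store could back the same pattern: a fast in-memory cache is consulted first, falling back to a persistent disk cache on a miss. The key construction below (rlang::hash over the data frames named by getOption("DFs")) is an illustrative assumption, not mustashe's internal hashing.

library(cachem)

cache <- cache_layered(
  cache_mem(max_size = 512 * 1024^2),  # fast in-memory layer, checked first
  cache_disk(dir = ".cache_dataDF")    # persists across R sessions
)

get_data_df <- function() {
  # cachem keys must be lowercase alphanumeric (plus - and _); a hex hash qualifies
  key <- rlang::hash(mget(getOption("DFs"), envir = globalenv()))
  df <- cache$get(key)
  if (is.key_missing(df)) {
    df <- data.frame()  # lots of expensive joins here
    cache$set(key, df)  # stored for later lookups
  }
  df
}

With this arrangement, repeated calls within a session are served from memory, while a fresh session can still warm up from the disk layer instead of redoing the joins.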