new feature making clustermq "pipeable" #318
Comments
Here is an improved version that is a bit more clever about the first argument name and does chunking, which can speed things up a lot:
library(brms)
library(tidybayes)
library(dplyr)
library(tidyr)
fit1 <- brm(count ~ zAge + zBase * Trt + (1|patient),
            data = epilepsy, family = poisson())
## adding predictions to the original data set can be done with a pipe approach
epilepsy |> tidybayes::add_predicted_rvars(fit1)
## this does not work with Q_rows, because Q_rows sends the individual
## columns as arguments to the function. The function below therefore
## nests the data so that clustermq can be applied directly:
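Q_rows_nested <- function(data, fun, arg, chunk_size=1, ...) {
    ## default to the name of the first formal argument of fun
    if(missing(arg)) {
        arg <- rlang::sym(names(formals(fun))[1])
    }
    data |>
        ## assign each row to a chunk, nest each chunk into one data frame,
        ## and pass the nested data frames to fun under the name given by arg
        dplyr::mutate(.chunk=sort(rep(seq_len(ceiling(dplyr::n()/chunk_size)), length.out=dplyr::n()))) |>
        tidyr::nest(data=-.chunk) |>
        dplyr::select("{{arg}}" := data) |>
        clustermq::Q_rows(fun=fun, ...) |>
        dplyr::bind_rows()
}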
## now we can run the predictions in parallel over clustermq
epilepsy |> Q_rows_nested(tidybayes::add_predicted_rvars, const=list(object=fit1), pkgs="tidybayes", n_jobs=6)
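To make the chunking step easier to follow, here is a minimal base R sketch of the index expression used inside `Q_rows_nested`. The helper name `chunk_index` is introduced here only for illustration. Note that the expression balances rows across `ceiling(n / chunk_size)` chunks rather than filling each chunk to exactly `chunk_size`.

```r
# Base R only: reproduce the chunk index that Q_rows_nested() assigns to rows.
# `chunk_index` is a hypothetical helper name, not part of clustermq.
chunk_index <- function(n, chunk_size = 1) {
  n_chunks <- ceiling(n / chunk_size)
  # recycle 1..n_chunks over n rows, then sort so each chunk is contiguous
  sort(rep(seq_len(n_chunks), length.out = n))
}

idx <- chunk_index(10, chunk_size = 4)
print(idx)         # 1 1 1 1 2 2 2 3 3 3
print(table(idx))  # chunk sizes 4, 3, 3
```

With 10 rows and `chunk_size = 4` this gives three chunks of sizes 4, 3 and 3, so `chunk_size` acts as an upper bound on the chunk size.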
Thanks for the idea and great to hear that the package is working well for you! The way I understand it, you want to pass a row or a number of rows of a data frame as one combined argument to a function. Instead of nesting the data, I would go about it like this:
with_rvars = clustermq::Q(
    tidybayes::add_predicted_rvars,
    newdata = split(epilepsy, seq_len(nrow(epilepsy))),
    const = list(object=fit1),
    n_jobs = 6
) |> bind_rows()
That looks fairly straightforward to me. I'm not sure if adding a new concept like What do you think?
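The `split()` approach can also send several rows per job by splitting on a chunk index instead of a row index. The following is only a sketch under stated assumptions: `run_jobs` is a hypothetical local stand-in for `clustermq::Q` (so the example runs without a scheduler), `add_ratio` is a hypothetical stand-in for `tidybayes::add_predicted_rvars`, and `mtcars` replaces `epilepsy`.

```r
# Hypothetical local stand-in for clustermq::Q(): calls `fun` once per chunk,
# passing the chunk as `newdata` together with the constant arguments.
run_jobs <- function(fun, newdata, const = list()) {
  lapply(newdata, function(chunk) do.call(fun, c(list(newdata = chunk), const)))
}

# Hypothetical per-chunk function; stands in for the prediction step.
add_ratio <- function(newdata, scale) {
  newdata$ratio <- scale * newdata$mpg / newdata$wt
  newdata
}

chunk_size <- 10
chunk_id <- ceiling(seq_len(nrow(mtcars)) / chunk_size)  # 1,...,1,2,...,2,...
chunks <- split(mtcars, chunk_id)                        # list of data frames

pieces <- run_jobs(add_ratio, newdata = chunks, const = list(scale = 1))
result <- do.call(rbind, unname(pieces))                 # recombine in order
```

Swapping `run_jobs` for `clustermq::Q` with the same `newdata` and `const` arguments should mirror the call shown above, though that has not been verified here.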
Nice alternative version. However, it is not "pipeable", so the user cannot pipe into a Q-boosted call. The other day I had the thought that this should probably be refined towards a "Q_mutate" function, which would even spare the user from defining intermediate functions, as is otherwise needed when operating on multiple columns at once. I totally agree with not bloating a package with unnecessary code, for sure. How about we leave this issue open for a while so that we can collect better ideas for the above function, and finally include this in some form in the documentation? An example, a section in the
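For illustration, "pipeable" here means a function that takes the data frame as its first argument, so it can sit on the right-hand side of `|>`. A minimal sketch, assuming a hypothetical wrapper `pipe_chunks` (not part of clustermq) whose `apply_fun` argument defaults to `lapply` and marks the place where a parallel backend would go:

```r
# Hypothetical data-first wrapper: split into chunks, apply `fun` to each
# chunk, and bind the results back together in the original row order.
pipe_chunks <- function(data, fun, chunk_size = 1, apply_fun = lapply, ...) {
  chunk_id <- ceiling(seq_len(nrow(data)) / chunk_size)
  pieces <- split(data, chunk_id)
  out <- apply_fun(pieces, fun, ...)   # extra arguments are forwarded to `fun`
  do.call(rbind, unname(out))
}

# Hypothetical per-chunk function used only for this example.
add_double <- function(newdata) {
  newdata$mpg2 <- 2 * newdata$mpg
  newdata
}

res <- mtcars |> pipe_chunks(add_double, chunk_size = 8)
```

The data-first signature is the only thing that makes this pipeable; everything else is the same split, apply and recombine pattern discussed above.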
Happy to leave this open for a while and see what we come up with!
Hi!
First, clustermq is really great - it powers a lot of what I do. Today I wrote a small utility function which makes the "Q" functions compatible with the pipe syntax that is used a lot in R workflows. So maybe this function could be implemented in clustermq directly?
The above makes more sense for huge simulations and fits. A nice addition would be chunking, so that the "data" is split into bigger pieces before being sent to the workers, which should be easy to add.
This is just a feature suggestion as I think this could be useful for many others as well.