Sampling from estimated conditional distributions using different predict() methods #375

Closed
awunderground opened this issue Oct 5, 2020 · 4 comments


@awunderground

Feature

I have a specific question and a general feature request/question.

I sometimes sample from nodes in regression trees instead of using node means. For example, I can use library(partykit) to sample from the nodes:

library(rpart)
library(partykit)
library(tidyverse)

rpart_model <- rpart(mpg ~ ., data = mtcars)

node_ecdf <- predict(as.party(rpart_model), newdata = remove_rownames(mtcars[1, ]), type = "prob")

sample(environment(node_ecdf[["1"]])[["x"]], size = 1)

The process above is clunky and does not generalize across models or packages. It also doesn't work with library(parsnip):

library(tidymodels)

cart_model <- parsnip::decision_tree() %>%
  parsnip::set_engine("rpart") %>%
  parsnip::set_mode("regression")

parsnip_model <- fit(cart_model, mpg ~ ., data = mtcars)

as.party(parsnip_model)
Error in UseMethod("as.party") : 
  no applicable method for 'as.party' applied to an object of class "c('_rpart', 'model_fit')"

1. Specific question: should class conversions like as.party() work in this situation?
2. General request/question: are there plans for tidymodels to allow for a wider range of prediction methods or will this all be handled through the model packages (e.g. rpart)? It is useful to sample from conditional distributions created by lm, rpart, ranger, etc. I am happy to work on this and want to make sure my efforts align with your excellent API/framework.

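
To make the general request concrete, here is a minimal sketch of what sampling from an estimated conditional distribution could look like for lm, using only base R. It assumes normally distributed errors and ignores uncertainty in the estimated coefficients; the predictors (wt, hp) and the object names are chosen only for illustration.

```r
set.seed(375)

# Fit a linear model (predictors chosen only for illustration)
lm_model <- lm(mpg ~ wt + hp, data = mtcars)

# Estimated conditional mean for one observation
new_obs <- mtcars[1, ]
cond_mean <- predict(lm_model, newdata = new_obs)

# Residual standard deviation estimated by the model
resid_sd <- sigma(lm_model)

# Draw from the estimated conditional distribution N(cond_mean, resid_sd^2).
# Assumption: normal errors; coefficient uncertainty is not propagated.
lm_draws <- rnorm(5, mean = cond_mean, sd = resid_sd)
lm_draws
```

A general interface would ideally give the same kind of draw for rpart, ranger, and others without model-specific code like this.
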
@juliasilge
Member

On your first question, you need to use repair_call() (see more here).

library(tidymodels)
library(partykit)
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm

cart_model <- parsnip::decision_tree() %>%
  parsnip::set_engine("rpart") %>%
  parsnip::set_mode("regression")

cart_fit <- fit(cart_model, mpg ~ ., data = mtcars)
fixed_fit <- repair_call(cart_fit, data = mtcars)
as.party(fixed_fit$fit)
#> 
#> Model formula:
#> mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
#> 
#> Fitted party:
#> [1] root
#> |   [2] cyl >= 5
#> |   |   [3] hp >= 192.5: 13.414 (n = 7, err = 28.8)
#> |   |   [4] hp < 192.5: 18.264 (n = 14, err = 59.9)
#> |   [5] cyl < 5: 26.664 (n = 11, err = 203.4)
#> 
#> Number of inner nodes:    2
#> Number of terminal nodes: 3

Created on 2020-10-05 by the reprex package (v0.3.0.9001)

@topepo
Member

topepo commented Dec 4, 2020

> are there plans for tidymodels to allow for a wider range of prediction methods or will this all be handled through the model packages (e.g. rpart)? It is useful to sample from conditional distributions created by lm, rpart, ranger, etc. I am happy to work on this and want to make sure my efforts align with your excellent API/framework.

I don't want to maintain parsnip wrappers for a large number of modeling functions. That has been a bit of a nightmare for caret.

The nice thing about parsnip is that the work can be spread to "parsnip-adjacent" packages (e.g. rules, baguette, etc.).

I started on party engines to use in the treesnip package but have not gotten far (mostly due to how their S4 methods work). That's on my holiday "pet project" list.

In general though, if there is something that you want to implement and maintain, take a look at the help documentation and open an issue here if you run into problems.

@simonpcouch
Contributor

Going to go ahead and close as this hasn't come to the top of our to-do list in the last 4 years.

Generally, though, while you can't as.party(parsnip_model) in this case, you can as.party(extract_fit_engine(parsnip_model)). If you run into issues doing so, please feel free to open a new issue. :)
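
Putting the suggestions in this thread together, a minimal sketch of sampling from a terminal node might look like the following. It assumes the call is first repaired with repair_call() as in the earlier comment (this may not be necessary in every parsnip version), and it draws from the training responses in the node rather than from the ECDF's internal environment. The object names are illustrative.

```r
library(tidymodels)
library(partykit)

set.seed(375)

cart_fit <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

# Repair the stored call so that partykit can locate the training data,
# then convert the underlying rpart object rather than the parsnip wrapper
fixed_fit <- repair_call(cart_fit, data = mtcars)
party_fit <- as.party(extract_fit_engine(fixed_fit))

# Terminal node IDs for the training rows and for the new observation
train_nodes <- predict(party_fit, newdata = mtcars, type = "node")
new_node <- predict(party_fit, newdata = mtcars[1, ], type = "node")

# Observed responses that fall in the same terminal node
node_values <- mtcars$mpg[train_nodes == new_node]

# Index explicitly: sample(x, 1) would sample from 1:x if a node
# happened to contain a single value
draw <- node_values[sample.int(length(node_values), size = 1)]
draw
```

Sampling from the node's training responses weights each observed value by how often it occurs, which sampling from the unique values stored inside the ECDF would not.
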


This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 19, 2024