-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathREADME.Rmd
80 lines (58 loc) · 2.23 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
---
title: "h2oTreeHelpR"
output: github_document
---
#### Useful functions to manipulate tree based models in h2o from R
Refer to the [Code-Structure](https://github.com/richardangell/h2oTreeHelpR/blob/master/Code-Structure.md) document for an overview of the code structure. <br>
The main functions in the package are demonstrated below. <br>
First build a simple gbm with the following code; <br>
```{r, message = FALSE, results = 'hide'}
library(h2o)
library(h2oTreeHelpR)
h2o.no_progress()
h2o.init()
```
```{r, message = FALSE}
prostate.hex = h2o.uploadFile(path = system.file("extdata",
"prostate.csv",
package = "h2o"),
destination_frame = "prostate.hex")
prostate.hex["RACE"] = as.factor(prostate.hex["RACE"])
prostate.hex["DPROS"] = as.factor(prostate.hex["DPROS"])
expl_cols <- c("AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON")
prostate.gbm = h2o.gbm(x = expl_cols,
y = "CAPSULE",
training_frame = prostate.hex,
ntrees = 3,
max_depth = 1,
learn_rate = 0.1)
```
## h2o_tree_convertR
#### Convert h2o tree structures to data.frames
```{r}
prostate.gbm
```
```{r, message = FALSE, results = 'hide'}
h2o_tree_dfs = h2o_tree_convertR(h2o_model = prostate.gbm)
```
```{r}
h2o_tree_dfs[[1]]
```
## extract_split_rules
#### Get split conditions for terminal nodes for trees in a h2o model
```{r}
terminal_node_rules <- extract_split_rules(h2o_tree_dfs)
```
```{r}
terminal_node_rules[[1]]
```
## map_h2o_encoding
#### Create a mapping for the output of h2o.predict_lead_node_assignment in terms of variable conditions
Specifically this a mapping between terminal node paths represented as i. L(eft) or R(ight) directions at each node (output from h2o.predict_lead_node_assignment) and ii. variable conditions (i.e. A > x & B in (y, z) etc).
```{r}
terminal_node_mapping <- map_h2o_encoding(h2o_tree_dfs)
```
```{r}
terminal_node_mapping[[1]]
```
The last column *terminal_node_directions_h2o* shows the terminal nodes as they are represented in the output of the h2o.predict_lead_node_assignment function.