Sankey diagrams and parallel coordinates plots of all paths from root to leaf nodes through the binary trees that constitute a random forest.
Install the devtools package and use it to install forestviews:
install.packages('devtools')
library('devtools')
install_github(repo = 'brfitzpatrick/forestviews')
Load the data and packages we will use:
library(mlbench)
data(Satellite)
library(randomForest)
library(networkD3)
library(forestviews)
Fit a random forest:
rf.1 <- randomForest(classes ~ ., data = Satellite, mtry = 8, keep.forest = TRUE, ntree = 25, importance = TRUE)
Calculate all paths through the random forest:
rf.1.all.paths <- rf_pathfinder(rf = rf.1)
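To make concrete what a root-to-leaf path is, the following is a minimal sketch that enumerates the paths in a single tree using getTree() from randomForest. It is not how rf_pathfinder is implemented; the helper paths_in_tree is hypothetical, and the small built-in iris data set is used here only to keep the example quick to run.

```r
library(randomForest)

set.seed(1)
# Small illustrative forest on the built-in iris data (an assumption for this sketch)
rf.demo <- randomForest(Species ~ ., data = iris, ntree = 5, keep.forest = TRUE)

# getTree() returns one row per node; with labelVar = TRUE the split
# variables are given by name and terminal nodes have status == -1
tree.1 <- getTree(rf.demo, k = 1, labelVar = TRUE)

# Hypothetical helper (not part of forestviews): recursively collect the
# split variables encountered on every path from the root (row 1) to a leaf
paths_in_tree <- function(tree, node = 1, path = character(0)) {
  if (tree[node, 'status'] == -1) {
    return(list(path))
  }
  path <- c(path, as.character(tree[node, 'split var']))
  c(paths_in_tree(tree, tree[node, 'left daughter'], path),
    paths_in_tree(tree, tree[node, 'right daughter'], path))
}

demo.paths <- paths_in_tree(tree.1)

# Each leaf ends exactly one path, so the number of paths in a tree
# equals its number of terminal nodes
length(demo.paths) == treesize(rf.demo, terminal = TRUE)[1]
```

Because each terminal node ends exactly one path, sum(treesize(rf.1, terminal = TRUE)) gives the total number of root-to-leaf paths in the forest fitted above.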
Convert the paths in rf.1.all.paths to the links and nodes of a network for networkD3:
nd3 <- rf_sankey(all.paths.out = rf.1.all.paths, all.nodes = FALSE, plot.node.lim = 6)
Plot the network as an interactive Sankey diagram (this plot will open in your web browser):
sankeyNetwork(Links = nd3$links, Nodes = nd3$nodes, Source = 'source', Target = 'target', Value = 'value', NodeID = 'name', units = 'Count', fontSize = 12, nodeWidth = 30, NodeGroup = NULL)
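For readers unfamiliar with networkD3, the sketch below shows the kind of input sankeyNetwork expects: a data frame of nodes and a data frame of links whose source and target columns are zero-based indices into the node table. The node names and flow values here are made up for illustration and are not output from rf_sankey.

```r
library(networkD3)

# Hypothetical nodes: one row per node, referred to by zero-based row position
demo.nodes <- data.frame(name = c('x.17', 'x.21', 'x.18', 'x.1'))

# Hypothetical links: 'source' and 'target' index into demo.nodes starting at 0,
# and 'value' sets the width of each flow
demo.links <- data.frame(source = c(0, 0, 1),
                         target = c(1, 2, 3),
                         value  = c(10, 5, 4))

demo.sankey <- sankeyNetwork(Links = demo.links, Nodes = demo.nodes,
                             Source = 'source', Target = 'target',
                             Value = 'value', NodeID = 'name',
                             units = 'Count', fontSize = 12, nodeWidth = 30)

# Printing the widget opens it in the viewer or web browser
demo.sankey
```

In this toy network, 10 units flow from x.17 to x.21 and 5 from x.17 to x.18, so the x.17 to x.21 band is drawn twice as wide.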
Plot the same paths as a parallel coordinates plot (simple usage):
rf.pc <- rf_parcoor(all.paths.out = rf.1.all.paths, plot = TRUE, all.nodes = TRUE, plot.title = '', grey.scale = FALSE)
rf.pc
Grey scale version:
rf.pc.grey <- rf_parcoor(all.paths.out = rf.1.all.paths, plot = TRUE, all.nodes = TRUE, plot.title = '', grey.scale = TRUE)
rf.pc.grey
See the project website for an example of an interactive Sankey diagram.
The visualisation techniques implemented in forestviews are introduced in our manuscript on this topic.
A pre-print of this manuscript is available from arXiv.