Skip to content

Commit

Permalink
[python][docs] more detailed docs for trees_to_dataframe(), create_tr…
Browse files Browse the repository at this point in the history
…ee_digraph(), plot_tree() (#3618)

* [python] more detailed docs for trees_to_dataframe(), create_tree_digraph(), plot_tree()

* fixing warnings

* fix warnings

* undo unnecessary space

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* single line, better weight descriptions

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* column names

* Update python-package/lightgbm/plotting.py

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: Nikita Titov <[email protected]>
  • Loading branch information
jameslamb and StrikerRUS authored Dec 7, 2020
1 parent f38f118 commit eb03501
Show file tree
Hide file tree
Showing 2 changed files with 54 additions and 6 deletions.
18 changes: 18 additions & 0 deletions python-package/lightgbm/basic.py
Original file line number Diff line number Diff line change
Expand Up @@ -2202,6 +2202,24 @@ def free_network(self):
def trees_to_dataframe(self):
"""Parse the fitted model and return in an easy-to-read pandas DataFrame.
The returned DataFrame has the following columns.
- ``tree_index`` : int64, which tree a node belongs to. 0-based, so a value of ``6``, for example, means "this node is in the 7th tree".
- ``node_depth`` : int64, how far a node is from the root of the tree. The root node has a value of ``1``, its direct children are ``2``, etc.
- ``node_index`` : string, unique identifier for a node.
- ``left_child`` : string, ``node_index`` of the child node to the left of a split. ``None`` for leaf nodes.
- ``right_child`` : string, ``node_index`` of the child node to the right of a split. ``None`` for leaf nodes.
- ``parent_index`` : string, ``node_index`` of this node's parent. ``None`` for the root node.
- ``split_feature`` : string, name of the feature used for splitting. ``None`` for leaf nodes.
- ``split_gain`` : float64, gain from adding this split to the tree. ``NaN`` for leaf nodes.
- ``threshold`` : float64, value of the feature used to decide which side of the split a record will go down. ``NaN`` for leaf nodes.
- ``decision_type`` : string, logical operator describing how to compare a value to ``threshold``. For example, ``split_feature = "Column_10", threshold = 15, decision_type = "<="`` means that records where ``Column_10 <= 15`` follow the left side of the split, otherwise follows the right side of the split. ``None`` for leaf nodes.
- ``missing_direction`` : string, split direction that missing values should go to. ``None`` for leaf nodes.
- ``missing_type`` : string, describes what types of values are treated as missing.
- ``value`` : float64, predicted value for this leaf node, multiplied by the learning rate.
- ``weight`` : float64 or int64, sum of hessian (second-order derivative of objective), summed over observations that fall in this node.
- ``count`` : int64, number of records in the training data that fall into this node.
Returns
-------
result : pandas DataFrame
Expand Down
42 changes: 36 additions & 6 deletions python-package/lightgbm/plotting.py
Original file line number Diff line number Diff line change
Expand Up @@ -474,6 +474,16 @@ def create_tree_digraph(booster, tree_index=0, show_info=None, precision=3,
orientation='horizontal', **kwargs):
"""Create a digraph representation of specified tree.
Each node in the graph represents a node in the tree.
Non-leaf nodes have labels like ``Column_10 <= 875.9``, which means
"this node splits on the feature named "Column_10", with threshold 875.9".
Leaf nodes have labels like ``leaf 2: 0.422``, which means "this node is a
leaf node, and the predicted value for records that fall into this node
is 0.422". The number (``2``) is an internal unique identifier and doesn't
have any special meaning.
.. note::
For more information please visit
Expand All @@ -487,9 +497,14 @@ def create_tree_digraph(booster, tree_index=0, show_info=None, precision=3,
The index of a target tree to convert.
show_info : list of strings or None, optional (default=None)
What information should be shown in nodes.
Possible values of list items:
'split_gain', 'internal_value', 'internal_count', 'internal_weight',
'leaf_count', 'leaf_weight', 'data_percentage'.
- ``'split_gain'`` : gain from adding this split to the model
- ``'internal_value'`` : raw predicted value that would be produced by this node if it was a leaf node
- ``'internal_count'`` : number of records from the training data that fall into this non-leaf node
- ``'internal_weight'`` : total weight of all nodes that fall into this non-leaf node
- ``'leaf_count'`` : number of records from the training data that fall into this leaf node
- ``'leaf_weight'`` : total weight (sum of hessian) of all observations that fall into this leaf node
- ``'data_percentage'`` : percentage of training data that fall into this node
precision : int or None, optional (default=3)
Used to restrict the display of floating point values to a certain precision.
orientation : string, optional (default='horizontal')
Expand Down Expand Up @@ -536,6 +551,16 @@ def plot_tree(booster, ax=None, tree_index=0, figsize=None, dpi=None,
show_info=None, precision=3, orientation='horizontal', **kwargs):
"""Plot specified tree.
Each node in the graph represents a node in the tree.
Non-leaf nodes have labels like ``Column_10 <= 875.9``, which means
"this node splits on the feature named "Column_10", with threshold 875.9".
Leaf nodes have labels like ``leaf 2: 0.422``, which means "this node is a
leaf node, and the predicted value for records that fall into this node
is 0.422". The number (``2``) is an internal unique identifier and doesn't
have any special meaning.
.. note::
It is preferable to use ``create_tree_digraph()`` because of its lossless quality
Expand All @@ -556,9 +581,14 @@ def plot_tree(booster, ax=None, tree_index=0, figsize=None, dpi=None,
Resolution of the figure.
show_info : list of strings or None, optional (default=None)
What information should be shown in nodes.
Possible values of list items:
'split_gain', 'internal_value', 'internal_count', 'internal_weight',
'leaf_count', 'leaf_weight', 'data_percentage'.
- ``'split_gain'`` : gain from adding this split to the model
- ``'internal_value'`` : raw predicted value that would be produced by this node if it was a leaf node
- ``'internal_count'`` : number of records from the training data that fall into this non-leaf node
- ``'internal_weight'`` : total weight of all nodes that fall into this non-leaf node
- ``'leaf_count'`` : number of records from the training data that fall into this leaf node
- ``'leaf_weight'`` : total weight (sum of hessian) of all observations that fall into this leaf node
- ``'data_percentage'`` : percentage of training data that fall into this node
precision : int or None, optional (default=3)
Used to restrict the display of floating point values to a certain precision.
orientation : string, optional (default='horizontal')
Expand Down

0 comments on commit eb03501

Please sign in to comment.