
Non-deterministic even with "deterministic=True" "seed=0" and the same number of threads in LightGBM==3.1.1 #3761

Closed
ZhangTP1996 opened this issue Jan 14, 2021 · 8 comments

ZhangTP1996 commented Jan 14, 2021

LightGBM component:

Environment info

Operating System: Linux

CPU/GPU model: CPU

Python version: 3.7.3

LightGBM version or commit hash: 3.1.1, installed by pip

Error message and / or logs

Reproducible example(s)

import lightgbm as lgb
import pandas as pd
import os
import numpy as np
import random
assert lgb.__version__ == '3.1.1'
assert np.__version__ == '1.16.4'
assert pd.__version__ == '1.1.4'
# python 3.7.3
# Linux version 4.15.0-123-generic (buildd@lcy01-amd64-027) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)) #126~16.04.1-Ubuntu

def get_gain_importance(data, label):
    np.random.seed(0)
    random.seed(0)
    os.environ['PYTHONHASHSEED'] = '0'
    gbm = lgb.LGBMClassifier(**{'importance_type': 'gain', 'seed': 0,
                                'deterministic': True, 'n_jobs': n_jobs})
    gbm.fit(data.values, label.values.ravel())
    return gbm.feature_importances_


if __name__ == '__main__':
    path = './'
    n_jobs = 16
    data = pd.read_csv(path+'train_x.csv')
    label = pd.read_csv(path+'train_y.csv')
    FI = get_gain_importance(data, label)
    for i in range(100):
        print(i)
        FI_new = get_gain_importance(data, label)
        if not (FI == FI_new).all():
            print(FI)
            print(FI_new)
            exit()

data.zip

The data needed to reproduce the issue is attached in the zip file. Please fill in the "path" variable and run the code.

@guolinke
Collaborator

@shiyu1994 can you help to check this? it may be related to the bugs you fixed recently.

@shiyu1994 shiyu1994 self-assigned this Jan 14, 2021
@shiyu1994
Collaborator

@ZhangTP1996 The non-deterministic behavior comes from the choice between the col-wise and row-wise histogram construction strategies. A quick fix to get deterministic behavior is to set force_row_wise=True or force_col_wise=True. Otherwise, LightGBM automatically chooses between the col-wise and row-wise strategies using a simple test, which can make a different decision each time.

However, col-wise and row-wise should ideally produce the same result; I'll continue to look into this.

@ZhangTP1996
Author

> A quick fix to get a deterministic behavior is to set force_row_wise=True or force_col_wise=True.

Thanks for the rapid response. I will check this tomorrow.

@ZhangTP1996
Author

> A quick fix to get a deterministic behavior is to set force_row_wise=True or force_col_wise=True.

It seems that setting force_row_wise=True solves my current issue. Thanks!

@shiyu1994
Collaborator

This problem is a numerical issue. Since the row-wise and col-wise strategies accumulate gradient values in different orders, the resulting histograms can differ slightly when the gradients and hessians are very small. The following diff shows the histogram values from the first tree in which col-wise and row-wise differ.

6c6
< 0.8558845588,0.1233515594
---
> 0.8558845588,0.1233515668
13c13
< 0.01200347667,0.01187949732
---
> 0.01200347667,0.01187949639
15c15
< 3.043845621e-09,3.048046703e-09
---
> 3.043845399e-09,3.048046481e-09
31c31
< 8.874970808e-05,8.874182822e-05
---
> 8.874967898e-05,8.874180639e-05
38c38
< 0.9603701234,0.038059365
---
> 0.9603701234,0.03805936873
70c70
< -1.129891929e-36,1.129891929e-36
---
> -1.129891839e-36,1.129891839e-36
173c173
< 1.496937505e-38,1.496937505e-38
---
> 1.496937365e-38,1.496937365e-38
224c224
< 0.998657763,0.001340438845
---
> 0.998657763,0.001340438728

With col-wise, bin 31 is selected as threshold, while with row-wise bin 32 is selected.

Considering the numerical problem, I think a quick solution to provide deterministic behavior is to force either row-wise or col-wise histogram construction when deterministic=True.
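The rounding behavior described above is easy to demonstrate: float32 addition is not associative, so accumulating the same values in a different order can change the last bits of a sum, which is exactly how col-wise and row-wise histograms can disagree. This is a minimal sketch of the effect, not LightGBM's actual accumulation code:

```python
import numpy as np

# Grouping the same float32 terms differently changes the rounded result.
a = np.float32(1.0)
b = np.float32(5e-8)  # below half of float32 machine epsilon (~6e-8)

left = (a + b) + b    # each b is rounded away individually -> 1.0
right = a + (b + b)   # b + b = 1e-7 survives rounding -> 1.0000001
print(left, right, left == right)

# The same effect at scale: summing many mixed-magnitude float32 values
# forward vs. backward typically gives slightly different totals.
rng = np.random.default_rng(0)
vals = (10.0 ** rng.uniform(-8, 0, size=100_000)).astype(np.float32)
fwd = np.float32(0.0)
bwd = np.float32(0.0)
for v in vals:
    fwd += v
for v in vals[::-1]:
    bwd += v
print(fwd, bwd)
```

The two totals agree only to within float32 rounding error, which is enough to move a split threshold from bin 31 to bin 32 as seen above.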

@shiyu1994
Collaborator

Also, I noticed that during training, the training loss oscillates dramatically on the provided data (starting around iteration 38). Maybe we need to improve the numerical stability.

[1]	valid's binary_logloss: 0.0259893
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 11
[2]	valid's binary_logloss: 0.0308559
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 14
[3]	valid's binary_logloss: 0.0361613
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[4]	valid's binary_logloss: 0.0300254
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 12
[5]	valid's binary_logloss: 0.0287943
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 12
[6]	valid's binary_logloss: 0.0275819
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 12
[7]	valid's binary_logloss: 0.0248224
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 11
[8]	valid's binary_logloss: 0.0252161
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[9]	valid's binary_logloss: 0.0230506
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[10]	valid's binary_logloss: 0.0377827
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[11]	valid's binary_logloss: 0.0263986
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 13
[12]	valid's binary_logloss: 0.0207991
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[13]	valid's binary_logloss: 0.0197117
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[14]	valid's binary_logloss: 0.0192411
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[15]	valid's binary_logloss: 0.018162
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 8
[16]	valid's binary_logloss: 0.0174082
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[17]	valid's binary_logloss: 0.0170077
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[18]	valid's binary_logloss: 0.016684
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[19]	valid's binary_logloss: 0.0191817
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[20]	valid's binary_logloss: 0.0179756
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 12
[21]	valid's binary_logloss: 0.0158201
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[22]	valid's binary_logloss: 0.0193097
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 16
[23]	valid's binary_logloss: 0.014494
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 13
[24]	valid's binary_logloss: 0.019959
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[25]	valid's binary_logloss: 0.0217315
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[26]	valid's binary_logloss: 0.0230094
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 12
[27]	valid's binary_logloss: 0.0206723
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 11
[28]	valid's binary_logloss: 0.0163684
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 11
[29]	valid's binary_logloss: 0.0148431
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[30]	valid's binary_logloss: 0.0289696
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 12
[31]	valid's binary_logloss: 0.0223404
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[32]	valid's binary_logloss: 0.0406943
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 7
[33]	valid's binary_logloss: 0.0394845
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 12
[34]	valid's binary_logloss: 0.0388705
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 11
[35]	valid's binary_logloss: 0.0758646
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[36]	valid's binary_logloss: 0.0532624
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 12
[37]	valid's binary_logloss: 0.0820735
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[38]	valid's binary_logloss: 0.0800012
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[39]	valid's binary_logloss: 0.186073
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 8
[40]	valid's binary_logloss: 0.0959558
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 11
[41]	valid's binary_logloss: 0.114168
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[42]	valid's binary_logloss: 0.145749
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 10
[43]	valid's binary_logloss: 0.153056
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[44]	valid's binary_logloss: 0.101863
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 8
[45]	valid's binary_logloss: 0.021308
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 11
[46]	valid's binary_logloss: 0.149215
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[47]	valid's binary_logloss: 0.0238757
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 9
[48]	valid's binary_logloss: 0.107908
[LightGBM] [Debug] Trained a tree with leaves = 31 and max_depth = 11
[49]	valid's binary_logloss: 0.0173251

@StrikerRUS
Collaborator

Closed via #4027.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023