[Question] What is the default value of K (Max Conflict Count)? #6104

Open
deana00 opened this issue Sep 16, 2023 · 2 comments

deana00 commented Sep 16, 2023

[Image: the Greedy Bundling algorithm from the LightGBM paper]

Hello, I have a few questions about the LightGBM algorithm. I am currently using LightGBM for my undergraduate thesis.

  1. In the Greedy Bundling algorithm shown above, from the LightGBM paper (https://proceedings.neurips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf), what is the default value of K (max conflict count)? Do we need to set it explicitly? Can you please show me the code where K is used?
  2. For binary classification, how is the gain calculated? Could you please give an example of the calculation?
  3. If I have features
    F0 = [1,0,1,0,2,3,0,1,1,2] and
    F1 = [0,1,0,3,0,0,0,4,0,0],
    which of these is correct: newBundle = [4,8,4,10,5,6,0,11,4,5] as in Algorithm 4 of the paper,
    newBundle = [1,4,1,6,2,3,0,7,1,2], or newBundle = [1,4,1,6,2,3,0,1,1,2]?
  4. In Algorithm 4 below, where does the value of numBin come from? How do we know the value of numBin for each feature?
    [Image: IMG_20230916_153038.jpg]
  5. How do we get the value of bin boundaries?
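
For question 3, a literal reading of Algorithm 4 (Merge Exclusive Features) in the paper can be sketched as follows. This is an illustration of the pseudocode only, not LightGBM's actual implementation. The function name `merge_exclusive_features` and the `num_bins` argument are hypothetical, and the two `numBin` conventions tried below are assumptions, since the thread does not say whether `numBin` counts the zero bin.

```python
from typing import List, Sequence, Tuple


def merge_exclusive_features(
    features: Sequence[Sequence[int]], num_bins: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Literal sketch of Algorithm 4; names here are hypothetical."""
    # bin_ranges[j] is the offset added to the non-zero bins of feature j.
    bin_ranges = [0]
    total_bin = 0
    for n in num_bins:
        total_bin += n
        bin_ranges.append(total_bin)

    num_data = len(features[0])
    new_bin = [0] * num_data
    for i in range(num_data):
        for j, feature in enumerate(features):
            if feature[i] != 0:
                # On a conflicting row, a later feature overwrites an earlier one.
                new_bin[i] = feature[i] + bin_ranges[j]
    return new_bin, bin_ranges


F0 = [1, 0, 1, 0, 2, 3, 0, 1, 1, 2]
F1 = [0, 1, 0, 3, 0, 0, 0, 4, 0, 0]

# Assumption A: numBin counts the zero bin, so F0 has 4 bins and F1 has 5.
bundle_a, ranges_a = merge_exclusive_features([F0, F1], [4, 5])

# Assumption B: numBin counts only the non-zero bins, so F0 has 3 and F1 has 4.
bundle_b, ranges_b = merge_exclusive_features([F0, F1], [3, 4])

print(bundle_a)  # [1, 5, 1, 7, 2, 3, 0, 8, 1, 2]
print(bundle_b)  # [1, 4, 1, 6, 2, 3, 0, 7, 1, 2]
```

Under assumption B the sketch reproduces the second candidate in the question. Row 7 is the only row where both features are non-zero, so it is the only place where the merge loses information. Which `numBin` convention LightGBM itself uses is not settled by this sketch.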

I am sorry for asking so many questions, but I hope someone here can help me understand this better. Thank you.

Edited:

  1. Already answered in "Exclusive Feature Bundle is for categorical data only? #4114 (comment)". So the default value of K is the number of data points / 10000, correct?

shiyu1994 (Collaborator) commented

@deana00 Link to the value of K here.

const data_size_t single_val_max_conflict_cnt =

It is about 1/10000 of the number of sampled data used to determine the groups and bins.
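
The arithmetic described above can be illustrated with a small sketch. The helper names `approx_max_conflict_count` and `count_conflicts` are hypothetical, the use of integer division is an assumption based on "about 1/10000", and the sample size of 200,000 rows is only an example value. This is not the LightGBM source.

```python
from typing import Sequence


def approx_max_conflict_count(num_sampled_rows: int) -> int:
    # Assumption: K is the sampled row count divided by 10000, rounded down.
    return num_sampled_rows // 10000


def count_conflicts(a: Sequence[int], b: Sequence[int]) -> int:
    # A conflict is a row where both features are non-zero at the same time.
    return sum(1 for x, y in zip(a, b) if x != 0 and y != 0)


# Features from question 3 of the original post.
F0 = [1, 0, 1, 0, 2, 3, 0, 1, 1, 2]
F1 = [0, 1, 0, 3, 0, 0, 0, 4, 0, 0]

k_large = approx_max_conflict_count(200_000)  # hypothetical sample size
k_tiny = approx_max_conflict_count(len(F0))   # only 10 rows
conflicts = count_conflicts(F0, F1)

print(k_large)              # 20
print(k_tiny)               # 0
print(conflicts)            # 1
print(conflicts <= k_tiny)  # False
```

With only 10 rows this rule gives K = 0, so the single conflicting row in F0 and F1 would already exceed the budget in this simplified check.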

deana00 commented Sep 20, 2023

> @deana00 Link to the value of K here.
>
> const data_size_t single_val_max_conflict_cnt =
>
> It is about 1/10000 of the number of sampled data used to determine the groups and bins.

Hi, thank you for your response. Can you elaborate on your last sentence about determining the bins?

Specifically, how do we determine the number of bins and the bin boundaries needed to build the histogram?
