Rstudio on Windows throws Fatal error, on Mac works fine. #2412
@henry090 Does LightGBM crash with the default example?
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
params <- list(objective = "regression", metric = "l2")
valids <- list(test = dtest)
model <- lgb.train(params,
dtrain,
100,
valids,
min_data = 1,
learning_rate = 1,
early_stopping_rounds = 10)
Yes, as I mentioned, the default example works fine. The dataset that causes the fatal error on Windows is https://www.kaggle.com/c/ieee-fraud-detection. I figured out that when I read 10k rows, it does not throw a fatal error. When I load 100k rows, it throws the error.
Hi, I noticed that after a fresh installation LightGBM works the first time (it runs the training process successfully). The second time (after restarting RStudio), it throws a fatal error.
@henry090 thanks for the report and apologies for our delay. I will see if I can reproduce this.
I tested another dataset from a Kaggle competition and it works fine. The dataset I mentioned earlier has 400 or more columns. Maybe the crash is related to the number of columns.
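The row-count and column-count hypotheses above could be checked separately by training on progressively smaller slices of the same data. A minimal sketch in base R, assuming a plain data.frame (call as.data.frame() first on a data.table); the helper name `take_subset()` and the toy data are illustrative and not part of LightGBM.

```r
# Keep the first `n_rows` rows, plus the label column and the first
# `n_feats` feature columns, so the same training code can be rerun
# on smaller inputs. Assumes `df` is a plain data.frame.
take_subset <- function(df, label, n_rows, n_feats) {
    feats <- setdiff(names(df), label)
    n_rows <- min(n_rows, nrow(df))
    n_feats <- min(n_feats, length(feats))
    df[seq_len(n_rows), c(label, feats[seq_len(n_feats)]), drop = FALSE]
}

# Toy stand-in for the real training table (hypothetical values)
toy <- data.frame(isFraud = c(0, 1, 0, 1), a = 1:4, b = 5:8, c = 9:12)
small <- take_subset(toy, label = "isFraud", n_rows = 2, n_feats = 2)
print(dim(small))
```

Holding the rows fixed while varying the features (and the reverse) would show which dimension, if either, the crash tracks.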
@henry090 I tried to reproduce this today.
Short Answer
I think that if you stop using
git clone --recursive https://github.com/jameslamb/LightGBM
cd LightGBM
git fetch origin r/msvc-support
git checkout r/msvc-support
Rscript build_r.R
Long Answer
I put some work in today to try to reproduce the issue you reported. The rest of this issue details that attempt. Sorry it took us so long to get back to you! This issue just slipped through.
my environment
Because you said in #2412 (comment) that sometimes the code succeeds the first time and fails the second time, for each test I installed the R package, loaded it, ran the code, removed all objects and restarted the session, then tried again.
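The run, restart, run-again procedure can be approximated without RStudio by launching the script in a fresh R process each time, so that a fatal error only kills the child process. This is a rough sketch: `run_in_fresh_session()` and the script name `repro.R` are hypothetical, and a new `Rscript` process is not identical to an RStudio session restart (for example, no workspace is restored).

```r
# Run `script` in `times` separate Rscript processes and return the exit
# status of each run. A non-zero status means the child failed, either
# with an ordinary R error or with a crash.
run_in_fresh_session <- function(script, times = 2L) {
    exe <- if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript"
    rscript <- file.path(R.home("bin"), exe)
    vapply(seq_len(times), function(i) {
        # --vanilla: do not load or save a workspace or user profile.
        # Paths containing spaces would need quoting.
        as.integer(system2(rscript, c("--vanilla", script)))
    }, integer(1))
}

# "repro.R" is an assumed file holding the training code under test
if (file.exists("repro.R")) {
    print(run_in_fresh_session("repro.R", times = 2L))
}
```

A status vector such as a zero followed by a non-zero value would match the "works the first time, fails the second time" report.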
I used your code from #2412 (comment) (thanks for providing that!). I had to add a
R code I ran:
library(tidyquant)
library(data.table)
library(dplyr)
library(lightgbm)
# Get training data by clicking "download all" at
# https://www.kaggle.com/c/ieee-fraud-detection/data
#
# unzip the .zip archive downloaded, so that all the .csv files are in a
# "data/" directory under the working directory you run this R code from
DATA_DIR <- file.path(getwd(), "data")
train_trs <- data.table::fread(
file.path(DATA_DIR, 'train_transaction.csv')
, na.strings = ''
, nrows = 100000
)
train_iden <- data.table::fread(
file.path(DATA_DIR, 'train_identity.csv')
, na.strings = ''
, nrows = 100000
)
test_trs <- data.table::fread(
file.path(DATA_DIR, 'test_transaction.csv')
, na.strings = ''
, nrows = 100000
)
test_iden <- data.table::fread(
file.path(DATA_DIR, 'test_identity.csv')
, na.strings = ''
, nrows = 100000
)
print('reading is done')
train <- left_join(train_trs,train_iden) %>% select(-c(TransactionID))
test <- left_join(test_trs,test_iden) %>% select(-c(TransactionID))
print('join is done')
total = bind_rows(train,test)
total = total %>% select(c(isFraud,TransactionDT,TransactionAmt,ProductCD,card1,card2,card3,card4,card5,card6,addr1,addr2,dist1,dist2,P_emaildomain,R_emaildomain,C3,C5,C7,D1,D3,D8,D9,D10,D11,D12,D13,D14,D15,M1,M2,M3,M4,M5,M6,M7,M8,M9,V1,V2,V3,V4,V6,V7,V8,V11,V12,V20,V24,V25,V27,V35,V37,V44,V46,V47,V53,V55,V56,V62,V65,V66,V75,V77,V81,V83,V86,V98,V107,V109,V110,V112,V115,V116,V118,V119,V120,V121,V122,V123,V124,V131,V135,V138,V140,V142,V147,V149,V159,V161,V162,V166,V169,V170,V171,V173,V174,V175,V184,V187,V188,V189,V190,V194,V195,V196,V200,V205,V206,V207,V208,V209,V210,V216,V220,V222,V223,V224,V227,V231,V234,V235,V238,V240,V241,V250,V252,V253,V260,V262,V270,V271,V273,V281,V282,V283,V284,V285,V286,V289,V290,V291,V294,V296,V297,V300,V305,V311,V313,V319,V321,id_01,id_02,id_03,id_04,id_05,id_06,id_07,id_08,id_09,id_10,id_11,id_12,id_13,id_14,id_15,id_16,id_17,id_18,id_19,id_20,id_21,id_22,id_23,id_24,id_25,id_26,id_27,id_28,id_29,id_30,id_31,id_32,id_33,id_34,id_35,id_36,id_37,id_38,DeviceType,DeviceInfo))
rm(train_trs,train_iden,test_trs,test_iden,train,test)
total$id_30 %>% gsub(.,replacement = '',pattern = '[[:punct:]]|[[:digit:]]') %>%
trimws() ->total$id_30
# `sep` is a regular expression, so the literal dot must be escaped
total %>% tidyr::separate(P_emaildomain,c('pdomain','ptype_org','pcountry_domain'),sep='\\.')->total
total %>% tidyr::separate(R_emaildomain,c('rdomain','rtype_org','rcountry_domain'),sep='\\.')->total
total$id_31 %>% as.character() %>% gsub(.,replacement = '',pattern = '[[:digit:]]|\\.') %>%
trimws() %>% stringr::str_to_lower() ->total$id_31
total %>% tidyr::separate(id_31,c('id_31_1st','id_31_2nd'),sep="/| ") ->total
total$id_31_2nd = total$id_31_2nd %>% gsub(.,replacement = '',pattern = '[[:punct:]]')
total %>% tidyr::separate(id_33,c('length','width'),'x') ->total
total%>% mutate(length = as.numeric(length),width=as.numeric(width))->total
total %>% tidyr::separate(DeviceInfo,c('device1','device2'),sep=" |/|:|-") ->total
total[total=='']=NA
total = total %>% mutate_if(is.character,as.factor)
total <- total %>% arrange(TransactionDT) %>%
mutate(hr = floor( (TransactionDT / 3600) %% 24 ),
weekday = floor( (TransactionDT / 3600 / 24) %% 7)
) %>% select(-TransactionDT)
train = total %>% filter(!is.na(isFraud))
# test rows are the ones without a label
test = total %>% filter(is.na(isFraud))
rules <- lightgbm::lgb.prepare_rules(data = train)
train <- rules$data
test <- lightgbm::lgb.prepare_rules(data = test, rules = rules$rules)$data
set.seed(500)
pct = floor(nrow(train)*0.8)
tr_0 = train[1:pct,]
tr1 = train[(1+pct):nrow(train),]
dtrain <- lightgbm::lgb.Dataset(data.matrix(tr_0 %>% select(-isFraud)),
label = tr_0 %>% pull(isFraud))
dtest <- lightgbm::lgb.Dataset(data.matrix(tr1 %>% select(-isFraud)),
label = tr1 %>% pull(isFraud))
p <- list(
boosting_type = 'gbdt',
objective = "binary" ,
metric = "AUC",
boost_from_average = "false",
learning_rate = 0.008,
num_leaves = 197,
min_gain_to_split = 0,
feature_fraction = 0.3,
bagging_freq = 1,
bagging_fraction = 0.7,
min_data_in_leaf = 100,
lambda_l1 = 0,
lambda_l2 = 0
)
model <- lgb.train(data = dtrain,
params= p,
nrounds=5000,
valids = list(val1 = dtrain ,val2 = dtest),
metric="auc",
obj = "binary",
eval_freq = 200,
early_stopping_rounds=150
)
Results with installing from GitHub (
Hi, because of the quarantine I currently cannot access my laptop. Please close this issue, since I will not be able to check. If that changes, I will let you know. Thank you so much, James.
Hi, I ran my code snippet on both my Mac and Windows. On the Mac it works fine, but on Windows RStudio throws a fatal error... What could be causing it?
Your LightGBM examples on GitHub work fine on both systems, so this issue is probably related to my dataset.
AFAIR @Laurae2 once advised that integer and missing values could be the main reason. But the Mac does not throw any errors for NAs or integers, and on Windows the crash persists even though I filled the NAs with 0s and converted the integer columns to numeric.
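For anyone retrying the integer and NA workaround described here, a short base R sketch shows one way to inspect column types and NA counts and to apply the same conversion. The helper names `column_report()` and `to_numeric_no_na()` and the toy data are illustrative assumptions, not part of LightGBM, and whether this conversion avoids the crash is not established in this thread.

```r
# Summarise each column's class and number of missing values
column_report <- function(df) {
    data.frame(
        column = names(df),
        type = vapply(df, function(x) class(x)[1], character(1)),
        n_na = vapply(df, function(x) sum(is.na(x)), integer(1)),
        row.names = NULL,
        stringsAsFactors = FALSE
    )
}

# Convert integer columns to double and replace NAs in numeric columns
# with 0. Factor and character columns are left untouched.
to_numeric_no_na <- function(df) {
    for (nm in names(df)) {
        col <- df[[nm]]
        if (is.integer(col)) col <- as.numeric(col)
        if (is.numeric(col)) col[is.na(col)] <- 0
        df[[nm]] <- col
    }
    df
}

# Toy stand-in for the real table (hypothetical values)
toy <- data.frame(a = c(1L, NA, 3L), b = c(0.5, NA, 2), f = factor(c("x", NA, "y")))
print(column_report(toy))
clean <- to_numeric_no_na(toy)
print(column_report(clean))
```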