h2o.predict wrong when supplied a mojo with offset_column and fold_column #15697

Closed
hutch3232 opened this issue Aug 15, 2023 · 4 comments

@hutch3232

H2O version, Operating System and Environment
Occurs in 3.42.0.2 and at least as far back as 3.36.0.4
Occurred on Windows and Linux versions of R:

java version "11.0.10" 2021-01-19 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.10+8-LTS-162)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.10+8-LTS-162, mixed mode)

Actual behavior
Predictions from an imported mojo are incorrect when the GLM was built with both an offset_column and a fold_column. They do not match the predictions from the in-memory model or from the reloaded binary model.

Expected behavior
The predictions should match and should incorporate the effect of the offset.

Steps to reproduce

library(data.table)
library(h2o)

h2o.init(port = 58282)
h2o.getVersion()
# [1] "3.42.0.2"

mt <- as.h2o(mtcars)

find_offset <- h2o.glm(
  x = c("cyl", "disp"),
  y = "mpg",
  training_frame = mt,
  lambda = 0
)

h2o.residual_deviance(find_offset)
# [1] 270.7403

# create offset to look exactly like it's in the model (easy comparison)
mt$offset <- mt$cyl * find_offset@model$coefficients[["cyl"]]
mt$fold <- h2o.kfold_column(data = mt, nfolds = 3, seed = 123)

# move "cyl" from being modeled to be an offset
# build with a fold column
mod_w_offset <- h2o.glm(
  x = c("disp"),
  y = "mpg",
  training_frame = mt,
  offset_column = "offset",
  lambda = 0,
  fold_column = "fold"
)

h2o.residual_deviance(mod_w_offset)
# [1] 270.7403 (match as expected)

# save out models then immediately reimport
mojo_path <- h2o.save_mojo(object = mod_w_offset, path = ".")
biny_path <- h2o.saveModel(object = mod_w_offset, path = ".")

mojo <- h2o.import_mojo(mojo_file_path = mojo_path)
biny <- h2o.loadModel(path = biny_path)

h2o.cbind(
  h2o.predict(object = mod_w_offset, newdata = mt),
  h2o.predict(object = mojo, newdata = mt),
  h2o.predict(object = biny, newdata = mt)
)

# in-memory model and binary match, mojo way off

#     predict predict0 predict1
# 1 21.84395 33.36761 21.84395
# 2 21.84395 32.36761 21.84395
# 3 26.08886 33.43796 26.08886
# 4 19.82676 31.35042 19.82676
# 5 14.55267 29.25089 14.55267
# 6 20.50602 31.02968 20.50602
# 
# [32 rows x 3 columns]
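
A quick arithmetic check on the printed rows, offered as a sketch rather than a diagnosis. Rows 1 and 2 of mtcars (Mazda RX4 and Mazda RX4 Wag) have the same cyl and disp, so they get the same in-memory prediction, yet the mojo predictions for them differ. The snippet below only uses the numbers printed above and base R's built-in mtcars; it assumes as.h2o() kept the original row order.

```r
# Predictions copied from the printed output above (rows 1 and 2).
in_memory <- c(21.84395, 21.84395)
mojo      <- c(33.36761, 32.36761)

# Rows 1 and 2 of mtcars share the only two inputs that matter here.
stopifnot(mtcars$cyl[1] == mtcars$cyl[2],
          mtcars$disp[1] == mtcars$disp[2])

# How far the mojo is from the in-memory model on each row,
# and how much that gap changes between two otherwise identical rows.
gap <- mojo - in_memory
diff_between_rows <- gap[1] - gap[2]
print(round(diff_between_rows, 5))
```

The gap changes by a whole number between two rows with identical predictors, which is what one would expect if some integer-valued column were being added where the offset should be. That is an inference from the printed values only, not something confirmed by this output.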

Upload logs
NA

Screenshots
NA

Additional context
It seems related to these issues that I previously submitted (both were resolved):
#6955 and #6980

@hutch3232 hutch3232 added the bug label Aug 15, 2023
@hutch3232 hutch3232 changed the title from "h2o.predict wrong when supplied a GLM mojo with offset_column and fold_column" to "h2o.predict wrong when supplied a mojo with offset_column and fold_column" Sep 18, 2023
@hutch3232
Author

hutch3232 commented Jan 25, 2024

Hi @syzonyuliia and @wendycwong, I wanted to follow up again on this issue to see whether it has been reproduced on your end. It feels like a pretty severe bug, especially because it silently returns bad values, so I hope it can be addressed. Thank you!

@wendycwong
Contributor

Hi Hutch: I am in the middle of resolving a customer issue. Will take a look at this issue next.

@wendycwong wendycwong assigned wendycwong and unassigned syzonyuliia Mar 18, 2024
@wendycwong wendycwong added this to the 3.46.0.2 milestone Mar 22, 2024
@wendycwong
Contributor

Paul:

I solved the problem. The mojo failed to recognize the fold column and hence used the wrong column as the offset column. You can work around the problem on your side by removing the fold column from the frame before calling predict.
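
For readers on an affected version, the workaround described above might look roughly like the sketch below. The two helper names, drop_columns and predict_without_fold, are hypothetical and not part of h2o; the only h2o call used is h2o.predict, as in the reproduction script. It assumes column selection by name with `[` behaves on an H2OFrame the way it does on a data.frame.

```r
# Hypothetical helper (not part of h2o): return `frame` without the named columns.
# Written against colnames() and `[`, which base data.frames support and
# which H2OFrame is assumed to support in the same way.
drop_columns <- function(frame, cols) {
  keep <- setdiff(colnames(frame), cols)
  frame[, keep, drop = FALSE]
}

# Hypothetical wrapper: score with the fold column removed, so the imported
# mojo never sees it. `fold_col` defaults to the name used in the reproduction.
predict_without_fold <- function(model, newdata, fold_col = "fold") {
  h2o::h2o.predict(object = model, newdata = drop_columns(newdata, fold_col))
}
```

With the objects from the reproduction script, `predict_without_fold(mojo, mt)` could then be compared against `h2o.predict(object = mod_w_offset, newdata = mt)` to see whether the two now agree. The offset column itself has to stay in the frame, since the model still needs it at scoring time.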

I fixed the problem but it will take a while to merge this into 3.46.0.2.

Thank you so much for bringing this to my attention.

W

@hutch3232
Author

Thank you, Wendy, that's great to hear!

wendycwong added a commit that referenced this issue Apr 19, 2024
* GH-15697: add R test to reproduce error.

* add generic model to test as well.
* Add fold column info to generic model.
* add compareFrame to compare prediction results.
* add prediction with fold column removed

Co-authored-by: wendycwong <[email protected]>
Co-authored-by: Veronika Maurerová <[email protected]>