-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python-package] Allow to pass Arrow table and array as init scores #6167
[python-package] Allow to pass Arrow table and array as init scores #6167
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! I support it overall.
I think it should be possible to provide 2D pyarrow.Array
and pyarrow.ChunkedArray
objects for init_score
as well. Are there some limitations there that I'm missing, that led you to only designate pyarrow.Table
as "for multi-class" in the docs + not add tests on 2D pyarrow.Array
/ pyarrow.ChunkedArray
?
If that was just an oversight, please add some tests on providing 2D data for init_score
via the pyarrow
array types.
I'm not sure I mistook something in the Arrow specification but aren't Arrow arrays 1D by design? In my mind, there's no n-d Arrow arrays (like you have NumPy arrays), an Arrow array is really just a vector. |
Oh I guess they are: https://blog.djnavarro.net/posts/2022-05-25_arrays-and-tables-in-arrow/#data-objects (the author was working at Voltron on Arrow when that was written, I believe) Sorry, I'm not that familiar with Arrow... trying to get caught up to review these PRs. I think this was the first of these PRs that included both the Array types and Table, so the first place I really thought about it (e.g. the absence of Thanks for your patience and quick answers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me, thanks!
@shiyu1994 or @guolinke could you help with a review as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Can we continue here @jameslamb? 🙏🏼 would love to get Arrow support in a LightGBM release before Christmas 😄 |
I dont want to merge this until #6216 is merged, as that fixes an issue causing CI to fail deterministically. That is just waiting for a review. Sorry for the delay. |
Ok #6216 is merged, thanks to @guolinke ! And thanks to @shiyu1994 , I could merge Once CI passes, I'll merge this and we can move on to the next one. Thanks as always for your patience and help @borchero 😊 |
What's up with the CUDA CI, is this a problem across this repo? 👀 |
It seems like it, yes. You can see all runs of GitHub Actions in a project at the https://github.com/microsoft/LightGBM/actions/workflows/cuda.yml Looks to me like many of them failed yesterday (around 15-20 hours ago) with the same types of errors.
(example build not from this PR) These run on a dedicated VM Microsoft pays for, not GitHub's fleet of hosted runners, so there can only be one job running at a time across the entire project. That makes it easy for them to get backed up. I do see a task got picked up and is running right now for another PR: https://github.com/microsoft/LightGBM/actions/runs/7061296005/job/19222636772 Let's see if that succeeds. Sorry for the inconvenience. |
Yeah the CUDA CI runs are all timing out at this point: LightGBM/.github/workflows/cuda.yml Line 106 in 5083df1
@shiyu1994 can you please look into it? Maybe try shelling into that box and running the steps from the CI configuration manually, to see if you can get more logs? I suspect it might be something like an interactive dialog that's being triggered, and then timing out waiting for that. |
@shiyu1994 can you please help with the CUDA machine or give me access so I can do it? The most recent manually-triggered re-run for this PR has been stuck for almost 24 hours.
Thank you. |
The agent for CUDA action simply doesn't respond with and command in the shell. I've restarted the VM and it should work now. |
Thanks so much @shiyu1994 ! I just restarted the builds for this PR. |
🎉 that worked, thank you again! https://github.com/microsoft/LightGBM/actions/runs/7054883149?pr=6167 |
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Motivation
This is part of a series of PRs towards #6022 and depends on #6166.
For a legible diff, see borchero/LightGBM@arrow-support-groups...arrow-support-init-scores.