Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-package] do not copy column-major numpy arrays when predicting #6751

Merged
merged 1 commit into from
Dec 15, 2024

Conversation

jmoralez
Copy link
Collaborator

@jmoralez jmoralez commented Dec 12, 2024

Follow up to #6721. This prevents copying the data array when predicting if it's column-major.

@jmoralez jmoralez marked this pull request as ready for review December 12, 2024 17:32
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for continuing working on this!

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Dec 14, 2024

There is one more place where _np2d_to_np1d() can simplify the code and save memory:

if mat.dtype == np.float32 or mat.dtype == np.float64:
mats[i] = np.asarray(mat.reshape(mat.size), dtype=mat.dtype)
else: # change non-float data to float data, need to copy
mats[i] = np.array(mat.reshape(mat.size), dtype=np.float32)

But it'll require to change from single flag int is_row_major to array of such flags int* is_row_major in LGBM_DatasetCreateFromMats

Could you please prepare one more PR for that?

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Merging this as-is because @StrikerRUS 's suggestion in #6751 (comment) is for a follow-up PR.

@jameslamb jameslamb merged commit 1090a93 into master Dec 15, 2024
48 checks passed
@jameslamb jameslamb deleted the no-copy-predict-np-col-major branch December 15, 2024 05:45
@jmoralez
Copy link
Collaborator Author

The create from mats function takes a single int argument for the layout

LIGHTGBM_C_EXPORT int LGBM_DatasetCreateFromMats(int32_t nmat,
const void** data,
int data_type,
int32_t* nrow,
int32_t ncol,
int is_row_major,

so they should all have the same layout.

I can work on that, i.e. try not to copy if they all have the same layout and enforce row-major otherwise.

@StrikerRUS
Copy link
Collaborator

@jmoralez

The create from mats function takes a single int argument for the layout ...

Yeah, I saw this before writing my suggestion, and that's why I wrote:

But it'll require to change from single flag int is_row_major to array of such flags int* is_row_major in LGBM_DatasetCreateFromMats

I think that a small breaking change in public API worth it in exchange of saving memory.
This will allow to pass mats with different layouts.

@jmoralez
Copy link
Collaborator Author

I see, I had misunderstood the suggestion. Sounds good, I'll work on that.

@StrikerRUS
Copy link
Collaborator

@jmoralez Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants