
SYCL: Remove misleading ggml_sycl_op_flatten function #12387

Merged: 7 commits into ggml-org:master, Mar 31, 2025

Conversation

@qnixsynapse (Collaborator) commented Mar 14, 2025:

Original work of #11515. Tried to submit a smaller change this time.

@github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels on Mar 14, 2025
// dd = data device
float * src0_ddf = (float *) src0->data;
float * src1_ddf = use_src1 ? (float *) src1->data : nullptr;
float * dst_ddf = (float *) dst->data;
@qnixsynapse (Collaborator, Author) commented on the lines above, Mar 14, 2025:

This function typecast the src0, src1, and dst tensors to float regardless of their actual types. I don't think we want that behaviour in the long run.
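For context, a minimal, self-contained sketch (hypothetical stub types, not the actual ggml-sycl code) of the pattern being removed: the shared wrapper casts every operand to float* before calling the op, so the ops never see the tensors' real types.

```cpp
enum type_stub { TYPE_F32, TYPE_F16 };           // stand-in for ggml's type enum

struct tensor_stub {                              // stand-in for ggml_tensor
    type_stub type;
    void *    data;
};

using flatten_op_t = void (*)(const tensor_stub *, const tensor_stub *, tensor_stub *,
                              const float *, const float *, float *);

static void op_flatten_sketch(const tensor_stub * src0, const tensor_stub * src1,
                              tensor_stub * dst, flatten_op_t op) {
    const bool use_src1 = src1 != nullptr;
    // dd = data device -- unconditional float casts, regardless of the tensor type
    float * src0_ddf = (float *) src0->data;
    float * src1_ddf = use_src1 ? (float *) src1->data : nullptr;
    float * dst_ddf  = (float *) dst->data;
    op(src0, src1, dst, src0_ddf, src1_ddf, dst_ddf);   // an F16 tensor would be misread here
}
```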

@acbits commented Mar 14, 2025:

Just curious: why don't we use auto? You could avoid the casting, and if the data type changes in the future you wouldn't have to rework it again.

Modern C++ code should use auto wherever possible.
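As a hedged illustration of this suggestion (stub type, not the real ggml_tensor): auto saves restating the element type on the left-hand side, though in C++ a cast away from the void* data pointer is still needed.

```cpp
struct tensor_stub { void * data; };   // hypothetical stand-in for ggml_tensor

static void copy_first_element(const tensor_stub * src0, tensor_stub * dst) {
    // auto deduces the pointer types from the casts; the casts themselves remain
    auto * src0_ddf = static_cast<const float *>(src0->data);
    auto * dst_ddf  = static_cast<float *>(dst->data);
    dst_ddf[0] = src0_ddf[0];
}
```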

@qnixsynapse (Collaborator, Author) replied:

Do we need to create a separate pointer variable when we are passing the ggml_tensor itself to the kernel OP function anyway? The tensor already contains everything needed to perform the OP.
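A hypothetical sketch (stub types, not the actual ggml-sycl signatures) of the idea in this question: pass the destination tensor itself to the op and let it pull sources, types, and shapes from the tensor, instead of receiving pre-cast float* arguments from a shared wrapper.

```cpp
#include <cassert>

enum type_stub { TYPE_F32, TYPE_F16 };

struct tensor_stub {
    type_stub     type;
    void *        data;
    tensor_stub * src[2];   // source operands, like ggml_tensor::src
    long          ne;       // number of elements (flattened here for brevity)
};

static void op_relu_sketch(tensor_stub * dst) {
    const tensor_stub * src0 = dst->src[0];
    assert(src0->type == TYPE_F32 && dst->type == TYPE_F32);  // the op decides what it supports
    const float * x = static_cast<const float *>(src0->data);
    float *       y = static_cast<float *>(dst->data);
    for (long i = 0; i < dst->ne; ++i) {
        y[i] = x[i] > 0.0f ? x[i] : 0.0f;
    }
}
```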


@qnixsynapse force-pushed the remove_op_flatten_fn branch from f42c7bb to 73f1849 on March 14, 2025 12:11
@qnixsynapse marked this pull request as draft on March 14, 2025 12:27
@qnixsynapse marked this pull request as ready for review on March 14, 2025 13:19

@arthw (Collaborator) commented Mar 15, 2025:

This PR includes many code changes, but no bug fix or feature work.
Because some CI UT cases are skipped manually to keep a 100% pass rate, I'm afraid the updated code can't be verified by the CI cases.

So it's high risk to update so much code without 100% UT coverage.

I suggest doing this refactoring as additional work alongside a bug fix or feature in the same source file or class, so that the code will be fully covered by tests.

@qnixsynapse (Collaborator, Author) replied:
> This PR includes many code changes, but no bug fix or feature work.
> Because some CI UT cases are skipped manually to keep a 100% pass rate, I'm afraid the updated code can't be verified by the CI cases.
>
> So it's high risk to update so much code without 100% UT coverage.
>
> I suggest doing this refactoring as additional work alongside a bug fix or feature in the same source file or class, so that the code will be fully covered by tests.

I usually test them locally (by running a model) before opening a PR here. If you want, we can update the CI to run the test cases on an available Nvidia CI/CD instance here.

@Rbiessy (Collaborator) left a comment:

That makes sense to me. We have to be careful when we merge this as it will break #12412 though.

@qnixsynapse (Collaborator, Author) replied:

@Rbiessy I don't think this will break the RWKV kernels because they don't depend on the ggml_sycl_op_flatten function. I will do a rebase and test.

Please note that this is a prerequisite for supporting F16 types in unary and eltwise operations, which I intend to do in future PRs.
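A hedged sketch of why removing the float-only wrapper matters for F16: once an op sees the real tensor types instead of pre-cast float*, an element-wise op can dispatch per element type. Stub types only; in the real backend the F16 branch would use a proper half type such as sycl::half.

```cpp
enum type_stub { TYPE_F32, TYPE_F16 };

struct tensor_stub {
    type_stub     type;
    void *        data;
    tensor_stub * src[2];
    long          ne;
};

template <typename T>
static void neg_kernel(const T * x, T * y, long n) {
    for (long i = 0; i < n; ++i) { y[i] = -x[i]; }   // illustrative element-wise op
}

static void op_neg_dispatch(tensor_stub * dst) {
    const tensor_stub * src0 = dst->src[0];
    switch (dst->type) {
        case TYPE_F32:
            neg_kernel(static_cast<const float *>(src0->data),
                       static_cast<float *>(dst->data), dst->ne);
            break;
        case TYPE_F16:
            // the same templated kernel would be instantiated for the half type here
            // (omitted to keep this sketch dependency-free)
            break;
    }
}
```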

@NeoZhangJianyu (Collaborator) commented:
> > This PR includes many code changes, but no bug fix or feature work.
> > Because some CI UT cases are skipped manually to keep a 100% pass rate, I'm afraid the updated code can't be verified by the CI cases.
> > So it's high risk to update so much code without 100% UT coverage.
> > I suggest doing this refactoring as additional work alongside a bug fix or feature in the same source file or class, so that the code will be fully covered by tests.
>
> I usually test them locally (by running a model) before opening a PR here. If you want, we can update the CI to run the test cases on an available Nvidia CI/CD instance here.

Testing with several models can't cover all the OPs; it's better to run the CI.

If your PC has an iGPU in an Intel Core CPU (11th gen or newer), you could run the CI on it too.

@Rbiessy (Collaborator) commented Mar 18, 2025:

> @Rbiessy I don't think this will break the RWKV kernels because they don't depend on the ggml_sycl_op_flatten function. I will do a rebase and test.
>
> Please note that this is a prerequisite for supporting F16 types in unary and eltwise operations, which I intend to do in future PRs.

I was referring to this line which adds a usage of ggml_sycl_op_flatten. It will be fine if you rebase it now.

@qnixsynapse (Collaborator, Author) replied:

> > @Rbiessy I don't think this will break the RWKV kernels because they don't depend on the ggml_sycl_op_flatten function. I will do a rebase and test.
> > Please note that this is a prerequisite for supporting F16 types in unary and eltwise operations, which I intend to do in future PRs.
>
> I was referring to this line which adds a usage of ggml_sycl_op_flatten. It will be fine if you rebase it now.

Ah okay. That can be fixed.

@qnixsynapse marked this pull request as draft on March 19, 2025 15:05
@qnixsynapse force-pushed the remove_op_flatten_fn branch 2 times, most recently from 1384273 to e99683f on March 22, 2025 06:43
@qnixsynapse force-pushed the remove_op_flatten_fn branch from e99683f to dc87ed7 on March 27, 2025 06:02
@qnixsynapse marked this pull request as ready for review on March 27, 2025 06:02
@qnixsynapse marked this pull request as draft on March 27, 2025 12:03
@qnixsynapse marked this pull request as ready for review on March 27, 2025 12:31
@qnixsynapse requested a review from Alcpz on March 27, 2025 15:04
@Alcpz (Collaborator) left a comment:

Overall this looks great to me, thanks for simplifying the code a bit!
Do you have an answer to the try/catch questions from @Rbiessy (#12387 (comment))?

I'm happy to give my approval, but if you intend to add a few more changes I'll wait until you've finished everything.

@qnixsynapse (Collaborator, Author) replied:

> Overall this looks great to me, thanks for simplifying the code a bit!

Thank you. Will add support for fp16 once this PR gets merged.

> Do you have an answer to the try/catch questions from @Rbiessy (#12387 (comment))?
>
> I'm happy to give my approval, but if you intend to add a few more changes I'll wait until you've finished everything.

If you look at my latest changes, I did exactly what @Rbiessy suggested, i.e. added the try/catch only to the compute_forward function and removed it from the elementwise op functions.
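A hedged sketch (simplified stub, not the actual upstream code) of the structure described here: exception handling lives once in the top-level compute_forward dispatcher instead of being repeated inside every element-wise op.

```cpp
#include <cstdio>
#include <exception>

struct tensor_stub { int op; };   // hypothetical stand-in for ggml_tensor

static void op_add_sketch(tensor_stub *)  { /* element-wise kernel; may throw */ }
static void op_relu_sketch(tensor_stub *) { /* element-wise kernel; may throw */ }

static bool compute_forward_sketch(tensor_stub * dst) try {
    switch (dst->op) {
        case 0: op_add_sketch(dst);  break;
        case 1: op_relu_sketch(dst); break;
        default: return false;
    }
    return true;
} catch (const std::exception & e) {
    // in the real backend this would catch sycl::exception raised by the SYCL queue
    std::fprintf(stderr, "compute_forward error: %s\n", e.what());
    return false;
}
```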

@Rbiessy (Collaborator) left a comment:

LGTM, great work!

@Alcpz (Collaborator) commented Mar 31, 2025:

> If you look at my latest changes, I did exactly what @Rbiessy suggested, i.e. added the try/catch only to the compute_forward function and removed it from the elementwise op functions.

I misunderstood the comment then. LGTM!

@Rbiessy merged commit 6c02a03 into ggml-org:master on Mar 31, 2025
48 checks passed
Labels: ggml (changes relating to the ggml tensor library for machine learning), SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language)
7 participants