SYCL: Remove misleading ggml_sycl_op_flatten function #12387
Conversation
```cpp
// dd = data device
float * src0_ddf = (float *) src0->data;
float * src1_ddf = use_src1 ? (float *) src1->data : nullptr;
float * dst_ddf = (float *) dst->data;
```
This function cast the src0, src1 and dst tensors to float regardless of their actual types. I think we don't want this behaviour in the long run.
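For context, a minimal sketch of what type-aware handling would have to look like instead (illustrative only, not code from this PR):

```cpp
// Illustrative sketch: the old helper always did the equivalent of
//   float * src0_ddf = (float *) src0->data;
// even when src0->type was not GGML_TYPE_F32. A type-aware caller would
// have to dispatch on the tensor's actual type instead, e.g.:
switch (src0->type) {
    case GGML_TYPE_F32:
        // operate on (const float *) src0->data
        break;
    case GGML_TYPE_F16:
        // operate on (const sycl::half *) src0->data
        break;
    default:
        GGML_ABORT("unsupported type");
}
```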
Just curious, why don't we use auto? You could avoid restating the type, and if the datatype changes in the future you wouldn't have to rework it again. Modern C++ code should use auto wherever possible.
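For reference, a sketch of the auto-based spelling being suggested; note that ggml_tensor::data is a void *, so a cast is still needed either way, and auto only avoids repeating the pointee type:

```cpp
// ggml_tensor::data is void *, so a cast is still required; auto only
// avoids repeating the pointee type on the left-hand side:
auto * src0_ddf = static_cast<float *>(src0->data);
auto * src1_ddf = use_src1 ? static_cast<float *>(src1->data) : nullptr;
auto * dst_ddf  = static_cast<float *>(dst->data);
```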
Do we need to create a separate pointer variable at all when we already pass the ggml_tensor itself to the kernel op function, which contains everything needed to perform the op?
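A hedged sketch of that shape; ggml_sycl_op_example is a hypothetical name, and the sources are reached through dst->src[] rather than passed separately:

```cpp
// Hypothetical op signature: take only the destination tensor, which links
// back to its sources, and cast at the point of use rather than up front.
static void ggml_sycl_op_example(ggml_backend_sycl_context & ctx, ggml_tensor * dst) {
    const ggml_tensor * src0 = dst->src[0];   // sources hang off dst->src[]
    GGML_ASSERT(src0->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F32);
    const float * src0_d = (const float *) src0->data; // cast at point of use
    float       * dst_d  = (float       *) dst->data;
    // ... launch the kernel on ctx.stream() with src0_d / dst_d ...
}
```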
Force-pushed from f42c7bb to 73f1849
This PR includes many code changes but no bug fix or feature work, so updating this much code without full unit-test coverage is high risk. I suggest doing such refactoring as additional work alongside a bug fix or feature in the same source file or class.
I usually test them locally (by running a model) before opening a PR here. If you want, we can update the CI to run the test cases on an available Nvidia CI/CD instance.
That makes sense to me. We have to be careful when we merge this as it will break #12412 though.
@Rbiessy I don't think this will break the RWKV kernels because they don't depend on the ggml_sycl_op_flatten function. I will do a rebase and test. Please note that this is a prerequisite for supporting F16 types in unary and eltwise operations, which I intend to do in future PRs.
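A rough sketch of what that type-generic eltwise support could look like, assuming the kernels become templated on the element type (names and shapes here are illustrative, not code from this PR):

```cpp
// Illustrative: a type-generic elementwise kernel that could be instantiated
// for both float and sycl::half once F16 support lands.
template <typename T>
static void relu_kernel(const T * x, T * dst, int k, const sycl::nd_item<1> & it) {
    const int i = it.get_global_id(0);
    if (i < k) {
        dst[i] = x[i] > static_cast<T>(0) ? x[i] : static_cast<T>(0);
    }
}
// A launcher would pick relu_kernel<float> or relu_kernel<sycl::half>
// based on dst->type.
```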
Testing with several models can't cover all the OPs. If your PC has an iGPU in an Intel Core CPU (11th gen or newer), you could run the CI on it too.
I was referring to this line, which adds a usage of ggml_sycl_op_flatten. It will be fine if you rebase it now.
Ah okay. That can be fixed.
Force-pushed from 1384273 to e99683f
Force-pushed from e99683f to dc87ed7
Overall looks great to me, thanks for simplifying the code a bit!
Do you have an answer to the try/catch questions from @Rbiessy (#12387 (comment))?
I'm happy to give my approval, but if you intend to add a few more changes then I'll wait until you finish everything.
Thank you. Will add support for fp16 once this PR gets merged.
If you look at my latest changes, I did exactly what @Rbiessy suggested, i.e. added the try/catch only to the compute_forward function and removed it from the elementwise op functions.
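For readers following the thread, a sketch of that pattern; the dispatch body and op names are illustrative:

```cpp
// Illustrative: one try/catch at the dispatch level instead of one per op.
static bool ggml_sycl_compute_forward(ggml_backend_sycl_context & ctx, ggml_tensor * dst) {
    try {
        switch (dst->op) {
            case GGML_OP_ADD:
                ggml_sycl_add(ctx, dst);   // op functions no longer catch
                break;
            // ... other ops ...
            default:
                return false;
        }
    } catch (const sycl::exception & e) {
        std::fprintf(stderr, "SYCL exception in compute_forward: %s\n", e.what());
        std::exit(1);
    }
    return true;
}
```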
LGTM, great work!
I misunderstood the comment then. LGTM!
Original work in #11515; tried to submit a smaller change this time.