Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCLomatic] Enable migration of CUBLASLT: EPILOGUE_BIAS, EPILOGUE_GELU, EPILOGUE_GELU_AUX, EPILOGUE_GELU_AUX_BIAS #2460

Open
wants to merge 17 commits into
base: SYCLomatic
Choose a base branch
from

Conversation

abhilash1910
Copy link
Contributor

} else if (compute_desc->_epilogue == epilogue_t::bias) {
matmul_ops.append_binary(::dnnl::algorithm::binary_add, bias_md);
} else if (compute_desc-> epilogue == epilogue::gelu_aux_bias) {
matmul_ops.append_eltwise(::dnnl::algorithm::eltwise_gelu_erf, 0.f, 0.f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. CUBLASLT_EPILOGUE_GELU_AUX_BIAS needs to output the bias result to CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_POINTER. But here, your implementation does not.
  2. Please write an E2E test to validate your implementation. I tried something similar before but encountered a runtime error or an incorrect result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Please help to review.

Copy link
Contributor

@zhiweij1 zhiweij1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refine

} else if (compute_desc-> epilogue == epilogue::gelu_aux_bias) {
matmul_ops.append_eltwise(::dnnl::algorithm::eltwise_gelu_erf, 0.f, 0.f);
matmul_ops.append_binary(::dnnl::algorithm::binary_add, bias_md);
compute_desc->set_attribute(matmul_desc_t::attribute::epilogue_aux_pointer, bias_mem);
Copy link
Contributor

@zhiweij1 zhiweij1 Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to copy the data in bias_mem to _epilogue_aux_pointer. Meanwhile, please also consider the data layout (row_major/col_major) and the leading dimension (CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_LD).

And according to the cublaslt doc, CUBLASLT_MATMUL_DESC_BIAS_POINTER is also related to this epilogue. Please check the usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I directly added matrix memcpy to copy epilogue_aux_pointer contents from bias_mem pointer, using the epilogue_aux_ld is and new_ldc (bias ld). This usage is for matrix memcpy, and will try to see if tests are ok .
Please suggest in meantime.

Copy link
Contributor

@zhiweij1 zhiweij1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refine

@zhiweij1
Copy link
Contributor

Also could you please prepare an E2E test first? like https://github.com/oneapi-src/SYCLomatic-test/blob/SYCLomatic/features/feature_case/cublasLt/matmul.cu .
Make sure your test case (cuda code) can run pass on NV HW. Then migrate it to sycl. Compile the sycl code within your PR, make sure the result is same as the cuda code output.

@abhilash1910 abhilash1910 reopened this Nov 26, 2024
} else if (compute_desc->_epilogue == epilogue_t::gelu_aux_bias) {
matmul_ops.append_eltwise(::dnnl::algorithm::eltwise_gelu_erf, 0.f, 0.f);
matmul_ops.append_binary(::dnnl::algorithm::binary_add, bias_md);
dpct::blas::matrix_mem_copy(matmul_desc_t::attribute::epilogue_aux_pointer, bias_mem,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matmul_desc_t::attribute::epilogue_aux_pointer is a enum, not a pointer. bias_mem is a pointer of dnnl::memory, not a pointer to data.

@abhilash1910 abhilash1910 changed the title [SYCLomatic] Enable migration of CUBLASLT_EPILOGUE_BIAS & CUBLASLT_EPILOGUE_GELU_AUX_BIAS [SYCLomatic] Enable migration of CUBLASLT: EPILOGUE_BIAS, EPILOGUE_GELU, EPILOGUE_GELU_AUX, EPILOGUE_GELU_AUX_BIAS Nov 29, 2024
matmul_ops.append_eltwise(::dnnl::algorithm::eltwise_gelu_erf, 0.f, 0.f);
} else if (compute_desc->_epilogue == epilogue_t::gelu_aux) {
matmul_ops.append_eltwise(::dnnl::algorithm::eltwise_gelu_erf, 0.f, 0.f);
dpct::blas::matrix_mem_copy(compute_desc->_epilogue_aux_pointer, new_c,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget one issue. This dpct::blas::matrix_mem_copy does not return sycl::event. So we cannot control the dependency chain when using non-USM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue is new_c is the bias, not the result of (A*B+C). I did not find there is a method to get that immediate result when using oneDNN post-op.
Maybe we need to use a separate oneDNN elewise primitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes correct, for the second point, I see it is raised in JIRA. Currently I could not find a way to extract immediate result of A*B+C for the aux memcpy.

For the first case, do I have to force use USM always then ? What do you suggest ?

if (compute_desc->_epilogue == epilogue_t::relu) {
matmul_ops.append_eltwise(::dnnl::algorithm::eltwise_relu, 0.f, 0.f);
} else if (compute_desc->_epilogue == epilogue_t::gelu) {
matmul_ops.append_eltwise(::dnnl::algorithm::eltwise_gelu_erf, 0.f, 0.f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
matmul_ops.append_eltwise(::dnnl::algorithm::eltwise_gelu_erf, 0.f, 0.f);
matmul_ops.append_eltwise(::dnnl::algorithm::eltwise_gelu_tanh, 0.f, 0.f);

compute_desc->_epilogue_aux_ld, new_ldc, m, n,
sizeof(size_t) , dpct::device_to_device, queue);
} else if (compute_desc->_epilogue == epilogue_t::bias) {
matmul_ops.append_binary(::dnnl::algorithm::binary_add, bias_md);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bias_md represents the C in the D=A*B+C, not the extra bias in epilogue.

Copy link
Contributor Author

@abhilash1910 abhilash1910 Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to create new bias descriptor. A*B+C+bias.?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants