Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename oneMKL Interface to oneMath #602

Open
wants to merge 45 commits into
base: develop
Choose a base branch
from

Conversation

Rbiessy
Copy link
Contributor

@Rbiessy Rbiessy commented Oct 21, 2024

Description

Related RFC: #564 and related spec PR: uxlfoundation/oneAPI-spec#596
As a reminder the plan is to have this PR and the spec PR approved before we move this repository to https://github.com/uxlfoundation and merge the 2 PRs.

  • The PR has the following implications on the MKLCPU and MKLGPU backends:
    • Had to introduce src/include/common_onemkl_conversion.hpp to convert oneMath common types to oneMKL types. This wasn't needed before as the backends relied on the 2 projects to use the same namespace. The file also provide helper macros to catch oneMKL exceptions and rethrow them as oneMath exceptions.
    • The DFT MKLGPU backend used to catch and rethrow some but not all oneapi::mkl exceptions when committing a descriptor. We discussed this issue by email recently. With the changes needed by the renaming, we now catch and rethrow all the exceptions as oneapi::math exceptions and I was able to remove a related workaround in the tests (FYI @oneapi-src/onemkl-dft-write). Also see related discussion below: Rename oneMKL Interface to oneMath #602 (comment)
    • The RNG MKLGPU backend relied heavily on the 2 libraries using the same namespace. I introduced onemkl_distribution_conversion.hpp to convert the RNG types and call the backend's free functions such as oneapi::mkl::rng::generate, oneapi::mkl::rng::skip_ahead. @oneapi-src/onemkl-rng-write you may want to carefully review src/rng/backends/mklgpu/ as I am not familiar with the RNG domain.
    • The renaming made it clear that some header files were duplicating function declarations for no reason in the BLAS and LAPACK domains. They are removed in baeb476.
  • The PR also remove the domain documentation for BLAS and LAPACK which duplicates the specification (commit b382291). We discussed in the issue [BLAS] Wrong namespace usage in BLAS Documentation #489 that the duplicated documentation was an issue and some of us are keen to remove it. Now is a good time as it becomes even more outdated with the name change. (@oneapi-src/onemkl-blas-write and @oneapi-src/onemkl-lapack-write FYI)
  • The changes assume that the teams @oneapi-src/onemkl-* will be renamed (or duplicated) to @uxlfoundation/onemath-* in the new repository (@rscohn2 FYI)
  • The changes assume that the onemkl Slack channel will be renamed to onemath (@rodburns FYI)
  • Removed USE_MKLREF which was only used in Lapack tests and was not documented. This macro wouldn't work easily with the namespace change anymore (@oneapi-src/onemkl-lapack-write FYI)
  • For reference the remaining occurrences of MKL are:
    • Intel backend names MKLCPU and MKLGPU.
    • Usages of oneapi::mkl inside the Intel backends.
    • File cmake/mkl/MKLConfig.cmake (could probably be removed IMO).
    • [email protected] in CODE_OF_CONDUCT.md and CONTRIBUTING.md. Let me know if there is any plan to create [email protected]. If not this can be updated separately.
    • Folder include/oneapi/mkl for deprecated headers and deprecated oneapi::mkl C++ namespace.
    • Deprecated ONEMKL CMake namespace.
    • Some files in the MKLCPU and MKLGPU backends use "mkl" in their name. They could be updated to onemkl if needed.

Checklist

All Submissions

Copy link
Contributor

@anantsrivastava30 anantsrivastava30 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are a lot of changes for one PR!. Mostly they look fine with what was discussed in the RFC and spce change. However I have a few question in my review.

src/dft/backends/mklgpu/commit.cpp Show resolved Hide resolved
include/oneapi/math/detail/backends_table.hpp Show resolved Hide resolved
@Rbiessy
Copy link
Contributor Author

Rbiessy commented Oct 30, 2024

The conflicts are mostly due to the recent PR for cuSPARSE support. I will work on fixing them but the other domains can still be reviewed.

@andreyfe1
Copy link
Contributor

Thanks for doing this huge work!
I have a general question from my side. Have you used git mv for renaming/duplicating files? This command preserves the file’s history. So, we will be able to see the history of commits for old files.

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 4, 2024

Thanks for doing this huge work! I have a general question from my side. Have you used git mv for renaming/duplicating files? This command preserves the file’s history. So, we will be able to see the history of commits for old files.

Yes I have used git mv in the second commit: 68d3966
In this commit GitHub shows the movement unfortunately when the changes from the other commits are shown GitHub decides it's best to show some files have been deleted and re-created.
To work around this you could try to compare the diff between the commit after the move and the latest commit if that's easier. You can see it here: https://github.com/Rbiessy/oneMKL/compare/600cd44...2a8a76b?diff=split&w=

@andreyfe1
Copy link
Contributor

ok. Thanks. Just wanted to make sure that we don't accidentally miss important information

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 5, 2024

@oneapi-src/onemkl-sparse-write I have updated the PR to fix the conflicts with the cuSPARSE PR. It would be useful if you can review the sparse related changes in this PR before the rocSPARSE PR #544!
Log for Intel backends: intel_mklcpu_mklgpu.txt
Log for Nvidia backends: log_a100.txt

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 7, 2024

I fixed some more recent conflicts.
@oneapi-src/onemkl-blas-write, @oneapi-src/onemkl-lapack-write in the last commit (baeb476) I have removed a couple of header files in blas and lapack which were duplicating function declarations for the MKLCPU and MKLGPU backends. Given that we can simply include the Intel oneMKL headers instead they can be removed here. This was discussed in #606 (comment)
Log testing the MKLCPU and MKLGPU backends with the 2025.0 base toolkit: intel_mklcpu_mklgpu.txt

Copy link
Contributor

@andrewtbarker andrewtbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BLAS changes look good to me.

docs/building_the_project_with_dpcpp.rst Outdated Show resolved Hide resolved
@sknepper
Copy link
Contributor

sknepper commented Nov 8, 2024

/intelci: run

Copy link
Contributor

@anantsrivastava30 anantsrivastava30 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on behalf of DFT

@toxicscum
Copy link

/intelci: run

@sknepper
Copy link
Contributor

Thanks for all your work here - there are so many changes, so I know you've spent a lot of effort on this!
It looks like there are some problems building on Windows. For example, I'm seeing an error for the BLAS domain like:

[1/155] C:\cache\stash\oneapi-compiler\20240719_rls\win\package\compiler\latest\bin\icx.exe -fsycl /nologo -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src\include -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\build\bin -IC:\cache\stash\onemkl\regular\2024.2.1\20240722\build__release_win\include /EHsc /DWIN32 /D_WINDOWS /W3 /GR /EHsc -Wno-unused-function -w /O2 /Ob2 /DNDEBUG -iquote C:/temp/ec/mkltest/auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d/sources/include -DMKL_ILP64 -fsycl -QMD -QMT bin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj -QMF bin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj.d /Fobin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj -c C:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src\blas\backends\mklcpu\mklcpu_level2.cpp
FAILED: bin/blas/backends/mklcpu/CMakeFiles/onemath_blas_mklcpu_obj.dir/mklcpu_level2.cpp.obj
C:\cache\stash\oneapi-compiler\20240719_rls\win\package\compiler\latest\bin\icx.exe -fsycl /nologo -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src\include -IC:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\build\bin -IC:\cache\stash\onemkl\regular\2024.2.1\20240722\build__release_win\include /EHsc /DWIN32 /D_WINDOWS /W3 /GR /EHsc -Wno-unused-function -w /O2 /Ob2 /DNDEBUG -iquote C:/temp/ec/mkltest/auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d/sources/include -DMKL_ILP64 -fsycl -QMD -QMT bin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj -QMF bin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj.d /Fobin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj -c C:\temp\ec\mkltest\auto-efa034fe-1902-f165-9fcf-a4bf010d0e2d\sources\src\blas\backends\mklcpu\mklcpu_level2.cpp
icx: error: cannot specify '-Fobin\blas\backends\mklcpu\CMakeFiles\onemath_blas_mklcpu_obj.dir\mklcpu_level2.cpp.obj' when compiling multiple source files

And somewhat similar for LAPACK (and RNG); e.g.,
icx: error: cannot specify '-Fobin\lapack\backends\mklcpu\CMakeFiles\onemath_lapack_mklcpu_obj.dir\mkl_lapack.cpp.obj' when compiling multiple source files

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 13, 2024

Thanks @sknepper I have been able to identify that the issue is due to using -iquote flag here which is not supported on Windows.
This is needed because we now need to include the Intel oneMKL mkl.hpp file to have a declaration of the oneapi::mkl symbols. Including this file conflicts with the mkl.hpp in this project which does not provide the right symbols anymore.
I am looking for another solution other than using -iquote but I am not confident I can find a nice solution. This is a difficult problem to solve and the main reason why I think different projects should not provide the same header files.

In terms of review it is still possible to review most of the changes. I expect the changes needed will only affect CMake files.

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 14, 2024

@toxicscum @sknepper I have found a decent solution which should fix the compilation issue on Windows in a2de0e5.
I have described the issue and the solution I used in this comment: a2de0e5#diff-148715d6ea0c0ea0a346af3f6bd610d010d490eca35ac6a9b408748f7ca9e3f4R51

@sknepper
Copy link
Contributor

/intelci: run

Copy link
Contributor

@sknepper sknepper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @Rbiessy - what an impressive set of changes!
LAPACK looks good - thanks for removing the duplicated headers and documentation - that will streamline things in the future.
The USE_MKLREF in the tests was just a convenience feature to avoid a dependency on Netlib; in the future, if we create standalone testing, the same goal will have been achieved. So that's fine to remove this undocumented macro.

The latest round of precommit testing is still running, but it does indeed look like the Windows build issues were resolved - much thanks!

Copy link
Contributor

@mkrainiuk mkrainiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me, thank you! I have two minor comments.

README.md Outdated Show resolved Hide resolved

You can also join the mailing lists for the [UXL Foundation](https://lists.uxlfoundation.org/g/main/subgroups) to be informed of when meetings are happening and receive the latest information and discussions.

---

## Contributing

You can contribute to this project and also contribute to [the specification for this project](https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onemkl/source/). Please read the [CONTRIBUTING](CONTRIBUTING.md) page for more information. You can also contact oneMKL developers and maintainers via [UXL Foundation Slack](https://slack-invite.uxlfoundation.org/) using [#onemkl](https://uxlfoundation.slack.com/archives/onemkl) channel.
You can contribute to this project and also contribute to [the specification for this project](https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onemath/source/). Please read the [CONTRIBUTING](CONTRIBUTING.md) page for more information. You can also contact oneMath developers and maintainers via [UXL Foundation Slack](https://slack-invite.uxlfoundation.org/) using [#onemath](https://uxlfoundation.slack.com/archives/onemath) channel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know who can update the slack channel name? Does it make sense to introduce new channel and close the old one once these changes will be merged?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know who can update the slack channel name? Does it make sense to introduce new channel and close the old one once these changes will be merged?

I can rename the channel or create a new one. It might make sense to retain the existing channel though I am open to whatever the consensus is. Just drop me a message when you are ready.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rodburns! Renaming'd be perfect option if possible.

Co-authored-by: Maria Kraynyuk <[email protected]>
@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 26, 2024

@spencerpatty gentle ping since you have approved the PR for the specification renaming but not the implementation.
Also pinging @oneapi-src/onemkl-rng-write if you are able to review the 2 renaming PRs before November 29 that'd be great.
Thanks

return oneapi::mkl::sparse::trmv(queue, A_view.uplo_view, opA, A_view.diag_view,
host_alpha, backend_handle, x_usm, host_beta, y_usm,
dependencies);
RETHROW_ONEMKL_EXCEPTIONS_RET(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice macro for rethrowing exceptions and returning sycl events :) handy that!

@@ -45,16 +43,16 @@ sycl::event release_spsv_descr(sycl::queue& queue, spsv_descr_t spsv_descr,
return detail::submit_release(queue, spsv_descr, dependencies);
}

void check_valid_spsv(const std::string& function_name, oneapi::mkl::transpose opA,
matrix_view A_view, matrix_handle_t A_handle, dense_vector_handle_t x_handle,
void check_valid_spsv(const std::string& function_name, transpose opA, matrix_view A_view,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much simpler now to distinguish between oneMKL product types and the oneMATH types :) good changes for this interfaces project

Copy link

@spencerpatty spencerpatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed all Sparse BLAS related files and I approve of these changes. I love that it is much simpler to distinguish whether we are using oneMATH variables or the backend variables in the files now. This is a good thing!

@spencerpatty
Copy link

@oneapi-src/onemkl-sparse-write I have updated the PR to fix the conflicts with the cuSPARSE PR. It would be useful if you can review the sparse related changes in this PR before the rocSPARSE PR #544! Log for Intel backends: intel_mklcpu_mklgpu.txt Log for Nvidia backends: log_a100.txt

@Rbiessy I have reviewed the cuSPARSE changes and the general changes for Sparse BLAS and approved them, but in the intel mklcpu/gpu logs it seems like all the SPARSE BLAS tests are skipping. Is that intended? cuSPARSE tests appear to run the first 4 or 5 then skip the rest as well ... is that intended ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants