Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vulkan: fix coopmat shader generation when cross-compiling #12272

Merged
merged 3 commits into from
Mar 28, 2025

Conversation

Icenowy
Copy link
Contributor

@Icenowy Icenowy commented Mar 8, 2025

Previously the status of coopmat{,2} support isn't passed to the vulkan-shaders-gen project building on the host, which leads to build failure because of the cross-compiling code expecting coopmat{,2} shaders that didn't get generated.

Fix this by passing the coopmat{,2} support status to vulkan-shaders subproject.

Tested by building with a glslc w/ coopmat and w/o coopmat2 both natively and cross-compiling (to RISC-V).

@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Mar 8, 2025
@jeffbolznv
Copy link
Collaborator

I wonder if this will fix #11695. In that case it seemed like there were two different glslcs on the system with different coopmat(2) support.

@Icenowy
Copy link
Contributor Author

Icenowy commented Mar 9, 2025

@jeffbolznv I am not sure, this PR can never fix anything about "Vulkan_GLSLC_EXECUTABLE-NOTFOUND". It only fixes the issue that the detected glslc isn't properly tested in the vulkan-shaders-gen project.

However, the original issue might be really fixed, because that issue looks like coopmat(2) detection issue.

@0cc4m 0cc4m requested a review from bandoti March 12, 2025 13:16
@Icenowy
Copy link
Contributor Author

Icenowy commented Mar 27, 2025

Rebased and added fallback OFF value.

I am still not sure whether to make these options.

@jeffbolznv
Copy link
Collaborator

There's a similar PR happening at ggml-org/whisper.cpp#2914.

bandoti added a commit to bandoti/llama.cpp that referenced this pull request Mar 27, 2025
@bandoti
Copy link
Collaborator

bandoti commented Mar 27, 2025

@jeffbolznv After patching #12428 with this change it appears to have resolved the cross-compile there, consistent with reporting in whisper.cpp 2914. I think if we go ahead and merge this PR in before revisiting the whisper one, it may resolve the issue on whisper side. This change is needed before #12428 can be merged.

And, as you mention, it may also resolve #11695 as well.

cc: @sandrohanea

Copy link
Collaborator

@jeffbolznv jeffbolznv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change LGTM, though I am not an expert on cmake. Let's give @0cc4m another chance to review before merging.

@0cc4m
Copy link
Collaborator

0cc4m commented Mar 27, 2025

Thanks, but I'm not much help with cmake. I can check that it still works for my use cases, but usually I just trust @bandoti with it.

@Icenowy
Copy link
Contributor Author

Icenowy commented Mar 28, 2025

@bandoti added CACHE BOOL to the code setting it to true.

I am dumb on CMake, so please check whether this is expected by you.

Previously the status of coopmat{,2} support isn't passed to the
vulkan-shaders-gen project building on the host, which leads to build
failure because of the cross-compiling code expecting coopmat{,2}
shaders that didn't get generated.

Fix this by passing the coopmat{,2} support status to vulkan-shaders
subproject.

Signed-off-by: Icenowy Zheng <[email protected]>
Copy link
Collaborator

@bandoti bandoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Icenowy Thank you for getting those changes in! I forgot to mention wrapping these checks in if(NOT DEFINED GGML_VULKAN_COOPMAT_...), but I just went ahead and made that basic change myself—since everything else looks good. Without this, the check would run every time regardless of caching or not.

@bandoti bandoti merged commit b86f600 into ggml-org:master Mar 28, 2025
48 checks passed
@bandoti
Copy link
Collaborator

bandoti commented Mar 28, 2025

Thank you for the contribution.

@sandrohanea
Copy link
Contributor

Great news that this was merged!
Thanks @Icenowy and @bandoti for the fixes!

Any timeline for ggml sync @ggerganov (so this change to be propagated in whisper.cpp as well) ?

else()
message(STATUS "GL_NV_cooperative_matrix2 supported by glslc")
add_compile_definitions(GGML_VULKAN_COOPMAT2_GLSLC_SUPPORT)
if(NOT DEFINED GGML_VULKAN_COOPMAT2_GLSLC_SUPPORT)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the goal of these if tests? I found that when I switch to an older glslc, it continues to use the cached value and won't reevaluate the support. This will also be a problem for people updating to newer SDKs - they won't get updated support unless they manually delete the cmake cache.

Copy link
Collaborator

@bandoti bandoti Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. My assumption on using the cache variable in general was not based on switching glslc versions around. Is this a typical behavior or something mainly Vulkan devs experience?

We could fix this by checking the glslc version and "invalidate the cache" if it changed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new Vulkan SDK is released every couple months, so we should expect everybody's glslc version to change periodically. This bug would prevent them from picking up new features. glslc doesn't have a clear versioning system, and I don't see what's gained by caching these values.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several months, however, is aeons with the code velocity—and in this case we're typically not expecting folks to downgrade OR upgrade their Vulkan SDK without rebuilding the entire thing anyway, right?

This bug would prevent them from picking up new features.

This is not actually the case—as this particular caching only affects coopmat checks. Other new feature detection doesn't need to be cached, but my gut is telling me it should be.

I am happy to to submit a PR to remove the caching—but I would say if the check is redundant 99% of the time, it will perhaps be good to consider keeping caching. Another possibility we can move it from INTERNAL caching to BOOL to allow the cache variable to be changed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and in this case we're typically not expecting folks to downgrade OR upgrade their Vulkan SDK without rebuilding the entire thing anyway, right?

I would wager that most people will not expect to need to delete their build folder (or cmake cache files) when they update the Vulkan SDK. It's certainly not part of any build instructions.

This is not actually the case—as this particular caching only affects coopmat checks. Other new feature detection doesn't need to be cached, but my gut is telling me it should be.

Why does any of this need to be cached? What's actually gained? Is it to save a fraction of a second during the cmake generate step?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you put it that way—I suppose we don't gain anything haha (nervous laugh).

Probably a premature optimization on my part. I will submit a PR to remove it. 😊

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created PR #12719 to address the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ggml changes relating to the ggml tensor library for machine learning Vulkan Issues specific to the Vulkan backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants