
cpu: move all the operators into a separate c++ file (except mul_mat) #1167

Merged · 11 commits merged into ggml-org:master on Apr 2, 2025

Conversation

@cmdr2 (Collaborator) commented Mar 29, 2025

This shrinks the size of ggml-cpu.c to around 3500 lines (down from 15k lines a few weeks ago). A future PR will organize ops.cpp into separate files and use templates to simplify it further.

It does not modify any lines in ggml-cpu.c; it only removes lines. Actual diff.

This PR looks big, but only does a few simple things:

  1. Moves all the lines related to SIMD mappings into simd-mappings.h
  2. Moves all the lines related to vectorized functions into vec.h (and vec.cpp for the non-inline functions); a rough layout sketch follows this list
  3. Moves all the operator functions into ops.cpp (except mul_mat, which has some C-specific behavior)
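
For illustration, a rough sketch of the header/source split used for the vectorized helpers and the extern-shared table. The file roles follow the description above, but the names and signatures here are simplified placeholders rather than code taken from the PR:

// vec.h - declarations shared between ggml-cpu.c and ops.cpp (illustrative names/signatures)
#pragma once
#include <stdint.h>

// pre-computed fp16 -> fp32 conversion table: defined once in vec.cpp and
// shared across translation units via extern, instead of being duplicated
extern float table_f32_f16[1 << 16];

// small hot helpers stay inline in the header...
static inline void vec_add_f32(const int n, float * z, const float * x, const float * y) {
    for (int i = 0; i < n; ++i) {
        z[i] = x[i] + y[i];
    }
}

// ...while larger, non-inline functions are only declared here
void vec_dot_f32(int n, float * s, const float * x, const float * y);

// vec.cpp - single definition of the shared table and the non-inline functions
#include "vec.h"

float table_f32_f16[1 << 16];

void vec_dot_f32(int n, float * s, const float * x, const float * y) {
    double sum = 0.0; // accumulate in a wider type, analogous to ggml_float
    for (int i = 0; i < n; ++i) {
        sum += (double) x[i] * (double) y[i];
    }
    *s = (float) sum; // explicit cast, matching the implicit-conversion warning fix mentioned below
}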

Detailed changes:

  • SIMD mappings - no modifications were made to the SIMD mapping lines.
  • Vectorized functions - three minor changes:
    • fix two warnings where ggml_float was being implicitly cast to float (diff).
    • share the pre-computed table using extern.
    • use GGML_UNUSED directly instead of the UNUSED macro.
  • Operator functions - minor cosmetic changes so the code compiles in a separate C++ file (actual diff); a simplified, self-contained sketch of these differences follows this list:
    • explicit casts to float * or ggml_fp16_t * when calling functions that expect those types, e.g. ggml_fp16_to_fp32_row().
    • static_cast<ggml_op_pool>(op) for converting an int to an enum.
    • use the ggml_get_type_traits_cpu() function instead of accessing type_traits_cpu directly.
    • use std::hardware_destructive_interference_size instead of hardware_destructive_interference_size.
    • use GGML_UNUSED directly instead of the UNUSED macro.
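
As a rough, self-contained illustration of the C-vs-C++ differences behind these cosmetic changes (all names below are hypothetical stand-ins, not the actual ggml functions or macros):

#include <cstdint>

enum op_pool { OP_POOL_MAX, OP_POOL_AVG };   // stand-in for ggml_op_pool

#define UNUSED_ARG(x) (void)(x)              // stand-in for GGML_UNUSED

// stand-in for a function that expects typed pointers, like ggml_fp16_to_fp32_row()
static void fp16_to_fp32_row(const uint16_t * x, float * y, int64_t n) {
    for (int64_t i = 0; i < n; ++i) {
        y[i] = (float) x[i];                 // placeholder conversion, not real fp16 decoding
    }
}

void compute(void * src, float * dst, int64_t n, int op_param, int ith) {
    // C converts void * implicitly; C++ requires an explicit cast to the expected type:
    fp16_to_fp32_row((const uint16_t *) src, dst, n);

    // C converts int -> enum implicitly; C++ requires a static_cast:
    op_pool op = static_cast<op_pool>(op_param);
    UNUSED_ARG(op);

    // unused parameters are silenced with the macro directly:
    UNUSED_ARG(ith);
}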

This change passes all the runners at ggml-ci.

Please let me know if this approach requires any changes, happy to fix!

Thanks!

cmdr2 requested review from ggerganov and slaren on March 29, 2025, 13:32
Co-authored-by: Diego Devesa <slarengh@gmail.com>
@cmdr2 (Collaborator, Author) commented Apr 2, 2025

Thanks for the review comments!

I've fixed the suggestions, except the one about ggml-cpu-impl.h (since it defines a few macros like __F16C__ based on certain conditions).

// __FMA__ and __F16C__ are not defined in MSVC, however they are implied with AVX2/AVX512
#if defined(_MSC_VER) && (defined(__AVX2__) || defined(__AVX512F__))
#ifndef __FMA__
#define __FMA__
#endif
#ifndef __F16C__
#define __F16C__
#endif
#endif
// __SSE3__ and __SSSE3__ are not defined in MSVC, but SSE3/SSSE3 are present when AVX/AVX2/AVX512 are available
#if defined(_MSC_VER) && (defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__))
#ifndef __SSE3__
#define __SSE3__
#endif
#ifndef __SSSE3__
#define __SSSE3__
#endif
#endif
#if defined(__s390x__) && defined(__VEC__)
#ifndef __VXE__
#define __VXE__
#endif
#ifndef __VXE2__
#define __VXE2__
#endif
#endif
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#include <sys/prctl.h>
#endif

Maybe we could move those lines into a different file (possibly in simd-mappings.h)?

@ggerganov (Member)

I've fixed the suggestions, except the one about ggml-cpu-impl.h (since it defines a few macros like __F16C__ based on certain conditions).

I missed those. In this case, let's leave the include as it is.

Maybe we could move those lines into a different file (possibly in simd-mappings.h)?

I think these CPU feature defines have to remain in ggml-cpu-impl.h. So I think we are good for now.

@cmdr2 (Collaborator, Author) commented Apr 2, 2025

Thanks. Should I take a deep breath and merge this in?

I've tested on the runners in ggml-ci, and parsed the include dependency order a couple of times in my head (as much as I could).

Will read through the PR one more time with a fine-tooth comb.

@ggerganov (Member)

Yes, squash-merge and then I'll sync to llama.cpp and whisper.cpp to see if there are any issues there.

@cmdr2 (Collaborator, Author) commented Apr 2, 2025

Tested the latest changes (PR comments) on ggml-ci. All tests pass - 6f4ce79

Squash-merging..

cmdr2 merged commit d920dfd into ggml-org:master on Apr 2, 2025
3 checks passed
@cmdr2 (Collaborator, Author) commented Apr 2, 2025

The CI build failure on the master branch is surprising.

I created a new branch (from master) and ran the CI again, and it ran successfully - https://github.com/ggml-org/ggml/commits/cmdr2-test-master-after-cpu-refactor/

And the CUDA tests pass on my local PC as well.

The CI build failure on the master branch is for this line: MUL_MAT(type_a=iq1_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0): [MUL_MAT] NMSE = 0.000648890 > 0.000500000 FAIL

Not sure why there was a transient error in the normalized mean-squared error (NMSE) for mul_mat.
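
For context, a minimal sketch of a normalized mean-squared-error check of the kind the backend tests report; the exact formula and thresholds used by ggml's test-backend-ops may differ:

#include <cmath>
#include <cstdio>
#include <vector>

// NMSE: squared error normalized by the energy of the reference output
static double nmse(const std::vector<double> & ref, const std::vector<double> & out) {
    double err = 0.0, den = 0.0;
    for (size_t i = 0; i < ref.size(); ++i) {
        const double d = ref[i] - out[i];
        err += d * d;
        den += ref[i] * ref[i];
    }
    return err / den;
}

int main() {
    const std::vector<double> ref = {1.0, 2.0, 3.0};   // e.g. reference backend output
    const std::vector<double> out = {1.0, 2.01, 2.98}; // e.g. backend under test
    const double threshold = 0.000500000;              // threshold from the failing test above
    const double e = nmse(ref, out);
    std::printf("NMSE = %.9f -> %s\n", e, e > threshold ? "FAIL" : "OK");
    return 0;
}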

@ggerganov (Member)

Some of the tests occasionally exceed the error threshold, so it is expected to have them failing from time to time. I think this is such a case, though let's keep an eye out and see if it happens again.
