Regression: e0dbec0 (aka #12181) breaks pooled embeddings: mean #12517

Closed
s-u opened this issue Mar 22, 2025 · 2 comments · Fixed by #12545

Comments


s-u commented Mar 22, 2025

Name and Version

Affects all llama.cpp builds since e0dbec0, tested up to

version: 4941 (ba932df)
built with cc (Ubuntu 13.3.0-6ubuntu2-24.04) 13.3.0 for x86_64-linux-gnu

The bug is not present in

version: 4879 (f08f4b3)
built with cc (Ubuntu 13.3.0-6ubuntu2-24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

libllama (core library)

Command line

# Can be replicated with any model, here using Llama-3.3
# (-b/-c to reduce memory usage; not relevant to the bug - the model's ctx size works too)
llama-embedding -m Llama-3.3-70B-Instruct-Q6_K-00001-of-00002.gguf -ngl 90 -b 2048 -c 2048 -p 'hello, world' --pooling mean

Problem description & steps to reproduce

Fails in llm_graph_context::build_pooling with:
llama.cpp/ggml/src/ggml.c:2738: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed

Reproduce with any model using llama-embedding --pooling mean, for example:

llama-embedding -m Llama-3.3-70B-Instruct-Q6_K-00001-of-00002.gguf \
   -ngl 90 -b 2048 -c 2048 -p 'hello, world' --pooling mean

The error is due to a shape mismatch between the inp and inp_mean tensors at llama-graph.cpp:1626.
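
For context, mean pooling reduces the per-token embeddings to one vector per sequence by multiplying inp (n_embd x n_tokens) with an averaging matrix inp_mean, so the two operands must share the token dimension; that shared dimension is exactly what GGML_ASSERT(ggml_can_mul_mat(a, b)) checks. A minimal standalone C sketch of that contract (toy sizes and names, not llama.cpp's actual code):

#include <stdio.h>

#define N_EMBD   4   /* toy size; the failing run has n_embd = 8192   */
#define N_TOKENS 3   /* toy size; the failing run has n_tokens = 2048 */
#define N_SEQ    1

int main(void) {
    /* inp: [n_embd x n_tokens] per-token embeddings */
    float inp[N_EMBD][N_TOKENS] = { {1,2,3}, {4,5,6}, {7,8,9}, {10,11,12} };

    /* inp_mean: [n_tokens x n_seq], each column holds 1/seq_len weights;
       the inner N_TOKENS dimension must match inp's - the moral
       equivalent of the ggml_can_mul_mat condition */
    float inp_mean[N_TOKENS][N_SEQ];
    for (int t = 0; t < N_TOKENS; t++)
        inp_mean[t][0] = 1.0f / N_TOKENS;

    /* pooled = inp * inp_mean: [n_embd x n_seq], one mean vector per sequence */
    float pooled[N_EMBD][N_SEQ] = {{0}};
    for (int e = 0; e < N_EMBD; e++)
        for (int s = 0; s < N_SEQ; s++)
            for (int t = 0; t < N_TOKENS; t++)
                pooled[e][s] += inp[e][t] * inp_mean[t][s];

    for (int e = 0; e < N_EMBD; e++)
        printf("pooled[%d] = %g\n", e, pooled[e][0]);  /* prints 2, 5, 8, 11 */
    return 0;
}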

Output of a run with additional debug prints showing the number of elements (nel) and rows (nrow) of inp and inp_mean:

llama_context: KV self size  =  640.00 MiB, K (f16):  320.00 MiB, V (f16):  320.00 MiB
llama_context: pipeline parallelism enabled (n_copies=4)
inp nel = 16777216, nrow = 2048
imp_mean nel = 4194304, nrow = 2048
inp nel = 16777216, nrow = 2048
imp_mean nel = 1, nrow = 1
llama.cpp/ggml/src/ggml.c:2738: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed

The same run with llama.cpp 4879 (f08f4b3), i.e., before e0dbec0 (#12181):

llama_init_from_model: KV self size  =  640.00 MiB, K (f16):  320.00 MiB, V (f16):  320.00 MiB
llama_init_from_model:  CUDA_Host  output buffer size =     0.00 MiB
llama_init_from_model: pipeline parallelism enabled (n_copies=4)
inp nel = 16777216, nrow = 2048
imp_mean nel = 4194304, nrow = 2048
inp nel = 8192, nrow = 1
imp_mean nel = 1, nrow = 1
inp nel = 16777216, nrow = 2048
imp_mean nel = 4194304, nrow = 2048
llama_init_from_model:      CUDA0 compute buffer size =  1600.03 MiB
llama_init_from_model:      CUDA1 compute buffer size =  1664.06 MiB
llama_init_from_model:  CUDA_Host compute buffer size =   192.09 MiB
llama_init_from_model: graph nodes  = 2569
llama_init_from_model: graph splits = 3
common_init_from_params: setting dry_penalty_last_n to ctx_size = 2048
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
inp nel = 16384, nrow = 2
imp_mean nel = 4, nrow = 2
[...]
batch_decode: n_tokens = 3, n_seq = 1
inp nel = 24576, nrow = 3
imp_mean nel = 9, nrow = 3
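
Reading the shapes out of the two runs (my interpretation of the nel/nrow numbers): with n_embd = 8192 and a 2048-token batch,

inp      nel = 16777216 = 8192 x 2048   (n_embd   x n_tokens)
inp_mean nel =  4194304 = 2048 x 2048   (n_tokens x n_tokens)

which is a valid matmul pair. In the good build, when the graph is rebuilt for a smaller batch, both tensors shrink together (e.g. nel = 16384 / nel = 4 for n_tokens = 2). In the broken build, the second graph build keeps inp at the full 8192 x 2048 size while inp_mean collapses to a single element (nel = 1, nrow = 1), so ggml_can_mul_mat rejects the pair and the assert fires.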

First Bad Commit

e0dbec0

@ggerganov
Member

Check if #12545 fixes the issue.


s-u commented Mar 24, 2025

I can confirm that this fixes the issue. Thanks, the fast response is much appreciated!
