Support truncating when the input size exceeds model limit #36

pengjiang80 · 2025-02-12T02:15:02Z

Some application frameworks don't handle special characters well and may not follow the configured chunk size strictly. In some cases, the input size for embedding or reranking exceeds the limit set in the model configuration and causes an error. Because this is a bug at the application level that may or may not be fixed, we can't guarantee that using a model with a bigger context size and setting a bigger ubatch size will resolve the problem in every case. Some users are willing to accept some accuracy loss in order to embed the document without error. We can add a parameter to handle this, as both Ollama and some other platforms support it. It's not a recommended setting, but users can turn it on at their own risk.
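To make the request concrete, here is a minimal sketch of the opt-in behavior described above. It is not llama-box code: the function name `fit_to_limit`, the `truncate` flag, and the `InputTooLargeError` type are hypothetical names chosen for illustration, and the input is assumed to be already tokenized into a list of token IDs.

```python
from typing import List


class InputTooLargeError(ValueError):
    """Hypothetical error raised when the input exceeds the limit and truncation is off."""


def fit_to_limit(tokens: List[int], limit: int, truncate: bool = False) -> List[int]:
    """Return tokens that fit within `limit`.

    With truncate=False (the assumed default), an oversized input is rejected,
    which mirrors the current behavior of returning an error.
    With truncate=True, tokens beyond the limit are dropped from the end,
    trading some accuracy for a successful request.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if len(tokens) <= limit:
        return list(tokens)
    if not truncate:
        raise InputTooLargeError(
            f"input has {len(tokens)} tokens, limit is {limit}"
        )
    # Keep only the first `limit` tokens; the tail of the input is lost.
    return list(tokens[:limit])
```

Keeping the flag off by default matches the point that this is not a recommended setting: users who enable it accept that the tail of an oversized chunk is silently ignored.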

Refer to gpustack/gpustack#950

pengjiang80 mentioned this issue Feb 24, 2025

[Question]: GPUStack error to embed infiniflow/ragflow#5304

Open

thxCode mentioned this issue Feb 27, 2025

use llama-box deployed embedding models， input is too large to process, please increase the physical batch size gpustack/gpustack#1357

Closed

thxCode self-assigned this Feb 27, 2025
