fix: Add local EmbeddingModelManager to reuse local embed model - Fixes high RAM usage #1950
Fixes
#1942 #866 #1623
Relates to
Memory optimization for character file knowledge processing
Risks
Low - This change optimizes the embedding model initialization without changing core functionality.
Background
What does this PR do?
Optimizes RAM usage in the embedding process by implementing a singleton pattern for the BGE embedding model. Previously, the model was initialized multiple times during character knowledge processing, driving RAM usage to 8–12 GB or more. With this change, the model is initialized once and reused, reducing RAM usage to under 1 GB.
The root issue lies in how the FastEmbed library initializes its model and how Eliza repeatedly reinitializes it:
ref: https://github.com/Anush008/fastembed-js
Each call to this init function creates an ONNX Runtime (ort) inference session that allocates CPU and RAM resources which are not freed by default or by the garbage collector. Eliza's code called init again and again, so RAM usage grew roughly in proportion to the number of lines in the character file's knowledge.
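To illustrate the leak, here is a minimal sketch of the pre-fix pattern. The names are hypothetical stand-ins (the real call is FlagEmbedding.init() from fastembed-js); the counter stands in for the native session memory that ort allocates and never releases:

```typescript
// Hypothetical stand-in for fastembed-js's FlagEmbedding.init(): each call
// creates a new ONNX Runtime session whose native buffers are never freed.
let sessionsAllocated = 0;

async function initEmbeddingModel(): Promise<{ id: number }> {
  sessionsAllocated += 1; // each init pins new CPU/RAM that GC cannot reclaim
  return { id: sessionsAllocated };
}

// The pre-fix pattern: re-initializing the model for every knowledge line,
// so memory use grows linearly with the size of the character file.
async function embedKnowledge(lines: string[]): Promise<void> {
  for (const _line of lines) {
    await initEmbeddingModel(); // leaked session per line
  }
}
```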
While this could be fixed in fastembed-js itself by updating onnxruntime-node and using its release() method, that approach needs more testing and planning. This PR provides a quick, reliable fix by ensuring only one model instance is ever created.
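A minimal sketch of the singleton approach, under the assumption that the manager exposes getInstance()/getModel(). The loadModel body here is a self-contained placeholder; in the actual PR it would wrap FlagEmbedding.init() from fastembed-js:

```typescript
type EmbeddingModel = { embed: (text: string) => number[] };

class EmbeddingModelManager {
  private static instance: EmbeddingModelManager | null = null;
  private model: EmbeddingModel | null = null;
  private initPromise: Promise<EmbeddingModel> | null = null;

  private constructor() {}

  static getInstance(): EmbeddingModelManager {
    if (!EmbeddingModelManager.instance) {
      EmbeddingModelManager.instance = new EmbeddingModelManager();
    }
    return EmbeddingModelManager.instance;
  }

  // Placeholder loader; the real manager would call FlagEmbedding.init() here.
  private async loadModel(): Promise<EmbeddingModel> {
    return { embed: (text) => Array.from(text).map((c) => c.charCodeAt(0) / 255) };
  }

  // Concurrent callers share one in-flight init promise, so the expensive
  // model load happens exactly once no matter how many lines are embedded.
  async getModel(): Promise<EmbeddingModel> {
    if (this.model) return this.model;
    if (!this.initPromise) {
      this.initPromise = this.loadModel().then((m) => (this.model = m));
    }
    return this.initPromise;
  }
}
```

With this shape, every call site in the knowledge-processing loop resolves to the same model instance instead of allocating a fresh inference session.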
What kind of change is this?
Improvements (optimizing existing features for better performance)
Documentation changes needed?
My changes do not require a change to the project documentation as this is an internal optimization that doesn't affect the API or user interface.
Testing
Where should a reviewer start?
EmbeddingModelManager.ts
embedding.ts
where the local embedding logic has been updated
Detailed testing steps
Test Results
Before
RAM Usage: 8–12 GB during character knowledge processing
After
RAM Usage: <1 GB during character knowledge processing
Technical Implementation Details
The PR introduces two main files:
EmbeddingModelManager.ts
- New singleton class managing model initialization
embedding.ts
- Updated to use the manager for local embeddings
Deploy Notes
No special deployment steps required. The changes are backwards compatible and will take effect automatically after deployment.
Let me know your thoughts or if you need any clarification or changes to the PR!