
fix: Added Local Embedding Manager to reuse Local embed model - Fixes High Ram Issues #1950

Merged Jan 8, 2025 (3 commits)

Conversation

@mbcse commented Jan 7, 2025

Fixes

#1942 #866 #1623

Relates to

Memory optimization for character file knowledge processing

Risks

Low - This change optimizes the embedding model initialization without changing core functionality.

Background

What does this PR do?

Optimizes RAM usage in the embedding process by implementing a singleton pattern for the BGE embedding model. Previously, the model was initialized multiple times during character knowledge processing, driving RAM usage to 8-12 GB or more. With this change, the model is initialized once and reused, keeping RAM usage under 1 GB.

The root issue lies in the FastEmbed library's initialization and Eliza's reinitialization:

static async init({
    model = EmbeddingModel.BGESmallENV15,
    executionProviders = [ExecutionProvider.CPU],
    maxLength = 512,
    cacheDir = "local_cache",
    showDownloadProgress = true,
}) {
    // ...model download and tokenizer setup elided...
    const session = await ort.InferenceSession.create(modelPath, {
      executionProviders,
      graphOptimizationLevel: "all",
    });
    return new FlagEmbedding(tokenizer, session, model);
}

ref: https://github.com/Anush008/fastembed-js

When this init function is called, the ONNX Runtime (ort) inference session allocates CPU and RAM resources that are not released by default or reclaimed by the garbage collector. Eliza's code called init again and again, so RAM usage grew roughly in proportion to the number of lines in the character file's knowledge.

elizaLogger.debug("Initializing BGE embedding model...");

const embeddingModel = await FlagEmbedding.init({
    cacheDir: cacheDir,
    model: EmbeddingModel.BGESmallENV15,
    // BGE-small-en-v1.5 specific settings
    maxLength: 512, // BGE's context window
});

elizaLogger.debug("Generating embedding for input:", {
    inputLength: input.length,
    inputPreview: input.slice(0, 100) + "...",
});

While this could be fixed upstream in fastembed-js by updating onnxruntime-node and calling the session's release() method, that approach needs more testing and planning. This PR provides a quick, reliable fix by ensuring we create only one model instance.
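The single-instance idea can be sketched as follows. This is a minimal illustration of the singleton-with-init-lock pattern, not the PR's actual `EmbeddingModelManager` code; the class and method names are illustrative. The key point is that concurrent callers share one in-flight initialization promise, so the expensive load runs at most once.

```typescript
// Sketch: singleton holder for an expensively-initialized model.
// Names are illustrative, not the actual EmbeddingModelManager API.
class SingleModelManager<T> {
    private static instance: SingleModelManager<unknown> | null = null;
    private model: T | null = null;
    private initPromise: Promise<T> | null = null;

    private constructor(private readonly loader: () => Promise<T>) {}

    static getInstance<T>(loader: () => Promise<T>): SingleModelManager<T> {
        if (!SingleModelManager.instance) {
            SingleModelManager.instance = new SingleModelManager<T>(loader);
        }
        return SingleModelManager.instance as SingleModelManager<T>;
    }

    async getModel(): Promise<T> {
        if (this.model) return this.model;
        // Initialization lock: the first caller starts loading; concurrent
        // callers await the same in-flight promise, so init runs only once.
        if (!this.initPromise) {
            this.initPromise = this.loader();
        }
        this.model = await this.initPromise;
        return this.model;
    }
}
```

Because the promise itself (not just the resolved model) is cached, even requests that arrive while the model is still loading do not trigger a second initialization.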

What kind of change is this?

Improvements (optimizing existing features for better performance)

Documentation changes needed?

My changes do not require a change to the project documentation as this is an internal optimization that doesn't affect the API or user interface.

Testing

Where should a reviewer start?

  1. Review the newly added EmbeddingModelManager.ts
  2. Check the modifications in embedding.ts where the local embedding logic has been updated
  3. Monitor RAM usage during character knowledge processing

Detailed testing steps

  1. Process a character file with substantial knowledge content
    • Verify that RAM usage remains under 1GB
    • Confirm that embeddings are still generated correctly
  2. Test concurrent embedding requests
    • Verify that race conditions are handled properly
    • Check that model initialization happens only once
  3. Test browser fallback
    • Verify that the code still falls back to remote embedding in browser environments

Test Results

Before

RAM Usage: 8-12GB during character knowledge processing

After

RAM Usage: <1GB during character knowledge processing

Technical Implementation Details

  • Implemented singleton pattern for BGE model management
  • Added proper initialization locking to handle concurrent requests
  • Maintained existing fallback behavior and error handling
  • Added cleanup capabilities for proper resource management
  • No changes to the embedding API or external interfaces

The PR introduces two main files:

  1. EmbeddingModelManager.ts - New singleton class managing model initialization
  2. Modified embedding.ts - Updated to use the manager for local embeddings
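The call-site side of the change can be sketched like this. All names below are illustrative stand-ins for the PR's actual code; `queryEmbed` follows the fastembed-js API, but the loader here is a stub so the sketch runs without the real model:

```typescript
// Hypothetical shape of the updated local-embedding path: instead of
// calling FlagEmbedding.init() per request, reuse one shared instance.
type Embedder = { queryEmbed: (input: string) => Promise<number[]> };

let sharedEmbedder: Promise<Embedder> | null = null;

async function loadEmbedder(): Promise<Embedder> {
    // Stand-in for the expensive FlagEmbedding.init(...) call.
    return {
        queryEmbed: async (input) =>
            Array.from(input, (c) => c.charCodeAt(0) / 255),
    };
}

async function getLocalEmbedding(input: string): Promise<number[]> {
    // Reuse a single (possibly still in-flight) initialization.
    sharedEmbedder ??= loadEmbedder();
    return (await sharedEmbedder).queryEmbed(input);
}
```

Every caller of `getLocalEmbedding` shares the same `sharedEmbedder` promise, which is the property that keeps memory flat regardless of how many knowledge lines are embedded.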

Deploy Notes

No special deployment steps required. The changes are backwards compatible and will take effect automatically after deployment.

Let me know your thoughts or if you need any clarification or changes to the PR!

@shakkernerd (Member) left a comment


This is nice.
Thank you for this.

@shakkernerd shakkernerd merged commit c33f462 into elizaOS:develop Jan 8, 2025
3 of 5 checks passed