fix: Add local EmbeddingModelManager to reuse local embed model - Fixes high RAM usage #1950
Fixes
#1942 #866 #1623
Relates to
Memory optimization for character file knowledge processing
Risks
Low - This change optimizes the embedding model initialization without changing core functionality.
Background
What does this PR do?
Optimizes RAM usage in the embedding process by implementing a singleton pattern for the BGE embedding model. Previously, the model was initialized multiple times during character knowledge processing, driving RAM usage to 8–12 GB or more. With this change, the model is initialized once and reused, reducing RAM usage to under 1 GB.
The root issue lies in how the FastEmbed library initializes its model and how Eliza repeatedly reinitializes it:
ref: https://github.com/Anush008/fastembed-js
Each call to this init function creates an ONNX Runtime (ort) inference session that allocates CPU and RAM resources which are not freed by default or by the garbage collector. Eliza's code called init again and again, so RAM usage grew roughly in proportion to the number of lines in the character file's knowledge.
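To illustrate the leak, here is a minimal sketch of the pre-fix pattern. The names are hypothetical stand-ins (the real call is FlagEmbedding.init() from fastembed-js); the counter stands in for the native session memory that ort allocates and never releases:

```typescript
// Hypothetical stand-in for fastembed-js's FlagEmbedding.init(): each call
// creates a new ONNX Runtime session whose native buffers are never freed.
let sessionsAllocated = 0;

async function initEmbeddingModel(): Promise<{ id: number }> {
  sessionsAllocated += 1; // each init pins new CPU/RAM that GC cannot reclaim
  return { id: sessionsAllocated };
}

// The pre-fix pattern: re-initializing the model for every knowledge line,
// so memory use grows linearly with the size of the character file.
async function embedKnowledge(lines: string[]): Promise<void> {
  for (const _line of lines) {
    await initEmbeddingModel(); // leaked session per line
  }
}
```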
While this could be fixed in fastembed-js itself by updating onnxruntime-node and using its release() method, that approach needs more testing and planning. This PR provides a quick, reliable fix by ensuring only one model instance is ever created.
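A minimal sketch of the singleton approach, under the assumption that the manager exposes getInstance()/getModel(). The loadModel body here is a self-contained placeholder; in the actual PR it would wrap FlagEmbedding.init() from fastembed-js:

```typescript
type EmbeddingModel = { embed: (text: string) => number[] };

class EmbeddingModelManager {
  private static instance: EmbeddingModelManager | null = null;
  private model: EmbeddingModel | null = null;
  private initPromise: Promise<EmbeddingModel> | null = null;

  private constructor() {}

  static getInstance(): EmbeddingModelManager {
    if (!EmbeddingModelManager.instance) {
      EmbeddingModelManager.instance = new EmbeddingModelManager();
    }
    return EmbeddingModelManager.instance;
  }

  // Placeholder loader; the real manager would call FlagEmbedding.init() here.
  private async loadModel(): Promise<EmbeddingModel> {
    return { embed: (text) => Array.from(text).map((c) => c.charCodeAt(0) / 255) };
  }

  // Concurrent callers share one in-flight init promise, so the expensive
  // model load happens exactly once no matter how many lines are embedded.
  async getModel(): Promise<EmbeddingModel> {
    if (this.model) return this.model;
    if (!this.initPromise) {
      this.initPromise = this.loadModel().then((m) => (this.model = m));
    }
    return this.initPromise;
  }
}
```

With this shape, every call site in the knowledge-processing loop resolves to the same model instance instead of allocating a fresh inference session.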
What kind of change is this?
Improvements (optimizing existing features for better performance)
Documentation changes needed?
My changes do not require a change to the project documentation as this is an internal optimization that doesn't affect the API or user interface.
Testing
Where should a reviewer start?
EmbeddingModelManager.ts
embedding.ts
where the local embedding logic has been updated
Detailed testing steps
Test Results
Before
RAM Usage: 8–12 GB during character knowledge processing
After
RAM Usage: <1 GB during character knowledge processing
Technical Implementation Details
The PR introduces two main files:
EmbeddingModelManager.ts
- New singleton class managing model initialization
embedding.ts
- Updated to use the manager for local embeddings
Deploy Notes
No special deployment steps required. The changes are backwards compatible and will take effect automatically after deployment.
Let me know your thoughts or if you need any clarification or changes to the PR!