Fix to normalization. Global normalization was being applied, when ele… #324

Unobtainiumrock
Collaborator

What does this PR do?

This PR fixes how embedding arrays are normalized in the AdalFlow library. The original implementation applied global normalization through the normalize_np_array function, so a 2D batch of embeddings was not normalized per vector. The new function, normalize_embeddings, addresses this by:

  1. Ensuring embeddings have an L2 norm of 1 for both single embeddings (1D arrays) and batches of embeddings (2D arrays).
  2. Handling edge cases like zero vectors gracefully to avoid division by zero errors.
  3. Improving clarity and accuracy in the normalization logic.
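
The three points above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the name normalize_embeddings comes from the description, while the signature, the choice to leave zero vectors unchanged, and the ValueError for other array ranks are assumptions.

```python
import numpy as np


def normalize_embeddings(embeddings: np.ndarray) -> np.ndarray:
    """Scale each embedding to unit L2 norm (hypothetical sketch).

    Accepts a single embedding (1D) or a batch of embeddings (2D, one per row).
    Zero vectors are returned unchanged instead of being divided by zero.
    """
    arr = np.asarray(embeddings, dtype=np.float64)

    if arr.ndim == 1:
        norm = np.linalg.norm(arr)
        # Assumption: a zero vector stays a zero vector.
        return arr if norm == 0 else arr / norm

    if arr.ndim == 2:
        # One norm per row; keepdims=True keeps shape (n, 1) for broadcasting.
        norms = np.linalg.norm(arr, axis=1, keepdims=True)
        # Replace zero norms with 1 so zero rows divide to zero, not NaN.
        safe_norms = np.where(norms == 0, 1.0, norms)
        return arr / safe_norms

    raise ValueError(f"expected a 1D or 2D array, got {arr.ndim}D")


single = normalize_embeddings(np.array([3.0, 4.0]))
batch = normalize_embeddings(np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]))
print(single)                          # [0.6 0.8]
print(np.linalg.norm(batch, axis=1))   # [1. 0. 1.]
```

Passing axis=1 with keepdims=True is what makes the 2D case row-wise: each row is divided by its own norm rather than by a single norm for the whole array.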

For a detailed comparison between the old and new approaches, including example scenarios and output, please refer to this gist.

Files Changed:

  • adalflow/adalflow/core/functional.py: Added normalize_embeddings to enhance reliability and flexibility in embedding normalization.
  • adalflow/adalflow/components/retriever/faiss_retriever.py: Replaced normalize_np_array with normalize_embeddings.

Fixes #323

Breaking Changes:

None.

  • Was this discussed/agreed via a GitHub issue?
    Yes

  • Did you read the contributor guideline?
    Yes

  • Did you make sure your PR does only one thing, instead of bundling different changes together?
    Yes.

  • Did you make sure to update the documentation with your changes?
    Not applicable

  • Did you write any new necessary tests?
    Not exactly. No tests were added to the repository, but the function was checked against both 1D and 2D arrays in this gist: https://gist.github.com/Unobtainiumrock/f1f6b993a038ac7641fd6096bb78250f

  • Did you verify new and existing tests pass locally with your changes?
    Yes

  • Did you list all the breaking changes introduced by this pull request?
    No breaking changes were introduced.

Additional Notes:

The improved normalization function ensures the library handles embeddings robustly, minimizing potential errors during preprocessing. This change is essential for developers relying on accurate and reliable embedding normalization. The linked gist provides a detailed breakdown of the differences between the global normalization and row-wise normalization approaches, with illustrative examples.
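
To make the global versus row-wise distinction concrete, here is a small NumPy comparison. It is not taken from the gist, and it assumes that "global" normalization means dividing the whole array by one norm (the Frobenius norm for a 2D array); the variable names are illustrative only.

```python
import numpy as np

# Two embeddings that point in the same direction but differ in length.
batch = np.array([[3.0, 4.0], [6.0, 8.0]])

# Assumed "global" behaviour: a single norm for the entire array.
global_normed = batch / np.linalg.norm(batch)

# Row-wise behaviour: each embedding is divided by its own L2 norm.
row_normed = batch / np.linalg.norm(batch, axis=1, keepdims=True)

global_row_norms = np.linalg.norm(global_normed, axis=1)
row_row_norms = np.linalg.norm(row_normed, axis=1)

print(global_row_norms)  # approximately [0.447 0.894]
print(row_row_norms)     # [1. 1.]
```

Under the global approach neither row has unit length, so an inner product between rows would no longer equal their cosine similarity; after row-wise normalization it does.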

Had fun working on this! 🙃

…ment-wise normalization (per vector normalization) should be applied instead
Member

@liyin2015 liyin2015 left a comment

tests are failing

@Unobtainiumrock Unobtainiumrock deleted the global-normalization-fix branch January 31, 2025 17:22
Successfully merging this pull request may close these issues.

Global Normalization Being Applied Where Element-Wise Normalization (per vector) Should Be Applied
2 participants