"TypeError in embed_documents: Nested Embedding Structure (List[List[float]]) Causes Failure in LlamaCppEmbeddings with langchain-community v0.3.12" #28813

Open
5 tasks done
Forgotten-Forever opened this issue Dec 19, 2024 · 1 comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@Forgotten-Forever

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import LlamaCppEmbeddings

# Load a text file
loader = TextLoader("./docs/raw.txt")
docs = loader.load()

# Split text into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=10, chunk_overlap=0)
texts = text_splitter.split_documents(docs)

# Prepare input for embedding
embeddings = LlamaCppEmbeddings(model_path="./models/llama-1B-q8_0.gguf")
_texts = [doc.page_content for doc in texts]

# Attempt to embed documents
embedded_texts = embeddings.embed_documents(_texts)

print(len(embedded_texts), len(embedded_texts[0]))

Error Message and Stack Trace (if applicable)

[list(map(float, e["embedding"])) for e in embeddings["data"]]
PyDev console: starting.
Traceback (most recent call last):
File "D:\Pycharm\PyCharm Community Edition 2023.2.3\plugins\python-ce\helpers\pydev_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "", line 1, in
File "", line 1, in
TypeError: float() argument must be a string or a real number, not 'list'
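
For readers without the model at hand, the failure can be reproduced without llama-cpp-python. The sketch below is a minimal illustration, not the library's code: `nested_response` is a hypothetical payload shaped like the one described in this issue, and `flat_only` is an illustrative name wrapping the same expression that appears on the first line of the trace.

```python
# Hypothetical payload (an assumption for illustration): each item's
# "embedding" is a list of vectors instead of a single flat vector.
nested_response = {
    "data": [
        {"embedding": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]},
    ]
}

# A flat payload, the shape the expression below expects.
flat_response = {"data": [{"embedding": [1, 2, 3]}]}


def flat_only(response):
    """Apply the expression from the trace: float() on every element."""
    return [list(map(float, e["embedding"])) for e in response["data"]]


flat_result = flat_only(flat_response)  # works: elements are numbers

try:
    flat_only(nested_response)  # fails: elements are lists
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(flat_result)
print(error_message)
```

The flat payload converts cleanly, while the nested one raises the `TypeError` because `float()` receives an inner list rather than a number.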

Description

When using LlamaCppEmbeddings from langchain-community with llama-cpp-python, the embed_documents method fails with a TypeError on certain input texts. The embedding that llama_cpp returns for each item is unexpectedly nested (List[List[float]]), whereas embed_documents assumes each one is flat (List[float]) and calls float() on every element.

Environment
Python version: 3.10
langchain-community: v0.3.12
llama-cpp-python: v0.3.5
Model: llama-1B-q8_0.gguf --> llama-3.2-1B

Expected Behavior
The embed_documents method should process the embeddings correctly, regardless of whether the returned embedding structure is flat (List[float]) or nested (List[List[float]]).

Actual Behavior
The embed_documents method assumes the returned embeddings are flat (List[float]), but when the structure is nested (List[List[float]]), it fails with the following error:
TypeError: float() argument must be a string or a real number, not 'list'
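
One way to get the expected behavior is to normalize each embedding before converting it. The following is only a sketch under stated assumptions, not the fix that landed in LangChain: `to_flat_vector` and `extract_embeddings` are hypothetical helper names, a nested embedding is assumed to be a non-empty list of equal-length vectors, and averaging those vectors element-wise is assumed to be an acceptable way to obtain one vector per text.

```python
from typing import List, Sequence, Union

Vector = List[float]
RawEmbedding = Union[Sequence[float], Sequence[Sequence[float]]]


def to_flat_vector(raw: RawEmbedding) -> Vector:
    """Return one flat vector for either a flat or a nested embedding.

    Assumption: a nested embedding is a list of equal-length vectors,
    and their element-wise mean is an acceptable single representation.
    """
    if len(raw) == 0:
        return []
    if isinstance(raw[0], (list, tuple)):
        # Nested case: convert each inner vector, then average column-wise.
        rows = [list(map(float, row)) for row in raw]
        width = len(rows[0])
        if any(len(row) != width for row in rows):
            raise ValueError("nested embedding rows differ in length")
        return [sum(column) / len(rows) for column in zip(*rows)]
    # Flat case: same conversion the original expression performs.
    return list(map(float, raw))


def extract_embeddings(response: dict) -> List[Vector]:
    """Hypothetical replacement for the list comprehension in the trace."""
    return [to_flat_vector(e["embedding"]) for e in response["data"]]


if __name__ == "__main__":
    sample = {
        "data": [
            {"embedding": [1, 2]},                       # flat
            {"embedding": [[1.0, 2.0], [3.0, 4.0]]},     # nested
        ]
    }
    print(extract_embeddings(sample))
```

Averaging keeps one output vector per input text, so the result lines up with the input list. Returning every inner vector as its own row would also avoid the error, but the number of results would then no longer match the number of texts.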

System Info

(gpt310free) PS D:\Temp\Gpt> python -m langchain_core.sys_info

System Information

OS: Windows
OS Version: 10.0.19045
Python Version: 3.10.13 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:24:38) [MSC v.1916 64 bit (AMD64)]

Package Information

langchain_core: 0.3.25
langchain: 0.3.12
langchain_community: 0.3.12
langsmith: 0.1.147
langchain_text_splitters: 0.3.3

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.11.10
async-timeout: 4.0.3
dataclasses-json: 0.6.7
httpx: 0.27.0
httpx-sse: 0.4.0
jsonpatch: 1.33
langsmith-pyo3: Installed. No version info available.
numpy: 1.25.2
orjson: 3.10.12
packaging: 24.0
pydantic: 2.10.3
pydantic-settings: 2.7.0
PyYAML: 6.0.1
requests: 2.30.0
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.36
tenacity: 8.2.3
typing-extensions: 4.12.2

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Dec 19, 2024
@keenborder786
Contributor

@Forgotten-Forever fixed in the above PR

ccurme added a commit that referenced this issue Dec 23, 2024
…8827)

- **Description:** `embed_documents` and `embed_query` were raising
the error described in the issue. The `Llama` client returns the
embeddings as a nested list, which the current implementation did not
account for, so the stated error was raised.
- **Issue:** #28813

---------

Co-authored-by: Chester Curme <[email protected]>