"TypeError in embed_documents: Nested Embedding Structure (List[List[float]]) Causes Failure in LlamaCppEmbeddings with langchain-community v0.3.12" #28813
Labels
🤖:bug
Checked other resources
Example Code
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import LlamaCppEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load a text file
loader = TextLoader("./docs/raw.txt")
docs = loader.load()

# Split text into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=10, chunk_overlap=0)
texts = text_splitter.split_documents(docs)

# Prepare input for embedding
embeddings = LlamaCppEmbeddings(model_path="./models/llama-1B-q8_0.gguf")
_texts = [doc.page_content for doc in texts]

# Attempt to embed documents (this call raises the TypeError below)
embedded_texts = embeddings.embed_documents(_texts)
print(len(embedded_texts), len(embedded_texts[0]))
Error Message and Stack Trace (if applicable)
[list(map(float, e["embedding"])) for e in embeddings["data"]]
PyDev console: starting.
Traceback (most recent call last):
File "D:\Pycharm\PyCharm Community Edition 2023.2.3\plugins\python-ce\helpers\pydev_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "", line 1, in
File "", line 1, in
TypeError: float() argument must be a string or a real number, not 'list'
Description
When using LlamaCppEmbeddings from langchain-community with llama-cpp-python, the embed_documents method fails with a TypeError for certain input texts. The embedding that llama_cpp returns for each text is nested (List[List[float]]) instead of flat (List[float]), but embed_documents assumes the flat shape: it converts each item with list(map(float, e["embedding"])), so float() receives an inner list instead of a number.
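To illustrate the failure without loading a model, the sketch below applies the same conversion as the failing line to two hand-made responses. The dictionaries are hypothetical stand-ins shaped like the `{"data": [{"embedding": ...}]}` structure that line indexes into, and `convert_like_embed_documents` is a hypothetical helper name, not a langchain function.

```python
from typing import Any, Dict, List


def convert_like_embed_documents(response: Dict[str, Any]) -> List[List[float]]:
    """Apply the same per-item conversion as the failing line in the traceback."""
    return [list(map(float, e["embedding"])) for e in response["data"]]


# Hypothetical response with a flat vector (List[float]) for one input text.
flat_response = {"data": [{"embedding": [0.1, 0.2, 0.3]}]}

# Hypothetical response with a nested structure (List[List[float]]):
# two inner 3-dim vectors for a single input text.
nested_response = {"data": [{"embedding": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]}]}

print(convert_like_embed_documents(flat_response))  # [[0.1, 0.2, 0.3]]

try:
    # float() is called on an inner list here, so this raises TypeError.
    convert_like_embed_documents(nested_response)
except TypeError as err:
    print(f"TypeError: {err}")
```

The flat case converts cleanly; the nested case reproduces the same TypeError as in the traceback above.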
Environment
Python version: 3.10
langchain-community: v0.3.12
llama-cpp-python: v0.3.5
Model: llama-1B-q8_0.gguf (a Q8_0 quantization of llama-3.2-1B)
Expected Behavior
The embed_documents method should process the embeddings correctly, regardless of whether the returned embedding structure is flat (List[float]) or nested (List[List[float]]).
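One possible way to meet this expectation is to check the shape of each item and reduce a nested embedding to a single vector, so the caller still gets exactly one List[float] per input text. The sketch below assumes that mean pooling is an acceptable reduction and that the inner lists all have the same length; both are assumptions, and `mean_pool` and `to_text_embeddings` are hypothetical helpers, not part of langchain-community or llama-cpp-python.

```python
from typing import Any, Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average equal-length vectors element-wise into one vector."""
    if not vectors:
        raise ValueError("cannot pool an empty list of vectors")
    dim = len(vectors[0])
    sums = [0.0] * dim
    for vec in vectors:
        if len(vec) != dim:
            raise ValueError("inner vectors have inconsistent dimensions")
        for i, value in enumerate(vec):
            sums[i] += float(value)
    return [s / len(vectors) for s in sums]


def to_text_embeddings(response: Dict[str, Any]) -> List[Vector]:
    """Return one flat vector per item in response["data"], whatever its shape."""
    result: List[Vector] = []
    for item in response["data"]:
        emb = item["embedding"]
        if emb and isinstance(emb[0], (list, tuple)):
            # Nested: List[List[float]] -> reduce to a single vector.
            result.append(mean_pool(emb))
        else:
            # Flat: List[float] -> convert elements to float as before.
            result.append([float(x) for x in emb])
    return result


# Usage with a hypothetical response mixing both shapes.
response = {
    "data": [
        {"embedding": [1.0, 2.0]},
        {"embedding": [[1.0, 2.0], [3.0, 4.0]]},
    ]
}
print(to_text_embeddings(response))  # [[1.0, 2.0], [2.0, 3.0]]
```

Keeping one vector per input text means `len(embedded_texts)` in the example code would match `len(_texts)`. Mean pooling is only one choice; another reduction could be swapped in where `mean_pool` is called.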
Actual Behavior
The embed_documents method assumes the returned embeddings are flat (List[float]), but when the structure is nested (List[List[float]]), it fails with the following error:
TypeError: float() argument must be a string or a real number, not 'list'
System Info
(gpt310free) PS D:\Temp\Gpt> python -m langchain_core.sys_info
System Information
Package Information
Optional packages not installed
Other Dependencies