I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it
Feature request
Currently, a `text` field is automatically inserted into each Document's metadata when uploading (upserting) to Pinecone. Could this be made optional, toggled on/off via a parameter such as `include_text`?
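A minimal sketch of what the flag would control for a single text. It reuses the `text_key` and `include_text` names from the proposal below. `build_metadata` is a hypothetical helper, not part of LangChain. It copies the caller's metadata so the input dicts are never mutated.

```python
from typing import Optional


def build_metadata(
    text: str,
    metadata: Optional[dict] = None,
    text_key: str = "text",
    include_text: bool = False,
) -> dict:
    """Return the metadata dict to upsert for one text (hypothetical helper)."""
    # Copy so the caller's dict is not mutated in place.
    result = dict(metadata or {})
    if include_text:
        # Current behaviour: the raw text is stored alongside the vector.
        result[text_key] = text
    return result


print(build_metadata("hello", {"source": "a.txt"}, include_text=True))
# {'source': 'a.txt', 'text': 'hello'}
print(build_metadata("hello", {"source": "a.txt"}, include_text=False))
# {'source': 'a.txt'}
```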
Motivation
Pinecone suggests that high-cardinality metadata can negatively affect search/query performance. Moreover, not storing the text as metadata can save a great deal of space.
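The space argument can be made concrete with a rough estimate. The `text` field adds at least the UTF-8 size of its key and value to every vector. Because each chunk's text is usually unique, the field's cardinality also grows with the number of vectors. This is a minimal sketch: `text_field_stats` is a hypothetical helper, and it does not model Pinecone's actual storage overhead or how Pinecone indexes metadata.

```python
from typing import List, Tuple


def text_field_stats(texts: List[str], text_key: str = "text") -> Tuple[int, int]:
    """Return (approx. bytes added, distinct values) for storing texts as metadata.

    Counts only the UTF-8 size of the key and value per vector; real
    per-field overhead in Pinecone is not modelled (hypothetical helper).
    """
    key_bytes = len(text_key.encode("utf-8"))
    total_bytes = sum(key_bytes + len(t.encode("utf-8")) for t in texts)
    # Cardinality of the field = number of distinct text values.
    distinct = len(set(texts))
    return total_bytes, distinct


print(text_field_stats(["hello", "world", "hello"]))  # (27, 2)
```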
Proposal (If applicable)
I'm not sure whether the following works out of the box, or how it would affect batching or RAG.
```python
# Proposed change to the `Pinecone.from_texts` classmethod in LangChain's
# Pinecone vector store. Relies on that module's existing imports
# (`uuid`, `typing.Any/List/Optional`, `Embeddings`).
@classmethod
def from_texts(
    cls,
    texts: List[str],
    embedding: Embeddings,
    metadatas: Optional[List[dict]] = None,
    ids: Optional[List[str]] = None,
    batch_size: int = 32,
    text_key: str = "text",
    include_text: bool = False,
    index_name: Optional[str] = None,
    namespace: Optional[str] = None,
    **kwargs: Any,
) -> Pinecone:
    """[TBD]"""
    try:
        import pinecone
    except ImportError:
        raise ValueError(
            "Could not import pinecone python package. "
            "Please install it with `pip install pinecone-client`."
        )

    # Check that the provided index exists.
    indexes = pinecone.list_indexes()
    if index_name in indexes:
        index = pinecone.Index(index_name)
    elif len(indexes) == 0:
        raise ValueError(
            "No active indexes found in your Pinecone project, "
            "are you sure you're using the right API key and environment?"
        )
    else:
        raise ValueError(
            f"Index '{index_name}' not found in your Pinecone project. "
            f"Did you mean one of the following indexes: {', '.join(indexes)}"
        )

    for i in range(0, len(texts), batch_size):
        # Set end position of batch.
        i_end = min(i + batch_size, len(texts))
        # Get batch of texts and ids.
        lines_batch = texts[i:i_end]
        # Create ids if not provided.
        if ids:
            ids_batch = ids[i:i_end]
        else:
            ids_batch = [str(uuid.uuid4()) for _ in range(i, i_end)]
        # Create embeddings.
        embeds = embedding.embed_documents(lines_batch)
        # Prep metadata (copied so the caller's dicts are not mutated).
        if metadatas:
            metadata = [dict(m) for m in metadatas[i:i_end]]
        else:
            metadata = [{} for _ in range(i, i_end)]
        # Only store the raw text when requested.
        if include_text:
            for j, line in enumerate(lines_batch):
                metadata[j][text_key] = line
        to_upsert = zip(ids_batch, embeds, metadata)
        # Upsert batch to Pinecone.
        index.upsert(vectors=list(to_upsert), namespace=namespace)

    # NOTE: the vector store reads page_content from metadata[text_key] at
    # query time, so with include_text=False the returned store likely cannot
    # return the chunk text on its own; it must be fetched from elsewhere.
    return cls(index, embedding.embed_query, text_key, namespace)
```
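On the RAG question: if the text is not stored in metadata, a query only returns IDs, scores and whatever other metadata was stored. A RAG pipeline then needs a second place to fetch each chunk's text from. Below is a minimal sketch that assumes a local dict keyed by the same vector IDs used at upsert time; in practice this could be any document store. `TextLookup` is hypothetical. Query matches are modelled as plain `(id, score)` tuples rather than the Pinecone client's response objects.

```python
from typing import Dict, List, Tuple


class TextLookup:
    """Hypothetical side store mapping vector IDs to their source text."""

    def __init__(self) -> None:
        self._texts: Dict[str, str] = {}

    def add(self, ids: List[str], texts: List[str]) -> None:
        # Record texts under the same IDs that were upserted to the index.
        if len(ids) != len(texts):
            raise ValueError("ids and texts must have the same length")
        self._texts.update(zip(ids, texts))

    def resolve(self, matches: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
        # Swap each matched ID for its text, skipping IDs that were never stored.
        return [(self._texts[i], score) for i, score in matches if i in self._texts]


store = TextLookup()
store.add(["id-1", "id-2"], ["Pinecone stores vectors.", "LangChain wraps it."])
print(store.resolve([("id-2", 0.91), ("id-9", 0.40), ("id-1", 0.85)]))
# [('LangChain wraps it.', 0.91), ('Pinecone stores vectors.', 0.85)]
```

The design trade-off is that the index stays small and free of high-cardinality text, at the cost of an extra lookup per query and keeping the two stores in sync.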