fix(embeddings): number of texts in Azure OpenAIEmbeddings batch (langchain-ai#10707)

This PR addresses a limitation of Azure OpenAI embeddings, which can
handle at most 16 texts per batch. This can be worked around by setting
`chunk_size=16`. However, I'd love to have this automated, so that the user
is not forced to figure out where the issue comes from and how to solve it.
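
To illustrate what a batch limit of 16 means in practice, here is a minimal sketch of splitting a list of texts into batches of at most `chunk_size` items. The helper `iter_batches` and the constant `AZURE_MAX_BATCH` are hypothetical names used only for this example; this is not LangChain's internal batching code.

```python
from typing import Iterator, List

# Limit cited in this commit for Azure OpenAI embeddings (assumed constant name).
AZURE_MAX_BATCH = 16


def iter_batches(
    texts: List[str], chunk_size: int = AZURE_MAX_BATCH
) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `chunk_size` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(texts), chunk_size):
        yield texts[start : start + chunk_size]


# 40 texts are split into batches of 16, 16, and 8.
batches = list(iter_batches([f"doc {i}" for i in range(40)]))
batch_sizes = [len(batch) for batch in batches]
```

With a larger `chunk_size`, a single request could carry more than 16 texts, which is the situation this PR tries to prevent on Azure.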

Closes langchain-ai#4575. 

@baskaryan

---------

Co-authored-by: Harrison Chase <[email protected]>
mspronesti and hwchase17 authored Sep 20, 2023
1 parent 7395c28 commit f019835
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion libs/langchain/langchain/embeddings/openai.py
@@ -231,7 +231,7 @@ def build_extra(cls, values: Dict[str, Any]) -> Dict[str, Any]:
         values["model_kwargs"] = extra
         return values

-    @root_validator()
+    @root_validator(pre=True)
     def validate_environment(cls, values: Dict) -> Dict:
         """Validate that api key and python package exists in environment."""
         values["openai_api_key"] = get_from_dict_or_env(
@@ -257,8 +257,13 @@ def validate_environment(cls, values: Dict) -> Dict:
         )
         if values["openai_api_type"] in ("azure", "azure_ad", "azuread"):
             default_api_version = "2022-12-01"
+            # Azure OpenAI embedding models allow a maximum of 16 texts
+            # at a time in each batch
+            # See: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#embeddings
+            default_chunk_size = 16
         else:
             default_api_version = ""
+            default_chunk_size = 1000
         values["openai_api_version"] = get_from_dict_or_env(
             values,
             "openai_api_version",
@@ -271,6 +276,8 @@ def validate_environment(cls, values: Dict) -> Dict:
             "OPENAI_ORGANIZATION",
             default="",
         )
+        if "chunk_size" not in values:
+            values["chunk_size"] = default_chunk_size
         try:
             import openai

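
The default-selection logic in this diff can be sketched outside the validator as a plain function. This is a simplified illustration, not the code from `openai.py`: the function name `resolve_chunk_size` and the constant `AZURE_API_TYPES` are hypothetical, and the sketch reads `openai_api_type` directly from the dict instead of resolving it from the environment. The `"chunk_size" not in values` check only works if the validator sees the raw user input, which is presumably why the commit switches to `pre=True`: a pydantic v1 root validator without `pre=True` runs after field defaults have been filled in.

```python
from typing import Any, Dict

# API type strings treated as Azure in the diff (assumed constant name).
AZURE_API_TYPES = ("azure", "azure_ad", "azuread")


def resolve_chunk_size(values: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `values` with chunk_size filled in only if it is missing."""
    resolved = dict(values)
    is_azure = resolved.get("openai_api_type") in AZURE_API_TYPES
    # 16 for Azure, 1000 otherwise, matching the defaults in the diff.
    default_chunk_size = 16 if is_azure else 1000
    if "chunk_size" not in resolved:
        resolved["chunk_size"] = default_chunk_size
    return resolved


azure_defaults = resolve_chunk_size({"openai_api_type": "azure"})
openai_defaults = resolve_chunk_size({})
explicit = resolve_chunk_size({"openai_api_type": "azure", "chunk_size": 8})
```

An explicitly passed `chunk_size` is left untouched, so a user who sets a value above 16 on Azure would still hit the service limit.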
