AzureSearch Oauth with ManagedIdentity using DefaultCredentials fallback results in a 403 #26595

Open
5 tasks done
gavinbarron opened this issue Sep 17, 2024 · 2 comments
Labels
Ɑ: vector store Related to vector store module

Comments

@gavinbarron

gavinbarron commented Sep 17, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

When running in an AppService configured with a user-assigned managed identity that has a number of permissions assigned, I am unable to use the AzureSearch class.

As noted in #26216, explicitly passing an access token fails.

However, the workaround of supplying None also does not work when trying to use ManagedIdentity rather than a less secure option like Service Principal based auth.

from langchain_community.vectorstores.azuresearch import AzureSearch

# Set up your connection params:
# Azure AI Search URL of the service to connect to
# (default URL is https://RESOURCE_NAME.search.windows.net)
SEARCH_SERVICE_ENDPOINT = ".."
embeddings = ...  # replace with an instance of langchain.embeddings.base.Embeddings

# Pass None for both access_token and key to force the fallback
# (DefaultAzureCredential) behavior in the AzureSearch object ==> fails with a 403
db = AzureSearch(
    azure_search_endpoint=SEARCH_SERVICE_ENDPOINT,
    index_name="indexname",
    embedding_function=embeddings,
    azure_ad_access_token=None,
    azure_search_key=None,
)

Error Message and Stack Trace (if applicable)

Here are the logs from our AppService. I've redacted a few things, but the important parts are still here.

2024-09-17T20:33:04.8837325Z 2024-09-17 20:33:04 - Incomplete environment configuration for EnvironmentCredential. These variables are set: AZURE_CLIENT_ID
2024-09-17T20:33:04.8838328Z 2024-09-17 20:33:04 - ManagedIdentityCredential will use App Service managed identity
2024-09-17T20:33:05.0344639Z 2024-09-17 20:33:05 - Request URL: 'http://<host>/msi/token?api-version=REDACTED&resource=REDACTED&client_id=REDACTED'
2024-09-17T20:33:05.0345552Z Request method: 'GET'
2024-09-17T20:33:05.0345607Z Request headers:
2024-09-17T20:33:05.0345647Z     'X-IDENTITY-HEADER': 'REDACTED'
2024-09-17T20:33:05.0345692Z     'User-Agent': 'azsdk-python-identity/1.17.1 Python/3.12.5 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.36)'
2024-09-17T20:33:05.0345733Z No body was attached to the request
2024-09-17T20:33:05.2362589Z 2024-09-17 20:33:05 - Response status: 200
2024-09-17T20:33:05.2473564Z Response headers:
2024-09-17T20:33:05.2474192Z     'Content-Type': 'application/json; charset=utf-8'
2024-09-17T20:33:05.2474331Z     'Date': 'Tue, 17 Sep 2024 20:33:05 GMT'
2024-09-17T20:33:05.2474374Z     'Server': 'Kestrel'
2024-09-17T20:33:05.2474414Z     'Transfer-Encoding': 'chunked'
2024-09-17T20:33:05.2474456Z     'X-CORRELATION-ID': 'REDACTED'
2024-09-17T20:33:05.2655045Z 2024-09-17 20:33:05 - DefaultAzureCredential acquired a token from ManagedIdentityCredential
2024-09-17T20:33:05.2655454Z 2024-09-17 20:33:05 - Request URL: 'https://<search-service>.search.windows.net/indexes('index-name')?api-version=REDACTED'
2024-09-17T20:33:05.2662727Z Request method: 'GET'
2024-09-17T20:33:05.2662853Z Request headers:
2024-09-17T20:33:05.2662896Z     'Accept': 'application/json;odata.metadata=minimal'
2024-09-17T20:33:05.2662938Z     'x-ms-client-request-id': '0ab7a9d6-7534-11ef-acd8-da8272092cdd'
2024-09-17T20:33:05.2662983Z     'User-Agent': 'langchain azsdk-python-search-documents/11.5.1 Python/3.12.5 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.36)'
2024-09-17T20:33:05.2663021Z     'Authorization': 'REDACTED'
2024-09-17T20:33:05.2663060Z No body was attached to the request
2024-09-17T20:33:05.7624539Z 2024-09-17 20:33:05 - Response status: 403
2024-09-17T20:33:05.7635306Z Response headers:
2024-09-17T20:33:05.7635393Z     'Content-Length': '55'
2024-09-17T20:33:05.7635437Z     'Content-Type': 'application/json; charset=utf-8'
2024-09-17T20:33:05.7635476Z     'Content-Language': 'REDACTED'
2024-09-17T20:33:05.7635514Z     'Server': 'Microsoft-IIS/10.0'
2024-09-17T20:33:05.7635553Z     'Strict-Transport-Security': 'REDACTED'
2024-09-17T20:33:05.7635591Z     'Preference-Applied': 'REDACTED'
2024-09-17T20:33:05.7635632Z     'request-id': '0ab7a9d6-7534-11ef-acd8-da8272092cdd'
2024-09-17T20:33:05.7643451Z     'elapsed-time': 'REDACTED'
2024-09-17T20:33:05.7643616Z     'Date': 'Tue, 17 Sep 2024 20:33:05 GMT'
2024-09-17T20:33:05.8131161Z 2024-09-17 20:33:05 - () Authorization failed.
2024-09-17T20:33:05.8139978Z Code:
2024-09-17T20:33:05.8140130Z Message: Authorization failed.
2024-09-17T20:33:05.8140181Z Traceback (most recent call last):
2024-09-17T20:33:05.8140235Z   File "/usr/local/lib/python3.12/site-packages/chainlit/utils.py", line 44, in wrapper
2024-09-17T20:33:05.8140282Z     return await user_function(**params_values)
2024-09-17T20:33:05.8530629Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-17T20:33:05.8533670Z   File "/usr/local/lib/python3.12/site-packages/chainlit/__init__.py", line 164, in with_parent_id
2024-09-17T20:33:05.8533730Z     await func(message)
2024-09-17T20:33:05.8533771Z   File "/app/chainlit_app.py", line 63, in on_message
2024-09-17T20:33:05.8533811Z     vector_store = get_vector_store(default_credential)
2024-09-17T20:33:05.8533850Z                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-17T20:33:05.8533896Z   File "/app/retrievers.py", line 30, in get_vector_store
2024-09-17T20:33:05.8533936Z     vector_store: AzureSearch = AzureSearch(
2024-09-17T20:33:05.8627932Z                                 ^^^^^^^^^^^^
2024-09-17T20:33:05.8628046Z   File "/usr/local/lib/python3.12/site-packages/langchain_community/vectorstores/azuresearch.py", line 335, in __init__
2024-09-17T20:33:05.8628086Z     self.client = _get_search_client(
2024-09-17T20:33:05.8628124Z                   ^^^^^^^^^^^^^^^^^^^
2024-09-17T20:33:05.8628168Z   File "/usr/local/lib/python3.12/site-packages/langchain_community/vectorstores/azuresearch.py", line 145, in _get_search_client
2024-09-17T20:33:05.8628208Z     index_client.get_index(name=index_name)
2024-09-17T20:33:05.8628251Z   File "/usr/local/lib/python3.12/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
2024-09-17T20:33:05.8690099Z     return func(*args, **kwargs)
2024-09-17T20:33:05.8690369Z            ^^^^^^^^^^^^^^^^^^^^^
2024-09-17T20:33:05.8690419Z   File "/usr/local/lib/python3.12/site-packages/azure/search/documents/indexes/_search_index_client.py", line 155, in get_index
2024-09-17T20:33:05.8690459Z     result = self._client.indexes.get(name, **kwargs)
2024-09-17T20:33:05.8690500Z              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-17T20:33:05.8690543Z   File "/usr/local/lib/python3.12/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
2024-09-17T20:33:05.8690581Z     return func(*args, **kwargs)
2024-09-17T20:33:05.8690619Z            ^^^^^^^^^^^^^^^^^^^^^
2024-09-17T20:33:05.8786775Z   File "/usr/local/lib/python3.12/site-packages/azure/search/documents/indexes/_generated/operations/_indexes_operations.py", line 849, in get
2024-09-17T20:33:05.8789138Z     raise HttpResponseError(response=response, model=error)
2024-09-17T20:33:05.8789925Z azure.core.exceptions.HttpResponseError: () Authorization failed.
2024-09-17T20:33:05.8790517Z Code:
2024-09-17T20:33:05.8790563Z Message: Authorization failed.

Description

In azuresearch.py, the function _get_search_client uses fallback logic: if the values for key and azure_ad_access_token are both None, the logic on line 141 builds the SearchIndexClient like this:

SearchIndexClient(endpoint=endpoint, credential=credential, user_agent=user_agent)

I believe that this is the cause of the failure. Digging deeper into the internal logic of the Azure library, there is logic that tries to read an audience from the kwargs. When the SearchClient is using a TokenCredential, this value is used to generate the scope for the underlying token request.
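To make the audience-to-scope step concrete, here is a minimal sketch of how I understand the derivation. It is not the Azure SDK's actual code: the helper name scope_for_audience is hypothetical, the default audience value is taken from the discussion in this thread, and the "/.default" suffix follows the usual Microsoft Entra ID scope convention.

```python
from typing import Optional

# Assumption: the default audience for Azure AI Search, as mentioned in this thread.
DEFAULT_AUDIENCE = "https://search.azure.com"


def scope_for_audience(audience: Optional[str] = None) -> str:
    """Hypothetical helper: turn an audience string into a token scope.

    Falls back to DEFAULT_AUDIENCE when no audience is supplied, strips any
    trailing slash, and appends the "/.default" suffix.
    """
    base = audience or DEFAULT_AUDIENCE
    return base.rstrip("/") + "/.default"


if __name__ == "__main__":
    # Scope requested when no audience kwarg is passed
    print(scope_for_audience())
    # Scope requested when an explicit audience with a trailing slash is passed
    print(scope_for_audience("https://search.azure.com/"))
```

If the library really behaves like this sketch, an explicit audience of "https://search.azure.com/" and the default would produce the same scope, so it is worth confirming which scope is actually requested before relying on the audience argument alone.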

I believe that the fix for this issue is to modify the constructor call to pass the audience string for Azure search like this:

SearchIndexClient(endpoint=endpoint, credential=credential, user_agent=user_agent, audience="https://search.azure.com/")
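One way to check this hypothesis is to look at the audience claim of the token the managed identity actually receives. The following is a rough diagnostic sketch using only the standard library. The helper name decode_jwt_claims is hypothetical, and it only decodes the payload: it does not verify the signature, so it is suitable for debugging and nothing else.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Hypothetical debugging helper: decode a JWT payload WITHOUT verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

With azure-identity installed, the token string could be obtained with DefaultAzureCredential().get_token("https://search.azure.com/.default").token and passed to this helper. The "aud" claim in the result shows which resource the token was issued for, which can be compared against what the search service expects.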

System Info

Running those commands on my dev machine results in failure, but I build a container image based on python:3.12.5-slim-bookworm that installs the following packages via requirements.txt:

azure-identity==1.17.1
azure-search-documents==11.5.1
beautifulsoup4==4.12.3
bs4==0.0.2
chainlit==1.1.402
chardet==5.2.0
fastapi==0.110.3
idna==3.8
langchain==0.3.0
langchain-community==0.3.0
langchain-core==0.3.0
langchain-experimental==0.3.0
langchain-openai==0.2.0
langchain-text-splitters==0.3.0
langgraph==0.2.22
langgraph-checkpoint==1.0.10
langsmith==0.1.121
lxml==5.3.0
msal==1.30.0
msal-extensions==1.2.0
pydantic==2.8.2
pydantic_core==2.20.1
PyJWT==2.9.0
python-dotenv==1.0.1
tiktoken==0.7.0
uptrace==1.26.0
urllib3==2.2.2
uvicorn==0.25.0

platform = linux
python = 3.12.5
@dosubot dosubot bot added the Ɑ: vector store Related to vector store module label Sep 17, 2024
@khushiDesai
Contributor

Hi, I am Khushi, a 4th year student at UofT CS. I’m working with my teammates @anushak18, @ashvini8, and @ssumaiyaahmed, who are also 4th year students at UofT CS. We would like to take the initiative to work on this issue and contribute to LangChain. We’re eager to help resolve the OAuth 403 error with AzureSearch and ManagedIdentity, and share our findings.

@koberghe

koberghe commented Oct 22, 2024

@gavinbarron As far as I can see, the Azure library already defaults to "https://search.azure.com". I am also setting azure_search_key to None to make use of a ManagedIdentity and it seems to work fine.

EDIT: Sorry, mistake. I am using AzureCliCredential as a credential source, not ManagedIdentity.
