DefaultAzureCredential() not being used with anon=False keyword passed in storage_options #411

Open
Seb-Unit8 opened this issue Apr 13, 2023 · 2 comments

@Seb-Unit8

Versions

  • adlfs: 2023.1.0
  • fsspec: 2023.4.0

Summary:

Hello,

I am not seeing the behaviour introduced in #262 and documented in the project's README (Details > Setting credentials, item 2):
"2. Auto credential solving using Azure's DefaultAzureCredential() library: storage_options={'account_name': ACCOUNT_NAME, 'anon': False} will use DefaultAzureCredential to get valid credentials to the container ACCOUNT_NAME. DefaultAzureCredential attempts to authenticate via the mechanisms and order visualized here."

The following code snippet prints the expected list of containers:

from azure.storage.blob import BlobServiceClient
from azure.identity import DefaultAzureCredential
name : str = "<redacted>"
print([a for a in BlobServiceClient(f"https://{name}.blob.core.windows.net/", DefaultAzureCredential()).list_containers()])

This confirms that the managed identity for this VM has the right permissions (Storage Blob Data Contributor).

However, the following code

import fsspec
container : str = "<redacted>"
subpath : str = "<redacted>"
fallback_options = {"account_name":f"{name}", "anon": False}
fsspec.filesystem("az", storage_options=fallback_options)
fsspec.get_mapper(f"az://{container}/{subpath}", storage_options=fallback_options)

run in the same environment throws the error:
ValueError: unable to connect to account for Must provide either a connection_string or account_name with credentials!!

Is anyone able to identify why the DefaultAzureCredential fallback is not being triggered even though I have specified the anon=False keyword?

Thanks for any help.
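
One possible cause, offered as an assumption rather than a confirmed diagnosis: fsspec.filesystem and fsspec.get_mapper forward extra keyword arguments to the filesystem constructor, so a dictionary passed as storage_options=... may arrive as a single keyword named storage_options instead of being unpacked into account_name and anon. The sketch below uses a hypothetical stand-in constructor (make_fs, which is not part of fsspec or adlfs) to show the difference between the two call styles.

```python
def make_fs(account_name=None, anon=None, **kwargs):
    """Hypothetical stand-in for a filesystem constructor that accepts extra keywords."""
    return {"account_name": account_name, "anon": anon, "extra": kwargs}


options = {"account_name": "myaccount", "anon": False}

# Passed as one keyword: the whole dict lands in **kwargs,
# so account_name and anon keep their defaults (None).
nested = make_fs(storage_options=options)

# Unpacked with **: each key reaches its own parameter.
unpacked = make_fs(**options)

print(nested["account_name"])    # None
print(unpacked["account_name"])  # myaccount
```

Under that assumption, fsspec.filesystem("az", **fallback_options) would be the form to try. This has not been verified against the versions listed above.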

@charmoniumQ

Here is a workaround that works for me:

import adlfs
import azure.identity.aio

# Pass the async credential explicitly rather than relying on anon=False.
abfs = adlfs.AzureBlobFileSystem(account_name=account_name, credential=azure.identity.aio.DefaultAzureCredential())
abfs.ls(container_name + "/" + subpath)

@mikwieczorek

I encountered the same problem when running code that uses adlfs on a Compute Instance (CI) in AzureML with a user-managed identity.
The identity has the correct permissions, which I can confirm by running:

az login --identity --username xxx
az storage blob list --account-name SANAME --container-name MYCONTAINER --output table

However, it seems that automatic credential resolution picks the system-assigned identity instead of the user-managed identity assigned to the CI. According to the DefaultAzureCredential resolution order, the managed identity should be resolved correctly, but it is not.

It seems that a CI always has a system-assigned identity (?), which may take precedence over the user-managed identity. Digging into the Azure Identity Python SDK, it looks like setting a single environment variable should work, and indeed it does:

import os
import dask.dataframe as dd

# Select the user-managed identity by its client ID.
os.environ['AZURE_CLIENT_ID'] = 'xxx'
storage_options = {'account_name': SANAME, 'anon': False}
ddf = dd.read_parquet('az://MYCONTAINER/*.csv', storage_options=storage_options)

It would be nice if adlfs accepted a client_id on its own in storage_options, e.g. storage_options = {'account_name': SANAME, 'client_id': 'xxx'}, and used it to fetch credentials. Currently this combination results in the error: ValueError: secret should be an Azure Active Directory application's client secret
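
Until such an option exists, the environment-variable approach above can be scoped so that AZURE_CLIENT_ID does not leak into the rest of the process. The following is a minimal sketch using only the standard library. The helper name azure_client_id is hypothetical, and the sketch assumes the credential reads the variable when the filesystem is created, so the filesystem would need to be constructed inside the with block.

```python
import os
from contextlib import contextmanager


@contextmanager
def azure_client_id(client_id):
    """Temporarily set AZURE_CLIENT_ID, restoring the previous value on exit."""
    previous = os.environ.get("AZURE_CLIENT_ID")
    os.environ["AZURE_CLIENT_ID"] = client_id
    try:
        yield
    finally:
        if previous is None:
            # The variable was unset before; remove it again.
            os.environ.pop("AZURE_CLIENT_ID", None)
        else:
            os.environ["AZURE_CLIENT_ID"] = previous


# Usage sketch (requires fsspec and adlfs; SANAME and 'xxx' are placeholders as above):
# with azure_client_id('xxx'):
#     fs = fsspec.filesystem('az', account_name=SANAME, anon=False)
```

With lazily evaluated readers such as dask, the filesystem may be created later or on another worker, in which case setting the variable globally, as in the snippet above, may still be necessary.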
