Use of temporary credentials via env variables #185

Open
borgoat opened this issue Jun 8, 2023 · 0 comments
borgoat commented Jun 8, 2023

Hi!

I'm trying to use this as part of a Spark/Glue job (using the DynamoDB connector as a Glue DataSource [1]), and while developing locally I'd like to authenticate via environment variables. I am using IAM Identity Center (formerly AWS SSO), so I set the usual AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN env vars.

However, DynamoDBClient misinterprets this and forces the SDK client to be instantiated with BasicAWSCredentials, which carries only the access key and secret access key and silently drops the session token.

The method getAwsCredentialsProvider [2] appears to be the culprit in misconfiguring the DynamoDB client.
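To illustrate the failure mode, here is a minimal sketch (paraphrased, not the connector's actual code) of why selecting BasicAWSCredentials breaks temporary credentials: the class name mirrors the AWS SDK v1 type, but it is stubbed here, and the point is simply that a two-field credentials object has nowhere to carry AWS_SESSION_TOKEN.

```java
import java.util.Map;
import java.util.Optional;

public class CredentialsSketch {
    // Stub mirroring com.amazonaws.auth.BasicAWSCredentials: only two fields,
    // so there is no place to store AWS_SESSION_TOKEN.
    record BasicAWSCredentials(String accessKeyId, String secretKey) {}

    // Hypothetical, simplified version of the selection logic: if an access
    // key and secret are configured, wrap them in basic credentials and
    // never consult the session token.
    static Optional<BasicAWSCredentials> pickCredentials(Map<String, String> env) {
        String key = env.get("AWS_ACCESS_KEY_ID");
        String secret = env.get("AWS_SECRET_ACCESS_KEY");
        if (key != null && secret != null) {
            // AWS_SESSION_TOKEN is ignored here, so temporary (SSO/STS)
            // credentials are sent without their token and get rejected.
            return Optional.of(new BasicAWSCredentials(key, secret));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of(
            "AWS_ACCESS_KEY_ID", "ASIAEXAMPLE",
            "AWS_SECRET_ACCESS_KEY", "example-secret",
            "AWS_SESSION_TOKEN", "example-token");
        // The resulting object has lost the session token entirely.
        System.out.println(pickCredentials(env).orElseThrow());
    }
}
```

The real SDK distinguishes these cases with BasicSessionCredentials (three fields, including the token), which is what the environment-variable provider would produce.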

I was able to force the right provider with this configuration, but obviously, I'd rather avoid hard-coding it in my job, especially as this is likely only needed for local development...

glueContext
      .getSourceWithFormat(
        connectionType = "dynamodb",
        options = JsonOptions(
          Map(
            "dynamodb.input.tableName" -> "[redacted]",
            "dynamodb.regionid" -> "eu-west-1",
            "dynamodb.customAWSCredentialsProvider" -> "com.amazonaws.auth.EnvironmentVariableCredentialsProvider"
          )
        )
      )

Anyway, I find this behaviour surprising. I'm probably missing the larger context, but in my experience I have yet to find a case where setting up an AWS SDK client with explicit credentials is the way to go. The SDK's implicit, environment-based configuration usually works out of the box in just about any deployment or development scenario, and it behaves predictably and consistently across environments and languages.

But I reckon this has to do with some idiosyncrasies of Hadoop and/or Spark? Or am I doing something wrong? How do others handle similar scenarios?

Footnotes

  1. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-dynamodb

  2. https://github.com/awslabs/emr-dynamodb-connector/blob/40f605acc03926cca4b086db3873d488e316a547/emr-dynamodb-hadoop/src/main/java/org/apache/hadoop/dynamodb/DynamoDBClient.java#L455
