Trino fails to correctly provision IRSA #24096

Open
mgorbatenko opened this issue Nov 11, 2024 · 1 comment

Comments

@mgorbatenko

I believe this is the same issue described in #15267, but that issue has since been closed.

We are running Trino 453 on EKS, using the official Trino Helm chart with legacy S3 support. We have also set hive.metastore.glue.use-web-identity-token-credentials-provider to true, which was suggested as a solution in the original issue.

Most of the time this setup works fine, but sometimes one or more pods get into a bad state where they can't get the right credentials and split processing begins to fail with stack traces like this:

io.trino.spi.TrinoException: Error opening Hive split s3://amperity-tenant-nnebwd/tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet (offset=16942990, length=16942991): Read 49152 tail bytes of file s3://amperity-tenant-nnebwd/tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet failed: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: C889VCGHTHDKR1PQ; S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8=; Proxy: null), S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8= (Bucket: amperity-tenant-nnebwd, Key: tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet)
	at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:306)
	at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:180)
	at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:202)
	at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:139)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:48)
	at io.trino.split.PageSourceManager$PageSourceProviderInstance.createPageSource(PageSourceManager.java:79)
	at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:260)
	at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:191)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:359)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
	at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
	at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:240)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
	at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
	at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:255)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
	at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
	at io.trino.operator.Driver.tryWithLock(Driver.java:709)
	at io.trino.operator.Driver.process(Driver.java:298)
	at io.trino.operator.Driver.processForDuration(Driver.java:269)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
	at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
	at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:565)
	at io.trino.$gen.Trino_453____20240807_192932_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.io.IOException: Read 49152 tail bytes of file s3://amperity-tenant-nnebwd/tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet failed: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: C889VCGHTHDKR1PQ; S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8=; Proxy: null), S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8= (Bucket: amperity-tenant-nnebwd, Key: tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet)
	at io.trino.filesystem.hdfs.HdfsInput.readTail(HdfsInput.java:71)
	at io.trino.filesystem.TrinoInput.readTail(TrinoInput.java:43)
	at io.trino.filesystem.tracing.TracingInput.lambda$readTail$3(TracingInput.java:81)
	at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
	at io.trino.filesystem.tracing.TracingInput.readTail(TracingInput.java:81)
	at io.trino.plugin.hive.parquet.TrinoParquetDataSource.readTailInternal(TrinoParquetDataSource.java:54)
	at io.trino.parquet.AbstractParquetDataSource.readTail(AbstractParquetDataSource.java:100)
	at io.trino.parquet.reader.MetadataReader.readFooter(MetadataReader.java:101)
	at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:226)
	... 32 more
Caused by: io.trino.hdfs.s3.TrinoS3FileSystem.UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: C889VCGHTHDKR1PQ; S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8=; Proxy: null), S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8= (Bucket: amperity-tenant-nnebwd, Key: tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet)
	at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.lambda$openStream$2(TrinoS3FileSystem.java:1585)
	at io.trino.hdfs.s3.RetryDriver.run(RetryDriver.java:125)
	at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.openStream(TrinoS3FileSystem.java:1571)
	at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.openStream(TrinoS3FileSystem.java:1556)
	at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.seekStream(TrinoS3FileSystem.java:1549)
	at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.lambda$read$1(TrinoS3FileSystem.java:1493)
	at io.trino.hdfs.s3.RetryDriver.run(RetryDriver.java:125)
	at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.read(TrinoS3FileSystem.java:1492)
	at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:345)
	at java.base/java.io.BufferedInputStream.implRead(BufferedInputStream.java:420)
	at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:405)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:158)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:158)
	at io.trino.hdfs.FSDataInputStreamTail.readTail(FSDataInputStreamTail.java:59)
	at io.trino.filesystem.hdfs.HdfsInput.readTail(HdfsInput.java:63)
	... 40 more
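
For anyone triaging similar traces: the useful fields are the status code, error code, request ID, bucket, and key in the AmazonS3Exception message. Below is a minimal sketch that pulls them out of a log line so failures can be grouped per pod or matched against AWS-side logs. The function name `parse_s3_error` is hypothetical, and the pattern assumes the message layout shown in the trace above (and keys without a closing parenthesis).

```python
import re

# Matches the AmazonS3Exception message layout seen in the trace above.
# Assumption: the "(Bucket: ..., Key: ...)" suffix is on the same line.
_S3_ERROR = re.compile(
    r"Status Code: (?P<status>\d+); "
    r"Error Code: (?P<code>\w+); "
    r"Request ID: (?P<request_id>\w+);"
    r".*?\(Bucket: (?P<bucket>[^,]+), Key: (?P<key>[^)]+)\)"
)


def parse_s3_error(line):
    """Return the S3 error fields found in a log line, or None if absent."""
    match = _S3_ERROR.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    import sys

    # Usage: pipe worker logs in, e.g. `kubectl logs <pod> | python parse.py`
    for log_line in sys.stdin:
        fields = parse_s3_error(log_line)
        if fields:
            print(fields)
```

All captured values are returned as strings, including the status code.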

Restarting the affected pods fixes the problem, and they come back up with working credentials, although it sometimes takes several restarts. Everything seems to point to the WebIdentityTokenCredentialsProvider being interrupted or failing. It sounds like @Pluies solved this by implementing a custom WebIdentityTokenCredentialsProvider. We may have to do the same, since hive.metastore.glue.use-web-identity-token-credentials-provider does not appear to solve the issue.
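
Before blaming the provider, it may help to rule out the IRSA preconditions on a pod that is in the bad state. The sketch below checks that the two environment variables EKS injects for IRSA (AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE) are set, that the token file exists, and that the token's `exp` claim has not passed. The helper names `decode_jwt_claims` and `check_irsa` are hypothetical, the token signature is not verified, and a clean result does not prove the credentials provider itself is healthy.

```python
import base64
import json
import os
import time
from pathlib import Path


def decode_jwt_claims(token):
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def check_irsa(env=os.environ, now=None):
    """Return a list of problems with the IRSA setup; empty means none found."""
    problems = []
    if not env.get("AWS_ROLE_ARN"):
        problems.append("AWS_ROLE_ARN is not set")
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if not token_file:
        problems.append("AWS_WEB_IDENTITY_TOKEN_FILE is not set")
        return problems
    path = Path(token_file)
    if not path.is_file():
        problems.append(f"token file {token_file} does not exist")
        return problems
    try:
        claims = decode_jwt_claims(path.read_text())
    except ValueError as exc:  # covers bad base64, bad JSON, bad shape
        problems.append(f"token file is not a readable JWT: {exc}")
        return problems
    exp = claims.get("exp")
    now = time.time() if now is None else now
    if isinstance(exp, (int, float)) and exp <= now:
        problems.append(f"token expired {int(now - exp)}s ago")
    return problems


if __name__ == "__main__":
    for problem in check_irsa():
        print("PROBLEM:", problem)
```

If this reports nothing on a failing pod, the token on disk is present and unexpired, which would point back at how the credentials are being resolved inside Trino rather than at the pod's IRSA wiring.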

Would greatly appreciate any help here as this is a disruptive pattern!

@hashhar
Member

hashhar commented Nov 20, 2024

For legacy S3 the docs are misleading; see #23544.
