I believe this is the same issue described in #15267, but that issue has since been closed.
We are running Trino 453 on EKS, deployed with the official Trino Helm chart and using legacy S3 support. We also have hive.metastore.glue.use-web-identity-token-credentials-provider set to true, which was suggested as a solution in the original issue.
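For context, the relevant part of the catalog configuration looks roughly like the following; aside from the use-web-identity-token-credentials-provider flag, the values here (e.g. the region) are illustrative rather than our exact settings:

connector.name=hive
hive.metastore=glue
hive.metastore.glue.region=us-west-2
hive.metastore.glue.use-web-identity-token-credentials-provider=true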
Most of the time this setup works fine, but sometimes one or more pods get into a bad state where they can't get the right credentials and split processing begins to fail with stack traces like this:
io.trino.spi.TrinoException: Error opening Hive split s3://amperity-tenant-nnebwd/tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet (offset=16942990, length=16942991): Read 49152 tail bytes of file s3://amperity-tenant-nnebwd/tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet failed: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: C889VCGHTHDKR1PQ; S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8=; Proxy: null), S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8= (Bucket: amperity-tenant-nnebwd, Key: tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet)
at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:306)
at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:180)
at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:202)
at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:139)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:48)
at io.trino.split.PageSourceManager$PageSourceProviderInstance.createPageSource(PageSourceManager.java:79)
at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:260)
at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:191)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:359)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:240)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:255)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
at io.trino.operator.Driver.processInternal(Driver.java:403)
at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
at io.trino.operator.Driver.tryWithLock(Driver.java:709)
at io.trino.operator.Driver.process(Driver.java:298)
at io.trino.operator.Driver.processForDuration(Driver.java:269)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:565)
at io.trino.$gen.Trino_453____20240807_192932_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.io.IOException: Read 49152 tail bytes of file s3://amperity-tenant-nnebwd/tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet failed: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: C889VCGHTHDKR1PQ; S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8=; Proxy: null), S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8= (Bucket: amperity-tenant-nnebwd, Key: tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet)
at io.trino.filesystem.hdfs.HdfsInput.readTail(HdfsInput.java:71)
at io.trino.filesystem.TrinoInput.readTail(TrinoInput.java:43)
at io.trino.filesystem.tracing.TracingInput.lambda$readTail$3(TracingInput.java:81)
at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
at io.trino.filesystem.tracing.TracingInput.readTail(TracingInput.java:81)
at io.trino.plugin.hive.parquet.TrinoParquetDataSource.readTailInternal(TrinoParquetDataSource.java:54)
at io.trino.parquet.AbstractParquetDataSource.readTail(AbstractParquetDataSource.java:100)
at io.trino.parquet.reader.MetadataReader.readFooter(MetadataReader.java:101)
at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:226)
... 32 more
Caused by: io.trino.hdfs.s3.TrinoS3FileSystem.UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: C889VCGHTHDKR1PQ; S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8=; Proxy: null), S3 Extended Request ID: 5U7/BTdY6W5ZnUAFwHwZmD2b6JMdPd9NfIuCk5KiB5wumdC2uTIhQ+N0SFbrWFRkt/cUdUqV40ab/v2EWAnQTiiNzqnLYBFRF6/VR7MMxc8= (Bucket: amperity-tenant-nnebwd, Key: tables/31JbBg91EHaproQ/2FJ/part-00023-8207dbc7-7342-4b2a-a421-54a409618b96-c000.snappy.parquet)
at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.lambda$openStream$2(TrinoS3FileSystem.java:1585)
at io.trino.hdfs.s3.RetryDriver.run(RetryDriver.java:125)
at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.openStream(TrinoS3FileSystem.java:1571)
at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.openStream(TrinoS3FileSystem.java:1556)
at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.seekStream(TrinoS3FileSystem.java:1549)
at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.lambda$read$1(TrinoS3FileSystem.java:1493)
at io.trino.hdfs.s3.RetryDriver.run(RetryDriver.java:125)
at io.trino.hdfs.s3.TrinoS3FileSystem$TrinoS3InputStream.read(TrinoS3FileSystem.java:1492)
at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:345)
at java.base/java.io.BufferedInputStream.implRead(BufferedInputStream.java:420)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:405)
at java.base/java.io.DataInputStream.read(DataInputStream.java:158)
at java.base/java.io.DataInputStream.read(DataInputStream.java:158)
at io.trino.hdfs.FSDataInputStreamTail.readTail(FSDataInputStreamTail.java:59)
at io.trino.filesystem.hdfs.HdfsInput.readTail(HdfsInput.java:63)
... 40 more
Restarting the pods corrects the issue and they come back up in the right state, though sometimes it takes multiple restarts. Everything seems to point to the WebIdentityTokenCredentialsProvider being interrupted or failing. It sounds like @Pluies solved this by implementing a custom WebIdentityTokenCredentialsProvider (a rough sketch of that approach is below). We may have to do the same, since hive.metastore.glue.use-web-identity-token-credentials-provider does not appear to actually solve the issue.
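For what it's worth, here is a minimal sketch of the kind of wrapper we are considering, assuming the legacy S3 path resolves credentials through the AWS SDK v1 chain. The class name and the rebuild-on-failure behavior are our own illustration, not @Pluies' actual implementation:

// Hypothetical sketch: wrap the SDK's WebIdentityTokenCredentialsProvider and
// rebuild it whenever a credential fetch fails, so one interrupted refresh
// does not leave the provider permanently broken.
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.WebIdentityTokenCredentialsProvider;

public class RebuildingWebIdentityCredentialsProvider implements AWSCredentialsProvider
{
    // The SDK provider reads AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE
    // from the environment, which EKS injects via IRSA.
    private volatile AWSCredentialsProvider delegate = WebIdentityTokenCredentialsProvider.create();

    @Override
    public AWSCredentials getCredentials()
    {
        try {
            return delegate.getCredentials();
        }
        catch (RuntimeException e) {
            // Recreate the delegate and retry once; if that also fails,
            // propagate so the caller's own retry logic takes over.
            delegate = WebIdentityTokenCredentialsProvider.create();
            return delegate.getCredentials();
        }
    }

    @Override
    public void refresh()
    {
        delegate = WebIdentityTokenCredentialsProvider.create();
    }
}

The idea is simply that a failed or interrupted token read should not poison the provider for the lifetime of the pod.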
We would greatly appreciate any help here, as this is a disruptive pattern!