This repository has been archived by the owner on Apr 4, 2019. It is now read-only.

Strange problem with Parquet files in S3 #56

Open
iskohl opened this issue Dec 19, 2017 · 0 comments

iskohl commented Dec 19, 2017

I use streamx to sink Kafka data to S3 as Parquet files. Everything seems fine: the logs report that the Parquet files are generated as expected, as shown below,

Dec 19, 2017 8:02:35 AM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 29B for [accessuri] BINARY: 1 values, 6B raw, 8B comp, 1 pages, encodings: [
[2017-12-19 08:02:35,933] INFO Committed s3.test/topics/colin-forecast/year=2017/month=12/day=19/colin-forecast+0+0000045248+0000045248.parquet for colin-forecast-0 (io.confluent.connect.hdfs.TopicPartitionWriter:638)
[2017-12-19 08:02:35,947] INFO Got brand-new compressor [.snappy] (org.apache.hadoop.io.compress.CodecPool:153)
[2017-12-19 08:02:35,948] INFO Starting commit and rotation for topic partition colin-forecast-0 with start offsets {year=2017/month=12/day=19=45249} and end offsets {year=2017/month=12/day=19=45249} (io.confluent.connect.hdfs.TopicPartitionWriter:302)
[2017-12-19 08:02:35,949] INFO Committed s3.test/topics/colin-forecast/year=2017/month=12/day=19/colin-forecast+0+0000045249+0000045249.parquet for colin-forecast-0 (io.confluent.connect.hdfs.TopicPartitionWriter:638)
[2017-12-19 08:02:35,961] INFO Got brand-new compressor [.snappy] (org.apache.hadoop.io.compress.CodecPool:153)
[2017-12-19 08:02:35,962] INFO Starting commit and rotation for topic partition colin-forecast-0 with start offsets {year=2017/month=12/day=19=45250} and end offsets {year=2017/month=12/day=19=45250} (io.confluent.connect.hdfs.TopicPartitionWriter:302)
[2017-12-19 08:02:35,963] INFO Committed s3.test/topics/colin-forecast/year=2017/month=12/day=19/colin-forecast+0+0000045250+0000045250.parquet for colin-forecast-0 (io.confluent.connect.hdfs.TopicPartitionWriter:638)
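
For context, a minimal sketch of what such a sink configuration typically looks like. The property names come from the Confluent HDFS connector that streamx is built on (the log lines above are from io.confluent.connect.hdfs.TopicPartitionWriter); the bucket name, format class and partitioning settings are assumptions and may differ from the actual streamx config:

name=colin-forecast-s3-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=colin-forecast
# destination given as an s3a:// URL instead of an hdfs:// URL (bucket name is a placeholder)
hdfs.url=s3a://my-streamx-bucket
topics.dir=topics
flush.size=1
# Parquet output; the exact format class may differ in streamx
format.class=io.confluent.connect.hdfs.parquet.ParquetFormat
# daily partitioning to match the year=/month=/day= paths in the logs
partitioner.class=io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner
path.format='year'=YYYY/'month'=MM/'day'=dd
partition.duration.ms=86400000
locale=en
timezone=UTC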

But I cannot find the Parquet files in S3; the bucket is empty. Why? Do I need some configuration on the S3 side? Thanks in advance.
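
For what it is worth, a sketch of how the bucket can be checked directly with the AWS CLI, independent of the connector (the bucket name and prefix are placeholders; adjust them to whatever "s3.test" maps to in the setup):

# list everything the connector should have written under the topics/ prefix
aws s3 ls s3://my-streamx-bucket/topics/colin-forecast/ --recursive

If that listing is empty, one assumption worth verifying is that the Hadoop configuration the connector loads (e.g. via hadoop.conf.dir) configures the s3a filesystem with credentials such as fs.s3a.access.key and fs.s3a.secret.key, and that hdfs.url really resolves to the bucket being inspected.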
