This repository has been archived by the owner on Apr 4, 2019. It is now read-only.

S3 partition file per hourly batch #47

Open
panda87 opened this issue Jul 26, 2017 · 1 comment

Comments

@panda87

panda87 commented Jul 26, 2017

Hi,

I'd like to know if there is an option to write one file per partition, meaning per hour.
For example, if I have 5 workers with 5 tasks and I run an hourly batch, would this plugin know to aggregate the data into one file per batch?

Thanks,
D.

@OneCricketeer

You could use the TimeBasedPartitioner with a rotation interval configured to an hour.

However, this is not recommended for high-volume topics, since the connector would need to hold an hour's worth of data before writing.

Also, why do you need this? Spark, Presto, Pig, Hive, etc. can all read multiple files from an upper-level S3 path.
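A minimal sketch of such a configuration, assuming the property names used by the Confluent S3 sink connector (this repository's connector may use different keys, so treat the names below as illustrative):

```properties
# Hypothetical hourly-partitioned S3 sink config (Confluent-style property names)
connector.class=io.confluent.connect.s3.S3SinkConnector
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner

# One partition directory per hour
partition.duration.ms=3600000
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
locale=en
timezone=UTC

# Rotate (commit) files on an hourly wall-clock interval
rotate.interval.ms=3600000

# Set flush.size high enough that the time-based rotation,
# not the record count, triggers the file commit
flush.size=1000000
```

Note that even with these settings, each task writes its own files, so a topic consumed by 5 tasks still produces up to 5 files per hourly partition rather than a single file.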
