forked from ilimi-in/secor
Commit
Adding support for hourly S3 data ingestion from secor:
Summary:
- Add the capability to partition data files in S3 using hourly folders: s3://pinlogs/secor_raw_logs/topic1/dt=2015-07-07/hr=05/
  This way, data files land on S3 much sooner (hourly vs. daily), people can check the results on S3 much sooner, and this also opens the door to a faster hourly data pipeline on the Hadoop side. The hr folder values range from 00 to 23.
- To trigger hourly partitioning and finalization, add the following parameter to your secor config:
  * # partitioner.granularity.hour=true
  And change the upload threshold to less than one hour:
  * secor.max.file.age.seconds=3000
- Change the Hive partition registration code to register partitions using both the dt and hr columns (this requires the Hive table to be created or altered to have both dt and hr as partition columns).
- The enhancements are done through the following:
  * Introduce a new interface, Partitioner, which knows the last partition period to be finalized (generating the SUCCESS file) and knows how to find the previous partition periods that still need to be finalized.
  * Change TimestampedMessageParser to implement Partitioner, allowing it to extract both dt and hr from the timestamp field and to traverse backwards to find the previous partitions.
  * Change the Finalizer to work with the Partitioner, looping through the list of ready-to-be-finalized partitions for both the hr and dt folders to generate the SUCCESS file and do Hive registration.
- Added more unit tests for the message parser covering hourly behavior.
- Added more E2E tests to cover the partitioner and hourly ingestion.

Test Plan: Added both unit tests and E2E tests, and tested manually for a new topic on S3.
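The hourly layout and the backwards traversal described in the summary can be illustrated with a minimal, self-contained sketch. This is not the code from this commit: the class HourlyPartitionSketch and its methods are hypothetical names chosen for illustration, and it assumes timestamps are epoch milliseconds interpreted in UTC.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; not the Partitioner or
// TimestampedMessageParser introduced by this commit.
public class HourlyPartitionSketch {
    private static final DateTimeFormatter DT = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    private static final DateTimeFormatter HR = DateTimeFormatter.ofPattern("HH");

    // Formats a point in time as the pair {"dt=yyyy-MM-dd", "hr=HH"}.
    private static String[] format(LocalDateTime t) {
        return new String[] {"dt=" + DT.format(t), "hr=" + HR.format(t)};
    }

    // Maps an epoch-millisecond timestamp to its dt/hr partitions (UTC assumed).
    public static String[] partitionsFor(long timestampMillis) {
        return format(LocalDateTime.ofInstant(Instant.ofEpochMilli(timestampMillis), ZoneOffset.UTC));
    }

    // Steps back one hour, rolling over to hr=23 of the previous day when needed.
    public static String[] previousPartitions(String[] partitions) {
        LocalDate date = LocalDate.parse(partitions[0].substring("dt=".length()), DT);
        int hour = Integer.parseInt(partitions[1].substring("hr=".length()));
        return format(date.atTime(hour, 0).minusHours(1));
    }

    // Builds the folder path for a topic, e.g. <prefix>/<topic>/dt=.../hr=.../
    public static String folderFor(String prefix, String topic, String[] partitions) {
        return prefix + "/" + topic + "/" + partitions[0] + "/" + partitions[1] + "/";
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2015-07-07T05:30:00Z").toEpochMilli();
        String[] current = partitionsFor(ts);
        System.out.println(folderFor("s3://pinlogs/secor_raw_logs", "topic1", current));
        String[] previous = previousPartitions(current);
        System.out.println(previous[0] + "/" + previous[1]);
    }
}
```

A finalizer built along these lines would call previousPartitions repeatedly, starting from the last partition that is safe to finalize, until it reaches one that has already been finalized.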
Henry Cai committed Jul 22, 2015
1 parent 60f4629, commit 32f19e0
Showing 8 changed files with 239 additions and 119 deletions.