forked from ilimi-in/secor
Commit
Instead of hard-coding the daily/hourly concept in the PartitionFinalizer, move the concepts of partition boundary and partition ordering into a new interface, Partitioner. PartitionFinalizer will ask the Partitioner for the last partition to finalize up to, and then collect all the partitions before that partition from the Partitioner.
Henry Cai
committed
Jul 16, 2015
1 parent
bec77cd
commit 05159e8
Showing 7 changed files with 381 additions and 305 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,13 +42,13 @@ | |
* @author Pawel Garbacki ([email protected]) | ||
*/ | ||
public class LogFilePath { | ||
- private String mPrefix; | |
- private String mTopic; | |
- private String[] mPartitions; | |
- private int mGeneration; | |
- private int mKafkaPartition; | |
- private long mOffset; | |
- private String mExtension; | |
+ private final String mPrefix; | |
+ private final String mTopic; | |
+ private final String[] mPartitions; | |
+ private final int mGeneration; | |
+ private final int mKafkaPartition; | |
+ private final long mOffset; | |
+ private final String mExtension; | |
|
||
public LogFilePath(String prefix, int generation, long lastCommittedOffset, | ||
ParsedMessage message, String extension) { | ||
|
326 changes: 112 additions & 214 deletions
src/main/java/com/pinterest/secor/parser/PartitionFinalizer.java
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
/** | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package com.pinterest.secor.parser; | ||
|
||
import com.pinterest.secor.message.Message; | ||
|
||
import java.util.List; | ||
|
||
/** | ||
* The Partitioner knows when to finalize a file folder partition. | ||
* | ||
* A file folder partition (e.g. dt=2015-07-07) can be finalized when all | |
* messages for that date have arrived. The caller (PartitionFinalizer) will do the | |
* finalization work (e.g. generate _SUCCESS file, perform hive registration). | |
* | |
* The partitioner provides a method to calculate the range of file | |
* folder partitions to be finalized and a method to iterate through | |
* the range. | |
* | ||
* The caller will first provide a list of last-consumed messages for a given | ||
* kafka topic and call #getFinalizedUptoPartitions to get the finalized-up-to | ||
* partition and then walk backwards by calling #getPreviousPartitions to | ||
* collect all the previous partitions which are ready to be finalized. | ||
* | ||
* Note that the finalize-up-to partition itself is not included in the range of | |
* partitions to be finalized. | |
* | |
* The caller might repeat this loop multiple times when the filesystem partition | |
* is multi-dimensional (e.g. [dt=2015-07-07,hr=05]). It will loop once for the | |
* hourly folder finalization and another time for the daily folder. | |
* | |
* Note that although we use daily/hourly partitions to illustrate the use of | |
* the partitioner, the partitioner is by no means limited to timestamp | |
* based partitioning. It should also be able to work with offset based | |
* partitioning as long as we establish an iterating order within those | |
* partitions. | |
* | ||
* @author Henry Cai ([email protected]) | ||
*/ | ||
public interface Partitioner { | ||
/** | ||
* Calculates the partition to finalize-up-to from a list of last-consumed | ||
* messages and a list of last-enqueued messages. | ||
* | ||
* For each kafka partition of a given topic, the caller will provide | |
* two messages: | |
* * lastMessage: the last message at the tail of the kafka queue | |
* * committedMessage: the message secor consumed and committed | |
* Iterating over all the kafka partitions of the given topic, | |
* the caller will gather the above two messages into two lists. | |
* | ||
* The Partitioner will compare the messages from all kafka partitions to | ||
* see which one is the earliest to finalize up to. The partitioner will | ||
* normally use the timestamp from the committedMessage to decide | ||
* the finalize time. But for some slow topics where there are no new | |
* messages coming for a while (i.e. lastMessage == committedMessage), | |
* the partitioner can use the current time as the finalize time. | |
* | ||
* Note that the up-to partition itself is not included in the range to be | |
* finalized. For example, when the last message is in 2015-07-07, | |
* 7/7 itself is not complete yet. | |
* | |
* Note also that the partitioner might want to adjust down the finalize | |
* time to allow a safety lag for late-arriving messages, e.g. adding one | |
* extra hour of lag. | |
* | ||
* @param lastMessages the last message at the tail of the queue | ||
* @param committedMessages the message secor consumed and committed | ||
* | ||
* @return a String array to represent a file folder partition to finalize up to | ||
*/ | ||
String[] getFinalizedUptoPartitions(List<Message> lastMessages, | ||
List<Message> committedMessages) throws Exception; | ||
|
||
/** | ||
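* Get the partition that immediately precedes the incoming partition. | |
* E.g. for ["dt=2015-07-07","hr=05"], it will return ["dt=2015-07-07","hr=04"] | |
* | |
* @param partition the file folder partition to step back from | |
* @return the file folder partition immediately before the given one | |
*/ | |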
String[] getPreviousPartitions(String[] partition) throws Exception; | ||
} |
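The walk-backwards loop described in the Javadoc above can be sketched as follows. This is a minimal illustration, not the code from this commit: `HourlyPartitionSketch`, `collectPartitionsToFinalize`, and the `lastFinalized` stop marker are hypothetical names, the class does not implement the real `Partitioner` interface (it avoids the `Message` dependency), and it assumes partitions of the form `["dt=yyyy-MM-dd","hr=HH"]`.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hourly partition ordering; not part of Secor.
public class HourlyPartitionSketch {

    // Parses ["dt=2015-07-07", "hr=05"] into the start of that hour.
    static LocalDateTime parse(String[] partition) {
        LocalDate date = LocalDate.parse(partition[0].substring("dt=".length()));
        int hour = Integer.parseInt(partition[1].substring("hr=".length()));
        return date.atTime(hour, 0);
    }

    // Formats a time back into the ["dt=...", "hr=..."] form.
    static String[] format(LocalDateTime time) {
        return new String[] {
            "dt=" + time.toLocalDate(),
            "hr=" + String.format("%02d", time.getHour())
        };
    }

    // Mirrors the documented contract: hr=05 -> hr=04, and hr=00 rolls
    // back to hr=23 of the previous day.
    static String[] getPreviousPartitions(String[] partition) {
        return format(parse(partition).minusHours(1));
    }

    // Collects every partition strictly before finalizedUpTo and strictly
    // after lastFinalized (an assumed marker for work already done).
    static List<String[]> collectPartitionsToFinalize(String[] finalizedUpTo,
                                                      String[] lastFinalized) {
        List<String[]> result = new ArrayList<>();
        LocalDateTime stop = parse(lastFinalized);
        String[] current = getPreviousPartitions(finalizedUpTo);
        while (parse(current).isAfter(stop)) {
            result.add(current);
            current = getPreviousPartitions(current);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] upTo = {"dt=2015-07-07", "hr=02"};
        String[] done = {"dt=2015-07-06", "hr=22"};
        for (String[] p : collectPartitionsToFinalize(upTo, done)) {
            System.out.println(String.join("/", p));
        }
    }
}
```

The finalize-up-to partition is excluded from the result, matching the note in the Javadoc that it is not yet complete. Comparing parsed times rather than raw strings keeps the loop from running forever if the stop marker is not aligned with the walk.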