The 0.4.0 release introduced various new features, including the theta-sketch based distinct count aggregation function, an S3 filesystem plugin, a unified star-tree index implementation, and the deprecation of TimeFieldSpec in favor of DateTimeFieldSpec. It also included miscellaneous refactoring, performance improvements, and bug fixes. See details below.
- Made DateTimeFieldSpecs mainstream and deprecated TimeFieldSpec (#2756)
- Used time column from table config instead of schema (#5320)
- Included dateTimeFieldSpec in schema columns of Pinot Query Console (#5392)
- Used DATE_TIME as the primary time column for Pinot tables (#5399)
- Supported range queries using indexes (#5240)
- Supported complex aggregation functions
- Supported Aggregation functions with multiple arguments (#5261)
- Added api in AggregationFunction to get compiled input expressions (#5339)
- Added a simple PinotFS benchmark driver (#5160)
- Supported default star-tree (#5147)
- Added an initial implementation for theta-sketch based distinct count aggregation function (#5316)
- One minor side effect: DataSchemaPruner won't work for DistinctCountThetaSketchAggregationFunction (#5382)
- Added access control for Pinot server segment download api (#5260)
- Added Pinot S3 Filesystem Plugin (#5249)
- Text search improvement
- Pruned stop words for text index (#5297)
- Used 8byte offsets in chunk based raw index creator (#5285)
- Derived num docs per chunk from max column value length for varbyte raw index creator (#5256)
- Added inter segment tests for text search and fixed a bug for Lucene query parser creation (#5226)
- Made text index query cache a configurable option (#5176)
- Added Lucene DocId to PinotDocId cache to improve performance (#5177)
- Removed the construction of second bitmap in text index reader to improve performance (#5199)
- Tooling/usability improvement
- Added template support for Pinot Ingestion Job Spec (#5341)
- Allowed users to specify the ZK data dir and to skip cleanup during ZK shutdown (#5295)
- Allowed configuring minion task timeout in the PinotTaskGenerator (#5317)
- Updated JVM settings for scripts (#5127)
- Added Stream github events demo (#5189)
- Moved docs link from gitbook to docs.pinot.apache.org (#5193)
- Re-implemented ORCRecordReader (#5267)
- Evaluated schema transform expressions during ingestion (#5238)
- Handled count distinct query in selection list (#5223)
- Enabled async processing in pinot broker query api (#5229)
- Supported bootstrap mode for table rebalance (#5224)
- Supported order-by on BYTES column (#5213)
- Added Nightly publish to binary (#5190)
- Shuffled the segments when rebalancing the table to avoid creating hotspot servers (#5197)
- Supported built-in transform functions (#5312)
- Added date time transform functions (#5326)
- Deepstore by-pass in LLC: introduced segment uploader (#5277, #5314)
- APIs Additions/Changes
- Added a new server api for download of segments
- GET /segments/{tableNameWithType}/{segmentName}
- Upgraded helix to 0.9.7 (#5411)
- Added support to execute functions during query compilation (#5406)
- Other notable refactoring
- Moved table config into pinot-spi (#5194)
- Cleaned up integration tests. Standardized the creation of schema, table config and segments (#5385)
- Added jsonExtractScalar function to extract field from json object (#4597)
- Added template support for Pinot Ingestion Job Spec (#5372)
- Cleaned up AggregationFunctionContext (#5364)
- Optimized real-time range predicate when cardinality is high (#5331)
- Made PinotOutputFormat use table config and schema to create segments (#5350)
- Tracked unavailable segments in InstanceSelector (#5337)
- Added a new best effort segment uploader with bounded upload time (#5314)
- In SegmentPurger, used table config to generate the segment (#5325)
- Decoupled schema from RecordReader and StreamMessageDecoder (#5309)
- Implemented ARRAYLENGTH UDF for multi-valued columns (#5301)
- Improved GroupBy query performance (#5291)
- Optimized ExpressionFilterOperator (#5132)
- Stopped releasing the PinotDataBuffer when closing the index (#5400)
- Handled a no-arg function in query parsing and expression tree (#5375)
- Fixed compatibility issues during rolling upgrade due to unknown json fields (#5376)
- Fixed missing error message from pinot-admin command (#5305)
- Fixed HDFS copy logic (#5218)
- Fixed spark ingestion issue (#5216)
- Fixed the capacity of the DistinctTable (#5204)
- Fixed various links in the Pinot website
- Upsert: support for overriding data in the real-time table (#4261)
- Added Pinot upsert features to pinot-common (#5175)
- Enhancements for theta-sketch, e.g. multi-value aggregation support, complex predicates, and performance tuning
- TableConfig no longer supports de-serialization from a JSON string that contains nested JSON strings (i.e. no `\"` inside the JSON) (#5194)
- The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap) (#5371):
void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
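
For implementers of custom aggregation functions, the change in map key type might look roughly like the sketch below. It is a minimal illustration, not Pinot code: `Expr` and `Values` are hypothetical stand-ins for `TransformExpressionTree` and `BlockValSet`, and `sum` is a made-up method that only mirrors the shape of `aggregate`. The sketch assumes the key type compares by structure, so a separately built but equal expression finds the same entry.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative stand-ins only: Expr and Values are hypothetical types, not Pinot classes.
class BlockValSetMapSketch {

  /** Hypothetical stand-in for TransformExpressionTree; compared by structure. */
  static final class Expr {
    final String value;         // column name or function name
    final List<Expr> children;  // empty for a plain column

    private Expr(String value, List<Expr> children) {
      this.value = value;
      this.children = children;
    }

    static Expr column(String name) {
      return new Expr(name, Collections.<Expr>emptyList());
    }

    static Expr function(String name, Expr... args) {
      return new Expr(name, Arrays.asList(args));
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Expr)) {
        return false;
      }
      Expr other = (Expr) o;
      return value.equals(other.value) && children.equals(other.children);
    }

    @Override
    public int hashCode() {
      return Objects.hash(value, children);
    }
  }

  /** Hypothetical stand-in for BlockValSet: exposes one block of double values. */
  interface Values {
    double[] doubleValues();
  }

  /** Sums the first {@code length} values of the block registered under {@code input}. */
  static double sum(int length, Map<Expr, Values> blockValSetMap, Expr input) {
    double[] values = blockValSetMap.get(input).doubleValues();
    double total = 0;
    for (int i = 0; i < length; i++) {
      total += values[i];
    }
    return total;
  }

  public static void main(String[] args) {
    Map<Expr, Values> blockValSetMap = new HashMap<>();
    blockValSetMap.put(
        Expr.function("add", Expr.column("a"), Expr.column("b")),
        () -> new double[] {1.5, 2.5, 4.0});

    // The lookup uses a separately built, structurally equal expression as the key.
    Expr input = Expr.function("add", Expr.column("a"), Expr.column("b"));
    System.out.println(sum(3, blockValSetMap, input));
  }
}
```

Code that previously looked up entries by a string such as a column name would instead pass the expression object it was compiled with. The actual accessor methods on `BlockValSet` and the equality semantics of `TransformExpressionTree` should be checked against the Pinot source for the release in use.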