This page describes the indexing techniques available in Apache Pinot.
Apache Pinot™ supports the following indexing techniques:
- Bloom filter
- Forward index
  - Dictionary-encoded forward index with bit compression
  - Raw value forward index
  - Sorted forward index with run-length encoding
- FST index
- Geospatial
- Inverted index
  - Bitmap inverted index
  - Sorted inverted index
- JSON index
- Range index
- Star-tree index
- Text search support
- Timestamp index
By default, Pinot creates a dictionary-encoded forward index for each column.
There are two ways to enable indexes for a Pinot table.
Indexing is enabled by specifying the column names in the table configuration. More details about how to configure each type of index can be found in the respective index's section linked above or in the table configuration reference.
Indexes can also be dynamically added to or removed from segments at any point. Update your table configuration with the latest set of indexes you want to have.
For example, if you have an inverted index on the `foo` field and now want to also include the `bar` field, you would update your table configuration from this:

```json
"tableIndexConfig": {
  "invertedIndexColumns": ["foo"],
  ...
}
```

To this:

```json
"tableIndexConfig": {
  "invertedIndexColumns": ["foo", "bar"],
  ...
}
```
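If the table configuration is maintained by a script, the same change can be made programmatically. The following is a minimal sketch, assuming the configuration has already been loaded into a Python dictionary; the helper name `add_inverted_index_column` is hypothetical and not part of Pinot.

```python
import copy
import json


def add_inverted_index_column(table_config: dict, column: str) -> dict:
    """Return a copy of table_config with `column` listed in invertedIndexColumns.

    Hypothetical helper for illustration only; it edits the dictionary and
    does not talk to a Pinot cluster.
    """
    updated = copy.deepcopy(table_config)
    index_config = updated.setdefault("tableIndexConfig", {})
    columns = index_config.setdefault("invertedIndexColumns", [])
    if column not in columns:  # avoid listing the same column twice
        columns.append(column)
    return updated


if __name__ == "__main__":
    current = {"tableIndexConfig": {"invertedIndexColumns": ["foo"]}}
    print(json.dumps(add_inverted_index_column(current, "bar"), indent=2))
```

The function returns a new dictionary rather than mutating its input, so the previous configuration stays available for comparison.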
The updated index configuration isn't picked up until you invoke the reload API. This API sends reload messages via Helix to all servers, which then add indexes to or remove indexes from their local segments. This happens without any downtime and is completely transparent to queries.
When adding an index, only the new index is created and appended to the existing segment. When removing an index, its related state is cleaned up from the Pinot servers. You can find this API under the `Segments` tab on Swagger:

```bash
curl -X POST \
  "http://localhost:9000/segments/myTable/reload" \
  -H "accept: application/json"
```
You can also find this action on the Cluster Manager in the Pinot UI, on the specific table's page.
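The reload call can also be issued from a script. The sketch below builds the same `POST` request with Python's standard library, assuming the controller address `http://localhost:9000` and table name `myTable` from the example above; the function name `build_reload_request` is hypothetical.

```python
import urllib.parse
import urllib.request


def build_reload_request(controller_url: str, table_name: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the segment reload endpoint.

    Hypothetical helper; the URL layout mirrors the curl example on this page.
    """
    table = urllib.parse.quote(table_name, safe="")
    url = f"{controller_url.rstrip('/')}/segments/{table}/reload"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"accept": "application/json"},
    )


request = build_reload_request("http://localhost:9000", "myTable")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request to a running controller; the sketch stops short of that so it can be run without a cluster.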
{% hint style="info" %} Not all indexes can be retroactively applied to existing segments. For more detailed documentation on applying indexes, see the Indexing FAQ. {% endhint %}
The inverted index provides good performance for most use cases, especially if your use case doesn't have a strict low-latency requirement.
Start with the inverted index; if your queries aren't fast enough, switch to more advanced indexes such as the sorted or star-tree index.
The matrix below shows which combinations of data type, cardinality, and encoding are compatible with each index type:
data type | bloom | fst | geo | inverted | json | native | text | range | startree | timestamp | vector |
---|---|---|---|---|---|---|---|---|---|---|---|
boolean | ❌ | ❌ | ❌ | 🆗 | ❌ | ❌ | ❌ | 🆗 | 🆗 (2) | 🆗 | ❌ |
int | 🆗 | ❌ | ❌ | 🆗 | ❌ | ❌ | ❌ | 🆗 | 🆗 (2) | 🆗 | ❌ |
long | 🆗 | ❌ | ❌ | 🆗 | ❌ | ❌ | ❌ | 🆗 | 🆗 (2) | 🆗 | ❌ |
float | 🆗 | ❌ | ❌ | 🆗 | ❌ | ❌ | ❌ | 🆗 | 🆗 (2) | 🆗 | 🆗 (5) |
double | 🆗 | ❌ | ❌ | 🆗 | ❌ | ❌ | ❌ | 🆗 | 🆗 (2) | 🆗 | ❌ |
big decimal | ❌ | ❌ | ❌ | 🆗 | ❌ | ❌ | ❌ | 🆗 | 🆗 (2) | 🆗 | ❌ |
timestamp | ❌ | ❌ | ❌ | 🆗 | ❌ | ❌ | ❌ | 🆗 | 🆗 (2) | 🆗 | ❌ |
string | 🆗 | 🆗 (1) | ❌ | 🆗 | 🆗 (2) (4) | 🆗 | 🆗 | 🆗 | 🆗 (2) | 🆗 (3) | ❌ |
json | ❌ | ❌ | ❌ | 🆗 | 🆗 (2) | 🆗 | 🆗 | 🆗 | 🆗 (2) | ❌ | ❌ |
bytes | 🆗 | ❌ | 🆗 (2) | 🆗 | ❌ | ❌ | ❌ | 🆗 | 🆗 (2) | ❌ | ❌ |
map | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
(1) Supported only for dictionary-encoded columns.
(2) Supported only for single-value columns.
(3) Supported only if values can be parsed as long.
(4) Supported only if values can be parsed as JSON.
(5) Supported only for multi-value columns.
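
A matrix like this can be turned into a simple pre-flight check before changing a table configuration. The sketch below encodes only four columns of the matrix (bloom, fst, geo, vector) as a lookup table; the names `SUPPORTED_TYPES` and `is_supported` are hypothetical, and the footnoted restrictions are recorded as comments rather than enforced.

```python
# Subset of the compatibility matrix: index type -> data types marked as supported.
SUPPORTED_TYPES = {
    "bloom": {"int", "long", "float", "double", "string", "bytes"},
    "fst": {"string"},    # note (1): dictionary-encoded columns only
    "geo": {"bytes"},     # note (2): single-value columns only
    "vector": {"float"},  # note (5): multi-value columns only
}


def is_supported(index_type: str, data_type: str) -> bool:
    """Return True if the matrix marks data_type as compatible with index_type."""
    if index_type not in SUPPORTED_TYPES:
        raise ValueError(f"index type not covered by this sketch: {index_type}")
    return data_type.lower() in SUPPORTED_TYPES[index_type]


print(is_supported("bloom", "string"))   # bloom filter on a string column
print(is_supported("bloom", "boolean"))  # bloom filter on a boolean column
```

Index types outside the encoded subset raise an error instead of returning `False`, so a missing entry is not mistaken for an incompatibility.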