Feature Proposal: Secondary Index Query Enhancements - Filter Expressions and Beyond #4
martinsumner started this conversation in Ideas
Replies: 1 comment
As an alternative to overloaded 2i queries, this appeals to me. Once functional, we'll need some detailed examples in the documentation and maybe a user tutorial video, but overall this sounds like a nice addition to existing search features.
Background
There have been multiple attempts to add a capability to query data in Riak:
Some performance improvements have been made to secondary indexing over the past few releases. Firstly, the efficiency of querying has improved greatly by using leveled rather than eleveldb, and then through improvements in leveled and OTP itself. There was also an order of magnitude improvement made in query planning.
The addition of the coverage participation option also addressed (with intervention) the primary functional issue of secondary index use.
These performance characteristics make secondary indexes powerful, especially when compared to alternative databases (in particular DynamoDB). However, to make good use of them and support non-trivial queries, it is necessary to use overloaded terms and regular expressions. The primary issue with this has been explaining their efficient use to developers. There is also a secondary issue with the performance of more complex back-tracking regular expressions within the PCRE engine used by OTP. The engine is set to be changed in OTP 28, but that may cause its own migration issues for Riak.
The aim of this proposal is therefore to make the power of secondary indexes more accessible: to try to make the process more intuitive. A secondary concern is to ensure that it remains backwards compatible with existing implementations, i.e. a service currently using 2i and regular expressions should be able to transition without re-indexing its data.
Proposal
It is proposed that indexing would remain the same. The definition of index entries is made by the developer, and then appended to the object - rather than defining a schema to read the object and generate index entries. The solution will retain some key flexibility advantages: an object can appear on an unrestricted number of indexes, have an unrestricted number of entries on each index, and only a sort key is mandatory (there is no need for a partition key).
The basic proposal is that a simple language will be provided to allow terms to be evaluated, breaking up overloaded terms into separate named projected attributes (also referred to as terms). Once an index entry has been evaluated, the map of projected attributes that is output can be processed using a simple filter expression containing matching rules for individual terms. The results can then be returned, but accumulation of results can also be restricted to specific terms, returning only counts if result values are not required.
Finally, each query can now contain multiple sub-queries, allowing the results of those sub-queries to be aggregated using basic set operations before being returned.
There should be a new Query API, where a Query can be provided as a JSON object, defined as below.
Query Request
query_request()
bucket
Bucket or {Type, Bucket} tuple
aggregation_expression (optional)
A query may contain one or more queries (as described within query_details). If there is a single query, there is no purpose for aggregation_expression; but if there are multiple queries, the aggregation_expression describes the set operations to be used to combine the results.
If not provided, or set to the word none, there should be a single query provided under query_details for the query to be valid. Otherwise this may be a set operation expression describing how the results should be produced by set operations on the results of each individual query in the query_details. Each tag referred to in the aggregation_expression should have a matching aggregation_tag within the query_details. The set operations supported will be UNION, INTERSECT and SUBTRACT, e.g. $1 INTERSECT ($2 UNION $3).
Note that aggregation will occur at a vnode level, and all the queries are run on a single vnode snapshot. However, all the queries are always run; aggregation occurs at the end, on each vnode, and there is no intelligence to determine if some queries are unnecessary based on the results of earlier queries (e.g. they are to be intersected with an empty set).
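As a sketch of the intended shape only (bucket names, index names and term values here are hypothetical, and the exact JSON field encoding, for example whether tags carry the $ prefix, will be settled by the implementation PRs), a request combining three tagged sub-queries might look like:

```json
{
  "bucket": "people",
  "aggregation_expression": "$1 INTERSECT ($2 UNION $3)",
  "query_details": [
    {"aggregation_tag": "$1", "index_name": "surname_bin",
     "start_term": "SMITH", "end_term": "SMITH~"},
    {"aggregation_tag": "$2", "index_name": "postcode_bin",
     "start_term": "LS1", "end_term": "LS2"},
    {"aggregation_tag": "$3", "index_name": "postcode_bin",
     "start_term": "BD1", "end_term": "BD2"}
  ]
}
```

Each tag referenced by the aggregation_expression has a matching aggregation_tag in query_details; all three range folds run against the same vnode snapshot before the set operations are applied on each vnode.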
accumulation_option
If an aggregation_expression other than none is provided, this can only be:
If the aggregation_expression is none, and there is a single query in query_details, the following additional options are supported:
A count can be based on matches (the fastest option, counting each match independently), or keys, which will only count unique keys that match. If the index has a 1:1 mapping between terms and keys, then these results will be consistent (and the match count should be used for improved efficiency). The term in a term-based accumulation_option can be a reference to any single attribute returned from an evaluation_expression, including $term (to refer to the whole term).
result_provision
The result_provision may be one of two options: all or reference. If the option all is selected, all results will be returned once all results are available. If reference is chosen, the reference may be used to return pages of results, as pages of results become available. The reference option is only available if an accumulation_option of keys or term_with_keys($TERM) is used.
The reference will have a 60s timeout. If no request to return results from the reference is made within 60s, then the results will be dropped, and no future requests can be made to fetch the results.
expression_substitutions
A map of substitutions to be made into the evaluation and filter expressions within the query details.
query_details
A list of query_detail objects
query_detail/aggregation_tag
If an aggregation_expression is used, the tag of this query which acts as a reference to this query within that expression.
query_detail/index_name
The name of the index used for the query within Riak. Only queries on _bin type indexes are supported via this API.
query_detail/start_term
The prefix of the term to be used as the start of the range to be covered by the query. All index entries >= this term will be candidates for the result set.
query_detail/end_term
The prefix of the term to be used as the end of the range to be covered by the query. All index entries < this term will be candidates for the result set.
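As an illustration (the index name and term values are hypothetical), a query_detail covering every entry in a date-of-birth index for 1988 could define the range as:

```json
{
  "index_name": "dob_bin",
  "start_term": "19880101",
  "end_term": "19890101"
}
```

All index entries >= the start_term and < the end_term then become candidates for evaluation and filtering.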
query_detail/evaluation_expression
An expression to take an index term ($term) and a key ($key) and return a map of Key/Value attributes to be used in a filter expression and/or an accumulation_option. This expression is a pipeline of functions to extract sub-attributes from existing attributes, or combine those attributes into combination terms.
The supported functions are:
delim - break the term using a delimiter
join - concatenate multiple terms together using a delimiter
split - as with delim, but the output is a list of items associated with a single term, not separate terms
slice - as with split, but the slicing of the parent term is based on fixed width
index - slice a single sub-term from a term using a start position and length
kvsplit - use dual delimiters (one to tokenise a pair, one to split a pair into key and value) to break up a sequence of Key/Value pairs into named attributes
regex - use a perl-compatible regular expression to extract named attributes
to_integer - convert a string attribute into an integer
to_string - convert an integer attribute into a string (for example to use in join)
add - perform addition on an integer attribute
subtract - perform subtraction on an integer attribute
map - categorise an attribute by comparison to a series of limits, i.e. to break a value into range-based buckets
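The concrete call syntax for these functions is defined in the leveled_filter work referenced below, so it is not reproduced here. Purely as an illustration of intent, and not as part of the API, the sketch below shows an overloaded index entry and the map of projected attributes a delim-based pipeline might produce (attribute names are hypothetical):

```json
{
  "term": "SMITH|19880216|LS1 4BT",
  "key": "P0001234",
  "projected_attributes": {
    "surname": "SMITH",
    "dob": "19880216",
    "postcode": "LS1 4BT"
  }
}
```

Here a delim step would break $term on the | delimiter into $surname, $dob and $postcode, which can then be referenced from a filter_expression or an accumulation_option.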
query_detail/filter_expression
A filter expression to be applied to attributes output from the evaluation expression. Expressions are supported in the form:
condition-expression ::=
operand comparator operand
| operand BETWEEN operand AND operand
| operand IN ( operand ',' operand …)
| function
| condition AND condition
| condition OR condition
| NOT condition
| ( condition )
comparator ::=
=
| <>
| <
| <=
| >
| >=
function ::=
attribute_exists (term)
| attribute_not_exists (term)
| attribute_empty (term)
| begins_with (term, prefix)
| ends_with (term, suffix)
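Pulling the pieces together, and reusing the hypothetical attributes from the illustration above, a single-query request might look like the following sketch. The evaluation_expression is deliberately elided (its syntax belongs to the leveled_filter implementation), and the quoting of string literals inside the filter_expression is illustrative only:

```json
{
  "bucket": "people",
  "query_details": [
    {
      "index_name": "personnel_bin",
      "start_term": "SMITH",
      "end_term": "SMITH~",
      "evaluation_expression": "...",
      "filter_expression": "($dob BETWEEN '19880101' AND '19881231') AND begins_with($postcode, 'LS')"
    }
  ],
  "accumulation_option": "keys",
  "result_provision": "all"
}
```

With no aggregation_expression (i.e. none), a single query_detail is valid; a keys accumulation with result_provision all would return the full set of matching keys in one response.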
Examples
Some examples of potentially non-trivial things that can be achieved with this query API:
The tests within the leveled PR contain many different examples, in particular the unit tests of leveled_filter.
The query capability will fall short of what would be possible in OpenSearch, potentially in terms of performance as well as functionality; however, it will be a significant advance on the capability of other NoSQL solutions such as DynamoDB. With DynamoDB, although the filter expression support is similar, the restrictions that require a partition key on every index, and allow an object to appear only once on any given index, severely hamper the functional scope. Performance is also severely constrained by limits on RCUs per round trip.
Design
Erlang contains the leex and yecc parse tools, and these are simple-to-use tools for building a query language. Once the language is defined, it is simple to produce a filter function which first evaluates an index term (to convert it into individual projected attributes) and then applies a filter expression. The job of replacing a regular expression in the index fold with such a function is then trivial.
Alternative Design Ideas
An attempt was made to simply improve the efficiency of regular expressions and the 2i implementation, in particular via the HTTP API. It was assumed that moving to pre-compiled regular expressions would make a big difference, but this will probably only be the case for very complex regular expressions. Likewise, moving to RE2 was attempted using a NIF, but RE2 expressions only appear to be faster when pre-compiled, and there was no easy way to pre-compile an RE2 expression and then distribute that between Erlang nodes.
The difference between regular expressions and filter expressions is not just a matter of developer familiarity. The key change is support for BETWEEN, such as for date ranges: this is extremely complicated and error-prone with regular expressions, but simple with BETWEEN support.
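For example, restricting the hypothetical $dob attribute used in the earlier sketches to a window of dates is a single comparison in a filter expression (quoting of literals illustrative only):

```json
{
  "filter_expression": "$dob BETWEEN '19880216' AND '19890105'"
}
```

Expressing the same window as a back-tracking regular expression would require enumerating the valid digit ranges position by position, which is exactly the kind of pattern that has proved slow and error-prone under PCRE.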
There had been previous attempts to make the separation of index sort keys and projected attributes a more formal part of the definition of the indexes. However, this would not be backwards compatible. The evaluation expression allows the user to define projected attributes in relatively simple ways (i.e. using delimiters, or kv pairs).
Investigating the use of schemas to define indexes had a couple of issues:
In parallel to this, there is a project which is using the open nature of the nextgenrepl API to do both real-time replication and full-sync reconciliation from Riak to OpenSearch. This may provide an alternative approach, but it is still considered desirable to have an integrated query solution within Riak.
Testing
Initial performance testing indicates that even at fairly low levels of complexity, the combination of Evaluation Expression and Filter Expression is faster than using a regular expression.
Caveats
WIP
Pull Requests
martinsumner/leveled#440
Planned release for inclusion
Riak 3.4.0