Transforms some filters into range scans #2807

evenyag · 2023-11-23T14:08:51Z

What type of enhancement is this?

Performance

What does the enhancement do?

Converts query conditions into primary key ranges to scan.

Here is an example from TiDB's docs: https://docs.pingcap.com/tidb/stable/explain-indexes#indexlookup

EXPLAIN SELECT * FROM t1 WHERE intkey BETWEEN 300 AND 310;

+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| id                            | estRows | task      | access object                  | operator info                     |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| IndexLookUp_10                | 5.67    | root      |                                |                                   |
| ├─IndexRangeScan_8(Build)     | 5.67    | cop[tikv] | table:t1, index:intkey(intkey) | range:[300,310], keep order:false |
| └─TableRowIDScan_9(Probe)     | 5.67    | cop[tikv] | table:t1                       | keep order:false                  |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
3 rows in set (0.00 sec)

Implementation challenges

We might perform the transformation in the query optimizers.

The storage engine can reduce the key range to scan or rows to return. We can implement range scan in the mito engine.

The memtable can only scan a subset of keys
The SST reader can further filter rows by key ranges.

KKould · 2024-02-22T12:27:46Z

Is there a more detailed description or issue to facilitate participation and contribution?
I am very interested in this issue

evenyag · 2024-02-24T04:09:07Z

There are some steps

implement a detacher to get index ranges to scan from filters. Here is an example that we can follow
support range scan or pruning in both memtables and SSTs
transform filters to ranges in the query optimizer or the storage engine
compare with existing min-max and inverted index to see whether we can benefit from it
merge them into the main branch

We can try to extract ranges from filters in the table predicate as the first step. Each range bound could be a Vec<Value>.

KKould · 2024-02-29T18:00:26Z

There are some steps

implement a detacher to get index ranges to scan from filters. Here is an example that we can follow

support range scan or pruning in both memtables and SSTs

transform filters to ranges in the query optimizer or the storage engine

compare with existing min-max and inverted index to see whether we can benefit from it

merge them into the main branch

We can try to extract ranges from filters in the table predicate as the first step. Each range bound could be a Vec<Value>.

I thought it might be appropriate that the range bound should be something like a structure like this?

struct Range {
   start: Bound<Value>,
   end: Bound<Value>,
}

Tips: GenericRange looks similar, but is it appropriate to confuse Lt/Gt with Lteq/Gteq so that it ignores its precision?

I found that TimeRangePredicateBuilder::extract_time_range_from_expr is very similar, so can I refer to it for implementation?

evenyag · 2024-03-01T06:04:53Z

I thought it might be appropriate that the range bound should be something like a structure like this?
struct Range {
   start: Bound<Value>,
   end: Bound<Value>,
}

A bound might contain multiple columns if we have a compound key. So a bound could be:

struct Bound {
    values: Vec<Value>,
    inclusive: bool,
}

If we encode values by memcomparable encoding, we could also reuse the bound struct here. But I'm not sure whether it is a good idea.

greptimedb/src/index/src/inverted_index/search/predicate.rs

Lines 40 to 47 in d4a54a0

    
           /// `Bound` is a sub-component of a range, representing a single-sided limit that could be inclusive or exclusive. 
        
           #[derive(Debug, Clone, PartialEq, Eq)] 
        
           pub struct Bound { 
        
               /// Whether the bound is inclusive or exclusive. 
        
               pub inclusive: bool, 
        
               /// The value of the bound. 
        
               pub value: Bytes, 
        
           }

I found that TimeRangePredicateBuilder::extract_time_range_from_expr is very similar, so can I refer to it for implementation?

Yes. Note that the TimeRangePredicateBuilder only supports extracting range for a single column. Extracting ranges is the most complicated thing. I'm also considering the necessity of this issue as we are implementing an inverted index. We could also construct some skipping indices in memory to speed up querying the memtable.

Reference

KKould · 2024-03-07T07:41:02Z

Recently I read some articles about index selection, and I found a rule:

Use combined indexes when querying with multiple conditions. Note that columns with equivalent conditions need to be placed in the front of the combined index. Here is an example: if the select* from t where c1 = 10 and c2 = 100 and c3 > 10 query is frequently used, consider creating a combined index Index cidx (c1, c2, c3), so that a index prefix can be constructed to scan by query conditions.

So I'm wondering if I can use something like TimeRangePredicateBuilder to sequentially obtain the relevant columns in the composite index, and only when the previous one is of equal value can I detach the next column.

Case 1: There is an index with three columns a, b, c. When WHERE a = 1 AND b > 1, the range of a=1 and b >1 can be extracted. The index can use the composite index of a, b, c. (if WHERE a > 1 AND b = 1 then the index cannot be used)

Case 2: There are two indexes with one column a and three columns a, b, c. When WHERE (a = 1 AND b > 1) OR a > 1, the range of a AND b cannot be extracted, but at this time the range of a range can be extracted and is a >= 1, so you can use a single-column index with column a

Tips: Case 2 can use similar index merging to perform better queries, but this may be optimized later.

ref:

https://docs.pingcap.com/tidb/stable/dev-guide-index-best-practice

evenyag · 2024-03-07T12:05:30Z

So I'm wondering if I can use something like TimeRangePredicateBuilder to sequentially obtain the relevant columns in the composite index, and only when the previous one is of equal value can I detach the next column.

Yes, it is possible. Note that we should support both WHERE a = 1 AND b > 1 and WHERE b > 1 AND a = 1.

KKould · 2024-03-07T16:22:55Z

So I'm wondering if I can use something like TimeRangePredicateBuilder to sequentially obtain the relevant columns in the composite index, and only when the previous one is of equal value can I detach the next column.

Yes, it is possible. Note that we should support both WHERE a = 1 AND b > 1 and WHERE b > 1 AND a = 1.

Yes, the order should not lead to inconsistent results

I'll try to verify the correctness of this

evenyag added C-enhancement Category Enhancements C-performance Category Performance labels Nov 23, 2023

evenyag self-assigned this Nov 23, 2023

evenyag added this to mito2 Nov 27, 2023

evenyag moved this to Todo in mito2 Nov 27, 2023

github-actions bot unassigned evenyag Mar 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transforms some filters into range scans #2807

Transforms some filters into range scans #2807

evenyag commented Nov 23, 2023

KKould commented Feb 22, 2024

evenyag commented Feb 24, 2024

KKould commented Feb 29, 2024 •

edited

Loading

evenyag commented Mar 1, 2024 •

edited

Loading

KKould commented Mar 7, 2024 •

edited

Loading

evenyag commented Mar 7, 2024

KKould commented Mar 7, 2024

Transforms some filters into range scans #2807

Transforms some filters into range scans #2807

Comments

evenyag commented Nov 23, 2023

What type of enhancement is this?

What does the enhancement do?

Implementation challenges

KKould commented Feb 22, 2024

evenyag commented Feb 24, 2024

KKould commented Feb 29, 2024 • edited Loading

evenyag commented Mar 1, 2024 • edited Loading

KKould commented Mar 7, 2024 • edited Loading

evenyag commented Mar 7, 2024

KKould commented Mar 7, 2024

KKould commented Feb 29, 2024 •

edited

Loading

evenyag commented Mar 1, 2024 •

edited

Loading

KKould commented Mar 7, 2024 •

edited

Loading