Optimizer should recognize risk in cardinality estimation #59333

terry1purcell · 2025-02-07T20:56:54Z

Enhancement

When combining multiple predicates, the optimizer may use single-column statistics (NDVs, TopN, buckets) and index NDVs, and then combine the resulting selectivities under either an exponential backoff or an independence assumption. Neither assumption is backed by statistics. It would be beneficial for the optimizer to recognize when an estimate relies on such assumptions.

Using the following DDL and insert for test purposes:

CREATE TABLE t2 (
  a INT PRIMARY KEY,
  b INT,
  c INT,
  d INT,
  e INT,
  KEY (b, c, a),
  KEY (e, d, a)
);

SET @@cte_max_recursion_depth = 10000000;
INSERT INTO t2 (a, b, c, d, e)
SELECT a, MOD(a, 1000) AS b, MOD(a, 1000) AS c, MOD(a, 10000) AS d, MOD(a, 2) AS e
FROM (
  WITH RECURSIVE x AS (
    SELECT 1 AS a
    UNION ALL
    SELECT a + 1 AS a
    FROM x
    WHERE a < 1000000
  )
  SELECT a
  FROM x
) AS subquery;
ANALYZE TABLE t2;

Query examples

tidb> select count(*) from t2 where b = 0 and c = 0;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.01 sec)
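
To make the risk concrete, here is a rough sketch of what the two assumptions would predict for the query above, where b and c are perfectly correlated (both are mod(a, 1000)). The helper names are hypothetical, and the backoff formula (exponents 1, 1/2, 1/4, 1/8 applied to the selectivities sorted most selective first, capped at four terms) is the commonly described form of exponential backoff; it is an assumption here, not a statement of what TiDB computes.

```python
import math

TOTAL_ROWS = 1_000_000   # rows inserted into t2
ACTUAL_ROWS = 1000       # result of count(*) for b = 0 and c = 0

# Single-column selectivities from NDV: b and c each have 1000 distinct values.
sel_b = 1 / 1000
sel_c = 1 / 1000


def independence(selectivities):
    """Multiply selectivities as if the predicates were uncorrelated."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result


def exponential_backoff(selectivities, max_terms=4):
    """Assumed form: s1 * s2**(1/2) * s3**(1/4) * s4**(1/8),
    with selectivities sorted most selective first (hypothetical helper)."""
    ordered = sorted(selectivities)[:max_terms]
    result = 1.0
    for i, s in enumerate(ordered):
        result *= s ** (1 / 2 ** i)
    return result


est_independent = TOTAL_ROWS * independence([sel_b, sel_c])    # about 1 row
est_backoff = TOTAL_ROWS * exponential_backoff([sel_b, sel_c])  # about 31.6 rows

print(f"independence: {est_independent:.1f} rows")
print(f"backoff:      {est_backoff:.1f} rows")
print(f"actual:       {ACTUAL_ROWS} rows")
```

Under these assumptions, independence underestimates by roughly 1000x and backoff by roughly 30x. Neither estimate carries any signal that it rests on an unverified assumption, which is the gap this issue describes.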

terry1purcell added the type/enhancement label Feb 7, 2025

terry1purcell self-assigned this Feb 7, 2025

terry1purcell mentioned this issue Feb 7, 2025

planner: Recognize potential for correlation in subset index match (WIP) #58688

