I've been working with the TPC-H dataset (Scale Factor 1) in DeepDB and noticed an unusual pattern in cardinality estimation (CE). When querying numerical columns with a limited number of distinct values, such as ORDERKEY and PARTKEY in the LINEITEM table (6,001,215 rows in total), the predicted cardinalities come out as either exactly 1 or as multiples of the inverse of the sampling rate. For example, with samples_per_spn = 1000000 1000000 1000000 1000000 1000000, the CE results were 1, 6, 12, 18, ... (the sampling rate is 1,000,000 / 6,001,215 ≈ 1/6, so its inverse is ≈ 6).
This occurs even after listing these columns under the no_compression section of the schema file to avoid compression effects. I'd appreciate any guidance or recommendations to mitigate this issue.
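To illustrate the arithmetic behind the pattern I'm seeing, here is a minimal Python sketch (not DeepDB code; the constants are just the numbers from this report and the helper function is hypothetical). It shows how scaling a per-sample match count back to the full table quantizes the estimate to multiples of roughly 6, with a floor of 1:

TOTAL_ROWS = 6_001_215      # rows in LINEITEM at SF 1
SAMPLE_SIZE = 1_000_000     # samples_per_spn used here

scale = TOTAL_ROWS / SAMPLE_SIZE   # ~6.0: each sampled row stands in for ~6 real rows

def estimate_cardinality(matching_sample_rows: int) -> float:
    """Scale the number of matching sampled rows back to the full table (illustrative only)."""
    # With only a handful of matches in the sample, the estimate can only take
    # values near 6, 12, 18, ...; zero matches is typically floored at 1.
    return max(1.0, matching_sample_rows * scale)

for k in range(4):
    print(k, "matches in sample ->", round(estimate_cardinality(k), 1))
# 0 -> 1.0, 1 -> 6.0, 2 -> 12.0, 3 -> 18.0

If this is indeed the mechanism, it would suggest that predicates on these columns rarely match more than a few sampled rows, so the estimates never leave this coarse grid.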