Handle Varchar(n) in Velox #6888
Replies: 4 comments 10 replies
-
@tdcmeehan brought up an interesting Presto discussion around char and varchar collation semantics related to blank padding. |
Beta Was this translation helpful? Give feedback.
-
@mbasmanova I remember a similar discussion back in the day, but not sure if there was a technical reason why we decided not add support for it, or just something we haven't had time to look at. More generally, I guess in order to support this we would need a fixed-length array/string vector type. There was an initial implementation for something similar back in the day which someone created to support tensors, but I don't think we it was properly hooked up around the codebase. @mbasmanova do you know if we removed that code? |
Beta Was this translation helpful? Give feedback.
-
@mbasmanova : If we decide that the optimizations for memory allocation or addressing warrant a new physical type, then my thinking was that if we expose this physical type via a StringView like API, then it would be easy to flip any existing logic written using StringView (and VARCHAR type) to the new physical type. This might be overthinking for now :) |
Beta Was this translation helpful? Give feedback.
-
I've put together a document about the support. I've made a prototype with the changes and still working some aspects around the fuzzer. If you get a chance please take a look. |
Beta Was this translation helpful? Give feedback.
-
Both Presto and Spark allow specifying an optional max length for varchar type
https://prestodb.io/docs/current/language/types.html#string
https://spark.apache.org/docs/latest/sql-ref-datatypes.html
Velox Varchar type doesn't support max length semantics.
Right now, Presto always uses Velox varchar irrespective if the varchar has a max length. This causes incorrect results for many SQL.
I tried to error out such invalid CAST expressions at the Prestissimo layer, but that failed many SQL queries in the native tests. So I feel doing so is very restrictive. It would be better to support varchar(n) natively in Velox.
Starting this discussion to understand :
@mbasmanova @pedroerp @oerling
Beta Was this translation helpful? Give feedback.
All reactions