layout | title | nav_order |
---|---|---|
page |
Supported Operators |
6 |
Apache Spark supports processing various types of data. Not all expressions support all data types. The RAPIDS Accelerator for Apache Spark has further restrictions on what types are supported for processing. This tries to document what operations are supported and what data types each operation supports. Because Apache Spark is under active development too and this document was generated against version 3.1.1 of Spark. Most of this should still apply to other versions of Spark, but there may be slight changes.
The Decimal
type in Spark supports a precision up to 38 digits (128-bits).
The RAPIDS Accelerator supports 128-bit starting from version 21.12 and decimals are
enabled by default.
Please check Decimal Support for more details.
Decimal
precision and scale follow the same rule as CPU mode in Apache Spark:
* In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2
* respectively, then the following operations have the following precision / scale:
*
* Operation Result Precision Result Scale
* ------------------------------------------------------------------------
* e1 + e2 max(s1, s2) + max(p1-s1, p2-s2) + 1 max(s1, s2)
* e1 - e2 max(s1, s2) + max(p1-s1, p2-s2) + 1 max(s1, s2)
* e1 * e2 p1 + p2 + 1 s1 + s2
* e1 / e2 p1 - s1 + s2 + max(6, s1 + p2 + 1) max(6, s1 + p2 + 1)
* e1 % e2 min(p1-s1, p2-s2) + max(s1, s2) max(s1, s2)
* e1 union e2 max(s1, s2) + max(p1-s1, p2-s2) max(s1, s2)
However, Spark inserts PromotePrecision
to CAST both sides to the same type.
GPU mode may fall back to CPU even if the result Decimal precision is within 18 digits.
For example, Decimal(8,2)
x Decimal(6,3)
resulting in Decimal (15,5)
runs on CPU,
because due to PromotePrecision
, GPU mode assumes the result is Decimal(19,6)
.
There are even extreme cases where Spark can temporarily return a Decimal value
larger than what can be stored in 128-bits and then uses the CheckOverflow
operator to round it to a desired precision and scale. This means that even when
the accelerator supports 128-bit decimal, we might not be able to support all
operations that Spark can support.
Timestamps in Spark will all be converted to the local time zone before processing and are often converted to UTC before being stored, like in Parquet or ORC. The RAPIDS Accelerator only supports UTC as the time zone for timestamps.
In Spark CalendarInterval
s store three values, months, days, and microseconds.
Support for this type is still very limited in the accelerator. In some cases
only a a subset of the type is supported, like window ranges only support days currently.
There are lots of different configuration values that can impact if an operation is supported or not. Some of these are a part of the RAPIDS Accelerator and cover the level of compatibility with Apache Spark. Those are covered here. Others are a part of Apache Spark itself and those are a bit harder to document. The work of updating this to cover that support is still ongoing.
In general though if you ever have any question about why an operation is not running
on the GPU you may set spark.rapids.sql.explain
to ALL and it will try to give all of
the reasons why this particular operator or expression is on the CPU or GPU.
Type Name | Type Description |
---|---|
BOOLEAN | Holds true or false values. |
BYTE | Signed 8-bit integer value. |
SHORT | Signed 16-bit integer value. |
INT | Signed 32-bit integer value. |
LONG | Signed 64-bit integer value. |
FLOAT | 32-bit floating point value. |
DOUBLE | 64-bit floating point value. |
DATE | A date with no time component. Stored as 32-bit integer with days since Jan 1, 1970. |
TIMESTAMP | A date and time. Stored as 64-bit integer with microseconds since Jan 1, 1970 in the current time zone. |
STRING | A text string. Stored as UTF-8 encoded bytes. |
DECIMAL | A fixed point decimal value with configurable precision and scale. |
NULL | Only stores null values and is typically only used when no other type can be determined from the SQL. |
BINARY | An array of non-nullable bytes. |
CALENDAR | Represents a period of time. Stored as months, days and microseconds. |
ARRAY | A sequence of elements. |
MAP | A set of key value pairs, the keys cannot be null. |
STRUCT | A series of named fields. |
UDT | User defined types and java Objects. These are not standard SQL types. |
Value | Description |
---|---|
S | (Supported) Both Apache Spark and the RAPIDS Accelerator support this type fully. |
(Not Applicable) Neither Spark not the RAPIDS Accelerator support this type in this situation. | |
PS | (Partial Support) Apache Spark supports this type, but the RAPIDS Accelerator only partially supports it. An explanation for what is missing will be included with this. |
NS | (Not Supported) Apache Spark supports this type but the RAPIDS Accelerator does not. |
Apache Spark uses a Directed Acyclic Graph(DAG) of processing to build a query.
The nodes in this graph are instances of SparkPlan
and represent various high
level operations like doing a filter or project. The operations that the RAPIDS
Accelerator supports are described below.
Executor | Description | Notes | Param(s) | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CoalesceExec | The backend for the dataframe coalesce method | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
CollectLimitExec | Reduce to single partition and apply limit | This is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
ExpandExec | The backend for the expand operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
FileSourceScanExec | Reading data from files, often from Hive tables | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
FilterExec | The backend for most filter statements | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
GenerateExec | The backend for operations that generate more output rows than input rows like explode | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
GlobalLimitExec | Limiting of results across partitions | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
LocalLimitExec | Per-partition limiting of results | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
ProjectExec | The backend for most select, withColumn and dropColumn statements | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
RangeExec | The backend for range operator | None | Input/Output | S | |||||||||||||||||
SampleExec | The backend for the sample operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
SortExec | The backend for the sort operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
NS |
SubqueryBroadcastExec | Plan to collect and transform the broadcast key values | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
S |
TakeOrderedAndProjectExec | Take the first limit elements as defined by the sortOrder, and do projection if needed | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
UnionExec | The backend for the union operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS unionByName will not optionally impute nulls for missing struct fields when the column is a struct and there are non-overlapping fields; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
Executor | Description | Notes | Param(s) | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CustomShuffleReaderExec | A wrapper of shuffle query stage | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
HashAggregateExec | The backend for hash based aggregations | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
ObjectHashAggregateExec | The backend for hash based aggregations supporting TypedImperativeAggregate functions | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | PS only allowed when aggregate buffers can be converted between CPU and GPU |
NS | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
NS |
SortAggregateExec | The backend for sort based aggregations | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | PS only allowed when aggregate buffers can be converted between CPU and GPU |
NS | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
NS |
InMemoryTableScanExec | Implementation of InMemoryTableScanExec to use GPU accelerated caching | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT |
NS |
DataWritingCommandExec | Writing data | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | PS 128bit decimal only supported for Orc and Parquet |
NS | NS | NS | PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT |
PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT |
PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT |
NS |
BatchScanExec | The backend for most file input | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT |
NS |
BroadcastExchangeExec | The backend for broadcast exchange of data | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
NS |
ShuffleExchangeExec | The backend for most data being exchanged between processes | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS Round-robin partitioning is not supported for nested structs if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
BroadcastHashJoinExec | Implementation of join using broadcast data | None | leftKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT |
NS | |
rightKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT |
NS | ||||
condition | S | ||||||||||||||||||||
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||
BroadcastNestedLoopJoinExec | Implementation of join using brute force. Full outer joins and joins where the broadcast side matches the join side (e.g.: LeftOuter with left broadcast) are not supported | None | condition (A non-inner join only is supported if the condition expression can be converted to a GPU AST expression) |
S | |||||||||||||||||
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||
Executor | Description | Notes | Param(s) | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CartesianProductExec | Implementation of join using brute force | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
ShuffledHashJoinExec | Implementation of join using hashed shuffled data | None | leftKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT |
NS | |
rightKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT |
NS | ||||
condition | S | ||||||||||||||||||||
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||
SortMergeJoinExec | Sort merge join, replacing with shuffled hash join | None | leftKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT |
NS | |
rightKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT |
NS | ||||
condition | S | ||||||||||||||||||||
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||
AggregateInPandasExec | The backend for an Aggregation Pandas UDF, this accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | NS | NS | NS | NS |
ArrowEvalPythonExec | The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS |
FlatMapGroupsInPandasExec | The backend for Flat Map Groups Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | NS | NS | NS | NS |
MapInPandasExec | The backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS |
WindowInPandasExec | The backend for Window Aggregation Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row based window frame. | This is disabled by default because it only supports row based frame for now | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
NS | NS | NS |
WindowExec | Window-operator backend | None | partitionSpec | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
NS |
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
Inside each node in the DAG there can be one or more trees of expressions that describe various types of processing that happens in that part of the plan. These can be things like adding two numbers together or checking for null. These expressions can have multiple input parameters and one output value. These expressions also can happen in different contexts. Because of how the accelerator works different contexts have different levels of support.
The most common expression context is project
. In this context values from a single
input row go through the expression and the result will also be use to produce
something in the same row. Be aware that even in the case of aggregation and window
operations most of the processing is still done in the project context either before
or after the other processing happens.
Aggregation operations like count or sum can take place in either the aggregation
,
reduction
, or window
context. aggregation
is when the operation was done while
grouping the data by one or more keys. reduction
is when there is no group by and
there is a single result for an entire column. window
is for window operations.
The final expression context is AST
or Abstract Syntax Tree.
Before explaining AST we first need to explain in detail how project context operations
work. Generally for a project context operation the plan Spark developed is read
on the CPU and an appropriate set of GPU kernels are selected to do those
operations. For example a >= b + 1
. Would result in calling a GPU kernel to add
1
to b
, followed by another kernel that is called to compare a
to that result.
The interpretation is happening on the CPU, and the GPU is used to do the processing.
For AST the interpretation for some reason cannot happen on the CPU and instead must
be done in the GPU kernel itself. An example of this is conditional joins. If you
want to join on A.a >= B.b + 1
where A
and B
are separate tables or data
frames, the +
and >=
operations cannot run as separate independent kernels
because it is done on a combination of rows in both A
and B
. Instead part of the
plan that Spark developed is turned into an abstract syntax tree and sent to the GPU
where it can be interpreted. The number and types of operations supported in this
are limited.
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abs | `abs` | Absolute value | None | project | input | S | S | S | S | S | S | S | |||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
AST | input | NS | NS | S | S | S | S | NS | |||||||||||||||
result | NS | NS | S | S | S | S | NS | ||||||||||||||||
Acos | `acos` | Inverse cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Acosh | `acosh` | Inverse hyperbolic cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Add | `+` | Addition | None | project | lhs | S | S | S | S | S | S | S | NS | ||||||||||
rhs | S | S | S | S | S | S | S | NS | |||||||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | lhs | NS | NS | S | S | S | S | NS | NS | ||||||||||||||
rhs | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
result | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Alias | Gives a column a name | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
AST | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
And | `and` | Logical AND | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | |||||||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
ArrayContains | `array_contains` | Returns a boolean if the array contains the passed in key | None | project | array | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
|||||||||||||||||
key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | S | ||||||||||||||||||||||
ArrayExists | `exists` | Return true if any element satisfies the predicate LambdaFunction | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||||
function | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ArrayMax | `array_max` | Returns the maximum value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
|||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
ArrayMin | `array_min` | Returns the minimum value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
|||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
ArrayRepeat | `array_repeat` | Returns the array containing the given input value (left) count (right) times | None | project | left | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
right | S | S | S | S | |||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
ArrayTransform | `transform` | Transform elements in an array using the transform function. This is similar to a `map` in functional programming | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||||
function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
ArraysZip | `arrays_zip` | Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays. | None | project | children | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
Asin | `asin` | Inverse sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Asinh | `asinh` | Inverse hyperbolic sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
AtLeastNNonNulls | Checks if number of non null/Nan values is greater than a given value | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |
result | S | ||||||||||||||||||||||
Atan | `atan` | Inverse tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Atanh | `atanh` | Inverse hyperbolic tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
AttributeReference | References an input column | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |
AST | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
BRound | `bround` | Round an expression to d decimal places using HALF_EVEN rounding mode | None | project | value | S | S | S | S | PS result may round slightly differently |
PS result may round slightly differently |
S | |||||||||||
scale | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
BitLength | `bit_length` | The bit length of string data | None | project | input | S | NS | ||||||||||||||||
result | S | ||||||||||||||||||||||
BitwiseAnd | `&` | Returns the bitwise AND of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | lhs | NS | NS | S | S | ||||||||||||||||||
rhs | NS | NS | S | S | |||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
BitwiseNot | `~` | Returns the bitwise NOT of the operands | None | project | input | S | S | S | S | ||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | input | NS | NS | S | S | ||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
BitwiseOr | `\|` | Returns the bitwise OR of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | lhs | NS | NS | S | S | ||||||||||||||||||
rhs | NS | NS | S | S | |||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
BitwiseXor | `^` | Returns the bitwise XOR of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | lhs | NS | NS | S | S | ||||||||||||||||||
rhs | NS | NS | S | S | |||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
CaseWhen | `when` | CASE WHEN expression | None | project | predicate | S | |||||||||||||||||
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Cbrt | `cbrt` | Cube root | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Ceil | `ceiling`, `ceil` | Ceiling of a number | None | project | input | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
CheckOverflow | CheckOverflow after arithmetic operations between DecimalType data | None | project | input | S | ||||||||||||||||||
result | S | ||||||||||||||||||||||
Coalesce | `coalesce` | Returns the first non-null argument if exists. Otherwise, null | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | |||||
Concat | `concat` | List/String concatenate | None | project | input | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||
result | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||
ConcatWs | `concat_ws` | Concatenates multiple input strings or array of strings into a single string using a given separator | None | project | input | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
Contains | Contains | None | project | src | S | ||||||||||||||||||
search | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Cos | `cos` | Cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Cosh | `cosh` | Hyperbolic cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Cot | `cot` | Cotangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
CreateArray | `array` | Returns an array with the given elements | None | project | arg | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS |
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
||||||||||||||||||||||
CreateMap | `map` | Create a map | None | project | key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
||||
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CreateNamedStruct | `named_struct`, `struct` | Creates a struct with the given field names and values | None | project | name | S | |||||||||||||||||
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
CurrentRow$ | Special boundary for a window frame, indicating stopping at the current row | None | project | result | S | ||||||||||||||||||
DateAdd | `date_add` | Returns the date that is num_days after start_date | None | project | startDate | S | |||||||||||||||||
days | S | S | S | ||||||||||||||||||||
result | S | ||||||||||||||||||||||
DateAddInterval | Adds interval to date | None | project | start | S | ||||||||||||||||||
interval | PS month intervals are not supported; Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
DateDiff | `datediff` | Returns the number of days from startDate to endDate | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
DateFormatClass | `date_format` | Converts timestamp to a value of string in the format specified by the date format | None | project | timestamp | PS UTC is only supported TZ for TIMESTAMP |
|||||||||||||||||
strfmt | PS A limited number of formats are supported; Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
DateSub | `date_sub` | Returns the date that is num_days before start_date | None | project | startDate | S | |||||||||||||||||
days | S | S | S | ||||||||||||||||||||
result | S | ||||||||||||||||||||||
DayOfMonth | `dayofmonth`, `day` | Returns the day of the month from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
DayOfWeek | `dayofweek` | Returns the day of the week (1 = Sunday...7=Saturday) | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
DayOfYear | `dayofyear` | Returns the day of the year from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
DenseRank | `dense_rank` | Window function that returns the dense rank value within the aggregation window | None | window | ordering | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | NS |
result | S | ||||||||||||||||||||||
Divide | `/` | Division | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | PS Because of Spark's inner workings the full range of decimal precision (even for 128-bit values) is not supported. |
|||||||||||||||||||||
ElementAt | `element_at` | Returns element of array at given(1-based) index in value if column is array. Returns value for the given key in value if column is map. | None | project | array/map | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS If it's map, only primitive key types are supported.; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||
index/key | PS Unsupported as array index. Only Literals supported as map keys. |
PS Unsupported as array index. Only Literals supported as map keys. |
PS Unsupported as array index. Only Literals supported as map keys. |
PS Supported as array index. Only Literals supported as map keys. |
PS Unsupported as array index. Only Literals supported as map keys. |
PS Unsupported as array index. Only Literals supported as map keys. |
PS Unsupported as array index. Only Literals supported as map keys. |
PS Unsupported as array index. Only Literals supported as map keys. |
PS Unsupported as array index. Only Literals supported as map keys.; UTC is only supported TZ for TIMESTAMP |
PS Unsupported as array index. Only Literals supported as map keys. |
PS Unsupported as array index. Only Literals supported as map keys. |
NS | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
EndsWith | Ends with | None | project | src | S | ||||||||||||||||||
search | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
EqualNullSafe | `<=>` | Check if the values are equal including nulls <=> | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
EqualTo | `=`, `==` | Check if the values are equal | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Exp | `exp` | Euler's number e raised to a power | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Explode | `explode`, `explode_outer` | Given an input array produces a sequence of rows for each value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
Expm1 | `expm1` | Euler's number e raised to a power minus 1 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Floor | `floor` | Floor of a number | None | project | input | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
FromUnixTime | `from_unixtime` | Get the string from a unix timestamp | None | project | sec | S | |||||||||||||||||
format | PS Only a limited number of formats are supported; Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
GetArrayItem | Gets the field at `ordinal` in the Array | None | project | array | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||
ordinal | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
GetArrayStructFields | Extracts the `ordinal`-th fields of all array elements for the data with the type of array of struct | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
GetJsonObject | `get_json_object` | Extracts a json object from path | None | project | json | S | |||||||||||||||||
path | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
GetMapValue | Gets Value from a Map based on a key | None | project | map | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||
key | PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS UTC is only supported TZ for TIMESTAMP; Literal value only |
PS Literal value only |
PS max DECIMAL precision of 18; Literal value only |
NS | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
GetStructField | Gets the named field of the struct | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
GetTimestamp | Gets timestamps from strings using given pattern. | None | project | timeExp | S | PS UTC is only supported TZ for TIMESTAMP |
S | ||||||||||||||||
format | PS A limited number of formats are supported; Literal value only |
||||||||||||||||||||||
result | PS UTC is only supported TZ for TIMESTAMP |
||||||||||||||||||||||
GreaterThan | `>` | > operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
GreaterThanOrEqual | `>=` | >= operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Greatest | `greatest` | Returns the greatest value of all parameters, skipping null values | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
Hour | `hour` | Returns the hour component of the string/timestamp | None | project | input | PS UTC is only supported TZ for TIMESTAMP |
|||||||||||||||||
result | S | ||||||||||||||||||||||
Hypot | `hypot` | Pythagorean addition (Hypotenuse) of real numbers | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
If | `if` | IF expression | None | project | predicate | S | |||||||||||||||||
trueValue | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
falseValue | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
In | `in` | IN operator | None | project | value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |
list | PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS Literal value only |
PS UTC is only supported TZ for TIMESTAMP; Literal value only |
PS Literal value only |
PS Literal value only |
NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
InSet | INSET operator | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||
result | S | ||||||||||||||||||||||
InitCap | `initcap` | Returns str with the first letter of each word in uppercase. All other letters are in lowercase | This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
InputFileBlockLength | `input_file_block_length` | Returns the length of the block being read, or -1 if not available | None | project | result | S | |||||||||||||||||
InputFileBlockStart | `input_file_block_start` | Returns the start offset of the block being read, or -1 if not available | None | project | result | S | |||||||||||||||||
InputFileName | `input_file_name` | Returns the name of the file being read, or empty string if not available | None | project | result | S | |||||||||||||||||
IntegralDivide | `div` | Division with a integer result | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
IsNaN | `isnan` | Checks if a value is NaN | None | project | input | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
IsNotNull | `isnotnull` | Checks if a value is not null | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
result | S | ||||||||||||||||||||||
IsNull | `isnull` | Checks if a value is null | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
result | S | ||||||||||||||||||||||
KnownFloatingPointNormalized | Tag to prevent redundant normalization | None | project | input | S | S | |||||||||||||||||
result | S | S | |||||||||||||||||||||
KnownNotNull | Tag an expression as known to not be null | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT |
NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT |
NS | |||||
Lag | `lag` | Window function that returns N entries behind this one | None | window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS |
offset | S | ||||||||||||||||||||||
default | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | |||||
LambdaFunction | Holds a higher order SQL function | None | project | function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |
arguments | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
LastDay | `last_day` | Returns the last day of the month which the date belongs to | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Lead | `lead` | Window function that returns N entries ahead of this one | None | window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS |
offset | S | ||||||||||||||||||||||
default | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | |||||
Least | `least` | Returns the least value of all parameters, skipping null values | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
Length | `length`, `character_length`, `char_length` | String character length or binary byte length | None | project | input | S | NS | ||||||||||||||||
result | S | ||||||||||||||||||||||
LessThan | `<` | < operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
LessThanOrEqual | `<=` | <= operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Like | `like` | Like | None | project | src | S | |||||||||||||||||
search | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Literal | Holds a static value from the query | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |
AST | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
Log | `ln` | Natural log | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Log10 | `log10` | Log base 10 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Log1p | `log1p` | Natural log 1 + expr | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Log2 | `log2` | Log base 2 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Logarithm | `log` | Log variable base | None | project | value | S | |||||||||||||||||
base | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Lower | `lower`, `lcase` | String lowercase operator | This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
MakeDecimal | Create a Decimal from an unscaled long value for some aggregation optimizations | None | project | input | S | ||||||||||||||||||
result | PS max DECIMAL precision of 18 |
||||||||||||||||||||||
MapConcat | `map_concat` | Returns the union of all the given maps | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
|||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
||||||||||||||||||||||
MapEntries | `map_entries` | Returns an unordered array of all entries in the given map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
MapFilter | `map_filter` | Filters entries in a map using the function | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||||
function | S | ||||||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
MapKeys | `map_keys` | Returns an unordered array containing the keys of the map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
MapValues | `map_values` | Returns an unordered array containing the values of the map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
Md5 | `md5` | MD5 hash operator | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Minute | `minute` | Returns the minute component of the string/timestamp | None | project | input | PS UTC is only supported TZ for TIMESTAMP |
|||||||||||||||||
result | S | ||||||||||||||||||||||
MonotonicallyIncreasingID | `monotonically_increasing_id` | Returns monotonically increasing 64-bit integers | None | project | result | S | |||||||||||||||||
Month | `month` | Returns the month from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Multiply | `*` | Multiplication | None | project | lhs | S | S | S | S | S | S | S | |||||||||||
rhs | S | S | S | S | S | S | S | ||||||||||||||||
result | S | S | S | S | S | S | PS Because of Spark's inner workings the full range of decimal precision (even for 128-bit values) is not supported. |
||||||||||||||||
AST | lhs | NS | NS | S | S | S | S | NS | |||||||||||||||
rhs | NS | NS | S | S | S | S | NS | ||||||||||||||||
result | NS | NS | S | S | S | S | NS | ||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Murmur3Hash | `hash` | Murmur3 hash operator | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
NS |
result | S | ||||||||||||||||||||||
NaNvl | `nanvl` | Evaluates to `left` iff left is not NaN, `right` otherwise | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | S | |||||||||||||||||||||
NamedLambdaVariable | A parameter to a higher order SQL function | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |
Not | `!`, `not` | Boolean not operator | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
OctetLength | `octet_length` | The byte length of string data | None | project | input | S | NS | ||||||||||||||||
result | S | ||||||||||||||||||||||
Or | `or` | Logical OR | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | |||||||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
PercentRank | `percent_rank` | Window function that returns the percent rank value within the aggregation window | None | window | ordering | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | NS |
result | S | ||||||||||||||||||||||
Pmod | `pmod` | Pmod | None | project | lhs | S | S | S | S | S | S | S | |||||||||||
rhs | S | S | S | S | S | S | S | ||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
PosExplode | `posexplode_outer`, `posexplode` | Given an input array produces a sequence of rows for each value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
Pow | `pow`, `power` | lhs ^ rhs | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | |||||||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
PreciseTimestampConversion | Expression used internally to convert the TimestampType to Long and back without losing precision, i.e. in microseconds. Used in time windowing | None | project | input | S | PS UTC is only supported TZ for TIMESTAMP |
|||||||||||||||||
result | S | PS UTC is only supported TZ for TIMESTAMP |
|||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
PromotePrecision | PromotePrecision before arithmetic operations between DecimalType data | None | project | input | S | ||||||||||||||||||
result | S | ||||||||||||||||||||||
PythonUDF | UDF run in an external python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated | None | aggregation | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP |
|||||||
reduction | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP |
|||||||
window | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP |
|||||||
project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT |
NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP |
|||||||
Quarter | `quarter` | Returns the quarter of the year for date, in the range 1 to 4 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
RLike | `rlike` | Regular expression version of Like | None | project | str | S | |||||||||||||||||
regexp | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
RaiseError | `raise_error` | Throw an exception | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Rand | `random`, `rand` | Generate a random column with i.i.d. uniformly distributed values in [0, 1) | None | project | seed | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
Rank | `rank` | Window function that returns the rank value within the aggregation window | None | window | ordering | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | NS |
result | S | ||||||||||||||||||||||
RegExpExtract | `regexp_extract` | Extract a specific group identified by a regular expression | None | project | str | S | |||||||||||||||||
regexp | PS Literal value only |
||||||||||||||||||||||
idx | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
RegExpReplace | `regexp_replace` | String replace using a regular expression pattern | None | project | regex | PS Literal value only |
|||||||||||||||||
result | S | ||||||||||||||||||||||
pos | PS only a value of 1 is supported |
||||||||||||||||||||||
str | S | ||||||||||||||||||||||
rep | PS Literal value only |
||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Remainder | `%`, `mod` | Remainder or modulo | None | project | lhs | S | S | S | S | S | S | S | |||||||||||
rhs | S | S | S | S | S | S | S | ||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
ReplicateRows | Given an input row replicates the row N times | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
NS | |
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT |
||||||||||||||||||||||
Rint | `rint` | Rounds up a double value to the nearest double equal to an integer | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Round | `round` | Round an expression to d decimal places using HALF_UP rounding mode | None | project | value | S | S | S | S | PS result may round slightly differently |
PS result may round slightly differently |
S | |||||||||||
scale | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
RowNumber | `row_number` | Window function that returns the index for the row within the aggregation window | None | window | result | S | |||||||||||||||||
ScalaUDF | User Defined Function, the UDF can choose to implement a RAPIDS accelerated interface to get better performance. | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Second | `second` | Returns the second component of the string/timestamp | None | project | input | PS UTC is only supported TZ for TIMESTAMP |
|||||||||||||||||
result | S | ||||||||||||||||||||||
Sequence | `sequence` | Sequence | None | project | start | S | S | S | S | NS | NS | ||||||||||||
stop | S | S | S | S | NS | NS | |||||||||||||||||
step | S | S | S | S | NS | ||||||||||||||||||
result | PS unsupported child types DATE, TIMESTAMP |
||||||||||||||||||||||
ShiftLeft | `shiftleft` | Bitwise shift left (<<) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
ShiftRight | `shiftright` | Bitwise shift right (>>) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
ShiftRightUnsigned | `shiftrightunsigned` | Bitwise unsigned shift right (>>>) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Signum | `sign`, `signum` | Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Sin | `sin` | Sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Sinh | `sinh` | Hyperbolic sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Size | `size`, `cardinality` | The size of an array or a map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||
result | S | ||||||||||||||||||||||
SortArray | `sort_array` | Returns a sorted array with the input array and the ascending / descending order | None | project | array | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
|||||||||||||||||
ascendingOrder | S | ||||||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
SortOrder | Sort order | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT |
NS | ||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT |
NS | ||||||
SparkPartitionID | `spark_partition_id` | Returns the current partition id | None | project | result | S | |||||||||||||||||
SpecifiedWindowFrame | Specification of the width of the group (or "frame") of input rows around which a window function is evaluated | None | project | lower | S | S | S | S | NS | NS | NS | S | |||||||||||
upper | S | S | S | S | NS | NS | NS | S | |||||||||||||||
result | S | S | S | S | NS | NS | NS | S | |||||||||||||||
Sqrt | `sqrt` | Square root | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
StartsWith | Starts with | None | project | src | S | ||||||||||||||||||
search | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringLPad | `lpad` | Pad a string on the left | None | project | str | S | |||||||||||||||||
len | PS Literal value only |
||||||||||||||||||||||
pad | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
StringLocate | `position`, `locate` | Substring search operator | None | project | substr | PS Literal value only |
|||||||||||||||||
str | S | ||||||||||||||||||||||
start | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringRPad | `rpad` | Pad a string on the right | None | project | str | S | |||||||||||||||||
len | PS Literal value only |
||||||||||||||||||||||
pad | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringRepeat | `repeat` | StringRepeat operator that repeats the given strings with numbers of times given by repeatTimes | None | project | input | S | |||||||||||||||||
repeatTimes | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringReplace | `replace` | StringReplace operator | None | project | src | S | |||||||||||||||||
search | PS Literal value only |
||||||||||||||||||||||
replace | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
StringSplit | `split` | Splits `str` around occurrences that match `regex` | None | project | str | S | |||||||||||||||||
regexp | PS very limited subset of regex supported; Literal value only |
||||||||||||||||||||||
limit | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringToMap | `str_to_map` | Creates a map after splitting the input string into pairs of key-value strings | None | project | str | S | |||||||||||||||||
pairDelim | S | ||||||||||||||||||||||
keyValueDelim | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringTrim | `trim` | StringTrim operator | None | project | src | S | |||||||||||||||||
trimStr | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringTrimLeft | `ltrim` | StringTrimLeft operator | None | project | src | S | |||||||||||||||||
trimStr | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringTrimRight | `rtrim` | StringTrimRight operator | None | project | src | S | |||||||||||||||||
trimStr | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Substring | `substr`, `substring` | Substring operator | None | project | str | S | NS | ||||||||||||||||
pos | PS Literal value only |
||||||||||||||||||||||
len | PS Literal value only |
||||||||||||||||||||||
result | S | NS | |||||||||||||||||||||
SubstringIndex | `substring_index` | substring_index operator | None | project | str | S | |||||||||||||||||
delim | PS only a single character is allowed; Literal value only |
||||||||||||||||||||||
count | PS Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Subtract | `-` | Subtraction | None | project | lhs | S | S | S | S | S | S | S | NS | ||||||||||
rhs | S | S | S | S | S | S | S | NS | |||||||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | lhs | NS | NS | S | S | S | S | NS | NS | ||||||||||||||
rhs | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
result | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
Tan | `tan` | Tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Tanh | `tanh` | Hyperbolic tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
TimeAdd | Adds interval to timestamp | None | project | start | PS UTC is only supported TZ for TIMESTAMP |
||||||||||||||||||
interval | PS month intervals are not supported; Literal value only |
||||||||||||||||||||||
result | PS UTC is only supported TZ for TIMESTAMP |
||||||||||||||||||||||
ToDegrees | `degrees` | Converts radians to degrees | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
ToRadians | `radians` | Converts degrees to radians | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
ToUnixTimestamp | `to_unix_timestamp` | Returns the UNIX timestamp of the given time | None | project | timeExp | S | PS UTC is only supported TZ for TIMESTAMP |
S | |||||||||||||||
format | PS A limited number of formats are supported; Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
TransformKeys | `transform_keys` | Transform keys in a map using a transform function | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||||
function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
TransformValues | `transform_values` | Transform values in a map using a transform function | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|||||||||||||||||
function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
UnaryMinus | `negative` | Negate a numeric value | None | project | input | S | S | S | S | S | S | S | NS | ||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | input | NS | NS | S | S | S | S | NS | NS | ||||||||||||||
result | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
UnaryPositive | `positive` | A numeric value with a + in front of it | None | project | input | S | S | S | S | S | S | S | NS | ||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | input | S | S | S | S | S | S | NS | NS | ||||||||||||||
result | S | S | S | S | S | S | NS | NS | |||||||||||||||
UnboundedFollowing$ | Special boundary for a window frame, indicating all rows preceding the current row | None | project | result | S | ||||||||||||||||||
UnboundedPreceding$ | Special boundary for a window frame, indicating all rows preceding the current row | None | project | result | S | ||||||||||||||||||
UnixTimestamp | `unix_timestamp` | Returns the UNIX timestamp of current or specified time | None | project | timeExp | S | PS UTC is only supported TZ for TIMESTAMP |
S | |||||||||||||||
format | PS A limited number of formats are supported; Literal value only |
||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
UnscaledValue | Convert a Decimal to an unscaled long value for some aggregation optimizations | None | project | input | PS max DECIMAL precision of 18 |
||||||||||||||||||
result | S | ||||||||||||||||||||||
Upper | `upper`, `ucase` | String uppercase operator | This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
WeekDay | `weekday` | Returns the day of the week (0 = Monday...6=Sunday) | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
WindowExpression | Calculates a return value for every input row of a table based on a group (or "window") of rows | None | window | windowFunction | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |
windowSpec | S | S | S | S | NS | NS | PS max DECIMAL precision of 18 |
S | |||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
WindowSpecDefinition | Specification of a window function, indicating the partitioning-expression, the row ordering, and the width of the window | None | project | partition | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
NS | |
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
NS | |||||
Year | `year` | Returns the year from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AggregateExpression | Aggregate expression | None | aggregation | aggFunc | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |
filter | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
reduction | aggFunc | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | ||||
filter | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
window | aggFunc | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | ||||
filter | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ApproximatePercentile | `percentile_approx`, `approx_percentile` | Approximate percentile | This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark. To enable it, set spark.rapids.sql.incompatibleOps.enabled | aggregation | input | S | S | S | S | S | S | NS | NS | S | |||||||||
percentage | S | S | |||||||||||||||||||||
accuracy | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | NS | NS | S | PS unsupported child types DATE, TIMESTAMP |
|||||||||||||
reduction | input | S | S | S | S | S | S | NS | NS | S | |||||||||||||
percentage | S | S | |||||||||||||||||||||
accuracy | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | NS | NS | S | PS unsupported child types DATE, TIMESTAMP |
|||||||||||||
Average | `avg`, `mean` | Average aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | |||||||||||
result | S | S | |||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | |||||||||||||||||||||
window | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | |||||||||||||||||||||
CollectList | `collect_list` | Collect a list of non-unique elements, not supported in reduction | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CollectSet | `collect_set` | Collect a set of unique elements, not supported in reduction | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
NS |
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
||||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
||||||||||||||||||||||
Count | `count` | Count aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
S |
result | S | ||||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
S | ||||
result | S | ||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
PS UTC is only supported TZ for child TIMESTAMP |
S | ||||
result | S | ||||||||||||||||||||||
First | `first_value`, `first` | first aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
window | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Last | `last`, `last_value` | last aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |||||
window | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Max | `max` | Max aggregate operator | None | aggregation | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
NS | ||||||
reduction | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
NS | ||||||
window | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
Min | `min` | Min aggregate operator | None | aggregation | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
NS | ||||||
reduction | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT |
NS | ||||||
window | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | ||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
PivotFirst | PivotFirst operator | None | aggregation | pivotColumn | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | NS | |
valueColumn | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
NS | NS | NS | |||||
reduction | pivotColumn | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. |
S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | NS | ||||
valueColumn | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
NS | NS | NS | |||||
StddevPop | `stddev_pop` | Aggregation computing population standard deviation | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StddevSamp | `stddev_samp`, `std`, `stddev` | Aggregation computing sample standard deviation | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Sum | `sum` | Sum aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | |||||||||||
result | S | S | S | ||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
window | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
VariancePop | `var_pop` | Aggregation computing population variance | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
VarianceSamp | `var_samp`, `variance` | Aggregation computing sample variance | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
NormalizeNaNAndZero | Normalize NaN and zero | None | project | input | S | S | |||||||||||||||||
result | S | S | |||||||||||||||||||||
ScalarSubquery | Subquery that will return only one row and one column | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
NS | |
HiveGenericUDF | Hive Generic UDF, the UDF can choose to implement a RAPIDS accelerated interface to get better performance | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
NS | |||||
HiveSimpleUDF | Hive UDF, the UDF can choose to implement a RAPIDS accelerated interface to get better performance | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT |
NS |
The above table does not show what is and is not supported for cast. This table shows the matrix of supported casts. Nested types like MAP, Struct, and Array can only be cast if the child types can be cast.
Some of the casts to/from string on the GPU are not 100% the same and are disabled by default. Please see the configs for more details on these specific cases.
Please note that even though casting from one type to another is supported by Spark it does not mean they all produce usable results. For example casting from a date to a boolean always produces a null. This is for Hive compatibility and the accelerator produces the same result.
TO | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT | ||
FROM | BOOLEAN | S | S | S | S | S | S | S | S | S | |||||||||
BYTE | S | S | S | S | S | S | S | S | S | ||||||||||
SHORT | S | S | S | S | S | S | S | S | S | ||||||||||
INT | S | S | S | S | S | S | S | S | S | ||||||||||
LONG | S | S | S | S | S | S | S | S | S | ||||||||||
FLOAT | S | S | S | S | S | S | S | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. |
S | ||||||||||
DOUBLE | S | S | S | S | S | S | S | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. |
S | ||||||||||
DATE | S | PS UTC is only supported TZ for TIMESTAMP |
S | ||||||||||||||||
TIMESTAMP | S | PS UTC is only supported TZ for TIMESTAMP |
S | ||||||||||||||||
STRING | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | ||||||
DECIMAL | NS | S | S | S | S | S | S | S | S | ||||||||||
NULL | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | NS | |
BINARY | NS | NS | |||||||||||||||||
CALENDAR | NS | NS | |||||||||||||||||
ARRAY | PS The array's child type must also support being cast to the desired child type; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT |
||||||||||||||||||
MAP | PS the map's key and value must also support being cast to the desired child types; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
||||||||||||||||||
STRUCT | PS the struct's children must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT |
||||||||||||||||||
UDT | NS |
TO | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT | ||
FROM | BOOLEAN | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | ||||||||
BYTE | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | ||||||||
SHORT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | ||||||||
INT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | ||||||||
LONG | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | ||||||||
FLOAT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. |
S | |||||||||
DOUBLE | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. |
S | |||||||||
DATE | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | ||||||||
TIMESTAMP | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | NS | ||||||||
STRING | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | ||||||
DECIMAL | NS | S | S | S | S | S | S | NS | S | S | |||||||||
NULL | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | NS | NS | |
BINARY | NS | NS | |||||||||||||||||
CALENDAR | NS | NS | |||||||||||||||||
ARRAY | PS the array's child type must also support being cast to string |
PS The array's child type must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
|||||||||||||||||
MAP | PS the map's key and value must also support being cast to string |
PS the map's key and value must also support being cast to the desired child types; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
|||||||||||||||||
STRUCT | PS the struct's children must also support being cast to string |
PS the struct's children must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT |
|||||||||||||||||
UDT | NS | NS |
When transferring data between different tasks the data is partitioned in
specific ways depending on requirements in the plan. Be aware that the types
included below are only for rows that impact where the data is partitioned.
So for example if we are doing a join on the column a
the data would be
hash partitioned on a
, but all of the other columns in the same data frame
as a
don't show up in the table. They are controlled by the rules for
ShuffleExchangeExec
which uses the Partitioning
.
Partition | Description | Notes | Param | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HashPartitioning | Hash based partitioning | None | hash_key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT |
NS |
RangePartitioning | Range partitioning | None | order_key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT |
NS | |
RoundRobinPartitioning | Round robin partitioning | None | |||||||||||||||||||
SinglePartition$ | Single partitioning | None |
For Input and Output it is not cleanly exposed what types are supported and which are not. This table tries to clarify that. Be aware that some types may be disabled in some cases for either reads or writes because of processing limitations, like rebasing dates or timestamps, or for a lack of type coercion support.
Format | Direction | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Avro | Read | S | S | S | S | S | S | S | NS | NS | S | NS | NS | NS | NS | NS | NS | ||
Write | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||
CSV | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | ||||||
Write | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||||
JSON | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | NS | NS | NS | NS | ||
Write | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||
ORC | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT |
NS | ||
Write | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT |
NS | |||
Parquet | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT |
NS | ||
Write | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP |
S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT |
PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT |
NS |