[FEA] GpuInsertIntoHiveTable supports parquet format #9939
Spark will already take most Hive parquet writes and translate them back into a regular parquet write before we ever see them. I am not 100% sure what the conditions are that make it fall back to using Hive directly for the insert, and they have changed a bit over time. It looks like it is primarily related to a partitioned write/insert into, but I am not 100% sure. @nvliyuan do you have more information about the exact write so we can reproduce this and know why it is falling back? Happy to get that info offline if we need to.
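For anyone trying to reproduce this, here is a minimal sketch of how one could check whether a partitioned Hive parquet insert is still planned as an `InsertIntoHiveTable` or gets converted to a regular parquet write. It assumes a spark-shell with Hive support enabled, and the table, column, and partition names are made up for illustration:

```scala
// Hedged repro sketch: table/column/partition names below are hypothetical.
spark.sql("""
  CREATE TABLE IF NOT EXISTS hive_parquet_part (id INT, name STRING)
  PARTITIONED BY (dt STRING)
  STORED AS PARQUET
""")

// EXPLAIN does not run the insert; it only prints how Spark plans it.
// With spark.sql.hive.convertMetastoreParquet=true (the default) the write is
// normally converted to a datasource parquet write; if the plan still shows
// InsertIntoHiveTable, that is the fallback case discussed in this issue.
spark.sql("""
  EXPLAIN EXTENDED
  INSERT INTO hive_parquet_part PARTITION (dt = '2024-01-01')
  VALUES (1, 'a'), (2, 'b')
""").show(truncate = false)
```

If the printed plan shows an `InsertIntoHadoopFsRelationCommand`, the conversion happened as expected; if it still shows `InsertIntoHiveTable`, that matches the fallback being asked about.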
Unfortunately this translation appears to not exist in earlier Spark versions, e.g. the 3.2.x releases used by some customers. Not sure if we can support more types of output format and serde for `GpuInsertIntoHiveTable`.
Hi @revans2, could you help confirm whether we are missing parquet support for `GpuInsertIntoHiveTable`?
@firestarman We have NO parquet support in `GpuInsertIntoHiveTable`. I am a little confused by your previous comment. In 3.2.1 there is no translation for ….

The configs for Hive are in https://github.com/apache/spark/blob/v3.2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala or similar. The last config change I see in the newest version of that file was for 3.1.0, and some of the configs go back to Spark 1.1.1, so all of the versions we support should deal with more or less the same set of features. Some of the relevant configs here are `spark.sql.hive.convertMetastoreParquet`, `spark.sql.hive.convertMetastoreParquet.mergeSchema`, and `spark.sql.hive.convertInsertingPartitionedTable`.
There are also a few others for CTAS and INSERT_DIR. But in all of these cases they are on by default, except for schema merging. So first of all we need to understand why an `InsertIntoHiveTable` shows up here at all instead of being converted to a regular parquet write.
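To double-check these conversion-related settings on a cluster, here is a small hedged sketch. The config names and the defaults noted in the comments reflect my reading of HiveUtils.scala; the availability of individual keys can vary across Spark versions, so treat them as assumptions to verify:

```scala
// Sketch: print the Hive-to-datasource conversion configs discussed above.
// Defaults in the comments are my understanding and should be verified
// against the Spark version actually in use.
Seq(
  "spark.sql.hive.convertMetastoreParquet",             // expected default: true
  "spark.sql.hive.convertMetastoreParquet.mergeSchema", // expected default: false
  "spark.sql.hive.convertInsertingPartitionedTable",    // expected default: true
  "spark.sql.hive.convertMetastoreCtas",                // expected default: true (CTAS)
  "spark.sql.hive.convertMetastoreInsertDir"            // expected default: true (INSERT ... DIRECTORY)
).foreach { key =>
  // Use a fallback value so this still prints something if a key is not set/registered.
  println(s"$key = ${spark.conf.get(key, "<not set>")}")
}
```

If any of these comes back false on the customer cluster, that could explain why the insert stays on the Hive write path instead of being converted.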
Just checked the configurations.
Thanks a lot for the details.
We have to support parquet in `GpuInsertIntoHiveTable`.
It also asks for bucketing support, which is tracked by #10366.
We observed the fallback info in a customer's driver log: