Right now druid-spark-batch reads data using sc.textFile from the given locations. This is an important limitation if the data is stored in a format like Parquet (or any other data format supported by Spark).
Would you consider enhancing this tool to accept an arbitrary Spark SQL expression that defines the input data (see the sketch after this list)? You would get for free:
- support for any data format supported by Spark
- support for any UDF supported by Spark for data pre-processing
- support for joins before ingestion
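A minimal sketch of what this could look like, assuming the task is handed a Spark SQL query string as its input definition. The paths, view names, UDF, and the `ingestQuery` value below are illustrative only, not part of druid-spark-batch:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("druid-spark-batch-sql-input").getOrCreate()

// Any format Spark can read becomes usable as an input source.
spark.read.parquet("hdfs:///warehouse/events").createOrReplaceTempView("events")
spark.read.parquet("hdfs:///warehouse/users").createOrReplaceTempView("users")

// A user-supplied SQL expression can apply UDFs and joins before ingestion.
spark.udf.register("normalizeCountry",
  (c: String) => Option(c).map(_.trim.toUpperCase).orNull)

val ingestQuery =
  """SELECT e.ts, normalizeCountry(u.country) AS country, e.metric
    |FROM events e JOIN users u ON e.user_id = u.id""".stripMargin

// DataFrame that the indexing task would consume instead of raw text lines.
val inputRows = spark.sql(ingestQuery)
```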
I think it would make some sense to have a DataSupplierFactory (or a similar, more Spark-y name) passed in the task definition, one implementation of which effectively does sc.textFile(dataFiles mkString ",") while other implementations do other things, then have the chain at
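A rough sketch of that idea, assuming a trait-based factory; the trait name and method signatures are assumptions for illustration, not actual druid-spark-batch interfaces:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

trait DataSupplierFactory {
  // Produce the raw input rows that the rest of the indexing chain parses.
  def supply(sc: SparkContext): RDD[String]
}

// Default behaviour: what the task does today with plain text files.
class TextFileSupplier(dataFiles: Seq[String]) extends DataSupplierFactory {
  override def supply(sc: SparkContext): RDD[String] =
    sc.textFile(dataFiles mkString ",")
}

// Alternative implementation: evaluate an arbitrary Spark SQL expression
// (e.g. over registered Parquet tables) and serialize the rows to JSON.
class SqlQuerySupplier(query: String) extends DataSupplierFactory {
  override def supply(sc: SparkContext): RDD[String] = {
    val spark = SparkSession.builder().config(sc.getConf).getOrCreate()
    spark.sql(query).toJSON.rdd
  }
}
```

The task JSON could then name the supplier implementation to use, leaving the downstream parsing and indexing chain untouched.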