Replies: 1 comment
-
Yes!! This would be really cool. Do you have an existing use case that we can test this with? I'm wondering if we can just make this one of the backends for our `daft.read_sql`. What do you think? Would love an API proposal here!
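To make that concrete, here is a rough, purely illustrative sketch of one possible shape. `daft.read_sql` exists today and accepts a callable returning a SQLAlchemy connection, but the ADBC connection factory shown here is the hypothetical part, and the endpoint URI is made up:

```python
import daft
import adbc_driver_flightsql.dbapi as flight_sql

# Hypothetical: a factory returning an ADBC Flight SQL connection,
# mirroring the existing callable-based SQLAlchemy path of daft.read_sql.
def make_conn():
    return flight_sql.connect("grpc://mpp-cluster.example.com:31337")

# Proposed behavior: Daft detects the ADBC connection and streams Arrow
# record batches from the Flight SQL endpoint directly into micropartitions.
df = daft.read_sql("SELECT * FROM orders WHERE region = 'EU'", make_conn)
```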
-
I know it is somewhat on the roadmap, but since more and more MPP environments are embracing Arrow Flight SQL, I am wondering whether the following is in scope for where Daft is heading.
It would be really nice if it were possible to connect to a Flight SQL endpoint (via ADBC), write a query or convert a Python dataframe into one (much like SQLFrame does), have it executed on the MPP environment, and pull the results into Daft as micropartitions. The initial query would run on the MPP system for performance and security, while Daft's power would still be available to handle the data in an ML setting. If local data needs to be joined with remote data on the MPP environment, some data movement into Daft is unavoidable, but the Arrow-native transport keeps it fast, merging the best of both worlds. It is very similar to how Snowpark works (afaik), but using Daft, which adds far more performance (I believe Snowpark uses Modin).
You can already write a query, send it to a regular query endpoint, and use its results, but working with larger datasets and mixing local dataframes with remote data (pulling tables into Daft) is far more feasible over Flight SQL; see the sketch below.
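As a concrete illustration, here is a minimal sketch of what is already possible today by pairing the ADBC Flight SQL driver with Daft. The endpoint URI, table names, and columns are made up, and unlike a native integration this fetches the full result eagerly instead of streaming it into micropartitions:

```python
import adbc_driver_flightsql.dbapi as flight_sql
import daft

# Connect to a (hypothetical) Flight SQL endpoint on the MPP cluster.
conn = flight_sql.connect("grpc://mpp-cluster.example.com:31337")
cur = conn.cursor()

# The heavy filtering/aggregation runs remotely on the MPP engine.
cur.execute("SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id")

# Results come back over Arrow Flight as a pyarrow.Table.
arrow_table = cur.fetch_arrow_table()
cur.close()
conn.close()

# Hand the Arrow data to Daft and continue locally, e.g. join with local data.
remote_df = daft.from_arrow(arrow_table)
local_df = daft.from_pydict({"user_id": [1, 2, 3], "segment": ["a", "b", "a"]})
joined = remote_df.join(local_df, on="user_id")
```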
Thanks for such an awesome product!