Maybe use DataFusion and Apache Arrow as building blocks ? #119

Open
constantOut opened this issue May 15, 2020 · 1 comment

Comments

@constantOut

There is a competing project called Ballista: https://github.com/ballista-compute/ballista
It uses DataFusion, though I don't quite get why the Ballista examples use such odd syntax for querying.
I understand that distributed SQL execution is more complex than just combining results from individual executors, but I think having a single-node SQL engine would be a great help.
What do you think?

@rajasekarv
Owner

I have plans to integrate with Python and possibly other languages (JVM and Go) using Arrow. As for DataFusion, though, the underlying architecture of this framework closely follows Spark's, and its job execution is quite different from DataFusion's. So, unfortunately, we can't use it. Andy Grove built Ballista, a distributed framework around DataFusion, which is an interesting project to have a look at.
