I'm running some tests on SQL queries that use WHERE, GROUP BY and/or ORDER BY clauses; no JOINs at the moment. I was able to create a cluster manually, with six nodes that each have 2 GPUs, plus a head node. This shows up in the UI as I would expect; I also have Grafana and Prometheus running with it. I am using a Python client. But I have run into some issues:
In some cases, I get the error: "ray.rpc.RequestWorkerLeaseRequest exceeded maximum protobuf size of 2GB" (I think it's looking for 16GB in one case).
In addition, from what I can see, all of my queries are processed in one stage. The output shows "Forcing reduce stage concurrency from 1200 to 1", and then all compute takes place on the first node of the cluster. The 1200 is roughly the number of cores I have across the cluster, and it is the value I used to initialize DatafusionRayContext. Might it be that the deduced query plan doesn't allow for parallelism?
BTW, it seems odd that after ray.init, which connects to the cluster, I then need DatafusionRayContext(int) to define how many CPUs I want. I would think that the connection object could be passed, like DatafusionRayContext(InitializationObject), and the resulting context would then use all of the cluster. Otherwise, I don't see how to get configs from ray.init to set the number of CPUs dynamically. There is also no option for GPUs from the connection info, but maybe datafusion-ray doesn't use GPUs.
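As a possible stopgap for the CPU count, here is a minimal sketch that reads the total from Ray itself via `ray.cluster_resources()`, which returns a mapping such as `{"CPU": ..., "GPU": ...}` for the connected cluster. The helper names `usable_cpus` and `cluster_cpus`, the `reserve` parameter, and the `address="auto"` connection are assumptions made for illustration; none of this is part of datafusion-ray, and the final call to `DatafusionRayContext` is left as a comment because its signature is only known from the description above.

```python
from typing import Mapping


def usable_cpus(resources: Mapping[str, float], reserve: int = 0) -> int:
    """Return the whole number of CPUs in a Ray-style resource mapping.

    `reserve` (hypothetical knob) holds back some CPUs for other work.
    The result is never lower than 1.
    """
    total = int(resources.get("CPU", 0))
    return max(total - reserve, 1)


def cluster_cpus(reserve: int = 0) -> int:
    """Ask the connected Ray cluster for its CPU total."""
    import ray  # imported lazily so usable_cpus works without Ray installed

    if not ray.is_initialized():
        # Assumption: a cluster is already running and reachable.
        ray.init(address="auto")
    return usable_cpus(ray.cluster_resources(), reserve)


if __name__ == "__main__":
    # Sample mapping shaped like ray.cluster_resources() output;
    # the numbers are made up for illustration.
    sample = {"CPU": 1200.0, "GPU": 12.0}
    print(usable_cpus(sample))
    # With a live cluster, something like:
    #   num_cpus = cluster_cpus()
    #   ctx = DatafusionRayContext(num_cpus)  # constructor as described in this post
```

The same mapping also reports a `"GPU"` entry when GPUs are registered, so it at least shows what the cluster advertises, even if datafusion-ray has no use for it.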
In some cases where I can simply run datafusion directly, the datafusion-ray version (also running on one node, per the above) can be up to 8x slower.
Unfortunately, I can't provide examples or datasets as my tests are internal.
Thanks!
PS: I hope this is the correct place for this type of feedback. I didn't see a forum or anything. And apologies for noting more than one issue in the same post. I realize that you are only at version 0.1.0 so I hope you can view this as helpful.