Tracking issue: async integration #71
Changes done so far in:
Must improve the shuffle fetch calls in shuffle_rdd / co_grouped_rdd to avoid the performance hit of using a concurrent hashmap.
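To make the idea concrete, here is a minimal sketch of one way to drop the shared concurrent map: each fetch collects into its own local buffer and the results are merged once on the calling side. This is not the actual shuffle_rdd / co_grouped_rdd code. `fetch_bucket` and `fetch_and_merge` are hypothetical names, the key/value types are placeholders, and plain `std` threads stand in for the async fetch tasks.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for one shuffle fetch: in the real code this would
/// pull the (key, value) pairs of a single bucket from a remote executor.
fn fetch_bucket(bucket: Vec<(u32, u32)>) -> Vec<(u32, u32)> {
    bucket
}

/// Runs every fetch on its own thread, each producing a private Vec, then
/// merges the results on the calling thread. No map is shared between the
/// fetchers, so no concurrent hashmap (and no locking) is needed.
fn fetch_and_merge(buckets: Vec<Vec<(u32, u32)>>) -> HashMap<u32, Vec<u32>> {
    let handles: Vec<_> = buckets
        .into_iter()
        .map(|bucket| thread::spawn(move || fetch_bucket(bucket)))
        .collect();

    let mut merged: HashMap<u32, Vec<u32>> = HashMap::new();
    // Handles are joined in submission order, so the merge is deterministic.
    for handle in handles {
        for (key, value) in handle.join().expect("fetch thread panicked") {
            merged.entry(key).or_default().push(value);
        }
    }
    merged
}
```

The trade-off is that the merge runs on a single thread, which may or may not be cheaper than contended inserts depending on how much data each fetch returns.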
With the last pushes ( Ideally we will probably want to send some of the work inside to the blocking TP There is a problem in the

I have looked a bit into why we can't use the async bufreader capnp method (this is the last commit on the async branch right now). It fails to read whatever data is sent from the 'other side' (whether that is the scheduler, as in the tests/examples, or the unit tests I created), so I haven't switched to that version yet. It would be nice if we could use it, but the problem may be in the library itself: the connection is actually opened and the stream received, but the executor then fails to fetch any data from it.
Next steps are
Pretty much everything that can be async right now is, except the There is a caveat: we have to block on certain async calls due to problems with capnp builder types not being This is temporary until a better solution can be found (ideally most capnp_futures would impl Some of the problems with shuffling tasks persist (others were fixed); I'm going to focus again on fixing whatever is still broken there for now. EDIT: actually, it looks like everything is working just fine in distributed mode with the latest commits, so no issues!
After some discussion, we have decided to take a careful, gradual approach to integrating async into the library.
Adding asynchronous computation is a large departure from the reference Spark implementation, and may change how we do certain things or what is possible (like certain optimizations that rely on stack allocation in our case) in ways that are not yet clear.
Therefore, it is preferable to take a gradual approach as we explore the design space and evolve the library. The original work can be seen at #67; some of the work done in that preliminary PR will be ported to the main branch, and more steps will be taken to make testing and comparing both versions easy while we experiment.
Meanwhile, an async branch will be maintained and kept in sync with the master branch.
Preliminary work
Future work