Problem when using apollo with spark cluster #44

Open
r0mainK opened this issue Apr 11, 2018 · 0 comments
r0mainK commented Apr 11, 2018

When running the hash or cmd commands with Spark in cluster mode, we hit the same problem we used to have with ml: the workers do not have the apollo library, and it is not added to the Spark session with addPyFile.

I think we should either modify how the --dep-zip flag works so that it adds apollo, or change the logic:

  • when the ml -s flag specifies a master that is not local, we should add ml, engine and, if the call is not made by a command of the ml library, all other dependencies, e.g. apollo and its dependencies.
  • the --dep-zip flag should be used to add ml dependencies. It will be of no use for us since our workers use ml-core image and already have them, but it will be useful for other users.
  • as was pointed out in this issue, I think the Spark conf should by default include the flags that clean up after us, because otherwise it ends up taking a lot of memory.
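
To make the first point concrete, here is a rough sketch of how a library could be zipped and shipped to the workers when the master is not local. The helper names `zip_package` and `ship_package` are hypothetical and are not part of ml or apollo; the only Spark call assumed is `SparkContext.addPyFile`, and the `local` prefix check is just one possible way to detect a local master.

```python
import os
import tempfile
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a package directory, keeping the package name as the top-level entry."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue  # compiled files are not needed on the workers
                full = os.path.join(root, name)
                archive.write(full, os.path.relpath(full, parent))
    return zip_path


def ship_package(spark_context, package_dir, master):
    """Hypothetical helper: add the package to the session unless the master is local."""
    if master.startswith("local"):
        return None  # driver and executors share the same environment
    name = os.path.basename(os.path.abspath(package_dir))
    zip_path = os.path.join(tempfile.mkdtemp(), name + ".zip")
    zip_package(package_dir, zip_path)
    # Assumes spark_context is a pyspark SparkContext (or exposes addPyFile).
    spark_context.addPyFile(zip_path)
    return zip_path
```

With a real session this would be called as something like `ship_package(session.sparkContext, os.path.dirname(apollo.__file__), master)`. Dependencies with compiled extensions generally cannot be shipped as a plain zip this way, so they would still need to be installed on the worker image.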