Controls to Affect Partitioning and Caching #87

Open
metasim opened this issue Jun 13, 2018 · 1 comment

Comments

@metasim
Contributor

metasim commented Jun 13, 2018

For Workflow Nodes that produce a DataFrame, it would be very helpful to have a set of "advanced" controls that let the user request calls to repartition/coalesce, persist(StorageLevel), or unpersist as part of the execute invocation. By default none of these methods would be called; they would run only when explicitly requested.
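To make the request concrete, here is a rough sketch of how the controls could be modelled as plain data. Every name in it (`AdvancedControls`, `PartitionControl`, `CacheControl`, `plannedCalls`) is hypothetical and not taken from the existing code base; the storage level is kept as a string on the assumption that the front end would send the level's name.

```scala
// Hypothetical model of the "advanced" controls; none of these names exist in the project.
sealed trait PartitionControl
case object LeavePartitioning extends PartitionControl
final case class Repartition(numPartitions: Int) extends PartitionControl
final case class Coalesce(numPartitions: Int) extends PartitionControl

sealed trait CacheControl
case object LeaveCaching extends CacheControl
// Assumption: the level arrives as a name such as "MEMORY_AND_DISK".
final case class Persist(storageLevel: String) extends CacheControl
case object Unpersist extends CacheControl

final case class AdvancedControls(
  partitioning: PartitionControl = LeavePartitioning,
  caching: CacheControl = LeaveCaching
) {
  /** The DataFrame calls that execute would make, in order; empty by default. */
  def plannedCalls: List[String] = {
    val partitionCalls = partitioning match {
      case LeavePartitioning => Nil
      case Repartition(n)    => List(s"repartition($n)")
      case Coalesce(n)       => List(s"coalesce($n)")
    }
    val cacheCalls = caching match {
      case LeaveCaching   => Nil
      case Persist(level) => List(s"persist($level)")
      case Unpersist      => List("unpersist()")
    }
    partitionCalls ++ cacheCalls
  }
}
```

With the defaults, `AdvancedControls().plannedCalls` is empty, which matches the "call nothing unless asked" behavior described above.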

Ideally, these controls would be separated from the normal parameters in the front end, perhaps in a separate tab between "Parameters" and "Ports". A more minimal approach would be to append them to the existing parameters.

Implementation Considerations

One option is a mix-in trait, so that existing nodes and SDK users could enable the capability selectively (it may not make sense in every case). A trait also removes the need to special-case DataFrame handling relative to other DOperables. The cost is that the DataFrame must be passed explicitly to a function provided by the mix-in, a step some may find objectionable. IMO it's fine.
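A minimal sketch of what such a mix-in might look like, assuming Spark's `DataFrame` API. The trait name `DataFrameTuning`, its members, and `ExampleNode` are all hypothetical; only `repartition`, `coalesce`, `persist`, and `unpersist` are real Spark calls.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Hypothetical mix-in; the trait and member names are illustrative only.
trait DataFrameTuning {
  // Every control defaults to "do nothing", so mixing the trait in changes no behavior.
  def repartitionTo: Option[Int] = None
  def coalesceTo: Option[Int] = None
  def persistAt: Option[StorageLevel] = None
  def unpersistOutput: Boolean = false

  /** The node passes its DataFrame here explicitly; only requested calls are made. */
  protected def tune(df: DataFrame): DataFrame = {
    val repartitioned = repartitionTo.fold(df)(n => df.repartition(n))
    val coalesced = coalesceTo.fold(repartitioned)(n => repartitioned.coalesce(n))
    persistAt.foreach(level => coalesced.persist(level))
    if (unpersistOutput) coalesced.unpersist()
    coalesced
  }
}

// Hypothetical node that opts in and asks for eight partitions.
class ExampleNode extends DataFrameTuning {
  override val repartitionTo: Option[Int] = Some(8)

  def execute(input: DataFrame): DataFrame = tune(input.distinct())
}
```

Nodes that don't mix the trait in are untouched, and the only visible cost in a node that does is the explicit `tune(...)` call.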

cc: @bguseman

@metasim
Contributor Author

metasim commented Jun 14, 2018

Perhaps also broadcast
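If broadcast were included, it could follow the same opt-in pattern. A small sketch, assuming the control maps to Spark's `functions.broadcast`, which marks a DataFrame as a candidate for broadcast joins rather than moving any data eagerly. `BroadcastControl` is a hypothetical name.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.broadcast

// Hypothetical helper: add the broadcast hint only when explicitly requested.
object BroadcastControl {
  def apply(df: DataFrame, enabled: Boolean = false): DataFrame =
    if (enabled) broadcast(df) else df
}
```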
