Test Scalability of Spark BatteryAnalysis #118

Open
Mwegert opened this issue Mar 7, 2019 · 0 comments
Mwegert commented Mar 7, 2019

Expected Behavior

The Spark program should work even if portions of the dataset are too large to fit in memory.

Actual Behavior

We suspect that ModelFunction may have scalability issues. We have provided ModelFunctionDataset as an alternative that resolves them, but it ran slower when we tested it. If ModelFunction does fail at scale, replace it with ModelFunctionDataset in the Driver program, then look into optimizing the Dataset logic in ModelFunctionDataset.

Steps to Reproduce the Problem

  1. Generate gigabytes of data
  2. Test the code as-is on an EMR cluster.
  3. Determine if an error occurs.
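
For step 1, a minimal sketch of a synthetic data generator is shown here. It is only an illustration: the column names (`device_id`, `timestamp`, `battery_level`), the CSV format, and the function name `write_synthetic_rows` are all assumptions, not the project's actual input schema, so adjust them to match what the Driver program reads.

```python
import csv
import io
import random


def write_synthetic_rows(out, target_bytes, seed=0):
    """Write CSV rows to `out` until at least `target_bytes` characters are written.

    The schema below is hypothetical; replace it with the real input format.
    Returns the number of data rows written (excluding the header).
    """
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    writer = csv.writer(out, lineterminator="\n")

    header = ["device_id", "timestamp", "battery_level"]  # assumed columns
    writer.writerow(header)
    written = len(",".join(header)) + 1  # +1 for the newline

    rows = 0
    start = 1551916800  # arbitrary start time, in Unix seconds
    while written < target_bytes:
        row = [
            f"device-{rng.randrange(1000):04d}",  # one of 1000 fake devices
            start + rows,                         # one reading per second
            rng.randint(0, 100),                  # battery percentage
        ]
        writer.writerow(row)
        # All values are ASCII without commas or quotes, so characters == bytes.
        written += len(",".join(str(v) for v in row)) + 1
        rows += 1
    return rows


if __name__ == "__main__":
    # Small in-memory smoke run. For a real scalability test, pass an open
    # file instead and set target_bytes to several gigabytes.
    buf = io.StringIO()
    rows = write_synthetic_rows(buf, target_bytes=1_000_000)
    print(f"wrote {rows} rows, {len(buf.getvalue())} characters")
```

To produce a dataset larger than the memory available on the cluster, open a file with `open(path, "w", newline="")`, pass it as `out`, and raise `target_bytes` accordingly. The generator streams rows, so it does not hold the dataset in memory itself.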