
zlib error when Spark dumps files larger than 2GB in script "cmph5_opreations" #5

Closed
PengNi opened this issue Oct 16, 2017 · 3 comments

Comments

PengNi (Owner) commented Oct 16, 2017

Tested Python version: 2.7.5

Spark/Python will raise an error when dumping files larger than 2GB:

OverflowError: size does not fit in an int.

This occurs when using zlib.
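
For reference, a minimal sketch of the failure mode using zlib alone (my addition, not from the original run; it assumes a 64-bit interpreter and allocates over 2 GiB of memory):

```python
import zlib

# On Python 2.7.5 the zlib module stores buffer lengths in a C int,
# so any input of 2 GiB (2**31 bytes) or more overflows during
# argument parsing and raises
# "OverflowError: size does not fit in an int".
# WARNING: this allocates more than 2 GiB of memory.
big = b"\x00" * (2 ** 31)        # 2 GiB: one byte past INT32_MAX
compressed = zlib.compress(big)  # OverflowError on Python < 2.7.13
print(len(compressed))
```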

Related work:
- joblib issue #122
- joblib issue #300
- Python issue #23306
- Python issue #27130

PengNi (Owner, Author) commented Oct 16, 2017

For Python 2.x, this bug was fixed in Python 2.7.13 and higher (see the Release Notes).
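
As an illustrative guard (my addition, not part of the fix itself), a script can refuse to run on interpreters that still carry the limitation:

```python
import sys

# Illustrative guard: the 2GB zlib limitation was lifted in
# Python 2.7.13, so fail fast on older 2.7.x interpreters.
if sys.version_info < (2, 7, 13):
    raise RuntimeError(
        "Python >= 2.7.13 is required to zlib-compress buffers over 2GB "
        "(got %s)" % sys.version.split()[0]
    )
```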

PengNi (Owner, Author) commented Oct 17, 2017

The OverflowError problem was solved after upgrading Python to 2.7.13.

Closing the issue.

@PengNi PengNi closed this as completed Oct 17, 2017
PengNi (Owner, Author) commented Oct 17, 2017

Also, Spark has another limitation: the object to dump can't be larger than 2GB when using struct.pack (see issue #6).
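
To illustrate the struct.pack limit (a sketch, not the exact Spark code path): a signed 32-bit length field caps a single serialized object just under 2 GiB:

```python
import struct

# A length written as a signed 32-bit int ("!i") cannot exceed
# INT32_MAX, so a single framed object is capped just under 2 GiB.
struct.pack("!i", 2 ** 31 - 1)      # INT32_MAX: packs fine
try:
    struct.pack("!i", 2 ** 31)      # one byte too many
except struct.error as e:
    print("cannot frame objects >= 2GB: %s" % e)
```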

So the right thing to do is to avoid letting any single file grow larger than 2GB. This has nothing to do with the zlib bug.
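
A minimal sketch of the workaround in PySpark (the RDD contents, partition count, and output path below are hypothetical and workload-specific):

```python
from pyspark import SparkContext

sc = SparkContext(appName="avoid-2gb-dumps")

# Hypothetical stand-in for the cmph5 read records.
records = sc.parallelize(range(10 ** 7))

# Spark serializes one partition at a time, so capping partition size
# also caps the size of each dumped chunk. Repartitioning into more,
# smaller partitions keeps every serialized piece well under 2GB.
records = records.repartition(200)  # partition count is workload-specific

records.saveAsPickleFile("hdfs:///tmp/records")  # hypothetical output path
```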
