
zlib error when Spark dumps files larger than 2GB in script "cmph5_opreations" #5

Closed
PengNi opened this issue Oct 16, 2017 · 3 comments

Comments

PengNi (Owner) commented Oct 16, 2017

Tested Python version: 2.7.5

Spark/Python will raise an error when dumping files larger than 2GB:

OverflowError: size does not fit in an int.

This occurs when using zlib.
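
For reference, a minimal sketch of the failure mode using zlib alone (my addition, not from the original run; it assumes a 64-bit interpreter and allocates over 2 GiB of memory):

```python
import zlib

# On Python 2.7.5 the zlib module stores buffer lengths in a C int,
# so any input of 2 GiB (2**31 bytes) or more overflows during
# argument parsing and raises
# "OverflowError: size does not fit in an int".
# WARNING: this allocates more than 2 GiB of memory.
big = b"\x00" * (2 ** 31)        # 2 GiB: one byte past INT32_MAX
compressed = zlib.compress(big)  # OverflowError on Python < 2.7.13
print(len(compressed))
```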

Related work:
- joblib issue #122
- joblib issue #300
- Python issue #23306
- Python issue #27130

PengNi (Owner, Author) commented Oct 16, 2017

For Python 2.x, this bug was fixed in Python 2.7.13 and higher (see the Release Notes).
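
As an illustrative guard (my addition, not part of the fix itself), a script can refuse to run on interpreters that still carry the limitation:

```python
import sys

# Illustrative guard: the 2GB zlib limitation was lifted in
# Python 2.7.13, so fail fast on older 2.7.x interpreters.
if sys.version_info < (2, 7, 13):
    raise RuntimeError(
        "Python >= 2.7.13 is required to zlib-compress buffers over 2GB "
        "(got %s)" % sys.version.split()[0]
    )
```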

PengNi (Owner, Author) commented Oct 17, 2017

The OverflowError problem was solved after upgrading Python to 2.7.13.

Closing the issue.

@PengNi PengNi closed this as completed Oct 17, 2017
PengNi (Owner, Author) commented Oct 17, 2017

Also, Spark has another limitation: the object to dump can't be larger than 2GB when using struct.pack (see issue #6).
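
To illustrate the struct.pack limit (a sketch, not the exact Spark code path): a signed 32-bit length field caps a single serialized object just under 2 GiB:

```python
import struct

# A length written as a signed 32-bit int ("!i") cannot exceed
# INT32_MAX, so a single framed object is capped just under 2 GiB.
struct.pack("!i", 2 ** 31 - 1)      # INT32_MAX: packs fine
try:
    struct.pack("!i", 2 ** 31)      # one byte too many
except struct.error as e:
    print("cannot frame objects >= 2GB: %s" % e)
```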

So the right thing to do is to avoid letting any single file grow larger than 2GB. This has nothing to do with the zlib bug.
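
A minimal sketch of the workaround in PySpark (the RDD contents, partition count, and output path below are hypothetical and workload-specific):

```python
from pyspark import SparkContext

sc = SparkContext(appName="avoid-2gb-dumps")

# Hypothetical stand-in for the cmph5 read records.
records = sc.parallelize(range(10 ** 7))

# Spark serializes one partition at a time, so capping partition size
# also caps the size of each dumped chunk. Repartitioning into more,
# smaller partitions keeps every serialized piece well under 2GB.
records = records.repartition(200)  # partition count is workload-specific

records.saveAsPickleFile("hdfs:///tmp/records")  # hypothetical output path
```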
