Console Output:

    Operation LOAD_20230622-155525-014753 failed unexpectedly: Required array length 2147483639 + 96 is too large.

Full Stacktrace:

    2023-06-22 15:55:59 ERROR Operation LOAD_20230622-155525-014753 failed unexpectedly: Required array length 2147483639 + 96 is too large.
    java.lang.OutOfMemoryError: Required array length 2147483639 + 96 is too large
        at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
        at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
        at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
        at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:132)
        at software.amazon.awssdk.utils.IoUtils.toByteArray(IoUtils.java:48)
        at software.amazon.awssdk.core.sync.ResponseTransformer.lambda$toBytes$3(ResponseTransformer.java:175)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler$HttpResponseHandlerAdapter.transformResponse(BaseSyncClientHandler.java:218)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler$HttpResponseHandlerAdapter.handle(BaseSyncClientHandler.java:206)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleSuccessResponse(CombinedResponseHandler.java:99)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:75)
FWIW, smaller CSV files (~200 MB) load with no problem, but I get Java heap space errors on CSVs larger than 1 GB, hence the use of `export DSBULK_JAVA_OPTS="-Xmx10G"`. Are there other throttling options available here? I tried with a 187 GB CSV and a 2.3 GB CSV, and both ended with the same error.
I'm afraid I don't have any specific insights. We were dealing with a URL file that had up to about 10 million records in it; your 2+ GB of data (living in just the one transactions.csv file, it seems?) is probably running into the limits of what standard Java data structures can support: a single array can hold at most Integer.MAX_VALUE (2147483647) entries, so nothing larger than roughly 2 GB fits into one byte[].
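My reading of the stack trace (a sketch of the pattern, not the actual SDK source) is that `ResponseTransformer.toBytes()` copies the entire S3 object into one in-memory `byte[]` via something like the loop below, which is why raising `-Xmx` stops helping once the object passes ~2 GB:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative only: this mirrors what IoUtils.toByteArray appears to be doing
// in the stack trace. The whole object is accumulated into a single byte[],
// and a Java array cannot exceed ~Integer.MAX_VALUE entries, regardless of heap size.
public final class BufferWholeObject {
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            // Each write may grow the backing array; once the required length
            // passes the ~2 GB ceiling, it fails with the OutOfMemoryError above.
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```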
My recommendation would be to break your file up into smaller chunks; that's probably the easiest solution (a rough sketch of one way to do that is below). If you're feeling bold, you could try to revisit my solution to make it more efficient or make better use of streaming.
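For the chunking route, something like this minimal, untested sketch (the input file name and chunk size are just placeholders) would split one huge CSV into header-preserving pieces that you could then load one at a time:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public final class SplitCsv {
    public static void main(String[] args) throws IOException {
        Path input = Path.of("transactions.csv");  // placeholder input file
        long linesPerChunk = 5_000_000L;           // tune to whatever loads comfortably

        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                return; // empty file, nothing to split
            }
            String line;
            long lineCount = 0;
            int chunkIndex = 0;
            BufferedWriter writer = null;

            while ((line = reader.readLine()) != null) {
                if (writer == null || lineCount == linesPerChunk) {
                    if (writer != null) {
                        writer.close();
                    }
                    // Start a new chunk and repeat the header so each file loads on its own.
                    Path chunk = Path.of(String.format("transactions-part-%04d.csv", chunkIndex++));
                    writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                    writer.write(header);
                    writer.newLine();
                    lineCount = 0;
                }
                writer.write(line);
                writer.newLine();
                lineCount++;
            }
            if (writer != null) {
                writer.close();
            }
        }
    }
}
```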
Also, in case you haven't already found it, #399 is where I implemented the S3 functionality; there may be more answers to glean from that PR. I don't think I did anything in particular with processing the records themselves; I just dealt with reading the list from S3 and passing it to the normal DSBulk operation. (It may be worth noting that I have no association with DataStax or DSBulk. I'm just a random dev who needed a new feature and decided to implement it myself.)
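If you do want to experiment with streaming the object rather than buffering it, the general idea would be something like this rough, untested sketch against the AWS SDK v2 sync client (the bucket and key are placeholders, and this is not the code path DSBulk itself uses):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

public final class StreamFromS3 {
    public static void main(String[] args) throws Exception {
        GetObjectRequest request = GetObjectRequest.builder()
                .bucket("my-bucket")         // placeholder bucket
                .key("transactions.csv")     // placeholder key
                .build();

        try (S3Client s3 = S3Client.create();
             // getObject(request) exposes the body as a stream, so nothing forces
             // the whole object into a single byte[] the way toBytes() does.
             ResponseInputStream<GetObjectResponse> body = s3.getObject(request);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(body, StandardCharsets.UTF_8))) {
            reader.lines().forEach(line -> {
                // hand each record downstream here instead of collecting
                // everything in memory first
            });
        }
    }
}
```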