
Getting java.lang.OutOfMemoryError: Java heap space exception #421

Closed
r3econ opened this issue May 25, 2022 · 5 comments

Comments


r3econ commented May 25, 2022

I'm trying to load data into Astra and I'm getting an error. I have around 5000 objects that I unloaded into a CSV file using dsbulk 1.9.0. Now, when I try to load them into Astra, I get a java.lang.OutOfMemoryError. Before the exception happens, some data (around 120 items) is successfully uploaded and I can see it in the database.

What could be the cause?

Here's how I invoke the dsbulk:

dsbulk load -url file.csv -k mykeyspacename -t mytablename -b "./secure-connect-xxx-production.zip" -u xxx -p xxx -header true --connector.csv.maxCharsPerColumn -1

Error:
[screenshot attached: 2022-05-25 18_03_48 - Remote Desktop Connection]

Contents of the log:

2022-05-25 16:01:45 INFO  Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
2022-05-25 16:01:45 INFO  A cloud secure connect bundle was provided: ignoring all explicit contact points.
2022-05-25 16:01:45 INFO  A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
2022-05-25 16:01:45 INFO  Operation directory: C:\Users\Administrator\Desktop\AstraMigration\dsbulk-1.9.0\bin\logs\LOAD_20220525-160145-169000
2022-05-25 16:01:58 ERROR Operation LOAD_20220525-160145-169000 failed unexpectedly: Java heap space.
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Unknown Source)
	at java.lang.String.<init>(Unknown Source)
	at com.univocity.parsers.common.input.DefaultCharAppender.getAndReset(DefaultCharAppender.java:162)
	at com.univocity.parsers.common.ParserOutput.valueParsed(ParserOutput.java:363)
	at com.univocity.parsers.csv.CsvParser.parseSingleDelimiterRecord(CsvParser.java:180)
	at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:109)
	at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:581)
	at com.univocity.parsers.common.AbstractParser.parseNextRecord(AbstractParser.java:1219)
	at com.datastax.oss.dsbulk.connectors.csv.CSVConnector$CSVRecordReader.readNext(CSVConnector.java:289)
	at com.datastax.oss.dsbulk.connectors.commons.AbstractFileBasedConnector$$Lambda$372/27353340.apply(Unknown Source)
2022-05-25 16:02:00 INFO  Final stats:
2022-05-25 16:02:00 INFO  Last processed positions can be found in positions.txt



adutra commented May 27, 2022

Hi @r3econ, thanks for reaching out.

How big are the rows you are trying to load? Given that you use --connector.csv.maxCharsPerColumn -1, I suppose they are quite large.

When dealing with large rows, here are some things that work:

  • Throttle DSBulk. This is best done by setting --engine.maxConcurrentQueries X, where X is a small number. Start with 1: if it works, that's great, but the throughput will be poor; then increase the number little by little to improve DSBulk's throughput without breaking the operation.
  • Increase the heap size by setting the DSBULK_JAVA_OPTS environment variable, e.g. export DSBULK_JAVA_OPTS="-Xms1g -Xmx1g". A combined example is sketched below.
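For illustration, here is a sketch of the command from the original report with both suggestions applied. The keyspace, table, bundle path, and credentials are the xxx placeholders from that command, and 4 is just an arbitrary example value to grow towards once 1 is confirmed to work (this assumes a Unix-like shell; on Windows, use set DSBULK_JAVA_OPTS=... in cmd.exe instead of export):

export DSBULK_JAVA_OPTS="-Xms1g -Xmx1g"
dsbulk load -url file.csv -k mykeyspacename -t mytablename -b "./secure-connect-xxx-production.zip" -u xxx -p xxx -header true --connector.csv.maxCharsPerColumn -1 --engine.maxConcurrentQueries 4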

Hope that helps!


adutra commented Jun 1, 2022

You can also add --dsbulk.log.sources false to lower the heap pressure.
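For example, appended to the sketch above (same placeholders as before, with the concurrency back at the conservative starting value of 1):

dsbulk load -url file.csv -k mykeyspacename -t mytablename -b "./secure-connect-xxx-production.zip" -u xxx -p xxx -header true --connector.csv.maxCharsPerColumn -1 --engine.maxConcurrentQueries 1 --dsbulk.log.sources false

Setting log.sources to false stops DSBulk from retaining each record's original source text for error reporting, which reduces heap pressure at the cost of less detailed debug files.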


adutra commented Jun 13, 2022

Any updates on this @r3econ ?


r3econ commented Jun 15, 2022

Yes, thanks for the info. I managed to get it working by tweaking the params.


adutra commented Jun 15, 2022

Glad to hear! Let's close this issue then.

adutra closed this as completed Jun 15, 2022