Java heap space Error : while importing large data #129

karthikasathishkumar · 2018-03-29T06:35:21Z

I changed MEMORY=1g to MEMORY=10g in the bin/mallet shell script and ran the import command on 5 GB of input, on 64-bit Ubuntu 14 with 16 GB of RAM.
I get the error below. How can I overcome it?
Please suggest a better way to import this data (total input size: 5 GB).

java.lang.OutOfMemoryError: Java heap space
	at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68)
	at java.lang.StringBuffer.<init>(StringBuffer.java:128)
	at cc.mallet.pipe.Input2CharSequence.pipe(Input2CharSequence.java:94)
	at cc.mallet.pipe.Input2CharSequence.pipe(Input2CharSequence.java:83)
	at cc.mallet.pipe.Input2CharSequence.pipe(Input2CharSequence.java:47)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:295)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:283)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:291)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:283)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:291)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:283)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:291)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:283)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:291)
	at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:283)
	at cc.mallet.types.InstanceList.addThruPipe(InstanceList.java:267)
	at cc.mallet.classify.tui.Text2Vectors.main(Text2Vectors.java:322)
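The trace ends in Input2CharSequence allocating a StringBuffer, which suggests that each input file is read into memory whole before tokenization (Text2Vectors is the class behind the import-dir command). If the 5 GB is a single large file, raising MEMORY alone may not help: a Java StringBuffer is indexed by int, so it cannot hold more than roughly 2^31 characters. The sketch below is a hypothetical helper, not part of MALLET, and its class and method names are illustrative. It prints the JVM's effective maximum heap, so you can check that the MEMORY change actually reached -Xmx, and it shows line-at-a-time reading, where memory use is bounded by the longest line rather than by the file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of MALLET): reports the effective max heap
 * and streams a text file one line at a time.
 */
public class StreamingLineStats {

    /** Maximum heap the JVM will use, in MiB (reflects the -Xmx setting). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /**
     * Counts lines and characters, holding only the current line in memory.
     * Returns {lines, chars}; chars excludes line terminators.
     */
    public static long[] countLinesAndChars(Reader in) {
        long lines = 0;
        long chars = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                chars += line.length();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {lines, chars};
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        if (args.length > 0) {
            // Assumes the input file is UTF-8 encoded.
            long[] stats = countLinesAndChars(
                Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8));
            System.out.println(stats[0] + " lines, " + stats[1] + " chars");
        }
    }
}
```

Run it with the same heap flag bin/mallet would pass, e.g. `java -Xmx10g StreamingLineStats input.txt`. If the reported maximum is far below about 10 GB, the MEMORY setting is not being applied. The JVM may report slightly less than the -Xmx value, so only a large gap matters.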


mimno · 2018-04-25T18:21:28Z

You might be able to use the bulk-load command. It has fewer options, but may be more efficient.

$ bin/mallet bulk-load --help
Efficient tool for importing large amounts of text into Mallet format
--help TRUE|FALSE
Print this command line option usage information. Give argument of TRUE for longer documentation
Default is false
--prefix-code 'JAVA CODE'
Java code you want run before any other interpreted code. Note that the text is interpreted without modification, so unlike some other Java code options, you need to include any necessary 'new's when creating objects.
Default is null
--config FILE
Read command option values from a file
Default is null
--input FILE
The file containing data, one instance per line
Default is null
--output FILE
Write the instance list to this file
Default is mallet.data
--preserve-case [TRUE|FALSE]
If true, do not force all strings to lowercase.
Default is false
--remove-stopwords [TRUE|FALSE]
If true, remove common "stop words" from the text.
This option invokes a minimal English stoplist.
Default is false
--stoplist FILE
Read newline-separated words from this file,
and remove them from text. This option overrides
the default English stoplist triggered by --remove-stopwords.
Default is null
--keep-sequence [TRUE|FALSE]
If true, final data will be a FeatureSequence rather than a FeatureVector.
Default is false
--line-regex REGEX
Regular expression containing regex-groups for label, name and data.
Default is ^([^\t]*)\t([^\t]*)\t(.*)
--name INTEGER
The index of the group containing the instance name.
Use 0 to indicate that this field is not used.
Default is 1
--label INTEGER
The index of the group containing the label string.
Use 0 to indicate that this field is not used.
Default is 2
--data INTEGER
The index of the group containing the data.
Default is 3
--prune-count N
Reduce features to those that occur more than N times.
Default is 0
--prune-doc-frequency N
Remove features that occur in more than (X*100)% of documents. 0.05 is equivalent to IDF of 3.0.
Default is 1.0
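bulk-load reads one instance per line and splits each line with --line-regex, taking the name, label and text from groups 1, 2 and 3 by default. The sketch below is a hypothetical helper, not MALLET code. It writes and parses lines in that layout with java.util.regex, assuming the default pattern is ^([^\t]*)\t([^\t]*)\t(.*), since the * quantifiers appear to have been lost from the pasted help text. Tabs and line breaks inside a document are replaced with spaces because either would break the one-instance-per-line layout. MALLET's own matching code may differ in details.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of MALLET) for the line layout that
 * bulk-load's default --line-regex appears to expect: name TAB label TAB text.
 */
public class BulkLoadLineFormat {

    // Assumed default --line-regex, with the '*' quantifiers restored.
    static final Pattern DEFAULT_LINE_REGEX =
        Pattern.compile("^([^\\t]*)\\t([^\\t]*)\\t(.*)");

    // Default group indices from the help text: --name 1, --label 2, --data 3.
    static final int NAME_GROUP = 1;
    static final int LABEL_GROUP = 2;
    static final int DATA_GROUP = 3;

    /**
     * Builds one input line. Tabs and line breaks in the text become single
     * spaces; name and label are assumed to contain no tabs or line breaks.
     */
    public static String toLine(String name, String label, String text) {
        String flat = text.replaceAll("[\\t\\r\\n]+", " ");
        return name + "\t" + label + "\t" + flat;
    }

    /** Splits a line into {name, label, data}, or returns null if it does not match. */
    public static String[] parse(String line) {
        Matcher m = DEFAULT_LINE_REGEX.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(NAME_GROUP), m.group(LABEL_GROUP), m.group(DATA_GROUP)};
    }

    public static void main(String[] args) {
        String line = toLine("doc1", "news", "First line.\nSecond\tline.");
        String[] fields = parse(line);
        // prints: doc1 | news | First line. Second line.
        System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
    }
}
```

To move from a directory of files to bulk-load, one could write toLine(fileName, label, fileText) for each file into a single input file, reading one file at a time so that only one document is in memory. Separately, the "0.05 is equivalent to IDF of 3.0" remark under --prune-doc-frequency is consistent with a natural-log IDF: ln(1/0.05) = ln 20 ≈ 3.0.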
