I am trying to build a cube on Kylin with the Spark engine, using a join between two huge tables. Can you help me out with that, please? The build fails with:
java.lang.OutOfMemoryError: Java heap space
at java.util.IdentityHashMap.resize(IdentityHashMap.java:471)
at java.util.IdentityHashMap.put(IdentityHashMap.java:440)
at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:476)
at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:418)
at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:98)
at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:139)
at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:287)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:87)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:49)
at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:66)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:62)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
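The OOM above happens in SnapshotTable.takeSnapshot, i.e. in the "Build Dimension Dictionary" step. That step is a HadoopShellExecutable and runs inside the Kylin job server's own JVM, not in a Spark executor, so the Spark memory settings below never apply to it. A minimal sketch of the usual first remedy, assuming the stock conf/setenv.sh shipped with Kylin 2.2 (the 8 GB figure is illustrative, not a recommendation), is to enlarge the job server heap and restart Kylin:

# conf/setenv.sh -- heap for the Kylin job server JVM, which is what
# builds dictionaries and lookup-table snapshots (illustrative sizing)
export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx8192M -Xss1024K"

# restart so the new heap settings take effect
$KYLIN_HOME/bin/kylin.sh stop
$KYLIN_HOME/bin/kylin.sh start

A bigger heap only helps if the lookup table roughly fits in memory; if it is truly huge, see the note after the config below.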
This is my Spark config (from kylin.properties):
kylin.env.hadoop-conf-dir=/home/hadoop/apache-kylin-2.2.0-bin/hadoop-conf
# Estimate the RDD partition numbers
#kylin.engine.spark.rdd-partition-cut-mb=10
# Minimal partition numbers of rdd
#kylin.engine.spark.min-partition=1
# Max partition numbers of rdd
#kylin.engine.spark.max-partition=5000
# Spark conf (default is in spark/conf/spark-defaults.conf)
#kylin.engine.spark-conf.spark.master=yarn
#kylin.engine.spark-conf.spark.submit.deployMode=cluster
#kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.executor.memory=20G
kylin.engine.spark-conf.spark.executor.cores=8
kylin.engine.spark-conf.spark.executor.instances=6
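Note that spark.executor.memory, spark.executor.cores and spark.executor.instances only size the executors for the later cubing stages; the step that fails here runs before Spark is even launched. Kylin can also fail fast instead of OOM-ing when a lookup table is too big to snapshot. A sketch for kylin.properties, assuming the kylin.snapshot.max-mb property available in Kylin 2.x (default 300 MB):

# kylin.properties -- reject lookup tables larger than this threshold
# instead of attempting (and OOM-ing on) an in-memory snapshot
kylin.snapshot.max-mb=300

If one side of the join genuinely is huge, the cleaner fix is in the model itself: Kylin snapshots every lookup table in full, so a multi-GB table is better pre-joined into the fact table (for example via a Hive view) than modeled as a lookup table.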