[WIP] Unsafe shuffle writer support in RSS #53
base: master
Conversation
(cherry picked from commit a876e216c65f9381f6f2e551d701db978db995ee)
Hi @mayurdb.
This looks like a great improvement!
I understand that this is a WIP PR, but please let me leave a few comments from my side.
Thanks!
  .createWithDefault(true)
val unsafeShuffleWriterBufferSize: ConfigEntry[Long] =
  ConfigBuilder("spark.shuffle.rss.unsafe.writer.bufferSize")
    .doc("Use unsafe shuffle writer")
Looks like the description should be updated, as it is the same as for the previous config entry.
ConfigBuilder("spark.shuffle.rss.unsafe.writer.bufferSize") | ||
.doc("Use unsafe shuffle writer") | ||
.longConf | ||
.createWithDefault(5*1024l*1024) |
Spark uses bytesConf for memory sizes, so users could specify values as bytes (b), kibibytes (k), mebibytes (m), etc., e.g. 50b, 100k, or 250m. It's also possible to omit a suffix. Should we also use bytesConf, as is done for example for spark.memory.offHeap.size?
It's a good suggestion, we should consider this.
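For reference, a minimal sketch of what the entry could look like with a byte-size config, assuming the RSS ConfigBuilder mirrors Spark's ConfigBuilder API (bytesConf(ByteUnit) and createWithDefaultString are Spark's names; the doc wording here is illustrative, not taken from the PR):

import org.apache.spark.network.util.ByteUnit

val unsafeShuffleWriterBufferSize: ConfigEntry[Long] =
  ConfigBuilder("spark.shuffle.rss.unsafe.writer.bufferSize")
    .doc("Buffer size used by the unsafe shuffle writer") // illustrative wording; also avoids reusing the previous entry's doc
    .bytesConf(ByteUnit.BYTE)
    .createWithDefaultString("5m") // users could also pass e.g. "100k" or a plain byte count

With this shape the default stays at 5 MiB while values like 50b, 100k, or 250m become valid, matching how spark.memory.offHeap.size is declared.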
var totalBufferedSize: Long = 0
var totalSendDataBlockTime: Long = 0

private val writerBufferSize = conf.get(RssOpts.unsafeShuffleWriterBufferSize);
Looks like writerBufferSize is not used.
private val serializerInstance = serializer.newInstance()

def addRecord(partitionId: Int, record: Product2[K, V]): Seq[(Int, Array[Byte])] = {
  var totalLookUpTime = 0L
Looks like totalLookUpTime is only being updated, but never read.
private var recordsWrittenCount: Int = 0

private var totalSerializationTime: Long = 0l

private var totalMemoryFethcWaitTime: Long = 0l
nit: totalMemoryFethcWaitTime => totalMemoryFetchWaitTime
private final int initialSize;

public RssShuffleInMemorySorter(MemoryConsumer consumer, int initialSize, boolean useRadixSort, boolean a) {
The last argument boolean a is not used.
@mayurdb, I've deleted Tested locally in the
Oh that's great. These classes are just package private, so ideally they should just work for all the cases where the package structure is replicated. Also, the compile-time and runtime Spark jars will be different in most cases while using RSS. I actually haven't looked into the issue as I just wanted to try this out and get the performance numbers first. I will check the details of the IllegalAccessError and get back.
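As a side note, a minimal sketch of what "replicating the package structure" means here (the object name is illustrative, not from the PR): declaring the RSS class under Spark's own package gives compile-time access to package-private members, but at runtime the JVM treats classes from different class loaders as belonging to different runtime packages, which is exactly when IllegalAccessError shows up.

// Illustrative only: placing RSS code under Spark's package to reach package-private APIs.
package org.apache.spark.shuffle

private[spark] object RssUnsafeShuffleBridge {
  // Compiles because it shares Spark's package; still fails at runtime with
  // IllegalAccessError if Spark's classes come from a different class loader.
}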
assert(sorter != null);
final K key = record._1();
final int partitionId = partitioner.getPartition(key);
serBuffer.reset();
I remember some serializers like JavaSerializer may use object references in the serialized stream, so each serialized object inside the stream will not be independent. Thus getting the bytes from MyByteArrayOutputStream each time after adding a key/value may not always be safe.
KryoSerializer does not use object references (if I remember correctly), and will be safe here.
Maybe add a check here to make sure the serializer is KryoSerializer?
We should always use KryoSerializer; can we just fail if we are not using Kryo?
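For illustration, a minimal Scala sketch of the fail-fast guard being asked for here (the method name and call site are assumptions, not the PR's code; Spark's own serialized shuffle path gates on the serializer's object-relocation property for the same reason):

import org.apache.spark.serializer.{KryoSerializer, Serializer}

// Fail fast if the serializer may emit back-references between records,
// which would make per-record byte slices unsafe to relocate independently.
private def requireKryo(serializer: Serializer): Unit = {
  if (!serializer.isInstanceOf[KryoSerializer]) {
    throw new IllegalArgumentException(
      "Unsafe shuffle writer requires KryoSerializer so that serialized records are " +
        s"independent and relocatable, but got ${serializer.getClass.getName}")
  }
}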
Seems like this needs to be updated.
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.array.LongArray;

public class RadixSort {
I thought you had removed this class and were using the Spark class; why is it still showing up here?
import org.apache.spark.unsafe.array.LongArray;

public class RadixSort {
Can we just remove this class and use the Spark classes?
import org.apache.spark.util.Utils;

final class RssShuffleExternalSorter extends MemoryConsumer {
Can we please add some comments here explaining why this class is being added and what its functionality is?
private long sizeThreshold;

long numRecords = 0l;
Why is this not private?
private MemoryBlock currentPage = null;
private long pageCursor = -1;

long bytesWritten = 0l;
these should be private as well
// Keep track of success so we know if we encountered an exception
// We do this rather than a standard try/catch/re-throw to handle
// generic throwables.
// TODO: Fix recordsWritten
Can you please add a note on what we need to fix, or just fix it and remove this TODO?
ConfigBuilder("spark.shuffle.rss.unsafe.writer.bufferSize") | ||
.doc("Use unsafe shuffle writer") | ||
.longConf | ||
.createWithDefault(5*1024l*1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its a good suggestion , we should consider this
private val enableMapSideAggregation = shuffleDependency.mapSideCombine && conf.get(RssOpts.enableMapSideAggregation)
val enableMapSideAggregation = shuffleDependency.mapSideCombine && conf.get(RssOpts.enableMapSideAggregation)
Can we make them private?
import org.apache.spark.memory.{MemoryManager, MemoryMode}
import org.apache.spark.storage.BlockId

class RssTestMemoryManager(conf: SparkConf)
Add some context here for the class
private val writeClientCloseLock = new Object()

var totalCompressionTime: Long = 0
All these variables should be private.
@vladhlinsky did you run these commands from within IntelliJ/any other IDE, or did you run a spark-submit externally and pass the RSS jars? Also, if the command was run externally, can you please confirm whether RSS was used? These classes are package private in Spark. To be able to access them, we need the same package structure, and both the interface and the implementation should be loaded by the same class loader. Looks like I'm hitting the second issue.
@mayurdb, I ran a
Hi @mayurdb, we have also been experiencing memory and map stage latency issues using RSS. We plan to test and work on this implementation as well. Wondering if you have any updates about this PR that you can share with us. Many thanks :)
Key traits
Details
The implementation uses Java's Unsafe APIs to acquire large chunks of memory. For each record, a tuple of the partition Id and the memory location where the record is stored is kept in a metadata array. The advantage of doing this is that the data can be ordered by sorting just the metadata array on the partition Id. Before spilling, data is read in chunks of configurable size and sent over the network. Similar to the above two approaches, this approach also interfaces with the TMM to acquire more memory for storing records or for expanding the metadata array.
Open Source Spark has already implemented the unsafe shuffle writer, and it currently gets used for executing most shuffle writes. Components from Spark's implementation around memory allocation, storing data in memory, and the metadata-based sort were reused in this implementation. The logic around spill triggers and around reading data from memory had to be changed to be compatible with RSS and, more importantly, performant.
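To make the metadata-array idea concrete, here is an illustrative Scala sketch of the packing scheme described above (the bit layout and names are assumptions for illustration; Spark's unsafe writer uses a similar packed record pointer): each record is tracked as a single long carrying the partition Id in the high bits and the record's memory address in the low bits, so sorting the array of longs orders records by partition without touching the data itself.

object PackedPointer {
  // Assumed layout for illustration: high 24 bits = partition id, low 40 bits = address.
  private val AddressBits = 40
  private val AddressMask = (1L << AddressBits) - 1

  def pack(partitionId: Int, recordAddress: Long): Long =
    (partitionId.toLong << AddressBits) | (recordAddress & AddressMask)

  def partitionId(packed: Long): Int = (packed >>> AddressBits).toInt
  def recordAddress(packed: Long): Long = packed & AddressMask
}

// Sorting the packed longs (e.g. with java.util.Arrays.sort) groups records by
// partition id, because the partition id occupies the most significant bits.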
Performance Numbers
TODOs: