Peach-He/Spark-PMoF

Spark-PMoF: RPMem extension for Spark Shuffle

Spark-PMoF (Persistent Memory over Fabric), an RPMem extension for Spark Shuffle, is a Spark shuffle plugin that enables persistent memory and high-performance fabric technologies such as RDMA for Spark shuffle, improving Spark performance in shuffle-intensive scenarios.

IMPORTANT NOTE

Spark-PMoF has been migrated to and integrated into OAP: https://github.com/Intel-bigdata/OAP/tree/master/oap-shuffle/RPMem-shuffle. Please check OAP for the most recent updates.

Contents

Introduction

Installation

Make sure HPNL is installed before building.

git clone https://github.com/Intel-bigdata/Spark-PMoF.git
cd Spark-PMoF; mvn package -DskipTests

If the pmem hardware is ready, it's useful to run the tests as well by removing the -DskipTests option:

mvn package

Benchmark

Usage

This plugin currently supports Spark 2.3 and works well on various network fabrics, including Socket, RDMA and Omni-Path. Before running a Spark workload, add the following lines to spark-defaults.conf, then have fun! :-)

spark.driver.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.executor.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.shuffle.manager org.apache.spark.shuffle.pmof.PmofShuffleManager
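
The same three settings can also be appended from the shell. The snippet below is only a sketch: `add_pmof_conf` is a hypothetical helper (not part of Spark-PMoF), and it assumes the jar sits under `target/` in your checkout exactly as named above.

```shell
# Sketch: append the Spark-PMoF settings to a spark-defaults.conf file.
# add_pmof_conf is a hypothetical helper, not shipped with Spark-PMoF.
#   $1: path to the Spark-PMoF checkout (where `mvn package` was run)
#   $2: path to the spark-defaults.conf file to append to
add_pmof_conf() {
    # Jar name taken from the settings above; adjust if your build differs.
    jar="$1/target/sso-0.1-jar-with-dependencies.jar"
    cat >> "$2" <<EOF
spark.driver.extraClassPath $jar
spark.executor.extraClassPath $jar
spark.shuffle.manager org.apache.spark.shuffle.pmof.PmofShuffleManager
EOF
}
```

For example, `add_pmof_conf "$HOME/Spark-PMoF" "$SPARK_HOME/conf/spark-defaults.conf"` would append the lines to the default configuration of a Spark installation, assuming the checkout lives in your home directory. The helper appends rather than overwrites, so run it only once per file.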

Contact

Chendi Xue, [email protected]
Jian Zhang, [email protected]

About

Spark Shuffle Optimization with RDMA+AEP
