georgeboc/Testbed

Testbed

This project runs experiments that compare Spark and MapReduce. Before building the one-JAR with all dependencies, make sure the following are installed:

  • Maven
  • Java JDK 11

Then, run the following script:

$ bash scripts/install.sh

The script creates a target folder, which holds the one-JAR with all dependencies.

To run the JAR, use the following command:

$ java -jar Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar [options]

Where [options] are the following:

usage: java -jar Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar [options]
 -f,--framework-name <arg>               Data Processing Framework's name.
                                         Available options are:
                                         [MapReduce, Spark]
 -i,--instrumented                       If this flag is present, the
                                         Testbed will use the instrumented
                                         invocations. Without this flag,
                                         the Testbed will use the
                                         invocations required to measure
                                         time
 -l,--local                              If the flag is present, the
                                         Testbed uses the local
                                         environment for the frameworks.
                                         Without this flag, the Testbed
                                         uses the cluster environment
 -o,--output <arg>                       Output file path
 -p,--pipeline <arg>                     Pipeline file path
 -s,--sheet-name <arg>                   Sheet name in output file path
 -t,--tolerable-error-percentage <arg>   Tolerable error percentage.
                                         Default value is: 5.0

The listing above is the program's usage text, which is displayed whenever the JAR is invoked with no options, as follows:

$ java -jar Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar

The pipeline is a JSON file with the following structure:

[
  {
    "operation": "LOAD",
    "outputTag": "dataset1",
    "datasetDirectoryPath": "input/Ad_click_on_taobao_10000"
  },
  {
    "operation": "SELECT",
    "inputTag": "dataset1",
    "outputTag": "selectedDataset1",
    "rowsSelectivityFactor": 0.005,
    "columnName": "DateTime"
  }
]

An example invocation with every option specified is the following:

$ java -jar Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar \
--tolerable-error-percentage 5.0 \
--framework-name MapReduce \
--instrumented \
--pipeline pipelines/pipeline.json \
--output output/operation_instrumentations.xlsx \
--sheet-name Dataset_5% \
--local
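
According to the usage text, omitting --instrumented selects the invocations used to measure time, and omitting --local selects the cluster environment. The sketch below assembles such a timing run from the documented options only. The function name timing_run_command, the output path, and the sheet name are illustrative assumptions, and the function prints the command rather than running it:

```shell
# Sketch: print the command for a timing run on the cluster.
# Neither --instrumented nor --local is passed, so (per the usage text)
# the Testbed measures time and uses the cluster environment.
# The output path and sheet name below are placeholder values.
timing_run_command() {
  # $1 = framework name: MapReduce or Spark
  echo java -jar Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar \
    --framework-name "$1" \
    --pipeline pipelines/pipeline.json \
    --output output/execution_times.xlsx \
    --sheet-name Timing
}

timing_run_command Spark
```

Remove the echo to run the command instead of printing it.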

To execute experiments, use the script scripts/experiments_runner.sh. For each experiment, the script runs Spark and MapReduce 3 times each to measure time, plus one additional time with instrumentation enabled. The individual outputs of the experiments are aggregated into a single output file. To execute this script, use the following command:

$ bash scripts/experiments_runner.sh
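
The run pattern described above (three timed runs plus one instrumented run per framework) can be sketched as follows. This is not the content of scripts/experiments_runner.sh. The JAR location, output path, sheet naming scheme, function names, and the DRY_RUN switch are all assumptions made for illustration:

```shell
# Sketch only: NOT the actual scripts/experiments_runner.sh.
# JAR location, output path, sheet names, and DRY_RUN are assumptions.
JAR="target/Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar"
PIPELINE="pipelines/pipeline.json"
OUTPUT="output/results.xlsx"      # hypothetical single aggregated file
TIMED_RUNS=3
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands instead of running them

run_testbed() {
  # $1 = framework name, $2 = sheet name, remaining args = extra flags
  local framework="$1" sheet="$2"
  shift 2
  set -- java -jar "$JAR" \
    --framework-name "$framework" \
    --pipeline "$PIPELINE" \
    --output "$OUTPUT" \
    --sheet-name "$sheet" \
    "$@"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_all_experiments() {
  local fw i
  for fw in Spark MapReduce; do
    # Timed runs: no --instrumented flag.
    i=1
    while [ "$i" -le "$TIMED_RUNS" ]; do
      run_testbed "$fw" "${fw}_run_${i}"
      i=$((i + 1))
    done
    # One additional run with instrumentation enabled.
    run_testbed "$fw" "${fw}_instrumented" --instrumented
  done
}

run_all_experiments
```

With DRY_RUN left at its default of 1, the sketch only prints the eight commands it would run, which makes it safe to try without a built JAR.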

For more information, see the dissertation associated with this project.
