Testbed

This project runs experiments that compare Spark with MapReduce. Before building the single JAR that bundles all dependencies, make sure the following tools are installed:

Maven
Java JDK 11
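
As a quick sanity check before building, the sketch below reports whether the two prerequisites are reachable on PATH. It assumes they are exposed under their usual command names, mvn and java; the helper names have and check_prerequisites are made up for illustration, and the check does not verify that the JDK is version 11.

```shell
# Report whether each build prerequisite can be found on PATH.
# This only checks presence; it does not inspect versions.

have() {
  # Succeeds if the named command can be found on PATH.
  command -v "$1" >/dev/null 2>&1
}

check_prerequisites() {
  # Prints one status line per tool; returns non-zero if any tool is missing.
  missing=0
  for tool in "$@"; do
    if have "$tool"; then
      echo "$tool: found"
    else
      echo "$tool: missing"
      missing=1
    fi
  done
  return "$missing"
}

# mvn and java are assumed to be the command names provided by Maven and the JDK.
check_prerequisites mvn java || echo "Install the missing tools before running scripts/install.sh"
```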

Then, run the following script:

$ bash scripts/install.sh

The script creates a target folder that contains the single JAR with all dependencies.

To run the JAR, use the following command:

$ java -jar Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar [options]

The available [options] are:

usage: java -jar Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar [options]
 -f,--framework-name <arg>               Data Processing Framework's name.
                                         Available options are:
                                         [MapReduce, Spark]
 -i,--instrumented                       If this flag is present, the
                                         Testbed will use the instrumented
                                         invocations. Without this flag,
                                         the Testbed will use the
                                         invocations required to measure
                                         time
 -l,--local                              If the flag is present, the
                                         Testbed uses the local
                                         environment for the frameworks.
                                         Without this flag, the Testbed
                                         uses the cluster environment
 -o,--output <arg>                       Output file path
 -p,--pipeline <arg>                     Pipeline file path
 -s,--sheet-name <arg>                   Sheet name in output file path
 -t,--tolerable-error-percentage <arg>   Tolerable error percentage.
                                         Default value is: 5.0

The text above is the program's usage message, which is displayed whenever the JAR is invoked with no options:

$ java -jar Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar

The pipeline option takes the path to a JSON file with the following structure:

[
  {
    "operation": "LOAD",
    "outputTag": "dataset1",
    "datasetDirectoryPath": "input/Ad_click_on_taobao_10000"
  },
  {
    "operation": "SELECT",
    "inputTag": "dataset1",
    "outputTag": "selectedDataset1",
    "rowsSelectivityFactor": 0.005,
    "columnName": "DateTime"
  }
]
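
In the example, the SELECT step's inputTag matches the LOAD step's outputTag, which suggests that tags link steps together. Assuming that is the rule, the sketch below lists every inputTag that no step produces as an outputTag. The function names tag_values and undefined_input_tags are hypothetical, and the matching is plain text processing that expects one key per line as in the example; it is not a real JSON parser and is not part of the Testbed.

```shell
# List inputTag values that have no matching outputTag in a pipeline file.
# Assumes one "key": "value" pair per line, formatted as in the example above.

tag_values() {
  # $1 = key name, $2 = pipeline file; prints one value per line.
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$2"
}

undefined_input_tags() {
  # $1 = pipeline file; prints each inputTag that is never declared as an outputTag.
  outputs=$(tag_values outputTag "$1")
  tag_values inputTag "$1" | while IFS= read -r tag; do
    printf '%s\n' "$outputs" | grep -qxF -- "$tag" || printf '%s\n' "$tag"
  done
}

# Demo on a copy of the example pipeline, written to a temporary file.
pipeline_file=$(mktemp)
cat > "$pipeline_file" <<'EOF'
[
  {
    "operation": "LOAD",
    "outputTag": "dataset1",
    "datasetDirectoryPath": "input/Ad_click_on_taobao_10000"
  },
  {
    "operation": "SELECT",
    "inputTag": "dataset1",
    "outputTag": "selectedDataset1",
    "rowsSelectivityFactor": 0.005,
    "columnName": "DateTime"
  }
]
EOF
# Prints nothing here, because dataset1 is produced by the LOAD step.
undefined_input_tags "$pipeline_file"
rm -f "$pipeline_file"
```

The check ignores step order, so it would not catch a step that reads a tag produced later in the file.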

An example invocation with input parameters is the following:

$ java -jar Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar \
--tolerable-error-percentage 5.0 \
--framework-name MapReduce \
--instrumented \
--pipeline pipelines/pipeline.json \
--output output/operation_instrumentations.xlsx \
--sheet-name Dataset_5% \
--local

To execute the experiments, we have created the script scripts/experiments_runner.sh. It runs each experiment three times for Spark and for MapReduce, plus one additional run per framework to instrument the execution. The individual outputs of the experiments are aggregated into a single output file. To execute this script, use the following command:

$ bash scripts/experiments_runner.sh
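
The run schedule described above can be sketched as a dry run that prints the commands instead of executing them. This is not the content of scripts/experiments_runner.sh. The JAR location under target/, the function name plan_experiment, and the output path are assumptions for illustration, and the real script may pass other options such as --local.

```shell
# Dry-run sketch: print the java commands for one pipeline instead of running them.
JAR=target/Testbed-1.0-SNAPSHOT-jar-with-dependencies.jar   # assumed location of the built JAR
TIMED_RUNS=3                                                # timed runs per framework

plan_experiment() {
  # $1 = pipeline file, $2 = output file, $3 = sheet name
  for framework in Spark MapReduce; do
    run=1
    while [ "$run" -le "$TIMED_RUNS" ]; do
      # Timed run: no --instrumented flag.
      echo "java -jar $JAR --framework-name $framework --pipeline $1 --output $2 --sheet-name $3"
      run=$((run + 1))
    done
    # One additional run with instrumentation enabled.
    echo "java -jar $JAR --framework-name $framework --instrumented --pipeline $1 --output $2 --sheet-name $3"
  done
}

# Hypothetical output path; prints eight commands (three timed plus one instrumented per framework).
plan_experiment pipelines/pipeline.json output/results.xlsx Dataset_5%
```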

For more information, see the dissertation associated with this project.
