This C++ program performs edge detection on a PGM image using various Laplacian filters. The program can be run in sequential mode or parallel mode using different parallelization strategies. The goal is to compare the performance of the different parallelization strategies utilizing the CPU and to determine which one is the most efficient. If you want to know how edge detection works using Laplacian filters, see 3b1b's video on convolution.
This repository features a core paper, report.pdf
, detailing the results of the experiments conducted to compare the performance of the different parallelization strategies.
To compile the code, enter the source directory and run the following commands:
mkdir build
cd build
cmake ../ -DCMAKE_BUILD_TYPE=Release
make
This will create the main
executable in the build directory. If you ever change the source code, you can recompile the code by running make
in the build directory. No need to run cmake
again.
Run the main
executable in the build directory with the following arguments:
./main -i <input_pgm> -o <output_pgm> -n <num_threads> -t <toggle_timing> -f <filter> -m <execution_method> -c <chunk_size>
The arguments are as follows:
-i <input_pgm>
: The input pgm file. Altnernatively, you can use-b <image_number>
to use one of the built-in images. The image numbers are only 1 and 2. Use either-i
or-b
, not both.-o <output_pgm>
: The output pgm file.-n <num_threads>
: The number of threads to use. Must be a positive integer.-t <toggle_timing>
: 1 to enable timing, 0 to disable timing.-f <filter>
: The laplacian filter to use. Here are the options:1
: 3x3 Laplacian2
: 5x5 Laplacian3
: 9x9 Laplacian of Gaussian4
: 1x1 Identity
-m <execution_method>
: The execution method to use. Here are the options:1
: Sequential2
: Parallel Sharded Rows3
: Parallel Sharded Column Major4
: Parallel Sharded Row Major5
: Parallel Work Queue
-c <chunk_size>
: The chunk size to use when using work queue parallelization. Must be a positive integer.
For optimal performance, the number of threads should be equal to the number of cores on your machine. If you are unsure how many cores your machine has, you can find out by running the following command:
lscpu | grep "CPU"
# result
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 8 # number of cores!!
On-line CPU(s) list: 0-7
Model name: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
CPU family: 6
CPU max MHz: 4700.0000
CPU min MHz: 800.0000
NUMA node0 CPU(s): 0-7
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT disabled
That being said, the entire point of this program is to compare the performance of different parallelization strategies. So, you can run the program with different numbers of threads to see how the performance changes for yourself.
You can see the filters in the filters.cpp
file,
filter *builtin_filters[NUM_FILTERS] = {&lp3_f, &lp5_f, &log_f, &identity_f};
For example, the 3x3 Laplacian filter is defined as follows:
int8_t lp3_m[] = {
0, 1, 0,
1, -4, 1,
0, 1, 0,
};
filter lp3_f = {3, lp3_m};
The sequential execution method processes the image in a single thread.
The parallel sharded rows execution method divides the image into rows and assigns each thread a row to process.
The parallel sharded column major execution method divides the image into columns and assigns each thread a column to process.
The parallel sharded row major execution method divides the image into rows and assigns each thread a row to process. The difference between this method and the parallel sharded rows method is that the image is stored in row-major order.
Row-major order means that the elements of the rows of the image are stored contiguously in memory. This is in contrast to column-major order, where the elements of the columns of the image are stored contiguously in memory.
The parallel work queue execution method uses a work queue to assign work to threads. The work queue is a queue of tasks that need to be completed. Each task is a row of the image that needs to be processed. Threads take tasks from the work queue and process them.
This allows you to generate PGM images in the binary format. The binary is
located inside the build directory and it is called "pgm_creator". See the
pgm_creator.cpp
source code.
You can create a PGM image using the PGM creator provided like so: (you must build the starter code first)
./pgm_creator 2 2 input.pgm
This creates a 2x2 image in the build
directory called input.pgm
. You can
then remove all .pgm
files in the build
directory by running make clean_pgm
.
You can see the contents of the image with the following command:
od --skip-bytes=11 --width=2 --format=u1 input.pgm
This will print the following:
0000013 1 2
0000015 3 4
0000017
We skip the first 11 bytes because they are part of the header and usually not relevant. In this example, the image has bytes with contents 1, 2, 3 and 4.
When testing, you can use this to inspect the output of your program.
Run ./generate_graphs_script.sh
to generate graphs for some experiments. The experiments and some sample results are explained in the report.pdf
file. The shell script will run a python script that runs a series of experiments and generates graphs based on the results. The resulting graphs will be saved in build/graphs/
. The graphs will be saved as .png
files.
This contains a single function that takes data for the x/y axes and plots a graph based on it. Read the documentation in the file for instructions.
This script runs your executable (main
) in the build directory, parses the
perf output and plots some basic graphs (it depends on the first script). If
you decide to use it, make sure to read all of the code there.