Skip to content

Latest commit

 

History

History
159 lines (114 loc) · 6.43 KB

README.md

File metadata and controls

159 lines (114 loc) · 6.43 KB

Parallelized Edge Detection

This C++ program performs edge detection on a PGM image using various Laplacian filters. The program can be run in sequential mode or parallel mode using different parallelization strategies. The goal is to compare the performance of the different parallelization strategies utilizing the CPU and to determine which one is the most efficient. If you want to know how edge detection works using Laplacian filters, see 3b1b's video on convolution.

This repository features a core paper, report.pdf, detailing the results of the experiments conducted to compare the performance of the different parallelization strategies.

Compiling the code

To compile the code, enter the source directory and run the following commands:

mkdir build
cd build
cmake ../ -DCMAKE_BUILD_TYPE=Release
make

This will create the main executable in the build directory. If you ever change the source code, you can recompile the code by running make in the build directory. No need to run cmake again.

Usage

Run the main executable in the build directory with the following arguments:

./main -i <input_pgm> -o <output_pgm> -n <num_threads> -t <toggle_timing> -f <filter> -m <execution_method> -c <chunk_size>

The arguments are as follows:

  • -i <input_pgm>: The input pgm file. Altnernatively, you can use -b <image_number> to use one of the built-in images. The image numbers are only 1 and 2. Use either -i or -b, not both.
  • -o <output_pgm>: The output pgm file.
  • -n <num_threads>: The number of threads to use. Must be a positive integer.
  • -t <toggle_timing>: 1 to enable timing, 0 to disable timing.
  • -f <filter>: The laplacian filter to use. Here are the options:
    • 1: 3x3 Laplacian
    • 2: 5x5 Laplacian
    • 3: 9x9 Laplacian of Gaussian
    • 4: 1x1 Identity
  • -m <execution_method>: The execution method to use. Here are the options:
    • 1: Sequential
    • 2: Parallel Sharded Rows
    • 3: Parallel Sharded Column Major
    • 4: Parallel Sharded Row Major
    • 5: Parallel Work Queue
  • -c <chunk_size>: The chunk size to use when using work queue parallelization. Must be a positive integer.

Number of Threads

For optimal performance, the number of threads should be equal to the number of cores on your machine. If you are unsure how many cores your machine has, you can find out by running the following command:

lscpu | grep "CPU"

# result
CPU op-mode(s):                       32-bit, 64-bit
CPU(s):                               8 # number of cores!!
On-line CPU(s) list:                  0-7
Model name:                           Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
CPU family:                           6
CPU max MHz:                          4700.0000
CPU min MHz:                          800.0000
NUMA node0 CPU(s):                    0-7
Vulnerability Mmio stale data:        Mitigation; Clear CPU buffers; SMT disabled

That being said, the entire point of this program is to compare the performance of different parallelization strategies. So, you can run the program with different numbers of threads to see how the performance changes for yourself.

Filters

You can see the filters in the filters.cpp file,

filter *builtin_filters[NUM_FILTERS] = {&lp3_f, &lp5_f, &log_f, &identity_f};

For example, the 3x3 Laplacian filter is defined as follows:

int8_t lp3_m[] = {
    0, 1, 0, 
    1, -4, 1, 
    0, 1, 0,
};

filter lp3_f = {3, lp3_m};

Execution Methods

Sequential

The sequential execution method processes the image in a single thread.


Parallel Sharded Rows

The parallel sharded rows execution method divides the image into rows and assigns each thread a row to process.


Parallel Sharded Column Major

The parallel sharded column major execution method divides the image into columns and assigns each thread a column to process.


Parallel Sharded Row Major

The parallel sharded row major execution method divides the image into rows and assigns each thread a row to process. The difference between this method and the parallel sharded rows method is that the image is stored in row-major order.

Row-major order means that the elements of the rows of the image are stored contiguously in memory. This is in contrast to column-major order, where the elements of the columns of the image are stored contiguously in memory.


Parallel Work Queue

The parallel work queue execution method uses a work queue to assign work to threads. The work queue is a queue of tasks that need to be completed. Each task is a row of the image that needs to be processed. Threads take tasks from the work queue and process them.

PGM Creator

This allows you to generate PGM images in the binary format. The binary is located inside the build directory and it is called "pgm_creator". See the pgm_creator.cpp source code.

You can create a PGM image using the PGM creator provided like so: (you must build the starter code first)

./pgm_creator 2 2 input.pgm

This creates a 2x2 image in the build directory called input.pgm. You can then remove all .pgm files in the build directory by running make clean_pgm.

Inspecting Binary PGM Images

You can see the contents of the image with the following command:

od --skip-bytes=11 --width=2 --format=u1 input.pgm

This will print the following:

0000013   1   2
0000015   3   4
0000017

We skip the first 11 bytes because they are part of the header and usually not relevant. In this example, the image has bytes with contents 1, 2, 3 and 4.

When testing, you can use this to inspect the output of your program.

Generating Graphs

Run ./generate_graphs_script.sh to generate graphs for some experiments. The experiments and some sample results are explained in the report.pdf file. The shell script will run a python script that runs a series of experiments and generates graphs based on the results. The resulting graphs will be saved in build/graphs/. The graphs will be saved as .png files.

plotter.py

This contains a single function that takes data for the x/y axes and plots a graph based on it. Read the documentation in the file for instructions.

perfs_student.py

This script runs your executable (main) in the build directory, parses the perf output and plots some basic graphs (it depends on the first script). If you decide to use it, make sure to read all of the code there.