This repository contains several implementations of matrix transposition, together with performance tests that compare them.
We implement the following approaches, in order:
- Naive matrix transposition (single-threaded)
- Parallel naive matrix transposition (multi-threaded)
- SSE matrix transposition (single-threaded)
- SSE Block matrix transposition (single-threaded)
- SSE Parallel Block matrix transposition (multi-threaded), the fastest variant
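To make the difference between the naive and the block approach concrete, here is a minimal sketch of both. It is not the repository's code: the function names `transpose_naive` and `transpose_blocked`, the `float` element type, and the row-major square layout are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: naive transposition of a row-major n x n matrix.
// Reads src row by row and writes dst column by column.
void transpose_naive(const std::vector<float>& src, std::vector<float>& dst,
                     std::size_t n) {
    assert(src.size() == n * n && dst.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            dst[j * n + i] = src[i * n + j];
}

// Hypothetical helper: the same result, computed tile by tile so that each
// tile of src and dst is more likely to stay in cache while it is processed.
void transpose_blocked(const std::vector<float>& src, std::vector<float>& dst,
                       std::size_t n, std::size_t block) {
    assert(block > 0 && src.size() == n * n && dst.size() == n * n);
    for (std::size_t bi = 0; bi < n; bi += block) {
        for (std::size_t bj = 0; bj < n; bj += block) {
            // Clamp the tile so sizes that are not a multiple of block work.
            const std::size_t i_end = std::min(bi + block, n);
            const std::size_t j_end = std::min(bj + block, n);
            for (std::size_t i = bi; i < i_end; ++i)
                for (std::size_t j = bj; j < j_end; ++j)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce identical output; only the memory access order differs, which is what the block size parameter controls.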
All tests were performed on an "Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz" with 16 GB of DDR4 RAM.
As the results show, a significant difference between the approaches appears starting at matrix sizes of about 2000x2000.
The SSE Block matrix transposition approach is faster than any other single-threaded approach.
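One common way to combine SSE with blocking is to transpose 4x4 tiles of floats in registers using the `_MM_TRANSPOSE4_PS` macro from `<xmmintrin.h>`. The sketch below assumes `float` elements, a row-major square matrix, and that both the matrix size and the block size are multiples of 4; the function names are hypothetical and the repository's actual kernel may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics and _MM_TRANSPOSE4_PS

// Hypothetical kernel: transposes one 4x4 tile. src and dst point to the
// top-left element of the tile; n is the row stride of the full matrix.
inline void transpose4x4_sse(const float* src, float* dst, std::size_t n) {
    __m128 r0 = _mm_loadu_ps(src + 0 * n);
    __m128 r1 = _mm_loadu_ps(src + 1 * n);
    __m128 r2 = _mm_loadu_ps(src + 2 * n);
    __m128 r3 = _mm_loadu_ps(src + 3 * n);
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);  // in-register 4x4 transpose
    _mm_storeu_ps(dst + 0 * n, r0);
    _mm_storeu_ps(dst + 1 * n, r1);
    _mm_storeu_ps(dst + 2 * n, r2);
    _mm_storeu_ps(dst + 3 * n, r3);
}

// Hypothetical driver: walks the matrix in block x block tiles and
// transposes each tile with the 4x4 SSE kernel.
// Assumption: n and block are both multiples of 4.
void transpose_block_sse(const float* src, float* dst, std::size_t n,
                         std::size_t block) {
    assert(block > 0 && n % 4 == 0 && block % 4 == 0);
    for (std::size_t bi = 0; bi < n; bi += block) {
        for (std::size_t bj = 0; bj < n; bj += block) {
            const std::size_t i_end = std::min(bi + block, n);
            const std::size_t j_end = std::min(bj + block, n);
            for (std::size_t i = bi; i < i_end; i += 4)
                for (std::size_t j = bj; j < j_end; j += 4)
                    transpose4x4_sse(src + i * n + j, dst + j * n + i, n);
        }
    }
}
```

Unaligned loads and stores are used so the sketch works with any buffer; aligned variants would require 16-byte aligned rows.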
SSE Parallel Block matrix transposition is the fastest approach overall.
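The parallel variant can be sketched by giving each thread its own band of source rows. The example below uses `std::thread` and a scalar inner loop to stay portable; an SSE tile kernel could replace that inner loop. The function names and the row-band partitioning are assumptions for illustration, not necessarily what the repository does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical worker: block-transposes source rows [row_begin, row_end).
void transpose_block_rows(const float* src, float* dst, std::size_t n,
                          std::size_t block, std::size_t row_begin,
                          std::size_t row_end) {
    for (std::size_t bi = row_begin; bi < row_end; bi += block) {
        for (std::size_t bj = 0; bj < n; bj += block) {
            const std::size_t i_end = std::min(bi + block, row_end);
            const std::size_t j_end = std::min(bj + block, n);
            for (std::size_t i = bi; i < i_end; ++i)
                for (std::size_t j = bj; j < j_end; ++j)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}

// Hypothetical driver: splits the rows into contiguous bands, one per thread.
// Each band writes a disjoint set of dst columns, so no locking is needed.
void transpose_block_parallel(const float* src, float* dst, std::size_t n,
                              std::size_t block, std::size_t num_threads) {
    assert(block > 0 && num_threads > 0);
    const std::size_t rows_per_thread = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t row_begin = std::min(t * rows_per_thread, n);
        const std::size_t row_end = std::min(row_begin + rows_per_thread, n);
        if (row_begin == row_end) break;  // more threads than rows
        workers.emplace_back(transpose_block_rows, src, dst, n, block,
                             row_begin, row_end);
    }
    for (auto& w : workers) w.join();
}
```

On some toolchains, code using `std::thread` must be linked with `-pthread`.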
FastMatrixTransposition [matrix_size] [block_size] [number_of_threads] [number_of_tests]
Here number_of_tests is the number of performance tests run for each approach.
The program outputs the average time for each approach:
8000,194659.156250,97701.046875,140561.281250,86731.703125,62631.093750
[matrix_size],[naive approach], [parallel naive approach], [SSE matrix transposition], [Block transposition], [Block SSE parallel transposition]
All times are in nanoseconds.
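Since the output is a single comma-separated line, it is easy to post-process. The following sketch parses one result line and computes each approach's speedup relative to the naive one; `parse_result_line` and `speedups_vs_naive` are hypothetical helpers, not part of the repository.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one comma-separated result line into numbers.
// Field 0 is the matrix size; fields 1..5 are the average times.
std::vector<double> parse_result_line(const std::string& line) {
    std::vector<double> fields;
    std::stringstream ss(line);
    std::string cell;
    while (std::getline(ss, cell, ','))
        fields.push_back(std::stod(cell));
    return fields;
}

// Hypothetical helper: speedup of every approach relative to the naive time
// (field 1). The first entry is therefore always 1.
std::vector<double> speedups_vs_naive(const std::vector<double>& fields) {
    std::vector<double> out;
    for (std::size_t k = 1; k < fields.size(); ++k)
        out.push_back(fields[1] / fields[k]);
    return out;
}
```

Applied to the sample line above, the last entry comes out at roughly 3.1, meaning the parallel block SSE variant took about a third of the naive time in that run.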