Performance
The neural2d program uses the C++11 thread class to launch the integrated web server in its own thread.
We have experimented with using OpenMP to parallelize various other loops in neural2d-core.cpp with limited success. Perhaps the random accesses through several large data structures frustrate the CPU cache mechanism. Neural2d no longer ships with OpenMP #pragmas in the code.
To enable OpenMP parallelization, insert the appropriate OpenMP #pragma line(s) at the loop(s) you wish to parallelize in neural2d-core.cpp, and add the -fopenmp option to the compiler command line in the Makefile.
Typically, a compiler implements the C++11 thread model and the OpenMP thread model in terms of an underlying thread mechanism provided by the operating system (e.g., pthreads on Linux, or Windows threads on Windows). You may need to supply additional compiler or linker flags to fully enable threading support. For example, on Linux, add -fopenmp and -pthread to the g++ command line to enable threading.
If the compiler does not recognize OpenMP pragmas, it will ignore them and compile the program single-threaded. If the compiler supports OpenMP, you should see a modest speed increase from the parallelization.
The neural2d program keeps very large data structures in memory. We know that non-sequential memory accesses are slow due to how cache lines are managed in the processor cache. We also know that the time-consuming inner loops are loops over the connections between neurons. The Connection records introduce an extra level of indirection in convolution layers, further hindering data locality.
Future efforts to improve the performance of neural2d could focus on vectorizing various math loops that are currently not vectorized.