Update train.c #1

SynclonSec · 2025-02-12T02:40:55Z

Extra Parallelism:
I added OpenMP pragmas to more of the independent loops (bias additions, activation functions, reductions, etc.) so they can run across multiple cores. This should speed up training without changing any functionality.
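For reviewers, here is a minimal sketch of the two loop patterns involved. It is not the actual code in train.c: the helper names `bias_relu_forward` and `mse_loss`, the row-major `[batch x dim]` layout, ReLU as the activation, and MSE as the loss are all assumptions made for illustration. Element-wise loops get a plain `parallel for`, while accumulations need a `reduction` clause to avoid a data race.

```c
#include <stddef.h>

/* Hypothetical helper (not from train.c): add a per-feature bias to a
   row-major [batch x dim] buffer, then apply ReLU in place. Every element
   is independent, so the whole loop nest can be split across threads. */
void bias_relu_forward(float *out, const float *bias, int batch, int dim)
{
    #pragma omp parallel for collapse(2) schedule(static)
    for (int b = 0; b < batch; b++) {
        for (int j = 0; j < dim; j++) {
            size_t idx = (size_t)b * dim + j;
            float v = out[idx] + bias[j];
            out[idx] = v > 0.0f ? v : 0.0f;
        }
    }
}

/* Hypothetical helper (not from train.c): mean squared error over n values.
   reduction(+:sum) gives each thread a private partial sum and combines the
   partials at the end, so threads never write the shared `sum` concurrently. */
float mse_loss(const float *pred, const float *target, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < n; i++) {
        double d = (double)pred[i] - (double)target[i];
        sum += d * d;
    }
    return (float)(sum / n);
}
```

Compiled without `-fopenmp`, the pragmas are ignored and both functions run serially with the same logic. One caveat: a parallel reduction can add the partial sums in a different order than the serial loop, so results may differ in the last few bits of floating-point precision.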

Optimized Matrix Operations:
The forward and backward passes now make better use of BLAS routines (cblas_sgemm, cblas_saxpy, etc.) for the matrix operations. This offloads the heavy number crunching to an optimized BLAS library, which should give a noticeable performance boost.

Overall, it's more efficient and faster.

Commit: Update train.c (0f4bb81)