RISC-V 64: Add assembly code for SHA-256 #7758

Merged
dgarske merged 1 commit into wolfSSL:master on Jul 30, 2024


SparkiDev
Contributor

Description

Move common defines out of AES file to header file.

Testing

Regression tested RISC-V 64 ASM.

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • Updated manual and documentation

@SparkiDev SparkiDev self-assigned this Jul 18, 2024
@SparkiDev SparkiDev assigned wolfSSL-Bot and unassigned SparkiDev Jul 21, 2024
@SparkiDev SparkiDev requested a review from wolfSSL-Bot July 21, 2024 22:54
@dgarske dgarske self-assigned this Jul 22, 2024
@dgarske
Contributor

dgarske commented Jul 22, 2024

@SparkiDev I will run this on actual hardware and compare performance for SHA2-256.

@dgarske
Contributor

dgarske commented Jul 22, 2024

HiFive Unleashed at 1.4GHz:

SHA-256: 18.326 MiB/s -> 20.399 MiB/s

PR 7758 (with RISCV-ASM):

root@HiFiveU:~/wolfssl-riscv# ./configure --enable-riscv-asm && make
root@HiFiveU:~/wolfssl-riscv# ./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math: 	Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                         10 MiB took 1.094 seconds,    9.141 MiB/s
AES-128-CBC-enc             20 MiB took 1.062 seconds,   18.837 MiB/s
AES-128-CBC-dec             20 MiB took 1.059 seconds,   18.893 MiB/s
AES-192-CBC-enc             20 MiB took 1.240 seconds,   16.130 MiB/s
AES-192-CBC-dec             20 MiB took 1.233 seconds,   16.216 MiB/s
AES-256-CBC-enc             15 MiB took 1.046 seconds,   14.343 MiB/s
AES-256-CBC-dec             15 MiB took 1.039 seconds,   14.440 MiB/s
AES-128-GCM-enc             15 MiB took 1.301 seconds,   11.528 MiB/s
AES-128-GCM-dec             15 MiB took 1.300 seconds,   11.542 MiB/s
AES-192-GCM-enc             15 MiB took 1.423 seconds,   10.543 MiB/s
AES-192-GCM-dec             15 MiB took 1.425 seconds,   10.528 MiB/s
AES-256-GCM-enc             10 MiB took 1.025 seconds,    9.755 MiB/s
AES-256-GCM-dec             10 MiB took 1.025 seconds,    9.760 MiB/s
GMAC Table 4-bit            31 MiB took 1.025 seconds,   30.245 MiB/s
CHACHA                      35 MiB took 1.128 seconds,   31.018 MiB/s
CHA-POLY                    25 MiB took 1.095 seconds,   22.825 MiB/s
MD5                         75 MiB took 1.011 seconds,   74.202 MiB/s
POLY1305                    90 MiB took 1.044 seconds,   86.186 MiB/s
SHA                         35 MiB took 1.064 seconds,   32.891 MiB/s
SHA-256                     25 MiB took 1.226 seconds,   20.399 MiB/s
SHA-384                     25 MiB took 1.128 seconds,   22.153 MiB/s
SHA-512                     25 MiB took 1.129 seconds,   22.146 MiB/s
SHA-512/224                 25 MiB took 1.128 seconds,   22.163 MiB/s
SHA-512/256                 25 MiB took 1.129 seconds,   22.149 MiB/s
HMAC-MD5                    75 MiB took 1.010 seconds,   74.237 MiB/s
HMAC-SHA                    35 MiB took 1.066 seconds,   32.830 MiB/s
HMAC-SHA256                 25 MiB took 1.223 seconds,   20.446 MiB/s
HMAC-SHA384                 25 MiB took 1.128 seconds,   22.165 MiB/s
HMAC-SHA512                 25 MiB took 1.128 seconds,   22.155 MiB/s
PBKDF2                       3 KiB took 1.007 seconds,    2.544 KiB/s
RSA     2048   public      1500 ops took 1.047 sec, avg 0.698 ms, 1432.120 ops/sec
RSA     2048  private       100 ops took 4.375 sec, avg 43.749 ms, 22.858 ops/sec
DH      2048  key gen       116 ops took 1.006 sec, avg 8.671 ms, 115.326 ops/sec
DH      2048    agree       100 ops took 1.843 sec, avg 18.430 ms, 54.259 ops/sec
ECC   [      SECP256R1]   256  key gen       200 ops took 1.282 sec, avg 6.409 ms, 156.031 ops/sec
ECDHE [      SECP256R1]   256    agree       200 ops took 1.277 sec, avg 6.386 ms, 156.591 ops/sec
ECDSA [      SECP256R1]   256     sign       200 ops took 1.310 sec, avg 6.549 ms, 152.697 ops/sec
ECDSA [      SECP256R1]   256   verify       300 ops took 1.330 sec, avg 4.432 ms, 225.626 ops/sec
Benchmark complete

Master (with RISCV-ASM):

root@HiFiveU:~/wolfssl-riscv# ./configure --enable-riscv-asm && make
root@HiFiveU:~/wolfssl-riscv# ./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math: 	Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                         10 MiB took 1.366 seconds,    7.320 MiB/s
AES-128-CBC-enc             20 MiB took 1.070 seconds,   18.688 MiB/s
AES-128-CBC-dec             20 MiB took 1.065 seconds,   18.775 MiB/s
AES-192-CBC-enc             20 MiB took 1.240 seconds,   16.134 MiB/s
AES-192-CBC-dec             20 MiB took 1.231 seconds,   16.244 MiB/s
AES-256-CBC-enc             15 MiB took 1.052 seconds,   14.256 MiB/s
AES-256-CBC-dec             15 MiB took 1.045 seconds,   14.359 MiB/s
AES-128-GCM-enc             15 MiB took 1.303 seconds,   11.511 MiB/s
AES-128-GCM-dec             15 MiB took 1.303 seconds,   11.508 MiB/s
AES-192-GCM-enc             15 MiB took 1.431 seconds,   10.479 MiB/s
AES-192-GCM-dec             15 MiB took 1.432 seconds,   10.475 MiB/s
AES-256-GCM-enc             10 MiB took 1.034 seconds,    9.669 MiB/s
AES-256-GCM-dec             10 MiB took 1.034 seconds,    9.669 MiB/s
GMAC Table 4-bit            31 MiB took 1.003 seconds,   30.917 MiB/s
CHACHA                      35 MiB took 1.139 seconds,   30.740 MiB/s
CHA-POLY                    25 MiB took 1.103 seconds,   22.666 MiB/s
MD5                         75 MiB took 1.009 seconds,   74.360 MiB/s
POLY1305                    90 MiB took 1.041 seconds,   86.473 MiB/s
SHA                         35 MiB took 1.061 seconds,   32.977 MiB/s
SHA-256                     20 MiB took 1.091 seconds,   18.326 MiB/s
SHA-384                     25 MiB took 1.152 seconds,   21.699 MiB/s
SHA-512                     25 MiB took 1.153 seconds,   21.689 MiB/s
SHA-512/224                 25 MiB took 1.151 seconds,   21.721 MiB/s
SHA-512/256                 25 MiB took 1.151 seconds,   21.718 MiB/s
HMAC-MD5                    75 MiB took 1.009 seconds,   74.366 MiB/s
HMAC-SHA                    35 MiB took 1.061 seconds,   32.977 MiB/s
HMAC-SHA256                 20 MiB took 1.093 seconds,   18.302 MiB/s
HMAC-SHA384                 25 MiB took 1.151 seconds,   21.725 MiB/s
HMAC-SHA512                 25 MiB took 1.151 seconds,   21.723 MiB/s
PBKDF2                       2 KiB took 1.001 seconds,    2.279 KiB/s
RSA     2048   public      1500 ops took 1.059 sec, avg 0.706 ms, 1416.165 ops/sec
RSA     2048  private       100 ops took 4.409 sec, avg 44.086 ms, 22.683 ops/sec
DH      2048  key gen       115 ops took 1.003 sec, avg 8.718 ms, 114.703 ops/sec
DH      2048    agree       100 ops took 1.845 sec, avg 18.454 ms, 54.188 ops/sec
ECC   [      SECP256R1]   256  key gen       200 ops took 1.278 sec, avg 6.390 ms, 156.485 ops/sec
ECDHE [      SECP256R1]   256    agree       200 ops took 1.273 sec, avg 6.365 ms, 157.107 ops/sec
ECDSA [      SECP256R1]   256     sign       200 ops took 1.308 sec, avg 6.542 ms, 152.860 ops/sec
ECDSA [      SECP256R1]   256   verify       300 ops took 1.337 sec, avg 4.456 ms, 224.392 ops/sec
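
To compare two runs like the ones above without reading the logs by eye, the throughput lines can be parsed and diffed. The following is a minimal Python sketch, not part of wolfSSL. The names `parse_throughput` and `percent_change` are hypothetical, and the regular expression assumes the `name ... N MiB took T seconds, R MiB/s` layout seen in this benchmark output. The sample strings contain a few lines copied from the two logs in this comment.

```python
import re

# Matches throughput lines such as:
#   SHA-256                     25 MiB took 1.226 seconds,   20.399 MiB/s
# Lines that report ops/sec (RSA, DH, ECC) do not match and are skipped.
LINE_RE = re.compile(
    r"^(?P<name>.+?)\s+\d+ [KM]iB took [\d.]+ seconds,\s+"
    r"(?P<rate>[\d.]+) (?P<unit>[KM]iB)/s$"
)

def parse_throughput(log):
    """Map algorithm name -> (rate, unit) for every throughput line."""
    results = {}
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            results[m.group("name")] = (float(m.group("rate")), m.group("unit"))
    return results

def percent_change(old, new):
    """Relative change of new with respect to old, in percent."""
    return (new - old) / old * 100.0

# A few lines copied from the master and PR logs above.
master_log = """\
SHA-256                     20 MiB took 1.091 seconds,   18.326 MiB/s
HMAC-SHA256                 20 MiB took 1.093 seconds,   18.302 MiB/s
RSA     2048   public      1500 ops took 1.059 sec, avg 0.706 ms, 1416.165 ops/sec
"""
pr_log = """\
SHA-256                     25 MiB took 1.226 seconds,   20.399 MiB/s
HMAC-SHA256                 25 MiB took 1.223 seconds,   20.446 MiB/s
RSA     2048   public      1500 ops took 1.047 sec, avg 0.698 ms, 1432.120 ops/sec
"""

master = parse_throughput(master_log)
pr = parse_throughput(pr_log)
for name, (old_rate, unit) in master.items():
    # Only compare entries present in both runs and reported in the same unit.
    if name in pr and pr[name][1] == unit:
        print(f"{name}: {percent_change(old_rate, pr[name][0]):+.1f}%")
```

On the figures quoted in this comment, SHA-256 goes from 18.326 to 20.399 MiB/s, which is roughly an 11% gain over master.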

dgarske previously approved these changes Jul 22, 2024
@SparkiDev
Contributor Author

@dgarske,

Please generate benchmarks with assembly code again.

Thanks,
Sean

@SparkiDev SparkiDev assigned dgarske and unassigned SparkiDev Jul 25, 2024
@dgarske
Contributor

dgarske commented Jul 25, 2024

Using bbd74f7:

Only slightly faster:

root@HiFiveU:~/wolfssl-riscv# ./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math: 	Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                         10 MiB took 1.098 seconds,    9.106 MiB/s
AES-128-CBC-enc             20 MiB took 1.063 seconds,   18.810 MiB/s
AES-128-CBC-dec             20 MiB took 1.069 seconds,   18.707 MiB/s
AES-192-CBC-enc             20 MiB took 1.233 seconds,   16.221 MiB/s
AES-192-CBC-dec             20 MiB took 1.237 seconds,   16.168 MiB/s
AES-256-CBC-enc             15 MiB took 1.046 seconds,   14.335 MiB/s
AES-256-CBC-dec             15 MiB took 1.046 seconds,   14.339 MiB/s
AES-128-GCM-enc             15 MiB took 1.291 seconds,   11.618 MiB/s
AES-128-GCM-dec             15 MiB took 1.292 seconds,   11.612 MiB/s
AES-192-GCM-enc             15 MiB took 1.419 seconds,   10.573 MiB/s
AES-192-GCM-dec             15 MiB took 1.418 seconds,   10.575 MiB/s
AES-256-GCM-enc             10 MiB took 1.026 seconds,    9.743 MiB/s
AES-256-GCM-dec             10 MiB took 1.026 seconds,    9.749 MiB/s
GMAC Table 4-bit            31 MiB took 1.025 seconds,   30.253 MiB/s
CHACHA                      35 MiB took 1.154 seconds,   30.325 MiB/s
CHA-POLY                    25 MiB took 1.114 seconds,   22.436 MiB/s
MD5                         75 MiB took 1.010 seconds,   74.247 MiB/s
POLY1305                    90 MiB took 1.042 seconds,   86.409 MiB/s
SHA                         35 MiB took 1.062 seconds,   32.955 MiB/s
SHA-256                     25 MiB took 1.222 seconds,   20.454 MiB/s
SHA-384                     25 MiB took 1.135 seconds,   22.034 MiB/s
SHA-512                     25 MiB took 1.135 seconds,   22.034 MiB/s
SHA-512/224                 25 MiB took 1.134 seconds,   22.040 MiB/s
SHA-512/256                 25 MiB took 1.134 seconds,   22.044 MiB/s
HMAC-MD5                    75 MiB took 1.010 seconds,   74.251 MiB/s
HMAC-SHA                    35 MiB took 1.062 seconds,   32.947 MiB/s
HMAC-SHA256                 25 MiB took 1.222 seconds,   20.454 MiB/s
HMAC-SHA384                 25 MiB took 1.135 seconds,   22.029 MiB/s
HMAC-SHA512                 25 MiB took 1.135 seconds,   22.026 MiB/s
PBKDF2                       3 KiB took 1.008 seconds,    2.542 KiB/s
RSA     2048   public      1500 ops took 1.048 sec, avg 0.699 ms, 1431.121 ops/sec
RSA     2048  private       100 ops took 4.385 sec, avg 43.848 ms, 22.806 ops/sec
DH      2048  key gen       114 ops took 1.006 sec, avg 8.821 ms, 113.364 ops/sec
DH      2048    agree       100 ops took 1.894 sec, avg 18.936 ms, 52.811 ops/sec
ECC   [      SECP256R1]   256  key gen       200 ops took 1.280 sec, avg 6.399 ms, 156.270 ops/sec
ECDHE [      SECP256R1]   256    agree       200 ops took 1.276 sec, avg 6.378 ms, 156.778 ops/sec
ECDSA [      SECP256R1]   256     sign       200 ops took 1.310 sec, avg 6.550 ms, 152.676 ops/sec
ECDSA [      SECP256R1]   256   verify       300 ops took 1.317 sec, avg 4.391 ms, 227.750 ops/sec
Benchmark complete

@dgarske dgarske assigned SparkiDev and unassigned dgarske Jul 25, 2024
dgarske previously approved these changes Jul 25, 2024
@SparkiDev
Contributor Author

Let me know if there is any improvement in performance with the new assembly code.

@dgarske
Contributor

dgarske commented Jul 29, 2024

Appears to be 4.6% faster.
Updated benchmarks on HiFive Unleashed at 1.4GHz and Git commit:

root@HiFiveU:~/wolfssl# ./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math: 	Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                         10 MiB took 1.055 seconds,    9.477 MiB/s
AES-128-CBC-enc             20 MiB took 1.076 seconds,   18.584 MiB/s
AES-128-CBC-dec             10 MiB took 1.594 seconds,    6.274 MiB/s
AES-192-CBC-enc             20 MiB took 1.245 seconds,   16.059 MiB/s
AES-192-CBC-dec             10 MiB took 1.670 seconds,    5.989 MiB/s
AES-256-CBC-enc             15 MiB took 1.058 seconds,   14.177 MiB/s
AES-256-CBC-dec             10 MiB took 1.888 seconds,    5.298 MiB/s
AES-128-GCM-enc             15 MiB took 1.299 seconds,   11.546 MiB/s
AES-128-GCM-dec             15 MiB took 1.299 seconds,   11.545 MiB/s
AES-192-GCM-enc             15 MiB took 1.426 seconds,   10.517 MiB/s
AES-192-GCM-dec             15 MiB took 1.426 seconds,   10.516 MiB/s
AES-256-GCM-enc             10 MiB took 1.032 seconds,    9.692 MiB/s
AES-256-GCM-dec             10 MiB took 1.032 seconds,    9.692 MiB/s
GMAC Table 4-bit            31 MiB took 1.025 seconds,   30.255 MiB/s
CHACHA                      35 MiB took 1.154 seconds,   30.323 MiB/s
CHA-POLY                    25 MiB took 1.114 seconds,   22.451 MiB/s
MD5                         75 MiB took 1.010 seconds,   74.229 MiB/s
POLY1305                    90 MiB took 1.041 seconds,   86.488 MiB/s
SHA                         35 MiB took 1.067 seconds,   32.792 MiB/s
SHA-256                     25 MiB took 1.166 seconds,   21.446 MiB/s
SHA-384                     25 MiB took 1.129 seconds,   22.146 MiB/s
SHA-512                     25 MiB took 1.128 seconds,   22.153 MiB/s
SHA-512/224                 25 MiB took 1.128 seconds,   22.162 MiB/s
SHA-512/256                 25 MiB took 1.129 seconds,   22.140 MiB/s
HMAC-MD5                    75 MiB took 1.011 seconds,   74.168 MiB/s
HMAC-SHA                    35 MiB took 1.063 seconds,   32.919 MiB/s
HMAC-SHA256                 25 MiB took 1.166 seconds,   21.443 MiB/s
HMAC-SHA384                 25 MiB took 1.128 seconds,   22.163 MiB/s
HMAC-SHA512                 25 MiB took 1.129 seconds,   22.145 MiB/s
PBKDF2                       3 KiB took 1.003 seconds,    2.649 KiB/s
RSA     2048   public      1500 ops took 1.043 sec, avg 0.695 ms, 1438.085 ops/sec
RSA     2048  private       100 ops took 4.381 sec, avg 43.814 ms, 22.824 ops/sec
DH      2048  key gen       116 ops took 1.002 sec, avg 8.641 ms, 115.724 ops/sec
DH      2048    agree       100 ops took 1.834 sec, avg 18.344 ms, 54.513 ops/sec
ECC   [      SECP256R1]   256  key gen       200 ops took 1.288 sec, avg 6.441 ms, 155.249 ops/sec
ECDHE [      SECP256R1]   256    agree       200 ops took 1.284 sec, avg 6.419 ms, 155.792 ops/sec
ECDSA [      SECP256R1]   256     sign       200 ops took 1.319 sec, avg 6.595 ms, 151.632 ops/sec
ECDSA [      SECP256R1]   256   verify       300 ops took 1.335 sec, avg 4.450 ms, 224.718 ops/sec
Benchmark complete
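
The thread does not say which two figures the "4.6% faster" above compares, and the percentage depends on the baseline chosen. The short Python sketch below works through the SHA-256 numbers reported so far in this thread; the variable names are illustrative, and the pairing of runs is an assumption.

```python
# SHA-256 throughput in MiB/s, copied from the benchmark logs in this thread.
master = 18.326      # master with RISCV-ASM (Jul 22)
previous = 20.454    # run at bbd74f7 (Jul 25)
current = 21.446     # this run (Jul 29)

def gain_over(old, new):
    """Speed-up expressed relative to the older figure, in percent."""
    return (new - old) / old * 100.0

def gain_of_new(old, new):
    """Same difference expressed relative to the newer figure, in percent."""
    return (new - old) / new * 100.0

print(f"vs previous run, relative to old: {gain_over(previous, current):.2f}%")
print(f"vs previous run, relative to new: {gain_of_new(previous, current):.2f}%")
print(f"vs master, relative to old:       {gain_over(master, current):.2f}%")
```

Measured against the previous run, the gain is a little under 5% by either convention; measured against master it is about 17%.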

@dgarske
Contributor

dgarske commented Jul 29, 2024

Retest this please

Move common defines out of AES file to header file.
@SparkiDev
Contributor Author

Last time.
Just let me know the speed.

@dgarske
Contributor

dgarske commented Jul 30, 2024

Run again at commit f1e01e4 (no difference).

Is it possible this HiFive Unleashed doesn't support the assembly instructions you are trying to use?

I have another RISC-V board, the Microchip PolarFire® SoC Discovery Kit (https://www.microchip.com/en-us/development-tool/MPFS-DISCO-KIT?_ga=2.239966534.437271057.1722361866-327000224.1722361866), but it's not set up with Linux yet. Would you like me to attempt to run on that one?

./configure --enable-riscv-asm && make
./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math: 	Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                         10 MiB took 1.059 seconds,    9.439 MiB/s
AES-128-CBC-enc             20 MiB took 1.052 seconds,   19.013 MiB/s
AES-128-CBC-dec             20 MiB took 1.054 seconds,   18.968 MiB/s
AES-192-CBC-enc             20 MiB took 1.222 seconds,   16.361 MiB/s
AES-192-CBC-dec             20 MiB took 1.219 seconds,   16.405 MiB/s
AES-256-CBC-enc             15 MiB took 1.042 seconds,   14.399 MiB/s
AES-256-CBC-dec             15 MiB took 1.038 seconds,   14.446 MiB/s
AES-128-GCM-enc             15 MiB took 1.287 seconds,   11.659 MiB/s
AES-128-GCM-dec             15 MiB took 1.287 seconds,   11.659 MiB/s
AES-192-GCM-enc             15 MiB took 1.412 seconds,   10.623 MiB/s
AES-192-GCM-dec             15 MiB took 1.414 seconds,   10.612 MiB/s
AES-256-GCM-enc             10 MiB took 1.023 seconds,    9.779 MiB/s
AES-256-GCM-dec             10 MiB took 1.021 seconds,    9.796 MiB/s
GMAC Table 4-bit            31 MiB took 1.025 seconds,   30.246 MiB/s
CHACHA                      35 MiB took 1.137 seconds,   30.783 MiB/s
CHA-POLY                    25 MiB took 1.101 seconds,   22.704 MiB/s
MD5                         75 MiB took 1.009 seconds,   74.336 MiB/s
POLY1305                    90 MiB took 1.041 seconds,   86.431 MiB/s
SHA                         35 MiB took 1.062 seconds,   32.966 MiB/s
SHA-256                     25 MiB took 1.165 seconds,   21.451 MiB/s
SHA-384                     25 MiB took 1.125 seconds,   22.231 MiB/s
SHA-512                     25 MiB took 1.125 seconds,   22.213 MiB/s
SHA-512/224                 25 MiB took 1.125 seconds,   22.229 MiB/s
SHA-512/256                 25 MiB took 1.125 seconds,   22.226 MiB/s
HMAC-MD5                    75 MiB took 1.009 seconds,   74.321 MiB/s
HMAC-SHA                    35 MiB took 1.061 seconds,   32.974 MiB/s
HMAC-SHA256                 25 MiB took 1.166 seconds,   21.445 MiB/s
HMAC-SHA384                 25 MiB took 1.125 seconds,   22.226 MiB/s
HMAC-SHA512                 25 MiB took 1.125 seconds,   22.221 MiB/s
PBKDF2                       3 KiB took 1.003 seconds,    2.649 KiB/s
RSA     2048   public      1500 ops took 1.045 sec, avg 0.697 ms, 1435.256 ops/sec
RSA     2048  private       100 ops took 4.395 sec, avg 43.949 ms, 22.754 ops/sec
DH      2048  key gen       116 ops took 1.002 sec, avg 8.635 ms, 115.812 ops/sec
DH      2048    agree       100 ops took 1.836 sec, avg 18.364 ms, 54.454 ops/sec
ECC   [      SECP256R1]   256  key gen       200 ops took 1.291 sec, avg 6.455 ms, 154.912 ops/sec
ECDHE [      SECP256R1]   256    agree       200 ops took 1.286 sec, avg 6.431 ms, 155.494 ops/sec
ECDSA [      SECP256R1]   256     sign       200 ops took 1.320 sec, avg 6.600 ms, 151.518 ops/sec
ECDSA [      SECP256R1]   256   verify       300 ops took 1.327 sec, avg 4.423 ms, 226.115 ops/sec
Benchmark complete
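
On the question above of whether the board supports a given set of instructions: on RISC-V Linux, the `isa` line of `/proc/cpuinfo` lists the extensions the kernel reports for each hart. The Python sketch below parses that line. It is a rough illustration under stated assumptions: the helper names `parse_isa` and `has_extension` are hypothetical, the sample text is made up for the example rather than captured from the HiFive Unleashed, and `zbb` and `zknh` are used only as examples of extension names (the thread does not say which extensions, if any, were being considered).

```python
def parse_isa(cpuinfo_text):
    """Return (base, single_letter_exts, multi_letter_exts), or None if no isa line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "isa":
            # Example value: "rv64imafdc_zicsr_zbb"
            tokens = value.strip().lower().split("_")
            base, letters = tokens[0][:4], tokens[0][4:]
            return base, set(letters), set(tokens[1:])
    return None

def has_extension(cpuinfo_text, ext):
    """True if ext appears in the isa line (single- or multi-letter form)."""
    parsed = parse_isa(cpuinfo_text)
    if parsed is None:
        return False
    _, single, multi = parsed
    ext = ext.lower()
    return ext in single if len(ext) == 1 else ext in multi

# Illustrative sample only; on a real board use open("/proc/cpuinfo").read().
SAMPLE = "processor\t: 0\nhart\t\t: 1\nisa\t\t: rv64imafdc\nmmu\t\t: sv39\n"

for ext in ("c", "v", "zbb", "zknh"):
    print(ext, has_extension(SAMPLE, ext))
```

Older kernels may format the `isa` string differently (for example, without underscore-separated multi-letter names), so the result is a hint rather than a definitive answer.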

@SparkiDev
Contributor Author

That's enough.
I've been trying to improve the base implementation (no extensions), but there just isn't enough you can do: the instruction set is too simple.

Merge once you are happy with the code.

Thanks,
Sean
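
One way to picture the limitation described above (an illustration, not something stated in the thread): SHA-256 is built largely from 32-bit rotates, and the base RV64I instruction set has no rotate instruction, so each rotate has to be composed from two shifts and an OR unless an extension such as Zbb is available. The Python sketch below models that decomposition. `rotr32` and `big_sigma0` are illustrative names, and the code models the arithmetic only; it is not the assembly from this PR.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, n):
    """32-bit rotate right built from the operations a base ISA offers:
    one logical right shift, one left shift, and one OR."""
    right = (x & MASK32) >> n               # shift right logical
    left = (x << (32 - n)) & MASK32         # shift left, keep low 32 bits
    return right | left                     # combine the two halves

def big_sigma0(a):
    """SHA-256 Sigma0(a) = ROTR2(a) ^ ROTR13(a) ^ ROTR22(a), per FIPS 180-4."""
    return rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22)

# With a native rotate this function is 3 rotates + 2 XORs; composed as above
# it is 6 shifts + 3 ORs + 2 XORs, and it is evaluated in every round.
print(hex(big_sigma0(0x6A09E667)))
```

Because this pattern repeats throughout the compression function, instruction scheduling alone leaves little room for a large speed-up over compiled C.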

@SparkiDev SparkiDev assigned dgarske and unassigned SparkiDev Jul 30, 2024
@dgarske dgarske merged commit 6a1139a into wolfSSL:master Jul 30, 2024
121 checks passed