ckuethe edited this page Dec 11, 2014 · 3 revisions

The Buildroot image includes the following benchmark tools:

  • openssl
  • whetstone
  • dhrystone
  • lmbench
  • tinymembench
  • cache_calibrator
  • ramspeed
  • netperf
  • stress
  • fio
  • iozone
  • bonnie++

Sample output from several of these tools:

# whetstone 100000
Loops: 100000, Iterations: 1, Duration: 35 sec.
C Converted Double Precision Whetstones: 285.7 MIPS
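
The reported rating can be reproduced from the loop count and the duration. The sketch below assumes the classic C conversion of Whetstone, which computes 100 × loops × iterations / seconds in KIPS and reports MIPS once that value reaches 1000. The function name is illustrative and not part of the tool.

```python
def whetstone_mips(loops: int, iterations: int, seconds: float) -> float:
    """Whetstone rating in MIPS.

    Assumption: KIPS = 100 * loops * iterations / seconds, as in the
    classic C conversion of the benchmark.
    """
    kips = 100.0 * loops * iterations / seconds
    return kips / 1000.0


# Values taken from the run above: 100000 loops, 1 iteration, 35 seconds.
print(f"{whetstone_mips(100000, 1, 35):.1f} MIPS")  # prints "285.7 MIPS"
```

Because the duration is printed in whole seconds, longer runs give a less coarse rating.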

# dhrystone 10000000
Microseconds for one run through Dhrystone:    0.7
Dhrystones per Second:                      1524390.3
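
Dhrystone results are often quoted in DMIPS, which divides Dhrystones per second by 1757, the score conventionally attributed to the VAX 11/780. A minimal sketch of that conversion follows; the helper names are illustrative. The page does not state the CPU clock, so no DMIPS/MHz figure is derived here.

```python
# Conventional reference score (Dhrystones per second) of the VAX 11/780.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips(dhrystones_per_sec: float) -> float:
    """Convert Dhrystones per second to DMIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def microseconds_per_run(dhrystones_per_sec: float) -> float:
    """Time for one pass through Dhrystone, in microseconds."""
    return 1e6 / dhrystones_per_sec


score = 1524390.3  # "Dhrystones per Second" from the run above
print(f"{dmips(score):.1f} DMIPS")                    # prints "867.6 DMIPS"
print(f"{microseconds_per_run(score):.1f} us per run")  # prints "0.7 us per run"
```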

# ramspeed -l 1 -b 3
RAMspeed (GENERIC) v2.6.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

INTEGER   Copy:      440.47 MB/s
INTEGER   Scale:     307.86 MB/s
INTEGER   Add:       212.12 MB/s
INTEGER   Triad:     211.79 MB/s
---
INTEGER   AVERAGE:   293.06 MB/s

# openssl speed
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                  0.00         0.00         0.00         0.00         0.00 
mdc2              1604.22k     1779.86k     1822.55k     1841.83k     1834.35k
md4               7641.93k    25956.82k    66855.90k   110162.94k   136108.97k
md5               5971.17k    19607.87k    49923.75k    82258.94k   101657.51k
hmac(md5)         5790.88k    18597.57k    48525.61k    81202.18k   101171.20k
sha1              6314.17k    18527.79k    40392.60k    57632.09k    65956.52k
rmd160            5412.50k    14755.29k    30110.98k    40827.80k    45394.60k
rc4              50650.26k    56746.60k    59247.79k    60164.97k    60162.05k
des cbc           9329.62k     9955.37k    10141.02k    10160.81k    10179.93k
des ede3          3297.65k     3384.36k     3407.87k     3425.78k     3418.79k
idea cbc             0.00         0.00         0.00         0.00         0.00 
seed cbc         14670.05k    15640.81k    15904.51k    15987.03k    16068.92k
rc2 cbc           9301.10k     9740.01k     9876.03k     9869.99k     9882.28k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00 
blowfish cbc     16108.54k    17007.00k    17325.48k    17468.28k    17438.04k
cast cbc         18011.86k    19099.03k    19497.56k    19667.99k    19638.95k
aes-128 cbc      29206.78k    33361.49k    34641.07k    35148.89k    35154.60k
aes-192 cbc      25125.62k    28106.75k    29045.67k    29318.83k    29504.90k
aes-256 cbc      22550.54k    24851.09k    25744.44k    25872.73k    25941.33k
camellia-128 cbc    16838.76k    18033.49k    18582.25k    18569.56k    18606.76k
camellia-192 cbc    13413.60k    14108.65k    14361.86k    14481.55k    14456.15k
camellia-256 cbc    13365.27k    14155.56k    14365.18k    14481.89k    14456.15k
sha256            8093.84k    19353.94k    34883.33k    44066.25k    47521.79k
sha512            3137.71k    12596.83k    19986.01k    28258.63k    31976.11k
whirlpool          725.15k     1480.69k     2383.96k     2819.41k     2978.16k
aes-128 ige      28145.47k    31839.23k    33371.61k    33752.62k    33753.77k
aes-192 ige      24346.81k    27047.98k    28067.67k    28431.93k    28420.78k
aes-256 ige      21930.09k    24193.95k    24898.30k    25106.77k    25252.73k
ghash            42546.49k    50217.77k    53402.11k    54039.89k    54475.43k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.001350s 0.000130s    740.5   7710.6
rsa 1024 bits 0.007609s 0.000412s    131.4   2425.1
rsa 2048 bits 0.049212s 0.001502s     20.3    665.8
rsa 4096 bits 0.352759s 0.005778s      2.8    173.1
                  sign    verify    sign/s verify/s
dsa  512 bits 0.001364s 0.001431s    733.3    698.8
dsa 1024 bits 0.004056s 0.004623s    246.5    216.3
dsa 2048 bits 0.014535s 0.016863s     68.8     59.3
                              sign    verify    sign/s verify/s
 160 bit ecdsa (secp160r1)   0.0008s   0.0032s   1216.3    316.9
 192 bit ecdsa (nistp192)   0.0011s   0.0044s    942.2    229.3
 224 bit ecdsa (nistp224)   0.0014s   0.0059s    732.0    170.5
 256 bit ecdsa (nistp256)   0.0017s   0.0077s    577.6    129.4
 384 bit ecdsa (nistp384)   0.0040s   0.0198s    252.0     50.6
 521 bit ecdsa (nistp521)   0.0082s   0.0449s    122.7     22.3
 163 bit ecdsa (nistk163)   0.0032s   0.0094s    310.8    106.6
 233 bit ecdsa (nistk233)   0.0075s   0.0171s    132.6     58.4
 283 bit ecdsa (nistk283)   0.0121s   0.0313s     82.5     31.9
 409 bit ecdsa (nistk409)   0.0311s   0.0667s     32.1     15.0
 571 bit ecdsa (nistk571)   0.0773s   0.1545s     12.9      6.5
 163 bit ecdsa (nistb163)   0.0032s   0.0101s    313.6     99.0
 233 bit ecdsa (nistb233)   0.0075s   0.0187s    133.4     53.6
 283 bit ecdsa (nistb283)   0.0122s   0.0346s     82.3     28.9
 409 bit ecdsa (nistb409)   0.0314s   0.0748s     31.9     13.4
 571 bit ecdsa (nistb571)   0.0774s   0.1758s     12.9      5.7
                              op      op/s
 160 bit ecdh (secp160r1)   0.0026s    382.7
 192 bit ecdh (nistp192)   0.0036s    281.4
 224 bit ecdh (nistp224)   0.0050s    200.5
 256 bit ecdh (nistp256)   0.0065s    155.0
 384 bit ecdh (nistp384)   0.0166s     60.4
 521 bit ecdh (nistp521)   0.0369s     27.1
 163 bit ecdh (nistk163)   0.0046s    219.2
 233 bit ecdh (nistk233)   0.0084s    119.3
 283 bit ecdh (nistk283)   0.0153s     65.2
 409 bit ecdh (nistk409)   0.0329s     30.4
 571 bit ecdh (nistk571)   0.0764s     13.1
 163 bit ecdh (nistb163)   0.0049s    202.8
 233 bit ecdh (nistb233)   0.0092s    109.2
 283 bit ecdh (nistb283)   0.0170s     58.7
 409 bit ecdh (nistb409)   0.0369s     27.1
 571 bit ecdh (nistb571)   0.0867s     11.5
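
In the throughput table, `openssl speed` reports values in thousands of bytes per second, hence the `k` suffix. The following sketch, meant to run off-target on saved output, parses such rows and converts them to MB/s (1 MB = 1000000 bytes). The sample rows are copied from the table above; the function and variable names are illustrative. Rows that show `0.00` without a suffix (algorithms not built into this OpenSSL) are skipped.

```python
import re

# A few rows copied from the output above.
SAMPLE = """\
md5               5971.17k    19607.87k    49923.75k    82258.94k   101657.51k
idea cbc             0.00         0.00         0.00         0.00         0.00
aes-128 cbc      29206.78k    33361.49k    34641.07k    35148.89k    35154.60k
aes-256 cbc      22550.54k    24851.09k    25744.44k    25872.73k    25941.33k
"""

BLOCK_SIZES = (16, 64, 256, 1024, 8192)  # column headers, in bytes

# Algorithm name, then exactly five values that each carry a "k" suffix.
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<values>(?:[\d.]+k\s*){5})$")


def parse_speed(text: str) -> dict:
    """Map algorithm name -> {block size in bytes: throughput in MB/s}."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, blank line, or a disabled algorithm
        kilobytes = [float(v.rstrip("k")) for v in match.group("values").split()]
        results[match.group("name")] = {
            size: value / 1000.0 for size, value in zip(BLOCK_SIZES, kilobytes)
        }
    return results


for name, row in parse_speed(SAMPLE).items():
    print(f"{name:12s} {row[8192]:6.1f} MB/s at 8192-byte blocks")
```

For the sample rows this gives roughly 101.7 MB/s for md5, 35.2 MB/s for aes-128 cbc and 25.9 MB/s for aes-256 cbc at 8192-byte blocks.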

# tinymembench 
tinymembench v0.3 (simple benchmark for memory throughput and latency)
==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :    222.2 MB/s (4.4%)
 C copy                                               :    197.2 MB/s
 C copy prefetched (32 bytes step)                    :    443.9 MB/s
 C copy prefetched (64 bytes step)                    :    444.0 MB/s
 C 2-pass copy                                        :    168.9 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    328.5 MB/s
 C 2-pass copy prefetched (64 bytes step)             :    328.5 MB/s
 C fill                                               :    878.7 MB/s (0.1%)
 ---
 standard memcpy                                      :    238.5 MB/s
 standard memset                                      :    880.6 MB/s (0.1%)
 ---
 NEON read                                            :    277.9 MB/s
 NEON read prefetched (32 bytes step)                 :    622.0 MB/s (0.4%)
 NEON read prefetched (64 bytes step)                 :    628.8 MB/s (0.2%)
 NEON copy                                            :    266.9 MB/s (0.8%)
 NEON copy prefetched (32 bytes step)                 :    472.0 MB/s
 NEON copy prefetched (64 bytes step)                 :    470.3 MB/s (0.1%)
 NEON unrolled copy                                   :    271.1 MB/s (0.2%)
 NEON unrolled copy prefetched (32 bytes step)        :    473.7 MB/s
 NEON unrolled copy prefetched (64 bytes step)        :    474.4 MB/s
 NEON copy backwards                                  :    264.8 MB/s (0.8%)
 NEON copy backwards prefetched (32 bytes step)       :    471.6 MB/s
 NEON copy backwards prefetched (64 bytes step)       :    470.3 MB/s
 NEON 2-pass copy                                     :    210.6 MB/s (0.3%)
 NEON 2-pass copy prefetched (32 bytes step)          :    356.4 MB/s
 NEON 2-pass copy prefetched (64 bytes step)          :    366.0 MB/s
 NEON unrolled 2-pass copy                            :    210.1 MB/s (0.2%)
 NEON unrolled 2-pass copy prefetched (32 bytes step) :    355.2 MB/s (0.2%)
 NEON unrolled 2-pass copy prefetched (64 bytes step) :    364.9 MB/s
 NEON fill                                            :    917.5 MB/s (0.1%)
 NEON fill backwards                                  :    917.5 MB/s (0.1%)
 ARM fill (STRD)                                      :    879.6 MB/s (0.1%)
 ARM fill (STM with 8 registers)                      :    880.4 MB/s (0.1%)
 ARM fill (STM with 4 registers)                      :    878.5 MB/s (0.1%)
 ARM copy prefetched (incr pld)                       :    466.4 MB/s
 ARM copy prefetched (wrap pld)                       :    453.8 MB/s
 ARM 2-pass copy prefetched (incr pld)                :    345.9 MB/s (0.3%)
 ARM 2-pass copy prefetched (wrap pld)                :    308.8 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.1 ns 
     65536 :    8.8 ns          /    16.9 ns 
    131072 :   13.2 ns          /    25.4 ns 
    262144 :   47.3 ns          /    92.9 ns 
    524288 :  170.9 ns          /   335.0 ns 
   1048576 :  233.6 ns          /   453.8 ns 
   2097152 :  266.0 ns          /   514.0 ns 
   4194304 :  283.9 ns          /   545.6 ns 
   8388608 :  295.2 ns          /   566.3 ns 
  16777216 :  306.3 ns          /   586.8 ns 
  33554432 :  321.6 ns          /   616.7 ns 
  67108864 :  352.7 ns          /   676.6 ns 
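
Two things can be read off the latency table with a little arithmetic. The sketch below is an interpretation aid rather than part of tinymembench: it reports the largest block size whose extra latency is still close to zero (a rough hint at the L1 data cache size, not a confirmed figure) and the ratio of dual to single random read times. The 1 ns threshold and the function names are assumptions made for illustration.

```python
# (single random read, dual random read) in ns, copied from the table above.
LATENCY_NS = {
    16384: (0.0, 0.0),
    32768: (0.0, 0.1),
    65536: (8.8, 16.9),
    131072: (13.2, 25.4),
    262144: (47.3, 92.9),
    524288: (170.9, 335.0),
    1048576: (233.6, 453.8),
    2097152: (266.0, 514.0),
    4194304: (283.9, 545.6),
    8388608: (295.2, 566.3),
    16777216: (306.3, 586.8),
    33554432: (321.6, 616.7),
    67108864: (352.7, 676.6),
}


def l1_estimate(table: dict, threshold_ns: float = 1.0) -> int:
    """Largest block size whose extra single-read latency stays below the threshold."""
    return max(size for size, (single, _) in table.items() if single < threshold_ns)


def dual_ratio(table: dict) -> dict:
    """Dual / single read time for every block size with non-zero latency."""
    return {
        size: dual / single
        for size, (single, dual) in table.items()
        if single > 0
    }


print("largest block with ~0 ns extra latency:", l1_estimate(LATENCY_NS))
for size, ratio in dual_ratio(LATENCY_NS).items():
    print(f"{size:>9d} bytes: dual/single = {ratio:.2f}")
```

With the numbers above, the extra latency stays near zero up to 32768 bytes, and the dual/single ratio is between about 1.9 and 2.0 at every larger size. By Note 2 of the tool's own output, a ratio near 2 is what is expected when the memory subsystem does not overlap outstanding requests.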