tweaking
lemire committed Jun 19, 2024
1 parent d6a0157 commit 5f6f669
Showing 1 changed file with 21 additions and 21 deletions.
42 changes: 21 additions & 21 deletions README.md
@@ -102,27 +102,27 @@ sudo dotnet run -c Release
## Results (x64)

On x64 systems, we offer several functions: a fallback function for legacy systems,
an SSE42 function for older CPUs, an AVX2 function for current x64 systems, and an
AVX-512 function for the most recent systems (AMD Zen 4, Intel Ice Lake, etc.).

On an Intel Ice Lake system, our validation function is up to several times
faster than the standard library when using at least the AVX2 routines. Only on pure
ASCII inputs (Latin-Lipsum) is the standard library seemingly faster, but all functions
are effectively at "memory speed" so the difference is practically not significant.
A more realistic input is Twitter.json which is mostly ASCII with some Unicode content.

| data set | SimdUnicode SSE42 (GB/s) | SimdUnicode AVX2 (GB/s) | .NET speed (GB/s) |
|:----------------|:-------------------------|:------------------------|-------------------|
| Twitter.json | 16 | 24 | 12 |
| Arabic-Lipsum | 4.9 | 9.0 | 2.3 |
| Chinese-Lipsum | 4.9 | 9.0 | 3.9 |
| Emoji-Lipsum | 4.3 | 7.1 | 0.9 |
| Hebrew-Lipsum | 4.5 | 8.0 | 2.3 |
| Hindi-Lipsum | 4.3 | 8.0 | 2.1 |
| Japanese-Lipsum | 4.5 | 8.0 | 3.5 |
| Korean-Lipsum | 4.5 | 8.0 | 1.3 |
| Latin-Lipsum | 50 | 76 | 96 |
| Russian-Lipsum | 4.3 | 8.0 | 1.2 |
an SSE42 function for older CPUs, and an AVX2 function for current x64 systems.

On an Intel Ice Lake system, our validation function is up to seven times
faster than the standard library.
A realistic input is Twitter.json which is mostly ASCII with some Unicode content.

| data set | SimdUnicode current AVX2 (GB/s) | .NET speed (GB/s) |
|:----------------|:------------------------|-------------------|
| Twitter.json | 24 | 12 |
| Arabic-Lipsum | 9.0 | 2.3 |
| Chinese-Lipsum | 9.0 | 3.9 |
| Emoji-Lipsum | 7.1 | 0.9 |
| Hebrew-Lipsum | 8.0 | 2.3 |
| Hindi-Lipsum | 8.0 | 2.1 |
| Japanese-Lipsum | 8.0 | 3.5 |
| Korean-Lipsum | 8.0 | 1.3 |
| Latin-Lipsum | 76 | 96 |
| Russian-Lipsum | 8.0 | 1.2 |

On pure ASCII inputs (Latin-Lipsum), the standard library has a small advantage, but both
functions are extremely fast.
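
The speedup claim can be checked directly against the table. The following is a minimal sketch in Python, not part of SimdUnicode: the `results` dictionary and the `speedups` helper are illustrative names, and the numbers are simply transcribed from the AVX2 and .NET columns above.

```python
# Throughput in GB/s transcribed from the results table:
# data set -> (SimdUnicode AVX2, .NET standard library).
results = {
    "Twitter.json": (24, 12),
    "Arabic-Lipsum": (9.0, 2.3),
    "Chinese-Lipsum": (9.0, 3.9),
    "Emoji-Lipsum": (7.1, 0.9),
    "Hebrew-Lipsum": (8.0, 2.3),
    "Hindi-Lipsum": (8.0, 2.1),
    "Japanese-Lipsum": (8.0, 3.5),
    "Korean-Lipsum": (8.0, 1.3),
    "Latin-Lipsum": (76, 96),
    "Russian-Lipsum": (8.0, 1.2),
}


def speedups(table):
    """Return {data set: SimdUnicode throughput / .NET throughput}."""
    return {name: simd / dotnet for name, (simd, dotnet) in table.items()}


ratios = speedups(results)

# Print the data sets from largest to smallest speedup.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {ratio:4.1f}x")
```

With these numbers, the largest ratio is for Emoji-Lipsum (7.1 / 0.9, a little under 8), Twitter.json comes out at exactly 2, and Latin-Lipsum is the only row with a ratio below 1.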

## Results (ARM)
