Pre-release PR #44

Merged: 9 commits, Jun 20, 2024
55 changes: 41 additions & 14 deletions README.md
@@ -88,7 +88,8 @@ To run just one benchmark, use a filter:

```
cd benchmark
dotnet run --configuration Release --filter "*Arabic-Lipsum*"
dotnet run --configuration Release --filter "*Twitter*"
dotnet run --configuration Release --filter "*Lipsum*"
```

If you are under macOS or Linux, you may want to run the benchmarks in privileged mode:
@@ -98,26 +99,52 @@ cd benchmark
sudo dotnet run -c Release
```


--anyCategories sse avx avx512
## Results (x64)

To be completed.
On an Intel Ice Lake system, our validation function is up to 13 times
faster than the standard library.
A realistic input is Twitter.json, which is mostly ASCII with some non-ASCII content;
on that file we are 2.4 times faster.

| data set | SimdUnicode current AVX2 (GB/s) | .NET speed (GB/s) | speed up |
|:----------------|:------------------------|:-------------------|:-------------------|
| Twitter.json | 29 | 12 | 2.4 x |
| Arabic-Lipsum | 12 | 2.3 | 5.2 x |
| Chinese-Lipsum | 12 | 3.9 | 3.0 x |
| Emoji-Lipsum | 12 | 0.9 | 13 x |
| Hebrew-Lipsum | 12 | 2.3 | 5.2 x |
| Hindi-Lipsum | 12 | 2.1 | 5.7 x |
| Japanese-Lipsum | 10 | 3.5 | 2.9 x |
| Korean-Lipsum | 10 | 1.3 | 7.7 x |
| Latin-Lipsum | 76 | 76 | --- |
| Russian-Lipsum | 12 | 1.2 | 10 x |



On x64 systems, we offer several functions: a fallback function for legacy systems,
an SSE4.2 function for older CPUs, an AVX2 function for current x64 systems and
an AVX-512 function for the most recent processors (AMD Zen 4 or better, Intel
Ice Lake, etc.).

## Results (ARM)

On an Apple M2 system, our validation function is two to three times
On an Apple M2 system, our validation function is 1.5 to four times
faster than the standard library.

| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) |
|:----------------|:-----------|:--------------------------|
| Arabic-Lipsum | 6.7 | 3.5 |
| Chinese-Lipsum | 6.7 | 4.8 |
| Emoji-Lipsum | 6.7 | 2.5 |
| Hebrew-Lipsum | 6.7 | 3.5 |
| Hindi-Lipsum | 6.8 | 3.0 |
| Japanese-Lipsum | 6.8 | 4.6 |
| Korean-Lipsum | 6.6 | 1.8 |
| Latin-Lipsum | 87 | 38 |
| Russian-Lipsum | 6.7 | 2.6 |
| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) | speed up |
|:----------------|:-----------|:--------------------------|:-------------------|
| Twitter.json | 25 | 14 | 1.8 x |
| Arabic-Lipsum | 7.4 | 3.5 | 2.1 x |
| Chinese-Lipsum | 7.4 | 4.8 | 1.5 x |
| Emoji-Lipsum | 7.4 | 2.5 | 3.0 x |
| Hebrew-Lipsum | 7.4 | 3.5 | 2.1 x |
| Hindi-Lipsum | 7.3 | 3.0 | 2.4 x |
| Japanese-Lipsum | 7.3 | 4.6 | 1.6 x |
| Korean-Lipsum | 7.4 | 1.8 | 4.1 x |
| Latin-Lipsum | 87 | 38 | 2.3 x |
| Russian-Lipsum | 7.4 | 2.7 | 2.7 x |


## Building the library
49 changes: 30 additions & 19 deletions benchmark/Benchmark.cs
@@ -62,58 +62,70 @@ public string GetValue(Summary summary, BenchmarkCase benchmarkCase)
[Config(typeof(Config))]
public class RealDataBenchmark
{
// We only inform the user once about the SIMD support of the system.
private static bool printed;
#pragma warning disable CA1812
private sealed class Config : ManualConfig
{
public Config()
{
AddColumn(new Speed());


if (RuntimeInformation.ProcessArchitecture == Architecture.Arm64)
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("ARM64 system detected.");
AddFilter(new AnyCategoriesFilter(["arm64", "scalar", "runtime"]));

Console.WriteLine("ARM64 system detected.");
printed = true;
}
}
else if (RuntimeInformation.ProcessArchitecture == Architecture.X64)
{
if (Vector512.IsHardwareAccelerated && System.Runtime.Intrinsics.X86.Avx512Vbmi.IsSupported)
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("X64 system detected (Intel, AMD,...) with AVX-512 support.");
AddFilter(new AnyCategoriesFilter(["avx512", "avx", "sse", "scalar", "runtime"]));
Console.WriteLine("X64 system detected (Intel, AMD,...) with AVX-512 support.");
printed = true;
}
}
else if (Avx2.IsSupported)
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("X64 system detected (Intel, AMD,...) with AVX2 support.");
AddFilter(new AnyCategoriesFilter(["avx", "sse", "scalar", "runtime"]));
Console.WriteLine("X64 system detected (Intel, AMD,...) with AVX2 support.");
printed = true;
}
}
else if (Ssse3.IsSupported)
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("X64 system detected (Intel, AMD,...) with Sse4.2 support.");
AddFilter(new AnyCategoriesFilter(["sse", "scalar", "runtime"]));
Console.WriteLine("X64 system detected (Intel, AMD,...) with Sse4.2 support.");
printed = true;
}
}
else
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("X64 system detected (Intel, AMD,...) without relevant SIMD support.");
AddFilter(new AnyCategoriesFilter(["scalar", "runtime"]));
Console.WriteLine("X64 system detected (Intel, AMD,...) without relevant SIMD support.");
printed = true;
}
}
}
else
{
AddFilter(new AnyCategoriesFilter(["scalar", "runtime"]));

}
AddFilter(new AnyCategoriesFilter(["default"]));

}
}
// Parameters and variables for real data
[Params(@"data/Arabic-Lipsum.utf8.txt",
[Params(@"data/twitter.json",
@"data/Arabic-Lipsum.utf8.txt",
@"data/Hebrew-Lipsum.utf8.txt",
@"data/Korean-Lipsum.utf8.txt",
@"data/Chinese-Lipsum.utf8.txt",
@@ -285,7 +297,6 @@ public unsafe void SIMDUtf8ValidationRealDataSse()
});
}
}

}
public class Program
{
3 changes: 3 additions & 0 deletions benchmark/benchmark.csproj
@@ -22,6 +22,9 @@
<None Update="data\*.utf8.txt">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="data\twitter.json">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
</ItemGroup>
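
Note on the `benchmark/Benchmark.cs` hunk: the `Config` constructor appears to pick a list of benchmark categories from the detected hardware, with each tier including the tiers below it. The following is a minimal sketch of that selection logic in isolation, so it can be checked without BenchmarkDotNet. `CategorySelector` and `Select` are hypothetical names that do not appear in the PR, and the sketch assumes .NET 8 or later (for `Vector512` and `Avx512Vbmi`). The category strings and the order of the checks are copied from the diff.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, not part of the PR: returns the benchmark categories
// that the diff passes to AnyCategoriesFilter for the current machine.
public static class CategorySelector
{
    public static string[] Select()
    {
        if (RuntimeInformation.ProcessArchitecture == Architecture.Arm64)
        {
            return new[] { "arm64", "scalar", "runtime" };
        }

        if (RuntimeInformation.ProcessArchitecture == Architecture.X64)
        {
            // Same order of checks as the diff: widest instruction set first.
            if (Vector512.IsHardwareAccelerated && Avx512Vbmi.IsSupported)
            {
                return new[] { "avx512", "avx", "sse", "scalar", "runtime" };
            }
            if (Avx2.IsSupported)
            {
                return new[] { "avx", "sse", "scalar", "runtime" };
            }
            if (Ssse3.IsSupported)
            {
                return new[] { "sse", "scalar", "runtime" };
            }
        }

        // Any other architecture, or x64 without the SIMD levels above.
        return new[] { "scalar", "runtime" };
    }

    public static void Main()
    {
        // Prints the categories selected for the machine running the sketch.
        Console.WriteLine(string.Join(", ", Select()));
    }
}
```

Every branch ends with `"scalar"` and `"runtime"`, so the fallback and the standard-library baseline are always benchmarked alongside whichever SIMD variants the CPU supports.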

