crypto/aes: speedup CTR mode on AMD64 and ARM64 #53503
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
This PR (HEAD: d9b6a72) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/413594 to see it. Tip: You can toggle comments from me using the
Message from Gopher Robot: Patch Set 1: Congratulations on opening your first change. Thank you for your contribution! Next steps: Most changes in the Go project go through a few rounds of revision. This can be During May-July and Nov-Jan the Go project is in a code freeze, during which Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Борис Нагаев: Patch Set 1: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Ian Lance Taylor: Patch Set 2: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
@starius it would be great if you could follow up on the Python vs. Go topic.
@andig I'm rewriting the generators in Go. I want a single Go program that generates the code for all architectures.
d9b6a72 to e24634a
This PR (HEAD: e24634a) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/413594.
@andig I finished the generators and added a correctness test. Waiting for review in Gerrit :-)
Message from Борис Нагаев: Patch Set 3: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Ian Lance Taylor: Patch Set 3: Commit-Queue+1 Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Go LUCI: Patch Set 3: Dry run: CV is trying the patch. Bot data: {"action":"start","triggered_at":"2023-09-12T01:14:29Z","revision":"e73f75e3161f92b39819bc5ff5b915f7a9c87bda"} Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Ian Lance Taylor: Patch Set 3: -Commit-Queue Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Go LUCI: Patch Set 3: This CL has failed the run. Reason: Tryjob golang/try/gotip-linux-arm64 has failed Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Go LUCI: Patch Set 3: LUCI-TryBot-Result-1 Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
e24634a to 9683c85
This PR (HEAD: 9683c85) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/413594.
Message from Борис Нагаев: Patch Set 4: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Nicola Murino: Patch Set 6: Run-TryBot+1 Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Gopher Robot: Patch Set 6: (2 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Gopher Robot: Patch Set 6: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
931a533 to 92848a1
Message from Gopher Robot: Patch Set 6: TryBot-Result-1 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
This PR (HEAD: 92848a1) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/413594.
Message from Nicola Murino: Patch Set 7: Run-TryBot+1 Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Gopher Robot: Patch Set 7: (2 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Борис Нагаев: Patch Set 6: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Gopher Robot: Patch Set 7: TryBot-Result+1 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from qiulaidongfeng: Patch Set 8: Run-TryBot+1 Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from qiulaidongfeng: Patch Set 8: -Run-TryBot Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
The new implementation, fileCipherStreamV2, is currently only hooked up in tests and benchmarks. It's not used in production until we decide on a feature flagging strategy.
The major change is to defer more to the standard library, and specifically to pass larger batches to the stdlib at once. Previously we made a separate call to the crypto library for every 16-byte block; now we pass the entire buffer at once.
The downside is that random access is more expensive. The previous implementation was basically constant-time, while the new one performs multiple heap allocations for every "seek".
To summarize the performance results below, v2 is significantly slower than v1 for tiny (16 byte) non-sequential operations. It is about the same performance for small sequential operations, and substantially faster for large sequential operations (about twice as fast as the original version).
The FIPS version is now even faster (for large sequential writes) than non-FIPS. This is because Go's implementation of CTR mode has not been as extensively optimized as OpenSSL's. There are open PRs that may improve this in the future, such as golang/go#53503.
The release notes will come once this is actually hooked up with its feature flag.
Release note: None
Epic: None
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
BenchmarkFileCipherStream/fips=false/impl=v1/key=128/seq=false/block=16/-24  7521  137392 ns/op  238.50 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=128/seq=false/block=256/-24  15793  75349 ns/op  434.88 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=128/seq=false/block=1024/-24  16778  71159 ns/op  460.49 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=128/seq=false/block=16384/-24  17083  69697 ns/op  470.15 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=128/seq=true/block=16/-24  7797  137415 ns/op  238.46 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=128/seq=true/block=256/-24  15794  75635 ns/op  433.24 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=128/seq=true/block=1024/-24  16785  71195 ns/op  460.26 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=128/seq=true/block=16384/-24  17169  69813 ns/op  469.37 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=192/seq=false/block=16/-24  7510  142334 ns/op  230.22 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=192/seq=false/block=256/-24  14797  81525 ns/op  401.94 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=192/seq=false/block=1024/-24  15715  76005 ns/op  431.13 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=192/seq=false/block=16384/-24  15985  74794 ns/op  438.11 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=192/seq=true/block=16/-24  7558  142334 ns/op  230.22 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=192/seq=true/block=256/-24  14826  80456 ns/op  407.28 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=192/seq=true/block=1024/-24  15757  76231 ns/op  429.85 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=192/seq=true/block=16384/-24  16063  74763 ns/op  438.29 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=256/seq=false/block=16/-24  7304  146606 ns/op  223.51 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=256/seq=false/block=256/-24  13947  85653 ns/op  382.57 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=256/seq=false/block=1024/-24  14589  81277 ns/op  403.16 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=256/seq=false/block=16384/-24  14989  80079 ns/op  409.19 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=256/seq=true/block=16/-24  7274  145928 ns/op  224.55 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=256/seq=true/block=256/-24  13935  85680 ns/op  382.45 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=256/seq=true/block=1024/-24  14690  81366 ns/op  402.73 MB/s
BenchmarkFileCipherStream/fips=false/impl=v1/key=256/seq=true/block=16384/-24  15002  80120 ns/op  408.99 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=128/seq=false/block=16/-24  615  1896677 ns/op  17.28 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=128/seq=false/block=256/-24  9006  119509 ns/op  274.19 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=128/seq=false/block=1024/-24  23329  51281 ns/op  638.99 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=128/seq=false/block=16384/-24  29162  40910 ns/op  800.97 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=128/seq=true/block=16/-24  8750  132576 ns/op  247.16 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=128/seq=true/block=256/-24  26404  45361 ns/op  722.39 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=128/seq=true/block=1024/-24  28626  41808 ns/op  783.78 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=128/seq=true/block=16384/-24  29336  40935 ns/op  800.48 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=192/seq=false/block=16/-24  574  2042367 ns/op  16.04 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=192/seq=false/block=256/-24  8706  129528 ns/op  252.98 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=192/seq=false/block=1024/-24  21183  56446 ns/op  580.52 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=192/seq=false/block=16384/-24  26109  45655 ns/op  717.73 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=192/seq=true/block=16/-24  8504  137355 ns/op  238.56 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=192/seq=true/block=256/-24  23960  50296 ns/op  651.50 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=192/seq=true/block=1024/-24  25696  46679 ns/op  701.99 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=192/seq=true/block=16384/-24  26198  45587 ns/op  718.80 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=256/seq=false/block=16/-24  531  2206987 ns/op  14.85 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=256/seq=false/block=256/-24  7694  139498 ns/op  234.90 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=256/seq=false/block=1024/-24  19524  61046 ns/op  536.77 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=256/seq=false/block=16384/-24  23745  50562 ns/op  648.07 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=256/seq=true/block=16/-24  8245  148016 ns/op  221.38 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=256/seq=true/block=256/-24  21739  55280 ns/op  592.76 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=256/seq=true/block=1024/-24  23212  51531 ns/op  635.89 MB/s
BenchmarkFileCipherStream/fips=false/impl=v2/key=256/seq=true/block=16384/-24  23805  50438 ns/op  649.66 MB/s
PASS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
BenchmarkFileCipherStream/fips=true/impl=v1/key=128/seq=false/block=16/-24  2050  569817 ns/op  57.51 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=128/seq=false/block=256/-24  2499  490116 ns/op  66.86 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=128/seq=false/block=1024/-24  2473  478146 ns/op  68.53 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=128/seq=false/block=16384/-24  2518  477786 ns/op  68.58 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=128/seq=true/block=16/-24  2067  566694 ns/op  57.82 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=128/seq=true/block=256/-24  2448  488315 ns/op  67.10 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=128/seq=true/block=1024/-24  2490  479422 ns/op  68.35 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=128/seq=true/block=16384/-24  2514  481941 ns/op  67.99 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=192/seq=false/block=16/-24  2008  568220 ns/op  57.67 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=192/seq=false/block=256/-24  2407  493184 ns/op  66.44 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=192/seq=false/block=1024/-24  2428  480403 ns/op  68.21 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=192/seq=false/block=16384/-24  2421  482593 ns/op  67.90 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=192/seq=true/block=16/-24  2040  573229 ns/op  57.16 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=192/seq=true/block=256/-24  2314  496560 ns/op  65.99 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=192/seq=true/block=1024/-24  2478  476019 ns/op  68.84 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=192/seq=true/block=16384/-24  2449  483360 ns/op  67.79 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=256/seq=false/block=16/-24  1998  574193 ns/op  57.07 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=256/seq=false/block=256/-24  2356  488059 ns/op  67.14 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=256/seq=false/block=1024/-24  2454  482139 ns/op  67.96 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=256/seq=false/block=16384/-24  2446  483249 ns/op  67.81 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=256/seq=true/block=16/-24  2020  565348 ns/op  57.96 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=256/seq=true/block=256/-24  2388  488736 ns/op  67.05 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=256/seq=true/block=1024/-24  2418  492126 ns/op  66.58 MB/s
BenchmarkFileCipherStream/fips=true/impl=v1/key=256/seq=true/block=16384/-24  2443  480180 ns/op  68.24 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=128/seq=false/block=16/-24  259  4733741 ns/op  6.92 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=128/seq=false/block=256/-24  3481  297358 ns/op  110.20 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=128/seq=false/block=1024/-24  14671  82314 ns/op  398.08 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=128/seq=false/block=16384/-24  135296  9077 ns/op  3609.94 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=128/seq=true/block=16/-24  2421  452063 ns/op  72.49 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=128/seq=true/block=256/-24  33729  36362 ns/op  901.15 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=128/seq=true/block=1024/-24  81376  15570 ns/op  2104.61 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=128/seq=true/block=16384/-24  139063  8704 ns/op  3764.70 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=192/seq=false/block=16/-24  244  5070212 ns/op  6.46 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=192/seq=false/block=256/-24  3350  313168 ns/op  104.63 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=192/seq=false/block=1024/-24  14427  85797 ns/op  381.93 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=192/seq=false/block=16384/-24  114337  10014 ns/op  3272.32 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=192/seq=true/block=16/-24  2611  452229 ns/op  72.46 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=192/seq=true/block=256/-24  32545  37458 ns/op  874.79 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=192/seq=true/block=1024/-24  72807  16660 ns/op  1966.85 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=192/seq=true/block=16384/-24  116131  9992 ns/op  3279.30 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=256/seq=false/block=16/-24  247  4896561 ns/op  6.69 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=256/seq=false/block=256/-24  3409  336213 ns/op  97.46 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=256/seq=false/block=1024/-24  13916  86875 ns/op  377.18 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=256/seq=false/block=16384/-24  101763  11161 ns/op  2936.06 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=256/seq=true/block=16/-24  2326  452598 ns/op  72.40 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=256/seq=true/block=256/-24  30164  37851 ns/op  865.70 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=256/seq=true/block=1024/-24  68094  17592 ns/op  1862.67 MB/s
BenchmarkFileCipherStream/fips=true/impl=v2/key=256/seq=true/block=16384/-24  104826  11190 ns/op  2928.36 MB/s
115549: engineccl: Reimplement FileCipherStreamV2 r=bdarnell a=bdarnell
First commit is from #115365. #115454 is related and makes v1 look a little better in the benchmarks.
Co-authored-by: Ben Darnell <[email protected]>
Message from Ben Darnell: Patch Set 8: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from qiulaidongfeng: Patch Set 8: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Ben Darnell: Patch Set 8: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from qiulaidongfeng: Patch Set 8: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
The implementation runs up to 8 AES instructions in different registers one after another in ASM code. Because the CPU pipelines instructions and these instructions do not depend on each other, they can execute in parallel with this code layout. This gives a significant speedup over the regular implementation, in which blocks are processed in the same registers so the AES instructions cannot run in parallel. GCM mode already uses this approach.
The type implementing ctrAble in ASM has most of its code in the XORKeyStreamAt method, which takes an additional argument, offset. This makes it possible to use the method statelessly and to jump to any location in the stream. The method does not exist in the pure Go and boringcrypto implementations.
AES CTR benchmark delta.
$ go test crypto/cipher -bench 'BenchmarkAESCTR*'
AMD64. Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
name                 old time/op  new time/op  delta
BenchmarkAESCTR1K-2  1259ns       266.9ns      -78.8%
BenchmarkAESCTR8K-2  9859ns       1953ns       -80.1%
ARM64. ARM Neoverse-N1 (AWS EC2 t4g.small instance)
name                 old time/op  new time/op  delta
BenchmarkAESCTR1K-2  1098ns       481.1ns      -56.2%
BenchmarkAESCTR8K-2  8447ns       3452ns       -59.1%
Original issue: golang#20967
Investigation and initial implementation: https://github.com/mmcloughlin/aesnix/
Full implementation in external repo: https://github.com/starius/aesctrat
92848a1 to e087300
This PR (HEAD: e087300) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/413594.
Message from Борис Нагаев: Patch Set 8: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Борис Нагаев: Patch Set 9: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from qiulaidongfeng: Patch Set 9: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Ben Darnell: Patch Set 9: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Борис Нагаев: Patch Set 9: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Filippo Valsorda: Patch Set 9: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
Message from Filippo Valsorda: Patch Set 9: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/413594.
The implementation runs up to 8 AES instructions in different registers
one after another in ASM code. Because the CPU pipelines instructions
and these instructions do not depend on each other, they can execute in
parallel with this code layout. This gives a significant speedup over
the regular implementation, in which blocks are processed in the same
registers so the AES instructions cannot run in parallel.
GCM mode already uses this approach.
The ASM implementation of ctrAble has most of its code in the
XORKeyStreamAt method, which takes an additional argument, offset. This
makes it possible to use the method statelessly and to jump to any
location in the stream. The method does not exist in the pure Go and
boringcrypto implementations.
AES CTR benchmark delta.
$ go test crypto/cipher -bench 'BenchmarkAESCTR*'
AMD64. Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
name                 old time/op  new time/op  delta
BenchmarkAESCTR1K-2  1259ns       266.9ns      -78.8%
BenchmarkAESCTR8K-2  9859ns       1953ns       -80.1%
ARM64. ARM Neoverse-N1 (AWS EC2 t4g.small instance)
name                 old time/op  new time/op  delta
BenchmarkAESCTR1K-2  1098ns       481.1ns      -56.2%
BenchmarkAESCTR8K-2  8447ns       3452ns       -59.1%
ARM64. Apple M1
name                 old time/op  new time/op  delta
BenchmarkAESCTR1K-2  455.3ns      154.3ns      -66.1%
BenchmarkAESCTR8K-2  3491ns       1116ns       -68.0%
Fixes #20967
Investigation and initial implementation:
https://github.com/mmcloughlin/aesnix/
Full implementation in external repo:
https://github.com/starius/aesctrat