-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constant Time Lookup NibbleMap #108939
Constant Time Lookup NibbleMap #108939
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the existing code uses macros, but i'm not sure that new code needs to go down the same path
e105570
to
d59ea81
Compare
/azp run |
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list. |
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
836f71f
to
25c987e
Compare
@max-charlamb when this is ready to go, please also add a task to #108553 to update the |
No pipelines are associated with this pull request. |
867afe4
to
f975925
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please don't merge this until you also have the cDAC changes ready to merge.
Thanks David. I skimmed through and it looked fine to me. On the DAC side we might find that doing many small reads has higher perf costs than we'd like, but I don't think its worth optimizing right now just based on my speculation. The current DAC probably has similar overhead and we have room to optimize in the future if testing broader scenarios identifies this part of the code as a hot spot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Looks like this change broke AltJits, I am seeing
cc @dotnet/jit-contrib if you observe the same |
Yes, i can confirm that I have seen this failure using altjit:
|
Opened #109970 |
What is it?
The existing nibblemap data structure has O(n) reads and O(1) writes.
Implements an alternative nibblemap data structure that has O(1) reads and O(n) writes.
Fixes #93550
See other PR which adds cDAC support for the new algorithm: #109654
Changes
NibbleMapSet
andFindMethodCode
NibbleMapDelete
because the codeSize argument is not required for deletion.size_t codeSize
argument toNibbleMapSet
and passes proper code size in all invocations.MethodSectionIterator
to new nibblemap implementation.OutOfProcessFindHeader
to new nibblemap implementation."nibblemapmacros.h"
incodepitchingmanager.cpp
. This does not appear to be used and is removed for clarity and to keep nibblemap changes isolated.Benchmarking
Tests
In order to compare the performance of the previous (Linear) algorithm with the new algorithm (Constant), all of the dotnet/performance benchmarks relating to Exceptions were run.
In addition, an optimal scenario for the Constant algorithm was created (but not checked in). The new algorithm should see the most improvement when it is reading from the end of an extremely long function. The previous algorithm would have needed to iterate to the beginning of the function body, with a single DWORD read accounting for 256 bytes of the function. Usually this is a small number but can grow given method length is unbounded.
To show off the difference, I created a benchmark
StackTrace
with an artificially long function which recursively calls itself (requiring multiple nibblemap reads to unwind the stack). This was done by using ~6000 successiveif (b) { Console.WriteLine("Hello World"); }
statements. In the test runb
is always false but still generates extra instructions.When this is compiled down to win-x64 assembly the function is
152990
bytes long. Therefore, doing aFindMethodCode
(nibble map read) near the end of the function should result in almost 600 successive DWORD reads in the Linear algorithm compared to 2 in the Constant algorithm.Results
The microbenchmark runs seem to have inconclusive results. While we would expect the Constant algorithm to be faster in extremely long functions (StackTrace test) this is only true some of the time. This could be due the the microbenchmark format allowing the processors branch predictor to correctly guess the number of iterations to read in the linear algorithm effectively skipping a large cost in real world scenarios. This effect varies across stack depth and seems inconsistent run to run.
Overall, this change seems to be within ~5% (better or worse) performance from the previous linear algorithm, but reduces the theoretical worst-case performance.
Benchmarking details
Verification
OutOfProcessFindHeader