Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant Time Lookup NibbleMap #108939

Merged
merged 35 commits into from
Nov 11, 2024
Merged

Conversation

max-charlamb
Copy link
Contributor

@max-charlamb max-charlamb commented Oct 16, 2024

What is it?

The existing nibblemap data structure has O(n) reads and O(1) writes.
Implements an alternative nibblemap data structure that has O(1) reads and O(n) writes.

Fixes #93550

See other PR which adds cDAC support for the new algorithm: #109654

Changes

  • Modifies NibbleMapSet and FindMethodCode
  • Adds separate function NibbleMapDelete because the codeSize argument is not required for deletion.
  • Adds size_t codeSize argument to NibbleMapSet and passes proper code size in all invocations.
  • Adapts MethodSectionIterator to new nibblemap implementation.
  • Adapts OutOfProcessFindHeader to new nibblemap implementation.
  • Removes reference to "nibblemapmacros.h" in codepitchingmanager.cpp. This does not appear to be used and is removed for clarity and to keep nibblemap changes isolated.

Benchmarking

Tests

In order to compare the performance of the previous (Linear) algorithm with the new algorithm (Constant), all of the dotnet/performance benchmarks relating to Exceptions were run.

In addition, an optimal scenario for the Constant algorithm was created (but not checked in). The new algorithm should see the most improvement when it is reading from the end of an extremely long function. The previous algorithm would have needed to iterate to the beginning of the function body, with a single DWORD read accounting for 256 bytes of the function. Usually this is a small number but can grow given method length is unbounded.

To show off the difference, I created a benchmark StackTrace with an artificially long function which recursively calls itself (requiring multiple nibblemap reads to unwind the stack). This was done by using ~6000 successive if (b) { Console.WriteLine("Hello World"); } statements. In the test run b is always false but still generates extra instructions.

When this is compiled down to win-x64 assembly the function is 152990 bytes long. Therefore, doing a FindMethodCode (nibble map read) near the end of the function should result in almost 600 successive DWORD reads in the Linear algorithm compared to 2 in the Constant algorithm.

Results

The microbenchmark runs seem to have inconclusive results. While we would expect the Constant algorithm to be faster in extremely long functions (StackTrace test) this is only true some of the time. This could be due the the microbenchmark format allowing the processors branch predictor to correctly guess the number of iterations to read in the linear algorithm effectively skipping a large cost in real world scenarios. This effect varies across stack depth and seems inconsistent run to run.

Overall, this change seems to be within ~5% (better or worse) performance from the previous linear algorithm, but reduces the theoretical worst-case performance.

Benchmarking details
Method Toolchain depth kind Mean Error Ratio
StackTrace \Constant\corerun.exe 1 ? 54.887 us 2.5243 us 1.00
StackTrace \Linear\corerun.exe 1 ? 59.314 us 6.6220 us 1.08
StackTrace \Constant\corerun.exe 10 ? 247.103 us 11.5665 us 1.00
StackTrace \Linear\corerun.exe 10 ? 250.995 us 14.6442 us 1.02
StackTrace \Constant\corerun.exe 100 ? 2,186.940 us 152.0479 us 1.01
StackTrace \Linear\corerun.exe 100 ? 2,103.922 us 81.1039 us 0.97
StackTrace \Constant\corerun.exe 1000 ? 18,779.557 us 587.3016 us 1.00
StackTrace \Linear\corerun.exe 1000 ? 22,177.570 us 1,335.1786 us 1.18
ThrowAndCatch \Constant\corerun.exe ? Software 1.922 us 0.1168 us 1.00
ThrowAndCatch \Linear\corerun.exe ? Software 1.760 us 0.0779 us 0.92
ThrowAndCatch_ManyCatchBlocks \Constant\corerun.exe ? Software 2.160 us 0.1177 us 1.00
ThrowAndCatch_ManyCatchBlocks \Linear\corerun.exe ? Software 1.882 us 0.0367 us 0.87
ThrowAndCatchFinally \Constant\corerun.exe ? Software 1.932 us 0.1631 us 1.01
ThrowAndCatchFinally \Linear\corerun.exe ? Software 1.768 us 0.0969 us 0.92
ThrowAndCatchWhen \Constant\corerun.exe ? Software 2.308 us 0.2131 us 1.01
ThrowAndCatchWhen \Linear\corerun.exe ? Software 1.969 us 0.1732 us 0.86
ThrowAndCatchWhenFinally \Constant\corerun.exe ? Software 2.176 us 0.2765 us 1.02
ThrowAndCatchWhenFinally \Linear\corerun.exe ? Software 1.827 us 0.0813 us 0.85
ThrowAndCatchDeep \Constant\corerun.exe ? Software 6.830 us 0.4687 us 1.01
ThrowAndCatchDeep \Linear\corerun.exe ? Software 6.281 us 0.2232 us 0.92
ThrowAndCatchDeepRecursive \Constant\corerun.exe ? Software 6.917 us 0.2981 us 1.00
ThrowAndCatchDeepRecursive \Linear\corerun.exe ? Software 6.689 us 0.3461 us 0.97
MultipleNestedTryCatch_FirstCatches \Constant\corerun.exe ? Software 1.782 us 0.0538 us 1.00
MultipleNestedTryCatch_FirstCatches \Linear\corerun.exe ? Software 1.799 us 0.0614 us 1.01
MultipleNestedTryCatch_LastCatches \Constant\corerun.exe ? Software 1.948 us 0.0674 us 1.00
MultipleNestedTryCatch_LastCatches \Linear\corerun.exe ? Software 1.890 us 0.0707 us 0.97
MultipleNestedTryFinally \Constant\corerun.exe ? Software 2.008 us 0.0375 us 1.00
MultipleNestedTryFinally \Linear\corerun.exe ? Software 1.892 us 0.0511 us 0.94
CatchAndRethrowDeep \Constant\corerun.exe ? Software 30.859 us 0.8336 us 1.00
CatchAndRethrowDeep \Linear\corerun.exe ? Software 32.656 us 1.9763 us 1.06
CatchAndThrowOtherDeep \Constant\corerun.exe ? Software 33.849 us 1.2789 us 1.00
CatchAndThrowOtherDeep \Linear\corerun.exe ? Software 34.615 us 2.0650 us 1.02
TryAndFinallyDeep \Constant\corerun.exe ? Software 7.309 us 0.4300 us 1.00
TryAndFinallyDeep \Linear\corerun.exe ? Software 6.991 us 0.1374 us 0.96
TryAndCatchDeep_CaugtAtTheTop \Constant\corerun.exe ? Software 7.642 us 0.2970 us 1.00
TryAndCatchDeep_CaugtAtTheTop \Linear\corerun.exe ? Software 6.798 us 0.1371 us 0.89
ThrowAndCatch \Constant\corerun.exe ? Hardware 2.959 us 0.0744 us 1.00
ThrowAndCatch \Linear\corerun.exe ? Hardware 2.883 us 0.0571 us 0.98
ThrowAndCatch_ManyCatchBlocks \Constant\corerun.exe ? Hardware 3.181 us 0.0622 us 1.00
ThrowAndCatch_ManyCatchBlocks \Linear\corerun.exe ? Hardware 3.134 us 0.0668 us 0.99
ThrowAndCatchFinally \Constant\corerun.exe ? Hardware 2.915 us 0.0712 us 1.00
ThrowAndCatchFinally \Linear\corerun.exe ? Hardware 2.959 us 0.0982 us 1.02
ThrowAndCatchWhen \Constant\corerun.exe ? Hardware 2.992 us 0.0587 us 1.00
ThrowAndCatchWhen \Linear\corerun.exe ? Hardware 2.939 us 0.0518 us 0.98
ThrowAndCatchWhenFinally \Constant\corerun.exe ? Hardware 3.012 us 0.0614 us 1.00
ThrowAndCatchWhenFinally \Linear\corerun.exe ? Hardware 2.976 us 0.0588 us 0.99
ThrowAndCatchDeep \Constant\corerun.exe ? Hardware 7.470 us 0.1429 us 1.00
ThrowAndCatchDeep \Linear\corerun.exe ? Hardware 7.395 us 0.1424 us 0.99
ThrowAndCatchDeepRecursive \Constant\corerun.exe ? Hardware 7.634 us 0.1523 us 1.00
ThrowAndCatchDeepRecursive \Linear\corerun.exe ? Hardware 8.099 us 0.2414 us 1.06
MultipleNestedTryCatch_FirstCatches \Constant\corerun.exe ? Hardware 2.887 us 0.0575 us 1.00
MultipleNestedTryCatch_FirstCatches \Linear\corerun.exe ? Hardware 2.940 us 0.0595 us 1.02
MultipleNestedTryCatch_LastCatches \Constant\corerun.exe ? Hardware 3.099 us 0.0756 us 1.00
MultipleNestedTryCatch_LastCatches \Linear\corerun.exe ? Hardware 3.111 us 0.0612 us 1.00
MultipleNestedTryFinally \Constant\corerun.exe ? Hardware 3.078 us 0.0628 us 1.00
MultipleNestedTryFinally \Linear\corerun.exe ? Hardware 3.117 us 0.0582 us 1.01
CatchAndRethrowDeep \Constant\corerun.exe ? Hardware 30.660 us 0.4211 us 1.00
CatchAndRethrowDeep \Linear\corerun.exe ? Hardware 30.387 us 0.7820 us 0.99
CatchAndThrowOtherDeep \Constant\corerun.exe ? Hardware 34.458 us 0.7226 us 1.00
CatchAndThrowOtherDeep \Linear\corerun.exe ? Hardware 34.642 us 0.8285 us 1.01
TryAndFinallyDeep \Constant\corerun.exe ? Hardware 8.484 us 0.2266 us 1.00
TryAndFinallyDeep \Linear\corerun.exe ? Hardware 8.643 us 0.2083 us 1.02
TryAndCatchDeep_CaugtAtTheTop \Constant\corerun.exe ? Hardware 8.292 us 0.1711 us 1.00
TryAndCatchDeep_CaugtAtTheTop \Linear\corerun.exe ? Hardware 8.270 us 0.1647 us 1.00
ThrowAndCatch \Constant\corerun.exe ? ReflectionSoftware 6.575 us 0.1319 us 1.00
ThrowAndCatch \Linear\corerun.exe ? ReflectionSoftware 6.649 us 0.1284 us 1.01
ThrowAndCatch_ManyCatchBlocks \Constant\corerun.exe ? ReflectionSoftware 6.828 us 0.1563 us 1.00
ThrowAndCatch_ManyCatchBlocks \Linear\corerun.exe ? ReflectionSoftware 7.008 us 0.1907 us 1.03
ThrowAndCatchDeep \Constant\corerun.exe ? ReflectionSoftware 10.993 us 0.3096 us 1.00
ThrowAndCatchDeep \Linear\corerun.exe ? ReflectionSoftware 11.145 us 0.3053 us 1.01
ThrowAndCatchDeepRecursive \Constant\corerun.exe ? ReflectionSoftware 11.779 us 0.3687 us 1.00
ThrowAndCatchDeepRecursive \Linear\corerun.exe ? ReflectionSoftware 11.405 us 0.2920 us 0.97
ThrowAndCatch \Constant\corerun.exe ? ReflectionHardware 8.043 us 0.2432 us 1.00
ThrowAndCatch \Linear\corerun.exe ? ReflectionHardware 8.135 us 0.2468 us 1.01
ThrowAndCatch_ManyCatchBlocks \Constant\corerun.exe ? ReflectionHardware 8.216 us 0.2361 us 1.00
ThrowAndCatch_ManyCatchBlocks \Linear\corerun.exe ? ReflectionHardware 8.404 us 0.2156 us 1.02
ThrowAndCatchDeep \Constant\corerun.exe ? ReflectionHardware 12.421 us 0.3315 us 1.00
ThrowAndCatchDeep \Linear\corerun.exe ? ReflectionHardware 12.926 us 0.4600 us 1.04
ThrowAndCatchDeepRecursive \Constant\corerun.exe ? ReflectionHardware 12.743 us 0.2532 us 1.00
ThrowAndCatchDeepRecursive \Linear\corerun.exe ? ReflectionHardware 13.087 us 0.4372 us 1.03

Verification

  • OutOfProcessFindHeader
    • This is used by the Visual Studio native debugger when debugging a managed dotnet application. Verified that with the changes the debugger is still able to get a proper call stack. Without the changes here, the result is garbled.

Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the existing code uses macros, but i'm not sure that new code needs to go down the same path

src/coreclr/inc/clrconfigvalues.h Outdated Show resolved Hide resolved
src/coreclr/inc/nibblemapmacros.h Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.cpp Outdated Show resolved Hide resolved
@max-charlamb
Copy link
Contributor Author

/azp run

Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@max-charlamb
Copy link
Contributor Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@max-charlamb
Copy link
Contributor Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

src/coreclr/vm/codeman.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.cpp Outdated Show resolved Hide resolved
@lambdageek
Copy link
Member

@max-charlamb when this is ready to go, please also add a task to #108553 to update the ExecutionManager_1 contract implementation to use an updated reader algorithm (or bump the contract version 2 and implement a new nibble map reader, or whatever you and @davidwrighton decide is the right thing to do)

@max-charlamb max-charlamb changed the title implement optimized nibblemap Constant Time Lookup NibbleMap Nov 4, 2024
Copy link

No pipelines are associated with this pull request.

@max-charlamb max-charlamb marked this pull request as ready for review November 5, 2024 15:30
Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please don't merge this until you also have the cDAC changes ready to merge.

@davmason
Copy link
Member

davmason commented Nov 8, 2024

CC @hoyosjs and @noahfalk if you want to look at the DAC stuff

@noahfalk
Copy link
Member

noahfalk commented Nov 8, 2024

CC @hoyosjs and @noahfalk if you want to look at the DAC stuff

Thanks David. I skimmed through and it looked fine to me. On the DAC side we might find that doing many small reads has higher perf costs than we'd like, but I don't think its worth optimizing right now just based on my speculation. The current DAC probably has similar overhead and we have room to optimize in the future if testing broader scenarios identifies this part of the code as a hot spot.

Copy link
Member

@davmason davmason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@max-charlamb max-charlamb merged commit 4087cc8 into dotnet:main Nov 11, 2024
90 checks passed
@EgorBo
Copy link
Member

EgorBo commented Nov 19, 2024

Looks like this change broke AltJits, I am seeing

Assert failure(PID 47136 [0x0000b820], Thread: 37808 [0x93b0]): ((*pMap) & ~mask) && !IsPointer(*pMap)

CORECLR! GetCLRRuntimeHost + 0x81C928 (0x00007ffb`1827d888)
CORECLR! GetCLRRuntimeHost + 0x81E1DA (0x00007ffb`1827f13a)
CORECLR! GetCLRRuntimeHost + 0xA1AF9 (0x00007ffb`17b02a59)
CORECLR! GetCLRRuntimeHost + 0xDB795 (0x00007ffb`17b3c6f5)
CORECLR! GetCLRRuntimeHost + 0xAFA8C (0x00007ffb`17b109ec)
CORECLR! GetCLRRuntimeHost + 0x1871F2 (0x00007ffb`17be8152)
CORECLR! GetCLRRuntimeHost + 0x187A7D (0x00007ffb`17be89dd)
CORECLR! GetCLRRuntimeHost + 0x186F1B (0x00007ffb`17be7e7b)
CORECLR! GetCLRRuntimeHost + 0x1898A1 (0x00007ffb`17bea801)
CORECLR! GetCLRRuntimeHost + 0x189353 (0x00007ffb`17bea2b3)
    File: C:\prj\runtime-main\src\coreclr\vm\codeman.cpp:4154

cc @dotnet/jit-contrib if you observe the same

@kunalspathak
Copy link
Member

Yes, i can confirm that I have seen this failure using altjit:


Assert failure(PID 21700 [0x000054c4], Thread: 47520 [0xb9a0]): ((*pMap) & ~mask) && !IsPointer(*pMap)

CORECLR! EEJitManager::NibbleMapDeleteUnlocked + 0x158 (0x00007ff9`161db138)
CORECLR! EEJitManager::RemoveJitData + 0x1AA (0x00007ff9`161dc9ea)
CORECLR! CEEJitInfo::BackoutJitData + 0xD9 (0x00007ff9`15a60109)
CORECLR! invokeCompileMethod + 0x195 (0x00007ff9`15a99d95)
CORECLR! UnsafeJitFunction + 0x83C (0x00007ff9`15a6e08c)
CORECLR! MethodDesc::JitCompileCodeLocked + 0x1D2 (0x00007ff9`15b45882)
CORECLR! MethodDesc::JitCompileCodeLockedEventWrapper + 0x50D (0x00007ff9`15b4610d)
CORECLR! MethodDesc::JitCompileCode + 0x5AB (0x00007ff9`15b455ab)
CORECLR! MethodDesc::PrepareILBasedCode + 0x50C (0x00007ff9`15b47f4c)
CORECLR! MethodDesc::PrepareCode + 0xF3 (0x00007ff9`15b479e3)
    File: Q:\git\runtime2\src\coreclr\vm\codeman.cpp:4148
    Image: q:\git\runtime2\artifacts\tests\coreclr\windows.x64.Checked\tests\Core_Root\corerun.exe

@kunalspathak
Copy link
Member

Opened #109970

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nibblemapmacros do a linear scan
7 participants