Constant Time Lookup NibbleMap #108939

max-charlamb · 2024-10-16T19:22:44Z

What is it?

The existing nibblemap data structure has O(n) reads and O(1) writes.
Implements an alternative nibblemap data structure that has O(1) reads and O(n) writes.

Fixes #93550

See other PR which adds cDAC support for the new algorithm: #109654

Changes

Modifies NibbleMapSet and FindMethodCode
Adds separate function NibbleMapDelete because the codeSize argument is not required for deletion.
Adds size_t codeSize argument to NibbleMapSet and passes proper code size in all invocations.
Adapts MethodSectionIterator to new nibblemap implementation.
Adapts OutOfProcessFindHeader to new nibblemap implementation.
Removes reference to "nibblemapmacros.h" in codepitchingmanager.cpp. This does not appear to be used and is removed for clarity and to keep nibblemap changes isolated.

Benchmarking

Tests

In order to compare the performance of the previous (Linear) algorithm with the new algorithm (Constant), all of the dotnet/performance benchmarks relating to Exceptions were run.

In addition, an optimal scenario for the Constant algorithm was created (but not checked in). The new algorithm should see the most improvement when it is reading from the end of an extremely long function. The previous algorithm would have needed to iterate to the beginning of the function body, with a single DWORD read accounting for 256 bytes of the function. Usually this is a small number but can grow given method length is unbounded.

To show off the difference, I created a benchmark StackTrace with an artificially long function which recursively calls itself (requiring multiple nibblemap reads to unwind the stack). This was done by using ~6000 successive if (b) { Console.WriteLine("Hello World"); } statements. In the test run b is always false but still generates extra instructions.

When this is compiled down to win-x64 assembly the function is 152990 bytes long. Therefore, doing a FindMethodCode (nibble map read) near the end of the function should result in almost 600 successive DWORD reads in the Linear algorithm compared to 2 in the Constant algorithm.

Results

The microbenchmark runs seem to have inconclusive results. While we would expect the Constant algorithm to be faster in extremely long functions (StackTrace test) this is only true some of the time. This could be due the the microbenchmark format allowing the processors branch predictor to correctly guess the number of iterations to read in the linear algorithm effectively skipping a large cost in real world scenarios. This effect varies across stack depth and seems inconsistent run to run.

Overall, this change seems to be within ~5% (better or worse) performance from the previous linear algorithm, but reduces the theoretical worst-case performance.

Benchmarking details

Method	Toolchain	depth	kind	Mean	Error	Ratio
StackTrace	\Constant\corerun.exe	1	?	54.887 us	2.5243 us	1.00
StackTrace	\Linear\corerun.exe	1	?	59.314 us	6.6220 us	1.08

StackTrace	\Constant\corerun.exe	10	?	247.103 us	11.5665 us	1.00
StackTrace	\Linear\corerun.exe	10	?	250.995 us	14.6442 us	1.02

StackTrace	\Constant\corerun.exe	100	?	2,186.940 us	152.0479 us	1.01
StackTrace	\Linear\corerun.exe	100	?	2,103.922 us	81.1039 us	0.97

StackTrace	\Constant\corerun.exe	1000	?	18,779.557 us	587.3016 us	1.00
StackTrace	\Linear\corerun.exe	1000	?	22,177.570 us	1,335.1786 us	1.18

ThrowAndCatch	\Constant\corerun.exe	?	Software	1.922 us	0.1168 us	1.00
ThrowAndCatch	\Linear\corerun.exe	?	Software	1.760 us	0.0779 us	0.92

ThrowAndCatch_ManyCatchBlocks	\Constant\corerun.exe	?	Software	2.160 us	0.1177 us	1.00
ThrowAndCatch_ManyCatchBlocks	\Linear\corerun.exe	?	Software	1.882 us	0.0367 us	0.87

ThrowAndCatchFinally	\Constant\corerun.exe	?	Software	1.932 us	0.1631 us	1.01
ThrowAndCatchFinally	\Linear\corerun.exe	?	Software	1.768 us	0.0969 us	0.92

ThrowAndCatchWhen	\Constant\corerun.exe	?	Software	2.308 us	0.2131 us	1.01
ThrowAndCatchWhen	\Linear\corerun.exe	?	Software	1.969 us	0.1732 us	0.86

ThrowAndCatchWhenFinally	\Constant\corerun.exe	?	Software	2.176 us	0.2765 us	1.02
ThrowAndCatchWhenFinally	\Linear\corerun.exe	?	Software	1.827 us	0.0813 us	0.85

ThrowAndCatchDeep	\Constant\corerun.exe	?	Software	6.830 us	0.4687 us	1.01
ThrowAndCatchDeep	\Linear\corerun.exe	?	Software	6.281 us	0.2232 us	0.92

ThrowAndCatchDeepRecursive	\Constant\corerun.exe	?	Software	6.917 us	0.2981 us	1.00
ThrowAndCatchDeepRecursive	\Linear\corerun.exe	?	Software	6.689 us	0.3461 us	0.97

MultipleNestedTryCatch_FirstCatches	\Constant\corerun.exe	?	Software	1.782 us	0.0538 us	1.00
MultipleNestedTryCatch_FirstCatches	\Linear\corerun.exe	?	Software	1.799 us	0.0614 us	1.01

MultipleNestedTryCatch_LastCatches	\Constant\corerun.exe	?	Software	1.948 us	0.0674 us	1.00
MultipleNestedTryCatch_LastCatches	\Linear\corerun.exe	?	Software	1.890 us	0.0707 us	0.97

MultipleNestedTryFinally	\Constant\corerun.exe	?	Software	2.008 us	0.0375 us	1.00
MultipleNestedTryFinally	\Linear\corerun.exe	?	Software	1.892 us	0.0511 us	0.94

CatchAndRethrowDeep	\Constant\corerun.exe	?	Software	30.859 us	0.8336 us	1.00
CatchAndRethrowDeep	\Linear\corerun.exe	?	Software	32.656 us	1.9763 us	1.06

CatchAndThrowOtherDeep	\Constant\corerun.exe	?	Software	33.849 us	1.2789 us	1.00
CatchAndThrowOtherDeep	\Linear\corerun.exe	?	Software	34.615 us	2.0650 us	1.02

TryAndFinallyDeep	\Constant\corerun.exe	?	Software	7.309 us	0.4300 us	1.00
TryAndFinallyDeep	\Linear\corerun.exe	?	Software	6.991 us	0.1374 us	0.96

TryAndCatchDeep_CaugtAtTheTop	\Constant\corerun.exe	?	Software	7.642 us	0.2970 us	1.00
TryAndCatchDeep_CaugtAtTheTop	\Linear\corerun.exe	?	Software	6.798 us	0.1371 us	0.89

ThrowAndCatch	\Constant\corerun.exe	?	Hardware	2.959 us	0.0744 us	1.00
ThrowAndCatch	\Linear\corerun.exe	?	Hardware	2.883 us	0.0571 us	0.98

ThrowAndCatch_ManyCatchBlocks	\Constant\corerun.exe	?	Hardware	3.181 us	0.0622 us	1.00
ThrowAndCatch_ManyCatchBlocks	\Linear\corerun.exe	?	Hardware	3.134 us	0.0668 us	0.99

ThrowAndCatchFinally	\Constant\corerun.exe	?	Hardware	2.915 us	0.0712 us	1.00
ThrowAndCatchFinally	\Linear\corerun.exe	?	Hardware	2.959 us	0.0982 us	1.02

ThrowAndCatchWhen	\Constant\corerun.exe	?	Hardware	2.992 us	0.0587 us	1.00
ThrowAndCatchWhen	\Linear\corerun.exe	?	Hardware	2.939 us	0.0518 us	0.98

ThrowAndCatchWhenFinally	\Constant\corerun.exe	?	Hardware	3.012 us	0.0614 us	1.00
ThrowAndCatchWhenFinally	\Linear\corerun.exe	?	Hardware	2.976 us	0.0588 us	0.99

ThrowAndCatchDeep	\Constant\corerun.exe	?	Hardware	7.470 us	0.1429 us	1.00
ThrowAndCatchDeep	\Linear\corerun.exe	?	Hardware	7.395 us	0.1424 us	0.99

ThrowAndCatchDeepRecursive	\Constant\corerun.exe	?	Hardware	7.634 us	0.1523 us	1.00
ThrowAndCatchDeepRecursive	\Linear\corerun.exe	?	Hardware	8.099 us	0.2414 us	1.06

MultipleNestedTryCatch_FirstCatches	\Constant\corerun.exe	?	Hardware	2.887 us	0.0575 us	1.00
MultipleNestedTryCatch_FirstCatches	\Linear\corerun.exe	?	Hardware	2.940 us	0.0595 us	1.02

MultipleNestedTryCatch_LastCatches	\Constant\corerun.exe	?	Hardware	3.099 us	0.0756 us	1.00
MultipleNestedTryCatch_LastCatches	\Linear\corerun.exe	?	Hardware	3.111 us	0.0612 us	1.00

MultipleNestedTryFinally	\Constant\corerun.exe	?	Hardware	3.078 us	0.0628 us	1.00
MultipleNestedTryFinally	\Linear\corerun.exe	?	Hardware	3.117 us	0.0582 us	1.01

CatchAndRethrowDeep	\Constant\corerun.exe	?	Hardware	30.660 us	0.4211 us	1.00
CatchAndRethrowDeep	\Linear\corerun.exe	?	Hardware	30.387 us	0.7820 us	0.99

CatchAndThrowOtherDeep	\Constant\corerun.exe	?	Hardware	34.458 us	0.7226 us	1.00
CatchAndThrowOtherDeep	\Linear\corerun.exe	?	Hardware	34.642 us	0.8285 us	1.01

TryAndFinallyDeep	\Constant\corerun.exe	?	Hardware	8.484 us	0.2266 us	1.00
TryAndFinallyDeep	\Linear\corerun.exe	?	Hardware	8.643 us	0.2083 us	1.02

TryAndCatchDeep_CaugtAtTheTop	\Constant\corerun.exe	?	Hardware	8.292 us	0.1711 us	1.00
TryAndCatchDeep_CaugtAtTheTop	\Linear\corerun.exe	?	Hardware	8.270 us	0.1647 us	1.00

ThrowAndCatch	\Constant\corerun.exe	?	ReflectionSoftware	6.575 us	0.1319 us	1.00
ThrowAndCatch	\Linear\corerun.exe	?	ReflectionSoftware	6.649 us	0.1284 us	1.01

ThrowAndCatch_ManyCatchBlocks	\Constant\corerun.exe	?	ReflectionSoftware	6.828 us	0.1563 us	1.00
ThrowAndCatch_ManyCatchBlocks	\Linear\corerun.exe	?	ReflectionSoftware	7.008 us	0.1907 us	1.03

ThrowAndCatchDeep	\Constant\corerun.exe	?	ReflectionSoftware	10.993 us	0.3096 us	1.00
ThrowAndCatchDeep	\Linear\corerun.exe	?	ReflectionSoftware	11.145 us	0.3053 us	1.01

ThrowAndCatchDeepRecursive	\Constant\corerun.exe	?	ReflectionSoftware	11.779 us	0.3687 us	1.00
ThrowAndCatchDeepRecursive	\Linear\corerun.exe	?	ReflectionSoftware	11.405 us	0.2920 us	0.97

ThrowAndCatch	\Constant\corerun.exe	?	ReflectionHardware	8.043 us	0.2432 us	1.00
ThrowAndCatch	\Linear\corerun.exe	?	ReflectionHardware	8.135 us	0.2468 us	1.01

ThrowAndCatch_ManyCatchBlocks	\Constant\corerun.exe	?	ReflectionHardware	8.216 us	0.2361 us	1.00
ThrowAndCatch_ManyCatchBlocks	\Linear\corerun.exe	?	ReflectionHardware	8.404 us	0.2156 us	1.02

ThrowAndCatchDeep	\Constant\corerun.exe	?	ReflectionHardware	12.421 us	0.3315 us	1.00
ThrowAndCatchDeep	\Linear\corerun.exe	?	ReflectionHardware	12.926 us	0.4600 us	1.04

ThrowAndCatchDeepRecursive	\Constant\corerun.exe	?	ReflectionHardware	12.743 us	0.2532 us	1.00
ThrowAndCatchDeepRecursive	\Linear\corerun.exe	?	ReflectionHardware	13.087 us	0.4372 us	1.03

Verification

OutOfProcessFindHeader
- This is used by the Visual Studio native debugger when debugging a managed dotnet application. Verified that with the changes the debugger is still able to get a proper call stack. Without the changes here, the result is garbled.

lambdageek

I know the existing code uses macros, but i'm not sure that new code needs to go down the same path

src/coreclr/inc/clrconfigvalues.h

src/coreclr/inc/nibblemapmacros.h

src/coreclr/vm/codeman.cpp

max-charlamb · 2024-10-21T21:36:02Z

/azp run

azure-pipelines · 2024-10-21T21:36:09Z

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

max-charlamb · 2024-10-21T21:39:46Z

/azp run runtime-coreclr outerloop

azure-pipelines · 2024-10-21T21:40:06Z

Azure Pipelines successfully started running 1 pipeline(s).

max-charlamb · 2024-10-22T03:34:28Z

/azp run runtime-coreclr outerloop

azure-pipelines · 2024-10-22T03:34:46Z

Azure Pipelines successfully started running 1 pipeline(s).

src/coreclr/vm/codeman.cpp

lambdageek · 2024-10-31T20:12:54Z

@max-charlamb when this is ready to go, please also add a task to #108553 to update the ExecutionManager_1 contract implementation to use an updated reader algorithm (or bump the contract version 2 and implement a new nibble map reader, or whatever you and @davidwrighton decide is the right thing to do)

azure-pipelines · 2024-11-05T15:29:48Z

No pipelines are associated with this pull request.

davidwrighton

LGTM. Please don't merge this until you also have the cDAC changes ready to merge.

davmason · 2024-11-08T07:18:07Z

CC @hoyosjs and @noahfalk if you want to look at the DAC stuff

noahfalk · 2024-11-08T22:35:58Z

CC @hoyosjs and @noahfalk if you want to look at the DAC stuff

Thanks David. I skimmed through and it looked fine to me. On the DAC side we might find that doing many small reads has higher perf costs than we'd like, but I don't think its worth optimizing right now just based on my speculation. The current DAC probably has similar overhead and we have room to optimize in the future if testing broader scenarios identifies this part of the code as a hot spot.

davmason

LGTM

EgorBo · 2024-11-19T15:14:07Z

Looks like this change broke AltJits, I am seeing

Assert failure(PID 47136 [0x0000b820], Thread: 37808 [0x93b0]): ((*pMap) & ~mask) && !IsPointer(*pMap)

CORECLR! GetCLRRuntimeHost + 0x81C928 (0x00007ffb`1827d888)
CORECLR! GetCLRRuntimeHost + 0x81E1DA (0x00007ffb`1827f13a)
CORECLR! GetCLRRuntimeHost + 0xA1AF9 (0x00007ffb`17b02a59)
CORECLR! GetCLRRuntimeHost + 0xDB795 (0x00007ffb`17b3c6f5)
CORECLR! GetCLRRuntimeHost + 0xAFA8C (0x00007ffb`17b109ec)
CORECLR! GetCLRRuntimeHost + 0x1871F2 (0x00007ffb`17be8152)
CORECLR! GetCLRRuntimeHost + 0x187A7D (0x00007ffb`17be89dd)
CORECLR! GetCLRRuntimeHost + 0x186F1B (0x00007ffb`17be7e7b)
CORECLR! GetCLRRuntimeHost + 0x1898A1 (0x00007ffb`17bea801)
CORECLR! GetCLRRuntimeHost + 0x189353 (0x00007ffb`17bea2b3)
    File: C:\prj\runtime-main\src\coreclr\vm\codeman.cpp:4154

cc @dotnet/jit-contrib if you observe the same

kunalspathak · 2024-11-19T17:50:12Z

Yes, i can confirm that I have seen this failure using altjit:


Assert failure(PID 21700 [0x000054c4], Thread: 47520 [0xb9a0]): ((*pMap) & ~mask) && !IsPointer(*pMap)

CORECLR! EEJitManager::NibbleMapDeleteUnlocked + 0x158 (0x00007ff9`161db138)
CORECLR! EEJitManager::RemoveJitData + 0x1AA (0x00007ff9`161dc9ea)
CORECLR! CEEJitInfo::BackoutJitData + 0xD9 (0x00007ff9`15a60109)
CORECLR! invokeCompileMethod + 0x195 (0x00007ff9`15a99d95)
CORECLR! UnsafeJitFunction + 0x83C (0x00007ff9`15a6e08c)
CORECLR! MethodDesc::JitCompileCodeLocked + 0x1D2 (0x00007ff9`15b45882)
CORECLR! MethodDesc::JitCompileCodeLockedEventWrapper + 0x50D (0x00007ff9`15b4610d)
CORECLR! MethodDesc::JitCompileCode + 0x5AB (0x00007ff9`15b455ab)
CORECLR! MethodDesc::PrepareILBasedCode + 0x50C (0x00007ff9`15b47f4c)
CORECLR! MethodDesc::PrepareCode + 0xF3 (0x00007ff9`15b479e3)
    File: Q:\git\runtime2\src\coreclr\vm\codeman.cpp:4148
    Image: q:\git\runtime2\artifacts\tests\coreclr\windows.x64.Checked\tests\Core_Root\corerun.exe

kunalspathak · 2024-11-19T17:53:21Z

Opened #109970

dotnet-issue-labeler bot added the area-VM-coreclr label Oct 16, 2024

dotnet-policy-service bot assigned max-charlamb Oct 16, 2024

lambdageek reviewed Oct 16, 2024

View reviewed changes

src/coreclr/inc/clrconfigvalues.h Outdated Show resolved Hide resolved

src/coreclr/inc/nibblemapmacros.h Outdated Show resolved Hide resolved

src/coreclr/vm/codeman.cpp Outdated Show resolved Hide resolved

build-analysis bot mentioned this pull request Oct 16, 2024

DebuggerTests other than WasmTestOnChrome timeouts in CI #108921

Closed

max-charlamb force-pushed the nibble-optimize branch 2 times, most recently from e105570 to d59ea81 Compare October 18, 2024 18:49

build-analysis bot mentioned this pull request Oct 24, 2024

SIGKILL (OOM?) while running LibraryImportGenerator.Tests w/o actionable log messages or artifacts dotnet/dnceng#2496

Open

3 tasks

max-charlamb force-pushed the nibble-optimize branch 2 times, most recently from 836f71f to 25c987e Compare October 25, 2024 17:50

lambdageek reviewed Oct 30, 2024

View reviewed changes

max-charlamb changed the title ~~implement optimized nibblemap~~ Constant Time Lookup NibbleMap Nov 4, 2024

max-charlamb marked this pull request as ready for review November 5, 2024 15:30

Max Charlamb added 7 commits November 5, 2024 10:30

fix some bugs and implement code heap iteration

be71d08

fix allocation size

e41e90f

fix assertion statement to use correct starting address

f841195

typo fix

9b0f34c

fix another issue with code size

6b0458b

remove extra loop from find algorithm

ee62241

add assert to make sure low bits are not set

54e5eff

Max Charlamb added 14 commits November 5, 2024 10:30

remove debug tools

304c1e5

clean up PR

e43b7fa

use compiler intrinsic to find nibble without loop

97c3bfa

fix bitscan

67558d8

remove config option

ceecd09

optimizations

282edd7

clean up diff

a795d98

add docs and put nibblemap functions in a namespace

ec10542

fix spacing to reduce diff

dbb6e1a

rework to use old functions

07ade75

more cleanup

25f3c7e

use VolatileStore

0019a6a

FORCEINLINE methods as IsPointer was not being inlined by the compiler

48f4c43

Implement OutOfProcessFindHeader for new nibblemap algorithm

f975925

max-charlamb force-pushed the nibble-optimize branch from 867afe4 to f975925 Compare November 5, 2024 15:30

davidwrighton approved these changes Nov 7, 2024

View reviewed changes

max-charlamb mentioned this pull request Nov 7, 2024

[cdac] add v2 ExecutionManager contract for NibbleMap change max-charlamb/runtime#1

Open

davmason requested review from noahfalk and hoyosjs November 8, 2024 07:17

increment contract version

69f1ad0

max-charlamb mentioned this pull request Nov 8, 2024

[cdac] add v2 ExecutionManager contract for NibbleMap change #109654

Merged

davmason approved these changes Nov 8, 2024

View reviewed changes

max-charlamb merged commit 4087cc8 into dotnet:main Nov 11, 2024
90 checks passed

kunalspathak mentioned this pull request Nov 19, 2024

Hitting assert in NibbleMapDeleteUnlocked #109970

Closed

max-charlamb mentioned this pull request Nov 19, 2024

Allow nibblemap deletes to be no-op #109976

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Constant Time Lookup NibbleMap #108939

Constant Time Lookup NibbleMap #108939

max-charlamb commented Oct 16, 2024 •

edited

Loading

lambdageek left a comment

max-charlamb commented Oct 21, 2024

azure-pipelines bot commented Oct 21, 2024

max-charlamb commented Oct 21, 2024

azure-pipelines bot commented Oct 21, 2024

max-charlamb commented Oct 22, 2024

azure-pipelines bot commented Oct 22, 2024

lambdageek commented Oct 31, 2024

azure-pipelines bot commented Nov 5, 2024

davidwrighton left a comment

davmason commented Nov 8, 2024

noahfalk commented Nov 8, 2024

davmason left a comment

EgorBo commented Nov 19, 2024

kunalspathak commented Nov 19, 2024

kunalspathak commented Nov 19, 2024

Constant Time Lookup NibbleMap #108939

Constant Time Lookup NibbleMap #108939

Conversation

max-charlamb commented Oct 16, 2024 • edited Loading

What is it?

Changes

Benchmarking

Tests

Results

Verification

lambdageek left a comment

Choose a reason for hiding this comment

max-charlamb commented Oct 21, 2024

azure-pipelines bot commented Oct 21, 2024

max-charlamb commented Oct 21, 2024

azure-pipelines bot commented Oct 21, 2024

max-charlamb commented Oct 22, 2024

azure-pipelines bot commented Oct 22, 2024

lambdageek commented Oct 31, 2024

azure-pipelines bot commented Nov 5, 2024

davidwrighton left a comment

Choose a reason for hiding this comment

davmason commented Nov 8, 2024

noahfalk commented Nov 8, 2024

davmason left a comment

Choose a reason for hiding this comment

EgorBo commented Nov 19, 2024

kunalspathak commented Nov 19, 2024

kunalspathak commented Nov 19, 2024

max-charlamb commented Oct 16, 2024 •

edited

Loading