-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small tuning for write-barrier on arm64 #106191
Conversation
@EgorBot -arm64 -perf using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
public class Bench
{
[Benchmark]
public void WB()
{
Foo foo = new Foo();
for (long i = 0; i < 200000000; i++)
foo.x = foo;
}
}
internal class Foo
{
public volatile Foo x;
} |
Benchmark results on Arm64
Flame graphs: Main vs PR 🔥 For clean |
doesnt look like this change is debug builds only? |
Ah, it seems to be never enabled currently, judging by this |
cd6ebb7
to
9bea3cc
Compare
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
@EgorBot -arm64 -intel using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkRunner.Run<Bench>(args: args);
public class Bench
{
[Benchmark]
public void WB_gen0()
{
Foo foo = new Foo();
for (long i = 0; i < 200000000; i++)
foo.x = foo;
}
static readonly Foo Gen2Obj = GetGen2Obj();
static Foo GetGen2Obj()
{
var obj = new Foo();
GC.Collect(2);
GC.Collect(2);
if (GC.GetGeneration(obj) != 2)
throw new Exception();
return obj;
}
[Benchmark]
public void WB_gen2()
{
var gen2obj = Gen2Obj;
for (long i = 0; i < 200000000; i++)
gen2obj.x = gen2obj;
}
[Benchmark]
public void CheckedWB_NotInHeap()
{
MyStruct ms = new MyStruct();
string str = new string('x', 1);
for (long i = 0; i < 200000000; i++)
CheckedWriteBarrier(ref ms, str);
}
// struct in heap
MyStruct MS = new();
[Benchmark]
public void CheckedWB_InHeap()
{
string str = new string('x', 1);
for (long i = 0; i < 200000000; i++)
CheckedWriteBarrier(ref MS, str);
}
[MethodImpl(MethodImplOptions.NoInlining)]
void CheckedWriteBarrier(ref MyStruct ms, string str) => ms.Str = str;
}
public record struct MyStruct(string Str);
internal class Foo
{
public volatile Foo x;
} |
This comment was marked as resolved.
This comment was marked as resolved.
Benchmark results on Arm64
|
@cshung @jkotas when you have a moment, please review - I am seeing stable 14% improvement in the benchmarks. I did some study with Was: https://gist.github.com/EgorBot/cbbd053c7f0362cd6f315a9ee6089f39 PS: it is unfortunate that the write barrier is accessed via jump-stubs (you can even see it in the link above ^ and notice that the address matches the address of the write barrier) |
blo 0f | ||
|
||
PREPARE_EXTERNAL_VAR_INDIRECT g_ephemeral_high, x12 | ||
PREPARE_EXTERNAL_VAR_INDIRECT g_ephemeral_low, x12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw you have used ldp
at one point. Is it not profitable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw no visible benefits from it, while it's possible to use ldp
for NativeAOT easily, it's a bit tricky on CoreCLR with its:
ldr x12, LOCAL_LABEL(wbs_ephemeral_low)
ldr x17, LOCAL_LABEL(wbs_ephemeral_high)
I wasn't sure how to properly use ldp
with LOCAL_LABEL
🤔
I tried:
ldp x12, x17, LOCAL_LABEL(wbs_ephemeral_low) -- not compiling
ldp x12, x17, [sp, LOCAL_LABEL(wbs_ephemeral_low)] -- not compiling
I guess ldp expects some real sp delta rather than label imm. I guess I can calculate & hard-code it, but I simply didn't see any improvements in the benchmarks from that so I gave up
From my understanding, the difference between CoreCLR and NativeAOT is that NativeAOT accesses GC variables directly, while on CoreCLR we have a mechanism that dumps them all to a "patchable pool" next to the Write-barrier function for better cache access so it can access them by label in code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
just being a little risk averse, wonder whether we should check if this has any measurable improvement in real scenarios? Else we could differ till the 9 fork? |
Yeah I already put the 10.0.0 milestone to it |
I presume we should put the write barrier code to the loader heapr |
cc @ebepho |
A small contribution towards #106051
A small clean up in debug-only code (we don't need to push/pop r13)
Optimize "is gen0" check