Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-PGO calli/delegate call transformation #109679

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

MichalPetryka
Copy link
Contributor

@MichalPetryka MichalPetryka commented Nov 10, 2024

I've decided to split my earlier attempt from #85197 into first adding just the transformation and only then making it find more cases with substitution.

TODO before merge:

TODO after merge:

  • Add some importer forward substitution to handle calli with args and more cases.
  • Do [API Proposal]: Introduce an intrinsic for more efficient lambda generation #85014 to handle lambdas.
  • Handle explicitly allocated delegates (for example capturing lambdas, might be done as a part of the substitution)
  • Make the VM not mark all DynamicMethods as DONTINLINE.
  • Handle methods closed over null in some way.
  • Avoid loads for frozen targets.
  • Reuse the handling for later phases where we can't inline but can devirt more after forwardsub and other passes.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 10, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Nov 10, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@MichalPetryka
Copy link
Contributor Author

@MihuBot -dependsOn 108579

@MichalPetryka
Copy link
Contributor Author

@MihuBot -dependsOn 108579

@MichalPetryka
Copy link
Contributor Author

@MihuBot -dependsOn 108579

@MichalPetryka
Copy link
Contributor Author

@jkotas It seems to me like the CONTRACT_ASSERT has some bug where it crashes with some non null terminated string instead of properly asserting, should I open an issue for it?

(1724.ab4): Access violation - code c0000005 (first/second chance not available)
For analysis of this file, run !analyze -v
*** WARNING: Unable to verify checksum for coreclr.dll
coreclr!common_strnlen_simd<0,1,unsigned char>+0x17d:
00007ff8`4c1593dd c5fe6f00        vmovdqu ymm0,ymmword ptr [rax] ds:00007ff8`00000000=??
0:000> cdb: Reading initial command '$<C:\h\w\A222091B\t\tmpbv5o3k.tmp'
0:000> .load C:\Users\runner\.dotnet\sos\sos.dll
0:000> ~*k

.  0  Id: 1724.ab4 Suspend: 0 Teb: 000000c3`5c72a000 Unfrozen
Child-SP          RetAddr           Call Site
000000c3`5c972960 00007ff8`4c158ad6 coreclr!common_strnlen_simd<0,1,unsigned char>+0x17d
000000c3`5c972af0 00007ff8`4c159efd coreclr!common_strnlen<0,unsigned char>+0x26
000000c3`5c972b20 00007ff8`4c13fb18 coreclr!strnlen+0x1d
000000c3`5c972b50 00007ff8`4c13f742 coreclr!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::format_validation_base<char,__crt_stdio_output::string_output_adapter<char> > >::type_case_s_compute_narrow_string_length+0x28
000000c3`5c972b80 00007ff8`4c13a762 coreclr!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::format_validation_base<char,__crt_stdio_output::string_output_adapter<char> > >::type_case_s+0x102
000000c3`5c972bc0 00007ff8`4c136a7f coreclr!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::format_validation_base<char,__crt_stdio_output::string_output_adapter<char> > >::state_case_type+0x82
000000c3`5c972c30 00007ff8`4c12a295 coreclr!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::format_validation_base<char,__crt_stdio_output::string_output_adapter<char> > >::process+0x34f
000000c3`5c972ca0 00007ff8`4c12b011 coreclr!common_vsprintf<__crt_stdio_output::format_validation_base,char>+0x2a5
000000c3`5c9731d0 00007ff8`4c142ff9 coreclr!common_vsprintf_s<char>+0x201
000000c3`5c973270 00007ff8`4b803554 coreclr!__stdio_common_vsprintf_s+0x69
(Inline Function) --------`-------- coreclr!_vsprintf_s_l+0x24
000000c3`5c973300 00007ff8`4b7d1067 coreclr!sprintf_s+0x44
000000c3`5c973360 00007ff8`4bb7fb8e coreclr!CONTRACT_ASSERT+0x237
000000c3`5c976a30 00007ff8`4b9c2445 coreclr!EEContract::DoChecks+0x2ce
000000c3`5c976ab0 00007ff8`4b99022a coreclr!Entry2MethodDesc+0x2c5

@jkotas
Copy link
Member

jkotas commented Nov 10, 2024

Yes, it looks like a bug somewhere. Do you have a link to the crashdump or are you able to tell where the non-terminated string comes from?

@MichalPetryka
Copy link
Contributor Author

Yes, it looks like a bug somewhere. Do you have a link to the crashdump or are you able to tell where the non-terminated string comes from?

The crash is from the Build Libraries Test Run checked coreclr windows x64 Release System.Linq.Parallel.Tests on this PR, not sure how to extract the dump from there.

@jkotas
Copy link
Member

jkotas commented Nov 11, 2024

The crash is from the Build Libraries Test Run checked coreclr windows x64 Release System.Linq.Parallel.Tests on this PR

Multiple things went wrong:

  • The changes in your PR are introducing contract violation
  • The per-thread ContractStackRecord that is used to report the contract violation is corrupted that leads to the crash inside sprintf

The non-null terminated string is a random pointer that came from the corrupted ContractStackRecord linked list. I am not able to tell whether the ContractStackRecord linked list corruption is related to the changes in your PR.

I have tried to introduce an artificial contract violation locally and it was reported fine (no ContractStackRecord linked list corruption).

not sure how to extract the dump from there.

runfo get-helix-payload -j 3836380a-b687-43fa-9d1b-6f438a90cd05 -w System.Linq.Parallel.Tests -o c:\helix_payload\System.Linq.Parallel.Tests

This command is mentioned in the how-to-debug-dump.md doc that is part of the test artifacts: https://dev.azure.com/dnceng-public/public/_build/results?buildId=865877&view=ms.vss-test-web.build-test-results-tab&runId=22530538&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab&resultId=214160

@MichalPetryka
Copy link
Contributor Author

  • The changes in your PR are introducing contract violation

I think the current version should be fine now, could you check if all looks good with the new JIT-EE call?

@@ -5066,7 +5101,7 @@ void CEEInfo::getCallInfo(
}
else
{
if (!exactType.IsTypeDesc() && !pTargetMD->IsArray())
if (!exactType.IsTypeDesc() && !pTargetMD->IsArray() && !pTargetMD->IsDynamicMethod())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas CoreCLR is asserting here on DynamicMethod without adding a check here, is this the right solution? With it we can emit direct calls to them but they're still not inlined cause VM says they're DONTINLINE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the assert?

Dynamic methods can call each other today. This code is able to handle that. How is the case that you are introducing different?

Copy link
Contributor Author

@MichalPetryka MichalPetryka Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the assert?

It's the null return assert inside of this if.

How is the case that you are introducing different?

The MethodTable* from the DynamicMethod is different from the one this method obtains from the JIT which in turn gets it from getMethodClass(replacementMethod). It seems like getMethodClass causes the pointer to be normalized to the real one while the VM here has a special DynamicMethodTable* which causes GetExactDeclaringType to fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not following. DynamicMethodTable and MethodTable are unrelated types. DynamicMethodTable should never show up in the JIT/EE interface calls.

Copy link
Contributor Author

@MichalPetryka MichalPetryka Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not following. DynamicMethodTable and MethodTable are unrelated types. DynamicMethodTable should never show up in the JIT/EE interface calls.

I mean, the JIT is passing the proper class handle (the class the DynamicMethod is being attached to, which it gets from getMethodClass), but the method table stored in the dynamic MethodDesc is a custom one with the debug name for it being dynamicClass, which makes GetExactDeclaringType fail cause that seems to have no relation to the parent type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the problem is that getMethodClass returns the owning class handle for dynamic methods, but CORINFO_RESOLVED_TOKEN is expected to have the actual class handle for dynamic methods.

I think it would be best to fix it in the place that fills in CORINFO_RESOLVED_TOKEN to have the actual class handle. It will make the dynamic methods path introduced by this PR work the same way as the existing paths taken by the dynamic methods.

Copy link
Contributor Author

@MichalPetryka MichalPetryka Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the problem is that getMethodClass returns the owning class handle for dynamic methods, but CORINFO_RESOLVED_TOKEN is expected to have the actual class handle for dynamic methods.

I think it would be best to fix it in the place that fills in CORINFO_RESOLVED_TOKEN to have the actual class handle. It will make the dynamic methods path introduced by this PR work the same way as the existing paths taken by the dynamic methods.

How would I grab that from the JIT? This is where I grab that currently with getMethodClass.

EDIT: Would you suggest to add it to the getMethodFromDelegate JIT-EE api?

Copy link
Contributor Author

@MichalPetryka MichalPetryka Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also to be honest, this seems to me a bit like CoreCLR is leaking implementation details of itself to the JIT, for me it'd make more sense if it always returned the handle of the owner class to the JIT and then mapped back to its dynamic one on its end reliably. Otherwise we're polluting the JIT with behaviours specific to one VM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that CoreCLR is leaking the implementation details here, but I have different opinion about what the correct behavior should be.

The dynamic method type owner is optional and it is only meant to be used for visibility checks. All visibility checks should be done on the EE side. I am not sure why the JIT ever needs to see the type owner. I think it would make more sense for the JIT to always see the special MethodTable that the dynamic methods are attached to.

Would you suggest to add it to the getMethodFromDelegate JIT-EE api?

I would think so. It may also help you with communicating precise type to the JIT. I think MethodDesc* may not always have the precise type in case of shared generic code - notice that Delegate.GetMethodImpl does extra work to figure out the precise type for generic types https://github.com/dotnet/runtime/blob/main/src/coreclr/System.Private.CoreLib/src/System/Delegate.CoreCLR.cs#L169-L170.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think so. It may also help you with communicating precise type to the JIT. I think MethodDesc* may not always have the precise type in case of shared generic code - notice that Delegate.GetMethodImpl does extra work to figure out the precise type for generic types https://github.com/dotnet/runtime/blob/main/src/coreclr/System.Private.CoreLib/src/System/Delegate.CoreCLR.cs#L169-L170.

I can't find a case where this is an issue, the JIT seems to always have the proper generic context to resolve stuff so I wonder if this is just a reflection only issue?

If not, could you assist me with handling this cause I couldn't figure out how to replicate this logic in VMs?

@@ -106,7 +106,6 @@ public TypeDesc Type
public DelegateInfo(TypeDesc delegateType, DelegateFeature features)
{
Debug.Assert(delegateType.IsDelegate);
Debug.Assert(delegateType.IsTypeDefinition);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is safe but NativeAOT otherwise asserts here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't, the implementation is non-sensical for instantiated types.

Is this only used to get signature of the delegate? If so, replace this:

                DelegateInfo delegateInfo = _compilation.TypeSystemContext.GetDelegateInfo(calledType);
                int paramCountDelegateClosed = delegateInfo.Signature.Length + 1;

with this:

                int paramCountDelegateClosed = calledType.GetMethod("Invoke", null).Signature.Length + 1;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it check _firstParameter instead since AFAIR I can just do that on NativeAOT since it seems to not support closed static delegates or instance ones closed over null.

@AndyAyersMS
Copy link
Member

Copying and slightly editing part of my note from #108579 (comment)

I think we should approach this differently. Instead of trying to pull all this off during importation and inlining, where it is difficult for us to reason about dataflow, I would like to see us focus on refactoring the inliner so it can be invoked as a utility in later phases and then rely on normal forward propagation of facts to enable new inlining in cases like these.

That would be a more robust and more general solution, but also a fair amount more work, and not something I think we're prepared to do or oversee during .NET 10.

Thinking about it a bit more, you might also be able to re-express this as a special case of GDV, where the analysis you do here creates a guarded delegate invoke, and then later JIT optimizations prove it's the only possibility and remove the guard test and alternative paths. @jakobbotsch might have been hinting at something like this recently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants