-
@vargaz How does Mono deal with this issue today?
-
You can find more discussion of this topic in #4906. I think that the simple rule should be to use
-
Thank you very much, I understand the simple rule, and we should modify the code to follow it. However, I don't see a good option for portable binaries that have already been released (for example, from NuGet). To make them safe to execute on RMO platforms, it seems we can only use a slow mode: when the JIT cannot determine whether an IL access belongs to a lock-free pattern, it has to treat every indirect access as volatile at JIT compile time, which generates a lot of unnecessary memory barrier instructions. Is there any other way to reduce the performance loss?
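To make the cost concern concrete, here is a minimal C++ sketch (not .NET code) of what the targeted alternative looks like when the source can be changed. The names `Config`, `g_config`, `publish`, and `area_or_zero` are hypothetical. Only the load of the published pointer is ordered; the field reads through it stay plain. The slow mode described above would instead have to pay for ordering on every one of those reads.

```cpp
#include <atomic>

// Hypothetical payload that one thread publishes and others read.
struct Config {
    int width;
    int height;
};

// The only location that is accessed lock-free by several threads.
std::atomic<const Config*> g_config{nullptr};

// Writer side: the release store makes the fields of *c visible
// to any reader whose acquire load observes this pointer.
void publish(const Config* c) {
    g_config.store(c, std::memory_order_release);
}

// Reader side: one acquire load, then ordinary loads through the pointer.
int area_or_zero() {
    const Config* c = g_config.load(std::memory_order_acquire);
    if (c == nullptr) {
        return 0;  // nothing published yet
    }
    return c->width * c->height;  // plain loads, ordered after the acquire
}
```

This only helps when the lock-free access is marked in the source, which is exactly what already released binaries lack.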
-
We found that lazy initialization can cause errors on RMO (Relaxed Memory Order) platforms. The pattern is used in both Roslyn and CoreFX:
https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Binder/Binder.cs#L493-L505
runtime/src/libraries/System.Reflection.Metadata/src/System/Reflection/PortableExecutable/PEReader.cs, lines 338 to 351 in da90d08
We use a unit test case (https://github.com/dotnet-hev/load-load-reordering) to explain what happened:
Root Cause:
In LazyInit, the two loads of &_lazyObj are executed out of order: the load used for the return value is executed before the load used for the branch. A store from another thread happens to land between these two loads, so only the load used for the branch observes the updated value, even though both loads access the same address.
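The interleaving can be replayed deterministically on a single thread. The C++ sketch below is not the code from the linked test and is not a real concurrency test; `SharedState` and `replay_reordered_lazy_init` are hypothetical names, and the statement order simply mimics the execution order described above.

```cpp
// Hypothetical stand-in for the object that holds the lazily initialized field.
struct SharedState {
    const int* lazy_obj = nullptr;
};

// Replays the described execution order of LazyInit's two loads,
// with the other thread's store placed between them.
const int* replay_reordered_lazy_init(SharedState& shared, const int* published) {
    // Second load in program order, executed first: still sees nullptr.
    const int* value_for_return = shared.lazy_obj;

    // The store from the other thread lands here.
    shared.lazy_obj = published;

    // First load in program order, executed last: sees the store.
    const int* value_for_branch = shared.lazy_obj;

    if (value_for_branch == nullptr) {
        // Initialization path. It is skipped in this replay because
        // the branch load already observed the published pointer.
        shared.lazy_obj = published;
        return published;
    }

    // The branch says "already initialized", but the stale value escapes.
    return value_for_return;
}
```

In this replay the caller receives a null pointer even though the field is already set, which matches the failure described in the root cause.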
Why is there no issue on ARM?
Because ARM/ARM64 is not fully RMO: it constrains the order of loads from the same address.
Why isn't it easy to observe in native code (C/C++)?
Because native compilers usually perform local assignment propagation, so the result of the load used for the branch is also used for the return.
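Written out by hand, that transformation amounts to loading the field once into a local and using the local for both the branch and the return. A minimal C++ sketch under stated assumptions: `Widget` and `LazyWidget` are hypothetical names, and the acquire load plus compare-exchange is one common way to write lock-free lazy initialization, not necessarily what Roslyn or CoreFX should adopt.

```cpp
#include <atomic>

// Hypothetical lazily created object.
struct Widget {
    int value = 42;
};

class LazyWidget {
public:
    ~LazyWidget() { delete slot_.load(std::memory_order_relaxed); }

    Widget* get() {
        // Single load: the same local feeds the branch and the return.
        Widget* current = slot_.load(std::memory_order_acquire);
        if (current == nullptr) {
            Widget* fresh = new Widget();
            Widget* expected = nullptr;
            // Publish only if no other thread has won the race.
            if (slot_.compare_exchange_strong(expected, fresh,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire)) {
                current = fresh;
            } else {
                delete fresh;        // lost the race, discard our copy
                current = expected;  // use the winner's object
            }
        }
        return current;
    }

private:
    std::atomic<Widget*> slot_{nullptr};
};
```

Because there is only one load of the field on the fast path, there is no second load that could return a value older than the one the branch tested.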
How to fix?
We may need to think about questions first:
Whichever way problem 1 is answered, problem 2 is not easy to solve unless it is fixed in coreclr, and even there it seems difficult to accurately identify the cases that need local assignment propagation or memory barrier insertion.
Any suggestions please?
Thanks!
@jkotas