
JIT: Use reaching definitions in CSE to update conservative VNs #109959

Merged: 4 commits into dotnet:main on Nov 22, 2024

Conversation

@jakobbotsch (Member) commented on Nov 19, 2024:

Now that CSE always inserts into SSA, we can update it to make use of the reaching-definition information it has access to. CSE already spent effort tracking some extra information to approximate this, which we can now remove.

  • Remove `optCSECheckedBoundMap`: this was used by CSE to try to update conservative VNs of ancestor bounds checks. It is unnecessary now that all descendants of the CSE definitions automatically get the same conservative VNs.
  • Remove `CSEdsc::defConservNormVN`: this was used to update conservative VNs in the case where all defs agree on the conservative VN, which is likewise unnecessary now.

Making this change requires a bit of refactoring of the incremental SSA builder. Before this PR the builder took all defs and all uses up front and then inserted everything into SSA. After this change the builder is used in a multi-step process, as follows:

  1. All definitions are added with IncrementalSsaBuilder::InsertDef
  2. The definitions are finalized with IncrementalSsaBuilder::FinalizeDefs
  3. Uses are inserted (one by one) with IncrementalSsaBuilder::InsertUse. No finalization is necessary; each use is directly put into SSA as a result of calling this method.

The refactoring allows CSE to use the incremental SSA builder in a way that gives it access to reaching-definition information for each use while making replacements. However, this requires some further refactoring so that CSE performs the replacements of all defs before performing the replacements of all uses.
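A minimal usage sketch of the new three-step flow, in the JIT's C++. Only the three method names come from the list above; `UseDefLocation`, the constructor arguments, and the `bool` return of `FinalizeDefs` are assumptions for illustration:

```cpp
// Illustrative flow only; exact signatures in the JIT may differ.
IncrementalSsaBuilder builder(compiler, lclNum);

// Step 1: register every definition of the local first.
for (const UseDefLocation& def : defs)
{
    builder.InsertDef(def);
}

// Step 2: finalize the defs; assumed to fail when SSA insertion is
// abandoned (e.g. because the iterated dominance frontier is too large).
if (!builder.FinalizeDefs())
{
    return; // caller keeps the non-SSA representation
}

// Step 3: insert uses one by one. Each call immediately wires the use to
// its reaching definition, so no finalization pass is needed for uses,
// and the reaching def's information is available right away.
for (const UseDefLocation& use : uses)
{
    builder.InsertUse(use);
}
```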

Additionally, this PR fixes several places where CSE updated VNs incorrectly.

VN and CSE still track the set of VNs that are interesting bounds checks. However, VN was sometimes inserting VNs with exception sets into this set, which is not useful (the consumers always use normal VNs when querying the set). This PR fixes VN to insert the normal VN instead.
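For context, a minimal sketch of the distinction, using the `ValueNumStore` calls that appear elsewhere in this PR (`VNNormalValue`, `GetConservative`); the variable names are illustrative:

```cpp
// A value number can carry an exception set alongside its "normal" value.
ValueNum fullVN   = lengthNode->gtVNPair.GetConservative(); // may include exceptions
ValueNum normalVN = vnStore->VNNormalValue(fullVN);         // exception set stripped

// The checked-bound set should record normalVN: consumers query the set
// with normal VNs, so recording fullVN would make those lookups miss.
```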

Fix #109745

The dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Nov 19, 2024.
```cpp
// Assign the proper Value Numbers.
ValueNumPair valExc = m_pCompiler->vnStore->VNPExceptionSet(val->gtVNPair);
store->gtVNPair     = m_pCompiler->vnStore->VNPWithExc(ValueNumStore::VNPForVoid(), valExc);
```
jakobbotsch (Member, Author) commented:

This was assigning ValueNumStore::VNPForVoid before, without any exceptions. That seems wrong.
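A before/after contrast, reconstructed from the comment above (the "before" line is inferred from the description rather than quoted from the old source):

```cpp
// Before: a plain void VN, dropping the stored value's exceptions.
// store->gtVNPair = ValueNumStore::VNPForVoid();

// After: a void VN that carries the stored value's exception set.
ValueNumPair valExc = m_pCompiler->vnStore->VNPExceptionSet(val->gtVNPair);
store->gtVNPair     = m_pCompiler->vnStore->VNPWithExc(ValueNumStore::VNPForVoid(), valExc);
```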

On lines 12722-12723:

```cpp
ValueNum lengthVN =
    vnStore->VNNormalValue(tree->AsBoundsChk()->GetArrayLength()->gtVNPair.GetConservative());
```
jakobbotsch (Member, Author) commented:

All consumers of this info query the normal conservative VN, so inserting one with exceptions is uninteresting. I hit a few diffs related to this.

@jakobbotsch (Member, Author):

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress


Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch (Member, Author):

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress


Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch (Member, Author):

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress


Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch marked this pull request as ready for review on November 20, 2024 at 17:06.
@jakobbotsch (Member, Author) commented on Nov 20, 2024:

cc @dotnet/jit-contrib PTAL @AndyAyersMS

Diffs: some minor code quality (CQ) and throughput (TP) improvements.

The test failure looks like an infra issue; the failing work item was dead-lettered.

@AndyAyersMS (Member) left a comment:

Did you ever collect stats on how often we bail out putting defs into SSA because of IDF size?

@jakobbotsch (Member, Author) replied:

> Did you ever collect stats on how often we bail out putting defs into SSA because of IDF size?

Here are some histograms of the IDF sizes we see in the incremental builder for the win-x64 collections:

aspnet:
```
IDF size
     <=          1 ===>     779 count ( 15% of total)
      2 ..       2 ===>     780 count ( 30% of total)
      3 ..       3 ===>     639 count ( 43% of total)
      4 ..       4 ===>     608 count ( 55% of total)
      5 ..       5 ===>     380 count ( 63% of total)
      6 ..      10 ===>    1366 count ( 90% of total)
     11 ..      20 ===>     386 count ( 97% of total)
     21 ..      30 ===>      95 count ( 99% of total)
     31 ..      40 ===>       9 count ( 99% of total)
     41 ..      50 ===>       4 count (100% of total)
     51 ..     100 ===>       0 count (100% of total)
    101 ..     200 ===>       0 count (100% of total)
    201 ..     300 ===>       0 count (100% of total)
    301 ..     400 ===>       0 count (100% of total)
    401 ..     500 ===>       0 count (100% of total)
```

benchmarks.run_pgo:
```
IDF size
     <=          1 ===>     298 count ( 12% of total)
      2 ..       2 ===>     531 count ( 35% of total)
      3 ..       3 ===>     200 count ( 43% of total)
      4 ..       4 ===>     243 count ( 54% of total)
      5 ..       5 ===>     170 count ( 61% of total)
      6 ..      10 ===>     770 count ( 94% of total)
     11 ..      20 ===>     114 count ( 99% of total)
     21 ..      30 ===>       8 count ( 99% of total)
     31 ..      40 ===>       0 count ( 99% of total)
     41 ..      50 ===>       7 count ( 99% of total)
     51 ..     100 ===>       1 count (100% of total)
    101 ..     200 ===>       0 count (100% of total)
    201 ..     300 ===>       0 count (100% of total)
    301 ..     400 ===>       0 count (100% of total)
    401 ..     500 ===>       0 count (100% of total)
```


libraries_tests.run:
```
IDF size
     <=          1 ===>    4340 count ( 18% of total)
      2 ..       2 ===>    3915 count ( 35% of total)
      3 ..       3 ===>    3542 count ( 50% of total)
      4 ..       4 ===>    3435 count ( 64% of total)
      5 ..       5 ===>    1806 count ( 72% of total)
      6 ..      10 ===>    5126 count ( 94% of total)
     11 ..      20 ===>    1231 count ( 99% of total)
     21 ..      30 ===>     151 count ( 99% of total)
     31 ..      40 ===>      20 count ( 99% of total)
     41 ..      50 ===>       5 count ( 99% of total)
     51 ..     100 ===>       2 count (100% of total)
    101 ..     200 ===>       0 count (100% of total)
    201 ..     300 ===>       0 count (100% of total)
    301 ..     400 ===>       0 count (100% of total)
    401 ..     500 ===>       0 count (100% of total)
```

realworld:
```
IDF size
     <=          1 ===>     449 count ( 22% of total)
      2 ..       2 ===>     366 count ( 41% of total)
      3 ..       3 ===>     341 count ( 58% of total)
      4 ..       4 ===>     225 count ( 70% of total)
      5 ..       5 ===>     178 count ( 79% of total)
      6 ..      10 ===>     305 count ( 95% of total)
     11 ..      20 ===>      81 count ( 99% of total)
     21 ..      30 ===>      13 count ( 99% of total)
     31 ..      40 ===>       1 count ( 99% of total)
     41 ..      50 ===>       1 count (100% of total)
     51 ..     100 ===>       0 count (100% of total)
    101 ..     200 ===>       0 count (100% of total)
    201 ..     300 ===>       0 count (100% of total)
    301 ..     400 ===>       0 count (100% of total)
    401 ..     500 ===>       0 count (100% of total)
```

So it looks like none of these collections have any examples that fail SSA insertion because of IDF size.
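For reference, a hypothetical sketch of the kind of IDF-size bail-out being discussed; the threshold, variable names, and placement are illustrative assumptions, not the JIT's actual code:

```cpp
// Inside a FinalizeDefs-style step: give up on SSA insertion when the
// iterated dominance frontier (IDF) of the def blocks grows too large.
const unsigned maxIdfSize = 100; // hypothetical cap
if (idfBlockCount > maxIdfSize)
{
    JITDUMP("IDF has %u blocks; bailing out of SSA insertion\n", idfBlockCount);
    return false; // caller falls back to not putting this CSE into SSA
}
```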

@jakobbotsch (Member, Author):

Some examples of functions with large (>= 40) IDF sizes for some CSEs:

https://github.com/PowerShell/PowerShell/blob/7ca7aae1d13d19e38c7c26260758f474cb9bef7f/src/System.Management.Automation/engine/Modules/ModuleCmdletBase.cs#L1513-L3580 (a 2000-line function)

https://github.com/dotnet/performance/blob/4894b54b3e86637b040ba6c7c54c23524cfbeadd/src/benchmarks/micro/runtime/Bytemark/assign_rect.cs#L438-L540 (the OSR version of this)
The CSE here is for `tableau.GetLength(1)`.

```csharp
// On method return, pInputBufferRemaining and pOutputBufferRemaining will both point to where
// the next char would have been consumed from / the next byte would have been written to.
// inputLength in chars, outputBytesRemaining in bytes.
public static OperationStatus TranscodeToUtf8(char* pInputBuffer, int inputLength, byte* pOutputBuffer, int outputBytesRemaining, out char* pInputBufferRemaining, out byte* pOutputBufferRemaining)
{
const int CharsPerDWord = sizeof(uint) / sizeof(char);
Debug.Assert(inputLength >= 0, "Input length must not be negative.");
Debug.Assert(pInputBuffer != null || inputLength == 0, "Input length must be zero if input buffer pointer is null.");
Debug.Assert(outputBytesRemaining >= 0, "Destination length must not be negative.");
Debug.Assert(pOutputBuffer != null || outputBytesRemaining == 0, "Destination length must be zero if destination buffer pointer is null.");
// First, try vectorized conversion.
{
nuint numElementsConverted = Ascii.NarrowUtf16ToAscii(pInputBuffer, pOutputBuffer, (uint)Math.Min(inputLength, outputBytesRemaining));
pInputBuffer += numElementsConverted;
pOutputBuffer += numElementsConverted;
// Quick check - did we just end up consuming the entire input buffer?
// If so, short-circuit the remainder of the method.
if ((int)numElementsConverted == inputLength)
{
pInputBufferRemaining = pInputBuffer;
pOutputBufferRemaining = pOutputBuffer;
return OperationStatus.Done;
}
inputLength -= (int)numElementsConverted;
outputBytesRemaining -= (int)numElementsConverted;
}
if (inputLength < CharsPerDWord)
{
goto ProcessInputOfLessThanDWordSize;
}
char* pFinalPosWhereCanReadDWordFromInputBuffer = pInputBuffer + (uint)inputLength - CharsPerDWord;
// We have paths for SSE4.1 vectorization inside the inner loop. Since the below
// vector is only used in those code paths, we leave it uninitialized if SSE4.1
// is not enabled.
Vector128<short> nonAsciiUtf16DataMask;
if (Sse41.X64.IsSupported || (AdvSimd.Arm64.IsSupported && BitConverter.IsLittleEndian))
{
nonAsciiUtf16DataMask = Vector128.Create(unchecked((short)0xFF80)); // mask of non-ASCII bits in a UTF-16 char
}
// Begin the main loop.
#if DEBUG
char* pLastBufferPosProcessed = null; // used for invariant checking in debug builds
#endif
uint thisDWord;
Debug.Assert(pInputBuffer <= pFinalPosWhereCanReadDWordFromInputBuffer);
do
{
// Read 32 bits at a time. This is enough to hold any possible UTF16-encoded scalar.
thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
AfterReadDWord:
#if DEBUG
Debug.Assert(pLastBufferPosProcessed < pInputBuffer, "Algorithm should've made forward progress since last read.");
pLastBufferPosProcessed = pInputBuffer;
#endif
// First, check for the common case of all-ASCII chars.
if (Utf16Utility.AllCharsInUInt32AreAscii(thisDWord))
{
// We read an all-ASCII sequence (2 chars).
if (outputBytesRemaining < 2)
{
goto ProcessOneCharFromCurrentDWordAndFinish; // running out of space, but may be able to write some data
}
// The high WORD of the local declared below might be populated with garbage
// as a result of our shifts below, but that's ok since we're only going to
// write the low WORD.
//
// [ 00000000 0bbbbbbb | 00000000 0aaaaaaa ] -> [ 00000000 0bbbbbbb | 0bbbbbbb 0aaaaaaa ]
// (Same logic works regardless of endianness.)
uint valueToWrite = thisDWord | (thisDWord >> 8);
Unsafe.WriteUnaligned(pOutputBuffer, (ushort)valueToWrite);
pInputBuffer += 2;
pOutputBuffer += 2;
outputBytesRemaining -= 2;
// If we saw a sequence of all ASCII, there's a good chance a significant amount of following data is also ASCII.
// Below is basically unrolled loops with poor man's vectorization.
uint inputCharsRemaining = (uint)(pFinalPosWhereCanReadDWordFromInputBuffer - pInputBuffer) + 2;
uint minElementsRemaining = (uint)Math.Min(inputCharsRemaining, outputBytesRemaining);
if (Sse41.X64.IsSupported || (AdvSimd.Arm64.IsSupported && BitConverter.IsLittleEndian))
{
// Try reading and writing 8 elements per iteration.
uint maxIters = minElementsRemaining / 8;
ulong possibleNonAsciiQWord;
int i;
Vector128<short> utf16Data;
for (i = 0; (uint)i < maxIters; i++)
{
// The trimmer won't trim out nonAsciiUtf16DataMask unless this is in the loop.
// Luckily, this is a nop and will be elided by the JIT
Unsafe.SkipInit(out nonAsciiUtf16DataMask);
utf16Data = Unsafe.ReadUnaligned<Vector128<short>>(pInputBuffer);
if (AdvSimd.Arm64.IsSupported)
{
Vector128<short> isUtf16DataNonAscii = AdvSimd.CompareTest(utf16Data, nonAsciiUtf16DataMask);
bool hasNonAsciiDataInVector = AdvSimd.Arm64.MinPairwise(isUtf16DataNonAscii, isUtf16DataNonAscii).AsUInt64().ToScalar() != 0;
if (hasNonAsciiDataInVector)
{
goto LoopTerminatedDueToNonAsciiDataInVectorLocal;
}
Vector64<byte> lower = AdvSimd.ExtractNarrowingSaturateUnsignedLower(utf16Data);
AdvSimd.Store(pOutputBuffer, lower);
}
else if (Sse41.IsSupported)
{
if (!Sse41.TestZ(utf16Data, nonAsciiUtf16DataMask))
{
goto LoopTerminatedDueToNonAsciiDataInVectorLocal;
}
// narrow and write
Sse2.StoreScalar((ulong*)pOutputBuffer /* unaligned */, Sse2.PackUnsignedSaturate(utf16Data, utf16Data).AsUInt64());
}
else
{
// We explicitly recheck each IsSupported query to ensure that the trimmer can see which paths are live/dead
ThrowHelper.ThrowUnreachableException();
}
pInputBuffer += 8;
pOutputBuffer += 8;
}
outputBytesRemaining -= 8 * i;
// Can we perform one more iteration, but reading & writing 4 elements instead of 8?
if ((minElementsRemaining & 4) != 0)
{
possibleNonAsciiQWord = Unsafe.ReadUnaligned<ulong>(pInputBuffer);
if (!Utf16Utility.AllCharsInUInt64AreAscii(possibleNonAsciiQWord))
{
goto LoopTerminatedDueToNonAsciiDataInPossibleNonAsciiQWordLocal;
}
utf16Data = Vector128.CreateScalarUnsafe(possibleNonAsciiQWord).AsInt16();
if (AdvSimd.IsSupported)
{
Vector64<byte> lower = AdvSimd.ExtractNarrowingSaturateUnsignedLower(utf16Data);
AdvSimd.StoreSelectedScalar((uint*)pOutputBuffer, lower.AsUInt32(), 0);
}
else if (Sse2.IsSupported)
{
Unsafe.WriteUnaligned(pOutputBuffer, Sse2.ConvertToUInt32(Sse2.PackUnsignedSaturate(utf16Data, utf16Data).AsUInt32()));
}
else
{
// We explicitly recheck each IsSupported query to ensure that the trimmer can see which paths are live/dead
ThrowHelper.ThrowUnreachableException();
}
pInputBuffer += 4;
pOutputBuffer += 4;
outputBytesRemaining -= 4;
}
continue; // Go back to beginning of main loop, read data, check for ASCII
LoopTerminatedDueToNonAsciiDataInVectorLocal:
outputBytesRemaining -= 8 * i;
if (Sse2.X64.IsSupported)
{
possibleNonAsciiQWord = Sse2.X64.ConvertToUInt64(utf16Data.AsUInt64());
}
else
{
possibleNonAsciiQWord = utf16Data.AsUInt64().ToScalar();
}
// Temporarily set 'possibleNonAsciiQWord' to be the low 64 bits of the vector,
// then check whether it's all-ASCII. If so, narrow and write to the destination
// buffer. Since we know that either the high 64 bits or the low 64 bits of the
// vector contains non-ASCII data, by the end of the following block the
// 'possibleNonAsciiQWord' local is guaranteed to contain the non-ASCII segment.
if (Utf16Utility.AllCharsInUInt64AreAscii(possibleNonAsciiQWord)) // all chars in first QWORD are ASCII
{
if (AdvSimd.IsSupported)
{
Vector64<byte> lower = AdvSimd.ExtractNarrowingSaturateUnsignedLower(utf16Data);
AdvSimd.StoreSelectedScalar((uint*)pOutputBuffer, lower.AsUInt32(), 0);
}
else if (Sse2.IsSupported)
{
Unsafe.WriteUnaligned(pOutputBuffer, Sse2.ConvertToUInt32(Sse2.PackUnsignedSaturate(utf16Data, utf16Data).AsUInt32()));
}
else
{
// We explicitly recheck each IsSupported query to ensure that the trimmer can see which paths are live/dead
ThrowHelper.ThrowUnreachableException();
}
pInputBuffer += 4;
pOutputBuffer += 4;
outputBytesRemaining -= 4;
possibleNonAsciiQWord = utf16Data.AsUInt64().GetElement(1);
}
LoopTerminatedDueToNonAsciiDataInPossibleNonAsciiQWordLocal:
Debug.Assert(!Utf16Utility.AllCharsInUInt64AreAscii(possibleNonAsciiQWord)); // this condition should've been checked earlier
thisDWord = (uint)possibleNonAsciiQWord;
if (Utf16Utility.AllCharsInUInt32AreAscii(thisDWord))
{
// [ 00000000 0bbbbbbb | 00000000 0aaaaaaa ] -> [ 00000000 0bbbbbbb | 0bbbbbbb 0aaaaaaa ]
Unsafe.WriteUnaligned(pOutputBuffer, (ushort)(thisDWord | (thisDWord >> 8)));
pInputBuffer += 2;
pOutputBuffer += 2;
outputBytesRemaining -= 2;
thisDWord = (uint)(possibleNonAsciiQWord >> 32);
}
goto AfterReadDWordSkipAllCharsAsciiCheck;
}
else
{
// Can't use SSE41 x64, so we'll only read and write 4 elements per iteration.
uint maxIters = minElementsRemaining / 4;
uint secondDWord;
int i;
for (i = 0; (uint)i < maxIters; i++)
{
thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
secondDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer + 2);
if (!Utf16Utility.AllCharsInUInt32AreAscii(thisDWord | secondDWord))
{
goto LoopTerminatedDueToNonAsciiData;
}
// [ 00000000 0bbbbbbb | 00000000 0aaaaaaa ] -> [ 00000000 0bbbbbbb | 0bbbbbbb 0aaaaaaa ]
// (Same logic works regardless of endianness.)
Unsafe.WriteUnaligned(pOutputBuffer, (ushort)(thisDWord | (thisDWord >> 8)));
Unsafe.WriteUnaligned(pOutputBuffer + 2, (ushort)(secondDWord | (secondDWord >> 8)));
pInputBuffer += 4;
pOutputBuffer += 4;
}
outputBytesRemaining -= 4 * i;
continue; // Go back to beginning of main loop, read data, check for ASCII
LoopTerminatedDueToNonAsciiData:
outputBytesRemaining -= 4 * i;
// First, see if we can drain any ASCII data from the first DWORD.
if (Utf16Utility.AllCharsInUInt32AreAscii(thisDWord))
{
// [ 00000000 0bbbbbbb | 00000000 0aaaaaaa ] -> [ 00000000 0bbbbbbb | 0bbbbbbb 0aaaaaaa ]
// (Same logic works regardless of endianness.)
Unsafe.WriteUnaligned(pOutputBuffer, (ushort)(thisDWord | (thisDWord >> 8)));
pInputBuffer += 2;
pOutputBuffer += 2;
outputBytesRemaining -= 2;
thisDWord = secondDWord;
}
goto AfterReadDWordSkipAllCharsAsciiCheck;
}
}
AfterReadDWordSkipAllCharsAsciiCheck:
Debug.Assert(!Utf16Utility.AllCharsInUInt32AreAscii(thisDWord)); // this should have been handled earlier
// Next, try stripping off the first ASCII char if it exists.
// We don't check for a second ASCII char since that should have been handled above.
if (IsFirstCharAscii(thisDWord))
{
if (outputBytesRemaining == 0)
{
goto OutputBufferTooSmall;
}
if (BitConverter.IsLittleEndian)
{
pOutputBuffer[0] = (byte)thisDWord; // extract [ ## ## 00 AA ]
}
else
{
pOutputBuffer[0] = (byte)(thisDWord >> 16); // extract [ 00 AA ## ## ]
}
pInputBuffer++;
pOutputBuffer++;
outputBytesRemaining--;
if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
{
goto ProcessNextCharAndFinish; // input buffer doesn't contain enough data to read a DWORD
}
else
{
// The input buffer at the current offset contains a non-ASCII char.
// Read an entire DWORD and fall through to non-ASCII consumption logic.
thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
}
}
// At this point, we know the first char in the buffer is non-ASCII, but we haven't yet validated it.
if (!IsFirstCharAtLeastThreeUtf8Bytes(thisDWord))
{
TryConsumeMultipleTwoByteSequences:
// For certain text (Greek, Cyrillic, ...), 2-byte sequences tend to be clustered. We'll try transcoding them in
// a tight loop without falling back to the main loop.
if (IsSecondCharTwoUtf8Bytes(thisDWord))
{
// We have two runs of two bytes each.
if (outputBytesRemaining < 4)
{
goto ProcessOneCharFromCurrentDWordAndFinish; // running out of output buffer
}
Unsafe.WriteUnaligned(pOutputBuffer, ExtractTwoUtf8TwoByteSequencesFromTwoPackedUtf16Chars(thisDWord));
pInputBuffer += 2;
pOutputBuffer += 4;
outputBytesRemaining -= 4;
if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
{
goto ProcessNextCharAndFinish; // Running out of data - go down slow path
}
else
{
// Optimization: If we read a long run of two-byte sequences, the next sequence is probably
// also two bytes. Check for that first before going back to the beginning of the loop.
thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
if (IsFirstCharTwoUtf8Bytes(thisDWord))
{
// Validated we have a two-byte sequence coming up
goto TryConsumeMultipleTwoByteSequences;
}
// If we reached this point, the next sequence is something other than a valid
// two-byte sequence, so go back to the beginning of the loop.
goto AfterReadDWord;
}
}
if (outputBytesRemaining < 2)
{
goto OutputBufferTooSmall;
}
Unsafe.WriteUnaligned(pOutputBuffer, (ushort)ExtractUtf8TwoByteSequenceFromFirstUtf16Char(thisDWord));
// The buffer contains a 2-byte sequence followed by 2 bytes that aren't a 2-byte sequence.
// Unlikely that a 3-byte sequence would follow a 2-byte sequence, so perhaps remaining
// char is ASCII?
if (IsSecondCharAscii(thisDWord))
{
if (outputBytesRemaining >= 3)
{
if (BitConverter.IsLittleEndian)
{
thisDWord >>= 16;
}
pOutputBuffer[2] = (byte)thisDWord;
pInputBuffer += 2;
pOutputBuffer += 3;
outputBytesRemaining -= 3;
continue; // go back to original bounds check and check for ASCII
}
else
{
pInputBuffer++;
pOutputBuffer += 2;
goto OutputBufferTooSmall;
}
}
else
{
pInputBuffer++;
pOutputBuffer += 2;
outputBytesRemaining -= 2;
if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
{
goto ProcessNextCharAndFinish; // Running out of data - go down slow path
}
else
{
thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
goto BeforeProcessThreeByteSequence; // we know the next byte isn't ASCII, and it's not the start of a 2-byte sequence (this was checked above)
}
}
}
// Check the 3-byte case.
BeforeProcessThreeByteSequence:
if (!IsFirstCharSurrogate(thisDWord))
{
// Optimization: A three-byte character could indicate CJK text, which makes it likely
// that the character following this one is also CJK. We'll perform the check now
// rather than jumping to the beginning of the main loop.
if (IsSecondCharAtLeastThreeUtf8Bytes(thisDWord))
{
if (!IsSecondCharSurrogate(thisDWord))
{
if (outputBytesRemaining < 6)
{
goto ConsumeSingleThreeByteRun; // not enough space - try consuming as much as we can
}
WriteTwoUtf16CharsAsTwoUtf8ThreeByteSequences(ref *pOutputBuffer, thisDWord);
pInputBuffer += 2;
pOutputBuffer += 6;
outputBytesRemaining -= 6;
// Try to remain in the 3-byte processing loop if at all possible.
if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
{
goto ProcessNextCharAndFinish; // Running out of data - go down slow path
}
else
{
thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
if (IsFirstCharAtLeastThreeUtf8Bytes(thisDWord))
{
goto BeforeProcessThreeByteSequence;
}
else
{
// Fall back to standard processing loop since we don't know how to optimize this.
goto AfterReadDWord;
}
}
}
}
ConsumeSingleThreeByteRun:
if (outputBytesRemaining < 3)
{
goto OutputBufferTooSmall;
}
WriteFirstUtf16CharAsUtf8ThreeByteSequence(ref *pOutputBuffer, thisDWord);
pInputBuffer++;
pOutputBuffer += 3;
outputBytesRemaining -= 3;
// Occasionally one-off ASCII characters like spaces, periods, or newlines will make their way
// in to the text. If this happens strip it off now before seeing if the next character
// consists of three code units.
if (IsSecondCharAscii(thisDWord))
{
if (outputBytesRemaining == 0)
{
goto OutputBufferTooSmall;
}
if (BitConverter.IsLittleEndian)
{
*pOutputBuffer = (byte)(thisDWord >> 16);
}
else
{
*pOutputBuffer = (byte)(thisDWord);
}
pInputBuffer++;
pOutputBuffer++;
outputBytesRemaining--;
if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
{
goto ProcessNextCharAndFinish; // Running out of data - go down slow path
}
else
{
thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
if (IsFirstCharAtLeastThreeUtf8Bytes(thisDWord))
{
goto BeforeProcessThreeByteSequence;
}
else
{
// Fall back to standard processing loop since we don't know how to optimize this.
goto AfterReadDWord;
}
}
}
if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
{
goto ProcessNextCharAndFinish; // Running out of data - go down slow path
}
else
{
thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
goto AfterReadDWordSkipAllCharsAsciiCheck; // we just checked above that this value isn't ASCII
}
}
// Four byte sequence processing
if (IsWellFormedUtf16SurrogatePair(thisDWord))
{
if (outputBytesRemaining < 4)
{
goto OutputBufferTooSmall;
}
Unsafe.WriteUnaligned(pOutputBuffer, ExtractFourUtf8BytesFromSurrogatePair(thisDWord));
pInputBuffer += 2;
pOutputBuffer += 4;
outputBytesRemaining -= 4;
continue; // go back to beginning of loop for processing
}
goto Error; // an ill-formed surrogate sequence: high not followed by low, or low not preceded by high
} while (pInputBuffer <= pFinalPosWhereCanReadDWordFromInputBuffer);
ProcessNextCharAndFinish:
inputLength = (int)(pFinalPosWhereCanReadDWordFromInputBuffer - pInputBuffer) + CharsPerDWord;
ProcessInputOfLessThanDWordSize:
Debug.Assert(inputLength < CharsPerDWord);
if (inputLength == 0)
{
goto InputBufferFullyConsumed;
}
uint thisChar = *pInputBuffer;
goto ProcessFinalChar;
ProcessOneCharFromCurrentDWordAndFinish:
if (BitConverter.IsLittleEndian)
{
thisChar = thisDWord & 0xFFFFu; // preserve only the first char
}
else
{
thisChar = thisDWord >> 16; // preserve only the first char
}
ProcessFinalChar:
{
if (thisChar <= 0x7Fu)
{
if (outputBytesRemaining == 0)
{
goto OutputBufferTooSmall; // we have no hope of writing anything to the output
}
// 1-byte (ASCII) case
*pOutputBuffer = (byte)thisChar;
pInputBuffer++;
pOutputBuffer++;
}
else if (thisChar < 0x0800u)
{
if (outputBytesRemaining < 2)
{
goto OutputBufferTooSmall; // we have no hope of writing anything to the output
}
// 2-byte case
pOutputBuffer[1] = (byte)((thisChar & 0x3Fu) | unchecked((uint)(sbyte)0x80)); // [ 10xxxxxx ]
pOutputBuffer[0] = (byte)((thisChar >> 6) | unchecked((uint)(sbyte)0xC0)); // [ 110yyyyy ]
pInputBuffer++;
pOutputBuffer += 2;
}
else if (!UnicodeUtility.IsSurrogateCodePoint(thisChar))
{
if (outputBytesRemaining < 3)
{
goto OutputBufferTooSmall; // we have no hope of writing anything to the output
}
// 3-byte case
pOutputBuffer[2] = (byte)((thisChar & 0x3Fu) | unchecked((uint)(sbyte)0x80)); // [ 10xxxxxx ]
pOutputBuffer[1] = (byte)(((thisChar >> 6) & 0x3Fu) | unchecked((uint)(sbyte)0x80)); // [ 10yyyyyy ]
pOutputBuffer[0] = (byte)((thisChar >> 12) | unchecked((uint)(sbyte)0xE0)); // [ 1110zzzz ]
pInputBuffer++;
pOutputBuffer += 3;
}
else if (thisChar <= 0xDBFFu)
{
// UTF-16 high surrogate code point with no trailing data, report incomplete input buffer
goto InputBufferTooSmall;
}
else
{
// UTF-16 low surrogate code point with no leading data, report error
goto Error;
}
}
// There are two ways we can end up here. Either we were running low on input data,
// or we were running low on space in the destination buffer. If we're running low on
// input data (label targets ProcessInputOfLessThanDWordSize and ProcessNextCharAndFinish),
// then the inputLength value is guaranteed to be between 0 and 1, and we should return Done.
// If we're running low on destination buffer space (label target ProcessOneCharFromCurrentDWordAndFinish),
// then we didn't modify inputLength since entering the main loop, which means it should
// still have a value of >= 2. So checking the value of inputLength is all we need to do to determine
// which of the two scenarios we're in.
if (inputLength > 1)
{
goto OutputBufferTooSmall;
}
InputBufferFullyConsumed:
OperationStatus retVal = OperationStatus.Done;
goto ReturnCommon;
InputBufferTooSmall:
retVal = OperationStatus.NeedMoreData;
goto ReturnCommon;
OutputBufferTooSmall:
retVal = OperationStatus.DestinationTooSmall;
goto ReturnCommon;
Error:
retVal = OperationStatus.InvalidData;
goto ReturnCommon;
ReturnCommon:
pInputBufferRemaining = pInputBuffer;
pOutputBufferRemaining = pOutputBuffer;
return retVal;
}
```
This one (the standard tier 1 version) has a CSE with 27 definitions and no uses in reachable blocks. I wonder if it would be worth it to compute the DFS tree. It seems like the heuristic is not accounting for the reachability of the uses (probably the better solution is to aggressively trim unreachable blocks earlier).
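A hedged sketch of the reachability filtering being suggested, assuming the JIT's `fgComputeDfs`/`FlowGraphDfsTree::Contains` API; the surrounding loop and the `UseDefLocation` field are illustrative:

```cpp
// Hypothetical: with a DFS tree computed, use blocks that are unreachable
// could be excluded before the CSE heuristic counts them.
FlowGraphDfsTree* dfsTree = compiler->fgComputeDfs();

unsigned reachableUses = 0;
for (const UseDefLocation& use : uses)
{
    if (dfsTree->Contains(use.Block)) // skip uses in unreachable blocks
    {
        reachableUses++;
    }
}
```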

@jakobbotsch merged commit fe70623 into dotnet:main on Nov 22, 2024 (131 of 133 checks passed), and deleted the fix-109745 branch on November 22, 2024 at 09:14.
Closed issue: JIT: libraries jitstress Assertion failed 'doesVNMatch' during 'Assertion prop' (#109745).