-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.Text.Json failing some large file tests #59678
Comments
Tagging subscribers to this area: @eiriktsarpalis, @layomia Issue DetailsIf I didn't know any better, I'd say "memory corruption".
|
Interesting observation: In all three failures the expected character vs the actual char had a difference of 2. (q => s, u => w, e => g). Maybe it really is some sort of memory corruption (something has a bad pointer somewhere, and added 2). Secondary observation: They all feel like they have expected and actual backwards 😄. |
Seems related to #45464 |
The runtime-extra-platforms rolling build also hit the failure in |
Not sure if this is exactly related, but build https://dev.azure.com/dnceng/public/_build/results?buildId=1697303&view=logs&j=05e92ac1-194e-59cf-664a-fa72d1cdd19b&t=caea1d4b-c90c-5be1-e57d-c4635079c333 failed with HandleCollectionsAsync test failing in a very similar way. Here are the logs: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-50e485d6cc7b461f9a/System.Text.Json.Tests/1/console.6fad303f.log?sv=2019-07-07&se=2022-04-23T08%3A48%3A09Z&sr=c&sp=rl&sig=SNdOcc025qirLoyBCvPN7U%2BHl1uIgU94Zo64yUp0CLk%3D |
In @bartonjs case, the second least significant bit is flipping 0->1 in every case. If it was a bad pointer I think we'd expect 1->0 also, but it's a sample of three. In @joperezr case (again as @bartonjs points out, Expected and Actual are reversed - cc @eiriktsarpalis ) it is quite different, several bits flipped, and it's then repeated.
It's important to record machine names in such cases. Last one above was dci-mac-build-365.local. Looks like we lost the other two? We have in the past send a box to the trash because it was regularly flipping 1 bit. |
Oh, @bartonjs case was dci-mac-build-195. |
Might be related to this other issue? #65021 |
In the same CI job that Joe shared, there's another log file with a slightly different callstack, in case it's helpful: Pipeline: Libraries Test Run release coreclr OSX x64 Release
Callstack:
|
@dotnet/area-system-text-json Happened once again in today's morning rolling build: https://dev.azure.com/dnceng/public/_build/results?buildId=1700046&view=results
|
Logging machine name from above instance: so no evidence this is hardware specific. |
@eiriktsarpalis @steveharter @layomia can we get the test disabled or a fix submitted? This happened again in the
|
FWIW this seems like a different failure compared to the original report. Is this impacting osx x64 builds only? Should we only disable the test for that platform? |
I'll add couple of other tests I've seen locally, first 3 look like the "not found" value is actually in the string and in all 3 cases it's using current culture for comparison:
also in another run I've seen those 2 happen in a single run which was rather unusual and I think the stream was actually corrupted (somehow ended up with first byte being zero) - interestingly this only affected async related cases and same test using non-async code path was fine:
I'll be replacing Assert.Contains in our tests into something else which can possibly provide us some more details |
@krwq perhaps you can put it in a shared place - AssertExtensions, we already did something very similar for Equals If we build up several of these improved assertions and they show value, we could try to see whether xunit would be interested in any changes. |
FWIW the AssertExtensions.Contains already exists and does exactly what I planned to add (which is doing Ordinal comparison by default). I'm currently running tests locally to see if StringComparison.CurrentCulture vs Ordinal make any difference (I want to make sure the failures above are not related to some globalization bug). I'm basically doing CurrentCulture comparison and rather than throwing on "not found" I'm also comparing using Ordinal and if I see any mismatch that will suggest to me this is likely some kind of globalization initialization error. If both checks fail it's likely memory corruption. I'll update once I get a repro (although it might take a day or two depending on the luck) |
Ok, finally got lucky - random Assert.Contains failed where IMO some issues we're seeing are some kind of globalization issue - maybe cultures didn't get initialized correctly when it happens on multiple threads. We'll likely need to try to write some standalone repro and try to attach debugger to see if we can see what exactly happened and then maybe we can guess what should be changed. From JSON perspective we likely should change to ordinal compare anyway but it would be good to first nail down what the problem with CurrentCulture is |
I'll locally do Ordinal compare for now to flush out other flaky issues and meantime I'm checking with @tarekgh if he's seen anything similar happening elsewhere... |
Even if this has morphed from the original issue, we should take a look at this during 9.0.0 since it causes CI failures. |
Yeah I'm closing it again and I'll make mroe specific tests if any aren't caught by the others |
the problem is there are failures of this specific test and the failures look like:
https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-101117-merge-608176b4786749e2b4/System.Text.Json.Tests/1/console.6c31de45.log?helixlogtype=result |
we should probably just skip the test on the interpreter |
opened #101193 |
It is looking increasingly like the ooms were indeed only part of the problem and now we're hitting a bunch of different ways to timeout in the interpreter on desktop and wasm that this catches |
https://dev.azure.com/dnceng/public/_build/results?buildId=1387876&view=ms.vss-test-web.build-test-results-tab&runId=40401306&resultId=194463&paneView=debug
If I didn't know any better, I'd say "memory corruption".
https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-59626-merge-ff748885001649f2ad/System.Text.Json.Tests/1/console.55b96e63.log?sv=2019-07-07&se=2021-10-17T15%3A49%3A24Z&sr=c&sp=rl&sig=jkQ9vlfitTdm9LAJkyCb4ZnZxrI5Gb%2F8qvt%2FSn0S27M%3D
Error Message
Fill the error message using known issues guidance.
Report
Summary
Known issue validation
Build: 🔎⚠️ Validation could not be done without an Azure DevOps build URL on the issue. Please add it to the "Build: 🔎" line.
Result validation:
The text was updated successfully, but these errors were encountered: