Widespread timeouts in Edge 17 #563
Comments
Do we know if this is Sauce-specific or if it happens with native runs as well? |
@thejohnjansen have you been able to compare the results from Sauce with your own runs yet? |
I'm seeing this in https://wpt.fyi/results/old-tests/submission/Microsoft/selection?sha=eea0b54014&label=stable now. |
Seeing this today also, in https://wpt.fyi/results/webmessaging/with-options?sha=c0a1bc68d7&label=stable |
As a point of comparison, the same tests from the experimental Edge 18 run: The tests also run and have results when testing Edge 17 manually, so I'm quite sure this isn't a problem with Edge, but with the running infrastructure. |
https://wpt.fyi/results/?product=edge@085d47ef10&product=edge@548797a187&diff compares two runs that are close in their wpt revisions. It's a bit hard to navigate (web-platform-tests/wpt.fyi#411), but here's one directory with a lot of Edge 17 timeouts which are probably spurious: |
This issue became apparent again today in the "web-platform-tests that fail only in Chrome (from wpt.fyi data)" thread on blink-dev. |
This may be the most extreme bad run I've spotted so far: https://wpt.fyi/results/?sha=5acd3bcf66 shows 0 / N almost all the way down the line. @jugglinmike, perhaps the setup has become totally broken somehow? @lukebjerring, do you have an issue for "validate results"? It seems like, with that in place, we would reject a run like this, or at least label it as bad. |
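For what it's worth, here is a minimal sketch of the kind of "validate results" check that could flag a run like that one before it is published. The wptreport.json structure is WPT's own, but the function, the threshold, and the idea of applying this at upload time are illustrative assumptions, not an existing wpt.fyi feature.

```python
import json

def run_looks_broken(wptreport_path, max_bad_fraction=0.5):
    """Heuristic: a run where more than half of the test files end in a
    harness-level TIMEOUT or ERROR is more likely an infrastructure failure
    than genuine browser behaviour. The threshold is an illustrative guess."""
    with open(wptreport_path) as f:
        results = json.load(f)["results"]
    if not results:
        return True  # an empty report is certainly not a valid run
    bad = sum(1 for r in results if r["status"] in ("TIMEOUT", "ERROR"))
    return bad / len(results) > max_bad_fraction
```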
This appears to be the most complete of recent runs: … https://wpt.fyi/results/?sha=99e577f57c is another really bad one, and before that I can't find any that are this busted going back as far as 434ca47448. So this appears to be a recent regression. |
Well, the builds certainly aren't completing any faster. They're just slow and wrong now. We're seeing three different kinds of errors. The first two:
...are always followed by a timeout, e.g.
The third kind of error is far more common but far less helpful:
It is never followed by a timeout. As you can see, the logs are now formatted with the "GroupingFormatter" because that was recently made the default behavior of the WPT CLI. As mentioned there, this format is not ideal for continuous integration because it limits the amount of information available in situations like these. I have reinstated the previous default logging format for this project. This won't fix anything, but it should help us in future debugging attempts. I've updated WPT so that I'm notified of changes to WPT's … In the meantime, I have also updated to the latest release of the Sauce Connect binary on the off chance that "Fixed file descriptor leak." has some bearing here. |
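For reference, a minimal sketch of configuring an explicit mozlog formatter rather than relying on the CLI default; the mozlog classes are real, but whether this matches the exact wiring used by this project is an assumption.

```python
import sys

from mozlog import structuredlog
from mozlog.handlers import StreamHandler
from mozlog.formatters import TbplFormatter

# Emit one structured line per event instead of a grouped summary, so that
# CI logs keep per-test detail for debugging runs like the ones above.
logger = structuredlog.StructuredLogger("results-collection")
logger.add_handler(StreamHandler(sys.stdout, TbplFormatter()))

logger.suite_start([])
logger.test_start("/example/test.html")
logger.test_end("/example/test.html", status="TIMEOUT")
logger.suite_end()
```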
Thanks for investigating Mike! Do you think it'd be worth trying more aggressive restarting, and perhaps more chunks, to mitigate breakage like this? |
The fact that this immediately interferes with almost every "chunk" makes me think that restarting will not resolve it, but it couldn't hurt to try (especially now that we're disabling Safari on Sauce Labs). We'll see what effect this has on time-to-results (it may get ugly). (I was ready to point out that these are "ERROR"s and not "FAILURE"s, but then I remembered that doesn't actually matter.)
For Edge, we're currently running WPT in 100 "chunks" which already feels a bit excessive. I can give it a shot if we run out of ideas, though. |
Oh, if it's already 100 chunks, then scratch that idea. Seems like something is broken with Sauce or the sauce support in |
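For context on the 100-chunk configuration discussed above, here's a rough sketch of what the per-chunk invocations look like; --total-chunks and --this-chunk are wptrunner options, while the product string and the wrapper function are assumptions about this project's setup.

```python
import subprocess

TOTAL_CHUNKS = 100  # the current Edge configuration mentioned above

def run_chunk(this_chunk, product="sauce:MicrosoftEdge:17.17134"):
    """Run one slice of the suite; wptrunner splits the test manifest
    deterministically, so chunks 1..TOTAL_CHUNKS together cover every test."""
    cmd = [
        "./wpt", "run", product,
        "--total-chunks", str(TOTAL_CHUNKS),
        "--this-chunk", str(this_chunk),
    ]
    return subprocess.run(cmd).returncode
```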
We've completed a collection from Edge that involved restarting after every test failure. This increased build time from 15 hours to 24 hours, but it had no appreciable effect on the veracity of the results. Strangely, I haven't been able to connect to Sauce Labs locally. I'm not going to prioritize fixing that yet for a few reasons:
In the interim, I can do ad-hoc testing from any of the worker machines; that's probably more authentic, anyway. ...just not while they're actively testing. Although Edge is now running 24/7, 10 of our machines do not run Sauce Labs tests, so this is still a viable option. |
The problem comes from the way cross-realm objects are interpreted by Edge/Edgedriver. The easiest way to see this is by comparing the results of the WebDriver "execute script" command, which collects test results from the browser.

Excerpt from logs of a test conducted on 2018-10-14:

{
"screenshot": null,
"between_commands": 0.1099998950958252,
"suggestion_values": [],
"request": {
"args": [],
"script": "var callback = arguments[arguments.length - 1];\nwindow.testdriver_callback = callback;\nwindow.process_next_event();\n"
},
"HTTPStatus": 200,
"result": [
"/css/css-typed-om/the-stylepropertymap/properties/white-space.html",
"complete",
[
0,
null,
null,
[
[
"'white-space' property",
1,
"'CSSKeywordValue' is not defined",
"ReferenceError: 'CSSKeywordValue' is not defined\n at Global code (http://web-platform.test:8000/css/css-typed-om/the-stylepropertymap/properties/resources/testsuite.js:23:1)"
]
]
]
],
"suggestion": null,
"duration": 0.015000104904174805,
"path": "execute_async",
"method": "POST",
"statusCode": 0,
"timelineSeconds": 9.593000173568726
}

Excerpt from logs of a test conducted on 2018-10-17:

{
"screenshot": null,
"between_commands": 0.2969999313354492,
"suggestion_values": [],
"request": {
"args": [],
"script": "var callback = arguments[arguments.length - 1];\nwindow.opener.testdriver_callback = callback;\nwindow.opener.process_next_event();\n"
},
"HTTPStatus": 200,
"result": {
"0": "/speech-api/historical.html",
"1": "complete",
"2": {
"0": 0,
"1": null,
"2": null,
"3": {
"0": {
"0": "SpeechRecognitionError interface should not exist",
"1": 0,
"2": null,
"3": null
},
"1": {
"0": "webkitSpeechGrammar interface should not exist",
"1": 0,
"2": null,
"3": null
},
"2": {
"0": "webkitSpeechGrammarList interface should not exist",
"1": 0,
"2": null,
"3": null
},
"3": {
"0": "webkitSpeechRecognition interface should not exist",
"1": 0,
"2": null,
"3": null
},
"4": {
"0": "webkitSpeechRecognitionError interface should not exist",
"1": 0,
"2": null,
"3": null
},
"5": {
"0": "webkitSpeechRecognitionEvent interface should not exist",
"1": 0,
"2": null,
"3": null
}
}
}
},
"suggestion": null,
"duration": 0.25,
"path": "execute_async",
"method": "POST",
"statusCode": 0,
"timelineSeconds": 53.45300006866455
}

This has interfered with results collection because wptrunner expects to receive a Python list, but the structure that comes from Edge is deserialized into a dict. (We were seeing an error message of simply "1" because of an oversight in wptrunner's error handling code.) All of the logging data provided by Sauce Labs indicates that the same build of Edge and Edgedriver was used in both cases, and I couldn't find any relevant notes in the Sauce Labs product announcement blog or on Edgedriver's website. It turns out that this was actually caused by a change to WPT: web-platform-tests/wpt#13419. I've submitted a fix that will accommodate Edge's non-standard behavior and a separate patch with WebDriver specification tests. |
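To make the data problem concrete, here is a sketch of the kind of normalization that can recover the expected structure. The function is illustrative only (the fix that actually landed in WPT took a different form); it just shows how a cross-realm array serialized as a dict with numeric string keys maps back to a list.

```python
def _is_indexlike(value):
    """True for a dict whose keys are exactly "0".."len-1" (a serialized array)."""
    return (isinstance(value, dict) and value
            and all(k.isdigit() for k in value)
            and sorted(int(k) for k in value) == list(range(len(value))))

def normalize_crossrealm_value(value):
    """Recursively convert dicts keyed by "0", "1", ... (how Edgedriver
    serializes cross-realm Arrays) back into the lists wptrunner expects."""
    if isinstance(value, list):
        return [normalize_crossrealm_value(item) for item in value]
    if _is_indexlike(value):
        return [normalize_crossrealm_value(value[str(i)]) for i in range(len(value))]
    if isinstance(value, dict):
        return {k: normalize_crossrealm_value(v) for k, v in value.items()}
    return value
```

Applied to the 2018-10-17 excerpt above, this would turn the outer dict back into the ["/speech-api/historical.html", "complete", ...] list shape seen in the 2018-10-14 excerpt.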
We landed a short-term fix for this in WPT. Since that was merged today, the change will not influence collection until tomorrow. I'll leave this open until that can be verified (likely not until Thursday). |
Still no results on wpt.fyi; what is the status of the Edge run that started after the fix? |
Good news first: we successfully collected from Edge, the data appears to be valid, and the restart-enabled attempt took less than 24 hours. The reason you don't see this on the frontpage of https://wpt.fyi is that the corresponding collection for Chrome failed. If it's not one browser... I've manually triggered a retry, and filed gh-625 to track this latest instability. |
71fe0a6c3d was the daily commit, and Taskcluster got started for it, but it failed; I've filed web-platform-tests/wpt#13989. |
Edge results have always seemed to have a lot of TIMEOUT in the "Test file" row. Browsing around at random, it's easy to find cases for Edge 17 as well:
https://wpt.fyi/results/accelerometer/Accelerometer-enabled-by-feature-policy-attribute-redirect-on-load.https.html?sha=ad84337c03
https://wpt.fyi/results/compat/interfaces.any.html?sha=ad84337c03
https://wpt.fyi/results/wake-lock/wakelock-cancel-twice.https.html?sha=ad84337c03
@jugglinmike, as a start, could you count how many tests are like this, where only Edge has a harness-level timeout?
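A rough sketch of how that count could be produced from per-browser wptreport.json files; the report structure is WPT's, but the file paths and the browser key "edge" are assumptions for illustration.

```python
import json

def edge_only_harness_timeouts(report_paths):
    """Return tests whose harness-level status is TIMEOUT in Edge but not in
    any other browser. `report_paths` maps a browser name to the path of its
    wptreport.json file."""
    statuses = {}
    for browser, path in report_paths.items():
        with open(path) as f:
            for result in json.load(f)["results"]:
                statuses.setdefault(result["test"], {})[browser] = result["status"]
    return sorted(
        test for test, by_browser in statuses.items()
        if by_browser.get("edge") == "TIMEOUT"
        and not any(status == "TIMEOUT"
                    for browser, status in by_browser.items()
                    if browser != "edge"))

# Hypothetical usage:
# print(len(edge_only_harness_timeouts({
#     "edge": "edge-17.json", "chrome": "chrome.json",
#     "firefox": "firefox.json", "safari": "safari.json"})))
```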