Widespread timeouts in Edge 17 #563
Comments
Do we know if this is Sauce-specific or if it happens with native runs as well? |
@thejohnjansen have you been able to compare the results from Sauce with your own runs yet? |
I'm seeing this in https://wpt.fyi/results/old-tests/submission/Microsoft/selection?sha=eea0b54014&label=stable now. |
Seeing this today also, in https://wpt.fyi/results/webmessaging/with-options?sha=c0a1bc68d7&label=stable |
As a point of comparison, the same tests from the experimental Edge 18 run: The tests also run and have results when testing Edge 17 manually, so I'm quite sure this isn't a problem with Edge, but with the running infrastructure. |
https://wpt.fyi/results/?product=edge@085d47ef10&product=edge@548797a187&diff compares two runs that are close in their wpt revisions. It's a bit hard to navigate (web-platform-tests/wpt.fyi#411), but here's one directory with a lot of Edge 17 timeouts which are probably spurious: |
This issue became apparent again today in the "web-platform-tests that fail only in Chrome (from wpt.fyi data)" thread on blink-dev. |
This may be the most extreme bad run I've spotted so far: https://wpt.fyi/results/?sha=5acd3bcf66 shows 0 / N almost all the way down the line. @jugglinmike, perhaps the setup has become totally broken somehow? @lukebjerring, do you have an issue for "validate results"? It seems like, with that in place, we would reject a run like this, or at least label it as bad. |
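For what it's worth, here is a minimal sketch of the kind of "validate results" check that could flag a run like that one before it is published. The wptreport.json structure is WPT's own, but the function, the threshold, and the idea of applying this at upload time are illustrative assumptions, not an existing wpt.fyi feature.

```python
import json

def run_looks_broken(wptreport_path, max_bad_fraction=0.5):
    """Heuristic: a run where more than half of the test files end in a
    harness-level TIMEOUT or ERROR is more likely an infrastructure failure
    than genuine browser behaviour. The threshold is an illustrative guess."""
    with open(wptreport_path) as f:
        results = json.load(f)["results"]
    if not results:
        return True  # an empty report is certainly not a valid run
    bad = sum(1 for r in results if r["status"] in ("TIMEOUT", "ERROR"))
    return bad / len(results) > max_bad_fraction
```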
This appears to be the most complete of recent runs: … https://wpt.fyi/results/?sha=99e577f57c is another really bad one, and before that I can't find any that are this busted going back as far as 434ca47448. So this appears to be a recent regression. |
Well, the builds certainly aren't completing any faster. They're just slow and wrong now. We're seeing three different kinds of errors. The first two:
...are always followed by a timeout, e.g.
The third kind of error is far more common but far less helpful:
It is never followed by a timeout. As you can see, the logs are now formatted with the "GroupingFormatter" because that was recently made the default behavior of the WPT CLI. As mentioned there, this format is not ideal for continuous integration because it limits the amount of information available in situations like these. I have reinstated the previous default logging format for this project. This won't fix anything, but it should help us in future debugging attempts. I've updated WPT so that I'm notified of changes to WPT's … In the meantime, I have also updated to the latest release of the Sauce Connect binary on the off chance that "Fixed file descriptor leak." has some bearing here. |
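For reference, a minimal sketch of configuring an explicit mozlog formatter rather than relying on the CLI default; the mozlog classes are real, but whether this matches the exact wiring used by this project is an assumption.

```python
import sys

from mozlog import structuredlog
from mozlog.handlers import StreamHandler
from mozlog.formatters import TbplFormatter

# Emit one structured line per event instead of a grouped summary, so that
# CI logs keep per-test detail for debugging runs like the ones above.
logger = structuredlog.StructuredLogger("results-collection")
logger.add_handler(StreamHandler(sys.stdout, TbplFormatter()))

logger.suite_start([])
logger.test_start("/example/test.html")
logger.test_end("/example/test.html", status="TIMEOUT")
logger.suite_end()
```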
Thanks for investigating Mike! Do you think it'd be worth trying more aggressive restarting, and perhaps more chunks, to mitigate breakage like this? |
The fact that this immediately interferes with almost every "chunk" makes me think that restarting will not resolve it, but it couldn't hurt to try (especially now that we're disabling Safari on Sauce Labs). We'll see what effect this has on time-to-results (it may get ugly). (I was ready to point out that these are "ERROR"s and not "FAILURE"s, but then I remembered that doesn't actually matter.)
For Edge, we're currently running WPT in 100 "chunks" which already feels a bit excessive. I can give it a shot if we run out of ideas, though. |
Oh, if it's already 100 chunks, then scratch that idea. Seems like something is broken with Sauce or the sauce support in |
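For context on the 100-chunk configuration discussed above, here's a rough sketch of what the per-chunk invocations look like; --total-chunks and --this-chunk are wptrunner options, while the product string and the wrapper function are assumptions about this project's setup.

```python
import subprocess

TOTAL_CHUNKS = 100  # the current Edge configuration mentioned above

def run_chunk(this_chunk, product="sauce:MicrosoftEdge:17.17134"):
    """Run one slice of the suite; wptrunner splits the test manifest
    deterministically, so chunks 1..TOTAL_CHUNKS together cover every test."""
    cmd = [
        "./wpt", "run", product,
        "--total-chunks", str(TOTAL_CHUNKS),
        "--this-chunk", str(this_chunk),
    ]
    return subprocess.run(cmd).returncode
```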
We've completed a collection from Edge that involved restarting after every test failure. This increased build time from 15 hours to 24 hours, but it had no appreciable effect on the veracity of the results. Strangely, I haven't been able to connect to Sauce Labs locally. I'm not going to prioritize fixing that yet for a few reasons:
In the interim, I can do ad-hoc testing from any of the worker machines; that's probably more authentic, anyway. ...just not while they're actively testing. Although Edge is now running 24/7, 10 of our machines do not run Sauce Labs tests, so this is still a viable option. |
The problem comes from the way cross-realm objects are interpreted by Edge/Edgedriver. The easiest way to see this is by comparing the results of the WebDriver "execute script" command, which collects test results from the browser.

Excerpt from logs of a test conducted on 2018-10-14:

{
"screenshot": null,
"between_commands": 0.1099998950958252,
"suggestion_values": [],
"request": {
"args": [],
"script": "var callback = arguments[arguments.length - 1];\nwindow.testdriver_callback = callback;\nwindow.process_next_event();\n"
},
"HTTPStatus": 200,
"result": [
"/css/css-typed-om/the-stylepropertymap/properties/white-space.html",
"complete",
[
0,
null,
null,
[
[
"'white-space' property",
1,
"'CSSKeywordValue' is not defined",
"ReferenceError: 'CSSKeywordValue' is not defined\n at Global code (http://web-platform.test:8000/css/css-typed-om/the-stylepropertymap/properties/resources/testsuite.js:23:1)"
]
]
]
],
"suggestion": null,
"duration": 0.015000104904174805,
"path": "execute_async",
"method": "POST",
"statusCode": 0,
"timelineSeconds": 9.593000173568726
}

Excerpt from logs of a test conducted on 2018-10-17:

{
"screenshot": null,
"between_commands": 0.2969999313354492,
"suggestion_values": [],
"request": {
"args": [],
"script": "var callback = arguments[arguments.length - 1];\nwindow.opener.testdriver_callback = callback;\nwindow.opener.process_next_event();\n"
},
"HTTPStatus": 200,
"result": {
"0": "/speech-api/historical.html",
"1": "complete",
"2": {
"0": 0,
"1": null,
"2": null,
"3": {
"0": {
"0": "SpeechRecognitionError interface should not exist",
"1": 0,
"2": null,
"3": null
},
"1": {
"0": "webkitSpeechGrammar interface should not exist",
"1": 0,
"2": null,
"3": null
},
"2": {
"0": "webkitSpeechGrammarList interface should not exist",
"1": 0,
"2": null,
"3": null
},
"3": {
"0": "webkitSpeechRecognition interface should not exist",
"1": 0,
"2": null,
"3": null
},
"4": {
"0": "webkitSpeechRecognitionError interface should not exist",
"1": 0,
"2": null,
"3": null
},
"5": {
"0": "webkitSpeechRecognitionEvent interface should not exist",
"1": 0,
"2": null,
"3": null
}
}
}
},
"suggestion": null,
"duration": 0.25,
"path": "execute_async",
"method": "POST",
"statusCode": 0,
"timelineSeconds": 53.45300006866455
}

This has interfered with results collection because wptrunner expects to receive a Python list, but the structure that comes from Edge is deserialized into a dict. (We were seeing an error message of simply "1" because of an oversight in wptrunner's error handling code.) All of the logging data provided by Sauce Labs indicates that the same build of Edge and Edgedriver was used in both cases, and I couldn't find any relevant notes in the Sauce Labs product announcement blog or on Edgedriver's website. It turns out that this was actually caused by a change to WPT: web-platform-tests/wpt#13419. I've submitted a fix that will accommodate Edge's non-standard behavior and a separate patch with WebDriver specification tests. |
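To make the data problem concrete, here is a sketch of the kind of normalization that can recover the expected structure. The function is illustrative only (the fix that actually landed in WPT took a different form); it just shows how a cross-realm array serialized as a dict with numeric string keys maps back to a list.

```python
def _is_indexlike(value):
    """True for a dict whose keys are exactly "0".."len-1" (a serialized array)."""
    return (isinstance(value, dict) and value
            and all(k.isdigit() for k in value)
            and sorted(int(k) for k in value) == list(range(len(value))))

def normalize_crossrealm_value(value):
    """Recursively convert dicts keyed by "0", "1", ... (how Edgedriver
    serializes cross-realm Arrays) back into the lists wptrunner expects."""
    if isinstance(value, list):
        return [normalize_crossrealm_value(item) for item in value]
    if _is_indexlike(value):
        return [normalize_crossrealm_value(value[str(i)]) for i in range(len(value))]
    if isinstance(value, dict):
        return {k: normalize_crossrealm_value(v) for k, v in value.items()}
    return value
```

Applied to the 2018-10-17 excerpt above, this would turn the outer dict back into the ["/speech-api/historical.html", "complete", ...] list shape seen in the 2018-10-14 excerpt.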
We landed a short-term fix for this in WPT. Since that was merged today, the change will not influence collection until tomorrow. I'll leave this open until that can be verified (likely not until Thursday). |
Still no results on wpt.fyi; what is the status of the Edge run that started after the fix? |
Good news first: we successfully collected from Edge, the data appears to be valid, and the restart-enabled attempt took less than 24 hours. The reason you don't see this on the frontpage of https://wpt.fyi is that the corresponding collection for Chrome failed. If it's not one browser... I've manually triggered a retry, and filed gh-625 to track this latest instability. |
71fe0a6c3d was the daily commit, and Taskcluster got started for it, but it failed; I've filed web-platform-tests/wpt#13989. |
Edge results have always seemed to have a lot of TIMEOUT in the "Test file" row. Browsing around at random, it's easy to find cases for Edge 17 as well:
https://wpt.fyi/results/accelerometer/Accelerometer-enabled-by-feature-policy-attribute-redirect-on-load.https.html?sha=ad84337c03
https://wpt.fyi/results/compat/interfaces.any.html?sha=ad84337c03
https://wpt.fyi/results/wake-lock/wakelock-cancel-twice.https.html?sha=ad84337c03
@jugglinmike, as a start, could you count how many tests are like this, where only Edge has a harness-level timeout?
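A rough sketch of how that count could be produced from per-browser wptreport.json files; the report structure is WPT's, but the file paths and the browser key "edge" are assumptions for illustration.

```python
import json

def edge_only_harness_timeouts(report_paths):
    """Return tests whose harness-level status is TIMEOUT in Edge but not in
    any other browser. `report_paths` maps a browser name to the path of its
    wptreport.json file."""
    statuses = {}
    for browser, path in report_paths.items():
        with open(path) as f:
            for result in json.load(f)["results"]:
                statuses.setdefault(result["test"], {})[browser] = result["status"]
    return sorted(
        test for test, by_browser in statuses.items()
        if by_browser.get("edge") == "TIMEOUT"
        and not any(status == "TIMEOUT"
                    for browser, status in by_browser.items()
                    if browser != "edge"))

# Hypothetical usage:
# print(len(edge_only_harness_timeouts({
#     "edge": "edge-17.json", "chrome": "chrome.json",
#     "firefox": "firefox.json", "safari": "safari.json"})))
```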