Errors in removeIdleSockets #239

Open
bridger opened this issue Feb 3, 2018 · 8 comments

Comments

@bridger

bridger commented Feb 3, 2018

In my server log I see these errors:

[IncomingSocketManager.swift:252 removeIdleSockets(removeAll:)] epoll_ctl failure. Error code=1. Reason=Operation not permitted

It seems that removeIdleSockets is getting an EPERM error from epoll_ctl when cleaning up idle connections.

I'm running my server using this Docker container. It is deployed on Amazon ECS.

My server is mostly a websocket server. There is a memory leak I'm trying to investigate and I wonder if this might be related.

@djones6
Contributor

djones6 commented Feb 5, 2018

I haven't seen this issue myself, though I am currently investigating a potential threading issue around removeIdleSockets (issue #237) which could be related.

There are two threads on Linux which perform epoll_wait, with connections distributed between them. These threads should be the only ones invoking epoll methods on their respective FDs. However, when a new connection is received, we call removeIdleSockets (at most once every 5 seconds) to clear any stale ones. This is performed on a different thread, and I wonder if this EPERM error is related to two threads trying to invoke functions on the same epoll FD concurrently.
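
To illustrate the theory above, here is a minimal sketch of one way to keep every epoll call on the thread that owns the FD: other threads only record which sockets they want removed, and the owning thread performs the removal between its epoll_wait calls. This is not Kitura-net's actual code. `PendingRemovals`, `processPendingRemovals` and the `remove` closure are hypothetical names used only for illustration, and the closure stands in for whatever call really deregisters the socket.

```swift
import Foundation

/// Hypothetical helper: collects file descriptors that another thread wants
/// removed, so that only the thread owning the epoll FD ever modifies it.
final class PendingRemovals {
    private let lock = NSLock()
    private var fds: [Int32] = []

    /// May be called from any thread, e.g. the one accepting new connections.
    func request(_ fd: Int32) {
        lock.lock()
        fds.append(fd)
        lock.unlock()
    }

    /// Intended to be called only by the owning poller thread.
    /// Returns the queued descriptors in request order and empties the queue.
    func drain() -> [Int32] {
        lock.lock()
        defer { lock.unlock() }
        let batch = fds
        fds.removeAll()
        return batch
    }
}

/// Runs on the poller thread between waits. `remove` is a stand-in
/// (assumption) for the real deregistration call and returns false on failure.
/// Returns the descriptors whose removal failed.
func processPendingRemovals(_ pending: PendingRemovals,
                            remove: (Int32) -> Bool) -> [Int32] {
    return pending.drain().filter { !remove($0) }
}
```

With this shape, the thread that notices idle sockets would call `request(_:)` instead of touching the epoll FD itself, and each poller thread would call `processPendingRemovals` once per loop iteration.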

@bridger
Author

bridger commented Feb 11, 2018

That theory makes sense to me!

I just got another crash that seems related. All I got from the logs is this:

Fatal error: Trying to remove task, but it's not in the registry.: file Foundation/URLSession/TaskRegistry.swift, line 76

This has only happened once, so it is pretty rare. I don't see anything unusual in the logs beforehand.

@mikezander

@bridger I'm also getting this issue. Have you found a solution?

@ianpartridge
Collaborator

@mikezander have you moved to Swift 5 recently? We've had a few reports of this and it looks like it's a bug in URLSession on Linux. There is a prototype fix here that we are hoping to get into Swift 5.0.1: swiftlang/swift-corelibs-foundation#2061

@mikezander

@ianpartridge No, I actually haven't updated to Swift 5 yet. I'm still running Swift 4 on Kitura version 2.3.0. I was thinking I should update to 2.5.0. Could that possibly fix the issue?

@mikezander

mikezander commented Apr 4, 2019

Hmm, I can't replicate it, but based on that bug it looks like the issue is Swift-related.

@ianpartridge
Collaborator

Interesting. All the reports we have had so far are on Swift 5. The problem is definitely in Foundation, not Kitura, so I'm afraid upgrading Kitura is unlikely to help (although we would recommend you do that anyway, as there have been piles of improvements since version 2.3!).

Out of interest, are you running on Swift 4.0, 4.1 or 4.2? We are discussing how long to continue to support earlier versions of Swift, and user feedback would be very helpful.

As for your immediate problem, the only option I can suggest is to avoid using URLSession on Linux :( How are you using URLSession? Directly from your Kitura app or via a library like https://github.com/IBM-Swift/SwiftyRequest ? You might consider trying https://ibm-swift.github.io/Kitura-net/Classes/ClientRequest.html instead which uses libcurl directly instead of URLSession.

@gurugeek

gurugeek commented Dec 9, 2019

Just reporting the same issue:
[2019-12-09T02:31:10.976+01:00] [ERROR] [IncomingSocketManager.swift:295 removeIdleSockets(removeAll:runNow:)] epoll_ctl failure. Error code=1. Reason=Operation not permitted

Swift version 5.1 (swift-5.1.2-RELEASE)
Target: x86_64-unknown-linux-gnu

Kitura 2.8.0

This makes it totally unusable, as a lot of requests fail even with just 100 requests at a concurrency of 10, which is not exactly high load:

ab -n 100 -c 10 https://.../index
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking press.toys (be patient).....done

Server Software: Apache/2.4.41
Server Hostname:
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-CHACHA20-POLY1305,2048,256
Server Temp Key: ECDH X25519 253 bits
TLS Server Name:

Document Path: /index
Document Length: 14023 bytes

Concurrency Level: 10
Time taken for tests: 5.114 seconds
Complete requests: 100
Failed requests: 32
(Connect: 0, Receive: 0, Length: 32, Exceptions: 0)
Non-2xx responses: 3
Total transferred: 1385997 bytes
HTML transferred: 1371341 bytes
Requests per second: 19.55 [#/sec] (mean)
Time per request: 511.410 [ms] (mean)
Time per request: 51.141 [ms] (mean, across all concurrent requests)
Transfer rate: 264.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       61   92   16.7     90    156
Processing:    40  192  504.7     80   3030
Waiting:       39  185  505.7     69   3030
Total:        110  285  502.7    173   3106

Percentage of the requests served within a certain time (ms)
50% 173
66% 188
75% 223
80% 255
90% 339
95% 389
98% 3105
99% 3106
100% 3106 (longest request)

:(
