Running UI clients killed after "Too many open files" #6310
Comments
Indicates that something is exhausting the maximum number of file descriptors. The modesets failing on amdgpu is also worrying, but shouldn't cause your applications to close. The exhaustion of FDs is probably what causes your apps to close. |
Okay, thanks - I'll look into that. However, I'm confused - these errors appear in the middle of that large block of output that is suddenly triggered out of the blue, and which involves disabling and re-enabling outputs and input devices, rearranging workspaces and other stuff. So all this stuff happens on purpose, for a reason, while the machine is just sitting there? That seems unexpected to me. |
So...
I'm not sure what's up with that enormous number for I found a rather old Xwayland issue that resulted in a similar error message. But I have no reason to suspect that this bug somehow still exists. |
Systemd bumped the default for this a while back: http://0pointer.net/blog/file-descriptor-limits.html |
Update: I tried changing the soft limit to 100k or so, but this did not make a difference and the same issue happened. Then I updated Sway on Saturday, so |
I am experiencing exactly the same problem pretty much since I started using sway sometime last year. It also happens with the sway Would it help to provide a debug log from my side as well? Or any other information which might be helpful? |
For me, it happens super randomly. Sometimes it is good for days and sometimes it happens quite quickly after a reboot. However, I only notice it after the screen was locked or attempted to lock.
Today it happened again, but it seemed like the auto lock was not properly triggered. Screens were blanked out (dpms off), but no lock screen was there. Pressing a key immediately turned on the screens and brought me back to the (broken) desktop: almost all windows had disappeared, alacritty terminals were still there but not responding to user input, and waybar was unresponsive. Every newly opened window from then on behaves as expected. Also, killing and restarting waybar kind of brings the system back to a working state.
Other observation: most windows disappeared, including ALL Xwayland clients. However, xdg_shell clients were also killed (firefox, thunderbird, nemo, etc). Some windows seem to survive, e.g. MellowPlayer, alacritty, BUT they do not respond to user input, nor do they update any visual state. At least MellowPlayer seems to still be active (playing music, controllable via remote).
It would be super nice to figure out/fix that problem. It is highly annoying. I am scared to lock my screen while working, but leaving everything on draws a lot more power. :( Fixing this would highly improve my sway experience... |
I got the same issue (at least I think so), but after removing the dpms-related lines from my swaylock exec, it does not happen. |
I'm leaving the issue open since others have the same problem now. However, unfortunately I can't contribute more at this point because I don't see the problem anymore. I see that my last comment is from 16 days ago, and my machine now has an uptime of 15 days and almost 20 hours... somehow this problem was apparently fixed for me with the update I installed on June 7th. Hope it doesn't come back. @emersion I'm not sure I agree with the subject change you applied. It was an interesting idea to focus on the message relating to open files, but as far as I can tell this did not turn out to be the actual cause of the problem. |
I still think this is the cause of the issue. We're probably leaking FDs somewhere at some point, or a client causes Sway to keep too many FDs opened. |
I raised my ulimits:
Since then, the problem did not occur anymore. However, I also think it is too early to say it is fixed. It sometimes happens very rarely.
|
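Whether a raised limit actually reached the running compositor can be checked from /proc. The following is a minimal sketch, assuming Linux; parse_nofile and nofile_limits are illustrative helper names, not part of Sway or any tool mentioned in this thread.

```shell
# Print the soft and hard "Max open files" limits from a
# /proc/<pid>/limits style table read on stdin.
# (Helper names are hypothetical, chosen for this example.)
parse_nofile() {
    # Columns: "Max" "open" "files" <soft> <hard> <units>
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }'
}

# Convenience wrapper: show the limits of a running process by PID.
nofile_limits() {
    parse_nofile < "/proc/$1/limits"
}
```

Calling nofile_limits "$(pidof sway)" from a terminal should show the limits the Sway process is really running with, which may differ from what ulimit reports in a freshly opened shell.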
If someone can still reproduce, running this command when Sway is out of FDs could help figuring out what kind of FDs cause the error.
But since killing clients apparently helps, not sure it's possible to get a useful trace. |
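One possible way to see what kinds of FDs a process holds is to group the symlink targets under /proc/<pid>/fd. This is only a sketch under the assumption of a Linux /proc filesystem, and not necessarily the command suggested above; fd_targets and summarize_fd_targets are made-up helper names.

```shell
# Print the target of every open FD of the given PID, one per line.
# (Hypothetical helper; FDs closed while the loop runs are skipped.)
fd_targets() {
    for fd in "/proc/$1/fd"/*; do
        readlink "$fd"
    done
}

# Read FD targets on stdin and print a count per kind, largest first.
# Inode numbers of sockets and pipes are stripped so they group together,
# and all memfd entries are folded into a single "memfd" bucket.
summarize_fd_targets() {
    sed -E -e 's/^(socket|pipe):\[[0-9]+\]$/\1/' -e 's|^/memfd:.*|memfd|' \
        | sort | uniq -c | sort -rn
}
```

For example, fd_targets "$(pidof sway)" | summarize_fd_targets would list the most common FD kinds first, which may hint at whether sockets, DRM device nodes, or shared-memory files dominate.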
It seems like the clients are being killed or kill themselves. At least as soon as I am back in control of the computer, there are almost no more applications running. :( |
Yes, Sway kills clients if it runs out of FDs. |
Out of curiosity I just had a look in |
These are probably used by Mesa, nothing to worry about. |
Maybe we can add an extra debug log before / after killing to track what was killed and who had all those FDs open? |
Yeah. Not that simple because libwayland does the killing internally. |
I have had literally the same as #6310 (comment) for a couple of months, except I have no swaylock
For now I raised ulimits too, and I'll keep watching it. UPD: 1 month without issues! |
It's been stable for me for about a month or so. |
I've been experiencing similar seemingly-random client terminations over the last week
I did see the file descriptor errors in my logs, but these have disappeared since I added this to both my /etc/systemd/system.conf and /etc/systemd/user.conf
However, I'm still seeing terminations, I just suppose they weren't directly caused by file descriptor limits, and I'm still seeing the same errors as the original report
Or similar errors:
Interestingly enough, I don't get this on my gaming PC (AMD GPU) but do get this a few times a day on my work laptop (Dell Precision 5550, Intel GPU) but this could be related to the different workloads |
I am on intel as well if that matters. Hybrid laptop with Nvidia, but that's probably unrelated. |
Please try the patch from this response and report back: |
Weird, despite having bumped my file descriptor limits per #6310 (comment), I did just now have this happen again. I had Firefox, Zoom, Discord, Slack open (on my Intel machine), was in a Zoom meeting, and suddenly everything except Firefox disappeared.
sway.log from this session: https://gist.github.com/jokeyrhyme/d086e97c084c0ba658b0235a82fbbbc0 |
@jokeyrhyme you may need also tune your /etc/security/limits.conf |
@vvrein I did previously, per #6310 (comment); the numbers are double the systemd defaults. Note (per that message) that I still get similar symptoms even where there are no log messages indicating the file descriptor limit is being reached. I'll double the numbers again and see if that helps 🤷 |
Started running the following for a while:
max=0
while true; do
    ls -l /proc/$(pidof sway)/fd > /tmp/sway-fds.txt
    cur=$(wc -l /tmp/sway-fds.txt | awk '{print $1}')
    if [ "$cur" -gt "$max" ]; then
        max=$cur
        date
        echo "Max FDs opened increased to $max"
        cp /tmp/sway-fds.txt /tmp/sway-fds-max.txt
    fi
    sleep 1
done
<snip>
Mon Sep 13 03:47:59 AM CDT 2021
Max FDs opened increased to 287
Tue Sep 14 02:03:42 AM CDT 2021
Max FDs opened increased to 724
List of file descriptors and Sway debug log (lines truncated to 120 chars): https://gist.github.com/lae/850e82a0a9354c0d31795b0307fcaa99
Sudden increase there, and I guess everything either crashed or became unusable immediately afterwards. Does this help? I can try some other things if needed, if there's anything I can do soon. (I don't think raising ulimits is a long term solution, so hopefully we can resolve this.) |
It seems at least to be a long-term workaround. I haven't had any hiccups for about 3 months now. I do agree, however, that the root cause should be fixed (if that has not already happened). |
As a recent data-point, I have wlroots 0.14.1-2 (archlinux) and I believe this just happened to me this morning: running alacritty, Slack, Firefox, jumped into a Zoom for a few minutes, Firefox disappears a few minutes later. Definitely not complaining, but I was curious about the patch notes for wlroots and (at first) thought there was a chance that work might have an impact here: https://github.com/swaywm/wlroots/releases/tag/0.14.1 |
Could this be the same as #6642? If so, a fix has been merged, maybe try latest master? |
Sway Version: swaymsg -t get_version says "sway version 4.19.1 (2021-02-01)", sway -v says "1.6-85291411 (Apr 23 2021, branch 'master')"
Debug Log: gist link
Configuration File: gist link
Description:
This has happened to me several times - there is a pattern here, though I'm not sure exactly what it is.
Gdk-Message: 19:29:31.588: Error reading events from display: Connection reset by peer
and "Exiting due to channel error."
Since my debug log ran for several hours, I stripped it down and annotated it - I basically just cut out the large part in the middle where I was working without trouble for a few hours.
I'm not sure what to try at this point, suggestions are welcome. I started seeing this problem a while ago, but recently it has happened every evening. I'll make a test tonight by leaving the monitors on, in case that makes a difference...