Needs restarting boot time #579

Open
RomanSoloweow opened this issue Feb 24, 2025 · 9 comments

@RomanSoloweow

Hello, we are using Rocky Linux versions 9.2 and 9.4 with the latest available version of the ‘needs-restarting’ plugin — 4.3.0. Recently, we encountered an issue with ‘needs-restarting’.

If a virtual machine is hosted on VMware vSphere and you revert it to a snapshot, the boot time derived from /proc/1 and the one reported in /proc/stat can differ completely, which causes the ‘needs-restarting’ plugin to report incorrect results.

image

@RomanSoloweow
Author

cc @kontura @ppisar

@ppisar
Contributor

ppisar commented Feb 24, 2025

That's worrisome. Could you explain how "VMware vSphere revert to snapshots" is implemented? Your screenshot hints that the kernel is newly booted and a previously frozen userspace is then thawed on top of it.

@ppisar
Contributor

ppisar commented Feb 24, 2025

By the way, RHEL 9 is going to get another method of obtaining a boot time from systemd https://issues.redhat.com/browse/RHEL-14900. You can check a build for CentOS Stream https://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/Packages/. Maybe it resolves the problem for you.

@RomanSoloweow
Author

I need both methods, reboot-hint and services

@RomanSoloweow
Author

I'm not sure, but I think the machine is copied in a "frozen state" and then the copy replaces the main version which is then "thawed."

@evan-goode
Member

Using KVM/libvirt instead of VMware vSphere, I did notice a change in the output of grep ^btime /proc/stat on a RHEL 9.5 VM running kernel 5.14.0-503.16.1.el9_5.x86_64 after resuming from a snapshot.
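For anyone who wants to compare the value before and after a snapshot restore, here is a minimal Python sketch equivalent to the grep above. The function names are illustrative only and are not taken from the plugin.

```python
def parse_btime(stat_text: str) -> int:
    """Return the boot time (seconds since the epoch) from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The line of interest looks like: "btime 1740384000"
        if len(fields) >= 2 and fields[0] == "btime":
            return int(fields[1])
    raise ValueError("no btime line found")


def read_btime(path: str = "/proc/stat") -> int:
    """Read and parse btime from the given file (defaults to /proc/stat)."""
    with open(path, encoding="ascii") as f:
        return parse_btime(f.read())
```

Calling read_btime() once before taking the snapshot and once after resuming it should show whether the kernel-reported boot time moved on a given hypervisor.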

The new preferred method for detecting boot time used by needs-restarting in RHEL 9.6+ is based on systemd. Specifically, it uses the UnitsLoadStartTimestamp property on /org/freedesktop/systemd1. Resuming my 9.5 VM from a snapshot had no effect on the UnitsLoadStartTimestamp, so if vSphere works similarly enough to libvirt/qemu/KVM, this bug should already be fixed in the next RHEL release.
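To check this property by hand on your own VM, something like the following sketch should work. It shells out to busctl rather than using D-Bus bindings, which is an assumption made here for brevity and not necessarily how the plugin queries systemd; the helper names are hypothetical. The property is a 64-bit unsigned value in microseconds since the epoch.

```python
import subprocess

# busctl prints a type signature followed by the value, e.g. "t 1740384000123456".
BUSCTL_CMD = [
    "busctl", "get-property",
    "org.freedesktop.systemd1",
    "/org/freedesktop/systemd1",
    "org.freedesktop.systemd1.Manager",
    "UnitsLoadStartTimestamp",
]


def parse_busctl_uint64(output: str) -> int:
    """Parse busctl output for a uint64 property ("t <value>")."""
    type_sig, value = output.split()
    if type_sig != "t":
        raise ValueError(f"unexpected type signature: {type_sig!r}")
    return int(value)


def units_load_start_seconds() -> float:
    """Return UnitsLoadStartTimestamp as seconds since the epoch."""
    result = subprocess.run(
        BUSCTL_CMD, check=True, capture_output=True, text=True
    )
    return parse_busctl_uint64(result.stdout) / 1_000_000
```

Running units_load_start_seconds() before and after a snapshot restore would show whether the timestamp stays stable on vSphere as it did in my libvirt test.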

Strangely, on a CentOS Stream 9 VM running kernel 5.14.0-570.el9.x86_64, I did not see a difference in grep ^btime /proc/stat after resuming a snapshot. So maybe some other change also fixed this bug?

@evan-goode
Member

Hmm, except that for plain needs-restarting (no arguments) we still measure process start times relative to the btime in /proc/stat. Too bad it's unreliable. /proc/uptime also seems to be incorrect after resuming a snapshot. If we do not have a reliable way to get the kernel boot time (from which process start times are measured), there's not much we can do here...

@RomanSoloweow
Author

Yes, I think the reliable method depends on the user's configuration, and there is no universal one. I suggest making the source of the boot time configurable: /proc/1, /proc/stat, or systemd. This could be an argument that selects the source, or an argument that passes the boot time directly to the command. This way, everyone can choose the method that works in their specific configuration.

@evan-goode
Member

The bigger issue is that we can't get correct process start times. Currently we are using column 22 (1-indexed) of /proc/pid/stat to get process start times, which only gives us the uptime of the kernel when the process was started. If the uptime stops counting (e.g. the VM is paused or is restored from an earlier snapshot), then it's not useful. To illustrate this, imagine the VM gets paused for one day. Before and after the pause, the uptime is exactly the same. So just knowing the uptime when a process was started doesn't tell you the wall clock time when it started, since it could have been started on either side of the pause, and you don't know when the pause happened.
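A rough sketch of that calculation, to make the limitation concrete. This is not the plugin's code; the function names are made up for illustration, and the clock tick rate is passed in explicitly (on Linux it can be obtained with os.sysconf("SC_CLK_TCK")).

```python
def parse_starttime_ticks(stat_line: str) -> int:
    """Return field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split only the part after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field 22 sits at index 22 - 3 = 19.
    return int(rest[19])


def start_wall_clock(stat_line: str, btime: int, ticks_per_second: int) -> float:
    """Estimate the wall-clock start time as btime + uptime at process start."""
    return btime + parse_starttime_ticks(stat_line) / ticks_per_second
```

The result is only as good as btime and the uptime counter. If the VM was paused for a day, a process started just before the pause and one started just after it get nearly identical values here, even though they started a day apart in wall-clock terms.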

Like needs-restarting, ps -o start also reads /proc/pid/stat[22] and gives incorrect readings when pausing/resuming VMs.

Reading the mtime or ctime of /proc/pid (stat -c %Z /proc/pid) seems to give us the absolute timestamps we want, and these are unaffected by pausing/resuming the VM, but the last time Petr and I checked, the behavior of these is not well-defined: #536.
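For experimenting with this, the stat -c %Z call can be reproduced in Python roughly as follows. This is only a sketch with hypothetical helper names, and it inherits the same caveat: the timestamps on /proc/pid are not guaranteed to mean "process start time".

```python
import os


def dir_ctime(path: str) -> float:
    """Return the ctime (status change time) of a path, like `stat -c %Z`."""
    return os.stat(path).st_ctime


def proc_dir_ctime(pid: int) -> float:
    """Return the ctime of /proc/<pid> as seconds since the epoch."""
    return dir_ctime(os.path.join("/proc", str(pid)))
```

Comparing proc_dir_ctime(pid) with the value computed from /proc/pid/stat for the same process, before and after a pause, is one way to see where the two diverge.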
