"svc -d . ; svc -u ." doesn't restart a service if the service doesn't exit quickly #20

Closed
josb opened this issue Jul 15, 2014 · 5 comments

Comments


josb commented Jul 15, 2014

In some wrapper scripts I had to replace "svc -d .; svc -u ." with "svc -t ." since bringing a certain kind of service down and then up when the run script hasn't exited yet causes the up to be ignored.
The issue seems to be that supervise thinks it is in start mode (firstrun == 1)
and doesn't restart the service if it was killed by a signal or exited with a nonzero status.
Specifically, in supervise.c the following condition is triggered with:

firstrun == 1 and wait_crashed(wstat) == 15 (SIGTERM)

    if ((firstrun && (wait_crashed(wstat) || wait_exitcode(wstat) != 0))
        || (!wait_crashed(wstat) && wait_exitcode(wstat) == 100)) {
      svc->flagwantup = 0;
      svc->flagstatus = svstatus_failed;
    }
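
To make the condition easier to reason about, here is a minimal standalone sketch of it. The function name `stays_down` and its decoded arguments (terminating signal, exit code) are hypothetical stand-ins for `wait_crashed(wstat)` and `wait_exitcode(wstat)`; this is not code from supervise.c.

```c
#include <signal.h>  /* SIGTERM */

/* Hypothetical helper mirroring the condition quoted above.
   term_signal: signal that killed the process, or 0 for a normal exit.
   exit_code:   exit status for a normal exit.
   Returns 1 when the service would be marked failed and left down. */
int stays_down(int firstrun, int term_signal, int exit_code)
{
  int crashed = term_signal != 0;
  return (firstrun && (crashed || exit_code != 0))
      || (!crashed && exit_code == 100);
}
```

With `firstrun` set, `stays_down(1, SIGTERM, 0)` is true, which matches the reported state; with `firstrun` clear, the same SIGTERM exit would not trip the condition.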

Repro: create a service which doesn't exit right away when running 'svc -d .; svc -u .';
svstat will show 'failed' in the status message, and the service will be left down.

The underlying problem seems to be that firstrun is set when it should not be (it
should only be set if the start file exists). Maybe the following assignment
in the handling of 'u' in doit is wrong:

        firstrun = !svcmain.flagwantup;
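
The suspected sequence can be replayed with a small model. This is only a sketch under assumptions: the struct `model` and the `handle_*` functions are hypothetical, the 'd' and 'u' handlers are assumed to do nothing beyond updating the flags shown, and the exit handler simply applies the condition quoted earlier in this report.

```c
#include <signal.h>  /* SIGTERM */

/* Hypothetical, simplified model of the state discussed above. */
struct model {
  int flagwantup;  /* nonzero if the service should be running */
  int firstrun;    /* nonzero if supervise believes it is in start mode */
  int failed;      /* nonzero once the service is marked failed */
};

/* 'd': want the service down (the real code would also signal the child). */
void handle_down(struct model *m)
{
  m->flagwantup = 0;
}

/* 'u': the assignment quoted above, then want the service up (assumed). */
void handle_up(struct model *m)
{
  m->firstrun = !m->flagwantup;
  m->flagwantup = 1;
}

/* Child exit: applies the condition quoted earlier in this report. */
void handle_exit(struct model *m, int term_signal, int exit_code)
{
  int crashed = term_signal != 0;
  if ((m->firstrun && (crashed || exit_code != 0))
      || (!crashed && exit_code == 100)) {
    m->flagwantup = 0;
    m->failed = 1;
  }
}

/* Replays "svc -d .; svc -u ." against a service that exits slowly and is
   reaped as killed by SIGTERM only after 'u' arrived. Returns flagwantup. */
int replay_down_up_slow_exit(void)
{
  struct model m = { 1, 0, 0 };  /* service running normally */
  handle_down(&m);
  handle_up(&m);                 /* arrives before the child has exited */
  handle_exit(&m, SIGTERM, 0);
  return m.flagwantup;
}
```

In this model the 'u' sets firstrun because flagwantup was just cleared by the 'd', so the late SIGTERM exit marks the service failed and `replay_down_up_slow_exit()` returns 0.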
Owner

bruceg commented Sep 30, 2014

Did stock daemontools get this right?

Author

josb commented Oct 1, 2014

From what I can tell, yes.

This is with stock daemontools-0.76 and a service named 'int'. It seems to behave correctly:

[root@ln-jbackus int]# cd /service/
[root@ln-jbackus service]# ln -s /var/lib/service/int/
[root@ln-jbackus service]# svstat int/
int/: up (pid 14264) 1 seconds
[root@ln-jbackus service]# svstat int/
int/: up (pid 14264) 2 seconds
[root@ln-jbackus service]# cd int/
[root@ln-jbackus int]# cat run
#!/usr/bin/env ruby

Signal.trap('INT') do puts "exiting..."; sleep 10; exit end
puts "sleeping"
sleep 60
[root@ln-jbackus int]# svc -d .; svc -u .
[root@ln-jbackus int]# svstat .
.: up (pid 14479) 3 seconds
[root@ln-jbackus int]# svstat .
.: up (pid 14479) 5 seconds
[root@ln-jbackus int]# svstat .
.: up (pid 14479) 5 seconds
[root@ln-jbackus int]#

Owner

bruceg commented Oct 9, 2018

As far as I can tell, this was fixed by pull #56. I am having difficulty producing a repeatable test case for this though.

bruceg closed this as completed Oct 9, 2018
Owner

bruceg commented Oct 10, 2018

Actually, neither pull #56 nor pull #58 fully fixes this according to the documentation in supervise.8. I am working on a resolution. The problem is how to handle services for which the exit 100 happens after the svc -u is received, while still maintaining the rest of the restart semantics.

Owner

bruceg commented Oct 11, 2018

Ok, I have tried this a half dozen different ways, and I see no way to fully resolve this due to the fundamental race. At this point, svc -du will restart any service that does not exit 100; services that do exit 100 will stay stopped. However, svc -u on a service that previously exited 100 will start it. This is due to the (new) requirement introduced in the first version of daemontools-encore: It restarts ./run if ./start exits zero or ./run exits with any value other than 100.

This leads to a fundamental race. If the admin issues svc -du and the u is received before the service exits, the service is marked as needing to be brought up. However, it then exits 100 and is set to stay down. I see no good way to resolve this particular case.
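
The ordering dependence can be sketched as a tiny event model. The enum and the function `wanted_up_after` are hypothetical and only encode the behaviour described in this comment; they are not taken from supervise.c.

```c
/* Hypothetical events seen by the supervisor, in arrival order. */
enum race_event { EV_DOWN, EV_UP, EV_EXIT_100, EV_EXIT_OTHER };

/* Returns 1 if the service is wanted up after the events, 0 otherwise. */
int wanted_up_after(const enum race_event *events, int count)
{
  int wantup = 1;  /* the service starts out running */
  int i;
  for (i = 0; i < count; i++) {
    switch (events[i]) {
    case EV_DOWN:       wantup = 0; break;
    case EV_UP:         wantup = 1; break;
    case EV_EXIT_100:   wantup = 0; break;  /* exit 100: stay down */
    case EV_EXIT_OTHER: break;              /* restart follows wantup */
    }
  }
  return wantup;
}
```

Under this model the sequence down, up, exit 100 leaves the service down, while down, exit 100, up brings it up, and down, up, any other exit restarts it. The supervisor cannot tell the first two cases apart from the admin's point of view, since both result from the same svc -du.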

However, as far as I can tell, the problem is solved for all other cases.

bruceg closed this as completed Oct 11, 2018