This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Fleet unit uploaded, yet missing #1127

Closed
yaronr opened this issue Feb 11, 2015 · 13 comments · Fixed by #1134

@yaronr

yaronr commented Feb 11, 2015

Hi

CoreOS 584.0.0

fleetctl destroy wordpress-sidekick.service
fleetctl start wordpress-sidekick.service

or the full sequence: destroy, submit, load, start.

Expected result: the unit runs.
Actual result: the unit ends up in the failed state.

Logs from fleet:

Feb 11 08:17:16 ip-10-0-4-135.ec2.internal fleetd[613]: ERROR manager.go:147: Failed to trigger systemd unit wordpress-sidekick.service stop: Unit wordpress-sidekick.service not loaded.
Feb 11 08:17:16 ip-10-0-4-135.ec2.internal fleetd[613]: INFO manager.go:275: Removing systemd unit wordpress-sidekick.service
Feb 11 08:17:16 ip-10-0-4-135.ec2.internal fleetd[613]: INFO reconcile.go:321: AgentReconciler completed task: type=UnloadUnit job=wordpress-sidekick.service reason="unit loaded but not scheduled here"
Feb 11 08:18:26 ip-10-0-4-135.ec2.internal fleetd[613]: INFO manager.go:262: Writing systemd unit wordpress-sidekick.service (740b)
Feb 11 08:18:26 ip-10-0-4-135.ec2.internal fleetd[613]: INFO reconcile.go:321: AgentReconciler completed task: type=LoadUnit job=wordpress-sidekick.service reason="unit scheduled here but not loaded"
Feb 11 08:18:58 ip-10-0-4-135.ec2.internal fleetd[613]: ERROR manager.go:136: Failed to trigger systemd unit wordpress-sidekick.service start: Unit wordpress-sidekick.service failed to load: No such file or directory.
Feb 11 08:18:58 ip-10-0-4-135.ec2.internal fleetd[613]: INFO reconcile.go:321: AgentReconciler completed task: type=StartUnit job=wordpress-sidekick.service reason="unit currently loaded but desired state is launched"
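For anyone triaging similar reports, journal lines like the ones above can be scanned mechanically. The following is a minimal Python sketch, assuming the line layout shown in this report (syslog-style timestamp, host, `fleetd[pid]:`, level, `file.go:line:`, message). The regular expressions and the function name `failed_starts` are illustrative and are not part of fleet.

```python
import re

# Layout assumed from the log excerpt in this issue, e.g.
# "Feb 11 08:18:58 host fleetd[613]: ERROR manager.go:136: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+fleetd\[\d+\]:\s+"
    r"(?P<level>[A-Z]+)\s+(?P<src>[\w.]+:\d+):\s+(?P<msg>.*)$"
)

# Message wording copied from the ERROR line in this issue.
START_FAIL_RE = re.compile(
    r"Failed to trigger systemd unit (?P<unit>\S+) start: (?P<reason>.*)$"
)


def failed_starts(lines):
    """Return (timestamp, unit, reason) for every failed start request."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Only ERROR-level lines are of interest; skip anything unparseable.
        if not m or m.group("level") != "ERROR":
            continue
        fail = START_FAIL_RE.search(m.group("msg"))
        if fail:
            hits.append((m.group("ts"), fail.group("unit"), fail.group("reason")))
    return hits
```

Fed the excerpt above, this would report only the 08:18:58 start failure; the earlier stop error is ignored because it does not match the start pattern.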

unit file details:
Wants=etcd.service
After=etcd.service

BindsTo=wordpress.service
After=wordpress.service

Restart=always

(wordpress and etcd units are up and running)

@bcwaldon
Contributor

Likely related to #900

bcwaldon added the bug label Feb 11, 2015

@tom-pryor

I can't actually get any fleet units to run in my Vagrant environment.

I run:

fleetctl start syslog

Listing units:

core@core-01 ~ $ fleetctl list-units
UNIT            MACHINE                     ACTIVE      SUB
syslog.service  0a805687.../172.17.8.103    inactive    dead

Error in log:

core-03 fleetd[915]: ERROR manager.go:136: Failed to trigger systemd unit syslog.service start: Unit syslog.service failed to load: No such file or directory.

After SSHing into core-03, I can see that syslog.service is present in /run/fleet/units, but even starting it manually fails:

core@core-03 ~ $ sudo systemctl start syslog
Failed to start syslog.service: Unit syslog.service failed to load: No such file or directory.

@tom-pryor

Looking at commit 4c23412, if I add NeedDaemonReload=true then the unit runs fine. @jonboulle could you please clarify when a reload is necessary?

@robszumski
Member

I've seen this pop up several times on IRC in the past few days. I think we need to look at reverting this change.

@sylus

sylus commented Feb 20, 2015

Just confirming this as well. My fleet units worked in 522.6, but I just updated to 598.0 (to test SMB support) and they no longer work; they just fail with "failed to load: No such file or directory".

Looking through the issue queue, I see a few other issues that are probably symptoms of the same problem.

Is this the definitive issue that is tracking this regression?

@guruvan

guruvan commented Feb 20, 2015

@robszumski we'll try NeedDaemonReload=true and see how that works for now, but this is really a debilitating issue. I've already noted it on a couple of other issues.

bcwaldon added this to the v0.10.0 milestone Feb 21, 2015

@bdehamer

Seeing something similar to this issue on 557.2.0. I've got a few units that will fail to start about 50% of the time. I've got a script that demonstrates the issue pretty consistently:

https://gist.github.com/bdehamer/9c5303f7ef9d463a9134

This script loops, destroying, submitting, loading, and starting the units over and over (the Fleet API needs to be bound to TCP port 49153 for the script to work). Things usually run fine the first time through but fail on the second pass with a not-found error:

core@core-01 ~/dev $ fleetctl status DB.service
● DB.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead) since Mon 2015-02-23 16:29:25 UTC; 10s ago
 Main PID: 5605 (code=exited, status=0/SUCCESS)
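The linked gist is not reproduced here. As a rough stand-in, the cycle described above can be sketched in Python by shelling out to `fleetctl` (its `destroy`, `submit`, `load`, `start`, and `status` subcommands) instead of the HTTP API the gist uses. The helper names, the arguments, and the default pass count are assumptions made for illustration only.

```python
import re
import subprocess


def cycle_commands(unit_file, unit_name):
    """Build the fleetctl invocations for one destroy/submit/load/start pass."""
    return [
        ["fleetctl", "destroy", unit_name],
        ["fleetctl", "submit", unit_file],
        ["fleetctl", "load", unit_name],
        ["fleetctl", "start", unit_name],
    ]


def load_state(status_text):
    """Extract the word after 'Loaded:' from status output, or None if absent."""
    m = re.search(r"^\s*Loaded:\s+(\S+)", status_text, re.MULTILINE)
    return m.group(1) if m else None


def reproduce(unit_file, unit_name, passes=5):
    """Repeat the cycle; return the first pass that ends in not-found, else None."""
    for n in range(1, passes + 1):
        for cmd in cycle_commands(unit_file, unit_name):
            # check=False: keep going even if an individual step reports an error.
            subprocess.run(cmd, check=False)
        status = subprocess.run(
            ["fleetctl", "status", unit_name],
            capture_output=True, text=True, check=False,
        ).stdout
        if load_state(status) == "not-found":
            return n
    return None
```

Calling something like `reproduce("./DB.service", "DB.service")` on a cluster member would then give the pass number at which the unit first shows up as not-found, if it does at all.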

@bcwaldon
Contributor

I'm actively investigating this bug.

Quick note here: NeedDaemonReload is a property that systemd exposes over its D-Bus interface; it is not a property a user can set in a unit file.
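To illustrate the point above: the value can be read with `systemctl show`, which prints unit properties as `Name=value` lines, but it cannot be written from a unit file. Below is a minimal Python sketch, assuming `systemctl` is on the PATH of the host where the unit lives; the function names are made up for this example.

```python
import subprocess


def parse_show(output):
    """Turn `systemctl show` output (one Name=value per line) into a dict."""
    props = {}
    for line in output.splitlines():
        # Split on the first '=' only; values may themselves contain '='.
        name, sep, value = line.partition("=")
        if sep:
            props[name.strip()] = value.strip()
    return props


def needs_daemon_reload(unit):
    """Return (reload_recommended, load_state) as reported by systemd."""
    out = subprocess.run(
        ["systemctl", "show", "-p", "NeedDaemonReload", "-p", "LoadState", unit],
        capture_output=True, text=True, check=True,
    ).stdout
    props = parse_show(out)
    return props.get("NeedDaemonReload") == "yes", props.get("LoadState")
```

If the first value comes back true, `sudo systemctl daemon-reload` is the usual manual way to make systemd re-read its unit files. Whether that addresses the failure in this issue is not established here.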

@yaronr
Author

yaronr commented Feb 25, 2015

@bcwaldon any updates?
I'm seeing this all the time.
At least a workaround?

Thanks

@akaspin

akaspin commented Feb 25, 2015

Bump.

@rufman

rufman commented Feb 25, 2015

I would also be interested in a workaround. It seems like v0.8 doesn't have this issue (or it's not as pervasive).

@guruvan

guruvan commented Feb 25, 2015

@bcwaldon This is something my crew needs to fix in the next week - we've got production systems and this is a sleep-loser because of constant service outages. A workaround would be fine for the next couple weeks while y'all figure this out - "CrashWhenAdminAwake=true" would also be good. ;) - I'm a little wary of just writing a script to blanket-restart all these services as they get stopped by fleet.

@bcwaldon
Contributor

bcwaldon commented Mar 2, 2015

Please see #1134

bcwaldon modified the milestones: v0.9.1, v0.10.0 Apr 14, 2015

9 participants