-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: make systemd tests parallel-safe (*) #24048
Conversation
LGTM |
test/system/250-systemd.bats
Outdated
mv -Z "container-$cname.service" $UNIT_FILE.tmp.$$ && \ | ||
mv -Z $UNIT_FILE.tmp.$$ $UNIT_FILE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused, why do we move to a tmp file just to move to the real location directly after? I do not see th epoint of doing this vs one single mv command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultraparanoia. mv
is not atomic across filesystems, so this is theoretically possible:
- mv starts writing foo.service
- another parallel job runs systemctl daemon-reload
- systemd sees (incomplete) foo.service and does whatever systemd does with it and bad things happen
- mv completes
Even with all this paranoia, #24010 still triggers (on my laptop), so I'm pretty sure that the fragment-file flake has nothing to do with corrupt systemd service files ... but I just want to eliminate every last possibility of test errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok got it but then please comment the reason here because it is not obvious otherwise.
test/system/250-systemd.bats
Outdated
mv -Z "container-$cname.service" $TEMPLATE_FILE.tmp.$$ && \ | ||
mv -Z $TEMPLATE_FILE.tmp.$$ $TEMPLATE_FILE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
Mostly just switch to safename. Rewrite setup() to guarantee unique service file names, atomically created. * IMPORTANT NOTE: enabling parallelization on these tests triggers containers#24010 ("fragment file" flake), but only on my f40 laptop. I have never seen the flake in Cirrus despite many many runs in containers#23275. I am submitting this for review and merging because even though _something_ is broken, this breakage is unlikely to affect our CI. Signed-off-by: Ed Santiago <[email protected]>
01f5053
to
faf4604
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: edsantiago, Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
f9f72f5
into
containers:main
Mostly just switch to safename. Rewrite setup() to guarantee
unique service file names, atomically created.
triggers unable to create pod cgroup: slice was already loaded or has a fragment file #24010 ("fragment file" flake), but only on my
f40 laptop. I have never seen the flake in Cirrus despite
many many runs in WIP: system test parallelization: two-pass approach #23275. I am submitting this for review
and merging because even though something is broken,
this breakage is unlikely to affect our CI.
Signed-off-by: Ed Santiago [email protected]