Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: compute jailer cpu time without underflow #5034

Merged
merged 5 commits into from
Feb 13, 2025

Conversation

roypat
Copy link
Contributor

@roypat roypat commented Feb 10, 2025

Oh, here we go again.

The jailer wishes to compute the CPU time it spents before exec-ing into
Firecracker, so that it can be passed to Firecracker's parent-cpu-us
flag and emitted as a metric.

The CLOCK_PROCESS_CPUTIME_ID is reset on every fork() call, so we need
to stitch together the different values from before the first fork, and
then between each consecutive fork. However, we also need to account for
the scenarios that we do not fork() at all (no daemonization).

Now, this all goes slightly wrong because we want to exclude the time
spent the call to Env::run() from the reported time (note that inside
the jailer, this is only the time spent parsing arguments, but if a user
ended up exec()-ing into the jailer, it would also include the time from
prior to calling exec(), which makes sense to exclude). So we grab the
value of CLOCK_PROCESS_CPUTIME_ID right after parsing arguments, and
subtract it at the end. Sadly, in the case of daemonization, we subtract
it not from the total amount of time spent in the jailer, but rather
from the time spent since the final fork(). This works out if the time
spent since the last fork was greater than the time spent parsing
arguments, but if not, we have an integer underflow on our hands. Now,
this is only a problem in debug builds, and in release builds, we
underflow, and then _over_flow again during the next addition of the
underflowed value to jailer_cpu_time_us, and Firecracker even seens a
sane value for parent_time_cpu_us. But in debug builds, we panic.

Fix this by breaking the subtraction and addition into two separate
statements, to avoid relying on the double over-/underflow

Closes #5033

Fixes: 68ab56e
Signed-off-by: Patrick Roy [email protected]

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

Replaces a manual impl that was already missing some fields because they
were added to the struct after the impl was written.

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Feb 10, 2025
The jailer wishes to compute the CPU time it spents before exec-ing into
Firecracker, so that it can be passed to Firecracker's `parent-cpu-us`
flag and emitted as a metric.

The CLOCK_PROCESS_CPUTIME_ID is reset on every fork() call, so we need
to stitch together the different values from before the first fork, and
then between each consecutive fork. However, we also need to account for
the scenarios that we do not fork() at all (no daemonization).

Now, this all goes slightly wrong because we want to exclude the time
spent the call to Env::run() from the reported time (note that inside
the jailer, this is only the time spent parsing arguments, but if a user
ended up exec()-ing into the jailer, it would also include the time from
prior to calling exec(), which makes sense to exclude). So we grab the
value of CLOCK_PROCESS_CPUTIME_ID right after parsing arguments, and
subtract it at the end. Sadly, in the case of daemonization, we subtract
it not from the total amount of time spent in the jailer, but rather
from the time spent since the final fork(). This works out if the time
spent since the last fork was greater than the time spent parsing
arguments, but if not, we have an integer underflow on our hands. Now,
this is only a problem in debug builds, and in release builds, we
underflow, and then _over_flow again during the next addition of the
underflowed value to jailer_cpu_time_us, and Firecracker even seens a
sane value for parent_time_cpu_us. But in debug builds, we panic.

Fix this by breaking the subtraction and addition into two separate
statements, to avoid relying on the double over-/underflow

Closes firecracker-microvm#5033

Fixes: 68ab56e
Signed-off-by: Patrick Roy <[email protected]>
Copy link

codecov bot commented Feb 10, 2025

Codecov Report

Attention: Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.

Project coverage is 83.21%. Comparing base (3ca2fab) to head (c9b5b16).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/jailer/src/env.rs 0.00% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5034      +/-   ##
==========================================
+ Coverage   83.13%   83.21%   +0.07%     
==========================================
  Files         245      245              
  Lines       26651    26626      -25     
==========================================
  Hits        22156    22156              
+ Misses       4495     4470      -25     
Flag Coverage Δ
5.10-c5n.metal 83.69% <0.00%> (+0.08%) ⬆️
5.10-m5n.metal 83.68% <0.00%> (+0.09%) ⬆️
5.10-m6a.metal 82.88% <0.00%> (+0.08%) ⬆️
5.10-m6g.metal 79.64% <0.00%> (+0.08%) ⬆️
5.10-m6i.metal 83.67% <0.00%> (+0.08%) ⬆️
5.10-m7g.metal 79.64% <0.00%> (+0.08%) ⬆️
6.1-c5n.metal 83.69% <0.00%> (+0.08%) ⬆️
6.1-m5n.metal 83.67% <0.00%> (+0.07%) ⬆️
6.1-m6a.metal 82.89% <0.00%> (+0.08%) ⬆️
6.1-m6g.metal 79.64% <0.00%> (+0.09%) ⬆️
6.1-m6i.metal 83.67% <0.00%> (+0.08%) ⬆️
6.1-m7g.metal 79.64% <0.00%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/jailer/src/env.rs Outdated Show resolved Hide resolved
In `report_cpu_start_time()`, Firecracker computers the CPU time it took
until the API socket was available as

CLOCK_PROCESS_CPUTIME_ID - cpu_start_time + jailer_time

However, here `cpu_start_time` is always zero currently (it's the value
the jailer passes via `--cpu-start-time`), and in the case where the
jailer exec()s into Firecracker (without prior fork() or clone()), this
is wrong: CLOCK_PROCESS_CPUTIME_ID will now be the CPU time since the
_jailer_ process started, and thus adding the jailer_time to it is
nonsense. This all works out if we set cpu_start_time to be the CPU time
right before we exec() into Firecracker: If we did not fork(), then it
will cause the calculation above to correctly discard all time before
exec()-ing into Firecracker (before re-adding specifically the time
spent in the jailer). If we _did_ fork(), then it will be approximately
zero, and thus preserve the current semantics (as the jailer time will
only be the time _until_ we did the last fork(), and
CLOCK_PROCESS_CPUTIME_ID will only be the absolute CPU time _since_ the
last fork()).

Signed-off-by: Patrick Roy <[email protected]>
The components that go into the calculation are
1. Exclude thing before calling Env::new() (including potential time
   spent in a parent of the jailer itself that ended up exec()-ing
2. The time up until a potential fork() for daemonization
3. The time spent in each temporary process during triple-fork()ing

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat force-pushed the fix-jailer-underflow branch from 5d2da19 to 9895cc9 Compare February 10, 2025 15:59
@zulinx86 zulinx86 self-requested a review February 11, 2025 09:29
src/jailer/src/env.rs Outdated Show resolved Hide resolved
@roypat roypat merged commit 0975cd4 into firecracker-microvm:main Feb 13, 2025
5 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Fail to launch daemonize jailer v1.10.1
3 participants