fix: compute jailer cpu time without underflow #5034

roypat · 2025-02-10T14:44:59Z

Oh, here we go again.

The jailer wishes to compute the CPU time it spents before exec-ing into
Firecracker, so that it can be passed to Firecracker's parent-cpu-us
flag and emitted as a metric.

The CLOCK_PROCESS_CPUTIME_ID is reset on every fork() call, so we need
to stitch together the different values from before the first fork, and
then between each consecutive fork. However, we also need to account for
the scenarios that we do not fork() at all (no daemonization).

Now, this all goes slightly wrong because we want to exclude the time
spent the call to Env::run() from the reported time (note that inside
the jailer, this is only the time spent parsing arguments, but if a user
ended up exec()-ing into the jailer, it would also include the time from
prior to calling exec(), which makes sense to exclude). So we grab the
value of CLOCK_PROCESS_CPUTIME_ID right after parsing arguments, and
subtract it at the end. Sadly, in the case of daemonization, we subtract
it not from the total amount of time spent in the jailer, but rather
from the time spent since the final fork(). This works out if the time
spent since the last fork was greater than the time spent parsing
arguments, but if not, we have an integer underflow on our hands. Now,
this is only a problem in debug builds, and in release builds, we
underflow, and then _over_flow again during the next addition of the
underflowed value to jailer_cpu_time_us, and Firecracker even seens a
sane value for parent_time_cpu_us. But in debug builds, we panic.

Fix this by breaking the subtraction and addition into two separate
statements, to avoid relying on the double over-/underflow

Closes #5033

Fixes: 68ab56e
Signed-off-by: Patrick Roy [email protected]

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

I have read and understand CONTRIBUTING.md.
I have run tools/devtool checkstyle to verify that the PR passes the
automated style checks.
I have described what is done in these changes, why they are needed, and
how they are solving the problem in a clear and encompassing way.
I have updated any relevant documentation (both in code and in the docs)
in the PR.
I have mentioned all user-facing changes in CHANGELOG.md.
If a specific issue led to this PR, this PR closes the issue.
When making API changes, I have followed the
Runbook for Firecracker API changes.
I have tested all new and changed functionalities in unit tests and/or
integration tests.
I have linked an issue to every new TODO.

This functionality cannot be added in rust-vmm.

Replaces a manual impl that was already missing some fields because they were added to the struct after the impl was written. Signed-off-by: Patrick Roy <[email protected]>

The jailer wishes to compute the CPU time it spents before exec-ing into Firecracker, so that it can be passed to Firecracker's `parent-cpu-us` flag and emitted as a metric. The CLOCK_PROCESS_CPUTIME_ID is reset on every fork() call, so we need to stitch together the different values from before the first fork, and then between each consecutive fork. However, we also need to account for the scenarios that we do not fork() at all (no daemonization). Now, this all goes slightly wrong because we want to exclude the time spent the call to Env::run() from the reported time (note that inside the jailer, this is only the time spent parsing arguments, but if a user ended up exec()-ing into the jailer, it would also include the time from prior to calling exec(), which makes sense to exclude). So we grab the value of CLOCK_PROCESS_CPUTIME_ID right after parsing arguments, and subtract it at the end. Sadly, in the case of daemonization, we subtract it not from the total amount of time spent in the jailer, but rather from the time spent since the final fork(). This works out if the time spent since the last fork was greater than the time spent parsing arguments, but if not, we have an integer underflow on our hands. Now, this is only a problem in debug builds, and in release builds, we underflow, and then _over_flow again during the next addition of the underflowed value to jailer_cpu_time_us, and Firecracker even seens a sane value for parent_time_cpu_us. But in debug builds, we panic. Fix this by breaking the subtraction and addition into two separate statements, to avoid relying on the double over-/underflow Closes firecracker-microvm#5033 Fixes: 68ab56e Signed-off-by: Patrick Roy <[email protected]>

codecov · 2025-02-10T14:49:51Z

Codecov Report

Attention: Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.

Project coverage is 83.21%. Comparing base (3ca2fab) to head (c9b5b16).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
src/jailer/src/env.rs	0.00%	7 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5034      +/-   ##
==========================================
+ Coverage   83.13%   83.21%   +0.07%     
==========================================
  Files         245      245              
  Lines       26651    26626      -25     
==========================================
  Hits        22156    22156              
+ Misses       4495     4470      -25

Flag	Coverage Δ
5.10-c5n.metal	`83.69% <0.00%> (+0.08%)`	⬆️
5.10-m5n.metal	`83.68% <0.00%> (+0.09%)`	⬆️
5.10-m6a.metal	`82.88% <0.00%> (+0.08%)`	⬆️
5.10-m6g.metal	`79.64% <0.00%> (+0.08%)`	⬆️
5.10-m6i.metal	`83.67% <0.00%> (+0.08%)`	⬆️
5.10-m7g.metal	`79.64% <0.00%> (+0.08%)`	⬆️
6.1-c5n.metal	`83.69% <0.00%> (+0.08%)`	⬆️
6.1-m5n.metal	`83.67% <0.00%> (+0.07%)`	⬆️
6.1-m6a.metal	`82.89% <0.00%> (+0.08%)`	⬆️
6.1-m6g.metal	`79.64% <0.00%> (+0.09%)`	⬆️
6.1-m6i.metal	`83.67% <0.00%> (+0.08%)`	⬆️
6.1-m7g.metal	`79.64% <0.00%> (+0.09%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/jailer/src/env.rs

In `report_cpu_start_time()`, Firecracker computers the CPU time it took until the API socket was available as CLOCK_PROCESS_CPUTIME_ID - cpu_start_time + jailer_time However, here `cpu_start_time` is always zero currently (it's the value the jailer passes via `--cpu-start-time`), and in the case where the jailer exec()s into Firecracker (without prior fork() or clone()), this is wrong: CLOCK_PROCESS_CPUTIME_ID will now be the CPU time since the _jailer_ process started, and thus adding the jailer_time to it is nonsense. This all works out if we set cpu_start_time to be the CPU time right before we exec() into Firecracker: If we did not fork(), then it will cause the calculation above to correctly discard all time before exec()-ing into Firecracker (before re-adding specifically the time spent in the jailer). If we _did_ fork(), then it will be approximately zero, and thus preserve the current semantics (as the jailer time will only be the time _until_ we did the last fork(), and CLOCK_PROCESS_CPUTIME_ID will only be the absolute CPU time _since_ the last fork()). Signed-off-by: Patrick Roy <[email protected]>

The components that go into the calculation are 1. Exclude thing before calling Env::new() (including potential time spent in a parent of the jailer itself that ended up exec()-ing 2. The time up until a potential fork() for daemonization 3. The time spent in each temporary process during triple-fork()ing Signed-off-by: Patrick Roy <[email protected]>

src/jailer/src/env.rs

refactor: derive Debug for jailer::env::Env

344aa3a

Replaces a manual impl that was already missing some fields because they were added to the struct after the impl was written. Signed-off-by: Patrick Roy <[email protected]>

roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Feb 10, 2025

roypat force-pushed the fix-jailer-underflow branch from d09494c to a1d7820 Compare February 10, 2025 14:49

roypat requested review from xmarcalx, kalyazin, pb8o and Manciukic as code owners February 10, 2025 14:49

Manciukic reviewed Feb 10, 2025

View reviewed changes

src/jailer/src/env.rs Outdated Show resolved Hide resolved

roypat added 2 commits February 10, 2025 15:59

roypat force-pushed the fix-jailer-underflow branch from 5d2da19 to 9895cc9 Compare February 10, 2025 15:59

zulinx86 self-requested a review February 11, 2025 09:29

Merge branch 'main' into fix-jailer-underflow

c9b5b16

zulinx86 approved these changes Feb 11, 2025

View reviewed changes

src/jailer/src/env.rs Outdated Show resolved Hide resolved

Manciukic reviewed Feb 11, 2025

View reviewed changes

src/jailer/src/env.rs Show resolved Hide resolved

Manciukic approved these changes Feb 13, 2025

View reviewed changes

roypat merged commit 0975cd4 into firecracker-microvm:main Feb 13, 2025
5 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: compute jailer cpu time without underflow #5034

fix: compute jailer cpu time without underflow #5034

roypat commented Feb 10, 2025

codecov bot commented Feb 10, 2025 •

edited

Loading

fix: compute jailer cpu time without underflow #5034

fix: compute jailer cpu time without underflow #5034

Conversation

roypat commented Feb 10, 2025

License Acceptance

PR Checklist

codecov bot commented Feb 10, 2025 • edited Loading

Codecov Report

codecov bot commented Feb 10, 2025 •

edited

Loading