Skip to content

Add Steal Time support in ARM #5139

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 28, 2025
Merged

Conversation

DakshinD
Copy link
Contributor

@DakshinD DakshinD commented Apr 4, 2025

Opening this PR as a draft as we continue working on this issue to add steal time support for ARM. Please see below for the changes we have implemented so far.

Changes

  • Added PVTime struct in src/vmm/src/arch/aarch64/pvtime/mod.rs that allocates and stores per-vCPU stolen_time regions

    • It offers a new constructor to create the device (allocating system memory and storing regions per vcpu), and register_vcpu which attempts to use kvm_set_device_attr to register the corresponding IPA for a vcpu
  • Created a function setup_pv_time in src/vmm/src/builder.rs which is invoked in build_microvm_for_boot

    • Checks if PVTime is supported using kvm_has_device_attr
    • Creates a PVTime device
    • Iterates over vcpu fd's to register their IPAs for steal_time struct
  • Started using debug/info logs while booting microvm for testing

Current problems:

Future Progress

  • We hope to continue making progress on this PR by:
    • Getting feedback on our current design of the PVTime device and setup_pv_time
    • Support snapshotting
    • Any other things we need to address

Reason

Attempting to resolve issue #4866

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@DakshinD DakshinD marked this pull request as draft April 4, 2025 07:00
@Manciukic Manciukic self-assigned this Apr 4, 2025
@Manciukic Manciukic added the Status: WIP Indicates that an issue is currently being worked on or triaged label Apr 4, 2025
Copy link
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try making the proposed change to pass the right address and see if it works.
I haven't left any comment regarding style or design to focus on functionality first. Once we get something working we can see how to make it "look better".

Copy link
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused by the KVM API as well! Passing a user pointer to the IPA to kvm is ok, but it needs to be 64B aligned.

The hint was in the KVM documentation:

-EINVAL Base address not 64 byte aligned

https://www.kernel.org/doc/html/v5.8/virt/kvm/devices/vcpu.html#attribute-kvm-arm-vcpu-pvtime-ipa

@Manciukic Manciukic linked an issue Apr 4, 2025 that may be closed by this pull request
3 tasks
@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 4, 2025

Thank you so much for your fast response! Totally missed the docs for 64 byte alignment, but that ended up fixing the bugs and I can successfully boot a microVM manually. If you think its a good time now, we can look to flesh out the implementation and "make it better".

Other than that, we believe the next step is adding unit/integration tests to make sure our device is working properly, as well as implementing persistence across snapshots. Any leads on how to do both of these things would be really helpful.

The lead we have been looking at for snapshotting is the Persist interface.

  • We store the MicrovmState in save_snapshot, so we need to first store our PVTime device somewhere in the vmm
  • Then we implement Persist for PVTime, and store the PVTimeState we get from save() somewhere in the MicrovmState (could we extend MicrovmState to include this? or add this state to one of the other states already existing inside of it?)
  • on booting from snapshot, we need to setup the pvtime device again, using kvm_set_device_addr again because the vcpu fds change.
  • Finally, would need to write comprehensive tests to make sure this works

@Manciukic
Copy link
Contributor

Thank you so much for your fast response! Totally missed the docs for 64 byte alignment, but that ended up fixing the bugs and I can successfully boot a microVM manually. If you think its a good time now, we can look to flesh out the implementation and "make it better".

Well done! I'll take a look.

The lead we have been looking at for snapshotting is the Persist interface.

* We store the `MicrovmState` in `save_snapshot`, so we need to first store our PVTime device somewhere in the vmm

* Then we implement `Persist` for PVTime, and store the `PVTimeState` we get from `save()` somewhere in the `MicrovmState` (could we extend MicrovmState to include this? or add this state to one of the other states already existing inside of it?)

I think extending it is ok, I didn't find any other place where we could fit it.

* on booting from snapshot, we need to setup the pvtime device again, using `kvm_set_device_addr` again because the vcpu fds change.

Correct. Remember to use the resource allocator with exact policy to ensure the system memory gets reserved in our internal allocator.

* Finally, would need to write comprehensive tests to make sure this works

I see the kernel prints a message "using stolen time PV" in the console when it boots, so we could probably check for that.

A more in-depth test to actually check that it's working would be to start 2 cpu hogging workloads on a 2vcpu guest and pin both vcpus to the same pcpu. That should achieve 50% steal time, and increase the relative counter in /proc/stat. We could assert the value actually increased.

For the snapshot, we could check that the same (non-zero) value is present after resume and that it continues to increase if we hog it again.

@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 8, 2025

Hi @Manciukic, I was just about to comment :), thanks for your response. We've started a bare bones implementation of snapshot persistence in the previous commit. So if you also have the time to take a look at it, that would be awesome.

We are now trying to change the existing tests to include PVTime. Any input on our design choices we made would be appreciated again, specifically:

  • Using either ONLY the base_addr or the hashmap in PVTime struct (we believe only the base_addr is necessary if the order of the vCPUs does not change across snapshots)
  • Our restoration of the device inside build_microvm_from_snapshot
  • Choosing to make the device an Option<PVTime inside the struct Vmm, and Option<PVTimeState> inside struct MicrovmState

Copy link
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I'd just move most of the pv_time specific logic into the new module to avoid cluttering the builder.rs even more.

We also need to bump the snapshot version and add a changelog entry, other than the integration tests.

@Manciukic
Copy link
Contributor

Manciukic commented Apr 8, 2025

I tried to run the branch through our CI and there are a couple of issues: https://buildkite.com/firecracker/firecracker-pr/builds/13103#

  • x86 doesn't compile, for example:
E   error[E0433]: failed to resolve: could not find `aarch64` in `arch`
E     --> src/vmm/src/builder.rs:22:18
E      |
E   22 | use crate::arch::aarch64::pvtime::{PVTime, PVTimeConstructorArgs, PVTimeError};
E      |                  ^^^^^^^ could not find `aarch64` in `arch`
E      |
E   note: found an item that was configured out
  • arm has a few failures
integration_tests/build/test_clippy.py::test_rust_clippy[aarch64-unknown-linux-gnu] FAILED                                                               [ 12%]
integration_tests/build/test_clippy.py::test_rust_clippy[aarch64-unknown-linux-musl] FAILED                                                              [ 25%]
integration_tests/build/test_coverage.py::test_coverage FAILED                                                                                           [ 37%]
integration_tests/build/test_dependencies.py::test_unused_dependencies PASSED                                                                            [ 50%]
integration_tests/build/test_gdb.py::test_gdb_compiles PASSED                                                                                            [ 62%]
integration_tests/build/test_seccomp_no_redundant_rules.py::test_redundant_seccomp_rules PASSED                                                          [ 75%]
integration_tests/build/test_unittests.py::test_unittests FAILED                                                                                         [ 87%]
integration_tests/build/test_unittests.py::test_benchmarks_compile PASSED                                                                                [100%]
  • unit tests need to be updated with the new field
  • clippy is unhappy

But the good news is functional and performance tests passed 🎉

To run these tests locally you can run:

./tools/devtool test -y -- integration_tests/build

@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 9, 2025

Managed to fix the errors reported in the build as well as clippy errors, would be awesome if it could be re-run.

We started looking at unit/integration tests, as per your suggestions, and were hoping to get a small lead as to going about it.

  • For the unit tests in src/vmm/src/builder.rs as well as test_microvm_state_snapshot, we couldn't find a way to get the Vec<Vcpus> in context to setup the PVTime device. Is it necessary to mimic these existing tests for the pvtime device?

Copy link

codecov bot commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 84.72222% with 11 lines in your changes missing coverage. Please review.

Project coverage is 83.12%. Comparing base (ae078ee) to head (1e4640d).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/vmm/src/builder.rs 71.05% 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5139      +/-   ##
==========================================
+ Coverage   83.06%   83.12%   +0.06%     
==========================================
  Files         250      250              
  Lines       26897    26946      +49     
==========================================
+ Hits        22342    22399      +57     
+ Misses       4555     4547       -8     
Flag Coverage Δ
5.10-c5n.metal 83.58% <0.00%> (+0.02%) ⬆️
5.10-m5n.metal 83.58% <0.00%> (+0.01%) ⬆️
5.10-m6a.metal 82.81% <0.00%> (+0.02%) ⬆️
5.10-m6g.metal 79.42% <84.72%> (+0.07%) ⬆️
5.10-m6i.metal 83.57% <0.00%> (+0.01%) ⬆️
5.10-m7a.metal-48xl 82.79% <0.00%> (+0.01%) ⬆️
5.10-m7g.metal 79.42% <84.72%> (+0.07%) ⬆️
5.10-m7i.metal-24xl 83.54% <0.00%> (+0.01%) ⬆️
5.10-m7i.metal-48xl 83.54% <0.00%> (+0.01%) ⬆️
5.10-m8g.metal-24xl 79.42% <84.72%> (+0.07%) ⬆️
5.10-m8g.metal-48xl 79.42% <84.72%> (+0.07%) ⬆️
6.1-c5n.metal 83.63% <0.00%> (+0.01%) ⬆️
6.1-m5n.metal 83.63% <0.00%> (+0.02%) ⬆️
6.1-m6a.metal 82.85% <0.00%> (+0.02%) ⬆️
6.1-m6g.metal 79.42% <84.72%> (+0.07%) ⬆️
6.1-m6i.metal 83.62% <0.00%> (+0.02%) ⬆️
6.1-m7a.metal-48xl 82.84% <0.00%> (+0.02%) ⬆️
6.1-m7g.metal 79.42% <84.72%> (+0.07%) ⬆️
6.1-m7i.metal-24xl 83.63% <0.00%> (+0.01%) ⬆️
6.1-m7i.metal-48xl 83.65% <0.00%> (+0.02%) ⬆️
6.1-m8g.metal-24xl 79.42% <84.72%> (+0.07%) ⬆️
6.1-m8g.metal-48xl 79.42% <84.72%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Manciukic
Copy link
Contributor

Manciukic commented Apr 10, 2025

There are still a few style and compilation errors. You can run the tests from your development machine as:

./tools/devtool -y test  -- integration_tests/build/
./tools/devtool checkstyle

snippet:

error[E0412]: cannot find type `PVTimeError` in this scope
  --> src/vmm/src/builder.rs:88:18
   |
88 |     CreatePVTime(PVTimeError),
   |                  ^^^^^^^^^^^ not found in this scope
error[E0412]: cannot find type `PVTimeError` in this scope
   --> src/vmm/src/builder.rs:424:27
    |
424 |     RestorePVTime(#[from] PVTimeError),
    |                           ^^^^^^^^^^^ not found in this scope
error[E0412]: cannot find type `PVTimeError` in this scope
   --> src/vmm/src/builder.rs:424:27
    |
424 |     RestorePVTime(#[from] PVTimeError),
    |                           ^^^^^^^^^^^ not found in this scope
    |
help: you might be missing a type parameter
    |
424 |     RestorePVTime(#[from<PVTimeError>] PVTimeError),
    |                         +++++++++++++
warning: unused import: `log::warn`
  --> src/vmm/src/builder.rs:14:5
   |
14 | use log::warn;
   |     ^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default
For more information about this error, try `rustc --explain E0412`.
warning: `vmm` (lib) generated 1 warning
error: could not compile `vmm` (lib) due to 3 previous errors; 1 warning emitted
Returned error code: 101

https://buildkite.com/firecracker/firecracker-pr/builds/13116#_

@Manciukic
Copy link
Contributor

Manciukic commented Apr 10, 2025

For the unit tests in src/vmm/src/builder.rs as well as test_microvm_state_snapshot, we couldn't find a way to get the Vec<Vcpus> in context to setup the PVTime device. Is it necessary to mimic these existing tests for the pvtime device?

It's difficult to unit test code that mostly performs syscalls (ioctls on vcpu fd in this case), so the policy we adopted to not over-complicate the codebase with mocks is to test these features through end-to-end integration tests only.
I think it would be enough to have unit tests inside PVTime to ensure the logic (excluding the ioctls) is correct.
The unit tests in builder.rs are just to verify the creation of the emulated device, and most of the logic there is untested.

@DakshinD
Copy link
Contributor Author

Thanks for the clarification on that. We've added some very basic unit tests to PVTime and also 2 integration tests for basic checking and snapshotting. If there's any changes/next steps we need to take, please let us know.

A re-run of the test suite would also be very helpful so we can nail down any bugs, as it was a bit buggy locally.

@Manciukic
Copy link
Contributor

Thanks for the clarification on that. We've added some very basic unit tests to PVTime and also 2 integration tests for basic checking and snapshotting. If there's any changes/next steps we need to take, please let us know.

Can you also add a basic test that checks that the guest has enabled steal PV by looking at the kernel logs?

uvm.ssh.run("dmesg | grep 'stolen time PV'")

Make sure that only runs on aarch64.

A re-run of the test suite would also be very helpful so we can nail down any bugs, as it was a bit buggy locally.

Done. x86 has the same issues as above: https://buildkite.com/firecracker/firecracker-pr/builds/13138#0196241a-77dd-4fe4-9023-e053f8004c2d

@DakshinD
Copy link
Contributor Author

Added that extra test and should've completely fixed the x86 errors, should be good for test suite re-run.

@DakshinD DakshinD marked this pull request as ready for review April 15, 2025 16:01
@DakshinD DakshinD changed the title Draft PR: Add Steal Time support in ARM Add Steal Time support in ARM Apr 15, 2025
@DakshinD
Copy link
Contributor Author

Thanks for the revisions @Manciukic. I've gone ahead and:

  • changed vcpu_index to a u64 as per your suggestion and I think it cleans up the code nicely. It just assumes that vcpus.len() will not exceed 256 which I believe is a valid assumption?
  • Fixed the integration tests to check for increase only, and also cleaned it up with helper function.

Please let me know if there is anything else on that end, and looking towards a merge, what would need to be done such as:

  • changing snapshot version?
  • changelog
  • squashing my commits into ~3 substantial ones for clarity?

@Manciukic
Copy link
Contributor

Thanks for the revisions @Manciukic. I've gone ahead and:

* changed `vcpu_index` to a u64 as per your suggestion and I think it cleans up the code nicely. It just assumes that `vcpus.len()` will not exceed 256 which I believe is a valid assumption?

It's the assumption we already make in other parts of the code. I think it's fine.

* Fixed the integration tests to check for increase only, and also cleaned it up with helper function.

Thanks!

Please let me know if there is anything else on that end, and looking towards a merge, what would need to be done such as:

* changing snapshot version?

* changelog

* squashing my commits into ~3 substantial ones for clarity?

Yup, all three are required for merging. I can't think of anything else at the moment, other than keeping the branch up-to-date with main.

I will take a deeper look next week after the Easter holidays.

@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 22, 2025

Hi @Manciukic, Just updated the snapshot version by minor version, added changelog, and attempted to rebase the commits but git freaked out so hopefully I did it right. Also apologies it seemed to auto-request reviews after I force pushed.

Copy link
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly lgtm except a few minor things. Another small nit: please clean up the commit messages a little (make sure to mention what, why, and how).

update: It also looks like the rebase on top of main didn't go as expected. Make sure to git pull --rebase upstream main and resolve the conflicts.

@Manciukic Manciukic added Status: Awaiting review Indicates that a pull request is ready to be reviewed and removed Status: WIP Indicates that an issue is currently being worked on or triaged labels Apr 24, 2025
Copy link
Contributor

@ShadowCurse ShadowCurse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks reasonable. Couple minor fixes and it should be good to go.

@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 24, 2025

Thanks for the PR review earlier.

I have gone ahead and:

  • Addressed @roypat design changes. I have gone ahead and got rid of PVTime device, moved the logic into KvmVcpu, and just left some setup logic inside builder.rs. The only nit I have is that I don't like leaving the STEALTIME_STRUCT_MEM_SIZE const out in the open.
  • I have cleaned up the commit messages, as well as fixed the small nit on the integration test as @Manciukic said.
  • I have separated the commits into rust and integration test commits, as well as a separate one for the Changelog as @ShadowCurse said. Although, I wasn't able to move setup_pvtime into create_vmm_and_vcpus as mentioned.

I've double checked that style, clippy, steal time tests pass, etc. Any feedback on this new design would be welcome, as well as a run of the test-suite if possible. I believe (hopefully) that it should build fine on x86 as well.

Thank you!

Copy link
Contributor

@roypat roypat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for doing that change! Everything looks a lot simpler now :D

Just a small comment about the restore code, but then this looks good to go from me too!

roypat
roypat previously approved these changes Apr 25, 2025
@Manciukic
Copy link
Contributor

One more issue on x86 :(

error[E0432]: unresolved import `crate::arch::VcpuArchError`
  --> src/vmm/src/builder.rs:23:39
   |
23 | use crate::arch::{ConfigurationError, VcpuArchError, configure_system_for_boot, load_kernel};
   |                                       ^^^^^^^^^^^^^
   |                                       |
   |                                       no `VcpuArchError` in `arch`
   |                                       help: a similar name exists in the module: `KvmArchError`
warning: unused import: `vm_memory::GuestAddress`
  --> src/vmm/src/builder.rs:17:5
   |
17 | use vm_memory::GuestAddress;
   |     ^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 25, 2025

@Manciukic Just pushed that fix for x86, sorry about that. Gonna try getting the cross-compilation working locally so I can test that soon.

Edit: Apologies, the unused import seems to show up as an error, let me fix that :(

Edit 2: Should be good now

Adds functionality for pvtime, which displays steal time
to guest on ARM machines. PVTime is persisted across
snapshots as well (snapshot ver updated).

- Added ipa per vCPU for mem region storing steal time info.
- Persists this ipa per vCPU across snapshots.
- Shared steal time mem region is setup on boot and restore
  from snapshot in builder.rs.

Signed-off-by: Dakshin Devanand <dakshind2005@gmail.com>
Added integration tests checking:
- steal time increase
- steal time persistence across snapshots
- pvtime existence on ARM

Motivated by addition of PVTime functionality
for ARM.

Signed-off-by: Dakshin Devanand <dakshind2005@gmail.com>
Add a changelog entry to inform about addition
of PVTime (steal time) functionality on ARM.

Signed-off-by: Dakshin Devanand <dakshind2005@gmail.com>
@roypat
Copy link
Contributor

roypat commented Apr 25, 2025

@Manciukic Just pushed that fix for x86, sorry about that. Gonna try getting the cross-compilation working locally so I can test that soon.

You should be able to fairly easily validate warnings / compiler errors locally with just cargo check --target x86_64-unknown-linux-gnu --all --tests, even without having a full cross-compilation toolchain installed (you'll only need to install the x86 target using rustup)

@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 25, 2025

Thanks for pointing that out. It was telling me the x86_64-unknown-linux-gnu target may not be installed, so I would do rustup target add x86_64-unknown-linux-gnu and it would tell me everything was fine. But I had to run it with sudo for it to actually download the needed things :( Oversight on my part

@roypat roypat merged commit e2024c9 into firecracker-microvm:main Apr 28, 2025
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Steal Time support in ARM
4 participants