Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create sandbox for host file system access #783

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

n0toose
Copy link
Member

@n0toose n0toose commented Nov 5, 2024

  • Add --mount parameter for "whitelisting" guest_paths and defining their respective filesystem paths on the host FS
  • Add UhyveFileMap structure, unit and integration tests
  • Implement sandbox for open() and unlink() syscall

A few points that could be further worked are unit tests, handling more of the parsing using the clap library directly and performance optimizations.


Fixes #767

@n0toose

This comment was marked as outdated.

Copy link

codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 85.00000% with 36 lines in your changes missing coverage. Please review.

Project coverage is 70.27%. Comparing base (d4bc7da) to head (c150abc).

Files with missing lines Patch % Lines
src/hypercall.rs 61.11% 21 Missing ⚠️
src/isolation.rs 96.91% 5 Missing ⚠️
src/linux/x86_64/kvm_cpu.rs 54.54% 5 Missing ⚠️
src/bin/uhyve.rs 0.00% 3 Missing ⚠️
src/vm.rs 77.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #783      +/-   ##
==========================================
+ Coverage   68.95%   70.27%   +1.32%     
==========================================
  Files          20       21       +1     
  Lines        2757     2981     +224     
==========================================
+ Hits         1901     2095     +194     
- Misses        856      886      +30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

src/vm.rs Outdated Show resolved Hide resolved
@n0toose n0toose changed the title feat(sandbox): Add UhyveFileMap structure and sandbox Create sandbox for host file system access Nov 5, 2024
src/hypercall.rs Outdated Show resolved Hide resolved
src/hypercall.rs Outdated Show resolved Hide resolved
src/hypercall.rs Outdated Show resolved Hide resolved
src/isolation.rs Outdated Show resolved Hide resolved
src/isolation.rs Outdated Show resolved Hide resolved
src/isolation.rs Outdated Show resolved Hide resolved
src/params.rs Outdated Show resolved Hide resolved
src/vm.rs Outdated Show resolved Hide resolved
@n0toose

This comment was marked as outdated.

@n0toose n0toose force-pushed the sandbox-uhyvefilemap branch 2 times, most recently from d14cb43 to 0b243c1 Compare November 6, 2024 12:24
src/isolation.rs Outdated Show resolved Hide resolved
@n0toose

This comment was marked as outdated.

@n0toose n0toose force-pushed the sandbox-uhyvefilemap branch 3 times, most recently from cf4b61a to 6a11855 Compare November 11, 2024 16:30
src/bin/uhyve.rs Outdated Show resolved Hide resolved
@n0toose

This comment was marked as outdated.

@n0toose

This comment was marked as outdated.

@n0toose

This comment was marked as outdated.

@n0toose n0toose force-pushed the sandbox-uhyvefilemap branch 2 times, most recently from 677c06e to d9d750c Compare November 23, 2024 23:48
src/isolation.rs Outdated Show resolved Hide resolved
@@ -150,9 +152,11 @@ pub struct UhyveVm<VirtBackend: VirtualizationBackend> {
pub virtio_device: Arc<Mutex<VirtioNetPciDevice>>,
#[allow(dead_code)] // gdb is not supported on macos
pub(super) gdb_port: Option<u16>,
pub(crate) mount: Mutex<UhyveFileMap>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above: mount is a maybe misleading name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to investigate whether it's possible to add multiple different types of mounts, such as from an .img file like in QEMU, together with the current UhyveFileMap approach. Additionally, we did discuss that we wanted to have a "similar interface" to Docker's, including by using the -v parameter. It's not 100% "accurate", but I think it's close enough and leaves room for more features in the future.

Is this surely not the best option, compared to, say, mapping?

@jounathaen
Copy link
Member

A flaw of my current approach - which is currently not affecting us, as Uhyve is not "generally used for directories" and processes files under /root/ so far - is that each file and directory has to be mapped by the user.

I devised a method that recursively "searches" the parent directories until it finds a mapped parent directory, and then adds all directories and the file itself in the map: https://github.com/n0toose/uhyve/tree/sandbox-uhyvefilemap-directories

I don't fully understand, what you mean by "adds all directories and the file itself in the map"? I think the map should not be altered once created (besides when new files/directories are created from the kernel). I see a lot of issues with dead entries in the map if this route is taken.

This method is not ready yet. This process takes place during an open() and is costly - a cost that might be saved if we add preexisting directories and files during UhyveFileMap::new. Benchmarks are necessary to demonstrate the costliness and justify this method.

I am not sure if this is really that costly. How deep are directories typically nested? 10 entries deep? 10 string splits and lookups in a HashMap should not create a lot of overhead.

@n0toose
Copy link
Member Author

n0toose commented Nov 25, 2024

I don't fully understand, what you mean by "adds all directories and the file itself in the map"? I think the map should not be altered once created (besides when new files/directories are created from the kernel). I see a lot of issues with dead entries in the map if this route is taken.

Hm, I see. We could just check for the parent directory every time a file is opened then - the modifications should be minor. Do you think that we should check for child directories / files during the creation of the UhyveFileMap?

@n0toose
Copy link
Member Author

n0toose commented Nov 25, 2024

this rebase will be fun :D

adding you as a co-author to relevant commits

src/hypercall.rs Outdated Show resolved Hide resolved
src/hypercall.rs Outdated Show resolved Hide resolved
@jounathaen
Copy link
Member

I don't fully understand, what you mean by "adds all directories and the file itself in the map"? I think the map should not be altered once created (besides when new files/directories are created from the kernel). I see a lot of issues with dead entries in the map if this route is taken.

Hm, I see. We could just check for the parent directory every time a file is opened then - the modifications should be minor. Do you think that we should check for child directories / files during the creation of the UhyveFileMap?

No, why should we? A recursive lookup will be just fine. Also, this allows to reflect file/directory changes outside uhyve in the kernel during runtime.

@n0toose
Copy link
Member Author

n0toose commented Nov 25, 2024

No, why should we? A recursive lookup will be just fine. Also, this allows to reflect file/directory changes outside uhyve in the kernel during runtime.

Well, you have a point (plus, this does make my job much easier, so even less of a reason to object, and it can be revisited later :D). I'll integrate it once the initial changes get an OK and once I'm done with rebasing and fixing up a few additional details that I've noticed.

@n0toose
Copy link
Member Author

n0toose commented Nov 25, 2024

Initial round of rebases is complete (based on your suggestions), I'll do one or two rounds to fix the remaining comments and the test.

n0toose and others added 5 commits November 25, 2024 15:00
* Add --mount parameter for "whitelisting" guest_paths and defining
  their respective filesystem paths on the host FS
* Add UhyveFileMap structure
* Add sandbox support to open() syscall

A few points that could be further worked are unit tests, handling
more of the parsing using the clap library directly and performance
optimizations.

Helped-by: Çağatay Yiğit Şahin <[email protected]>
Helped-by: Jonathan Klimt <[email protected]>
Co-authored-by: Jonathan Klimt <[email protected]>
Introduces a test for UhyveFileMap and adjusts existing tests
accordingly.
- Rename file_map to mount
- Temporarily remove short parameter
- Temporarily remove environment variable
- Don't split file_map params with commas
- Change documentation, remove references to file_map variable
- Minor improvements to tests
This commit modifies existing tests to support this new feature.
An additional measure of modifying sysopen.flags was also introduced.

Co-authored-by: Jonathan Klimt <[email protected]>
Also removes the "Introduction" header, as the header takes more
space than the text itself now.

Originally introduced in e7869a9.
n0toose and others added 2 commits November 25, 2024 15:12
... as it was deemed unnecessary and redundant.

The tests were also modified accordingly.
Depending on the error, we return EINVAL (like in the kernel itself)
when O_DIRECTORY is used together with O_CREAT, EIO when the kernel
requests to open a file that does not have a UTF-8 filename, and
-ENOENT when the file is not present in the file map.

The sysopen flags were moved to uhyve-interface for now.
new_file_test now runs a kernel called open_close_file, which attempts
to open foo.txt, write to it, then closes the file, and attempts to
open it again so as to read from it. This is more of a workaround
because of the fact that we cannot read the temporary file easily,
but it should work just as well and expand the test's coverage to
"reading the contents of a file that is a temporary file on the
host filesystem".
If the guest path is part of a directory that is not mapped, we'll
recursively look for the parent directories and check if the parent
directories are mapped. If that's the case, we'll use the file in the
mapped host directory instead.

This feature also comes with a unit test. We partially rely on PathBuf
to prevent any funny behavior from taking place.

A current flaw of this approach is that the filename that can be found
in the OpenParams struct is "relative" instead of absolute, and
providing paths that are otherwise whitelisted (e.g. "/root/file.txt"
instead of "file.txt") will result in the map not return a result.
@n0toose
Copy link
Member Author

n0toose commented Nov 26, 2024

I pushed a new feature for directory support for UhyveFileMap. This should be ready for review.

A problem with the current implementation is that open, in the way that we use it provides a relative path that contains the file name foo.txt, which is mostly equivalent to /root/foo.txt. The UhyveFileMap sees those two files as "separate". If /root/ is whitelisted, and the user attempts to open foo.txt (which is basically /root/foo.txt), the open will fail. I attempted to use tests to "document" its behavior. As things are right now, this approach is better than nothing, and I think that it's merge-able as it is "more production-ready than a hypervisor that provides full host system access", despite the risk of confusion for the user.

However, the current design needs to be reworked with the answers to the following questions in mind:

  • If the guest_path does not contain an explicit absolute path, do we assume that the file is in /root and prepend it ourselves? (A previous early design took this into account - together with a virtiofsd-like "shared directory", but was removed after discussions. Now feels like a good time to bring it up again.)
  • Should we "get the current working directory" from the UhyveVm to compare the entire path against UhyveFileMap? How do other implementations handle it?
  • Do we need to make the virtual machine "aware" of what we have mapped, and let the VM reject specific paths? Are "bad" maps (what is "bad"? everything outside of /root?) capable of breaking things, or is the flexibility of being able to map everything to everything advantageous (e.g. to /etc/...)?

@n0toose
Copy link
Member Author

n0toose commented Nov 26, 2024

https://man7.org/linux/man-pages/man7/path_resolution.7.html

If the pathname does not start with the '/' character, the
starting lookup directory of the resolution process is the
current working directory of the process — or in the case of
openat(2)-style system calls, the dfd argument (or the current
working directory if AT_FDCWD is passed as the dfd
The current working directory is inherited from the parent, and
can be changed by use of the chdir(2) system call.

We could use UHYVE_MOUNT or /root if there is no / prefix, both when creating the map and when resolving open calls: https://github.com/hermit-os/kernel/blob/c2c4b1f09df28566091173db5255b8e7e17a1a08/src/fs/uhyve.rs#L324

///
/// Example: --mount host_dir:guest_dir --mount file.txt:guest_file.txt
#[clap(long)]
mount: Vec<String>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to let the user define parameters using the UHYVE_<PARAMETER> syntax. It was mentioned that "mount" is misleading here: #783 (comment)

However, there's an actual technical reason to adjust the name either on the Uhyve end or the Hermit end: HERMIT_MOUNT is used by the kernel to define the mountpoint used by Uhyve: #783 (comment)

https://github.com/hermit-os/kernel/blob/c2c4b1f09df28566091173db5255b8e7e17a1a08/src/fs/uhyve.rs#L324

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File Isolation: Add file map for paths that the kernel should have read/write access kernel on.
3 participants