Bpf seccomp #352
Ping @mato. Is there interest in this?
@minad Yes, I think this is worth doing, but it needs some more work. One thing we lose with this approach is that currently the seccomp filter is built up by calls in each module that gets activated, so the filter is effectively "tailored" to the modules in use. This is useful for future modules that might want to punch different (but again, optional) holes in the seccomp filter, and also follows the principle of least privilege. Further, I'm not sure how the "one big filter" approach would work in the presence of multiple instances of modules (devices/resources). This I think we won't know for certain until I have a prototype in place.
This is right. But I don't think there is a solution as simple as this one if you want to build the seccomp filter step by step from each module. I don't see the seccomp filter as part of the device modules, but as something sitting below them, like a firewall. Therefore I think it is also justified, from a design point of view, to build it in one monolithic step. The filter builder should just iterate over all the active modules.
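To make the "iterate over all the active modules" idea concrete, here is a minimal sketch of such a builder front end. It only collects the per-kind device counts that a monolithic filter would then be parametrized with. All names (`struct module`, `collect_filter_params`, the `MOD_*` kinds) are hypothetical and not taken from solo5; the limit of 8 block devices is the one mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module kinds; not solo5's actual module list. */
enum module_kind { MOD_BLOCK, MOD_NET, MOD_STREAM, MOD_KIND_COUNT };

struct module {
    enum module_kind kind;
    int active;              /* non-zero if the module is in use */
};

/* Parameters the monolithic filter would be patched with. */
struct filter_params {
    unsigned count[MOD_KIND_COUNT];
};

#define MAX_BLOCK_DEVICES 8  /* limit quoted in the thread */

/* Walk all modules once and count the active ones per kind.
 * Returns 0 on success, -1 if the filter cannot express the setup. */
int collect_filter_params(const struct module *mods, size_t n,
                          struct filter_params *out)
{
    assert(mods != NULL && out != NULL);
    for (int k = 0; k < MOD_KIND_COUNT; k++)
        out->count[k] = 0;
    for (size_t i = 0; i < n; i++) {
        if (mods[i].active)
            out->count[mods[i].kind]++;
    }
    return out->count[MOD_BLOCK] > MAX_BLOCK_DEVICES ? -1 : 0;
}
```

The point of the sketch is only that the modules never touch the filter themselves; they expose what is active and the builder derives the literals.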
What do you mean by that? The filter I have written already supports multiple devices. There is support for up to 8 block devices, multiple IO stream devices (like serial ports) and multiple network devices. The filter blob is parametrized by the number of devices before loading. It is hard to make it more flexible than that. The limitation in the number of devices comes from the encoding of lookup tables in BPF, which is very limited. I made a testbed/prototype to test some things concerning multiple devices, and the filter script proposed in this PR is just taken from there. For solo5 it would definitely need some tweaking. Alternatively, you could look for a full-blown BPF compiler solution, which allows you to build the filter up step by step. But this would involve a lot of complexity. Basically, libseccomp is such a thing, but it seems insufficient. For the scope of a highly restricted sandbox like solo5, I would prefer a very short, hand-written filter script, written in BPF ASM directly, which can be checked carefully. Also note that eBPF is not yet available for seccomp and I have no idea when it will be. There are at least some kernel patches floating around, afaik.
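As an illustration of "parametrized by the number of devices before loading", here is a minimal sketch in C using the classic BPF macros from `<linux/filter.h>`. It is not the filter from this PR: the syscall (`pwrite64`), the assumption that block devices occupy consecutive file descriptors starting at `FIRST_BLOCK_FD`, and all function names are made up for the example. A real filter would also have to check `arch` and the upper 32 bits of each argument. The small `run_filter` interpreter exists only so the patched program can be checked in user space without installing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

#define MAX_BLOCK_DEVICES 8
#define FIRST_BLOCK_FD    3   /* hypothetical: block fds are 3, 4, ... */
#define IDX_LIMIT         4   /* instruction whose literal gets patched */

/* Template: allow pwrite64 only on fds in [FIRST_BLOCK_FD, limit). */
static const struct sock_filter filter_template[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_pwrite64, 0, 4),   /* else kill */
    /* Low 32 bits of arg 0 (little-endian layout assumed). */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, FIRST_BLOCK_FD, 0, 2), /* too low: kill */
    BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0 /* patched */, 1, 0),/* too high: kill */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};
enum { FILTER_LEN = sizeof filter_template / sizeof filter_template[0] };

/* Only a literal changes; the control flow of the blob stays fixed. */
static int patch_filter(struct sock_filter *prog, unsigned ndev)
{
    if (ndev == 0 || ndev > MAX_BLOCK_DEVICES)
        return -1;
    prog[IDX_LIMIT].k = FIRST_BLOCK_FD + ndev;
    return 0;
}

/* Tiny user-space evaluator for the four opcodes used above. */
static uint32_t run_filter(const struct sock_filter *prog, size_t len,
                           const struct seccomp_data *data)
{
    uint32_t acc = 0;
    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *in = &prog[pc];
        switch (in->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            assert(in->k + sizeof acc <= sizeof *data);
            memcpy(&acc, (const char *)data + in->k, sizeof acc);
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (acc == in->k) ? in->jt : in->jf;
            break;
        case BPF_JMP | BPF_JGE | BPF_K:
            pc += (acc >= in->k) ? in->jt : in->jf;
            break;
        case BPF_RET | BPF_K:
            return in->k;
        default:
            return SECCOMP_RET_KILL;   /* unknown opcode: fail closed */
        }
    }
    return SECCOMP_RET_KILL;
}
```

To use it, copy `filter_template` into a writable array, call `patch_filter` with the device count, and hand the result to `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...)` via a `struct sock_fprog`. Because the program is seven instructions long, it can be reviewed line by line, which is the argument made above for hand-written BPF.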
I've thought about your approach [of a single monolithic BPF] some more, especially in the context of the evolving support for multiple devices (#372 / #373).
The thing is, in the presence of multiple devices, the monolithic approach generates a weaker filter than the per-module gradual approach we have now. For example, in your BPF in Also, if we want to keep at least some checks to disallow/limit the impact of block writes past the end of the backing file (to enable #325), those checks need to be parametrized per file descriptor as each may have a different device capacity (i.e. limit for the
YMMV, but my opinion on this as maintainer is that the one-monolithic-filter approach will have a much higher maintenance cost further down the line. Any change to a module which might affect the filter requires careful coordination between all modules, and patching the monolith itself. With the libseccomp-based filter this is much reduced, as each module is responsible for its "piece" of the filter. Personally, I find the libseccomp-based rules easier to follow from the tender's perspective than hand-written BPF. Granted, we are trusting libseccomp to generate correct rules, which may not always be the case (see http://www.paul-moore.com/blog/d/2019/03/libseccomp_v240.html), but that is a trade-off I'm willing to keep until we have an option which is convincingly better. One such option worth investigating is a compromise, where, given the relatively small set of seccomp rules we need, the fact that we always operate as a strict whitelist and the fact that the parametrization needed is straightforward (basically only changing literals, not control flow), we use a combination of a) modules being able to define (parametrized) BPF "snippets", each of which is self-contained and must either ...however, I'm not sure if implementing the above wouldn't just lead to a half-assed implementation of what libseccomp already does for us today.
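For discussion, here is a rough sketch of what such parametrized "snippets" could look like. The composition rule is an assumption made for this example, since the description above is cut off: each snippet either returns `SECCOMP_RET_ALLOW` or falls through to the instruction right after it, and the builder appends a final default-deny. The names `struct snippet`, `builder_add` and `builder_finish` are hypothetical, and the snippet (allow `write` on one patched fd) is only a stand-in for a real module rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

#define MAX_PROG 64

/* A self-contained rule; only the literal at patch_idx varies. */
struct snippet {
    const struct sock_filter *insns;
    size_t len;
    size_t patch_idx;
};

struct builder {
    struct sock_filter prog[MAX_PROG];
    size_t len;
};

/* "Allow write(2) on exactly one fd"; on mismatch, fall off the end.
 * Jumps are relative, so the snippet works at any position. */
static const struct sock_filter allow_write_on_fd[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 3),
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0 /* patched fd */, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
static const struct snippet write_on_fd = {
    allow_write_on_fd,
    sizeof allow_write_on_fd / sizeof allow_write_on_fd[0],
    3,
};

/* Append one instance of a snippet, patching its literal in the copy. */
static int builder_add(struct builder *b, const struct snippet *s,
                       uint32_t literal)
{
    assert(b != NULL && s != NULL);
    /* Keep one slot free for the final default-deny. */
    if (s->patch_idx >= s->len || b->len + s->len + 1 > MAX_PROG)
        return -1;
    memcpy(&b->prog[b->len], s->insns, s->len * sizeof *s->insns);
    b->prog[b->len + s->patch_idx].k = literal;
    b->len += s->len;
    return 0;
}

/* Strict whitelist: anything no snippet allowed is killed. */
static int builder_finish(struct builder *b)
{
    if (b->len + 1 > MAX_PROG)
        return -1;
    b->prog[b->len++] =
        (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    return 0;
}
```

Each module instance would call `builder_add` once with its own literal, so two block devices simply yield two copies of the same snippet with different fds. Whether this stays simpler than libseccomp once snippets need more than one literal is exactly the open question raised above.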
Taken from #343. The BPF assembly is not yet adapted for solo5, since it supports multiple block devices etc.
This removes the dependency on libseccomp and lifts the restriction to single-block writes. Blocks must still be aligned, however (related: #325).