Bpf seccomp #352

minad · 2019-04-10T19:06:10Z

Taken from #343. The BPF assembly is not yet adapted for solo5, since it supports multiple block devices etc.

This removes the dependency on libseccomp and lifts the restriction to single-block writes. Blocks must still be aligned, however (related: #325).

minad · 2019-04-24T11:16:34Z

Ping @mato. Is there interest in this?

mato · 2019-05-10T10:18:39Z

@minad Yes, I think this is worth doing, but it needs some more work.

One thing we lose with this approach is that currently the seccomp filter is built up by calls in each module that gets activated, so the filter is effectively "tailored" to the modules in use. This is useful for future modules that might want to punch different (but again, optional) holes in the seccomp filter, and also follows the principle of least privilege.

Further, I'm not sure how the "one big filter" approach would work in the presence of multiple instances of modules (devices/resources). This I think we won't know for certain until I have a prototype in place.

minad · 2019-05-10T10:46:35Z

> One thing we lose with this approach is that currently the seccomp filter is built up by calls in each module that gets activated, so the filter is effectively "tailored" to the modules in use. This is useful for future modules that might want to punch different (but again, optional) holes in the seccomp filter, and also follows the principle of least privilege.

That is right. But I don't think there is a solution as simple as this one if you want to build the seccomp filter step by step from each module. I don't see the seccomp filter as part of the device modules, but as something sitting below them, like a firewall. So I think it is also justified from a design point of view to build it in one monolithic step. The filter builder should just iterate over all the active modules.

> Further, I'm not sure how the "one big filter" approach would work in the presence of multiple instances of modules (devices/resources). This I think we won't know for certain until I have a prototype in place.

What do you mean by that? The filter I have written already supports multiple devices. There is support for up to 8 block devices, multiple IO stream devices (like serial ports) and multiple network devices. The filter blob is parametrized by the number of devices before loading. It is hard to make it more flexible than that. The limitation in the number of devices comes from the encoding of lookup tables in BPF, which is very limited. I made a testbed/prototype to test some things concerning multiple devices and the filter script proposed in this PR is just taken from there. For solo5 it would definitely need some tweaking.

Alternatively, you could look for a full-blown BPF compiler solution, which would allow you to build the filter up step by step. But this would add a lot of complexity. Basically libseccomp is such a thing, but it seems insufficient.

For the scope of a highly restricted sandbox like solo5 I would prefer a very short hand-written filter script, written directly in BPF ASM, which can be checked carefully. Also note that eBPF is not yet available for seccomp and I have no idea when it will be. There are at least some kernel patches floating around afaik.

mato · 2019-06-24T14:36:11Z

@minad:

I've thought about your approach [of a single monolithic BPF] some more, especially in the context of the evolving support for multiple devices (#372 / #373).

> What do you mean by that? The filter I have written already supports multiple devices. There is support for up to 8 block devices, multiple IO stream devices (like serial ports) and multiple network devices. The filter blob is parametrized by the number of devices before loading. It is hard to make it more flexible than that.

The thing is, in the presence of multiple devices, the monolithic approach generates a weaker filter than the per-module gradual approach we have now.

For example, at block_call: in your BPF, you assume that the file descriptors on which pread / pwrite may be called form the contiguous range $BPF_BLOCK_MIN ... $BPF_BLOCK_MAX, and that this range contains no other file descriptors. There's no way to guarantee that, as the devices get attached in whichever order the user specified them on the tender's command line. Contrast this with the libseccomp-based approach, which adds a rule specific to each file descriptor.

Also, if we want to keep at least some checks to disallow/limit the impact of block writes past the end of the backing file (to enable #325), those checks need to be parametrized per file descriptor as each may have a different device capacity (i.e. limit for the pos argument to the pwrite call). I don't see how to achieve that without generating at least some of the filter at tender run-time.
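
To make the last two points concrete, here is a minimal sketch in plain C. It only models the allow/deny decision; it is not BPF and not the tender's actual code. The names `struct block_rule`, `range_allows` and `per_fd_allows` are hypothetical, as are the fd numbers and capacities mentioned afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device rule: one entry per attached block device. */
struct block_rule {
    int fd;            /* file descriptor backing the device */
    uint64_t capacity; /* size of the backing file, in bytes */
};

/* Monolithic style: any fd inside [min, max] passes the check. */
static bool range_allows(int fd, int min, int max)
{
    return fd >= min && fd <= max;
}

/* Per-module style: the fd must match a rule exactly, and the write must
 * end at or before that device's own capacity. */
static bool per_fd_allows(const struct block_rule *rules, size_t n,
                          int fd, uint64_t pos, uint64_t len)
{
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (rules[i].fd != fd)
            continue;
        /* Written this way so that pos + len cannot overflow. */
        return len <= rules[i].capacity && pos <= rules[i].capacity - len;
    }
    return false; /* not a block device fd: deny */
}
```

With block devices on fds 3 and 5 and a network device on fd 4 (an order the user is free to choose), `range_allows(4, 3, 5)` is true, while `per_fd_allows` rejects fd 4 and also rejects a write starting at `pos == capacity`.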

> For the scope of a highly restricted sandbox like solo5 I would prefer a very short hand-written filter script, written directly in BPF ASM, which can be checked carefully.

YMWV, but my opinion on this as maintainer is that the one-monolithic-filter approach will have a much higher maintenance cost further down the line. Any change to a module which might affect the filter requires careful coordination between all modules, and patching the monolith itself. With the libseccomp-based filter this is much reduced, as each module is responsible for its "piece" of the filter.

Personally, I find the libseccomp-based rules easier to follow from the tender's perspective than hand-written BPF. Granted, we are trusting libseccomp to generate correct rules, which may not always be the case (see http://www.paul-moore.com/blog/d/2019/03/libseccomp_v240.html), but that is a trade-off I'm willing to keep until we have an option which is convincingly better.

One such option worth investigating is a compromise, where, given the relatively small set of seccomp rules we need, the fact that we always operate as a strict whitelist and the fact that the parametrization needed is straightforward (basically only changing literals, not control flow), we use a combination of

a) modules being able to define (parametrized) BPF "snippets", each of which is self-contained and must either ret ALLOW or continue to the next instruction.
b) the core then combining all snippets into a single BPF filter, with the arch-specific header and a ret KILL at the end.

...however, I'm not sure if implementing the above wouldn't just lead to a half-assed implementation of what libseccomp already does for us today.
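
For reference, a) and b) could be sketched roughly as follows. This is only an illustration under stated assumptions: `struct snippet`, `snippet_allow_write_fd` and `combine_filter` are hypothetical names, the snippet compares only the low 32 bits of the fd argument (little-endian layout assumed), and the expected audit arch value is passed in by the caller. It relies on classic BPF jumps being relative, so a snippet whose jumps stay inside itself can be copied to any position in the combined program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical: a module-provided, self-contained piece of the filter. */
struct snippet {
    const struct sock_filter *insns;
    size_t len;
};

#define SNIPPET_WRITE_FD_LEN 5

/* a) Snippet: "allow write() on exactly this fd", otherwise fall through
 * to whatever instruction follows the snippet. */
static size_t snippet_allow_write_fd(struct sock_filter *out, uint32_t fd)
{
    const struct sock_filter insns[SNIPPET_WRITE_FD_LEN] = {
        /* A = syscall number (reloaded, since earlier snippets clobber A) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* not write(): skip the remaining 3 instructions */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 3),
        /* A = low 32 bits of args[0] (little-endian assumption) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
        /* wrong fd: skip the ret ALLOW */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, fd, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    memcpy(out, insns, sizeof insns);
    return SNIPPET_WRITE_FD_LEN;
}

/* b) Core: arch check header + all snippets + final ret KILL.
 * Returns the number of instructions written to out. */
static size_t combine_filter(struct sock_filter *out, size_t cap,
                             uint32_t audit_arch,
                             const struct snippet *snips, size_t nsnips)
{
    const struct sock_filter header[3] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        /* expected arch: jump over the ret KILL into the first snippet */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, audit_arch, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    };
    const struct sock_filter trailer = BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    size_t n = 3;

    assert(cap >= 3);
    memcpy(out, header, sizeof header);
    for (size_t i = 0; i < nsnips; i++) {
        assert(n + snips[i].len <= cap);
        memcpy(out + n, snips[i].insns, snips[i].len * sizeof *out);
        n += snips[i].len;
    }
    assert(n + 1 <= cap);
    out[n++] = trailer; /* strict whitelist: nothing matched, so kill */
    return n;
}
```

The combined array would then be wrapped in a `struct sock_fprog` and installed with `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...)` after setting `PR_SET_NO_NEW_PRIVS`. A real version would also have to check the high 32 bits of each 64-bit argument.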

minad added 2 commits April 28, 2019 12:06

remove libseccomp configuration

65229a8

use bpf seccomp filter

8b107ae

minad mentioned this pull request May 8, 2019

Make spt independent of libseccomp/"Tenderless" spt #341

Closed

mato mentioned this pull request Jun 24, 2019

Enable block operations for more than 1 block #325

Open

minad closed this Jun 24, 2019
