Investigate eBPF Support #935

Open
XAMPPRocky opened this issue Apr 26, 2024 · 9 comments
Labels
area/networking Related to networking I/O area/performance Anything to do with Quilkin being slow, or making it go faster. kind/design Proposal discussing new features / fixes and how they should be implemented kind/feature New feature or request

Comments

@XAMPPRocky
Collaborator

XAMPPRocky commented Apr 26, 2024

We've talked around what eBPF might look like a few times now, so this is a tracking issue for looking into what it would take to support it. The main potential benefits are higher throughput from removing syscalls, and process separation of the game traffic from the configuration update and latency tasks.

Requirements

  • This is added as an additional "mode" to quilkin, similar to agent, manage, and relay. We might add the mode as a command under proxy (e.g. quilkin proxy ebpf) to be able to reuse the arguments from proxy, but in code it would be a separate path, and probably closer to agent in how it works.
  • The eBPF mode still supports dynamic configuration like proxy does. How it pulls that configuration will obviously be different.
  • The eBPF filter still supports most of the filters. It doesn't need to support all of them, as there are some that could never work in the eBPF context, but it still needs to support enough to be useful for production deployments.
@XAMPPRocky XAMPPRocky added kind/feature New feature or request area/performance Anything to do with Quilkin being slow, or making it go faster. kind/design Proposal discussing new features / fixes and how they should be implemented area/networking Related to networking I/O labels Apr 26, 2024
@Jake-Shadle Jake-Shadle self-assigned this Jun 25, 2024
@Jake-Shadle
Collaborator

I've created a PoC at https://github.com/Jake-Shadle/ebpf-test to see how feasible this would be.

Note that the PoC supports (almost) exactly our actual use case of client packets w/token <-> proxy <-> server, nothing else.

> The eBPF mode still supports dynamic configuration like proxy does. How it pulls that configuration will obviously be different.

I'm not sure what you mean by this? If it's configuration in terms of updating server endpoints, that's totally fine, and my PoC does that. But if it's things like updating token sizes or something, that becomes problematic, as right now those are done via global variables, which can only be modified at load time due to how eBPF loaders work (modifying the actual contents of the ELF before passing it to the kernel to be verified). It might be possible to store those types of configs in a map with a single entry, but that would be a bit wasteful, as it means always reading data that rarely/never changes. It would also be possible to just detach and reload the eBPF code and attach again, as that takes milliseconds.

> The eBPF filter still supports most of the filters. It doesn't need to support all of them, as there are some that could never work in the eBPF context, but it still needs to support enough to be useful for production deployments.

I'm not sure I see the point in adding support for every filter that could work in eBPF, as we don't actually know which filters people are currently using. I would much rather add support for only our use case in a first pass, and then either take on additional filters, or just have a library of helpers that people can use to create their own eBPF programs if they choose to not make them open source. Also, since all the filters are supported in non-eBPF quilkin, if people can't use the eBPF program(s) provided natively in quilkin because they don't support their particular use case, they can just use the regular proxy.

TODO

There are a couple of things missing from the PoC that would need to be done before it could be integrated into quilkin (pending discussions ofc).

  1. Metrics - Right now the eBPF program is a black box as I wanted to prove it could function correctly first, but collection of the same metrics that quilkin currently supports should be fairly easy to support with a single ring buffer that is processed in user space.
  2. Incremental UDP checksum - It should be possible to do incremental updates of the UDP checksum when modifying the packet, but I wasn't able to get it working so far, so right now we actually perform the full checksum computation, including the data payload, when sending IPv6 packets so that the packet isn't rejected by the receiver.
  3. Load balancer suffix - In our use case at Embark, the (GCP?) load balancer adds a 1u8 at the end of the packet. This would mess up the token calculation, so I would need to modify the code to take that into account, if configured.
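
For reference on item 2, the usual approach to incremental checksum updates is the one described in RFC 1624 (HC' = ~(~HC + ~m + m')). The following is a minimal userspace sketch of that formula, not code from the PoC; the function names are hypothetical, and an eBPF version would additionally have to satisfy the verifier and account for the UDP pseudo-header.

```rust
/// One's-complement sum of big-endian 16-bit words, folded to 16 bits.
/// An odd trailing byte is padded with a zero on the right.
/// A u32 accumulator is assumed to be enough for packet-sized inputs.
fn ones_complement_sum(data: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    for chunk in data.chunks(2) {
        let low = if chunk.len() == 2 { chunk[1] } else { 0 };
        sum += u32::from(u16::from_be_bytes([chunk[0], low]));
    }
    fold(sum)
}

/// Add the carries back into the low 16 bits (end-around carry).
fn fold(mut sum: u32) -> u16 {
    while sum > 0xffff {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    sum as u16
}

/// Full recomputation over the whole buffer.
fn full_checksum(data: &[u8]) -> u16 {
    !ones_complement_sum(data)
}

/// RFC 1624 update: one aligned 16-bit word changed from `old_word`
/// to `new_word`, so only those two values and the old checksum are needed.
fn incremental_update(checksum: u16, old_word: u16, new_word: u16) -> u16 {
    let sum = u32::from(!checksum) + u32::from(!old_word) + u32::from(new_word);
    !fold(sum)
}
```

Changing one aligned word in a buffer and calling `incremental_update` with the old checksum should give the same value as calling `full_checksum` on the modified buffer, without touching the payload. Separately, UDP transmits a computed checksum of zero as 0xffff, which any real implementation would need to handle.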

@XAMPPRocky
Collaborator Author

> I'm not sure what you mean by this?

I'll try to clarify the intent below, but in future, if there are parts of the requirements that you are uncertain about or don't understand, or if questions come up during the course of the implementation, I would ask that you ask for clarification before finishing the proof of concept, as that can save you a lot of time if we're both aligned on what it does and doesn't need to do.

> if it's things like updating token sizes or something, that becomes problematic, as right now those are done via global variables, which can only be modified at load time due to how eBPF loaders work (modifying the actual contents of the ELF before passing it to the kernel to be verified). It might be possible to store those types of configs in a map with a single entry, but that would be a bit wasteful, as it means always reading data that rarely/never changes. It would also be possible to just detach and reload the eBPF code and attach again, as that takes milliseconds.

Yes, filter behaviour does need to be configurable. My expectation is that the userland side of the eBPF proxy deployment reads the same YAML config file schema that we already have, and loads that configuration into the eBPF program to be the filter. Having a config change result in the eBPF program being detached and reattached is an acceptable compromise, so that the eBPF side only needs to read that configuration at startup.
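
To illustrate the split being discussed, here is a rough sketch of how the userland side might decide between a cheap map update and a detach/reload when a new config arrives. Everything here (`ProxyConfig`, `Action`, `plan_update`, and the choice of fields) is hypothetical and only mirrors the two cases mentioned in this thread: endpoints that change often, and load-time values such as the token size.

```rust
use std::net::SocketAddr;

/// Hypothetical, simplified view of the configuration.
#[derive(Debug, Clone, PartialEq)]
struct ProxyConfig {
    /// Server endpoints; expected to change often, so kept in a map.
    endpoints: Vec<SocketAddr>,
    /// Token length in bytes; assumed to be a load-time global.
    token_len: usize,
}

/// What the userland side has to do for a given config change.
#[derive(Debug, PartialEq)]
enum Action {
    Nothing,
    /// Write the new endpoints into the map; the program stays attached.
    UpdateEndpointMap,
    /// Detach, patch the load-time values, reload, and attach again.
    Reload,
}

fn plan_update(current: &ProxyConfig, next: &ProxyConfig) -> Action {
    if current.token_len != next.token_len {
        // Load-time values changed, so a map write is not enough.
        Action::Reload
    } else if current.endpoints != next.endpoints {
        Action::UpdateEndpointMap
    } else {
        Action::Nothing
    }
}
```

With this shape, frequent endpoint churn never interrupts the data path, and only the rare filter-level changes pay the reload cost.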

> I'm not sure I see the point in adding support for every filter that could work in eBPF, as we don't actually know which filters people are currently using. I would much rather add support for only our use case in a first pass, and then either take on additional filters, or just have a library of helpers that people can use to create their own eBPF programs if they choose to not make them open source. Also, since all the filters are supported in non-eBPF quilkin, if people can't use the eBPF program(s) provided natively in quilkin because they don't support their particular use case, they can just use the regular proxy.

Well, we first need a system for defining the filters. The proof of concept doesn't include a way to define a sequence of filters as the behaviour. Right now, the proof of concept doesn't seem to match even our specific configuration of filters, because as far as I can tell it doesn't read the version byte that the configuration specifies. This is why it's important to have filters be available and be specified through the configuration. If I want to change the current packet format (which I have been looking at), right now I would just update the config file. With what you're currently proposing, that would require writing a new program.

It's fine in my opinion if the only filters currently supported are the ones we use, but that needs to be done in a way where, like in the existing setup, we are just defining filters as building blocks and users can join them together in configuration to create their setup. It can't be hardcoded to one setup in code.

I also think ideally we are splitting off the current filter code into its own crate, and sharing as much of that as possible between both implementations, to reuse existing code where possible and to keep any behaviour divergences colocated, so if we have a different capture filter behaviour, for example, both implementations are in the same file/folder.
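
As a rough illustration of "filters as building blocks", the sketch below chains the same suffix-capture block twice with different settings, first for a one-byte version suffix and then for a token. The `Filter` trait, `Context`, `CaptureSuffix`, and `run_chain` are hypothetical simplifications for this example and are not Quilkin's actual filter API.

```rust
use std::collections::HashMap;

/// Hypothetical per-packet state passed along the chain.
#[derive(Debug, Default)]
struct Context {
    payload: Vec<u8>,
    metadata: HashMap<String, Vec<u8>>,
}

/// A building block; returning false drops the packet.
trait Filter {
    fn read(&self, ctx: &mut Context) -> bool;
}

/// Captures the last `size` bytes into metadata under `key`,
/// optionally removing them from the payload.
struct CaptureSuffix {
    key: String,
    size: usize,
    remove: bool,
}

impl Filter for CaptureSuffix {
    fn read(&self, ctx: &mut Context) -> bool {
        if ctx.payload.len() < self.size {
            return false; // too short to contain the suffix
        }
        let split = ctx.payload.len() - self.size;
        let captured = ctx.payload[split..].to_vec();
        if self.remove {
            ctx.payload.truncate(split);
        }
        ctx.metadata.insert(self.key.clone(), captured);
        true
    }
}

/// Runs filters in configured order, stopping at the first drop.
fn run_chain(filters: &[Box<dyn Filter>], ctx: &mut Context) -> bool {
    for filter in filters {
        if !filter.read(ctx) {
            return false;
        }
    }
    true
}
```

Changing the packet format then means changing the list of configured blocks (for example their order or sizes) rather than writing a new program, which is the property being asked for here.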

> In our use case at Embark, the (GCP?) load balancer adds a 1u8 at the end of the packet. This would mess up the token calculation, so I would need to modify the code to take that into account, if configured.

Are you sure about this? We don't use a load balancer in our deployed configurations; all traffic is direct to the pod where possible, and I don't remember deploying this before. Would you be able to share more about this either here or on Slack?

@Jake-Shadle
Collaborator

> If I want to change the current packet format (which I have been looking at), right now I would just update the config file. With what you're currently proposing, that would require writing a new program.

It would be possible to do something with tail call programs I suppose, but it would be vastly more complex than a simple program that does one thing. I think the easiest thing to do is to take a survey of current users to know what filters/configs are actually being used before spending time on them, or even figuring out a way they could be supported, or whether the users with more complex use cases even want or care about eBPF at all.

> It's fine in my opinion if the only filters currently supported are the ones we use, but that needs to be done in a way where, like in the existing setup, we are just defining filters as building blocks and users can join them together in configuration to create their setup. It can't be hardcoded to one setup in code.

I agree the flexibility with the current filter configuration is nice to have...but I think it's also not actually important. Maybe this isn't the case for everyone, but it seems once you settle on a configuration that you want, you will stick with it for a long time. So if one is wanting to use eBPF for performance etc., then doing something like creating a bespoke program that will be used for weeks or months and does only what you need would make sense, with maybe a few toggles/variables exposed for quick changes or something. But again I come back to this: if someone does need high-velocity config changes and complicated chains of filters, then they can just use quilkin as it is today.

> Are you sure about this? We don't use a load balancer in our deployed configurations; all traffic is direct to the pod where possible, and I don't remember deploying this before. Would you be able to share more about this either here or on Slack?

This is the first filter, at least in the one config I am aware of.

      - name: quilkin.filters.capture.v1alpha1.Capture
        config:
          metadataKey: embark.dev/load_balancer/version
          suffix:
            size: 1
            remove: true

@XAMPPRocky
Collaborator Author

> It would be possible to do something with tail call programs I suppose, but it would be vastly more complex than a simple program that does one thing.

The goal of eBPF support is to have the fastest implementation within the constraint of supporting filters and the current feature set of quilkin. It's not to have one program that supports one specific configuration today.

> I agree the flexibility with the current filter configuration is nice to have...but I think it's also not actually important. Maybe this isn't the case for everyone, but it seems once you settle on a configuration that you want, you will stick with it for a long time. So if one is wanting to use eBPF for performance etc., then doing something like creating a bespoke program that will be used for weeks or months and does only what you need would make sense, with maybe a few toggles/variables exposed for quick changes or something. But again I come back to this: if someone does need high-velocity config changes and complicated chains of filters, then they can just use quilkin as it is today.

I think that is largely missing the point of quilkin being a tool. People only tend to write configuration for nginx or istio once (both of which have eBPF modes available), but that doesn't mean that people choose to instead write a custom HTTP proxy program every time they need to route traffic. The point of the tool is to provide an abstraction that allows people to create and manage proxies easily without needing to write their own bespoke programs. You can set up a Quilkin proxy in a cluster in less than five minutes; you can't do that if the answer is to write a bespoke program.

@Jake-Shadle
Collaborator

Maybe changing to use AF_XDP is the easiest option then. The eBPF program would be solely used to redirect the packets we care about to be processed in userspace similarly to how they are today, just instead of using io_uring it's using the XDP umem stuff (maybe io_uring can still be used?). Though AF_XDP isn't supported by all NICs, so that might be a non-starter.
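
To make the "redirect only the packets we care about" part concrete, here is a userspace sketch of the kind of decision the XDP program would make per frame. It is written in plain Rust over a byte slice rather than against any eBPF library, the names (`Verdict`, `classify`, `udp_dest_port`) are hypothetical, and it assumes untagged Ethernet with IPv4 only (no VLAN tags, IPv6, or fragment handling).

```rust
/// What to do with an incoming frame.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Leave it to the normal kernel network stack.
    Pass,
    /// Hand it to the userspace proxy via the AF_XDP socket.
    Redirect,
}

const ETH_HEADER_LEN: usize = 14;
const ETHERTYPE_IPV4: u16 = 0x0800;
const IPPROTO_UDP: u8 = 17;

/// Redirect only UDP datagrams addressed to the proxy's port.
fn classify(frame: &[u8], proxy_port: u16) -> Verdict {
    match udp_dest_port(frame) {
        Some(port) if port == proxy_port => Verdict::Redirect,
        _ => Verdict::Pass,
    }
}

/// Returns the UDP destination port, or None if the frame is not
/// IPv4/UDP or is too short. Every read is bounds-checked, as the
/// eBPF verifier would also require.
fn udp_dest_port(frame: &[u8]) -> Option<u16> {
    let ethertype = u16::from_be_bytes([*frame.get(12)?, *frame.get(13)?]);
    if ethertype != ETHERTYPE_IPV4 {
        return None;
    }
    let ip = frame.get(ETH_HEADER_LEN..)?;
    // Low nibble of the first byte is the header length in 32-bit words.
    let ihl = usize::from(*ip.first()? & 0x0f) * 4;
    if ihl < 20 || *ip.get(9)? != IPPROTO_UDP {
        return None;
    }
    let udp = ip.get(ihl..)?;
    Some(u16::from_be_bytes([*udp.get(2)?, *udp.get(3)?]))
}
```

Everything that gets `Verdict::Pass` keeps flowing through the regular stack, so unrelated traffic on the same interface is unaffected, while the filter logic itself stays in userspace.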

@XAMPPRocky
Collaborator Author

> Maybe changing to use AF_XDP is the easiest option then. The eBPF program would be solely used to redirect the packets we care about to be processed in userspace similarly to how they are today, just instead of using io_uring it's using the XDP umem stuff (maybe io_uring can still be used?).

That makes sense to me. The benefit we're looking for is bypassing the kernel stack for processing, and from the docs you showed it looks like AF_XDP achieves that. As far as I know, aside from that bypass, it's not like it would be faster if the filters were compiled into a kernel module.

> Though AF_XDP isn't supported by all NICs, so that might be a non-starter.

I did a cursory search for support on GCP and AWS and it seems like both support AF_XDP, which is enough for us (though we should probably write a small program and deploy it to check).

I think it's fair to require support for this as long as using a regular UDP socket is still available. I don't know if there is a query for support at runtime (like could we attempt to load the eBPF module at runtime, then if it fails is that because it doesn't have support?). If not, we can make it a separate flag and make it not the default, but I think runtime detection would be preferable.

@Jake-Shadle
Collaborator

Yes, I believe it would fail to attach if it's not supported, so it's easy enough to fall back to the normal I/O path in that case.
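
A minimal sketch of that runtime detection, assuming the attach step reports failure as an error. The attach call is passed in as a closure because the real loader API isn't specified here; `IoBackend`, `select_backend`, and the `String` error type are all hypothetical.

```rust
/// Which I/O path the proxy ends up using.
#[derive(Debug, PartialEq)]
enum IoBackend {
    AfXdp,
    UdpSocket,
}

/// Try to attach the XDP program; on any failure, fall back to the
/// regular UDP socket path instead of refusing to start.
fn select_backend<F>(try_attach_xdp: F) -> IoBackend
where
    F: FnOnce() -> Result<(), String>,
{
    match try_attach_xdp() {
        Ok(()) => IoBackend::AfXdp,
        Err(reason) => {
            eprintln!("XDP attach failed ({reason}); falling back to UDP sockets");
            IoBackend::UdpSocket
        }
    }
}
```

A failed attach can have causes other than missing NIC support (for example insufficient privileges), so logging the reason keeps a silent fallback from hiding a misconfiguration.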

@markmandel
Member

Just chiming in to say - excited to see this work 👍🏻 In retrospect, I wish we'd actually started with an eBPF implementation (but hindsight is always 20/20).

Curious what sort of performance impact / difference you are seeing in the implementation?

@XAMPPRocky
Collaborator Author

> In retrospect, I wish we'd actually started with an eBPF implementation (but hindsight is always 20/20).

I mean I don't think GCP had support for things like AF_XDP until the past year or so in things like gVNIC, so might not have even been possible until recently 😄


3 participants