Interaction between Jool and XFRM #427

Open
rnhmjoj opened this issue Nov 19, 2024 · 9 comments

rnhmjoj commented Nov 19, 2024

I'm running Jool in NAT64 mode on a router that is also the endpoint of an IPsec VPN in tunnel mode.
The VPN clients get IPv6 addresses from a local subnet and are NDP-proxied, so they behave as if they were on the LAN (encrypted traffic comes in from the IPv4 WAN and goes out decrypted to either the IPv6 WAN or the LAN). Everything works as expected, except NAT64.

To see what goes wrong I traced a single ping from a VPN client to an IPv4 host on the internet:

# tcpdump -i any 'icmp or icmp6'
wan  In  IP6 2001:db8::1 (VPN client) > 64:ff9b::128.66.0.2 (mapped IPv4 host): ICMP6, echo request, id 26, seq 1, length 64
wan  Out IP  128.66.0.1 (router) > 128.66.0.2 (IPv4 host): ICMP echo request, id 47538, seq 1, length 64
wan  In  IP  128.66.0.2 (IPv4 host) > 128.66.0.1 (router): ICMP echo reply, id 47538, seq 1, length 64

$ jool session display --icmp
---------------------------------
Expires in 0:00:59.230
Remote: 128.66.0.2#54542	2001:db8::1#29
Local: 128.66.0.1#54542	64:ff9b::8042:2#29
---------------------------------
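
A side note on notation: tcpdump prints the destination as 64:ff9b::128.66.0.2 while Jool prints 64:ff9b::8042:2. Both are the same address, with the IPv4 address embedded in the last 32 bits of the /96 prefix (RFC 6052). A minimal sketch of the conversion, assuming a /96 prefix written with a trailing `::`; `nat64_embed` is a made-up helper name, not a Jool command:

```shell
#!/bin/sh
# nat64_embed is a hypothetical helper, not part of Jool.
# Embed an IPv4 address in a /96 NAT64 prefix (RFC 6052) and print it in hex notation.
nat64_embed() {
    ipv4=$1
    prefix=${2:-64:ff9b::}      # assumed: /96 prefix ending in "::"
    old_ifs=$IFS
    IFS=.
    set -- $ipv4                # split a.b.c.d into $1 $2 $3 $4
    IFS=$old_ifs
    printf '%s%x:%x\n' "$prefix" $(( $1 * 256 + $2 )) $(( $3 * 256 + $4 ))
}

nat64_embed 128.66.0.2    # prints 64:ff9b::8042:2
```

The output is not canonicalized (zero groups are printed as `0`), which is fine for comparing against the session table by eye.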

So, the echo request is correctly translated, the echo reply is received but apparently is not translated back to IPv6.
However, using the eBPF tool pwru, I can see the translated IPv6 reply has in fact been crafted but is being dropped for some reason:

# pwru 'icmp6 and src 64:ff9b::128.66.0.2' --output-meta --output-tuple
                 ip6_output iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
               nf_hook_slow iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
       selinux_ip_postroute iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
selinux_ip_postroute_compat iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
          ip6_finish_output iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
         ip6_finish_output2 iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
       neigh_resolve_output iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
         __neigh_event_send iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
         ndisc_error_report iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
           ip6_link_failure iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
                 icmp6_send iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
        icmpv6_route_lookup iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
      __xfrm_decode_session iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
            decode_session6 iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED) iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
   skb_release_head_state iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
         skb_release_data iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
            skb_free_head iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
             kfree_skbmem iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)

For comparison, this is what the trace of a ping to a host that doesn't need translation looks like:

# pwru 'icmp6 and src 2001:db8::2' --output-meta --output-tuple
        ipv6_gro_receive iface=2(eth0) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            nf_hook_slow iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            ip6_rcv_core iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            nf_hook_slow iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
         nf_ip6_checksum iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
 __skb_checksum_complete iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
         ip6_route_input iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
             ip6_forward iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
     __xfrm_policy_check iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
         decode_session6 iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
    __xfrm_route_forward iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
         decode_session6 iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
        pskb_expand_head iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
           skb_free_head iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            nf_hook_slow iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
      selinux_ip_forward iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            xfrm6_output iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            nf_hook_slow iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
    selinux_ip_postroute iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
selinux_ip_postroute_compat iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
          __xfrm6_output iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
             xfrm_output iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
     xfrm_dev_offload_ok iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
      xfrm_output_resume iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
        pskb_expand_head iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
           skb_free_head iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
  xfrm_outer_mode_output iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
 xfrm6_tunnel_check_size iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)

I don't know how to interpret these traces because I have no knowledge of the netfilter internals, but clearly the packet written by Jool is being sent to a different (wrong?) interface (LAN instead of WAN).
I'm not sure whether this issue is a limitation of Jool, Jool and XFRM clashing or the VPN policies not being configured correctly. Do you have any insight?

The Jool configuration is simply:

{
  "framework": "netfilter",
  "global": { "pool6": "64:ff9b::/96" },
  "instance": "default"
}

If you're interested, the VPN setup is fully defined here

Member

ydahhrk commented Nov 26, 2024

clearly the packet written by Jool is being sent to a different (wrong?) interface (LAN instead of WAN).

Solid starting point.

The interface fields in the kernel packet structure have always seemed awkward to me, because whether they refer to the inbound or the outbound interface is context-dependent.

But assuming pwru's output indeed refers to the outbound interface, that's simply decided by the routing table.

Do you see a rule in ip -6 route that might steer packet 64:ff9b::8042:2 -> 2001:db8::1 towards lan?

I can see the translated IPv6 reply has in fact been crafted but is being dropped for some reason:

IPv6 finished successfully. Layer 2 likely failed to find a neighbor, and tried to respond with an ICMPv6 error by way of a Layer 3 callback. The ICMPv6 error could not be sent either, seemingly because XFRM could not map one of the packets into a session.

So the core issue does seem to be that it fails to find a neighbor after the packet has been dumped into the wrong interface.
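
One way to check which interface the routing table alone would pick for the translated reply is `ip -6 route get`. The sketch below wraps it in a small helper; `parse_dev` and `egress_dev` are illustrative names, and the `sed` expression assumes the usual `... dev IFACE ...` layout of iproute2's output. It only shows the plain routing decision and says nothing about XFRM policies.

```shell
# Hypothetical helpers, not part of Jool or iproute2.
# Extract the interface name that follows "dev" in `ip route get` output.
parse_dev() {
    sed -n 's/.* dev \([^ ]*\).*/\1/p'
}

# Print the interface the kernel would route a given IPv6 destination to.
egress_dev() {
    ip -6 route get "$1" | parse_dev
}

# Usage on the router (the lookup runs in the current network namespace):
#   egress_dev 2001:db8::1
```

If this prints `lan` for the VPN client's address, the routing table is what sends the packet there.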

Author

rnhmjoj commented Nov 27, 2024

Do you see a rule in ip -6 route that might steer packet 64:ff9b::8042:2 -> 2001:db8::1 towards lan?

Initially I suspected that giving the clients addresses from the LAN prefix could confuse Jool, so I set up a different prefix (not linked to any physical interface) and used that, but the packets still go to the LAN.

Maybe the problem is that there is no static route? I think the packets are routed only based on this policy:

# ip xfrm policy
src ::/0 dst 2001:db8::1/128 (VPN client)
        dir out priority 1769472
        tmpl src 128.66.0.1 (router) dst 128.66.0.3 (VPN client)
                proto esp reqid 16409 mode tunnel

src 2001:db8::1/128 (VPN client) dst ::/0
        dir fwd priority 1769472
        tmpl src 128.66.0.3 (VPN client) dst 128.66.0.1 (router)
                proto esp reqid 16409 mode tunnel

src 2001:db8::1/128 (VPN client) dst ::/0
        dir in priority 1769472
        tmpl src 128.66.0.3 (VPN client) dst 128.66.0.1 (router)
                proto esp reqid 16409 mode tunnel

src ::/0 dst ::/0
        socket out priority 0

Member

ydahhrk commented Nov 28, 2024

Please bear with me, because I don't really have any experience configuring VPNs, and in its fervor to sell them to me in the name of buzzwords, the Internet is remarkably atrocious at explaining them.

+-------------+
| VPN client  |
+-------------+
| 2001:db8::1 |
| 128.66.0.3  |
+-------------+
      |
+-------------+
| 128.66.0.1  |
+-------------+
| router      |
+-------------+
| 128.66.0.1  |
+-------------+
      |
+-------------+
| 128.66.0.2  |
+-------------+
| IPv4 host   |
+-------------+

How come everyone involved seems to be in the same IPv4 network (128.66.0)? Is "IPv4 host" also supposed to be a VPN client? Is 128.66.0 supposed to be the VPN?

Is 2001:db8::1 also a member of the VPN?

What is the point of the translator in this environment?

Which is lan, which is wan and which is eth0? Which are the networks connected to each of those interfaces?

What's the expected packet flow?

It seems to be

  1. VPN client writes 2001:db8::1 > 64:ff9b::128.66.0.2
  2. Router receives that (presumably from the lan interface), and translates it into 128.66.0.1 > 128.66.0.2.
  3. Router encrypts and encapsulates that inside of packet a.b.c.d > e.f.g.h... and sends it through the wan interface.
  4. The IPv4 host receives that packet, decapsulates 128.66.0.1 > 128.66.0.2 and responds e.f.g.h > a.b.c.d containing 128.66.0.2 > 128.66.0.1.
  5. Router decapsulates 128.66.0.2 > 128.66.0.1.
  6. Router translates that into 64:ff9b::128.66.0.2 > 2001:db8::1, and sends it through lan.
  7. VPN client receives the packet.

Maybe the problem is that there is no static route? I think the packets are routed only based on this policy:

I don't know the internals of XFRM, but if the packet is supposed to be "routed based on its policies," I'd expect the policies to result in routes in the routing table.

It shouldn't matter if the route is static or dynamic.

Are you getting an empty table when you execute ip -6 route?

src ::/0 dst 2001:db8::1/128 (VPN client)
        dir out priority 1769472
        tmpl src 128.66.0.1 (router) dst 128.66.0.3 (VPN client)
                proto esp reqid 16409 mode tunnel

How do you read this? I'm guessing this means "if you get packet X > 2001:db8::1, encapsulate it inside of packet 128.66.0.1 > 128.66.0.3," but if this is the case, it raises several questions.

Author

rnhmjoj commented Nov 29, 2024

How come everyone involved seems to be in the same IPv4 network (128.66.0)? Is "IPv4 host" also supposed to be a VPN client? Is 128.66.0 supposed to be the VPN?

Sorry, I just made the addresses up and it must have become confusing.
The VPN is essentially an old-fashioned IPv6-in-IPv4 tunnel: I'm assuming the VPN client connects to the router from some remote IPv4-only network; thanks to the VPN it acquires an IPv6 address (from the LAN prefix) and is able to reach the IPv6 internet.

Is 128.66.0 supposed to be the VPN?

No, the VPN is strictly IPv6-only.

What is the point of the translator in this environment?

It performs NAT64 for the IPv6-only LAN, and ideally also the VPN clients.

The gist is, the VPN client should be able to also reach IPv4 hosts through 64:ff9b::/96. Yes, this apparently makes no sense because it could connect directly (it has an IPv4 address), but I'm trying to route the traffic this way so it is encrypted.

Which is lan, which is wan and which is eth0? Which are the networks connected to each of those interfaces?

I'll draw you a diagram:

                                                                  Internet
+--------------------------------------------------------------------------+
|                                                                          |
|                                                      +----------------+  |
|                                                      |                |  |
|       +---------------+                              |   VPN Client   |  |
|       |               |                              |   (IPv4 only)  |  |
|       |   IPv4 host   |            IPv4              |                |  |
|       |               |                              +----------------+  |
|       +---------------+        +---------------------| 128.66.0.3/32  |  |
|    +->| 128.66.0.2/32 |        |                     +----------------+  |
|    |  +---------------+        | ....................| 2001:db8::1/64 |  |
|    |                           | :                   +----------------+  |
|    |                           | :   VPN tunnel                          |
|    | IPv4                      | :    (IPv6)                             |
|    |                           | :                                       |
+----+---------------------------+-:---------------------------------------+
     |                           | :
     |     +----------------+    | :
     |     |                |    | :
     |     |     Router     |    | :
     |     |                |    | :
     |     +----------------+    | :
     +-----| 128.66.0.1/32  |<---+ :
           +----------------+      :
           | 2001:db8::5/64 |<.....:
           +----------------+
              ^     ^                     LAN 2001:db8::/64
              |     |     +--------------------------------+
              |     |     |                                |
              |     |     |            +----------------+  |
              |     |     |            |                |  |
              |     +-----|----------> |      PC 2      |  |
              |           |            |                |  |
              |           |            +----------------+  |
              |           |            | 2001:db8::3/64 |  |
              |           |            +----------------+  |
              |           |                                |
              |           |    +----------------+          |
              +-----------|--->|                |          |
                          |    |      PC 1      |          |
                          |    |                |          |
                          |    +----------------+          |
                          |    | 2001:db8::2/64 |          |
                          |    +----------------+          |
                          |                                |
                          +--------------------------------+

The expected flow for:

user@client $ ping 64:ff9b::128.66.0.2

is the following:

  1. 2001:db8::1 (VPN client) --> 2001:db8::5 (Router) [encapsulated in the tunnel 128.66.0.3/32 <--> 128.66.0.1/32]
  2. 128.66.0.1 (router) --> 128.66.0.2 (IPv4 host)
  3. 128.66.0.2 (IPv4 host) --> 128.66.0.1 (router)
  4. 2001:db8::5 (Router) --> 2001:db8::1 (VPN client) [encapsulated in the tunnel 128.66.0.3/32 <--> 128.66.0.1/32]

Are you getting an empty table when you execute ip -6 route?

Essentially yes: there is only the /64 LAN and the default route via the ISP.

src ::/0 dst 2001:db8::1/128 (VPN client)
        dir out priority 1769472
        tmpl src 128.66.0.1 (router) dst 128.66.0.3 (VPN client)
                proto esp reqid 16409 mode tunnel

How do you read this? I'm guessing this means "if you get packet X > 2001:db8::1, encapsulate it inside of packet 128.66.0.1 > 128.66.0.3," but if this is the case, it raises several questions.

Correct.

Member

ydahhrk commented Nov 29, 2024

Ok, this makes much more sense now.

Well, looking at the diagram here (the thermalcircle.de one), it seems XFRM happens after Prerouting. If you compare this to Jool's version, it doesn't really fit.

Jool picks up your packet at Prerouting, then dumps it in Postrouting. At that point, the packet might U-turn into the "Xfrm encode" box, but by then it skipped all the XFRM lookup analysis.

This matches your pwru output:

# pwru 'icmp6 and src 64:ff9b::128.66.0.2' --output-meta --output-tuple
                 ip6_output iface=44(lan) (...) [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)

Jool left the packet in ip6_output(), which is the kernel's outer IPv6 Postrouting function. In other words, it skipped forwarding, along with all the middle XFRM boxes in it.

Your non-translated packet, by contrast, traverses "Forward" as normal:

# pwru 'icmp6 and src 2001:db8::2' --output-meta --output-tuple
            ip6_rcv_core iface=37(wan) (...) [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
                   (...)
             ip6_forward iface=37(wan) (...) [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
                   (...)
    selinux_ip_postroute iface=37(wan) (...) [2001:db8::2]:0->[2001:db8::1]:0(icmp6)

I think your best bet is to separate Jool and XFRM into separate network namespaces. That way, you'll have complete control over what happens when.

Would this be viable?

Author

rnhmjoj commented Nov 30, 2024

In other words, it skipped forwarding, along with all the middle XFRM boxes in it.

Oh ok, so it means Jool and XFRM are essentially incompatible, correct?

I think your best bet is to separate Jool and XFRM into separate network namespaces. That way, you'll have complete control over what happens when.

I'll have to look into this as I've never used network namespaces. From what I understand I need to put Jool into an ad-hoc namespace, create a veth device that links it to the real namespace and route 64:ff9b::/96 to it.

Anyway, thank you for your time and for Jool!

Member

ydahhrk commented Dec 2, 2024

Oh ok, so it means Jool and XFRM are essentially incompatible, correct?

Assuming the "Xfrm lookup fwd policy" and "Xfrm lookup out policy" boxes (from the thermalcircle.de diagram) are mandatory:

If they're in the same namespace, yes. Unfortunately.

I'll have to look into this as I've never used network namespaces. From what I understand I need to put Jool into an ad-hoc namespace, create a veth device that links it to the real namespace and route 64:ff9b::/96 to it.

Yes. You'll also need to route the pool4 address to it.

Alternatively, if you do IPv4 in one interface and IPv6 in a separate interface, you can move one of those interfaces to the virtual namespace. This might simplify the routing, as it'll mean packets won't have to do U-turns in and out of the Jool namespace:

               |
               |
        +-------------+
+=======| eth0        |=============+
|    +--| 2001:db8::5 |---------+   |
|    |  +-------------+         |   |
|    |         |        Virtual |   |
|    |       Jool     Namespace |   |
|    |         |                |   |
|    |  +-----------------+     |   |
|    |  | vethA           |     |   |
|    +--| 192.168.0.1     |-----+   |
|       | 2001:db8:AAAA:1 |         |
|       +-----------------+         |
|              |                    |
|       +-----------------+         |
|    +--| vethB           |-----+   |
|    |  | 192.168.0.2     |     |   |
|    |  | 2001:db8:AAAA:2 |     |   |
|    |  +-----------------+     |   |
|    |         |         Global |   |
|    |       XFRM     Namespace |   |
|    |         |                |   |
|    |  +-------------+         |   |
|    +--| eth1        |---------+   |
+=======| 128.66.0.1  |=============+
        +-------------+
               |
               |

I don't know if NixOS has a special way of doing it, but here's how I set up temporary (won't survive reboots) virtual namespaces in one of my test suites:

To move interface eth0 into namespace potato, run

ip link set dev eth0 netns potato
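
For the other variant discussed in this thread (Jool alone in its own namespace, linked to the main one by a veth pair, with 64:ff9b::/96 routed into it), a rough sketch could look like the following. It is not the test-suite script mentioned above. The namespace name `jool`, the /24 and /64 prefix lengths, and the 2001:db8:aaaa::1 and ::2 addresses (a guess at the abbreviated ones in the diagram) are all assumptions. By default the script only prints the commands (`DRY_RUN=1`); running them for real needs root and the Jool kernel module.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set (then it needs root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

NS=jool                                  # assumed namespace name
in_ns() { run ip netns exec "$NS" "$@"; }

setup_jool_netns() {
    # Namespace plus a veth pair: vethA inside, vethB in the global namespace.
    run ip netns add "$NS"
    run ip link add vethB type veth peer name vethA
    run ip link set dev vethA netns "$NS"

    # Global side of the link (addresses are assumptions).
    run ip addr add 192.168.0.2/24 dev vethB
    run ip -6 addr add 2001:db8:aaaa::2/64 dev vethB
    run ip link set dev vethB up

    # Jool side of the link, with forwarding enabled.
    in_ns ip link set dev lo up
    in_ns ip addr add 192.168.0.1/24 dev vethA
    in_ns ip -6 addr add 2001:db8:aaaa::1/64 dev vethA
    in_ns ip link set dev vethA up
    in_ns sysctl -w net.ipv4.conf.all.forwarding=1
    in_ns sysctl -w net.ipv6.conf.all.forwarding=1

    # Everything leaving the Jool namespace goes back through the veth.
    in_ns ip route add default via 192.168.0.2
    in_ns ip -6 route add default via 2001:db8:aaaa::2

    # NAT64 instance with the same pool6 as the configuration in this issue.
    run modprobe jool
    in_ns jool instance add default --netfilter --pool6 64:ff9b::/96

    # Steer NAT64 traffic from the global namespace into the Jool namespace.
    run ip -6 route add 64:ff9b::/96 via 2001:db8:aaaa::1
}

setup_jool_netns
```

With these addresses Jool would, as far as I understand, translate to 192.168.0.1 by default, so the global namespace would still need IPv4 NAT (or a routed pool4 address, as mentioned above) before replies can come back. The sketch leaves that part out.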

Member

ydahhrk commented Dec 2, 2024

Alternatively alternatively, place Jool in one machine, and XFRM in another. At the end of the day, that's what the namespaces would be simulating.

Author

rnhmjoj commented Dec 3, 2024

Unfortunately I can't move the IPv4 WAN interface into the jool namespace, because it's used by the VPN and other services too. I've implemented the setup from this script I found and it seems to work, but it's quite ugly (the IPv4 NAT in particular). Do you happen to have a cleaner setup? Thank you again.
