Interaction between Jool and XFRM #427

rnhmjoj · 2024-11-19T22:07:12Z

I'm running Jool in NAT64 mode on a router that is also the endpoint of an IPsec VPN in tunnel mode.
The VPN clients get IPv6 addresses from a local subnet and are NDP-proxied, so they behave as if they were on the LAN (encrypted traffic comes in from the IPv4 WAN and goes out decrypted to either the IPv6 WAN or the LAN). Everything works as expected, except NAT64.

To see what goes wrong I traced a single ping from a VPN client to an IPv4 host on the internet:

# tcpdump -i any 'icmp or icmp6'
wan  In  IP6 2001:db8::1 (VPN client) > 64:ff9b::128.66.0.2 (mapped IPv4 host): ICMP6, echo request, id 26, seq 1, length 64
wan  Out IP  128.66.0.1 (router) > 128.66.0.2 (IPv4 host): ICMP echo request, id 47538, seq 1, length 64
wan  In  IP  128.66.0.2 (IPv4 host) > 128.66.0.1 (router): ICMP echo reply, id 47538, seq 1, length 64

$ jool session display --icmp
---------------------------------
Expires in 0:00:59.230
Remote: 128.66.0.2#54542	2001:db8::1#29
Local: 128.66.0.1#54542	64:ff9b::8042:2#29
---------------------------------
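
For reference, the two spellings seen above, 64:ff9b::128.66.0.2 in the tcpdump output and 64:ff9b::8042:2 in the session table, are the same address: the IPv4 address simply fills the last 32 bits of the prefix. A minimal sketch of that mapping, assuming the well-known 64:ff9b::/96 prefix (nat64_addr is a made-up helper name, not a Jool command):

```shell
# Embed an IPv4 address in the 64:ff9b::/96 prefix (RFC 6052 style).
# nat64_addr is a hypothetical helper for illustration only.
# The body runs in a subshell so the IFS change does not leak out.
nat64_addr() (
    # Split the dotted quad into its four octets.
    IFS=.
    set -- $1
    # The IPv4 address becomes the last two 16-bit hex groups.
    printf '64:ff9b::%x:%x\n' $(( $1 * 256 + $2 )) $(( $3 * 256 + $4 ))
)

nat64_addr 128.66.0.2   # prints 64:ff9b::8042:2
```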

So, the echo request is correctly translated, the echo reply is received but apparently is not translated back to IPv6.
However, using the eBPF tool pwru, I can see that the translated IPv6 reply has in fact been crafted, but it is then dropped for some reason:

# pwru 'icmp6 and src 64:ff9b::128.66.0.2' --output-meta --output-tuple
                 ip6_output iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
               nf_hook_slow iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
       selinux_ip_postroute iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
selinux_ip_postroute_compat iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
          ip6_finish_output iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
         ip6_finish_output2 iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
       neigh_resolve_output iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
         __neigh_event_send iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
         ndisc_error_report iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
           ip6_link_failure iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
                 icmp6_send iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
        icmpv6_route_lookup iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
      __xfrm_decode_session iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
            decode_session6 iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED) iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
   skb_release_head_state iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
         skb_release_data iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
            skb_free_head iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)
             kfree_skbmem iface=44(lan) proto=0x86dd mtu=1500 len=104 [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)

For comparison, this is what the trace of a ping to a host that doesn't need translation looks like:

# pwru 'icmp6 and src 2001:db8::2' --output-meta --output-tuple
        ipv6_gro_receive iface=2(eth0) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            nf_hook_slow iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            ip6_rcv_core iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            nf_hook_slow iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
         nf_ip6_checksum iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
 __skb_checksum_complete iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
         ip6_route_input iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
             ip6_forward iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
     __xfrm_policy_check iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
         decode_session6 iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
    __xfrm_route_forward iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
         decode_session6 iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
        pskb_expand_head iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
           skb_free_head iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            nf_hook_slow iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
      selinux_ip_forward iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            xfrm6_output iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
            nf_hook_slow iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
    selinux_ip_postroute iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
selinux_ip_postroute_compat iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
          __xfrm6_output iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
             xfrm_output iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
     xfrm_dev_offload_ok iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
      xfrm_output_resume iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
        pskb_expand_head iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
           skb_free_head iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
  xfrm_outer_mode_output iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
 xfrm6_tunnel_check_size iface=37(wan) proto=0x86dd mtu=1700 len=104 [2001:db8::2]:0->[2001:db8::1]:0(icmp6)

I don't know how to interpret these traces because I have no knowledge of the netfilter internals, but clearly the packet written by Jool is being sent to a different (wrong?) interface (LAN instead of WAN).
I'm not sure whether this issue is a limitation of Jool, Jool and XFRM clashing or the VPN policies not being configured correctly. Do you have any insight?

The Jool configuration is simply:

{
  "framework": "netfilter",
  "global": { "pool6": "64:ff9b::/96" },
  "instance": "default"
}

If you're interested, the VPN setup is fully defined here


ydahhrk · 2024-11-26T17:18:44Z

clearly the packet written by Jool is being sent to a different (wrong?) interface (LAN instead of WAN).

Solid starting point.

The interface fields in the kernel packet structure have always seemed awkward to me, because whether they refer to the inbound or the outbound interface is context-dependent.

But assuming pwru's output indeed refers to the outbound interface, that's simply decided by the routing table.

Do you see a rule in ip -6 route that might steer packet 64:ff9b::8042:2 -> 2001:db8::1 towards lan?

I can see the translated IPv6 reply has in fact been crafted but is being dropped for some reason:

IPv6 finished successfully. Layer 2 likely failed to find a neighbor, and tried to respond with an ICMPv6 error by way of a Layer 3 callback. The ICMPv6 error could not be sent either, seemingly because XFRM could not map one of the packets into a session.

So the core issue does seem to be that it fails to find a neighbor after the packet has been dumped into the wrong interface.

rnhmjoj · 2024-11-27T06:59:33Z

Do you see a rule in ip -6 route that might steer packet 64:ff9b::8042:2 -> 2001:db8::1 towards lan?

Initially I suspected that giving clients addresses from the LAN prefix could confuse Jool, so I set up a different prefix (not linked to any physical interface) and used that, but the packets still go to the LAN.

Maybe the problem is that there is no static route? I think the packets are routed only based on this policy:

# ip xfrm policy
src ::/0 dst 2001:db8::1/128 (VPN client)
        dir out priority 1769472
        tmpl src 128.66.0.1 (router) dst 128.66.0.3 (VPN client)
                proto esp reqid 16409 mode tunnel

src 2001:db8::1/128 (VPN client) dst ::/0
        dir fwd priority 1769472
        tmpl src 128.66.0.3 (VPN client) dst 128.66.0.1 (router)
                proto esp reqid 16409 mode tunnel

src 2001:db8::1/128 (VPN client) dst ::/0
        dir in priority 1769472
        tmpl src 128.66.0.3 (VPN client) dst 128.66.0.1 (router)
                proto esp reqid 16409 mode tunnel

src ::/0 dst ::/0
        socket out priority 0

ydahhrk · 2024-11-28T18:56:29Z

Please bear with me, because I don't really have any experience configuring VPNs, and in its fervor to sell them to me in the name of buzzwords, the Internet is remarkably atrocious at explaining them.

+-------------+
| VPN client  |
+-------------+
| 2001:db8::1 |
| 128.66.0.3  |
+-------------+
      |
+-------------+
| 128.66.0.1  |
+-------------+
| router      |
+-------------+
| 128.66.0.1  |
+-------------+
      |
+-------------+
| 128.66.0.2  |
+-------------+
| IPv4 host   |
+-------------+

How come everyone involved seems to be in the same IPv4 network (128.66.0)? Is "IPv4 host" also supposed to be a VPN client? Is 128.66.0 supposed to be the VPN?

Is 2001:db8::1 also a member of the VPN?

What is the point of the translator in this environment?

Which is lan, which is wan and which is eth0? Which are the networks connected to each of those interfaces?

What's the expected packet flow?

It seems to be:

1. VPN client writes 2001:db8::1 > 64:ff9b::128.66.0.2
2. Router receives that (presumably from the lan interface), and translates it into 128.66.0.1 > 128.66.0.2.
3. Router encrypts and encapsulates that inside of packet a.b.c.d > e.f.g.h... and sends it through the wan interface.
4. The IPv4 host receives that packet, decapsulates 128.66.0.1 > 128.66.0.2 and responds e.f.g.h > a.b.c.d containing 128.66.0.2 > 128.66.0.1.
5. Router decapsulates 128.66.0.2 > 128.66.0.1.
6. Router translates that into 64:ff9b::128.66.0.2 > 2001:db8::1, and sends it through lan.
7. VPN client receives the packet.

Maybe the problem is that there is no static route? I think the packets are routed only based on this policy:

I don't know the internals of XFRM, but if the packet is supposed to be "routed based on its policies," I'd expect the policies to result in routes in the routing table.

It shouldn't matter if the route is static or dynamic.

Are you getting an empty table when you execute ip -6 route?

src ::/0 dst 2001:db8::1/128 (VPN client)
        dir out priority 1769472
        tmpl src 128.66.0.1 (router) dst 128.66.0.3 (VPN client)
                proto esp reqid 16409 mode tunnel

How do you read this? I'm guessing this means "if you get packet X > 2001:db8::1, encapsulate it inside of packet 128.66.0.1 > 128.66.0.3," but if this is the case, it raises several questions.

rnhmjoj · 2024-11-29T08:58:52Z

How come everyone involved seems to be in the same IPv4 network (128.66.0)? Is "IPv4 host" also supposed to be a VPN client? Is 128.66.0 supposed to be the VPN?

Sorry, I just made the addresses up and it must have become confusing.
The VPN is essentially an old-fashioned IPv6-in-IPv4 tunnel: I'm assuming the VPN client connects to the router from some remote IPv4-only network; thanks to the VPN, it acquires an IPv6 address (from the LAN prefix) and is able to reach the IPv6 internet.

Is 128.66.0 supposed to be the VPN?

No, the VPN is strictly IPv6-only.

What is the point of the translator in this environment?

It performs NAT64 for the IPv6-only LAN, and ideally also the VPN clients.

The gist is, the VPN client should be able to also reach IPv4 hosts through 64:ff9b::/96. Yes, this apparently makes no sense because it could connect directly (it has an IPv4 address), but I'm trying to route the traffic this way so it is encrypted.

Which is lan, which is wan and which is eth0? Which are the networks connected to each of those interfaces?

I'll draw you a diagram:

                                                                  Internet
+--------------------------------------------------------------------------+
|                                                                          |
|                                                      +----------------+  |
|                                                      |                |  |
|       +---------------+                              |   VPN Client   |  |
|       |               |                              |   (IPv4 only)  |  |
|       |   IPv4 host   |            IPv4              |                |  |
|       |               |                              +----------------+  |
|       +---------------+        +---------------------| 128.66.0.3/32  |  |
|    +->| 128.66.0.2/32 |        |                     +----------------+  |
|    |  +---------------+        | ....................| 2001:db8::1/64 |  |
|    |                           | :                   +----------------+  |
|    |                           | :   VPN tunnel                          |
|    | IPv4                      | :    (IPv6)                             |
|    |                           | :                                       |
+----+---------------------------+-:---------------------------------------+
     |                           | :
     |     +----------------+    | :
     |     |                |    | :
     |     |     Router     |    | :
     |     |                |    | :
     |     +----------------+    | :
     +-----| 128.66.0.1/32  |<---+ :
           +----------------+      :
           | 2001:db8::5/64 |<.....:
           +----------------+
              ^     ^                     LAN 2001:db8::/64
              |     |     +--------------------------------+
              |     |     |                                |
              |     |     |            +----------------+  |
              |     |     |            |                |  |
              |     +-----|----------> |      PC 2      |  |
              |           |            |                |  |
              |           |            +----------------+  |
              |           |            | 2001:db8::3/64 |  |
              |           |            +----------------+  |
              |           |                                |
              |           |    +----------------+          |
              +-----------|--->|                |          |
                          |    |      PC 1      |          |
                          |    |                |          |
                          |    +----------------+          |
                          |    | 2001:db8::2/64 |          |
                          |    +----------------+          |
                          |                                |
                          +--------------------------------+

The expected flow for:

user@client $ ping 64:ff9b::128.66.0.2

is the following:

2001:db8::1 (VPN client) --> 2001:db8::5 (Router) [encapsulated in the tunnel 128.66.0.3/32 <--> 128.66.0.1/32]
128.66.0.1 (router) --> 128.66.0.2 (IPv4 host)
128.66.0.2 (IPv4 host) --> 128.66.0.1 (router)
2001:db8::5 (Router) --> 2001:db8::1 (VPN client) [encapsulated in the tunnel 128.66.0.3/32 <--> 128.66.0.1/32]

Are you getting an empty table when you execute ip -6 route?

Essentially yes: there is only the /64 LAN and the default route via the ISP.

src ::/0 dst 2001:db8::1/128 (VPN client)
dir out priority 1769472
tmpl src 128.66.0.1 (router) dst 128.66.0.3 (VPN client)
proto esp reqid 16409 mode tunnel

How do you read this? I'm guessing this means "if you get packet X > 2001:db8::1, encapsulate it inside of packet 128.66.0.1 > 128.66.0.3," but if this is the case, it raises several questions.

Correct.

ydahhrk · 2024-11-29T21:39:40Z

Ok, this makes much more sense now.

Well, looking at the diagram here, it seems XFRM happens after Prerouting. If you compare this to Jool's version, it doesn't really fit.

Jool picks up your packet at Prerouting, then dumps it in Postrouting. At that point, the packet might U-turn into the "Xfrm encode" box, but by then it skipped all the XFRM lookup analysis.

This matches your pwru output:

# pwru 'icmp6 and src 64:ff9b::128.66.0.2' --output-meta --output-tuple
                 ip6_output iface=44(lan) (...) [64:ff9b::8042:2]:0->[2001:db8::1]:0(icmp6)

Jool left the packet in ip6_output(), which is the kernel's outer IPv6 Postrouting function. In other words, it skipped forwarding, along with all the middle XFRM boxes in it.

Your non-translated packet, by contrast, traverses "Forward" as normal:

# pwru 'icmp6 and src 2001:db8::2' --output-meta --output-tuple
            ip6_rcv_core iface=37(wan) (...) [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
                   (...)
             ip6_forward iface=37(wan) (...) [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
                   (...)
    selinux_ip_postroute iface=37(wan) (...) [2001:db8::2]:0->[2001:db8::1]:0(icmp6)
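
The difference between the two traces comes down to whether ip6_forward shows up among the function names. A small sketch for checking that on a saved trace, assuming (as in the output above) that the function name is the first field of every line; trace_funcs and went_through_forward are hypothetical helper names:

```shell
# List the kernel functions of a pwru trace read from stdin, in order.
# Assumes the function name is the first whitespace-separated field.
trace_funcs() {
    awk 'NF { print $1 }'
}

# Succeed if the trace on stdin passed through the IPv6 forwarding path.
went_through_forward() {
    trace_funcs | grep -qx 'ip6_forward'
}

# Demo on the abbreviated non-translated trace quoted above.
went_through_forward <<'EOF' && echo "forwarded"
            ip6_rcv_core iface=37(wan) (...)
             ip6_forward iface=37(wan) (...)
    selinux_ip_postroute iface=37(wan) (...)
EOF
```

Fed the Jool-translated trace instead, which starts at ip6_output, the check would fail.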

I think your best bet is to separate Jool and XFRM into separate network namespaces. That way, you'll have complete control over what happens when.

Would this be viable?

rnhmjoj · 2024-11-30T10:43:04Z

In other words, it skipped forwarding, along with all the middle XFRM boxes in it.

Oh ok, so it means Jool and XFRM are essentially incompatible, correct?

I think your best bet is to separate Jool and XFRM into separate network namespaces. That way, you'll have complete control over what happens when.

I'll have to look into this as I've never used network namespaces. From what I understand I need to put Jool into an ad-hoc namespace, create a veth device that links it to the real namespace and route 64:ff9b::/96 to it.

Anyway, thank you for your time and for Jool!

ydahhrk · 2024-12-02T15:09:58Z

Oh ok, so it means Jool and XFRM are essentially incompatible, correct?

Assuming the "Xfrm lookup fwd policy" and "Xfrm lookup out policy" boxes (from the thermalcircle.de diagram) are mandatory:

If they're in the same namespace, yes. Unfortunately.

I'll have to look into this as I've never used network namespaces. From what I understand I need to put Jool into an ad-hoc namespace, create a veth device that links it to the real namespace and route 64:ff9b::/96 to it.

Yes. You'll also need to route the pool4 address to it.

Alternatively, if you do IPv4 in one interface and IPv6 in a separate interface, you can move one of those interfaces to the virtual namespace. This might simplify the routing, as it'll mean packets won't have to do U-turns in and out of the Jool namespace:

               |
               |
        +-------------+
+=======| eth0        |=============+
|    +--| 2001:db8::5 |---------+   |
|    |  +-------------+         |   |
|    |         |        Virtual |   |
|    |       Jool     Namespace |   |
|    |         |                |   |
|    |  +-----------------+     |   |
|    |  | vethA           |     |   |
|    +--| 192.168.0.1     |-----+   |
|       | 2001:db8:AAAA:1 |         |
|       +-----------------+         |
|              |                    |
|       +-----------------+         |
|    +--| vethB           |-----+   |
|    |  | 192.168.0.2     |     |   |
|    |  | 2001:db8:AAAA:2 |     |   |
|    |  +-----------------+     |   |
|    |         |         Global |   |
|    |       XFRM     Namespace |   |
|    |         |                |   |
|    |  +-------------+         |   |
|    +--| eth1        |---------+   |
+=======| 128.66.0.1  |=============+
        +-------------+
               |
               |

I don't know if NixOS has a special way of doing it, but here's how I set up temporary (won't survive reboots) virtual namespaces in one of my test suites:

Create namespaces: https://github.com/NICMx/Jool/blob/main/test/graybox/test-suite/namespace-create.sh
Network diagram: https://github.com/NICMx/Jool/blob/main/test/graybox/test-suite/nat64/network.md
Network creation scripts: https://github.com/NICMx/Jool/tree/main/test/graybox/test-suite/nat64

To move interface eth0 into namespace potato, run

ip link set dev eth0 netns potato
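
The veth variant discussed above (Jool alone in its own namespace, with 64:ff9b::/96 routed to it) might look roughly like the sketch below. It is untested against this setup, and every name and address in it (joolns, to_jool, to_world, 2001:db8:aaaa::/64, 192.168.0.0/24) is a placeholder. By default it only prints the commands; DRY_RUN=0 as root would execute them.

```shell
# Sketch only: all names and addresses are placeholders, not a known-good setup.
# Commands are printed unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run a command inside the Jool namespace.
in_ns() {
    run ip netns exec joolns "$@"
}

setup_jool_ns() {
    # Namespace plus a veth pair linking it to the global namespace.
    run ip netns add joolns
    run ip link add to_jool type veth peer name to_world
    run ip link set dev to_world netns joolns

    # Global side of the link.
    run ip addr add 2001:db8:aaaa::2/64 dev to_jool
    run ip addr add 192.168.0.2/24 dev to_jool
    run ip link set dev to_jool up

    # Namespace side, with default routes back to the global side.
    in_ns ip addr add 2001:db8:aaaa::1/64 dev to_world
    in_ns ip addr add 192.168.0.1/24 dev to_world
    in_ns ip link set dev to_world up
    in_ns ip -6 route add default via 2001:db8:aaaa::2
    in_ns ip route add default via 192.168.0.2
    in_ns sysctl -w net.ipv6.conf.all.forwarding=1
    in_ns sysctl -w net.ipv4.conf.all.forwarding=1

    # Send NAT64 traffic from the global namespace into the Jool namespace.
    run ip -6 route add 64:ff9b::/96 via 2001:db8:aaaa::1

    # Same instance as the JSON configuration earlier in the thread.
    run modprobe jool
    in_ns jool instance add default --netfilter --pool6 64:ff9b::/96
}

setup_jool_ns
```

The sketch leaves the IPv4 side open: translated packets leave the namespace with source 192.168.0.1, so the global namespace still has to either NAT them or have a dedicated pool4 address routed to the namespace, as mentioned above.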

ydahhrk · 2024-12-02T15:21:01Z

Alternatively alternatively, place Jool in one machine, and XFRM in another. At the end of the day, that's what the namespaces would be simulating.

rnhmjoj · 2024-12-03T10:55:57Z

Unfortunately I can't move the IPv4 WAN interface into the jool namespace, because it's used by the VPN and other services too. I've implemented the setup from this script I found and it seems to work, but it's quite ugly (the IPv4 NAT in particular). Do you happen to have a cleaner setup? Thank you again.
