Skip to content

Commit

Permalink
tcp_ratelimit: provide an api for drivers to release ratesets at detach
Browse files Browse the repository at this point in the history
When the kernel is compiled with options RATELIMIT, the
mlx5en driver cannot detach. It gets stuck waiting for all
kernel users of its rates to drop to zero before finally calling
ether_ifdetach.

The tcp ratelimit code has an eventhandler for ifnet departure
which causes rates to be released. However, this is called as an
ifnet departure eventhandler, which is invoked as part of
ifdetach(), via either_ifdetach(). This means that the tcp
ratelimit code holds down many hw rates when the mlx5en driver
is waiting for the rate count to go to 0. Thus devctl detach
will deadlock on mlx5 with this stack:
mi_switch+0xcf sleepq_timedwait+0x2f _sleep+0x1a3 pause_sbt+0x77 mlx5e_destroy_ifp+0xaf mlx5_remove_device+0xa7 mlx5_unregister_device+0x78 mlx5_unload_one+0x10a remove_one+0x1e linux_pci_detach_device+0x36 linux_pci_detach+0x24 device_detach+0x180 devctl2_ioctl+0x3dc devfs_ioctl+0xbb vn_ioctl+0xca devfs_ioctl_f+0x1e kern_ioctl+0x1c3 sys_ioctl+0x10a

To fix this, provide an explicit API for a driver to call the tcp
ratelimit code telling it to detach itself from an ifnet. This
allows the mlx5 driver to unload cleanly. I considered adding an
ifnet pre-departure eventhandler. However, that would need to be
invoked by the driver, so a simple function call seemed better.

The mlx5en driver has been updated to call this function.

Reviewed by: kib, rrs

Differential Revision:	https://reviews.freebsd.org/D46221
Sponsored by: Netflix
  • Loading branch information
Andrew Gallatin authored and Andrew Gallatin committed Aug 5, 2024
1 parent 06134ea commit 1f628be
Show file tree
Hide file tree
Showing 3 changed files with 22 additions and 1 deletion.
8 changes: 7 additions & 1 deletion sys/dev/mlx5/mlx5_en/mlx5_en_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@
#include <machine/atomic.h>

#include <net/debugnet.h>
#include <netinet/tcp_ratelimit.h>
#include <netipsec/keydb.h>
#include <netipsec/ipsec_offload.h>

Expand Down Expand Up @@ -4876,7 +4877,12 @@ mlx5e_destroy_ifp(struct mlx5_core_dev *mdev, void *vpriv)

#ifdef RATELIMIT
/*
* The kernel can have reference(s) via the m_snd_tag's into
* Tell the TCP ratelimit code to release the rate-sets attached
* to our ifnet.
*/
tcp_rl_release_ifnet(ifp);
/*
* The kernel can still have reference(s) via the m_snd_tag's into
* the ratelimit channels, and these must go away before
* detaching:
*/
Expand Down
6 changes: 6 additions & 0 deletions sys/netinet/tcp_ratelimit.c
Original file line number Diff line number Diff line change
Expand Up @@ -1298,6 +1298,12 @@ tcp_rl_ifnet_departure(void *arg __unused, struct ifnet *ifp)
NET_EPOCH_EXIT(et);
}

void
tcp_rl_release_ifnet(struct ifnet *ifp)
{
tcp_rl_ifnet_departure(NULL, ifp);
}

static void
tcp_rl_shutdown(void *arg __unused, int howto __unused)
{
Expand Down
9 changes: 9 additions & 0 deletions sys/netinet/tcp_ratelimit.h
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,8 @@ CK_LIST_HEAD(head_tcp_rate_set, tcp_rate_set);
#ifndef ETHERNET_SEGMENT_SIZE
#define ETHERNET_SEGMENT_SIZE 1514
#endif
struct tcpcb;

#ifdef RATELIMIT
#define DETAILED_RATELIMIT_SYSCTL 1 /*
* Undefine this if you don't want
Expand Down Expand Up @@ -131,6 +133,9 @@ tcp_get_pacing_burst_size_w_divisor(struct tcpcb *tp, uint64_t bw, uint32_t segs
void
tcp_rl_log_enobuf(const struct tcp_hwrate_limit_table *rte);

void
tcp_rl_release_ifnet(struct ifnet *ifp);

#else
static inline const struct tcp_hwrate_limit_table *
tcp_set_pacing_rate(struct tcpcb *tp, struct ifnet *ifp,
Expand Down Expand Up @@ -218,6 +223,10 @@ tcp_rl_log_enobuf(const struct tcp_hwrate_limit_table *rte)
{
}

static inline void
tcp_rl_release_ifnet(struct ifnet *ifp)
{
}
#endif

/*
Expand Down

0 comments on commit 1f628be

Please sign in to comment.