Add a netdevice flag to control skb linearization in generic xdp mode.
The attribute can be modified through
/sys/class/net/<DEVICE>/xdp_linearize
The default is 1 (on).
Motivation: xdp expects linear skbs with some minimum headroom, and
generic xdp calls skb_linearize() if needed. The linearization is
expensive, however, and may be unnecessary, e.g. when the xdp program
does not need access to the whole payload.
This sysfs entry allows users to opt out of linearization on a
per-device basis (linearization is still performed on cloned skbs).
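
A minimal userspace sketch of toggling the attribute from C follows. The
attribute only exists on a kernel carrying this patch; the helper names
xdp_linearize_path() and set_flag() are hypothetical and chosen here for
illustration only.

```c
#include <stdio.h>

/* Hypothetical helper: build /sys/class/net/<ifname>/xdp_linearize in buf.
 * Returns 0 on success, -1 if the path does not fit.
 */
static int xdp_linearize_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/xdp_linearize", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: write "0" or "1" to an attribute file.
 * Returns 0 on success, -1 on any I/O error.
 */
static int set_flag(const char *path, int on)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", on ? 1 : 0) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

Calling xdp_linearize_path() for a device name and then set_flag(path, 0)
would opt that device out; writing 1 restores the default. Writing to
sysfs normally requires root.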
On a kernel instrumented to grab timestamps around the linearization
code in netif_receive_generic_xdp, under heavy netperf traffic with a
1500-byte MTU, I see the times below (nanoseconds/pkt).
The receiver generally sees larger packets, so the difference is more
significant there.
ns/pkt                    RECEIVER                 SENDER
                       p50     p90     p99     p50     p90     p99
LINEARIZATION:       600ns  1090ns  4900ns   149ns   249ns   460ns
NO LINEARIZATION:     40ns    59ns    90ns    40ns    50ns   100ns
v1 --> v2 : added Documentation
v2 --> v3 : adjusted for skb_cloned
Signed-off-by: Luigi Rizzo <[email protected]>
---
Documentation/ABI/testing/sysfs-class-net | 8 ++++++++
include/linux/netdevice.h | 3 ++-
net/core/dev.c | 8 ++++++--
net/core/net-sysfs.c | 15 +++++++++++++++
4 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index 664a8f6a634f3..5917af789c53c 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -301,3 +301,11 @@ Contact: [email protected]
Description:
32-bit unsigned integer counting the number of times the link has
been down
+
+What: /sys/class/net/<iface>/xdp_linearize
+Date: Jan 2020
+KernelVersion: 5.6
+Contact: [email protected]
+Description:
+ boolean controlling whether skb should be linearized in
+ generic xdp. Defaults to true.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6c3f7032e8d9d..66fe80d9b5d09 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1985,7 +1985,8 @@ struct net_device {
struct netdev_rx_queue *_rx;
unsigned int num_rx_queues;
- unsigned int real_num_rx_queues;
+ unsigned int real_num_rx_queues:31;
+ unsigned int xdp_linearize : 1;
struct bpf_prog __rcu *xdp_prog;
unsigned long gro_flush_timeout;
diff --git a/net/core/dev.c b/net/core/dev.c
index dbbfff123196a..ef54c33de3492 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4520,9 +4520,12 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
/* XDP packets must be linear and must have sufficient headroom
* of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
* native XDP provides, thus we need to do it here as well.
+ * For non shared skbs linearization is controlled by xdp_linearize.
*/
- if (skb_cloned(skb) || skb_is_nonlinear(skb) ||
- skb_headroom(skb) < XDP_PACKET_HEADROOM) {
+ if (skb_cloned(skb) ||
+ (skb->dev->xdp_linearize &&
+ (skb_is_nonlinear(skb) ||
+ skb_headroom(skb) < XDP_PACKET_HEADROOM))) {
int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
int troom = skb->tail + skb->data_len - skb->end;
@@ -9806,6 +9809,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
dev->gso_max_segs = GSO_MAX_SEGS;
dev->upper_level = 1;
dev->lower_level = 1;
+ dev->xdp_linearize = 1;
INIT_LIST_HEAD(&dev->napi_list);
INIT_LIST_HEAD(&dev->unreg_list);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index cf0215734ceb0..bcd45e9a20668 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -442,6 +442,20 @@ static ssize_t proto_down_store(struct device *dev,
}
NETDEVICE_SHOW_RW(proto_down, fmt_dec);
+static int change_xdp_linearize(struct net_device *dev, unsigned long val)
+{
+ dev->xdp_linearize = !!val;
+ return 0;
+}
+
+static ssize_t xdp_linearize_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ return netdev_store(dev, attr, buf, len, change_xdp_linearize);
+}
+NETDEVICE_SHOW_RW(xdp_linearize, fmt_dec);
+
static ssize_t phys_port_id_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -536,6 +550,7 @@ static struct attribute *net_class_attrs[] __ro_after_init = {
&dev_attr_phys_port_name.attr,
&dev_attr_phys_switch_id.attr,
&dev_attr_proto_down.attr,
+ &dev_attr_xdp_linearize.attr,
&dev_attr_carrier_up_count.attr,
&dev_attr_carrier_down_count.attr,
NULL,
--
2.25.0.265.gbab2e86ba0-goog
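
For readers following the net/core/dev.c hunk, the change to the
condition can be modelled in userspace roughly as below. This is only a
sketch: struct skb_model and both function names are hypothetical
stand-ins for the skb state the kernel reads via skb_cloned(),
skb_is_nonlinear() and skb_headroom(); XDP_PACKET_HEADROOM is 256 as in
the uapi header.

```c
#include <stdbool.h>

#define XDP_PACKET_HEADROOM 256

/* Hypothetical stand-in for the skb properties the check looks at. */
struct skb_model {
	bool cloned;            /* models skb_cloned(skb) */
	bool nonlinear;         /* models skb_is_nonlinear(skb) */
	unsigned int headroom;  /* models skb_headroom(skb) */
};

/* Condition before the patch: any of the three triggers the fixup path. */
static bool needs_fixup_old(const struct skb_model *s)
{
	return s->cloned || s->nonlinear ||
	       s->headroom < XDP_PACKET_HEADROOM;
}

/* Condition after the patch: cloned skbs are always fixed up, the other
 * two checks only apply while the per-device flag is set.
 */
static bool needs_fixup_new(const struct skb_model *s, bool xdp_linearize)
{
	return s->cloned ||
	       (xdp_linearize &&
		(s->nonlinear || s->headroom < XDP_PACKET_HEADROOM));
}
```

Note that, as the condition is written, clearing the flag skips the
headroom check as well as the nonlinear check for non-cloned skbs.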
Luigi Rizzo <[email protected]> writes:
> Add a netdevice flag to control skb linearization in generic xdp mode.
>
> The attribute can be modified through
> /sys/class/net/<DEVICE>/xdp_linearize
> The default is 1 (on)
Calling it just 'xdp_linearize' implies (to me) that it also affects
driver-mode XDP. So maybe generic_xdp_linearize ?
[...]
> +
> +What: /sys/class/net/<iface>/xdp_linearize
> +Date: Jan 2020
> +KernelVersion: 5.6
> +Contact: [email protected]
> +Description:
> + boolean controlling whether skb should be linearized in
> + generic xdp. Defaults to true.
Could you also add a few words explaining what the tradeoff here is?
Something like: "turning this off can increase the performance of
generic XDP at the cost of making the XDP program unable to access
packet fragments after the first one"
-Toke
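
The tradeoff described above can be illustrated with a small userspace
model. It assumes that, without linearization, the XDP program's
data..data_end window covers only the linear head of the skb; struct
pkt_model and the two functions are hypothetical names used for this
sketch, not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a packet as generic XDP would see it. */
struct pkt_model {
	size_t head_len;  /* bytes in the linear area (first fragment) */
	size_t frag_len;  /* bytes held in the remaining fragments */
};

/* Bytes visible between data and data_end under the assumption above. */
static size_t xdp_visible_len(const struct pkt_model *p, bool linearized)
{
	return linearized ? p->head_len + p->frag_len : p->head_len;
}

/* Mirrors the usual "data + off + len <= data_end" bounds check that an
 * XDP program performs before touching packet bytes.
 */
static bool xdp_can_read(const struct pkt_model *p, bool linearized,
			 size_t off, size_t len)
{
	size_t visible = xdp_visible_len(p, linearized);

	return off <= visible && len <= visible - off;
}
```

With a 128-byte head and 1372 bytes in fragments, a read at offset 1000
passes the check only in the linearized case, while header-only programs
are unaffected either way.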
On Fri, Feb 28, 2020 at 2:16 AM Toke Høiland-Jørgensen <[email protected]> wrote:
>
> Luigi Rizzo <[email protected]> writes:
>
> > Add a netdevice flag to control skb linearization in generic xdp mode.
> >
> > The attribute can be modified through
> > /sys/class/net/<DEVICE>/xdp_linearize
> > The default is 1 (on)
>
> Calling it just 'xdp_linearize' implies (to me) that it also affects
> driver-mode XDP. So maybe generic_xdp_linearize ?
Done in v4; renamed to xdpgeneric_linearize for consistency.
...
> > +Description:
> > + boolean controlling whether skb should be linearized in
> > + generic xdp. Defaults to true.
>
> Could you also add a few words explaining what the tradeoff here is?
> Something like: "turning this off can increase the performance of
> generic XDP at the cost of making the XDP program unable to access
> packet fragments after the first one"
done too in v4
thanks
luigi