2020-02-28 10:56:23

by Luigi Rizzo

Subject: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

Add a netdevice flag to control skb linearization in generic xdp mode.

The attribute can be modified through
/sys/class/net/<DEVICE>/xdpgeneric_linearize
The default is 1 (on)

Motivation: xdp expects linear skbs with some minimum headroom, and
generic xdp calls skb_linearize() if needed. The linearization is
expensive, and may be unnecessary e.g. when the xdp program does
not need access to the whole payload.
This sysfs entry allows users to opt out of linearization on a
per-device basis (linearization is still performed on cloned skbs).

On a kernel instrumented to grab timestamps around the linearization
code in netif_receive_generic_xdp, and heavy netperf traffic with 1500b
mtu, I see the following times (nanoseconds/pkt)

The receiver generally sees larger packets so the difference is more
significant.

ns/pkt                   RECEIVER                   SENDER

                     p50     p90     p99       p50     p90     p99

LINEARIZATION:     600ns  1090ns  4900ns     149ns   249ns   460ns
NO LINEARIZATION:   40ns    59ns    90ns      40ns    50ns   100ns

v1 --> v2 : added Documentation
v2 --> v3 : adjusted for skb_cloned
v3 --> v4 : renamed to xdpgeneric_linearize, documentation

Signed-off-by: Luigi Rizzo <[email protected]>
---
Documentation/ABI/testing/sysfs-class-net | 10 ++++++++++
include/linux/netdevice.h | 3 ++-
net/core/dev.c | 8 ++++++--
net/core/net-sysfs.c | 16 ++++++++++++++++
4 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index 664a8f6a634f..d5531bf223d7 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -301,3 +301,13 @@ Contact: [email protected]
Description:
32-bit unsigned integer counting the number of times the link has
been down
+
+What: /sys/class/net/<iface>/xdpgeneric_linearize
+Date: Feb 2020
+KernelVersion: 5.6
+Contact: [email protected]
+Description:
+ boolean controlling whether skbs should be linearized in
+ generic XDP. Defaults to true. Turning this off can increase
+ the performance of generic XDP at the cost of making the XDP
+ program unable to access packet fragments after the first one.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6c3f7032e8d9..f06294b2e8bb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1985,7 +1985,8 @@ struct net_device {

struct netdev_rx_queue *_rx;
unsigned int num_rx_queues;
- unsigned int real_num_rx_queues;
+ unsigned int real_num_rx_queues:31;
+ unsigned int xdpgeneric_linearize : 1;

struct bpf_prog __rcu *xdp_prog;
unsigned long gro_flush_timeout;
diff --git a/net/core/dev.c b/net/core/dev.c
index dbbfff123196..c539489d3166 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4520,9 +4520,12 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
/* XDP packets must be linear and must have sufficient headroom
* of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
* native XDP provides, thus we need to do it here as well.
+ * For non shared skbs, xdpgeneric_linearize controls linearization.
*/
- if (skb_cloned(skb) || skb_is_nonlinear(skb) ||
- skb_headroom(skb) < XDP_PACKET_HEADROOM) {
+ if (skb_cloned(skb) ||
+ (skb->dev->xdpgeneric_linearize &&
+ (skb_is_nonlinear(skb) ||
+ skb_headroom(skb) < XDP_PACKET_HEADROOM))) {
int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
int troom = skb->tail + skb->data_len - skb->end;

@@ -9806,6 +9809,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
dev->gso_max_segs = GSO_MAX_SEGS;
dev->upper_level = 1;
dev->lower_level = 1;
+ dev->xdpgeneric_linearize = 1;

INIT_LIST_HEAD(&dev->napi_list);
INIT_LIST_HEAD(&dev->unreg_list);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index cf0215734ceb..eab06a427d90 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -442,6 +442,21 @@ static ssize_t proto_down_store(struct device *dev,
}
NETDEVICE_SHOW_RW(proto_down, fmt_dec);

+static int change_xdpgeneric_linearize(struct net_device *dev,
+ unsigned long val)
+{
+ dev->xdpgeneric_linearize = !!val;
+ return 0;
+}
+
+static ssize_t xdpgeneric_linearize_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ return netdev_store(dev, attr, buf, len, change_xdpgeneric_linearize);
+}
+NETDEVICE_SHOW_RW(xdpgeneric_linearize, fmt_dec);
+
static ssize_t phys_port_id_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -536,6 +551,7 @@ static struct attribute *net_class_attrs[] __ro_after_init = {
&dev_attr_phys_port_name.attr,
&dev_attr_phys_switch_id.attr,
&dev_attr_proto_down.attr,
+ &dev_attr_xdpgeneric_linearize.attr,
&dev_attr_carrier_up_count.attr,
&dev_attr_carrier_down_count.attr,
NULL,
--
2.25.1.481.gfbce0eb801-goog
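
The changed condition in netif_receive_generic_xdp() can be read as a
small truth table. Below is a minimal user-space sketch of the old and
new checks, not kernel code. struct model_skb and the needs_fixup_*()
names are hypothetical and exist only for this illustration.
MODEL_XDP_PACKET_HEADROOM is assumed to mirror XDP_PACKET_HEADROOM (256).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match the kernel's XDP_PACKET_HEADROOM. */
#define MODEL_XDP_PACKET_HEADROOM 256

/* Hypothetical stand-in for the skb properties the check reads. */
struct model_skb {
	bool cloned;            /* models skb_cloned(skb) */
	bool nonlinear;         /* models skb_is_nonlinear(skb) */
	unsigned int headroom;  /* models skb_headroom(skb) */
};

/* Condition before the patch: any one of the three triggers the copy. */
static bool needs_fixup_old(const struct model_skb *skb)
{
	assert(skb);
	return skb->cloned || skb->nonlinear ||
	       skb->headroom < MODEL_XDP_PACKET_HEADROOM;
}

/*
 * Condition after the patch: cloned skbs are always fixed up; the
 * nonlinear and headroom checks only apply while the flag is set.
 */
static bool needs_fixup_new(const struct model_skb *skb, bool linearize)
{
	assert(skb);
	return skb->cloned ||
	       (linearize && (skb->nonlinear ||
			      skb->headroom < MODEL_XDP_PACKET_HEADROOM));
}
```

With the flag set, needs_fixup_new() agrees with needs_fixup_old() for
every input. With the flag cleared, only cloned skbs are still fixed up,
so the headroom expansion is skipped together with the linearization.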


2020-02-28 11:21:40

by Toke Høiland-Jørgensen

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

Luigi Rizzo <[email protected]> writes:

> Add a netdevice flag to control skb linearization in generic xdp mode.
>
> The attribute can be modified through
> /sys/class/net/<DEVICE>/xdpgeneric_linearize
> The default is 1 (on)
>
> Motivation: xdp expects linear skbs with some minimum headroom, and
> generic xdp calls skb_linearize() if needed. The linearization is
> expensive, and may be unnecessary e.g. when the xdp program does
> not need access to the whole payload.
> This sysfs entry allows users to opt out of linearization on a
> per-device basis (linearization is still performed on cloned skbs).
>
> On a kernel instrumented to grab timestamps around the linearization
> code in netif_receive_generic_xdp, and heavy netperf traffic with 1500b
> mtu, I see the following times (nanoseconds/pkt)
>
> The receiver generally sees larger packets so the difference is more
> significant.
>
> ns/pkt                   RECEIVER                   SENDER
>
>                      p50     p90     p99       p50     p90     p99
>
> LINEARIZATION:     600ns  1090ns  4900ns     149ns   249ns   460ns
> NO LINEARIZATION:   40ns    59ns    90ns      40ns    50ns   100ns
>
> v1 --> v2 : added Documentation
> v2 --> v3 : adjusted for skb_cloned
> v3 --> v4 : renamed to xdpgeneric_linearize, documentation
>
> Signed-off-by: Luigi Rizzo <[email protected]>

Acked-by: Toke Høiland-Jørgensen <[email protected]>

2020-02-28 12:13:50

by Michal Kubecek

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On Fri, Feb 28, 2020 at 02:54:35AM -0800, Luigi Rizzo wrote:
> Add a netdevice flag to control skb linearization in generic xdp mode.
>
> The attribute can be modified through
> /sys/class/net/<DEVICE>/xdpgeneric_linearize
> The default is 1 (on)

I'm a bit surprised that it didn't come up in earlier rounds of review,
but I believe (rt)netlink is generally the preferred configuration
interface for network device attributes. Making a new attribute accessible
only through sysfs doesn't seem right. (But it's not my call to make.)

Michal

2020-02-28 12:30:20

by Jesper Dangaard Brouer

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On Fri, 28 Feb 2020 02:54:35 -0800
Luigi Rizzo <[email protected]> wrote:

> diff --git a/net/core/dev.c b/net/core/dev.c
> index dbbfff123196..c539489d3166 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4520,9 +4520,12 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> /* XDP packets must be linear and must have sufficient headroom
> * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
> * native XDP provides, thus we need to do it here as well.
> + * For non shared skbs, xdpgeneric_linearize controls linearization.
> */
> - if (skb_cloned(skb) || skb_is_nonlinear(skb) ||
> - skb_headroom(skb) < XDP_PACKET_HEADROOM) {
> + if (skb_cloned(skb) ||
> + (skb->dev->xdpgeneric_linearize &&
> + (skb_is_nonlinear(skb) ||
> + skb_headroom(skb) < XDP_PACKET_HEADROOM))) {
> int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
> int troom = skb->tail + skb->data_len - skb->end;
>

Have you checked that calling bpf_xdp_adjust_tail() is not breaking anything?

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

2020-02-28 13:20:48

by Luigi Rizzo

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On Fri, Feb 28, 2020 at 4:30 AM Jesper Dangaard Brouer
<[email protected]> wrote:
>
> On Fri, 28 Feb 2020 02:54:35 -0800
> Luigi Rizzo <[email protected]> wrote:
>
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index dbbfff123196..c539489d3166 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4520,9 +4520,12 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> > /* XDP packets must be linear and must have sufficient headroom
> > * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
> > * native XDP provides, thus we need to do it here as well.
> > + * For non shared skbs, xdpgeneric_linearize controls linearization.
> > */
> > - if (skb_cloned(skb) || skb_is_nonlinear(skb) ||
> > - skb_headroom(skb) < XDP_PACKET_HEADROOM) {
> > + if (skb_cloned(skb) ||
> > + (skb->dev->xdpgeneric_linearize &&
> > + (skb_is_nonlinear(skb) ||
> > + skb_headroom(skb) < XDP_PACKET_HEADROOM))) {
> > int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
> > int troom = skb->tail + skb->data_len - skb->end;
> >
>
> Have you checked that calling bpf_xdp_adjust_tail() is not breaking anything?

It won't leak memory or cause crashes if that is what you mean.
Of course if there are more segments the effect won't be the desired one,
as it will chop off the tail of the first segment.

But this is an opt-in feature and requires the same permissions needed to load
an xdp program, so I expect it to be used consciously.

It would be nice if we had a flag in the xdp_buff to communicate that
the packet is incomplete, but there isn't a way that I can see.

cheers
luigi
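
The tail-adjustment effect described in this reply can be modelled in
user space. This is only a sketch under the assumption, taken from the
message above, that a tail trim on a non-linearized packet shortens the
first (linear) segment and leaves the fragments untouched. struct
model_pkt and both functions are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-part packet: a linear head plus fragment bytes. */
struct model_pkt {
	size_t head_len;  /* bytes in the linear area, visible to XDP */
	size_t frag_len;  /* bytes in fragments, not visible to XDP */
};

/*
 * Model of a tail trim applied to the XDP view of the packet.
 * Returns 0 on success, -1 if the trim exceeds the linear area.
 */
static int model_trim_tail(struct model_pkt *pkt, size_t bytes)
{
	assert(pkt);
	if (bytes > pkt->head_len)
		return -1;
	/* Only the first segment shrinks; the fragments stay as they were. */
	pkt->head_len -= bytes;
	return 0;
}

/* Total packet length as later layers would see it in this model. */
static size_t model_total_len(const struct model_pkt *pkt)
{
	assert(pkt);
	return pkt->head_len + pkt->frag_len;
}
```

In this model, trimming 28 bytes from a packet with a 128-byte head and
1372 bytes of fragments removes bytes 100..127 of the packet, not its
last 28 bytes, which is the "not the desired effect" case.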

2020-02-28 19:01:22

by Jakub Kicinski

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On Fri, 28 Feb 2020 02:54:35 -0800 Luigi Rizzo wrote:
> Add a netdevice flag to control skb linearization in generic xdp mode.
>
> The attribute can be modified through
> /sys/class/net/<DEVICE>/xdpgeneric_linearize
> The default is 1 (on)
>
> Motivation: xdp expects linear skbs with some minimum headroom, and
> generic xdp calls skb_linearize() if needed. The linearization is
> expensive, and may be unnecessary e.g. when the xdp program does
> not need access to the whole payload.
> This sysfs entry allows users to opt out of linearization on a
> per-device basis (linearization is still performed on cloned skbs).
>
> On a kernel instrumented to grab timestamps around the linearization
> code in netif_receive_generic_xdp, and heavy netperf traffic with 1500b
> mtu, I see the following times (nanoseconds/pkt)
>
> The receiver generally sees larger packets so the difference is more
> significant.
>
> ns/pkt                   RECEIVER                   SENDER
>
>                      p50     p90     p99       p50     p90     p99
>
> LINEARIZATION:     600ns  1090ns  4900ns     149ns   249ns   460ns
> NO LINEARIZATION:   40ns    59ns    90ns      40ns    50ns   100ns
>
> v1 --> v2 : added Documentation
> v2 --> v3 : adjusted for skb_cloned
> v3 --> v4 : renamed to xdpgeneric_linearize, documentation
>
> Signed-off-by: Luigi Rizzo <[email protected]>

Just load your program in cls_bpf. No extensions or knobs needed.

Making xdpgeneric-only extensions without touching native XDP makes
no sense to me. Is this part of some greater vision?

2020-02-28 23:54:55

by Willem de Bruijn

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On Fri, Feb 28, 2020 at 2:01 PM Jakub Kicinski <[email protected]> wrote:
>
> On Fri, 28 Feb 2020 02:54:35 -0800 Luigi Rizzo wrote:
> > Add a netdevice flag to control skb linearization in generic xdp mode.
> >
> > The attribute can be modified through
> > /sys/class/net/<DEVICE>/xdpgeneric_linearize
> > The default is 1 (on)
> >
> > Motivation: xdp expects linear skbs with some minimum headroom, and
> > generic xdp calls skb_linearize() if needed. The linearization is
> > expensive, and may be unnecessary e.g. when the xdp program does
> > not need access to the whole payload.
> > This sysfs entry allows users to opt out of linearization on a
> > per-device basis (linearization is still performed on cloned skbs).
> >
> > On a kernel instrumented to grab timestamps around the linearization
> > code in netif_receive_generic_xdp, and heavy netperf traffic with 1500b
> > mtu, I see the following times (nanoseconds/pkt)
> >
> > The receiver generally sees larger packets so the difference is more
> > significant.
> >
> > ns/pkt                   RECEIVER                   SENDER
> >
> >                      p50     p90     p99       p50     p90     p99
> >
> > LINEARIZATION:     600ns  1090ns  4900ns     149ns   249ns   460ns
> > NO LINEARIZATION:   40ns    59ns    90ns      40ns    50ns   100ns
> >
> > v1 --> v2 : added Documentation
> > v2 --> v3 : adjusted for skb_cloned
> > v3 --> v4 : renamed to xdpgeneric_linearize, documentation
> >
> > Signed-off-by: Luigi Rizzo <[email protected]>
>
> Just load your program in cls_bpf. No extensions or knobs needed.
>
> Making xdpgeneric-only extensions without touching native XDP makes
> no sense to me. Is this part of some greater vision?

Yes, native xdp has the same issue when handling packets that exceed a
page (4K+ MTU) or otherwise consist of multiple segments. The issue is
just more acute in generic xdp. But agreed that both need to be solved
together.

Many programs need only access to the header. There currently is not a
way to express this, or for xdp to convey that the buffer covers only
part of the packet.

2020-03-03 21:45:13

by Daniel Borkmann

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On 2/29/20 12:53 AM, Willem de Bruijn wrote:
> On Fri, Feb 28, 2020 at 2:01 PM Jakub Kicinski <[email protected]> wrote:
>> On Fri, 28 Feb 2020 02:54:35 -0800 Luigi Rizzo wrote:
>>> Add a netdevice flag to control skb linearization in generic xdp mode.
>>>
>>> The attribute can be modified through
>>> /sys/class/net/<DEVICE>/xdpgeneric_linearize
>>> The default is 1 (on)
>>>
>>> Motivation: xdp expects linear skbs with some minimum headroom, and
>>> generic xdp calls skb_linearize() if needed. The linearization is
>>> expensive, and may be unnecessary e.g. when the xdp program does
>>> not need access to the whole payload.
>>> This sysfs entry allows users to opt out of linearization on a
>>> per-device basis (linearization is still performed on cloned skbs).
>>>
>>> On a kernel instrumented to grab timestamps around the linearization
>>> code in netif_receive_generic_xdp, and heavy netperf traffic with 1500b
>>> mtu, I see the following times (nanoseconds/pkt)
>>>
>>> The receiver generally sees larger packets so the difference is more
>>> significant.
>>>
>>> ns/pkt                   RECEIVER                   SENDER
>>>
>>>                      p50     p90     p99       p50     p90     p99
>>>
>>> LINEARIZATION:     600ns  1090ns  4900ns     149ns   249ns   460ns
>>> NO LINEARIZATION:   40ns    59ns    90ns      40ns    50ns   100ns
>>>
>>> v1 --> v2 : added Documentation
>>> v2 --> v3 : adjusted for skb_cloned
>>> v3 --> v4 : renamed to xdpgeneric_linearize, documentation
>>>
>>> Signed-off-by: Luigi Rizzo <[email protected]>
>>
>> Just load your program in cls_bpf. No extensions or knobs needed.
>>
>> Making xdpgeneric-only extensions without touching native XDP makes
>> no sense to me. Is this part of some greater vision?
>
> Yes, native xdp has the same issue when handling packets that exceed a
> page (4K+ MTU) or otherwise consist of multiple segments. The issue is
> just more acute in generic xdp. But agreed that both need to be solved
> together.
>
> Many programs need only access to the header. There currently is not a
> way to express this, or for xdp to convey that the buffer covers only
> part of the packet.

Right, the only question I had earlier was that when users ship their
application with /sys/class/net/<DEVICE>/xdpgeneric_linearize turned off,
how would they know how much of the data is actually pulled in? Afaik,
some drivers might only have a linear section that covers the eth header
and that is it. What should the BPF prog do in such case? Drop the skb
since it does not have the rest of the data to e.g. make a XDP_PASS
decision or fallback to tc/BPF altogether? I hinted earlier, one way to
make this more graceful is to add a skb pointer inside e.g. struct
xdp_rxq_info and then enable a bpf_skb_pull_data()-like helper e.g. as:

BPF_CALL_2(bpf_xdp_pull_data, struct xdp_buff *, xdp, u32, len)
{
        struct sk_buff *skb = xdp->rxq->skb;

        return skb ? bpf_try_make_writable(skb, len ? :
                                           skb_headlen(skb)) : -ENOTSUPP;
}

Thus, when the data/data_end test fails in generic XDP, the user can
call e.g. bpf_xdp_pull_data(xdp, 64) to make sure we pull in as much as
is needed w/o full linearization and once done the data/data_end can be
repeated to proceed. Native XDP will leave xdp->rxq->skb as NULL, but
later we could perhaps reuse the same bpf_xdp_pull_data() helper for
native with skb-less backing. Thoughts?

Thanks,
Daniel
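
The check / pull / re-check pattern proposed in this message can be
sketched in user space as follows. This is a model only: struct
model_xdp, model_pull_data() and model_access() are hypothetical names,
the 256-byte linear buffer is an arbitrary assumption, and the copy from
a single fragment stands in for whatever the real helper would do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUF_SIZE 256  /* arbitrary capacity of the linear area */

/* Hypothetical packet view: a linear area plus one trailing fragment. */
struct model_xdp {
	unsigned char buf[MODEL_BUF_SIZE]; /* linear area (data..data_end) */
	size_t linear_len;                 /* models data_end - data */
	const unsigned char *frag;         /* non-linear remainder */
	size_t frag_len;
};

/*
 * Model of the proposed helper: make at least 'len' bytes linear by
 * copying from the fragment. Returns 0 on success, -1 if the packet is
 * too short or the linear area cannot hold 'len' bytes.
 */
static int model_pull_data(struct model_xdp *x, size_t len)
{
	size_t need;

	assert(x);
	if (len <= x->linear_len)
		return 0;
	need = len - x->linear_len;
	if (need > x->frag_len || len > MODEL_BUF_SIZE)
		return -1;
	memcpy(x->buf + x->linear_len, x->frag, need);
	x->linear_len += need;
	x->frag += need;
	x->frag_len -= need;
	return 0;
}

/* Return a pointer to 'len' readable bytes, or NULL if unavailable. */
static const unsigned char *model_access(struct model_xdp *x, size_t len)
{
	/* First bounds check failed: try to pull in just what is needed. */
	if (len > x->linear_len && model_pull_data(x, len) != 0)
		return NULL;
	/* Repeat the bounds check after the pull before touching data. */
	return len <= x->linear_len ? x->buf : NULL;
}
```

A caller asking for 8 bytes from a packet with 4 linear bytes gets the
missing 4 copied in, without linearizing the rest of the fragment.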

2020-03-03 21:49:56

by Jakub Kicinski

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On Tue, 3 Mar 2020 20:46:55 +0100 Daniel Borkmann wrote:
> Thus, when the data/data_end test fails in generic XDP, the user can
> call e.g. bpf_xdp_pull_data(xdp, 64) to make sure we pull in as much as
> is needed w/o full linearization and once done the data/data_end can be
> repeated to proceed. Native XDP will leave xdp->rxq->skb as NULL, but
> later we could perhaps reuse the same bpf_xdp_pull_data() helper for
> native with skb-less backing. Thoughts?

I'm curious why we consider a xdpgeneric-only addition. Is attaching
a cls_bpf program noticeably slower than xdpgeneric?

2020-03-03 21:50:51

by Daniel Borkmann

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On 3/3/20 9:50 PM, Jakub Kicinski wrote:
> On Tue, 3 Mar 2020 20:46:55 +0100 Daniel Borkmann wrote:
>> Thus, when the data/data_end test fails in generic XDP, the user can
>> call e.g. bpf_xdp_pull_data(xdp, 64) to make sure we pull in as much as
>> is needed w/o full linearization and once done the data/data_end can be
>> repeated to proceed. Native XDP will leave xdp->rxq->skb as NULL, but
>> later we could perhaps reuse the same bpf_xdp_pull_data() helper for
>> native with skb-less backing. Thoughts?
>
> I'm curious why we consider a xdpgeneric-only addition. Is attaching
> a cls_bpf program noticeably slower than xdpgeneric?

Yeah, agree, I'm curious about that part as well.

Thanks,
Daniel

2020-03-03 21:51:28

by Willem de Bruijn

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On Tue, Mar 3, 2020 at 3:50 PM Jakub Kicinski <[email protected]> wrote:
>
> On Tue, 3 Mar 2020 20:46:55 +0100 Daniel Borkmann wrote:
> > Thus, when the data/data_end test fails in generic XDP, the user can
> > call e.g. bpf_xdp_pull_data(xdp, 64) to make sure we pull in as much as
> > is needed w/o full linearization and once done the data/data_end can be
> > repeated to proceed. Native XDP will leave xdp->rxq->skb as NULL, but
> > later we could perhaps reuse the same bpf_xdp_pull_data() helper for
> > native with skb-less backing. Thoughts?

Something akin to pskb_may_pull sounds like a great solution to me.

Another approach would be a new xdp_action XDP_NEED_LINEARIZED that
causes the program to be restarted after linearization. But that is both
more expensive and less elegant.

Instead of a sysctl or device option, is this an optimization that
could be taken based on the program? Specifically, would XDP_FLAGS be
a path to pass a SUPPORT_SG flag along with the program? I'm not
entirely familiar with the XDP setup code, so this may be totally
off. But from a quick read it seems like generic_xdp_install could
transfer such a flag to struct net_device.

> I'm curious why we consider a xdpgeneric-only addition. Is attaching
> a cls_bpf program noticeably slower than xdpgeneric?

This just should not be xdp*generic* only, but allow us to use any XDP
with large MTU sizes and without having to disable GRO. I'd still like a
way to be able to drop or modify packets before GRO, or to signal that
a type of packet should skip GRO.

2020-03-04 09:19:41

by Jesper Dangaard Brouer

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On Tue, 3 Mar 2020 16:10:14 -0500
Willem de Bruijn <[email protected]> wrote:

> On Tue, Mar 3, 2020 at 3:50 PM Jakub Kicinski <[email protected]> wrote:
> >
> > On Tue, 3 Mar 2020 20:46:55 +0100 Daniel Borkmann wrote:
> > > Thus, when the data/data_end test fails in generic XDP, the user can
> > > call e.g. bpf_xdp_pull_data(xdp, 64) to make sure we pull in as much as
> > > is needed w/o full linearization and once done the data/data_end can be
> > > repeated to proceed. Native XDP will leave xdp->rxq->skb as NULL, but
> > > later we could perhaps reuse the same bpf_xdp_pull_data() helper for
> > > native with skb-less backing. Thoughts?
>
> Something akin to pskb_may_pull sounds like a great solution to me.
>
> Another approach would be a new xdp_action XDP_NEED_LINEARIZED that
> causes the program to be restarted after linearization. But that is both
> more expensive and less elegant.
>
> Instead of a sysctl or device option, is this an optimization that
> could be taken based on the program? Specifically, would XDP_FLAGS be
> a path to pass a SUPPORT_SG flag along with the program? I'm not
> > entirely familiar with the XDP setup code, so this may be totally
> > off. But from a quick read it seems like generic_xdp_install could
> transfer such a flag to struct net_device.
>
> > I'm curious why we consider a xdpgeneric-only addition. Is attaching
> > a cls_bpf program noticeably slower than xdpgeneric?
>
> This just should not be xdp*generic* only, but allow us to use any XDP
> with large MTU sizes and without having to disable GRO.

This is an important point: "should not be xdp*generic* only".

I really want to see this work for XDP-native *first*, and it seems
that with Daniel's idea, it can also work for XDP-generic. As Jakub
also hinted, it seems strange that people are trying to implement this
for XDP-generic, as I don't think there is any performance advantage
over cls_bpf. We really want this to work from XDP-native.


> I'd still like a way to be able to drop or modify packets before GRO,
> or to signal that a type of packet should skip GRO.

That is a use-case that we should remember to support.

Samih (cc'ed) is working on adding multi-frame support[1] to XDP-native.
Given the huge interest this thread shows, I think I will dedicate
some of my time to help him out on the actual coding.

For my idea to work[2], we first need storage space for the multi-buffer
references, and I propose we use the skb_shared_info area, which is
available anyhow for XDP_PASS that calls build_skb(). Thus, we first
need to standardize, across all XDP drivers, how and where this memory
area is referenced/offset.


[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
[2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org#storage-space-for-multi-buffer-referencessegments
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

2020-03-04 10:08:03

by Luigi Rizzo

Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

[taking one message in the thread to answer multiple issues]

On Tue, Mar 3, 2020 at 11:47 AM Daniel Borkmann <[email protected]> wrote:
>
> On 2/29/20 12:53 AM, Willem de Bruijn wrote:
> > On Fri, Feb 28, 2020 at 2:01 PM Jakub Kicinski <[email protected]> wrote:
> >> On Fri, 28 Feb 2020 02:54:35 -0800 Luigi Rizzo wrote:
> >>> Add a netdevice flag to control skb linearization in generic xdp mode.
> >>>
> >>> The attribute can be modified through
> >>> /sys/class/net/<DEVICE>/xdpgeneric_linearize
> >>> The default is 1 (on)
...
> >>> ns/pkt                   RECEIVER                   SENDER
> >>>
> >>>                      p50     p90     p99       p50     p90     p99
> >>>
> >>> LINEARIZATION:     600ns  1090ns  4900ns     149ns   249ns   460ns
> >>> NO LINEARIZATION:   40ns    59ns    90ns      40ns    50ns   100ns
...
> >> Just load your program in cls_bpf. No extensions or knobs needed.

Yes, this is indeed an option. Perhaps the only downside is that it
acts after packet taps, so if, say, the program is there to filter
unwanted traffic, we would miss that protection.

...
> >> Making xdpgeneric-only extensions without touching native XDP makes
> >> no sense to me. Is this part of some greater vision?
> >
> > Yes, native xdp has the same issue when handling packets that exceed a
> > page (4K+ MTU) or otherwise consist of multiple segments. The issue is
> > just more acute in generic xdp. But agreed that both need to be solved
> > together.
> >
> > Many programs need only access to the header. There currently is not a
> > way to express this, or for xdp to convey that the buffer covers only
> > part of the packet.
>
> Right, the only question I had earlier was that when users ship their
> application with /sys/class/net/<DEVICE>/xdpgeneric_linearize turned off,
> how would they know how much of the data is actually pulled in? Afaik,

The short answer is that before turning linearization off, the sysadmin
should make sure that the linear section contains enough data for the
program to operate.
When in doubt, leave linearization on and live with the cost.

The long answer (which probably repeats things I already discussed
with some of you): clearly this patch is not perfect, as it lacks ways
for the kernel and the bpf program to communicate
a) whether there is a non-linear section, and
b) whether the bpf program understands non-linear/partial packets and
   how much linear data (and headroom) it expects.

Adding these two features needs some agreement on the details.
We had a thread a few weeks ago about multi-segment xdp support. I am
not sure we reached a conclusion, and I am concerned that we may end up
reimplementing sg lists or simplified skbs for use in bpf programs,
where perhaps we could just live with a pull_up/accessor for occasional
access to the non-linear part, and some hints that the program can pass
to the driver/xdpgeneric to specify its requirements for #b.

Specifically:
#a is trivial -- add a field to the xdp_buff, and a helper to read it
from the bpf program;
#b is a bit less clear -- it involves a helper to either pull_up or
access the non-linear data (which one is preferable probably depends on
the use case, and we may want both), and some attribute that the program
passes to the kernel at load time, to control when linearization should
be applied. I have hacked the 'license' section to pass this information
on a per-program basis, but we need a cleaner way.

My reasoning for suggesting this patch, as an interim solution, is that
being completely opt-in, one can carefully evaluate when it is safe to use
even without having #b implemented.
For #a, the program might infer (but not reliably) that some data are
missing by looking at the payload length, which may be present in some
of the headers. We could mitigate abuse by e.g. making XDP_REDIRECT and
XDP_TX in xdpgeneric accept only linear packets.

cheers
luigi
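
The length-based inference mentioned for #a can be sketched as below.
This is a user-space illustration under stated assumptions: an untagged
Ethernet frame carrying IPv4, with the function name and MODEL_ETH_HLEN
being hypothetical. As the message says, the result is a hint, not a
reliable signal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_HLEN 14  /* untagged Ethernet header length */

/*
 * Compare the IPv4 total-length field with the bytes actually visible
 * in the linear area. Returns 1 if the header claims more bytes than
 * are visible, 0 if the packet fits, and -1 if the frame is not
 * untagged IPv4 or too short to tell.
 */
static int model_ipv4_looks_truncated(const uint8_t *data, size_t linear_len)
{
	size_t claimed;

	assert(data);
	/* Need the Ethernet header plus the first 4 bytes of the IP header. */
	if (linear_len < MODEL_ETH_HLEN + 4)
		return -1;
	/* EtherType sits at offset 12; 0x0800 means IPv4. */
	if (data[12] != 0x08 || data[13] != 0x00)
		return -1;
	/* IPv4 total length: bytes 2..3 of the IP header, big endian. */
	claimed = ((size_t)data[MODEL_ETH_HLEN + 2] << 8) |
		  data[MODEL_ETH_HLEN + 3];
	return MODEL_ETH_HLEN + claimed > linear_len;
}
```

VLAN tags, IPv6 and tunnels are not handled here, which is one reason
such a check cannot replace an explicit flag in the xdp_buff.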

> some drivers might only have a linear section that covers the eth header
> and that is it. What should the BPF prog do in such case? Drop the skb
> since it does not have the rest of the data to e.g. make a XDP_PASS
> decision or fallback to tc/BPF altogether? I hinted earlier, one way to
> make this more graceful is to add a skb pointer inside e.g. struct
> xdp_rxq_info and then enable a bpf_skb_pull_data()-like helper e.g. as:
>
> BPF_CALL_2(bpf_xdp_pull_data, struct xdp_buff *, xdp, u32, len)
> {
>         struct sk_buff *skb = xdp->rxq->skb;
>
>         return skb ? bpf_try_make_writable(skb, len ? :
>                                            skb_headlen(skb)) : -ENOTSUPP;
> }
>
> Thus, when the data/data_end test fails in generic XDP, the user can
> call e.g. bpf_xdp_pull_data(xdp, 64) to make sure we pull in as much as
> is needed w/o full linearization and once done the data/data_end can be
> repeated to proceed. Native XDP will leave xdp->rxq->skb as NULL, but
> later we could perhaps reuse the same bpf_xdp_pull_data() helper for
> native with skb-less backing. Thoughts?
>
> Thanks,
> Daniel