References: <20200228105435.75298-1-lrizzo@google.com> <20200228110043.2771fddb@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> <3c27d9c0-eb17-b20f-2d10-01f3bdf8c0d6@iogearbox.net>
In-Reply-To: <3c27d9c0-eb17-b20f-2d10-01f3bdf8c0d6@iogearbox.net>
From: Luigi Rizzo
Date: Wed, 4 Mar 2020 02:06:15 -0800
Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization
To: Daniel Borkmann
Cc: Willem de Bruijn, Jakub Kicinski, Network Development, Toke Høiland-Jørgensen, David Miller, Jesper Dangaard Brouer, "Jubran, Samih", linux-kernel, ast@kernel.org, bpf@vger.kernel.org

[taking one message in the thread to answer multiple issues]

On Tue, Mar 3, 2020 at 11:47 AM Daniel Borkmann wrote:
>
> On 2/29/20 12:53 AM, Willem de Bruijn wrote:
> > On Fri, Feb 28, 2020 at 2:01 PM Jakub Kicinski wrote:
> >> On Fri, 28 Feb 2020 02:54:35 -0800 Luigi Rizzo wrote:
> >>> Add a netdevice flag to control skb linearization in generic xdp mode.
> >>>
> >>> The attribute can be modified through
> >>> /sys/class/net//xdpgeneric_linearize
> >>> The default is 1 (on)
...
> >>> ns/pkt                   RECEIVER                   SENDER
> >>>
> >>>                       p50     p90     p99       p50     p90     p99
> >>>
> >>> LINEARIZATION:      600ns  1090ns  4900ns     149ns   249ns   460ns
> >>> NO LINEARIZATION:    40ns    59ns    90ns      40ns    50ns   100ns
...
> >> Just load your program in cls_bpf. No extensions or knobs needed.

Yes, this is indeed an option. Perhaps the only downside is that it acts
after packet taps, so if, say, the program is there to filter unwanted
traffic, we would miss that protection.

...
> >> Making xdpgeneric-only extensions without touching native XDP makes
> >> no sense to me. Is this part of some greater vision?
> >
> > Yes, native xdp has the same issue when handling packets that exceed a
> > page (4K+ MTU) or otherwise consist of multiple segments. The issue is
> > just more acute in generic xdp. But agreed that both need to be solved
> > together.
> >
> > Many programs need only access to the header. There currently is not a
> > way to express this, or for xdp to convey that the buffer covers only
> > part of the packet.
>
> Right, my only question I had earlier was that when users ship their
> application with /sys/class/net//xdpgeneric_linearize turned off,
> how would they know how much of the data is actually pulled in? Afaik,

The short answer is that before turning linearization off, the sysadmin
should make sure that the linear section contains enough data for the
program to operate. When in doubt, leave linearization on and live with
the cost.

The long answer (which probably repeats things I already discussed with
some of you): clearly this patch is not perfect, as it lacks ways for the
kernel and the bpf program to communicate
a) whether there is a non-linear section, and
b) whether the bpf program understands non-linear/partial packets, and
   how much linear data (and headroom) it expects.

Adding these two features needs some agreement on the details.
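To make (a) and (b) concrete, here is a minimal userspace sketch of the decision the kernel could take if it knew both facts. Everything in it is hypothetical: `struct pkt_view`, its `frag_len` field (standing in for (a)), and `struct prog_req` with `min_linear`/`accepts_partial` (standing in for the load-time attribute in (b)) are illustrative names, not kernel APIs, and this is not the layout of struct xdp_buff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a packet as a generic-XDP program
 * would see it. This is NOT the kernel's struct xdp_buff. */
struct pkt_view {
	const uint8_t *data;     /* start of the linear section */
	const uint8_t *data_end; /* end of the linear section */
	uint32_t frag_len;       /* (a) hypothetical: bytes outside the linear section */
};

/* (b) Hypothetical requirements a program would declare at load time. */
struct prog_req {
	uint32_t min_linear;  /* bytes the program must find in the linear section */
	bool accepts_partial; /* program copes with a non-linear/partial packet */
};

static size_t linear_len(const struct pkt_view *p)
{
	assert(p->data <= p->data_end);
	return (size_t)(p->data_end - p->data);
}

/* Kernel-side decision: linearize only when the program could not
 * otherwise run correctly on this packet. */
static bool needs_linearization(const struct pkt_view *p,
				const struct prog_req *r)
{
	if (p->frag_len == 0)
		return false; /* already fully linear, nothing to gain */
	if (!r->accepts_partial)
		return true;  /* program expects to see the whole packet */
	return linear_len(p) < r->min_linear; /* headers not all linear */
}
```

For example, a header-only program declaring 54 bytes (Ethernet + IPv4 + TCP, no options) would force linearization when the driver leaves only the 14-byte Ethernet header in the linear section, but not when 64 bytes are linear; a program that does not accept partial packets always gets a linear skb.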
We had a thread a few weeks ago about multi-segment xdp support. I am not
sure we reached a conclusion, and I am concerned that we may end up
reimplementing sg lists or simplified skbs for use in bpf programs, where
perhaps we could just live with a pull_up/accessor for occasional access
to the non-linear part, plus some hints that the program can pass to the
driver/xdpgeneric to specify its requirements (for #b).

Specifically:
#a is trivial -- add a field to the xdp_buff, and a helper to read it
   from the bpf program;
#b is a bit less clear -- it involves a helper to either pull_up or access
   the non-linear data (which one is preferable probably depends on the use
   case, and we may want both), and some attribute that the program passes
   to the kernel at load time to control when linearization should be
   applied. I have hacked the 'license' section to pass this information
   on a per-program basis, but we need a cleaner way.

My reasoning for suggesting this patch as an interim solution is that,
being completely opt-in, one can carefully evaluate when it is safe to use
even without having #b implemented. For #a, the program might infer (but
not reliably) that some data are missing by looking at the payload length,
which may be present in some of the headers.

We could mitigate abuse by, e.g., forcing XDP_REDIRECT and XDP_TX in
xdpgeneric to only accept linear packets.

cheers
luigi

> some drivers might only have a linear section that covers the eth header
> and that is it. What should the BPF prog do in such case? Drop the skb
> since it does not have the rest of the data to e.g. make a XDP_PASS
> decision or fallback to tc/BPF altogether? I hinted earlier, one way to
> make this more graceful is to add a skb pointer inside e.g. struct
> xdp_rxq_info and then enable an bpf_skb_pull_data()-like helper e.g. as:
>
> BPF_CALL_2(bpf_xdp_pull_data, struct xdp_buff *, xdp, u32, len)
> {
>         struct sk_buff *skb = xdp->rxq->skb;
>
>         return skb ? bpf_try_make_writable(skb, len ? :
>                                            skb_headlen(skb)) : -ENOTSUPP;
> }
>
> Thus, when the data/data_end test fails in generic XDP, the user can
> call e.g. bpf_xdp_pull_data(xdp, 64) to make sure we pull in as much as
> is needed w/o full linearization and once done the data/data_end can be
> repeated to proceed. Native XDP will leave xdp->rxq->skb as NULL, but
> later we could perhaps reuse the same bpf_xdp_pull_data() helper for
> native with skb-less backing. Thoughts?
>
> Thanks,
> Daniel
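The pull-then-recheck flow in Daniel's proposal can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct sim_pkt`, `sim_pull_data()` and `read_byte_at()` are hypothetical stand-ins for an skb-backed packet, the proposed (not merged) bpf_xdp_pull_data() helper, and a BPF program's data/data_end check. As in the quoted helper, a length of 0 falls back to the current linear length, so nothing extra is pulled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_BUF_SIZE 256

/* Hypothetical model: the first `lin` bytes are linear, the rest sit in
 * a single fragment. */
struct sim_pkt {
	uint8_t linear[SIM_BUF_SIZE]; /* linear area, with room to grow */
	size_t lin;                   /* bytes currently in linear[] */
	const uint8_t *frag;          /* remaining (non-linear) bytes */
	size_t frag_len;
};

/* Hypothetical analogue of bpf_xdp_pull_data(): make at least `len`
 * bytes linear without linearizing the whole packet. Returns 0 on
 * success, -1 if the packet is too short or there is no room. */
static int sim_pull_data(struct sim_pkt *p, size_t len)
{
	size_t total, want, move;

	assert(p != NULL);
	total = p->lin + p->frag_len;
	want = len ? len : p->lin; /* len == 0: keep the current linear length */
	if (want > total || want > SIM_BUF_SIZE)
		return -1;
	if (want <= p->lin)
		return 0; /* already linear enough */
	move = want - p->lin;
	memcpy(p->linear + p->lin, p->frag, move);
	p->lin += move;
	p->frag += move;
	p->frag_len -= move;
	return 0;
}

/* Program-side pattern: bounds check, pull on failure, then re-check. */
static int read_byte_at(struct sim_pkt *p, size_t off, uint8_t *out)
{
	const uint8_t *data = p->linear;
	const uint8_t *data_end = p->linear + p->lin;

	if ((size_t)(data_end - data) <= off) {
		/* Pull only up to the byte we need. */
		if (sim_pull_data(p, off + 1) != 0)
			return -1;
		/* Reload the pointers, as a BPF program must after a pull. */
		data = p->linear;
		data_end = p->linear + p->lin;
		if ((size_t)(data_end - data) <= off)
			return -1;
	}
	*out = data[off];
	return 0;
}
```

The pointer reload after the pull mirrors the verifier rule that packet pointers are invalidated by helpers that change packet data, such as bpf_skb_pull_data(); in this model the buffer never moves, but the pattern is kept so the sketch matches what a real program would have to do.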