From: Toke Høiland-Jørgensen
To: Alexander Lobakin
Cc: John Fastabend, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer,
    Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
    Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Jesse Brandeburg, Yajun Deng, Willem de Bruijn,
    bpf@vger.kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: Re: [xdp-hints] Re: [PATCH RFC bpf-next 00/52] bpf, xdp: introduce and use Generic Hints/metadata
In-Reply-To: <20220704154440.7567-1-alexandr.lobakin@intel.com>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
    <62bbedf07f44a_2181420830@john.notmuch> <87iloja8ly.fsf@toke.dk>
    <20220704154440.7567-1-alexandr.lobakin@intel.com>
Date: Mon, 04 Jul 2022 19:14:04 +0200
Message-ID: <87a69o94wz.fsf@toke.dk>

Alexander Lobakin writes:

> From: Toke Høiland-Jørgensen
> Date: Wed, 29 Jun 2022 15:43:05 +0200
>
>> John Fastabend writes:
>>
>> > Alexander Lobakin wrote:
>> >> This RFC is to give the whole picture. It will most likely be split
>> >> onto several series, maybe even merge cycles. See the "table of
>> >> contents" below.
>> >
>> > Even for RFC its a bit much. Probably improve the summary
>> > message here as well I'm still not clear on the overall
>> > architecture so not sure I want to dig into patches.
>>
>> +1 on this, and piggybacking on your comment to chime in on the general
>> architecture.
>>
>> >> Now, a NIC driver, or even a SmartNIC itself, can put those params
>> >> there in a well-defined format.
>> >> The format is fixed, but can be of
>> >> several different types represented by structures, which definitions
>> >> are available to the kernel, BPF programs and the userland.
>> >
>> > I don't think in general the format needs to be fixed.
>>
>> No, that's the whole point of BTF: it's not supposed to be UAPI, we'll
>> use CO-RE to enable dynamic formats...
>>
>> [...]
>>
>> >> It is fixed due to it being almost a UAPI, and the exact format can
>> >> be determined by reading the last 10 bytes of metadata. They contain
>> >> a 2-byte magic ID to not confuse it with a non-compatible meta and
>> >> a 8-byte combined BTF ID + type ID: the ID of the BTF where this
>> >> structure is defined and the ID of that definition inside that BTF.
>> >> Users can obtain BTF IDs by structure types using helpers available
>> >> in the kernel, BPF (written by the CO-RE/verifier) and the userland
>> >> (libbpf -> kernel call) and then rely on those ID when reading data
>> >> to make sure whether they support it and what to do with it.
>> >> Why separate magic and ID? The idea is to make different formats
>> >> always contain the basic/"generic" structure embedded at the end.
>> >> This way we can still benefit in purely generic consumers (like
>> >> cpumap) while providing some "extra" data to those who support it.
>> >
>> > I don't follow this. If you have a struct in your driver name it
>> > something obvious, ice_xdp_metadata. If I understand things
>> > correctly just dump the BTF for the driver, extract the
>> > struct and done you can use CO-RE reads. For the 'fixed' case
>> > this looks easy. And I don't think you even need a patch for this.
>>
>> ...however as we've discussed previously, we do need a bit of
>> infrastructure around this. In particular, we need to embed the embed
>> the BTF ID into the metadata itself so BPF can do runtime disambiguation
>> between different formats (and add the right CO-RE primitives to make
>> this easy).
>> This is for two reasons:
>>
>> - The metadata might be different per-packet (e.g., PTP packets with
>>   timestamps interleaved with bulk data without them)
>>
>> - With redirects we may end up processing packets from different devices
>>   in a single XDP program (in devmap or cpumap, or on a veth) so we need
>>   to be able to disambiguate at runtime.
>>
>> So I think the part of the design that puts the BTF ID into the end of
>> the metadata struct is sound; however, the actual format doesn't have to
>> be fixed, we can use CO-RE to pick out the bits that a given BPF program
>> needs; we just need a convention for how drivers report which format(s)
>> they support. Which we should also agree on (and add core infrastructure
>> around) so each driver doesn't go around inventing their own
>> conventions.
>>
>> >> The enablement of this feature is controlled on attaching/replacing
>> >> XDP program on an interface with two new parameters: that combined
>> >> BTF+type ID and metadata threshold.
>> >> The threshold specifies the minimum frame size which a driver (or
>> >> NIC) should start composing metadata from. It is introduced instead
>> >> of just false/true flag due to that often it's not worth it to spend
>> >> cycles to fetch all that data for such small frames: let's say, it
>> >> can be even faster to just calculate checksums for them on CPU
>> >> rather than touch non-coherent DMA zone. Simple XDP_DROP case loses
>> >> 15 Mpps on 64 byte frames with enabled metadata, threshold can help
>> >> mitigate that.
>> >
>> > I would put this in the bonus category. Can you do the simple thing
>> > above without these extra bits and then add them later. Just
>> > pick some overly conservative threshold to start with.
>>
>> Yeah, I'd agree this kind of configuration is something that can be
>> added later, and also it's sort of orthogonal to the consumption of the
>> metadata itself.
>>
>> Also, tying this configuration into the loading of an XDP program is a
>> terrible interface: these are hardware configuration options, let's just
>> put them into ethtool or 'ip link' like any other piece of device
>> configuration.
>
> I don't believe it fits there, especially Ethtool. Ethtool is for
> hardware configuration, XDP/AF_XDP is 95% software stuff (apart from
> offload bits which is purely NFP's for now).

But XDP-hints is about consuming hardware features. When you're
configuring which metadata items you want, you're saying "please provide
me with these (hardware) features". So ethtool is an excellent place to
do that :)

> I follow that way:
>
> 1) you pick a program you want to attach;
> 2) usually they are written for special needs and usecases;
> 3) so most likely that program will be tied with metadata/driver/etc
>    in some way;
> 4) so you want to enable Hints of a particular format primarily for
>    this program and usecase, same with threshold and everything
>    else.
>
> Pls explain how you see it, I might be wrong for sure.

As above: XDP hints is about giving XDP programs (and AF_XDP consumers)
access to metadata that is not currently available. Tying the lifetime
of that hardware configuration (i.e., which information to provide) to
the lifetime of an XDP program is not a good interface: for one thing,
how will it handle multiple programs? What about when XDP is not used at
all but you still want to configure the same features?

In addition, in every other case where we do dynamic data access (with
CO-RE) the BPF program is a consumer that modifies itself to access the
data provided by the kernel. I get that this is harder to achieve for
AF_XDP, but then let's solve that instead of making a totally
inconsistent interface for XDP.

I'm as excited as you about the prospect of having totally programmable
hardware where you can just specify any arbitrary metadata format and
it'll provide that for you.
But that is an orthogonal feature: let's start with
creating a dynamic interface for consuming the (static) hardware
features we already have, and then later we can have a separate
interface for configuring more dynamic hardware features. XDP-hints is
about adding this consumption feature in a way that's sufficiently
dynamic that we can do the other (programmable hardware) thing on top
later...

-Toke
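
For illustration only, the trailer and the runtime disambiguation discussed in this thread could be sketched in plain userspace C roughly as follows. This is not code from the RFC: the thread only states that the last 10 bytes of metadata hold a 2-byte magic and an 8-byte combined BTF ID + type ID. The struct and function names, the magic value, the field order within the trailer, host byte order, and the packing of the combined ID (BTF ID in the high 32 bits) are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and values throughout; only the sizes (2-byte magic,
 * 8-byte combined ID, 10 bytes at the end of metadata) come from the thread. */
#define HINTS_MAGIC       0x4858u /* placeholder, not the RFC's value */
#define HINTS_TRAILER_LEN 10u

struct hints_id {
	uint64_t full_id; /* combined BTF ID + type ID */
	uint16_t magic;
};

/* Assumed packing: BTF object ID in the high 32 bits, type ID in the low. */
uint64_t hints_make_id(uint32_t btf_id, uint32_t type_id)
{
	return ((uint64_t)btf_id << 32) | type_id;
}

uint32_t hints_btf_id(uint64_t full_id)  { return (uint32_t)(full_id >> 32); }
uint32_t hints_type_id(uint64_t full_id) { return (uint32_t)full_id; }

/* Stand-in for the driver side: store the ID, then the magic, in the last
 * 10 bytes of the metadata area (field order and byte order assumed). */
void hints_write_trailer(uint8_t *meta, size_t meta_len, uint64_t full_id)
{
	uint16_t magic = HINTS_MAGIC;
	uint8_t *t;

	assert(meta_len >= HINTS_TRAILER_LEN);
	t = meta + meta_len - HINTS_TRAILER_LEN;
	memcpy(t, &full_id, sizeof(full_id));
	memcpy(t + sizeof(full_id), &magic, sizeof(magic));
}

/* Consumer side: returns 0 and fills *out when a trailer with the expected
 * magic is present, -1 when the area is too short or the magic differs. */
int hints_read_trailer(const uint8_t *meta, size_t meta_len,
		       struct hints_id *out)
{
	const uint8_t *t;

	if (meta_len < HINTS_TRAILER_LEN)
		return -1;
	t = meta + meta_len - HINTS_TRAILER_LEN;
	memcpy(&out->full_id, t, sizeof(out->full_id));
	memcpy(&out->magic, t + sizeof(out->full_id), sizeof(out->magic));
	return out->magic == HINTS_MAGIC ? 0 : -1;
}

/* Runtime disambiguation: index of full_id in the list of formats the
 * consumer understands, or -1 if the format is unknown to it. */
int hints_match(uint64_t full_id, const uint64_t *known, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (known[i] == full_id)
			return (int)i;
	return -1;
}
```

A consumer would call `hints_read_trailer()` per packet and then `hints_match()` against the formats it supports, which covers both cases raised above (per-packet differences and packets redirected from other devices). In an actual BPF program the expected IDs would presumably be filled in through CO-RE rather than hard-coded, and the bounds check would be done against the packet's metadata pointers.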