From: Toke Høiland-Jørgensen
To: John Fastabend, Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
    Jonathan Lemon, Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng, Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: Re: [xdp-hints] Re: [PATCH RFC bpf-next 00/52] bpf, xdp: introduce and use Generic Hints/metadata
In-Reply-To: <62bbedf07f44a_2181420830@john.notmuch>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> <62bbedf07f44a_2181420830@john.notmuch>
Date: Wed, 29 Jun 2022 15:43:05 +0200
Message-ID: <87iloja8ly.fsf@toke.dk>

John Fastabend writes:

> Alexander Lobakin wrote:
>> This RFC is to give the whole picture. It will most likely be split
>> into several series, maybe even merge cycles. See the "table of
>> contents" below.
>
> Even for an RFC it's a bit much. Probably improve the summary
> message here as well; I'm still not clear on the overall
> architecture, so I'm not sure I want to dig into the patches.

+1 on this, and piggybacking on your comment to chime in on the
general architecture.

>> Now, a NIC driver, or even a SmartNIC itself, can put those params
>> there in a well-defined format. The format is fixed, but can be of
>> several different types represented by structures, whose definitions
>> are available to the kernel, BPF programs and the userland.
>
> I don't think in general the format needs to be fixed.

No, that's the whole point of BTF: it's not supposed to be UAPI, we'll
use CO-RE to enable dynamic formats...
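To make the "fixed vs. dynamic format" distinction concrete, here is a plain-C toy model of the idea behind CO-RE. It is not libbpf's actual mechanism: two hypothetical driver metadata structs carry the same `rx_hash` field at different offsets, and the consumer resolves the offset per format instead of compiling in one fixed layout. The struct names, field names and `FMT_*` identifiers are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver metadata formats: same field, different offsets. */
struct drv_a_meta {
    uint32_t rx_hash;
    uint16_t vlan_tci;
};

struct drv_b_meta {
    uint64_t rx_timestamp;
    uint32_t rx_hash;
};

/* Hypothetical format identifiers standing in for BTF type IDs. */
enum meta_format { FMT_DRV_A = 1, FMT_DRV_B = 2 };

/* Toy "relocation": find where rx_hash lives in a given format.
 * Returns -1 if the format does not carry the field at all. */
static long rx_hash_offset(enum meta_format fmt)
{
    switch (fmt) {
    case FMT_DRV_A:
        return (long)offsetof(struct drv_a_meta, rx_hash);
    case FMT_DRV_B:
        return (long)offsetof(struct drv_b_meta, rx_hash);
    default:
        return -1;
    }
}

/* Read rx_hash from a raw metadata buffer of the given format.
 * Returns 0 on success, -1 if the field is absent or out of bounds. */
static int read_rx_hash(const void *meta, size_t len,
                        enum meta_format fmt, uint32_t *out)
{
    long off;

    assert(meta != NULL && out != NULL);
    off = rx_hash_offset(fmt);
    if (off < 0 || (size_t)off + sizeof(*out) > len)
        return -1;
    memcpy(out, (const char *)meta + off, sizeof(*out));
    return 0;
}
```

In real BPF code the offset lookup is not done by hand: field accesses are recorded as CO-RE relocations (for example on structs marked `__attribute__((preserve_access_index))`, or via `bpf_core_field_exists()` from libbpf's `bpf_core_read.h`), and libbpf resolves them against the target BTF when the program is loaded.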
[...]

>> It is fixed because it is almost a UAPI, and the exact format can
>> be determined by reading the last 10 bytes of metadata. They contain
>> a 2-byte magic ID to not confuse it with a non-compatible meta and
>> an 8-byte combined BTF ID + type ID: the ID of the BTF where this
>> structure is defined and the ID of that definition inside that BTF.
>> Users can obtain BTF IDs by structure types using helpers available
>> in the kernel, BPF (written by the CO-RE/verifier) and the userland
>> (libbpf -> kernel call) and then rely on those IDs when reading data
>> to determine whether they support it and what to do with it.
>> Why separate magic and ID? The idea is to make different formats
>> always contain the basic/"generic" structure embedded at the end.
>> This way purely generic consumers (like cpumap) can still benefit,
>> while some "extra" data is provided to those who support it.
>
> I don't follow this. If you have a struct in your driver, name it
> something obvious, ice_xdp_metadata. If I understand things
> correctly, just dump the BTF for the driver, extract the
> struct, and you're done: you can use CO-RE reads. For the 'fixed'
> case this looks easy. And I don't think you even need a patch for
> this.

...however, as we've discussed previously, we do need a bit of
infrastructure around this. In particular, we need to embed the BTF ID
into the metadata itself so BPF can do runtime disambiguation between
different formats (and add the right CO-RE primitives to make this
easy). This is for two reasons:

- The metadata might be different per-packet (e.g., PTP packets with
  timestamps interleaved with bulk data without them).

- With redirects we may end up processing packets from different
  devices in a single XDP program (in devmap or cpumap, or on a veth),
  so we need to be able to disambiguate at runtime.
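The trailer from the quoted RFC text (a 2-byte magic plus an 8-byte combined BTF ID + type ID in the last 10 bytes of the metadata) could be parsed per packet roughly as below. This plain-C sketch rests on assumptions the thread does not state: the `META_MAGIC` value is invented, the combined ID is assumed to pack the BTF object ID in the upper 32 bits and the type ID in the lower 32, the ID is assumed to precede the magic, values are read in host byte order, and `struct generic_meta` with its fields is hypothetical. It also shows why embedding the generic part at the end helps: a generic consumer finds it at the same distance from the end regardless of format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_MAGIC 0x4d48u  /* hypothetical magic value */
#define META_TRAILER_LEN 10 /* 8-byte combined ID + 2-byte magic */

/* Hypothetical generic part, embedded right before the trailer. */
struct generic_meta {
    uint32_t rx_hash;
    uint32_t csum_status;
};

struct meta_id {
    uint32_t btf_obj_id; /* BTF object defining the struct (assumed high half) */
    uint32_t type_id;    /* struct's type ID inside that BTF (assumed low half) */
};

/* Parse the last 10 bytes of a metadata area of length len.
 * Returns 0 on success, -1 if too short or the magic mismatches. */
static int meta_parse_trailer(const uint8_t *meta, size_t len,
                              struct meta_id *id)
{
    uint64_t combined;
    uint16_t magic;

    assert(meta != NULL && id != NULL);
    if (len < META_TRAILER_LEN)
        return -1;
    memcpy(&combined, meta + len - META_TRAILER_LEN, sizeof(combined));
    memcpy(&magic, meta + len - sizeof(magic), sizeof(magic));
    if (magic != META_MAGIC)
        return -1; /* not a compatible metadata format */
    id->btf_obj_id = (uint32_t)(combined >> 32);
    id->type_id = (uint32_t)combined;
    return 0;
}

/* Purely generic consumer: ignores the driver-specific prefix and reads
 * the generic struct that ends where the trailer starts. */
static int meta_read_generic(const uint8_t *meta, size_t len,
                             struct generic_meta *gen)
{
    struct meta_id id;
    size_t need = sizeof(*gen) + META_TRAILER_LEN;

    if (meta_parse_trailer(meta, len, &id) < 0 || len < need)
        return -1;
    memcpy(gen, meta + len - need, sizeof(*gen));
    return 0;
}
```

In an XDP program the metadata area is the region between `ctx->data_meta` and `ctx->data`, so the trailer would be the bytes immediately before `ctx->data`, and each access would need an explicit bounds check for the verifier.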
So I think the part of the design that puts the BTF ID into the end of
the metadata struct is sound; however, the actual format doesn't have
to be fixed: we can use CO-RE to pick out the bits that a given BPF
program needs. We just need a convention for how drivers report which
format(s) they support, which we should also agree on (and add core
infrastructure around) so each driver doesn't go around inventing its
own conventions.

>> The enablement of this feature is controlled on attaching/replacing
>> an XDP program on an interface with two new parameters: that combined
>> BTF+type ID and metadata threshold.
>> The threshold specifies the minimum frame size from which a driver
>> (or NIC) should start composing metadata. It is introduced instead
>> of just a false/true flag because it's often not worth spending
>> cycles to fetch all that data for such small frames: let's say, it
>> can be even faster to just calculate checksums for them on the CPU
>> rather than touch the non-coherent DMA zone. The simple XDP_DROP case
>> loses 15 Mpps on 64-byte frames with metadata enabled; the threshold
>> can help mitigate that.
>
> I would put this in the bonus category. Can you do the simple thing
> above without these extra bits and then add them later? Just
> pick some overly conservative threshold to start with.

Yeah, I'd agree this kind of configuration is something that can be
added later, and also it's sort of orthogonal to the consumption of
the metadata itself.

Also, tying this configuration into the loading of an XDP program is a
terrible interface: these are hardware configuration options, so let's
just put them into ethtool or 'ip link' like any other piece of device
configuration.

>> The RFC can be divided into 8 parts:
>
> I'm missing something: why not do the simplest bit of work and
> get this running in ice with a few smallish driver updates
> so we can all see it? No need for so many patches.

Agreed.
This incremental approach is basically what Jesper's simultaneous
series makes a start on, AFAICT? Would be nice if y'all could converge
the efforts :)

[...]

> I really think you're asking questions that are two or three
> jumps away. Why not do the simplest bit first and kick
> the driver with an on/off switch into this mode? But
> I don't understand this cpumap use case, so maybe explain
> that first.
>
> And sorry, I didn't even look at your 50+ patches. Figured let's
> get agreement on the goal first.

+1 on both of these :)

-Toke