From: Willem de Bruijn
Date: Wed, 13 Jan 2021 09:33:25 -0500
Subject: Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
To: Jason Wang
Cc: Willem de Bruijn, Yuri Benditovich, "David S. Miller", Jakub Kicinski,
    "Michael S . Tsirkin", Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
    John Fastabend, KP Singh, rdunlap@infradead.org,
    "Gustavo A . R . Silva", Herbert Xu, Steffen Klassert,
    Pablo Neira Ayuso, decui@microsoft.com, cai@lca.pw, Jakub Sitnicki,
    Marco Elver, Paolo Abeni, Network Development, linux-kernel,
    kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, bpf,
    Yan Vugenfirer
References: <20210112194143.1494-1-yuri.benditovich@daynix.com>
    <78bbc518-4b73-4629-68fb-2713250f8967@redhat.com>
In-Reply-To: <78bbc518-4b73-4629-68fb-2713250f8967@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 12, 2021 at 11:11 PM Jason Wang wrote:
>
>
> On 2021/1/13 7:47 AM, Willem de Bruijn wrote:
> > On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich wrote:
> >> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich wrote:
> >>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich wrote:
> >>>> Existing TUN module is able to use provided "steering eBPF" to
> >>>> calculate per-packet hash and derive the destination queue to
> >>>> place the packet to. The eBPF uses mapped configuration data
> >>>> containing a key for hash calculation and indirection table
> >>>> with array of queues' indices.
> >>>>
> >>>> This series of patches adds support for virtio-net hash reporting
> >>>> feature as defined in virtio specification. It extends the TUN module
> >>>> and the "steering eBPF" as follows:
> >>>>
> >>>> Extended steering eBPF calculates the hash value and hash type, keeps
> >>>> hash value in the skb->hash and returns index of destination virtqueue
> >>>> and the type of the hash. TUN module keeps returned hash type in
> >>>> (currently unused) field of the skb.
> >>>> skb->__unused renamed to 'hash_report_type'.
> >>>>
> >>>> When TUN module is called later to allocate and fill the virtio-net
> >>>> header and push it to destination virtqueue it populates the hash
> >>>> and the hash type into virtio-net header.
> >>>>
> >>>> VHOST driver is made aware of respective virtio-net feature that
> >>>> extends the virtio-net header to report the hash value and hash report
> >>>> type.
> >>> Comment from Willem de Bruijn:
> >>>
> >>> Skbuff fields are in short supply. I don't think we need to add one
> >>> just for this narrow path entirely internal to the tun device.
> >>>
> >> We understand that and try to minimize the impact by using an already
> >> existing unused field of skb.
> > Not anymore. It was repurposed as a flags field very recently.
> >
> > This use case is also very narrow in scope. And a very short path from
> > data producer to consumer. So I don't think it needs to claim scarce
> > bits in the skb.
> >
> > tun_ebpf_select_queue stores the field, tun_put_user reads it and
> > converts it to the virtio_net_hdr in the descriptor.
> >
> > tun_ebpf_select_queue is called from .ndo_select_queue. Storing the
> > field in skb->cb is fragile, as in theory some code could overwrite
> > that field between ndo_select_queue and
> > ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
> > control again. But in practice, I don't believe anything does.
> >
> > Alternatively an existing skb field that is used only on disjoint
> > datapaths, such as ingress-only, could be viable.
>
>
> A question here. We had metadata support in XDP for cooperation between
> eBPF programs. Do we have something similar in the skb?
>
> E.g. in the RSS, if we want to pass some metadata information between
> eBPF program and the logic that generates the vnet header (either hard
> logic in the kernel or another eBPF program). Is there any way that can
> avoid the possible conflicts of qdiscs?

Not that I am aware of. The closest thing is cb[].

It'll have to alias a field like that, one that is known to be unused
on the given path.

One other approach that has been used within linear call stacks is out
of band. Like percpu variables softnet_data.xmit.more and
mirred_rec_level. But that is perhaps a bit overwrought for this use
case.

> >
> >>> Instead, you could just run the flow_dissector in tun_put_user if the
> >>> feature is negotiated. Indeed, the flow dissector seems more apt to me
> >>> than BPF here. Note that the flow dissector internally can be
> >>> overridden by a BPF program if the admin so chooses.
> >>>
> >> When this set of patches is related to hash delivery in the virtio-net
> >> packet in general,
> >> it was prepared in context of RSS feature implementation as defined in
> >> virtio spec [1]
> >> In case of RSS it is not enough to run the flow_dissector in tun_put_user:
> >> in tun_ebpf_select_queue the TUN calls eBPF to calculate the hash,
> >> hash type and queue index
> >> according to the (mapped) parameters (key, hash types, indirection
> >> table) received from the guest.
> > TUNSETSTEERINGEBPF was added to support more diverse queue selection
> > than the default in case of multiqueue tun. Not sure what the exact
> > use cases are.
> >
> > But RSS is exactly the purpose of the flow dissector. It is used for
> > that purpose in the software variant RPS. The flow dissector
> > implements a superset of the RSS spec, and certainly computes a
> > four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
> > has already computed a 4-tuple hash.
> >
> > What it does not give is a type indication, such as
> > VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
> > In datapaths where the NIC has already computed the four-tuple hash
> > and stored it in skb->hash --the common case for servers--, that type
> > field is the only reason to have to compute again.
>
>
> The problem is there's no guarantee that the packet comes from the NIC,
> it could be a simple VM2VM or host2VM packet.
>
> And even if the packet is coming from the NIC that calculates the hash
> there's no guarantee that it's the hash that the guest wants (guest may
> use different RSS keys).

Ah yes, of course.

I would still revisit the need to store a detailed hash_type along with
the hash because, as far as I can tell, it conveys no actionable
information to the guest.
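
As a purely illustrative aside on the RSS key point above (this is not
code from the patch series): a minimal userspace C sketch of a Toeplitz
hash, the function RSS implementations commonly use. The names
toeplitz_hash and sample_key are hypothetical, chosen only for this
example, and the 40-byte key is the widely published sample key used in
RSS verification tables, not a key any guest in this thread configured.
The sketch only shows that the result depends on the key, which is why a
hash computed with the NIC's key is generally not the value a guest
would compute with its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: widely published 40-byte sample RSS key, used here only
 * as example input. */
static const uint8_t sample_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: for every set bit i of the input (most significant
 * bit first), XOR in the 32-bit window of the key that starts at key
 * bit i. Hypothetical helper, not a kernel API. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    /* The key must cover the input plus one trailing 32-bit window. */
    assert(key_len >= len + 4);

    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit further along the key. */
            window <<= 1;
            if (key[i + 4] & (1u << bit))
                window |= 1;
        }
    }
    return hash;
}
```

Feeding the same input bytes (for RSS, typically the concatenated
addresses and ports) to toeplitz_hash with two different keys generally
gives two different values, so skb->hash from the NIC could only be
passed through as-is if host and guest happened to agree on both the
key and the hashed fields.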