Message-ID: <8f4ad5bc-b849-4ef4-ac1f-8d5a796205e9@daynix.com>
Date: Mon, 9 Oct 2023 05:04:14 +0900
Subject: Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
To: Willem de Bruijn
Cc: Jason Wang, "Michael S.
 Tsirkin", netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
 linux-kselftest@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
 kuba@kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com,
 kpsingh@kernel.org, rdunlap@infradead.org, willemb@google.com,
 gustavoars@kernel.org, herbert@gondor.apana.org.au,
 steffen.klassert@secunet.com, nogikh@google.com, pablo@netfilter.org,
 decui@microsoft.com, jakub@cloudflare.com, elver@google.com,
 pabeni@redhat.com, Yuri Benditovich
References: <20231008052101.144422-1-akihiko.odaki@daynix.com>
 <20231008052101.144422-6-akihiko.odaki@daynix.com>
From: Akihiko Odaki
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/10/09 4:07, Willem de Bruijn wrote:
> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki wrote:
>>
>> virtio-net have two usage of hashes: one is RSS and another is hash
>> reporting. Conventionally the hash calculation was done by the VMM.
>> However, computing the hash after the queue was chosen defeats the
>> purpose of RSS.
>>
>> Another approach is to use eBPF steering program. This approach has
>> another downside: it cannot report the calculated hash due to the
>> restrictive nature of eBPF.
>>
>> Introduce the code to compute hashes to the kernel in order to overcome
>> these challenges.
>> An alternative solution is to extend the eBPF steering
>> program so that it will be able to report to the userspace, but it makes
>> little sense to allow to implement different hashing algorithms with
>> eBPF since the hash value reported by virtio-net is strictly defined by
>> the specification.
>>
>> The hash value already stored in sk_buff is not used and computed
>> independently since it may have been computed in a way not conformant
>> with the specification.
>>
>> Signed-off-by: Akihiko Odaki
>> ---
>
>> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
>> +	.max_indirection_table_length =
>> +		TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
>> +
>> +	.types = VIRTIO_NET_SUPPORTED_HASH_TYPES
>> +};
>
> No need to have explicit capabilities exchange like this? Tun either
> supports all or none.

tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.

This is because the flow dissector does not support IPv6 extension
headers. The specification is also vague: it does not say how many TLVs
should be consumed at most when interpreting the destination options
header, so I chose not to add code for these hash types to the flow
dissector. I doubt anyone will complain about it since nobody has
complained about this for Linux.

I'm also adding this so that we can extend it later.
max_indirection_table_length may grow for systems with 128+ CPUs, or
types may gain other bits for new protocols in the future.

>
>>  	case TUNSETSTEERINGEBPF:
>> -		ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
>> +		bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
>> +		if (IS_ERR(bpf_ret))
>> +			ret = PTR_ERR(bpf_ret);
>> +		else if (bpf_ret)
>> +			tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
>
> Don't make one feature disable another.
>
> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> functions. If one is enabled the other call should fail, with EBUSY
> for instance.
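
A minimal userspace sketch of the mutual exclusion suggested above, in
case it helps the discussion. The struct and function names here
(tun_model, model_set_steering_ebpf, model_set_vnet_hash) are
hypothetical stand-ins and not code from the patch; the point is only
that each setter refuses with -EBUSY while the other feature is active,
instead of silently turning the other one off.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the relevant tun state. */
struct tun_model {
	bool steering_prog_attached;	/* would be set via TUNSETSTEERINGEBPF */
	bool vnet_hash_rss_enabled;	/* would be set via TUNSETVNETHASH */
};

/* Models TUNSETSTEERINGEBPF: refuse to attach while RSS hashing is on. */
static int model_set_steering_ebpf(struct tun_model *tun, bool attach)
{
	if (attach && tun->vnet_hash_rss_enabled)
		return -EBUSY;

	tun->steering_prog_attached = attach;
	return 0;
}

/* Models TUNSETVNETHASH: refuse to enable RSS while a program is attached. */
static int model_set_vnet_hash(struct tun_model *tun, bool enable_rss)
{
	if (enable_rss && tun->steering_prog_attached)
		return -EBUSY;

	tun->vnet_hash_rss_enabled = enable_rss;
	return 0;
}
```

With this shape, userspace has to disable one feature explicitly before
enabling the other, so neither ioctl changes state it does not own.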
>
>> +	case TUNSETVNETHASH:
>> +		len = sizeof(vnet_hash);
>> +		if (copy_from_user(&vnet_hash, argp, len)) {
>> +			ret = -EFAULT;
>> +			break;
>> +		}
>> +
>> +		if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
>> +		     (tun->vnet_hdr_sz < sizeof(struct virtio_net_hdr_v1_hash) ||
>> +		      !tun_is_little_endian(tun))) ||
>> +		     vnet_hash.indirection_table_mask >=
>> +		     TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
>> +			ret = -EINVAL;
>> +			break;
>> +		}
>> +
>> +		argp = (u8 __user *)argp + len;
>> +		len = (vnet_hash.indirection_table_mask + 1) * 2;
>> +		if (copy_from_user(vnet_hash_indirection_table, argp, len)) {
>> +			ret = -EFAULT;
>> +			break;
>> +		}
>> +
>> +		argp = (u8 __user *)argp + len;
>> +		len = virtio_net_hash_key_length(vnet_hash.types);
>> +
>> +		if (copy_from_user(vnet_hash_key, argp, len)) {
>> +			ret = -EFAULT;
>> +			break;
>> +		}
>
> Probably easier and less error-prone to define a fixed size control
> struct with the max indirection table size.

I made its size variable because the indirection table and key may grow
in the future as I wrote above.

>
> Btw: please trim the CC: list considerably on future patches.

I'll do so in the next version with the TUNSETSTEERINGEBPF change you
proposed.
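
For reference, the variable-size argument that the quoted TUNSETVNETHASH
code walks through can be sketched as a fixed header followed by the
indirection table and then the key. This is only an illustration under
assumptions: the header field widths, the vnet_hash_hdr and
vnet_hash_layout names, and passing the table limit and key length as
parameters are all hypothetical, not the patch's UAPI. The arithmetic
mirrors the quoted code: the table holds (mask + 1) 16-bit entries, and
a mask at or above the limit is rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header; field widths are assumptions, not the patch's UAPI. */
struct vnet_hash_hdr {
	uint32_t flags;
	uint32_t types;
	uint16_t indirection_table_mask;
};

/* Byte offsets of each part within the buffer passed to the ioctl. */
struct vnet_hash_layout {
	size_t table_off;	/* indirection table starts right after the header */
	size_t table_len;	/* (mask + 1) entries of 2 bytes each */
	size_t key_off;		/* key follows the indirection table */
	size_t total_len;	/* header + table + key */
};

/*
 * Compute the layout for a given header. Returns 0 on success, or -1
 * if the mask is out of range (the quoted code returns -EINVAL there).
 */
static int vnet_hash_layout(const struct vnet_hash_hdr *hdr,
			    size_t max_table_len, size_t key_len,
			    struct vnet_hash_layout *out)
{
	if (hdr->indirection_table_mask >= max_table_len)
		return -1;

	out->table_off = sizeof(*hdr);
	out->table_len = ((size_t)hdr->indirection_table_mask + 1) *
			 sizeof(uint16_t);
	out->key_off = out->table_off + out->table_len;
	out->total_len = out->key_off + key_len;
	return 0;
}
```

A fixed-size control struct would make these offsets constants, at the
cost of fixing the maximum table and key sizes in the ABI.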