From: Akihiko Odaki
To: Song Liu
Cc: Alexei Starovoitov, Jason Wang, Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, John Fastabend,
    KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
    Willem de Bruijn, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, "Michael S. Tsirkin", Xuan Zhuo, Mykola Lysenko,
    Shuah Khan, bpf, "open list:DOCUMENTATION", LKML,
    Network Development, kvm@vger.kernel.org,
    virtualization@lists.linux-foundation.org,
    "open list:KERNEL SELFTEST FRAMEWORK", Yuri Benditovich,
    Andrew Melnychenko
Subject: Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH
Date: Mon, 11 Dec 2023 14:04:27 +0900
Message-ID: <49a5b971-ae97-4118-ae20-f651ad14bed7@daynix.com>
References: <20231015141644.260646-1-akihiko.odaki@daynix.com>
    <2594bb24-74dc-4785-b46d-e1bffcc3e7ed@daynix.com>
    <9a4853ad-5ef4-4b15-a49e-9edb5ae4468e@daynix.com>
    <6253fb6b-9a53-484a-9be5-8facd46c051e@daynix.com>
    <664003d3-aadb-4938-80f6-67fab1c9dcdd@daynix.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/12/11 10:40, Song Liu wrote:
> On Sat, Dec 9, 2023 at 11:03 PM Akihiko Odaki wrote:
>>
>> On 2023/11/22 14:36, Akihiko Odaki wrote:
>>> On 2023/11/22 14:25, Song Liu wrote:
> [...]
>>
>> Now the discussion is stale again so let me summarize the discussion:
>>
>> A tuntap device can have an eBPF steering program to let the userspace
>> decide which tuntap queue should be used for each packet. QEMU uses this
>> feature to implement the RSS algorithm for virtio-net emulation. Now,
>> the virtio specification has a new feature to report hash values
>> calculated with the RSS algorithm.
>> The goal of this RFC is to report such hash values from the eBPF
>> steering program to the userspace.
>>
>> There are currently three ideas to implement the proposal:
>>
>> 1. Abandon eBPF steering program and implement RSS in the kernel.
>>
>> It is possible to implement the RSS algorithm in the kernel as it's
>> strictly defined in the specification. However, there are proposals for
>> relevant virtio specification changes, and abandoning the eBPF steering
>> program will lose the ability to implement those changes in the
>> userspace. There are concerns that this will lead to more UAPI changes
>> in the end.
>>
>> 2. Add BPF kfuncs.
>>
>> Adding BPF kfuncs is *the* standard way to add BPF interfaces. hid-bpf
>> is a good reference for this.
>>
>> The problem with BPF kfuncs is that kfuncs are not considered as stable
>> as UAPI. In my understanding, it is not problematic for things like
>> hid-bpf because programs using those kfuncs affect the entire system
>> state and are expected to be centrally managed. Such BPF programs can be
>> updated along with the kernel in a manner similar to kernel modules.
>>
>> The use case of tuntap steering/hash reporting is somewhat different
>> though; the eBPF program is more like a part of an application (QEMU or
>> potentially another VMM) and thus needs to be portable. For example, a
>> user may expect a Debian container with QEMU installed to work on Fedora.
>>
>> BPF kfuncs do still provide some level of stability, but there is no
>> documentation that tells how stable they are. The worst case scenario I
>> can imagine is that a future legitimate BPF change breaks QEMU, letting
>> the "no regressions" rule force the change to be reverted. Some
>> assurance that that kind of scenario will not happen is necessary in my
>> opinion.
>
> I don't think we can provide stability guarantees before seeing something
> being used in the field. How do we know it will be useful forever?
> If a couple years later, there is only one person using it somewhere in
> the world, why should we keep supporting it? If there are millions of
> virtual machines using it, why would you worry about it being removed?

I have a different opinion about providing stability guarantees; I
believe it is safe to provide such a guarantee without actual use in the
field. We develop features expecting that there are real uses, and if it
turns out otherwise, we can break the stated guarantee since there are
no real use cases anyway. Even breaking UAPIs is fine in such a case, as
stated in Documentation/admin-guide/reporting-regressions.rst.

So I am rather comfortable with guaranteeing UAPI stability; we can just
guarantee UAPI-level stability for a particular kfunc and use it from
QEMU expecting that stability. If the feature is found not to be useful,
QEMU and the kernel can just remove it.

I'm more concerned about the other case, in which the feature becomes
widely used. A kernel developer may assume the stability of the
interface is like that of kernel internal APIs
(Documentation/bpf/kfuncs.rst says kfuncs are like EXPORT_SYMBOL_GPL)
and decide to change it, breaking old QEMU binaries, and that's
something I would like to avoid.

Regarding the breakage scenario, I think we can avoid the removal of
kfuncs just by saying "we won't remove them". I'm more worried about the
case where a change in the BPF kfunc infrastructure requires recompiling
the binary.

So, in short, I don't think we can say "kfuncs are like
EXPORT_SYMBOL_GPL" and "you can freely use kfuncs in a normal userspace
application like QEMU" at the same time.

>
>>
>> 3. Add BPF program type derived from the conventional steering program type
>>
>> In principle, it's just to add a feature to report four more bytes to
>> the conventional steering program. However, BPF program types are frozen
>> for feature additions and the proposed change will break the feature
>> freeze.
>>
>> So what's next?
>> I'm inclined toward option 3 due to its minimal ABI/API change, but I'm
>> also fine with option 2 if it is possible to guarantee the ABI/API
>> stability necessary to run pre-built QEMUs on future kernel versions
>> by e.g., explicitly stating the stability of kfuncs. If no objection
>> arises, I'll resend this series with the RFC prefix dropped for
>> upstream inclusion. If it's decided to go for option 1 or 2, I'll
>> post a new version of the series implementing the idea.
>
> Probably a dumb question, but does this RFC fall into option 3? If
> that's the case, I seriously don't think it's gonna happen.

Yes, it's option 3.

>
> I would recommend you give option 2 a try and share the code. This is
> probably the best way to move the discussion forward.

In that case, I'd like to add a documentation change saying that the
added kfuncs are exceptional cases that are not like EXPORT_SYMBOL_GPL.
Would that work?

Regards,
Akihiko Odaki