Message-ID: <6253fb6b-9a53-484a-9be5-8facd46c051e@daynix.com>
Date: Sat, 18 Nov 2023 19:38:58 +0900
Subject: Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH
From: Akihiko Odaki
To: Alexei Starovoitov, Jason Wang
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
  Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet, Willem de Bruijn,
  "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
  "Michael S. Tsirkin", Xuan Zhuo, Mykola Lysenko, Shuah Khan, bpf,
  "open list:DOCUMENTATION", LKML, Network Development, kvm@vger.kernel.org,
  virtualization@lists.linux-foundation.org,
  "open list:KERNEL SELFTEST FRAMEWORK", Yuri Benditovich, Andrew Melnychenko
References: <20231015141644.260646-1-akihiko.odaki@daynix.com>
  <20231015141644.260646-2-akihiko.odaki@daynix.com>
  <2594bb24-74dc-4785-b46d-e1bffcc3e7ed@daynix.com>
  <9a4853ad-5ef4-4b15-a49e-9edb5ae4468e@daynix.com>
In-Reply-To: <9a4853ad-5ef4-4b15-a49e-9edb5ae4468e@daynix.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/10/18 4:19, Akihiko Odaki wrote:
> On 2023/10/18 4:03, Alexei Starovoitov wrote:
>> On Mon, Oct 16, 2023 at 7:38 PM Jason Wang wrote:
>>>
>>> On Tue, Oct 17, 2023 at 7:53 AM Alexei Starovoitov wrote:
>>>>
>>>> On Sun, Oct 15, 2023 at 10:10 AM Akihiko Odaki wrote:
>>>>>
>>>>> On 2023/10/16 1:07, Alexei Starovoitov wrote:
>>>>>> On Sun, Oct 15, 2023 at 7:17 AM Akihiko Odaki wrote:
>>>>>>>
>>>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>>>> index 0448700890f7..298634556fab 100644
>>>>>>> --- a/include/uapi/linux/bpf.h
>>>>>>> +++ b/include/uapi/linux/bpf.h
>>>>>>> @@ -988,6 +988,7 @@ enum bpf_prog_type {
>>>>>>>           BPF_PROG_TYPE_SK_LOOKUP,
>>>>>>>           BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
>>>>>>>           BPF_PROG_TYPE_NETFILTER,
>>>>>>> +       BPF_PROG_TYPE_VNET_HASH,
>>>>>>
>>>>>> Sorry, we do not add new stable program types anymore.
>>>>>>
>>>>>>> @@ -6111,6 +6112,10 @@ struct __sk_buff {
>>>>>>>           __u8  tstamp_type;
>>>>>>>           __u32 :24;              /* Padding, future use. */
>>>>>>>           __u64 hwtstamp;
>>>>>>> +
>>>>>>> +       __u32 vnet_hash_value;
>>>>>>> +       __u16 vnet_hash_report;
>>>>>>> +       __u16 vnet_rss_queue;
>>>>>>>    };
>>>>>>
>>>>>> we also do not add anything to uapi __sk_buff.
>>>>>>
>>>>>>> +const struct bpf_verifier_ops vnet_hash_verifier_ops = {
>>>>>>> +       .get_func_proto         = sk_filter_func_proto,
>>>>>>> +       .is_valid_access        = sk_filter_is_valid_access,
>>>>>>> +       .convert_ctx_access     = bpf_convert_ctx_access,
>>>>>>> +       .gen_ld_abs             = bpf_gen_ld_abs,
>>>>>>> +};
>>>>>>
>>>>>> and we don't do ctx rewrites like this either.
>>>>>>
>>>>>> Please see how hid-bpf and cgroup rstat are hooking up bpf
>>>>>> in _unstable_ way.
>>>>>
>>>>> Can you describe what "stable" and "unstable" mean here? I'm new to
>>>>> BPF and I'm worried if it may mean the interface stability.
>>>>>
>>>>> Let me describe the context. QEMU bundles an eBPF program that is used
>>>>> for the "eBPF steering program" feature of tun. Now I'm proposing to
>>>>> extend the feature to allow to return some values to the userspace and
>>>>> vhost_net. As such, the extension needs to be done in a way that
>>>>> ensures interface stability.
>>>>
>>>> bpf is not an option then.
>>>> we do not add stable bpf program types or hooks any more.
>>>
>>> Does this mean eBPF could not be used for any new use cases other than
>>> the existing ones?
>>
>> It means that any new use of bpf has to be unstable for the time being.
>
> Can you elaborate more about making new use unstable "for the time
> being?" Is it a temporary situation? What is the rationale for that?
> Such information will help devise a solution that is best for both of
> the BPF and network subsystems.
>
> I would also appreciate if you have some documentation or link to
> relevant discussions on the mailing list. That will avoid having same
> discussion you may already have done in the past.

Hi,

The discussion has been stuck for a month, but I'd still like to keep
figuring out the best way for the kernel as a whole to implement this
feature. Below I summarize the current situation and the question that
needs to be answered before pushing this forward.

The goal of this RFC is to allow reporting hash values calculated by the
eBPF steering program. It essentially just reports 4 bytes from the
kernel to userspace. Unfortunately, this is not acceptable to the BPF
subsystem because the "stable" BPF interface is completely frozen these
days. The "unstable/kfunc" BPF is an alternative, but the eBPF program
will be shipped with a portable userspace program (QEMU)[1], so the lack
of interface stability is not tolerable.

Another option is to hardcode in the kernel the algorithm that has
conventionally been implemented with the eBPF steering program[2]. This
is possible because the algorithm strictly follows the virtio-net
specification[3]. However, there are proposals to add different
algorithms to the specification[4], and hardcoding the algorithm in the
kernel would require adding more UAPIs and code each time such a
specification change happens, which is not good for tuntap.

In short, the proposed feature requires making one of two compromises:

1. Compromise on the BPF side: Relax the "stable" BPF feature freeze
   once and allow the eBPF steering program to report 4 more bytes to
   the kernel.

2. Compromise on the tuntap side: Implement the algorithm in the
   kernel, and abandon the capability to update the algorithm without
   changing the kernel.

IMHO, it's better to make the compromise on the BPF side (option 1). We
should minimize the total UAPI changes across the whole kernel, and
option 1 is far superior in that sense. Yet I have to note that such a
compromise risks making the "stable" BPF feature freeze fragile and
inviting complaints like "you allowed a change to stable BPF for this,
so why do you reject [some other request to change stable BPF]?" That is
bad for the BPF maintainers. (I can imagine that introducing and
maintaining widely different BPF interfaces is too much of a burden.)
And, of course, this requires approval from the BPF maintainers.

So I'd like to ask which of these compromises you think is worse.
Please also tell me if you have another idea.

Regards,
Akihiko Odaki

[1] https://qemu.readthedocs.io/en/v8.1.0/devel/ebpf_rss.html
[2] https://lore.kernel.org/all/20231008052101.144422-1-akihiko.odaki@daynix.com/
[3] https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-2400003
[4] https://lore.kernel.org/all/CACGkMEuBbGKssxNv5AfpaPpWQfk2BHR83rM5AHXN-YVMf2NvpQ@mail.gmail.com/
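
P.S. For readers who have not looked at the algorithm in question: the
hash that the virtio-net specification[3] prescribes for RSS is a
Toeplitz hash, and its 32-bit result is the "4 bytes" discussed above.
The following is only a minimal userspace sketch of that calculation in
C. The function name toeplitz_hash and its signature are made up for
illustration; this is not the code in tun, in QEMU's eBPF program, or in
the hardcoded variant[2], and it leaves out how the input bytes are
assembled from packet headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative Toeplitz hash (hypothetical helper, not kernel or QEMU code).
 *
 * The key is treated as a bit string. For every input bit that is set,
 * the 32 key bits starting at that bit position are XORed into the result.
 * Key bits beyond key_len are treated as zero (an assumption of this
 * sketch; real users supply a key of at least data_len + 4 bytes).
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    uint32_t window;
    size_t i;
    int bit;

    assert(key_len >= 4);

    /* The window starts as the first 32 bits of the key. */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (i = 0; i < data_len; i++) {
        /* Key byte whose bits slide into the window during this input byte. */
        uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;

        for (bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Advance the window by one key bit. */
            window = (window << 1) | ((next >> bit) & 1u);
        }
    }

    return hash;
}
```

The point of the sketch is only that the computation itself is small and
fully determined by the specification; what differs between the options
above is where it runs and how its result reaches userspace.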