Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp4545930rdb; Tue, 12 Dec 2023 02:40:11 -0800 (PST) X-Google-Smtp-Source: AGHT+IGXJrbP2CL1LhoVwtnnsPtiSu+OrDyyTRPd09GSzBoBf7FY0QbIOyvEpevD85y9TMkGE5Bx X-Received: by 2002:a17:903:32c8:b0:1d0:5806:f45d with SMTP id i8-20020a17090332c800b001d05806f45dmr6258903plr.42.1702377610891; Tue, 12 Dec 2023 02:40:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702377610; cv=none; d=google.com; s=arc-20160816; b=wjv8y7Inej76p1KnbsWV+CV15Qx3BNQHnetO4eI5stOZ0WEEmUUcfCOlcG8k6HaiCh Z4KSsC4u/rI1HAYQ/Fp0Cp4t9JMCRD8zM30MosnX2/KGoYswtH2xpWtvx7u5NMjJzcgG QdMNs3GhQNzBsm3lq7xtP+X4f3das9JocswZCbyLIDov4gMxa1EAKKbSmNWj5nWc4/FU 1k23EaJ4LQM7ed26LHAR9PPTO3+dTaSf+BrU6IIa3pp4/qxlaH9kFmCTjqP3jwaRbZEy 7mVb5jgZBLVEl6kaRBYUbcaVcgjueADUxbuP79SONz0MuLuD59D5nJArT/C1lbV9Jp+O Gyug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=fggtRaV/8orZgO1gMNRIxp3acepW1r0vajRNrmoShyY=; fh=6vv6QE8WWMgGDZmVbwzv04phgPedl5FJOYB5JAEyRgo=; b=CiU2iTWGCKk4uz4r6xVVBLrWA7KO10lrtt9BMKGDcNAh/5hjW1ypJTq1+m+yslpjk/ Gja2cteXffG7Gt7uTnPZz0dc+SSk+GDnuntnRcqcYkrDG/9i2A7gz9/LrJoJ3Q327u+B B6E12P7LAmR1wspHyt9PO0Vf1BaUiC6/As33L+1gsPPphTDOcYsDmZkvUHrlmOysRsvz tHlxeYznZEBJj3catl+9eGa57DIKG/UiXPXm3qnZzbbCOoe/TDDC0q2d86Hu1I8H5Xa9 RijRuIVcIEWpcKBRbVin99GhD3fClMgm1wONDBdfkQpNFUZzjvgS+fxzd2lra09zKh29 LAHg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NX6fGH2p; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from morse.vger.email (morse.vger.email. [2620:137:e000::3:1]) by mx.google.com with ESMTPS id k9-20020a170902c40900b001cff3536eb6si7749076plk.463.2023.12.12.02.40.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Dec 2023 02:40:10 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) client-ip=2620:137:e000::3:1; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NX6fGH2p; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id E084380219EB; Tue, 12 Dec 2023 02:40:07 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346185AbjLLKju (ORCPT + 99 others); Tue, 12 Dec 2023 05:39:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235073AbjLLKjt (ORCPT ); Tue, 12 Dec 2023 05:39:49 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 189E6AC for ; Tue, 12 Dec 2023 02:39:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1702377594; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fggtRaV/8orZgO1gMNRIxp3acepW1r0vajRNrmoShyY=; b=NX6fGH2pHIiCL1YEkobrXowP1+DxTOLTqXWjp5UEfY/7jwlo1O+oIqT3fHEe3v+1zREqbI +MprYy8FjsjmqpaaJ/k5oDPWNuxovKzR1jw0mNlq8W/J/yb2b68qtzLT82L1CpnBljGaWW 0ySQu6ts52IyASpayUaCedHkUgQnY94= Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-693-MeNehHkjMsyuqcvFK-LvlA-1; Tue, 12 Dec 2023 05:39:53 -0500 X-MC-Unique: MeNehHkjMsyuqcvFK-LvlA-1 Received: by mail-ej1-f72.google.com with SMTP id a640c23a62f3a-a1f93205e60so83886366b.3 for ; Tue, 12 Dec 2023 02:39:52 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702377587; x=1702982387; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fggtRaV/8orZgO1gMNRIxp3acepW1r0vajRNrmoShyY=; b=qfRXyLdeFLwBVxxHqYLthBUPEZbV008zV8omd3unKFSay6lHf6wmGTjx9Xq6QpZmff AgdZP8YvGJp8WbFaOvLAw7NZcUTEUEa65PXyy1l2ghR/ILZJwx3QQHubG6ulI20wF00e Lm/9uJlUBKmJlH2Rg5OmYGsTqEjjNJDuzSNZwAIVKRxf4U7wbrEaKNLJbG00kSpx19yI /gWwdtRW1swUULZ0DOrVFpTbBFJT19FQ7n+6fr2uFVPuhpuGMJAzJPq+PqZ7Av2Niet+ kbFcvjNNKk0kS6d/SWwlXaRCUNKM8CHRRze6v0rdPEj6XfmfhcJzeol1FRVg5qKXKLVS HgKg== X-Gm-Message-State: AOJu0Yx++3Y4FFGyqQsF00bOK44ZHo7Dr5+QXAFIT3ZAbGad7gwQFpFP qOIUoJ7NSf+nk6ky23tooQ3iWnuR1wBM9mcplqfH4NRjeghoHUo96EmCo2lFsuPHvx+CWoHRmBH BLgyXATtQwGcdHs7GO0nqRqBNPOsLUpFcLEqi5PzM X-Received: by 2002:a17:906:5194:b0:a1a:4a36:66fc with SMTP id y20-20020a170906519400b00a1a4a3666fcmr1343859ejk.16.1702377587459; Tue, 12 Dec 2023 02:39:47 -0800 (PST) X-Received: by 2002:a17:906:5194:b0:a1a:4a36:66fc with SMTP id y20-20020a170906519400b00a1a4a3666fcmr1343842ejk.16.1702377586761; Tue, 12 Dec 2023 02:39:46 -0800 (PST) MIME-Version: 1.0 References: <2f33be45-fe11-4b69-8e89-4d2824a0bf01@daynix.com> In-Reply-To: <2f33be45-fe11-4b69-8e89-4d2824a0bf01@daynix.com> From: Benjamin Tissoires Date: Tue, 12 Dec 2023 11:39:34 +0100 Message-ID: Subject: Re: Should I add BPF kfuncs for userspace apps? And how? To: Akihiko Odaki Cc: Alexei Starovoitov , Jason Wang , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Jonathan Corbet , Willem de Bruijn , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "Michael S. Tsirkin" , Xuan Zhuo , Mykola Lysenko , Shuah Khan , Yuri Benditovich , Andrew Melnychenko , Benjamin Tissoires , bpf , "open list:DOCUMENTATION" , kvm@vger.kernel.org, LKML , virtualization@lists.linux-foundation.org, "open list:KERNEL SELFTEST FRAMEWORK" , Network Development Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Tue, 12 Dec 2023 02:40:08 -0800 (PST) Hi, On Tue, Dec 12, 2023 at 9:11=E2=80=AFAM Akihiko Odaki wrote: > > Hi, > > It is said eBPF is a safe way to extend kernels and that is very > attarctive, but we need to use kfuncs to add new usage of eBPF and > kfuncs are said as unstable as EXPORT_SYMBOL_GPL. So now I'd like to ask > some questions: > > 1) Which should I choose, BPF kfuncs or ioctl, when adding a new feature > for userspace apps? > 2) How should I use BPF kfuncs from userspace apps if I add them? > > Here, a "userspace app" means something not like a system-wide daemon > like systemd (particularly, I have QEMU in mind). I'll describe the > context more below: I'm probably not the best person in the world to answer your questions, Alexei and others from the BPF core group are, but given that you pointed at a thread I was involved in, I feel I can give you a few pointers. But first and foremost, I encourage you to schedule an agenda item in the BPF office hour[4]. Being able to talk with the core people directly was tremendously helpful to me to understand their point. > > --- > > I'm working on a new feature that aids virtio-net implementations using > tuntap virtual network device. You can see [1] for details, but > basically it's to extend BPF_PROG_TYPE_SOCKET_FILTER to report four more > bytes. > > However, with long discussions we have confirmed extending > BPF_PROG_TYPE_SOCKET_FILTER is not going to happen, and adding kfuncs is > the way forward. So I decided how to add kfuncs to the kernel and how to > use it. There are rich documentations for the kernel side, but I found > little about the userspace. The best I could find is a systemd change > proposal that is based on WIP kernel changes[2]. Yes, as Alexei already replied, BPF is not adding new stable APIs, only kfuncs. The reason being that once it's marked as stable, you can't really remove it, even if you think it's badly designed and useless. Kfuncs, OTOH are "unstable" by default meaning that the constraints around it are more relaxed. However, "unstable" doesn't mean "unusable". It just means that the kernel might or might not have the function when you load your program in userspace. So you have to take that fact into account from day one, both from the kernel side and the userspace side. The kernel docs have a nice paragraph explaining that situation and makes the distinction between relatively unused kfuncs, and well known established ones. Regarding the systemd discussion you are mentioning ([2]), this is something that I have on my plate for a long time. I think I even mentioned it to Alexei at Kernel Recipes this year, and he frowned his eyebrows when I mentioned it. And looking at the systemd code and the benefits over a plain ioctl, it is clearer that in that case, a plain ioctl is better, mostly because we already know the API and the semantic. A kfunc would be interesting in cases where you are not sure about the overall design, and so you can give a shot at various API solutions without having to keep your bad v1 design forever. > > So now I'm wondering how I should use BPF kfuncs from userspace apps if > I add them. In the systemd discussion, it is told that Linus said it's > fine to use BPF kfuncs in a private infrastructure big companies own, or > in systemd as those users know well about the system[3]. Indeed, those > users should be able to make more assumptions on the kernel than > "normal" userspace applications can. > > Returning to my proposal, I'm proposing a new feature to be used by QEMU > or other VMM applications. QEMU is more like a normal userspace > application, and usually does not make much assumptions on the kernel it > runs on. For example, it's generally safe to run a Debian container > including QEMU installed with apt on Fedora. BPF kfuncs may work even in > such a situation thanks to CO-RE, but it sounds like *accidentally* > creating UAPIs. > > Considering all above, how can I integrate BPF kfuncs to the application? FWIW, I'm not sure you can rely on BPF calls from a container. There is a high chance the syscall gets disabled by the runtime. > > If BPF kfuncs are like EXPORT_SYMBOL_GPL, the natural way to handle them > is to think of BPF programs as some sort of kernel modules and > incorporate logic that behaves like modprobe. More concretely, I can put > eBPF binaries to a directory like: > /usr/local/share/qemu/ebpf/$KERNEL_RELEASE I would advise against that (one program per kernel release). Simply because your kfunc may or may not have been backported to kernel release v6.X.Y+1 while it was not there when v6.X.Y was out. So relying on the kernel number is just going to be a headache. As I understand it, the way forward is to rely on the kernel, libbpf and CO-RE: if the function is not available, the program will simply not load, and you'll know that this version of the code is not available (or has changed API). So what I would do if some kfunc API is becoming deprecated, is embedding both code paths in the same BPF unit, but marking them as not loaded by libppf. Then I can load the compilation unit, try v2 of the API, and if it's not available, try v1, and if not, then mention that I can not rely on BPF. Of course, this can also be done with separate compilation units. > > Then, QEMU can uname() and get the path to the binary. It will give an > error if it can't find the binary for the current kernel so that it > won't create accidental UAPIs. > > The obvious downside of this is that it complicates packaging a lot; it > requires packaging QEMU eBPF binaries each time a new kernel comes up. > This complexity is centrally managed by modprobe for kernel modules, but > apparently each application needs to take care of it for BPF programs. For my primary use case: HID-BPF, I put kfuncs in kernel v6.3 and given that I haven't touch this part of the API, the same compilation unit compiled in the v6.3 era still works on a v6.7-rcx, so no, IMO it's not complex and doesn't require to follow the kernel releases (which is the whole point of HID-BPF FWIW). > > In conclusion, I see too much complexity to use BPF in a userspace > application, which we didn't have to care for > BPF_PROG_TYPE_SOCKET_FILTER. Isn't there a better way? Or shouldn't I > use BPF in my case in the first place? Given that I'm not a network person, I'm not sure about your use case, but I would make my decision based on: - do I know exactly what I want to achieve and I'm confident that I'll write the proper kernel API from day one? (if not then kfuncs is appealing because it's less workload in the long run, but userspace needs to be slightly smarter) - are all of my use cases covered by using BPF? (what happens if I run QEMU in a container?) -> BPF might or might not be a solution But the nice thing about using BPF kfuncs is that it allows you to have a testing (not-)UAPI kernel interface. You can then implement the userspace changes and see how it behaves. And then, once you got the right design, you can decide to promote it to a proper syscall or ioctl if you want. Cheers, Benjamin > > Thanks, > Akihiko Odaki > > [1] > https://lore.kernel.org/all/20231015141644.260646-1-akihiko.odaki@daynix.= com/ > [2] https://github.com/systemd/systemd/pull/29797 > [3] https://github.com/systemd/systemd/pull/29797#discussion_r1384637939 > [4] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXj= ywWa0AejEveU