Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp1734943rwd; Wed, 17 May 2023 00:05:08 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4Lu7GOgGjNEG+PebYOMHxavXj9H5PozYaks6/32hiW2kt2y+9UuOSrEC3YtizOXF3h2BGN X-Received: by 2002:a17:90b:100d:b0:24e:120e:30ff with SMTP id gm13-20020a17090b100d00b0024e120e30ffmr38332827pjb.14.1684307108614; Wed, 17 May 2023 00:05:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684307108; cv=none; d=google.com; s=arc-20160816; b=wnvf4O/beFWpkw5HdeKdWstvYC32J2+HEhsSwn6Y2+ocpT+Ad8WHc8hkPoFFikgTml EmJCeXomPJbCA725iLK3gaIrfhdfKB2jacBN5Q7gFms2TKMb1Rl+9cb9AcZsurkwNc1B puV6oLdD2ZQx9K4Pn+YK83rRleFpPpN9wvkRq0NS2vn40TT5C8B3nRID0wkV4BP0+6cj IcgJ0wCLYVWK5Cp874UeU4JfSav20I75Sysg85vNDHSJbiNVThqAXEj1a58kCq6kT6P6 DcqCYxMiZF2vgvZylv6/AaJcYec32JAkaGGjIKXImpdSsMKyfyi9mzzqq8yl2z9gd5cD XMRA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=ueYvFiNPIqodyLv9v94eNhjcPY0/j41IxYzpNNl2tWA=; b=NJKkf14uiIB/nt2ZSytOI6/Ju0uGFyDwY0ppuGTN4oTEWswfNuM7B71A7Z1ZaY9Nmw n0yfBaYo4c6Eu/0H/eHngMw0uj7vbDkYKrcZrJUbFqSjCckDccSR0ARiECseLkwjKJw+ i7NKU2GPXUS6TNBEckFGkdT9jg4phbYuiBl4e6TQZl4uLymG115DLr2GbIRFJ7pyvz5H quWejwt2m5QXm6Kz4IS+vqn4NGV4G9ynj+FEZjA6GMU6gq2cnzfyEM/NUpPPCz2Cn+z9 q7hEobniSJUJ3x1PdFyOvfukWGb+8Z+DMkfYzPsn/U3zL57ZGPFQGxchl5Xpc3gYuqxV JNNw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=CvNG7VVi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i9-20020a636d09000000b004fab4df6dfdsi20782946pgc.369.2023.05.17.00.04.56; Wed, 17 May 2023 00:05:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=CvNG7VVi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229643AbjEQGv6 (ORCPT + 99 others); Wed, 17 May 2023 02:51:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34014 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229626AbjEQGv5 (ORCPT ); Wed, 17 May 2023 02:51:57 -0400 Received: from mail-vs1-xe2a.google.com (mail-vs1-xe2a.google.com [IPv6:2607:f8b0:4864:20::e2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 058E82112; Tue, 16 May 2023 23:51:51 -0700 (PDT) Received: by mail-vs1-xe2a.google.com with SMTP id ada2fe7eead31-4348416ae45so224043137.3; Tue, 16 May 2023 23:51:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684306310; x=1686898310; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ueYvFiNPIqodyLv9v94eNhjcPY0/j41IxYzpNNl2tWA=; b=CvNG7VViNSsj8iEJdkr97mdB8Bw1Zk0V/8lT0NhnxzuZbuvYFZYSfLaECM9k6R/+Xk qOkehyeybXgM4oOWUvyl0tgGU8D0q4r3EoFEhR74UB1z4Nx7PZY53T1GZcXht9TBcf6V 6wjDImt7h9T0YQe/M9Q+NR8qhqR9a9xyCdqQ1ASCnHiDmaH46DXURJ2bMQc8iSnQjf++ oInZZA55Gdl0GlqcdNpWkmZC+YsrV3LRKSqzEPw29/OKRc8zpG3kJiAuxqO4G/2O7wZt nB8z8OyL6qu0+qVFyWpKgqdu8zNjhlvQTP6xZEkwBa7OglroEj63HTGZNds38MqLbcbL qilQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684306310; x=1686898310; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ueYvFiNPIqodyLv9v94eNhjcPY0/j41IxYzpNNl2tWA=; b=c2avkECpmJ0wg3ckwhjOhA2eLtQK+rdNPuHuUaSXeqpouk41QzOb1nXIgjjPks0EBi r+FWbRDV5tdFrMelUr5tC9tCHRcNM7eksPkNP+CKmoaOjYYcekSGUaCYNZvyoK8t6bVV A0OPjgHRvF+TUIsGMajrZSOzpz2YB0NFzimKtGymYYu3TyitnyF/NKcDOaE2ST8zQnjN FsgXkQN7Ff4jIxkO6ImKvaiO7AJ3dCsvMFsDjHh8NsH7KfDbRP0PyhbpXVsgN6+PAE5O lqupd7nBiJXhqptQKmbDLsmSRpseW67gv6z9A9ZT2wDKNeOOMUOpB6QBbxemrnhh9h0W e28w== X-Gm-Message-State: AC+VfDwNFedVhhx8A3wYH1FbZl2lamXzv70CLb3X7xMeaFMZa6KEY8Fm AWkuPVG0Kp3rkty//QF6Q8N74WfzvzNfpDQ2dnU= X-Received: by 2002:a05:6102:3a66:b0:42f:46d6:7a65 with SMTP id bf6-20020a0561023a6600b0042f46d67a65mr17316190vsb.21.1684306310492; Tue, 16 May 2023 23:51:50 -0700 (PDT) MIME-Version: 1.0 References: <20230418014037.2412394-1-drosen@google.com> <93e0e991-147f-0021-d635-95e615057273@linux.alibaba.com> In-Reply-To: <93e0e991-147f-0021-d635-95e615057273@linux.alibaba.com> From: Amir Goldstein Date: Wed, 17 May 2023 09:51:39 +0300 Message-ID: Subject: Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE To: Gao Xiang Cc: Daniel Rosenberg , Miklos Szeredi , bpf@vger.kernel.org, Alexei Starovoitov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-unionfs@vger.kernel.org, Daniel Borkmann , John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Shuah Khan , Jonathan Corbet , Joanne Koong , Mykola Lysenko , kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 17, 2023 at 5:50=E2=80=AFAM Gao Xiang wrote: > > > > On 2023/5/2 17:07, Daniel Rosenberg wrote: > > On Mon, Apr 24, 2023 at 8:32=E2=80=AFAM Miklos Szeredi wrote: > >> > >> > >> The security model needs to be thought about and documented. Think > >> about this: the fuse server now delegates operations it would itself > >> perform to the passthrough code in fuse. The permissions that would > >> have been checked in the context of the fuse server are now checked in > >> the context of the task performing the operation. The server may be > >> able to bypass seccomp restrictions. Files that are open on the > >> backing filesystem are now hidden (e.g. lsof won't find these), which > >> allows the server to obfuscate accesses to backing files. Etc. > >> > >> These are not particularly worrying if the server is privileged, but > >> fuse comes with the history of supporting unprivileged servers, so we > >> should look at supporting passthrough with unprivileged servers as > >> well. > >> > > > > This is on my todo list. My current plan is to grab the creds that the > > daemon uses to respond to FUSE_INIT. That should keep behavior fairly > > similar. I'm not sure if there are cases where the fuse server is > > operating under multiple contexts. > > I don't currently have a plan for exposing open files via lsof. Every > > such file should relate to one that will show up though. I haven't dug > > into how that's set up, but I'm open to suggestions. > > > >> My other generic comment is that you should add justification for > >> doing this in the first place. I guess it's mainly performance. So > >> how performance can be won in real life cases? It would also be good > >> to measure the contribution of individual ops to that win. Is there > >> another reason for this besides performance? > >> > >> Thanks, > >> Miklos > > > > Our main concern with it is performance. We have some preliminary > > numbers looking at the pure passthrough case. We've been testing using > > a ramdrive on a somewhat slow machine, as that should highlight > > differences more. We ran fio for sequential reads, and random > > read/write. For sequential reads, we were seeing libfuse's > > passthrough_hp take about a 50% hit, with fuse-bpf not being > > detectably slower. For random read/write, we were seeing a roughly 90% > > drop in performance from passthrough_hp, while fuse-bpf has about a 7% > > drop in read and write speed. When we use a bpf that traces every > > opcode, that performance hit increases to a roughly 1% drop in > > sequential read performance, and a 20% drop in both read and write > > performance for random read/write. We plan to make more complex bpf > > examples, with fuse daemon equivalents to compare against. > > > > We have not looked closely at the impact of individual opcodes yet. > > > > There's also a potential ease of use for fuse-bpf. If you're > > implementing a fuse daemon that is largely mirroring a backing > > filesystem, you only need to write code for the differences in > > behavior. For instance, say you want to remove image metadata like > > location. You could give bpf information on what range of data is > > metadata, and zero out that section without having to handle any other > > operations. > > A bit out of topic (although I'm not quite look into FUSE BPF internals) > After roughly listening to this topic in FS track last week, I'm not > quite sure (at least in the long term) if it might be better if > ebpf-related filter/redirect stuffs could be landed in vfs or in a > somewhat stackable fs so that we could redirect/filter any sub-fstree > in principle? It's just an open question and I have no real tendency > of this but do we really need a BPF-filter functionality for each > individual fs? I think that is a valid question, but the answer is that even if it makes s= ense, doing something like this in vfs would be a much bigger project with larger consequences on performance and security and whatnot, so even if (and a very big if) this ever happens, using FUSE-BPF as a playground for this sort of stuff would be a good idea. This reminds me of union mounts - it made sense to have union mount functionality in vfs, but after a long winding road, a stacked fs (overlayf= s) turned out to be a much more practical solution. > > It sounds much like > https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file= -system-filter-drivers > Nice reference. I must admit that I found it hard to understand what Windows filter drivers can do compared to FUSE-BPF design. It'd be nice to get some comparison from what is planned for FUSE-BPF. Interesting to note that there is a "legacy" Windows filter driver API, so Windows didn't get everything right for the first API - that is especial= ly interesting to look at as repeating other people's mistakes would be a sham= e. Thanks, Amir.