X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240329015351.624249-1-drosen@google.com>
From: Daniel Rosenberg
Date: Fri, 29 Mar 2024 17:59:30 -0700
Subject: Re: [RFC PATCH v4 00/36] Fuse-BPF and plans on merging with Fuse Passthrough
To: Amir Goldstein
Cc: Miklos Szeredi, bpf@vger.kernel.org, Alexei Starovoitov,
 linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-unionfs@vger.kernel.org,
 Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau,
 Song Liu, Eduard Zingerman, Yonghong Song, KP Singh,
 Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan, Jonathan Corbet,
 Joanne Koong, Mykola Lysenko, Christian Brauner,
 kernel-team@android.com, Bernd Schubert

On Thu, Mar 28, 2024 at 11:45 PM Amir Goldstein wrote:
>
> My plan was to start from passthrough ioctl with O_PATH fd on lookup
> and deal with performance improvements later when there are actual
> workloads that report a problem and that depends where the overhead is.
>
> Is it with the opening of O_PATH fds?
> Is it with the passthrough ioctls?
> If latter, then when fuse uring is merged, I think we could lose
> the ioctls anyway.
>

I'm not terribly sure. Ideally I would have cc'ed them on this email,
but I didn't take down contact info with my notes. I was under the
impression that it was triggering all of the opens before an actual
open was needed, for example, during ls -l, but I don't know if that
was with O_PATH fds. I was a bit concerned that a performance fix there
might end up needing a different interface, and managing several of
those could get pretty cluttered. But I agree that it doesn't make
sense to do anything there without a concrete use-case and issue.

>
> The original reason was to mitigate an attack vector of fooling a
> privileged process into writing the fd (number) to /dev/fuse to
> gain access to a backing file this way.
>
> The fuse-bpf way of doing all responses with ioctls seems fine for
> this purpose, but note that the explicit setup also provides feedback
> to the server in case the passthrough cannot be accomplished
> for a specific inode (e.g. because of stacking depth overflow)
> and that is a big benefit IMO.
>

That certainly informs the daemon of the error earlier.
So long as we can still run the complete passthrough mode serverless,
that's fine by me. I've found that mode helpful for running filesystem
tests on pure backing mode, plus I imagine some simple Fuse filesystems
could get away with only the bpf programs.

>
> Using a global cred should be fine, just as overlayfs does.
> The specific inode passthrough setup could mention if the global
> cred should be used.
>
> However, note that overlayfs needs to handle some special cases
> when using mounter creds (e.g.: ovl_create_or_link() and dropping
> of CAP_SYS_RESOURCE).
>
> If you are going to mimic all this, better have that in the stacking fs
> common code.
>

Sure. The less duplicate code the better :)

>
> That sounds like a good plan, but also, please remember Miklos' request -
> please split the patch sets for review to:
> 1. FUSE-passthrough-all-mode
> 2. Attach BPF program
>
> We FUSE developers must be able to review the FUSE/passthrough changes
> without any BPF code at all (of which we have little understanding)
>
> As a merge strategy, I think we need to aim for merging all the FUSE
> passthrough infrastructure needed for passthrough of inode operations
> strictly before merging any FUSE-BPF specific code.
>
> In parallel you may get BPF infrastructure merged, but integrating FUSE+BPF,
> should be done only after all infrastructure is already merged IMO.
>

Ok. I'll probably mess around with the module stuff at least, in order
to work out if everything I need is present on the bpf side.

Do you know if anyone is actively working on extending the file-backing
work to something like inode-backing? I don't want to duplicate work
there, but I'd be happy to start looking at it. Otherwise I'd focus on
the bpf end for now.

I expect we'll want to be able to optionally set the bpf program at the
same place where we set the backing file/inode. Hence the split into a
file and inode program set. I'm still thinking over what the best way
to address the programs is...

>
> So I don't think there is any point in anyone actually reviewing the
> v4 patch set that you just posted?
>

Correct. The only reason I included it was as a reference for the sort
of stuff fuse-bpf is currently doing.

>
> Please explain what you mean by that.
> How are fuse-bpf file operations expected to be used and specifically,
> how are they expected to extend the current FUSE passthrough functionality?
>
> Do you mean that a passthrough setup will include a reference to a bpf
> program that will be used to decide per read/write/splice operation
> whether it should be passed through to backing file or sent to server
> direct_io style?
>

So in the current fuse-bpf setup, the bpf program does two things. It
can edit certain parameters, and it can indicate what the next action
should be. That action could be queuing up the post filter after the
backing operation, deferring to a userspace pre/post filter, or going
back to normal fuse operations.

The last one isn't currently very well fleshed out. Unless you do some
specific tracking, under existing fuse-bpf you'd have a node id of 0,
and userspace can't make much out of that. That aside, there are all
sorts of caching nightmares to deal with there. We're only using the
parameter changing currently in our use cases. I wouldn't be opposed to
leaving the fallback to fuse for specific operations out of v1 of the
bpf enhancements, especially if we have the userspace pre/post filters
available.

So you'd optionally specify a bpf program to use with the backing file.
That would allow you to manipulate some data in the files like you
might in Fuse itself. For instance, data redaction. You could null out
location metadata in images, provided a map or something with the
offsets that should be nulled. You could also prepend some data at the
beginning of a file by adjusting offsets and attrs and whatnot.
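
For illustration only, here is a rough userspace sketch in plain C of
the two data manipulations described above: zeroing redacted byte
ranges in a buffer that came back from the backing file, and mapping a
visible offset to a backing offset when data is prepended. This is not
code from the fuse-bpf patch set and does not use any fuse-bpf or BPF
API; struct redact_range, redact_read() and visible_to_backing() are
hypothetical names, and the redaction ranges are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: one byte range of the file that must read back as zeros. */
struct redact_range {
    uint64_t start; /* file offset of the first redacted byte */
    uint64_t len;   /* number of redacted bytes */
};

/*
 * Post-read step (illustrative): buf holds `len` bytes read from file
 * offset `off`. Zero every part of buf that overlaps a redaction range.
 * Returns the number of bytes zeroed (ranges assumed non-overlapping).
 */
static size_t redact_read(uint8_t *buf, uint64_t off, size_t len,
                          const struct redact_range *ranges, size_t nranges)
{
    size_t zeroed = 0;
    uint64_t end = off + len;

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < nranges; i++) {
        uint64_t rs = ranges[i].start;
        uint64_t re = rs + ranges[i].len;
        /* Intersect [rs, re) with the window [off, end) that was read. */
        uint64_t lo = rs > off ? rs : off;
        uint64_t hi = re < end ? re : end;

        if (lo >= hi)
            continue; /* no overlap with this read */
        memset(buf + (lo - off), 0, (size_t)(hi - lo));
        zeroed += (size_t)(hi - lo);
    }
    return zeroed;
}

/*
 * Pre-read step (illustrative): if `hdr_len` bytes are presented ahead
 * of the backing file's data, translate a visible offset into a backing
 * offset. Returns 1 and sets *backing_off when the offset lies in the
 * backing data, 0 when it is still inside the prepended header.
 */
static int visible_to_backing(uint64_t off, uint64_t hdr_len,
                              uint64_t *backing_off)
{
    if (off < hdr_len)
        return 0;
    *backing_off = off - hdr_len;
    return 1;
}
```

In this sketch the offset translation would run before the backing
read and the redaction after it; the reported file size would also need
to grow by hdr_len, which is the "attrs" part mentioned above.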
I could imagine having multiple backing files, and the bpf program
splitting a read into multiple parts to handle parts of it using
different backing files, although that's not in the current design.

>
> I just wanted to make sure that you are aware of the fact that direct io
> to server is the only mode of io that is allowed on an inode with an attached
> backing file.
>
> Thanks,
> Amir.
>

Can you not read/write without interacting with the server? Or do you
mean FOPEN_DIRECT_IO sends some file ops to the server even in
passthrough mode? At the moment I'm tempted to follow the same
mechanics passthrough is using. The only exception would be possibly
tossing back to the server, which I mentioned above. That'd only happen
for, say, read, if we're not under FOPEN_DIRECT_IO. I've not looked too
closely at FOPEN_DIRECT_IO. In Fuse bpf we currently have bpf mode
taking priority. Are there any email threads I should look at for more
background there?

-Daniel