Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp2558850pxj; Mon, 17 May 2021 04:38:44 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzzUkPOi3r9n7zqDlXnp2YZrRLXO+ogczXVUk1sRySk1sWH3EUqYfql3h3FqJ7Mtwi16s0i X-Received: by 2002:a05:6e02:581:: with SMTP id c1mr4966257ils.37.1621251524358; Mon, 17 May 2021 04:38:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1621251524; cv=none; d=google.com; s=arc-20160816; b=wuCCkJHb03jQSgt7r/kvTM6OfRyxh4EQO1RHlEoCqdN6p7VR7dR1KEVrSj1L8RTHmp Drl1NMFTLEZ7nA2PuTtUzudPCTVVibJHgalDwDjhUFlORFfgOGevq00fM8b85IzvX5mz gMKzCPnf2hA7gncpV2ASvAQGks4N0xQdqST89BxcWuGwzICjR1MwkzG0nopCmZQHVSwg 0NrlZ1aPM9fXoRFMQaje80dDsr1xTTfQSOxtUEzeGGYcP+3QfucPTWEPDiFwtdwsj40d G3XTe9ug2aGmzEO/qf6DWYUWjiVRP2aERhsUT0EyReq0RIF3xWxVb0Ji9Id80k1l/7BS DgoA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=w+xkGRvvRiunjBm3RVE6YkhJ5AJIWHDmoIrUIlGn+fc=; b=igZO1LtbHf53+wgyKDXUq/zIYbVHMWGTLdmOf9Asmg8HBtIlkVXw31YPLFlrAnJfxZ INooSS1tbxX12cZX2cSN7RgvEMCjAo2S5dQrczD4wdk/7oSUSlicz89TQQYe8Yxdu5x0 ND2k6mbA6ltr0Nt2SDMvkPLhsx4ZREwaeecApgE5QWB+OjU4OFgez8MYNl6Yfeks6r9A llQNByVqg6fXJwtZMfQwiXq8ANOHd4HVvAlxQ+1WlRNORf3JVZZmf7aWjLufkV5lg4/w 1Q9Zq4Tk9sw4BDLwsbGJ2OM24srvhC/7S8Jg4gu3d+qIzfHAsjYAhXS8dF9UgrcKAhmI H5Gg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@android.com header.s=20161025 header.b=rgeOydfo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=android.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a13si16175609iok.48.2021.05.17.04.38.31; Mon, 17 May 2021 04:38:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@android.com header.s=20161025 header.b=rgeOydfo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=android.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236851AbhEQLiJ (ORCPT + 99 others); Mon, 17 May 2021 07:38:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58410 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236845AbhEQLiJ (ORCPT ); Mon, 17 May 2021 07:38:09 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B91FCC061573 for ; Mon, 17 May 2021 04:36:52 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id n2so6094297wrm.0 for ; Mon, 17 May 2021 04:36:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=android.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=w+xkGRvvRiunjBm3RVE6YkhJ5AJIWHDmoIrUIlGn+fc=; b=rgeOydfoO4C0EGb+zRmNliMzLP+z/a3m/PFvxWzQCnuTO0yX+YRfpXF1F/UYV8g5wZ 4QB+/OXoKHS192d/lt5nSZqGPQdDUZPdAI6UezHpfVdKOgAF5iPPyhZahZX+s7MiEqd3 qeC+9PMLWROihyVRXsduqAT9Q1L+zD2DKsYG1alUepvXXUPHUHdLdHS2xa2vEWnqiQkz DBjW/Q6/7WAl02qblvCi8azyWoJABTNcExa5YAltFXPHb0uPHKpfdkB0IWy0wsemVP72 NOgTI5z2APyifDtP178fxwUdiCe11qaAdGsZLCNXV318ih3yUKwzg8OOKVgOb0XrfqK1 FuZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=w+xkGRvvRiunjBm3RVE6YkhJ5AJIWHDmoIrUIlGn+fc=; b=pnhKazxk1D9+jU34axo8Pwk8LiXDhMV58gyJncm1f6I42Tet4xiCoYfZXdIzbcN/po mOG9gCrjo9iEJSFYBsxAE5G29XdgA0Jno7esuQCDRAOFOPkyC7qMSrKDZD6oAg4zfIus g7U7m5tcjdVqaK+gi9P0Gdxpb+Xk6y6K4iHZmWVMDB2ZYs7Yv4MgbvLXXToKwdHiURzy VOn2Mo+bc/U/124egAfaaA+f+JoFTTbX6Ql2SIJBnvAaAIiq3G2hFjeaMme1Qgp/dh2y C6SJ3TgN5TSmYuBKCH0/iKrefHAu3kNpDxTpZ5SacRJyBVngoAg0DqWDZBXumDlcfSL9 ASZA== X-Gm-Message-State: AOAM532kmuaZehn83crHXy6QNoqihgX73W9JBmKNZOEAYiz4DDiKMhKt SwCf+gtH42R7I+EGxvZIOaPL3Q== X-Received: by 2002:a5d:64a9:: with SMTP id m9mr3795995wrp.200.1621251411489; Mon, 17 May 2021 04:36:51 -0700 (PDT) Received: from google.com ([2a00:79e0:d:210:c20d:53d8:7bc:dbf9]) by smtp.gmail.com with ESMTPSA id a18sm10660286wmb.13.2021.05.17.04.36.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 04:36:51 -0700 (PDT) Date: Mon, 17 May 2021 12:36:49 +0100 From: Alessio Balsini To: Amir Goldstein Cc: Miklos Szeredi , Alessio Balsini , Akilesh Kailash , Antonio SJ Musumeci , David Anderson , Giuseppe Scrivano , Jann Horn , Jens Axboe , Martijn Coenen , Palmer Dabbelt , Paul Lawrence , Peng Tao , Stefano Duo , Zimuzo Ezeozue , wuyan , fuse-devel , kernel-team , linux-fsdevel , linux-kernel Subject: Re: [PATCH RESEND V12 4/8] fuse: Passthrough initialization and release Message-ID: References: <20210125153057.3623715-1-balsini@android.com> <20210125153057.3623715-5-balsini@android.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 05, 2021 at 03:21:26PM +0300, Amir Goldstein wrote: > On Wed, Feb 17, 2021 at 3:52 PM Miklos Szeredi wrote: > > > > On Mon, Jan 25, 2021 at 4:31 PM Alessio Balsini wrote: > > > > > > Implement the FUSE passthrough ioctl that associates the lower > > > (passthrough) file system file with the fuse_file. > > > > > > The file descriptor passed to the ioctl by the FUSE daemon is used to > > > access the relative file pointer, that will be copied to the fuse_file > > > data structure to consolidate the link between the FUSE and lower file > > > system. > > > > > > To enable the passthrough mode, user space triggers the > > > FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl and, if the call succeeds, receives > > > back an identifier that will be used at open/create response time in the > > > fuse_open_out field to associate the FUSE file to the lower file system > > > file. > > > The value returned by the ioctl to user space can be: > > > - > 0: success, the identifier can be used as part of an open/create > > > reply. > > > - <= 0: an error occurred. > > > The value 0 represents an error to preserve backward compatibility: the > > > fuse_open_out field that is used to pass the passthrough_fh back to the > > > kernel uses the same bits that were previously as struct padding, and is > > > commonly zero-initialized (e.g., in the libfuse implementation). > > > Removing 0 from the correct values fixes the ambiguity between the case > > > in which 0 corresponds to a real passthrough_fh, a missing > > > implementation of FUSE passthrough or a request for a normal FUSE file, > > > simplifying the user space implementation. > > > > > > For the passthrough mode to be successfully activated, the lower file > > > system file must implement both read_iter and write_iter file > > > operations. This extra check avoids special pseudo files to be targeted > > > for this feature. > > > Passthrough comes with another limitation: no further file system > > > stacking is allowed for those FUSE file systems using passthrough. > > > > > > Signed-off-by: Alessio Balsini > > > --- > > > fs/fuse/inode.c | 5 +++ > > > fs/fuse/passthrough.c | 87 ++++++++++++++++++++++++++++++++++++++++++- > > > 2 files changed, 90 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > > > index a1104d5abb70..7ebc398fbacb 100644 > > > --- a/fs/fuse/inode.c > > > +++ b/fs/fuse/inode.c > > > @@ -1133,6 +1133,11 @@ EXPORT_SYMBOL_GPL(fuse_send_init); > > > > > > static int free_fuse_passthrough(int id, void *p, void *data) > > > { > > > + struct fuse_passthrough *passthrough = (struct fuse_passthrough *)p; > > > + > > > + fuse_passthrough_release(passthrough); > > > + kfree(p); > > > + > > > return 0; > > > } > > > > > > diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c > > > index 594060c654f8..cf993e83803e 100644 > > > --- a/fs/fuse/passthrough.c > > > +++ b/fs/fuse/passthrough.c > > > @@ -3,19 +3,102 @@ > > > #include "fuse_i.h" > > > > > > #include > > > +#include > > > > > > int fuse_passthrough_open(struct fuse_dev *fud, > > > struct fuse_passthrough_out *pto) > > > { > > > - return -EINVAL; > > > + int res; > > > + struct file *passthrough_filp; > > > + struct fuse_conn *fc = fud->fc; > > > + struct inode *passthrough_inode; > > > + struct super_block *passthrough_sb; > > > + struct fuse_passthrough *passthrough; > > > + > > > + if (!fc->passthrough) > > > + return -EPERM; > > > + > > > + /* This field is reserved for future implementation */ > > > + if (pto->len != 0) > > > + return -EINVAL; > > > + > > > + passthrough_filp = fget(pto->fd); > > > + if (!passthrough_filp) { > > > + pr_err("FUSE: invalid file descriptor for passthrough.\n"); > > > + return -EBADF; > > > + } > > > + > > > + if (!passthrough_filp->f_op->read_iter || > > > + !passthrough_filp->f_op->write_iter) { > > > + pr_err("FUSE: passthrough file misses file operations.\n"); > > > + res = -EBADF; > > > + goto err_free_file; > > > + } > > > + > > > + passthrough_inode = file_inode(passthrough_filp); > > > + passthrough_sb = passthrough_inode->i_sb; > > > + if (passthrough_sb->s_stack_depth >= FILESYSTEM_MAX_STACK_DEPTH) { > > > + pr_err("FUSE: fs stacking depth exceeded for passthrough\n"); > > > > No need to print an error to the logs, this can be a perfectly normal > > occurrence. However I'd try to find a more unique error value than > > EINVAL so that the fuse server can interpret this as "not your fault, > > but can't support passthrough on this file". E.g. EOPNOTSUPP. > > > > > > Sorry for the fashionably late response... > Same comment for !{read,write}_iter case above. > EBAFD is really not appropriate there. > May I suggest ELOOP for s_stack_depth and EOPNOTSUPP > for no rw iter ops. > > Are you planning to post another version of the patches soon? > > Thanks, > Amir. Hi Amir, Thanks for your feedback. I like the ELOOP for the stack_depth, and the EOPNOTSUPP is already in my local branch. I've been busy with the integration of this last patchset in Android, that unfortunately will be shipped as out-of-tree. I'm keeping all my fingers crossed for my workarounds to prevent future passthrough UAPI breakages to work. :) I have an ugly patch which uses IDR as Miklos asked, but unfortunately I'm facing some performance issues due to the locking mechanisms to keep guarantee the RCU consistency. I can post the new patch set as an RFC soon for the community to take a look. At a glance what happens is: - the IDR, one for each fuse_conn, contains pointers to "struct fuse_passthrough" containing: - fuse_file *: which is using passthrough, - file *: native file system file target, - cred of the FUSE server, - ioctl(PASSTHROUGH_OPEN): updates IDR, requires spinlock: - kmalloc(fuse_passthrough), update file and cred, - ID = idr_alloc(), - return ID, - fuse_open reply from FUSE server with passthrough ID: updates IDR, requires spinlock: - pt = idr_find(ID), - update fuse_file with the current fuse_file, - update fuse_file->passthrough_id = ID, - idr_replace(), - read/write/mmap: lock-free IDR read: - idr_find(fuse_file::passthrough_id), - forward request to lower file system as for the current FUSE passthrough patches. - ioctl(PASSTHROUGH_CLOSE): updates IDR, requires spinlock: - idr_remove(); - call_rcu(): tell possible open fuse_file user that the ID is no more valid and kfree() the allocated struct; - close(fuse_file): updates IDR, requires spinlock: - ID = fuse_file::passthrough_id - idr_find(ID), - fuse_passthrough::fuse_file = NULL, - idr_replace(). This would be fine if passthrough is activated for a few, big files, where the passthrough overhead is dominated by the direct access benefits, but if passthrough is enabled for many small files which just do one or two read/writes (as what I did for my benchmark of total build time for the kernel, where I was passing-through every opened file), the overhead becomes a real issue. If you have any thoughts on how to make this simpler, I'm absolutely open to fix this. We are also in the preliminary design step of having a hierarchical passthrough feature, for which all the dir/file operations performed in any of the contents of the passthrough folder, are automatically forwarded to the lower file system. That may fix this overhead issue, but it's still early to forecast what is going to happen with this idea. Right now I'm not even sure it's feasible. Looking forward to receiving some feedback, thanks in advance! Alessio