Received: by 2002:a05:7412:da14:b0:e2:908c:2ebd with SMTP id fe20csp2033654rdb; Mon, 9 Oct 2023 10:16:22 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFrnT/EDOSuSAbfwmCcAMeLKmw+QoLwZihJLgX1o3MOuiyvlY+HsaLYV2r1yogMnRSqFdLH X-Received: by 2002:a05:6a20:4414:b0:159:cf93:9b50 with SMTP id ce20-20020a056a20441400b00159cf939b50mr17657977pzb.46.1696871782058; Mon, 09 Oct 2023 10:16:22 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1696871782; cv=pass; d=google.com; s=arc-20160816; b=1Gu69dohoaTETPivnfCuagji2HojBXYNrQfa4J38WoMdXkHqU5TIL6GEQMjul4J6eK KnPJdTgAXsKx30MtRfXEXut3eAJM6SqbXwcjSlgNLCGgD4UvNlzaQU6Qzj/oHMkcsEmE VAl/YgR6xFLkU2KSfooLWcvIKy/2Z2UZ3BjIEcvSgoSlfLRQg26JnEK+Pa8hUHJMeBmX /ePXuV0R+2Eg1qAOvw0K18l1sfFP4/2QbodRqn6VlzcV2S8VfBoZEdhnugJsU7lOIOvy 0QiSrWnhA14lohBGtyGEu2QlD7cJaOxtKkEmjcFEOpRJuQ+o39fXqOWYAElTu8eR4Kei JF2g== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=QlFuBSlg0E1TP6LN5LFGf7CzS2eE6s5UiNS1GrMFKuU=; fh=Pzq0wC4stGh0zwNuiP25tr+2JeLSzb157Kx0i2lJJDQ=; b=YmMPnrxnmNOxqQgKudsmmjw21w9O4PLIQJ3srQKU88CuWkK1MgEGWK+5ZRYumh25Y4 UhyOARnJUfOoIR70ynuEIjQ7WcX8TiMRbpdkWXQBc8cg1GMEiXog+QFwkc49L/EUa5Il GYWeVixRawt+/97xvUbmHfLBs5sXEIq3nRSLRs/U87pv5U3IsWE8mgYXVhvwOdYvjwGE tyrYi3qHge5n4XYaJyoWWXvm3O8O54bZp+XyRKaFmQ4yzQwP5SQcdICR/Ut6aen1RvD0 onhbjfl8uwEzCSTluSBLKIAP9vhfHVrRvVsGE3rWXALzuJ/8JVRmZ//LUHltI8jMr2EE uX1w== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@templeofstupid.com header.s=dreamhost header.b=d23ui0iZ; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from morse.vger.email (morse.vger.email. [2620:137:e000::3:1]) by mx.google.com with ESMTPS id s18-20020a056a00179200b006826c8d5a31si8190263pfg.21.2023.10.09.10.16.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Oct 2023 10:16:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) client-ip=2620:137:e000::3:1; Authentication-Results: mx.google.com; dkim=pass header.i=@templeofstupid.com header.s=dreamhost header.b=d23ui0iZ; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id E9EBA80BDE2C; Mon, 9 Oct 2023 10:15:58 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377307AbjJIRPk (ORCPT + 99 others); Mon, 9 Oct 2023 13:15:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52058 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346647AbjJIRPh (ORCPT ); Mon, 9 Oct 2023 13:15:37 -0400 Received: from bee.birch.relay.mailchannels.net (bee.birch.relay.mailchannels.net [23.83.209.14]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D6A94D6 for ; Mon, 9 Oct 2023 10:15:31 -0700 (PDT) X-Sender-Id: dreamhost|x-authsender|kjlx@templeofstupid.com Received: from relay.mailchannels.net (localhost [127.0.0.1]) by relay.mailchannels.net (Postfix) with ESMTP id 20005361380 for ; Mon, 9 Oct 2023 17:15:31 +0000 (UTC) Received: from pdx1-sub0-mail-a302.dreamhost.com (unknown [127.0.0.6]) (Authenticated sender: dreamhost) by relay.mailchannels.net (Postfix) with ESMTPA id B357A360BD7 for ; Mon, 9 Oct 2023 17:15:30 +0000 (UTC) ARC-Seal: i=1; s=arc-2022; d=mailchannels.net; t=1696871730; a=rsa-sha256; cv=none; b=H0zdrU/R8BrMiN1B8DW8a0dfeALKIzLE3NnFdRisByZpEJSw+N9z1VBjYOZOQxjHlefDvK B9g1s2xusAQ16GLiIFXk39QrxEySINK/r4cr1uhuLUevmBrNlW4OGu4HUhHHcbppFNOuBH g4+8yvRbPu0dcsqJcpgMubnlz6U036r9zJAfAEjTCQgGfoCw8oN4uT0UnLI3nGKyZLVUI5 SiqIdi/ZB92SIwuWW35cbDxAk0pQuUDJMu4ESZOtaLzjWlHtpYyFZugEd3a9+Qk8aXRhEE XK3QxDdu+y2JLtK/GDSvBvkVLYiAS3hLWjxiA7gZu3BHeii8r7BcvVR5x/1p8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=mailchannels.net; s=arc-2022; t=1696871730; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references:dkim-signature; bh=QlFuBSlg0E1TP6LN5LFGf7CzS2eE6s5UiNS1GrMFKuU=; b=lhwmYpOWKOG1MspdhwQm0+aBswqTu4G4ezwqF8BKVgp8xM3gtcjcmrd45FGPcccKYOZlId hYcFA6PGFOTKuG03YerR6wtyrC+wWoWOqeAKlIAdvnaZeWSvZJKRMZnLL6QUIlYvheBK+b N2VGPNB3nsNPLx8sBp9fY48Z5dXS4ejliz0WNgbMIhByEJN3uNz0W8oJsnXwSff/yEVN1i E9yqPmbSuXVH/0l4PQ9qqcTQFG4oJqv/Cm7VGQzjmVzWlhS7I1xIjne+kJHxghgN2zt8xK BqjP80LeH+zZEdIj9To9a2cyF/jU1FTSeLz95mswj6icdSZYIvbjj745Rw+seQ== ARC-Authentication-Results: i=1; rspamd-7c449d4847-zgl4f; auth=pass smtp.auth=dreamhost smtp.mailfrom=kjlx@templeofstupid.com X-Sender-Id: dreamhost|x-authsender|kjlx@templeofstupid.com X-MC-Relay: Neutral X-MailChannels-SenderId: dreamhost|x-authsender|kjlx@templeofstupid.com X-MailChannels-Auth-Id: dreamhost X-Relation-Little: 36e7711853006560_1696871730964_991421424 X-MC-Loop-Signature: 1696871730964:1140449380 X-MC-Ingress-Time: 1696871730964 Received: from pdx1-sub0-mail-a302.dreamhost.com (pop.dreamhost.com [64.90.62.162]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384) by 100.103.131.218 (trex/6.9.1); Mon, 09 Oct 2023 17:15:30 +0000 Received: from kmjvbox (c-73-231-176-24.hsd1.ca.comcast.net [73.231.176.24]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: kjlx@templeofstupid.com) by pdx1-sub0-mail-a302.dreamhost.com (Postfix) with ESMTPSA id 4S45Jf3CtlzWb for ; Mon, 9 Oct 2023 10:15:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=templeofstupid.com; s=dreamhost; t=1696871730; bh=QlFuBSlg0E1TP6LN5LFGf7CzS2eE6s5UiNS1GrMFKuU=; h=Date:From:To:Cc:Subject:Content-Type; b=d23ui0iZazRAAWOdP9h8o/a6mucm8xaDew7vyb31mqBMFfdgeDdRpuhCMPSxdZsbS FbuAI8dyiujlkFxFOjc65mrhMKIYuLcrzibc3O9DGC8K+gh35uog4fKsV3/9LX9D8L GnvmYl5hEuqsk0QMWnIaomiHbPpWWLf/c57dgFLu0QHobQIHUT9P5EbS1XHGzyz1Br 9it4BjVRDvDJBWP7kILW5xciW7rhHzXp0jD2JKpsB6unPlJLmVVea+6sAx98TQxO3m PE6FjXxMey6FwIQd4dVZBQVeaN0xqyDkoXRXe3Oyc2JKY8oFxfii35i+5ZGRkkxI1y EJ4FeUOC3HvsA== Received: from johansen (uid 1000) (envelope-from kjlx@templeofstupid.com) id e0098 by kmjvbox (DragonFly Mail Agent v0.12); Mon, 09 Oct 2023 10:15:25 -0700 Date: Mon, 9 Oct 2023 10:15:25 -0700 From: Krister Johansen To: Bernd Schubert Cc: Krister Johansen , Miklos Szeredi , linux-fsdevel@vger.kernel.org, Miklos Szeredi , linux-kernel@vger.kernel.org, German Maglione , Greg Kurz , Max Reitz Subject: Re: [resend PATCH v2 2/2] fuse: ensure that submounts lookup their parent Message-ID: <20231009171525.GA1973@templeofstupid.com> References: <45778432fba32dce1fb1f5fd13272c89c95c3f52.1696043833.git.kjlx@templeofstupid.com> <3187f942-dcf0-4b2f-a106-0eb5d5a33949@fastmail.fm> <20231007004107.GA1967@templeofstupid.com> <968148ad-787e-4ccb-9d84-f32b5da88517@fastmail.fm> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <968148ad-787e-4ccb-9d84-f32b5da88517@fastmail.fm> X-Spam-Status: No, score=2.7 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Mon, 09 Oct 2023 10:15:59 -0700 (PDT) X-Spam-Level: ** On Mon, Oct 09, 2023 at 02:52:42PM +0200, Bernd Schubert wrote: > > > On 10/7/23 02:41, Krister Johansen wrote: > > On Fri, Oct 06, 2023 at 07:13:06PM +0200, Bernd Schubert wrote: > > > > > > > > > On 10/2/23 17:24, Krister Johansen wrote: > > > > The submount code uses the parent nodeid passed into the function in > > > > order to create the root dentry for the new submount. This nodeid does > > > > not get its remote reference count incremented by a lookup option. > > > > > > > > If the parent inode is evicted from its superblock, due to memory > > > > pressure for example, it can result in a forget opertation being sent to > > > > the server. Should this nodeid be forgotten while it is still in use in > > > > a submount, users of the submount get an error from the server on any > > > > subsequent access. In the author's case, this was an EBADF on all > > > > subsequent operations that needed to reference the root. > > > > > > > > Debugging the problem revealed that the dentry shrinker triggered a forget > > > > after killing the dentry with the last reference, despite the root > > > > dentry in another superblock still using the nodeid. > > > > > > > > As a result, a container that was also using this submount failed to > > > > access its filesystem because it had borrowed the reference instead of > > > > taking its own when setting up its superblock for the submount. > > > > > > > > This commit fixes the problem by having the new submount trigger a > > > > lookup for the parent as part of creating a new root dentry for the > > > > virtiofsd submount superblock. This allows each superblock to have its > > > > inodes removed by the shrinker when unreferenced, while keeping the > > > > nodeid reference count accurate and active with the server. > > > > > > > > Signed-off-by: Krister Johansen > > > > --- > > > > fs/fuse/dir.c | 10 +++++----- > > > > fs/fuse/fuse_i.h | 6 ++++++ > > > > fs/fuse/inode.c | 43 +++++++++++++++++++++++++++++++++++++------ > > > > 3 files changed, 48 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > > > > index 5e01946d7531..333730c74619 100644 > > > > --- a/fs/fuse/dir.c > > > > +++ b/fs/fuse/dir.c > > > > @@ -183,11 +183,11 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args, > > > > args->out_args[0].value = outarg; > > > > } > > > > -static int fuse_dentry_revalidate_lookup(struct fuse_mount *fm, > > > > - struct dentry *entry, > > > > - struct inode *inode, > > > > - struct fuse_entry_out *outarg, > > > > - bool *lookedup) > > > > +int fuse_dentry_revalidate_lookup(struct fuse_mount *fm, > > > > + struct dentry *entry, > > > > + struct inode *inode, > > > > + struct fuse_entry_out *outarg, > > > > + bool *lookedup) > > > > { > > > > struct dentry *parent; > > > > struct fuse_forget_link *forget; > > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > > > > index 405252bb51f2..a66fcf50a4cc 100644 > > > > --- a/fs/fuse/fuse_i.h > > > > +++ b/fs/fuse/fuse_i.h > > > > @@ -1325,6 +1325,12 @@ void fuse_dax_dontcache(struct inode *inode, unsigned int flags); > > > > bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment); > > > > void fuse_dax_cancel_work(struct fuse_conn *fc); > > > > +/* dir.c */ > > > > +int fuse_dentry_revalidate_lookup(struct fuse_mount *fm, struct dentry *entry, > > > > + struct inode *inode, > > > > + struct fuse_entry_out *outarg, > > > > + bool *lookedup); > > > > + > > > > /* ioctl.c */ > > > > long fuse_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg); > > > > long fuse_file_compat_ioctl(struct file *file, unsigned int cmd, > > > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > > > > index 444418e240c8..79a31cb55512 100644 > > > > --- a/fs/fuse/inode.c > > > > +++ b/fs/fuse/inode.c > > > > @@ -1464,7 +1464,13 @@ static int fuse_fill_super_submount(struct super_block *sb, > > > > struct fuse_mount *fm = get_fuse_mount_super(sb); > > > > struct super_block *parent_sb = parent_fi->inode.i_sb; > > > > struct fuse_attr root_attr; > > > > + struct fuse_inode *fi; > > > > struct inode *root; > > > > + struct inode *parent; > > > > + struct dentry *pdent; > > > > + struct fuse_entry_out outarg; > > > > + bool lookedup = false; > > > > + int ret; > > > > fuse_sb_defaults(sb); > > > > fm->sb = sb; > > > > @@ -1480,14 +1486,39 @@ static int fuse_fill_super_submount(struct super_block *sb, > > > > if (parent_sb->s_subtype && !sb->s_subtype) > > > > return -ENOMEM; > > > > - fuse_fill_attr_from_inode(&root_attr, parent_fi); > > > > - root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0); > > > > /* > > > > - * This inode is just a duplicate, so it is not looked up and > > > > - * its nlookup should not be incremented. fuse_iget() does > > > > - * that, though, so undo it here. > > > > + * It is necessary to lookup the parent_if->nodeid in case the dentry > > > > + * that triggered the automount of the submount is later evicted. > > > > + * If this dentry is evicted without the lookup count getting increased > > > > + * on the submount root, then the server can subsequently forget this > > > > + * nodeid which leads to errors when trying to access the root of the > > > > + * submount. > > > > */ > > > > - get_fuse_inode(root)->nlookup--; > > > > + parent = &parent_fi->inode; > > > > + pdent = d_find_alias(parent); > > > > + if (!pdent) > > > > + return -EINVAL; > > > > + > > > > + ret = fuse_dentry_revalidate_lookup(fm, pdent, parent, &outarg, > > > > + &lookedup); > > > > + dput(pdent); > > > > + /* > > > > + * The new root owns this nlookup on success, and it is incremented by > > > > + * fuse_iget(). In the case the lookup succeeded but revalidate fails, > > > > + * ensure that the lookup count is tracked by the parent. > > > > + */ > > > > + if (ret <= 0) { > > > > + if (lookedup) { > > > > + fi = get_fuse_inode(parent); > > > > + spin_lock(&fi->lock); > > > > + fi->nlookup++; > > > > + spin_unlock(&fi->lock); > > > > + } > > > > > > I might be wrong, but doesn't that mean that > > > "get_fuse_inode(root)->nlookup--" needs to be called? > > > > In the case where ret > 0, the nlookup on get_fuse_inode(root) is set to > > 1 by fuse_iget(). That ensures that the root is forgotten when later > > unmounted. The code that handles the forget uses the count of nlookup > > to tell the server-side how many references to forget. (That's in > > fuse_evict_inode()). > > > > However, if the fuse_dentry_revalidate_lookup() call performs a valid > > lookup but returns an error, this function will return before it fills > > out s_root in the superblock or calls fuse_iget(). If the superblock > > doesn't have a s_root set, then the code in generic_kill_super() won't > > dput() the root dentry and trigger the forget. > > > > The intention of this code was to handle the case where the lookup had > > succeeded, but the code determined it was still necessary to return an > > error. In that situation, the reference taken by the lookup has to be > > accounted somewhere, and the parent seemed like a plausible candidate. > > Yeah sorry, I had just missed that fuse_iget() also moved and then thought > it would have increased fi->nlookup already. No worries; I'd much rather get feedback if something doesn't look right, even if it turns out okay in the end. > > However, after writing up this response, I can see that there's still a > > problem here if d_make_root(root) returns NULL, because we'll also lose > > track of the nlookup in that case. > > > > If you agree that charging this to the parent on error makes sense, I'll > > re-work the error handling here so that the right thing happens when > > either fuse_dentry_revalidate_lookup() or d_make_root() encounter an > > error. > > Oh yeah, I also missed that. Although, iput() calls iput_final, which then > calls evict and sends the fuse forget - isn't that the right action already? Thanks, I had forgotten that d_make_root() would call iput() for me if d_alloc_anon() fails. Let me restate this to suggest that I account the nlookup to the parent if fuse_dentry_revalidate_lookup() or fuse_iget() fail instead. Does that sound right? > > Thanks for the feedback. > > Well, false alarm from my side, sorry again! No apology necessary; I appreciate you spending the time to look and ask questions. -K