From: "NeilBrown"
To: "Jeff Layton"
Cc: "Trond Myklebust", "Anna Schumaker", "linux-nfs"
Subject: Re: [PATCH] NFS: don't unhash dentry during unlink.
References: <165708423191.17141.6465885406851939941@noble.neil.brown.name>
Date: Thu, 07 Jul 2022 09:35:53 +1000
Message-id: <165715055394.17141.17231322377882434619@noble.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

On Thu, 07 Jul 2022, Jeff Layton wrote:
> On Wed, 2022-07-06 at 15:10 +1000, NeilBrown wrote:
> > NFS unlink() must determine if the file is open, and must perform a
> > "silly rename" instead of an unlink if it is. Otherwise the client
> > might hold a file open which has been removed on the server.
> >
> > Consequently if it determines that the file isn't open, it must block
> > any subsequent opens until the unlink has been completed on the server.
> >
> > This is currently achieved by unhashing the dentry. This forces any
> > open attempt to the slow-path for lookup which will block on i_sem on
> > the directory until the unlink completes. A proposed patch will change
> > the VFS to only get a shared lock on i_sem for unlink, so this will no
> > longer work.
> >
> > Instead we introduce an explicit interlock. A flag is set on the dentry
> > while the unlink is running and ->d_revalidate blocks while that flag is
> > set. This closes the race without requiring exclusion on i_sem.
> > unlink will still have exclusion on the dentry being unlinked, so it
> > will be safe to set and then clear the flag without any risk of another
> > thread touching the flag.
> >
> > There is little room for adding new dentry flags, so instead of adding a
> > new flag, we overload an existing flag which is not used by NFS.
> >
> > DCACHE_DONTCACHE is only set for filesystems which call
> > d_mark_dontcache() and NFS never calls this, so it is currently unused
> > in NFS.
> > DCACHE_DONTCACHE is only tested when the last reference on a dentry has
> > been dropped, so it is safe for NFS to set and then clear the flag while
> > holding a reference - the setting of the flag cannot cause a
> > misunderstanding.
> >
> > So we define DCACHE_NFS_PENDING_UNLINK as an alias for DCACHE_DONTCACHE
> > and add a definition to nfs_fs.h so that if NFS ever does find a need to
> > call d_mark_dontcache() the build will fail with a suitable error.
> >
> > Signed-off-by: NeilBrown
> > ---
> >
> > Hi Trond/Anna,
> >  this patch is a precursor for my parallel-directory-updates patch set.
> >  It would be particularly helpful if this (and the nfsd patches I
> >  recently sent) could land for the next merge window. Then I could post
> >  a substantially reduced series to implement parallel directory
> >  updates, which would then be easier for others to review.
> >
> > Thanks,
> > NeilBrown
> >
> >
> >  fs/nfs/dir.c           | 23 ++++++++++++++++-------
> >  include/linux/nfs_fs.h | 14 ++++++++++++++
> >  2 files changed, 30 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index 0c4e8dd6aa96..695bb057cbd2 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -1778,6 +1778,8 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
> >  	int ret;
> >  
> >  	if (flags & LOOKUP_RCU) {
> > +		if (dentry->d_flags & DCACHE_NFS_PENDING_UNLINK)
> > +			return -ECHILD;
> >  		parent = READ_ONCE(dentry->d_parent);
> >  		dir = d_inode_rcu(parent);
> >  		if (!dir)
> > @@ -1786,6 +1788,9 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
> >  		if (parent != READ_ONCE(dentry->d_parent))
> >  			return -ECHILD;
> >  	} else {
> > +		/* Wait for unlink to complete */
> > +		wait_var_event(&dentry->d_flags,
> > +			       !(dentry->d_flags & DCACHE_NFS_PENDING_UNLINK));
> >  		parent = dget_parent(dentry);
> >  		ret = reval(d_inode(parent), dentry, flags);
> >  		dput(parent);
> > @@ -2454,7 +2459,6 @@ static int nfs_safe_remove(struct dentry *dentry)
> >  int nfs_unlink(struct inode *dir, struct dentry *dentry)
> >  {
> >  	int error;
> > -	int need_rehash = 0;
> >  
> >  	dfprintk(VFS, "NFS: unlink(%s/%lu, %pd)\n", dir->i_sb->s_id,
> >  		dir->i_ino, dentry);
> > @@ -2469,15 +2473,20 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry)
> >  		error = nfs_sillyrename(dir, dentry);
> >  		goto out;
> >  	}
> > -	if (!d_unhashed(dentry)) {
> > -		__d_drop(dentry);
> > -		need_rehash = 1;
> > -	}
> > +	/* We must prevent any concurrent open until the unlink
> > +	 * completes. ->d_revalidate will wait for DCACHE_NFS_PENDING_UNLINK
> > +	 * to clear. We set it here to ensure no lookup succeeds until
> > +	 * the unlink is complete on the server.
> > +	 */
> > +	dentry->d_flags |= DCACHE_NFS_PENDING_UNLINK;
> > +
> >  	spin_unlock(&dentry->d_lock);
> >  	error = nfs_safe_remove(dentry);
> >  	nfs_dentry_remove_handle_error(dir, dentry, error);
> > -	if (need_rehash)
> > -		d_rehash(dentry);
> > +	spin_lock(&dentry->d_lock);
> > +	dentry->d_flags &= ~DCACHE_NFS_PENDING_UNLINK;
> > +	spin_unlock(&dentry->d_lock);
> > +	wake_up_var(&dentry->d_flags);
> >  out:
> >  	trace_nfs_unlink_exit(dir, dentry, error);
> >  	return error;
> > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > index a17c337dbdf1..041a6076e045 100644
> > --- a/include/linux/nfs_fs.h
> > +++ b/include/linux/nfs_fs.h
> > @@ -617,6 +617,20 @@ nfs_fileid_to_ino_t(u64 fileid)
> >  
> >  #define NFS_JUKEBOX_RETRY_TIME (5 * HZ)
> >  
> > +/* We need to block new opens while a file is being unlinked.
> > + * If it is opened *before* we decide to unlink, we will silly-rename
> > + * instead. If it is opened *after*, then we the open to fail unless it creates
>
> "then we allow the open to fail"

Actually it is "then we need the open to fail". I should probably do a
complete re-write of that para.

>
> > + * a new file.
> > + * If we allow the open and unlink to race, we could end up with a file that is
> > + * open but deleted on the server resulting in ESTALE.
> > + * So overload DCACHE_DONTCACHE to record when the unlink is happening
> > + * and block dentry revalidation while it is set.
> > + * DCACHE_DONTCACHE is only used by filesystems which call d_mark_dontcache()
> > + * which NFS never calls. It is only tested on a dentry on which all references
> > + * have been dropped, so it is safe for NFS to set it while holding a reference.
> > + */
> > +#define DCACHE_NFS_PENDING_UNLINK DCACHE_DONTCACHE
> > +#define d_mark_dontcache(i) BUILD_BUG_ON_MSG(1, "NFS cannot use d_mark_dontcache()")
> >  
> >  # undef ifdebug
> >  # ifdef NFS_DEBUG
>
> Wow, we really are out of dentry flags. I wonder if some of them are no
> longer needed?
>
> This overloading is a bit klunky but it's probably OK. AFAICT,
> 0x80000000 is still available though if this turns out to be too nasty.
> It looks like 0x08000000 may also be free

I need one of those two in a subsequent patch to lock a dentry while the
name/link is being created. If I used the other for NFS_PENDING_UNLINK
we would be completely out.

This flag really should be completely private to nfs. d_fsdata would be
the best place to put it. But NFS doesn't have a permanent d_fsdata in
which I can store a bit. Nor does it leave d_fsdata untouched, so I
cannot store a magic value in there. There are two different uses of
d_fsdata. I don't fully understand when they are active, so I don't
know if it is safe to add another independent use - I suspect not though.

>
>
> Reviewed-by: Jeff Layton
>

Thanks,
NeilBrown
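
For anyone following the thread, the interlock discussed above can be
modelled in userspace C roughly as below. This is only an illustrative
sketch, not code from the patch: struct fake_dentry, PENDING_UNLINK and
all four function names are hypothetical, and a pthread mutex plus
condition variable are assumed as stand-ins for d_lock and for
wait_var_event()/wake_up_var().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for DCACHE_NFS_PENDING_UNLINK. */
#define PENDING_UNLINK 0x1u

/* Hypothetical userspace model of a dentry: only what the interlock needs. */
struct fake_dentry {
	pthread_mutex_t lock;    /* plays the role of d_lock */
	pthread_cond_t cleared;  /* plays the role of the wait_var_event() queue */
	unsigned int flags;      /* plays the role of d_flags */
};

static void fake_dentry_init(struct fake_dentry *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->cleared, NULL);
	d->flags = 0;
}

/* Unlink side: mark the dentry before the remove is sent to the server. */
static void unlink_begin(struct fake_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	d->flags |= PENDING_UNLINK;
	pthread_mutex_unlock(&d->lock);
}

/* Unlink side: clear the mark and wake every waiter. */
static void unlink_end(struct fake_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	assert(d->flags & PENDING_UNLINK);
	d->flags &= ~PENDING_UNLINK;
	pthread_cond_broadcast(&d->cleared);
	pthread_mutex_unlock(&d->lock);
}

/*
 * Non-sleeping path, loosely like the LOOKUP_RCU branch: returning false
 * means "retry on the blocking path". This model takes the mutex for
 * simplicity, which the kernel's RCU walk does not do.
 */
static bool revalidate_nonblocking(struct fake_dentry *d)
{
	bool ok;

	pthread_mutex_lock(&d->lock);
	ok = !(d->flags & PENDING_UNLINK);
	pthread_mutex_unlock(&d->lock);
	return ok;
}

/* Blocking path: sleep until no unlink is pending on this dentry. */
static void revalidate_blocking(struct fake_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	while (d->flags & PENDING_UNLINK)
		pthread_cond_wait(&d->cleared, &d->lock);
	pthread_mutex_unlock(&d->lock);
}
```

In this model a caller would bracket the server-side remove with
unlink_begin() and unlink_end(), while any lookup first tries
revalidate_nonblocking() and falls back to revalidate_blocking() when
that returns false.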