Subject: [PATCH/RFC 00/10 v5] Improve scalability of directory operations
From: NeilBrown
To: Al Viro, Linus Torvalds, Daire Byrne, Trond Myklebust, Chuck Lever
Cc: Linux NFS Mailing List, linux-fsdevel@vger.kernel.org, LKML
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147828344.25420.13834885828450967910.stgit@noble.brown>

[I made up "v5" - I haven't been counting]

VFS currently holds an exclusive lock on the directory while making
changes: add,
remove, rename. When multiple threads make changes in the one directory,
the contention can be noticeable.
In the case of NFS with a high-latency link, this can easily be
demonstrated. NFS doesn't really need VFS locking as the server ensures
correctness.

Lustre uses a single(?) directory for object storage, and has patches
for ext4 to support concurrent updates (Lustre accesses ext4 directly,
not via the VFS).

XFS (it is claimed) does its own locking and doesn't need the VFS to
help at all.

This patch series allows filesystems to request a shared lock on
directories and provides serialisation on just the affected name, not
the whole directory. It changes both the VFS and NFSD to use shared
locks when appropriate, and changes NFS to request shared locks.

The central enabling feature is a new dentry flag DCACHE_PAR_UPDATE
which acts as a bit-lock. The ->d_lock spinlock is taken to set/clear
it, and wait_var_event() is used for waiting. This flag is set on all
dentries that are part of a directory update, not just when a shared
lock is taken.

When a shared lock is taken we must use alloc_dentry_parallel() which
needs a wq (wait queue) that must remain until the update is completed.
To make use of concurrent create, kern_path_create() would need to be
passed a wq. Rather than the churn required for that, we use exclusive
locking when no wq is provided.

One interesting consequence of this is that silly-rename becomes a
little more complex. As the directory may not be exclusively locked,
the new silly-name needs to be locked (DCACHE_PAR_UPDATE) as well.
A new LOOKUP_SILLY_RENAME is added which helps implement this using
common code.

While testing I found some odd behaviour that was caused by
d_revalidate() racing with rename(). To resolve this I used
DCACHE_PAR_UPDATE to ensure they cannot race any more.

Testing, review, or other comments would be most welcome,

NeilBrown

---

NeilBrown (10):
      VFS: support parallel updates in the one directory.
      VFS: move EEXIST and ENOENT tests into lookup_hash_update()
      VFS: move want_write checks into lookup_hash_update()
      VFS: move dput() and mnt_drop_write() into done_path_update()
      VFS: export done_path_update()
      VFS: support concurrent renames.
      VFS: hold DCACHE_PAR_UPDATE lock across d_revalidate()
      NFSD: allow parallel creates from nfsd
      VFS: add LOOKUP_SILLY_RENAME
      NFS: support parallel updates in the one directory.


 fs/dcache.c            |  72 ++++-
 fs/namei.c             | 616 ++++++++++++++++++++++++++++++++---------
 fs/nfs/dir.c           |  28 +-
 fs/nfs/fs_context.c    |   6 +-
 fs/nfs/internal.h      |   3 +-
 fs/nfs/unlink.c        |  51 +++-
 fs/nfsd/nfs3proc.c     |  28 +-
 fs/nfsd/nfs4proc.c     |  29 +-
 fs/nfsd/nfsfh.c        |   9 +
 fs/nfsd/nfsproc.c      |  29 +-
 fs/nfsd/vfs.c          | 177 +++++------
 include/linux/dcache.h |  28 ++
 include/linux/fs.h     |   5 +-
 include/linux/namei.h  |  39 ++-
 14 files changed, 799 insertions(+), 321 deletions(-)
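
For readers unfamiliar with the bit-lock pattern the cover letter
describes for DCACHE_PAR_UPDATE, here is a rough userspace analogue.
It is not taken from the patches: struct toy_dentry, toy_lock_update(),
toy_unlock_update() and run_demo() are hypothetical names, a pthread
mutex stands in for ->d_lock, and a condition variable stands in for
wait_var_event() and its wake-up. It only illustrates "take a short
lock to set/clear a flag, sleep while the flag is held by someone else".

```c
/* Userspace sketch only; all names here are made up for illustration. */
#include <assert.h>
#include <pthread.h>

#define PAR_UPDATE 0x1u            /* stand-in for DCACHE_PAR_UPDATE */
#define N_THREADS  4
#define N_ITERS    10000

struct toy_dentry {
	pthread_mutex_t lock;      /* stand-in for ->d_lock */
	pthread_cond_t  waitq;     /* stand-in for wait_var_event()/wake-up */
	unsigned int    flags;     /* holds the PAR_UPDATE bit */
	long            updates;   /* protected by the bit, not by 'lock' */
};

/* Acquire the bit-lock: wait until the bit is clear, then set it. */
static void toy_lock_update(struct toy_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	while (d->flags & PAR_UPDATE)
		pthread_cond_wait(&d->waitq, &d->lock);
	d->flags |= PAR_UPDATE;
	pthread_mutex_unlock(&d->lock);
}

/* Release the bit-lock: clear the bit and wake any waiters. */
static void toy_unlock_update(struct toy_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	d->flags &= ~PAR_UPDATE;
	pthread_cond_broadcast(&d->waitq);
	pthread_mutex_unlock(&d->lock);
}

/* Each worker performs N_ITERS "updates" on the same name. */
static void *worker(void *arg)
{
	struct toy_dentry *d = arg;

	for (int i = 0; i < N_ITERS; i++) {
		toy_lock_update(d);
		d->updates++;      /* the short lock is NOT held here */
		toy_unlock_update(d);
	}
	return NULL;
}

/* Run the workers and return the final update count. */
static long run_demo(void)
{
	struct toy_dentry d = { .flags = 0, .updates = 0 };
	pthread_t t[N_THREADS];

	pthread_mutex_init(&d.lock, NULL);
	pthread_cond_init(&d.waitq, NULL);
	for (int i = 0; i < N_THREADS; i++)
		pthread_create(&t[i], NULL, worker, &d);
	for (int i = 0; i < N_THREADS; i++)
		pthread_join(t[i], NULL);
	pthread_cond_destroy(&d.waitq);
	pthread_mutex_destroy(&d.lock);
	return d.updates;
}
```

The point of the pattern is that the short lock is held only while the
flag is tested and changed, so the update itself can sleep (as an NFS
RPC would) while other names in the same directory proceed in parallel.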