Received: by 2002:ab2:3b09:0:b0:1ed:14ea:9113 with SMTP id b9csp175375lqc; Thu, 29 Feb 2024 13:55:40 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCUotfW+kU4mYu4A29kV2FfkA9pzx4AcbBqowQ7WXeywfD45QG5YCmMXu2Zb2OuuOCjIi+/S/AYFcdl8wVyeK5pw0lRfw/kwpNwX2qIeWw== X-Google-Smtp-Source: AGHT+IGVzzqL3OoJqc9RXhsmKjbfF91uEFKqHypXy/5fUWpnYAXk5MlBNEIUaLuvWxG5FAh4WO8H X-Received: by 2002:a17:907:b9d7:b0:a3f:5144:ada2 with SMTP id xa23-20020a170907b9d700b00a3f5144ada2mr165357ejc.2.1709243740616; Thu, 29 Feb 2024 13:55:40 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709243740; cv=pass; d=google.com; s=arc-20160816; b=sE+Jw9aoGX4ztJ7wHljbbX30nJNrp1IBN7v5qxABGlVtr5kUs8xZQKFw18NV9HcMaP KJtynQJjJ2mD1UXUA8lme9Ud7G8B6SrKwfG01lWOPSgmTe/I5N1IHqrpROxdI9FPFP6P lf2R1SuEkEmS79zxj1/WASQwgH+hJi+pUuG3DPj/d4spaMdyiwaYFNupZWgM+nBov8y+ B5JRg7B5B6VWAZzFyg4lE33lDEbzcT9J0RHzTuBCft6hs+choGToSYRGYLcV9itq6t3H 9n97Jo5+TF2o4JTe1aEa+dG1Mld2QLbxgvgQIOIe9/CEajCvvhbmEemnR0IIQJFwvWaa ifcA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:message-id:content-transfer-encoding:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:subject:date :from:dkim-signature; bh=U9VBvK2axspshyDxtIdPG2D0VbvW/eSzTvp4NV3mbGw=; fh=NquNkNBxqcXfLYnkXBAd1Fx+1O0uKjX0J1IIxQSRoic=; b=RqDHunR69PvJmJC4iIGjxVMOm8Adez3fD64N7JnzIHQybYaYIsdGeyefdE2rzBrry/ bqAd8aScUjvPosDRgyX4wqCcUOpV2HdxODCxf940/vKtZfBa3DIZNYqKnbdTYsewZBgL 7WaLVm/JhYqjl4lRwllylFqJ5QZNBzjTww0RVvJfN16lrZZojuuK3lgTszwIhOFUU4JR D7JjvMCvJtKrVzpA217jiytt8MLMjJJTBkEju29AhBIzYpngmcJ3VDfkyZBDWxxzjRaZ s4HamxJWiB4UBwb+w+7fxGzdBJLZYtAUVAjpBdiGXckOxbFFk77AZvYC3Iw4CvAyuIUQ NZ6w==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=k1wmwJa1; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-87607-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-87607-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id z13-20020a170906240d00b00a4425f8658bsi881264eja.735.2024.02.29.13.55.40 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 29 Feb 2024 13:55:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-87607-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=k1wmwJa1; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-87607-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-87607-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 0D18F1F24377 for ; Thu, 29 Feb 2024 21:55:40 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 1ACE86D530; Thu, 29 Feb 2024 21:55:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="k1wmwJa1" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2B5126D521; Thu, 29 Feb 2024 21:55:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709243730; cv=none; b=GiFyo/zukBIaWOfF+mGvMdtiJj2bbDSDufkWup+lPWLsy8uVwKWuCB6BEQFRBZBqPXpobr7v86koyNa5pOdvZu6LQUvLytAa2+ZiAQ1A2mGwfMkSxiYfmEogwy0L/HV9e4ykhTMGPdTJsVfCG5k+5xZ/CQqvP0oLjvPxR0SiE+0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709243730; c=relaxed/simple; bh=QXwImBHkQU4oBCMDlASAtoeObYm8w7DXrTIbSDNAt40=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:Cc; b=UzS09ioPX3CBBCHhen+Np8WArYZ4m4uTVNkuLiRvKFPHNgtRzJw4jb2bp7NKERQ02/Wjejp8zbsWcI3exGSofzBUTJk6zAHrE20GnjN1DAmYIJvf5WTgPrcBhRzyKBQC+jMlj7seDDLva8YS8BnneyzK4MPlv5YgyK52u8PS0qg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=k1wmwJa1; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id CBEC8C433C7; Thu, 29 Feb 2024 21:55:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1709243729; bh=QXwImBHkQU4oBCMDlASAtoeObYm8w7DXrTIbSDNAt40=; h=From:Date:Subject:To:Cc:From; b=k1wmwJa15FhdfrnwodWdq/GkznfkCJ39jCfh2Gh0+PduWbc8Fr+MQT8hyv/uwO5Dc EwRyj0x63X3NcW3pzqf7I7qLkpwnGfM3HWrVnJs64WBRSMYdWXcgIHrK871ETYrIV4 kqIllaLvUAXuBTcxoYr4+D7IkmUKFbqLdL8FNfnW3pb1AqINLhLh+vQUxEZjuRCEHD qb3CjuLCabAaRRRioC7RydS4LXHQ+eYSVdR2NCArTF8ZWsPqODlRJrrYvA7aFcVhAx x/ZH02Fq44gCPV6ivv3AvsrUfFSWaLySfsfgJHs5VTuhXmA4PieCwZF/BtjG8mugvn UBZtqOpFfjHew== From: Jeff Layton Date: Thu, 29 Feb 2024 16:55:26 -0500 Subject: [PATCH RFC] nfsd: return NFS4ERR_DELAY on contention for v4.0 replay_owner rp_mutex Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20240229-rp_mutex-v1-1-47deb9e4d32d@kernel.org> X-B4-Tracking: v=1; b=H4sIAE394GUC/6tWKk4tykwtVrJSqFYqSi3LLM7MzwNyDHUUlJIzE vPSU3UzU4B8JSMDIxMDIyNL3aKC+NzSktQK3RQjg1RjY0PLVGNDUyWg8oKi1LTMCrBR0UpBbs5 KsbW1AHYo3VZfAAAA To: Chuck Lever , Neil Brown , Olga Kornievskaia , Dai Ngo , Tom Talpey Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org, Jeff Layton X-Mailer: b4 0.12.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=4484; i=jlayton@kernel.org; h=from:subject:message-id; bh=QXwImBHkQU4oBCMDlASAtoeObYm8w7DXrTIbSDNAt40=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBl4P1Q440J9i93kJmvdFmdQ3HdXV5UpvE4Cp8fd o8U9xOHkfKJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZeD9UAAKCRAADmhBGVaC FTwcD/4wVDSH4NxzwIUI/ZD+cyBy+I6YFEUZaIlDCvWVXSpenQZq/ZiH+wiiQZRceeXscP0K+aa /6b9lnD1+PkgupDYAWCe4EMyd94Qpdfka42ZeKXDVwmeh//pk2ARmkichihJD+AQ4ZtIJDNYl/V vOjPXV9CwqVHgnRyc3XI8O+v42i0s5G2j72QTufhQLW+i14mRl1JB/AKLnthvV4hEHnk+5tKBY5 z/KjBEz7nkBWq5HqCqUpKqgG4PLwmKNm6vZ00/5z4byC5r7wpAEd2FJk2gdThoZsAZtyDZ4b38U 9dOR5pK654/vAyFuelvtJxD5NE71FNX3PkBKmSi+Z0L/z3miAMR8V+/EGqicMFaKqcx3f0FeL3U NonNZ803fxYPglRq+k0u+DLwnVql1XCJZ3JBLFMq4+loHhqg4pm8zQOu2jkCz5IUhay6imX0gTU Us5sCLMoDGaT1KW4xprllM4+KeAs39yQ7JgqJPlls4KRcGIeSy8YZhh4EzyENuFBbaXfh/1pHzo xXwPrwu0dRkbvB/WAE6rnS72LP8+I6MNfFj2NzgGly9+W2ItUr8JTeBmodgssu2hZBLE9gLrUgT uC+Z/s4oJvcUxsepjtD0ivyWilkF5f30atJugJq0Ke/9FoPHN0HdpL81H5k02u2eXwbb34B2u5s EmEEtX9+qhFRHEw== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 move_to_close_lru() is currently called with ->st_mutex and .rp_mutex held. This can lead to a deadlock as move_to_close_lru() waits for sc_count to drop to 2, and some threads holding a reference might be waiting for either mutex. These references will never be dropped so sc_count will never reach 2. There have been a couple of attempted fixes (see [1] and [2]), but both were problematic for different reasons. This patch attempts to break the impasse by simply not waiting for the rp_mutex. If it's contended then we just have it return NFS4ERR_DELAY. This will likely cause parallel opens by the same openowner to be even slower on NFSv4.0, but it should break the deadlock. One way to address the performance impact might be to allow the wait for the mutex to time out after a period of time (30ms would be the same as NFSD_DELEGRETURN_TIMEOUT). We'd need to add a mutex_lock_timeout function in order for that to work. Chuck also suggested that we may be able to utilize the svc_defer mechanism instead of returning NFS4ERR_DELAY in this situation, but I'm not quite sure how feasible that is. [1]: https://lore.kernel.org/lkml/4dd1fe21e11344e5969bb112e954affb@jd.com/T/ [2]: https://lore.kernel.org/linux-nfs/170546328406.23031.11217818844350800811@noble.neil.brown.name/ Signed-off-by: Jeff Layton --- fs/nfsd/nfs4state.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index aee12adf0598..4b667eeb06c8 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4658,13 +4658,16 @@ static void init_nfs4_replay(struct nfs4_replay *rp) mutex_init(&rp->rp_mutex); } -static void nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate, +static __be32 nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate, struct nfs4_stateowner *so) { if (!nfsd4_has_session(cstate)) { - mutex_lock(&so->so_replay.rp_mutex); + WARN_ON_ONCE(cstate->replay_owner); + if (!mutex_trylock(&so->so_replay.rp_mutex)) + return nfserr_jukebox; cstate->replay_owner = nfs4_get_stateowner(so); } + return nfs_ok; } void nfsd4_cstate_clear_replay(struct nfsd4_compound_state *cstate) @@ -5332,15 +5335,17 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, strhashval = ownerstr_hashval(&open->op_owner); oo = find_openstateowner_str(strhashval, open, clp); open->op_openowner = oo; - if (!oo) { + if (!oo) goto new_owner; - } if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) { /* Replace unconfirmed owners without checking for replay. */ release_openowner(oo); open->op_openowner = NULL; goto new_owner; } + status = nfsd4_cstate_assign_replay(cstate, &oo->oo_owner); + if (status) + return status; status = nfsd4_check_seqid(cstate, &oo->oo_owner, open->op_seqid); if (status) return status; @@ -5350,6 +5355,9 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, if (oo == NULL) return nfserr_jukebox; open->op_openowner = oo; + status = nfsd4_cstate_assign_replay(cstate, &oo->oo_owner); + if (status) + return status; alloc_stateid: open->op_stp = nfs4_alloc_open_stateid(clp); if (!open->op_stp) @@ -6121,12 +6129,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf void nfsd4_cleanup_open_state(struct nfsd4_compound_state *cstate, struct nfsd4_open *open) { - if (open->op_openowner) { - struct nfs4_stateowner *so = &open->op_openowner->oo_owner; - - nfsd4_cstate_assign_replay(cstate, so); - nfs4_put_stateowner(so); - } + if (open->op_openowner) + nfs4_put_stateowner(&open->op_openowner->oo_owner); if (open->op_file) kmem_cache_free(file_slab, open->op_file); if (open->op_stp) @@ -7193,9 +7197,9 @@ nfs4_preprocess_seqid_op(struct nfsd4_compound_state *cstate, u32 seqid, if (status) return status; stp = openlockstateid(s); - nfsd4_cstate_assign_replay(cstate, stp->st_stateowner); - - status = nfs4_seqid_op_checks(cstate, stateid, seqid, stp); + status = nfsd4_cstate_assign_replay(cstate, stp->st_stateowner); + if (!status) + status = nfs4_seqid_op_checks(cstate, stateid, seqid, stp); if (!status) *stpp = stp; else --- base-commit: 2eb3d14898b97bdc0596d184cbf829b5a81cd639 change-id: 20240229-rp_mutex-d20e3319e315 Best regards, -- Jeff Layton