Subject: Re: [PATCH RFC] NFSD: Fix possible sleep during nfsd4_release_lockowner()
From: Jeff Layton
To: Chuck Lever III
Cc: Linux NFS Mailing List
Date: Mon, 23 May 2022 11:26:14 -0400
In-Reply-To: <510282CB-38D3-438A-AF8A-9AC2519FCEF7@oracle.com>
References: <165323344948.2381.7808135229977810927.stgit@bazille.1015granger.net>
 <510282CB-38D3-438A-AF8A-9AC2519FCEF7@oracle.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On Mon, 2022-05-23 at 15:00 +0000, Chuck Lever III wrote:
> 
> > On May 23, 2022, at 9:40 AM, Jeff Layton wrote:
> > 
> > On Sun, 2022-05-22 at 11:38 -0400, Chuck Lever wrote:
> > > nfsd4_release_lockowner() holds clp->cl_lock when it calls
> > > check_for_locks(). However, check_for_locks() calls nfsd_file_get()
> > > / nfsd_file_put() to access the backing inode's flc_posix list, and
> > > nfsd_file_put() can sleep if the inode was recently removed.
> > 
> > It might be good to add a might_sleep() to nfsd_file_put?
> 
> I intend to include the patch you reviewed last week that
> adds the might_sleep(), as part of this series.
> 
> > > Let's instead rely on the stateowner's reference count to gate
> > > whether the release is permitted. This should be a reliable
> > > indication of locks-in-use since file lock operations and
> > > ->lm_get_owner take appropriate references, which are released
> > > appropriately when file locks are removed.
> > > 
> > > Reported-by: Dai Ngo
> > > Signed-off-by: Chuck Lever
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  fs/nfsd/nfs4state.c |    9 +++------
> > >  1 file changed, 3 insertions(+), 6 deletions(-)
> > > 
> > > This might be a naive approach, but let's start with it.
> > > 
> > > This passes light testing, but it's not clear how much our existing
> > > fleet of tests exercises this area. I've locally built a couple of
> > > pynfs tests (one is based on the one Dai posted last week) and they
> > > pass too.
> > > 
> > > I don't believe that FREE_STATEID needs the same simplification.
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index a280256cbb03..b77894e668a4 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -7559,12 +7559,9 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
> > > 
> > >  	/* see if there are still any locks associated with it */
> > >  	lo = lockowner(sop);
> > > -	list_for_each_entry(stp, &sop->so_stateids, st_perstateowner) {
> > > -		if (check_for_locks(stp->st_stid.sc_file, lo)) {
> > > -			status = nfserr_locks_held;
> > > -			spin_unlock(&clp->cl_lock);
> > > -			return status;
> > > -		}
> > > +	if (atomic_read(&sop->so_count) > 1) {
> > > +		spin_unlock(&clp->cl_lock);
> > > +		return nfserr_locks_held;
> > >  	}
> > > 
> > >  	nfs4_get_stateowner(sop);
> > > 
> > 
> > lm_get_owner is called from locks_copy_conflock, so if someone else
> > happens to be doing a LOCKT or F_GETLK call at the same time that
> > RELEASE_LOCKOWNER gets called, then this may end up returning an error
> > inappropriately.
> 
> IMO releasing the lockowner while it's being used for _anything_
> seems risky and surprising. If RELEASE_LOCKOWNER succeeds while
> the client is still using the lockowner for any reason, a
> subsequent error will occur if the client tries to use it again.
> Heck, I can see the server failing in mid-COMPOUND with this kind
> of race. Better I think to just leave the lockowner in place if
> there's any ambiguity.
> 

The problem here is not the client itself calling RELEASE_LOCKOWNER
while the lockowner is still in use, but rather a different client
altogether calling LOCKT (or a local process doing an F_GETLK) on an
inode where a lock is held by a client. The LOCKT takes a reference to
the lockowner (for the conflock), while the client that owns it
releases the lock and then the lockowner while the refcount is still
elevated.

The race window for this is probably quite small, but I think it's
theoretically possible.
The point is that an elevated refcount on the lockowner doesn't
necessarily mean that locks are actually being held by it.

> The spec language does not say RELEASE_LOCKOWNER must not return
> LOCKS_HELD for other reasons, and it does say that there is no
> choice of using another NFSERR value (RFC 7530 Section 13.2).
> 

What recourse does the client have if this happens? It released all of
its locks and tried to release the lockowner, but the server says
"locks held". Should it just give up at that point?
RELEASE_LOCKOWNER is a sort of a courtesy by the client, I suppose...

> 
> > My guess is that it would be pretty hard to hit the
> > timing right, but not impossible.
> > 
> > What we may want to do is have the kernel do this check and only if
> > it comes back >1 do the actual check for locks. That won't fix the
> > original problem though.
> > 
> > In other places in nfsd, we've plumbed in a dispose_list head and
> > deferred the sleeping functions until the spinlock can be dropped. I
> > haven't looked closely at whether that's possible here, but it may
> > be a more reliable approach.
> 
> That was proposed by Dai last week.
> 
> https://lore.kernel.org/linux-nfs/1653079929-18283-1-git-send-email-dai.ngo@oracle.com/T/#u
> 
> Trond pointed out that if two separate clients were releasing a
> lockowner on the same inode, there is nothing that protects the
> dispose_list, and it would get corrupted.
> 
> https://lore.kernel.org/linux-nfs/31E87CEF-C83D-4FA8-A774-F2C389011FCE@oracle.com/T/#mf1fc1ae0503815c0a36ae75a95086c3eff892614
> 

Yeah, that doesn't look like what's needed.

What I was going to suggest is an nfsd_file_put variant that takes a
list_head. If the refcount goes to zero and the thing ends up being
unhashed, then you put it on the dispose list rather than doing the
blocking operations, and then clean it up later.

That said, nfsd_file_put has grown significantly in complexity over
the years, so maybe that's not simple to do now.
-- 
Jeff Layton