Subject: Re: [PATCH RFC] NFSD: Fix possible sleep during nfsd4_release_lockowner()
From: Jeff Layton
To: Chuck Lever III
Cc: Linux NFS Mailing List
Date: Mon, 23 May 2022 12:37:41 -0400
In-Reply-To: <1A37E2B5-8113-48D6-AF7C-5381F364D99E@oracle.com>
References: <165323344948.2381.7808135229977810927.stgit@bazille.1015granger.net>
 <510282CB-38D3-438A-AF8A-9AC2519FCEF7@oracle.com>
 <1A37E2B5-8113-48D6-AF7C-5381F364D99E@oracle.com>

On Mon, 2022-05-23 at 15:41 +0000, Chuck Lever III wrote:
>
> > On May 23, 2022, at 11:26 AM, Jeff Layton wrote:
> >
> > On Mon, 2022-05-23 at 15:00 +0000, Chuck Lever III wrote:
> > >
> > > > On May 23, 2022, at 9:40 AM, Jeff Layton wrote:
> > > >
> > > > On Sun, 2022-05-22 at 11:38 -0400, Chuck Lever wrote:
> > > > > nfsd4_release_lockowner() holds clp->cl_lock when it calls
> > > > > check_for_locks(). However, check_for_locks() calls nfsd_file_get()
> > > > > / nfsd_file_put() to access the backing inode's flc_posix list, and
> > > > > nfsd_file_put() can sleep if the inode was recently removed.
> > > > >
> > > >
> > > > It might be good to add a might_sleep() to nfsd_file_put?
> > >
> > > I intend to include the patch you reviewed last week that
> > > adds the might_sleep(), as part of this series.
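To spell out what I was asking for there: just a debug assertion at the
top of the function. A minimal sketch only, not the actual patch Chuck
is referring to, and the elided body is unchanged:

void nfsd_file_put(struct nfsd_file *nf)
{
	/* splat under CONFIG_DEBUG_ATOMIC_SLEEP when called in atomic
	 * context -- e.g. with clp->cl_lock held as described above */
	might_sleep();

	/* ... existing put logic, unchanged ... */
}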
> > >
> > > > > Let's instead rely on the stateowner's reference count to gate
> > > > > whether the release is permitted. This should be a reliable
> > > > > indication of locks-in-use since file lock operations and
> > > > > ->lm_get_owner take appropriate references, which are released
> > > > > appropriately when file locks are removed.
> > > > >
> > > > > Reported-by: Dai Ngo
> > > > > Signed-off-by: Chuck Lever
> > > > > Cc: stable@vger.kernel.org
> > > > > ---
> > > > >  fs/nfsd/nfs4state.c |    9 +++------
> > > > >  1 file changed, 3 insertions(+), 6 deletions(-)
> > > > >
> > > > > This might be a naive approach, but let's start with it.
> > > > >
> > > > > This passes light testing, but it's not clear how much our existing
> > > > > fleet of tests exercises this area. I've locally built a couple of
> > > > > pynfs tests (one is based on the one Dai posted last week) and they
> > > > > pass too.
> > > > >
> > > > > I don't believe that FREE_STATEID needs the same simplification.
> > > > >
> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > index a280256cbb03..b77894e668a4 100644
> > > > > --- a/fs/nfsd/nfs4state.c
> > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > @@ -7559,12 +7559,9 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
> > > > >
> > > > >  	/* see if there are still any locks associated with it */
> > > > >  	lo = lockowner(sop);
> > > > > -	list_for_each_entry(stp, &sop->so_stateids, st_perstateowner) {
> > > > > -		if (check_for_locks(stp->st_stid.sc_file, lo)) {
> > > > > -			status = nfserr_locks_held;
> > > > > -			spin_unlock(&clp->cl_lock);
> > > > > -			return status;
> > > > > -		}
> > > > > +	if (atomic_read(&sop->so_count) > 1) {
> > > > > +		spin_unlock(&clp->cl_lock);
> > > > > +		return nfserr_locks_held;
> > > > >  	}
> > > > >
> > > > >  	nfs4_get_stateowner(sop);
> > > > >
> > > >
> > > > lm_get_owner is called from locks_copy_conflock, so if someone else
> > > > happens to be doing a LOCKT or F_GETLK call at the same time that
> > > > RELEASE_LOCKOWNER gets called, then this may end up returning an
> > > > error inappropriately.
> > >
> > > IMO releasing the lockowner while it's being used for _anything_
> > > seems risky and surprising. If RELEASE_LOCKOWNER succeeds while
> > > the client is still using the lockowner for any reason, a
> > > subsequent error will occur if the client tries to use it again.
> > > Heck, I can see the server failing in mid-COMPOUND with this kind
> > > of race. Better I think to just leave the lockowner in place if
> > > there's any ambiguity.
> > >
> >
> > The problem here is not the client itself calling RELEASE_LOCKOWNER
> > while the lockowner is still in use, but rather a different client
> > altogether calling LOCKT (or a local process doing an F_GETLK) on an
> > inode where a lock is held by a client. The LOCKT gets a reference to
> > the lockowner (for the conflock), while the client that owns it
> > releases the lock and then the lockowner while the refcount is still
> > high.
> >
> > The race window for this is probably quite small, but I think it's
> > theoretically possible. The point is that an elevated refcount on the
> > lockowner doesn't necessarily mean that locks are actually being held
> > by it.
>
> Sure, I get that the lockowner's reference count is not 100%
> reliable. The question is whether it's good enough.
>
> We are looking for a mechanism that can simply count the number
> of locks held by a lockowner. It sounds like you believe that
> lm_get_owner / put_owner might not be a reliable way to do
> that.
>
>
> > > The spec language does not say RELEASE_LOCKOWNER must not return
> > > LOCKS_HELD for other reasons, and it does say that there is no
> > > choice of using another NFSERR value (RFC 7530, Section 13.2).
> > >
> >
> > What recourse does the client have if this happens? It released all of
> > its locks and tried to release the lockowner, but the server says
> > "locks held". Should it just give up at that point? RELEASE_LOCKOWNER
> > is a sort of courtesy by the client, I suppose...
>
> RELEASE_LOCKOWNER is a courtesy for the server. Most clients
> ignore the return code, IIUC.
>
> So the hazard caused by this race would be a small resource
> leak on the server that would go away once the client's lease
> was purged.
>
>
> > > > My guess is that it would be pretty hard to hit the
> > > > timing right, but not impossible.
> > > >
> > > > What we may want to do is have the kernel do this refcount check
> > > > first, and only do the actual check for locks if it comes back >1.
> > > > That won't fix the original problem though.
> > > >
> > > > In other places in nfsd, we've plumbed in a dispose_list head and
> > > > deferred the sleeping functions until the spinlock can be dropped.
> > > > I haven't looked closely at whether that's possible here, but it
> > > > may be a more reliable approach.
> > >
> > > That was proposed by Dai last week.
> > >
> > > https://lore.kernel.org/linux-nfs/1653079929-18283-1-git-send-email-dai.ngo@oracle.com/T/#u
> > >
> > > Trond pointed out that if two separate clients were releasing a
> > > lockowner on the same inode, there is nothing that protects the
> > > dispose_list, and it would get corrupted.
> > >
> > > https://lore.kernel.org/linux-nfs/31E87CEF-C83D-4FA8-A774-F2C389011FCE@oracle.com/T/#mf1fc1ae0503815c0a36ae75a95086c3eff892614
> > >
> >
> > Yeah, that doesn't look like what's needed.
> >
> > What I was going to suggest is an nfsd_file_put variant that takes a
> > list_head. If the refcount goes to zero and the thing ends up being
> > unhashed, then you put it on the dispose list rather than doing the
> > blocking operations, and clean it up later.
>
> Trond doesn't like that approach; see the e-mail thread.
>

I didn't see him saying that that would be wrong per se, but the
initial implementation was racy. His suggestion was just to keep a
counter in the lockowner of how many locks are associated with it.
That seems like a good suggestion, though you'd probably need to add a
parameter to lm_get_owner to indicate whether you're adding a new lock
or just doing a conflock copy. Something like the sketch below.
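Very rough and untested, just to show the shape of it. To be clear,
the lo_lock_count field, the new bool argument, and the fs/locks.c
plumbing to pass it down are all invented here, not from any posted
patch:

struct nfs4_lockowner {
	struct nfs4_stateowner	lo_owner;
	struct list_head	lo_blocked;
	atomic_t		lo_lock_count;	/* real locks only */
};

/*
 * ->lm_get_owner grows a flag so that conflock copies (LOCKT,
 * F_GETLK) still take a reference but don't bump the lock count.
 */
static fl_owner_t
nfsd4_fl_get_owner(fl_owner_t owner, bool new_lock)
{
	struct nfs4_lockowner *lo = (struct nfs4_lockowner *)owner;

	if (new_lock)
		atomic_inc(&lo->lo_lock_count);
	nfs4_get_stateowner(&lo->lo_owner);
	return owner;
}

nfsd4_release_lockowner() would then test lo_lock_count instead of the
object refcount, and lm_put_owner would need the same flag so the
count stays balanced.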
Checking the object refcount like this patch does seems wrong though.

-- 
Jeff Layton