Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:39504 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934457AbdJQRqR (ORCPT ); Tue, 17 Oct 2017 13:46:17 -0400 From: "Benjamin Coddington" To: "Jeff Layton" Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org Subject: Re: [PATCH] nfsd4: Prevent the reuse of a closed stateid Date: Tue, 17 Oct 2017 13:46:15 -0400 Message-ID: In-Reply-To: <1508258356.4747.6.camel@redhat.com> References: <2087b4cab6c695a7df01c1135f1773c9b762f0b0.1508248427.git.bcodding@redhat.com> <1508258356.4747.6.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed Sender: linux-nfs-owner@vger.kernel.org List-ID: On 17 Oct 2017, at 12:39, Jeff Layton wrote: > On Tue, 2017-10-17 at 09:55 -0400, Benjamin Coddington wrote: >> While running generic/089 on NFSv4.1, the following on-the-wire >> exchange >> occurs: >> >> Client Server >> ---------- ---------- >> OPEN (owner A) -> >> <- OPEN response: state A1 >> CLOSE (state A1)-> >> OPEN (owner A) -> >> <- CLOSE response: state A2 >> <- OPEN response: state A3 >> LOCK (state A3) -> >> <- LOCK response: NFS4_BAD_STATEID >> >> The server should not be returning state A3 in response to the second >> OPEN >> after processing the CLOSE and sending back state A2. The problem is >> that >> nfsd4_process_open2() is able to find an existing open state just >> after >> nfsd4_close() has incremented the state's sequence number but before >> calling unhash_ol_stateid(). >> >> Fix this by using the state's sc_type field to identify closed state, >> and >> protect the update of that field with st_mutex. >> nfsd4_find_existing_open() >> will return with the st_mutex held, so that the state will not >> transition >> between being looked up and then updated in nfsd4_process_open2(). >> >> Signed-off-by: Benjamin Coddington >> --- >> fs/nfsd/nfs4state.c | 29 +++++++++++++++++++---------- >> 1 file changed, 19 insertions(+), 10 deletions(-) >> > > The problem is real, but I'm not thrilled with the fix. It seems more > lock thrashy... Really? I don't think I increased any call counts to spinlocks or mutex locks. The locking overhead should be equivalent. > Could we instead fix this by just unhashing the stateid earlier, > before > the nfs4_inc_and_copy_stateid call? I think I tried this - and if I remember correctly, the problem is that you can't hold st_mutux while taking the cl_lock (or maybe it was the fi_lock?). I tried several simpler ways, and they resulted in deadlocks. That was a couple of weeks ago, so I apologize if my memory is fuzzy. I should have sent this patch off to the list then. If you need me to to verify that, I can - but maybe someone more familiar with the locking here can save me that time. Ben