Return-Path: Received: from mail-yk0-f181.google.com ([209.85.160.181]:35388 "EHLO mail-yk0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751545AbbIITSF (ORCPT ); Wed, 9 Sep 2015 15:18:05 -0400 Received: by ykdu9 with SMTP id u9so31795415ykd.2 for ; Wed, 09 Sep 2015 12:18:04 -0700 (PDT) Date: Wed, 9 Sep 2015 15:18:01 -0400 From: Jeff Layton To: Trond Myklebust Cc: Andrew W Elble , Bruce James Fields , Linux NFS Mailing List Subject: Re: upgrade/downgrade race Message-ID: <20150909151801.3fd2243f@tlielax.poochiereds.net> In-Reply-To: References: <20150909144935.51e3653b@tlielax.poochiereds.net> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-nfs-owner@vger.kernel.org List-ID: On Wed, 9 Sep 2015 15:01:54 -0400 Trond Myklebust wrote: > On Wed, Sep 9, 2015 at 2:49 PM, Jeff Layton wrote: > > On Wed, 9 Sep 2015 13:49:44 -0400 > > Trond Myklebust wrote: > > > >> +Bruce, +Jeff... > >> > >> On Wed, Sep 9, 2015 at 1:12 PM, Trond Myklebust > >> wrote: > >> > On Wed, Sep 9, 2015 at 9:37 AM, Andrew W Elble wrote: > >> >> > >> >> In attempting to troubleshoot other issues, we've run into this race > >> >> with 4.1.4 (both client and server) with a few cherry-picked patches > >> >> from upstream. This is my attempt at a redacted packet-capture. > >> >> > >> >> These all affect the same fh/stateid: > >> >> > >> >> 116 -> OPEN (will be an upgrade / for write) > >> >> 117 -> OPEN_DOWNGRADE (to read for the existing stateid / seqid = 0x6 > >> >> > >> >> 121 -> OPEN_DOWNGRADE (completed last / seqid = 0x8) > >> >> 122 -> OPEN (completed first / seqid = 0x7) > >> >> > >> >> Attempts to write using that stateid fail because the stateid doesn't > >> >> have write access. > >> >> > >> >> Any thoughts? I can share more data from the capture if needed. > >> > > >> > Bruce & Jeff, > >> > > >> > Given that the client sent a non-zero seqid, why is the OPEN_DOWNGRADE > >> > being executed after the OPEN here? Surely, if that is the case, the > >> > server should be returning NFS4ERR_OLD_STATEID and failing the > >> > OPEN_DOWNGRADE operation? > >> > > > > > The problem there is that we do the seqid checks at the beginning of > > the operation. In this case it's likely that it was 0x6 when the > > OPEN_DOWNGRADE started. The OPEN completed first though and bumped the > > seqid, and then the downgrade finished and bumped it again. When we bump > > the seqid we don't verify it against what came in originally. > > > > The question is whether that's wrong from the POV of the spec. RFC5661 > > doesn't seem to explicitly require that we serialize such operations on > > the server. The closest thing I can find is this in 3.3.12: > > RFC5661, section 8.2.2 > Except for layout stateids (Section 12.5.3), when a client sends a > stateid to the server, it has two choices with regard to the seqid > sent. It may set the seqid to zero to indicate to the server that it > wishes the most up-to-date seqid for that stateid's "other" field to > be used. This would be the common choice in the case of a stateid > sent with a READ or WRITE operation. It also may set a non-zero > value, in which case the server checks if that seqid is the correct > one. In that case, the server is required to return > NFS4ERR_OLD_STATEID if the seqid is lower than the most current value > and NFS4ERR_BAD_STATEID if the seqid is greater than the most current > value. This would be the common choice in the case of stateids sent > with a CLOSE or OPEN_DOWNGRADE. Because OPENs may be sent in > parallel for the same owner, a client might close a file without > knowing that an OPEN upgrade had been done by the server, changing > the lock in question. If CLOSE were sent with a zero seqid, the OPEN > upgrade would be cancelled before the client even received an > indication that an upgrade had happened. > > The suggestion there is clearly that the client can rely on the server > not reordering those CLOSE/OPEN_DOWNGRADE operations w.r.t. a parallel > OPEN. Otherwise, what is the difference between sending a non-zero > seqid and zero? > > > "The server is required to increment the "seqid" field by > > one at each transition of the stateid. This is important since the > > client will inspect the seqid in OPEN stateids to determine the order > > of OPEN processing done by the server." > > > > If we do need to fix this on the server, it's likely to be pretty ugly: > > > > We'd either need to serialize seqid morphing operations (ugh), or make > > update_stateid do an cmpxchg to swap it into place (or add some extra > > locking around it), and then have some way to unwind all of the changes > > if that fails. That may be impossible however -- we're likely closing > > struct files after all. > > Updates to the state are already required to be atomic. You can't have > a stateid where an OPEN_DOWNGRADE or CLOSE only partially succeeded. > > > > > Now, all of that said, I think the client has some bugs in its seqid > > handling as well. It should have realized that the stateid was a r/o > > one after the OPEN_DOWNGRADE came back with the higher seqid, but it > > still issued a WRITE just afterward. That seems wrong. > > No. The client is relying on the server not reordering the > OPEN_DOWNGRADE. It expects either for the OPEN to happen first, and > the OPEN_DOWNGRADE to fail, or for the OPEN_DOWNGRADE to happen first, > and for both operations to succeed. > > Trond In that case, the "simple" fix would be to add a mutex to nfs4_ol_stateid. Lock that in nfs4_preprocess_seqid_op, and ensure that we unlock it after bumping the seqid (or on error). Bruce, any thoughts? -- Jeff Layton