MIME-Version: 1.0
Date: Sun, 22 Mar 2015 15:20:05 -0400
Subject: Re: Recovery after BAD_SEQID
From: Trond Myklebust
To: Benjamin Coddington
Cc: Linux NFS Mailing List
Content-Type: text/plain; charset=UTF-8

On Thu, Mar 19, 2015 at 6:48 AM, Benjamin Coddington wrote:
> I wrote yesterday about a RHEL6 bug, but I'd gotten some details wrong about
> the problem, so I'm starting new thread.
>
> It looks like getting BAD_SEQID back from an OPEN operation drops the state_owner
> which means that the state machine can't find or recover any other objects
> for that state_owner. That can get the client into unrecoverable loops. I
> can produce one of them with:
>
> 1) OPEN file1, OPEN file2
> 2) break the network for longer than the lease period
> 3) during recovery, have the server return BAD_SEQID for one of the OPENS
> 4) break the network again for longer than the lease period
> 5) WRITE to the file that recovered properly in #3
>
> This gets stuck in WRITE,NFS4ERR_EXPIRED.
>
> It looks like some cleanup is needed if we have to drop the whole
> state_owner. Alternatively, does it make sense to just drop the objects in
> that sequence?
>

Ummm... Why are you seeing BAD_SEQID in the first place? That specific
error means that the client and server disagree on the sequencing of
the OPENs, which means there is a bug either on the client or on the
server.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com