Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:32813 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755190AbcCPTqa (ORCPT ); Wed, 16 Mar 2016 15:46:30 -0400 Date: Wed, 16 Mar 2016 15:46:28 -0400 (EDT) From: Benjamin Coddington To: "J. Bruce Fields" cc: Trond Myklebust , Anna Schumaker , Linux NFS Mailing List Subject: Re: [PATCH] NFS: Retry a zero-length short read In-Reply-To: <20160316191503.GB12992@fieldses.org> Message-ID: References: <79adfc63f33887475f6a0f0eeaacd8b52b2ed12c.1458119750.git.bcodding@redhat.com> <20160316171857.GA12992@fieldses.org> <20160316191503.GB12992@fieldses.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-nfs-owner@vger.kernel.org List-ID: > On Wed, Mar 16, 2016 at 01:36:48PM -0400, Benjamin Coddington wrote: > > On Wed, 16 Mar 2016, J. Bruce Fields wrote: > > > > > On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote: > > > > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington > > > > wrote: > > > > > On Wed, 16 Mar 2016, Benjamin Coddington wrote: > > > > > > > > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > > >> > > > > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington > > > > >> > wrote: > > > > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > > >> > > > > > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > > > > >> > >> wrote: > > > > >> > >> > > > > > >> > >> > A zero-length short read without eof should be retried rather than sending > > > > >> > >> > an error to the application. > > > > >> > >> > > > > >> > >> > > > > >> > >> In what situation would returning a 0 length read not be a bug? If the > > > > >> > >> server intended that we back off and retry, it has the alternative of > > > > >> > >> sending a JUKEBOX/DELAY error. > > > > >> > > > > > > >> > > If the server completes a local read but then another writer comes in and > > > > >> > > appends to the file before the server checks if it needs to set EOF, then > > > > >> > > the response might be 0 length without EOF set. > > > > >> > > > > > >> > Why isn't that EOF check done atomically with the read itself? This > > > > >> > still sounds like a server bug to me. > > > > >> > > > > >> I don't know -- I would guess because doing that atomically is harder than > > > > >> not, and I don't know where the RFCs say that a zero length response without > > > > >> eof is to be treated as an error or condition to be avoided. > > > > >> > > > > >> I'll look into that, and respond here. > > > > > > > > > > Indeed, it seems that it is more convenient for the linux server to send a > > > > > zero-length response without eof when the file grows. It would probably be > > > > > more helpful if the server handled that case, but I think that 7530 states > > > > > that it doesn't have to handle that case. > > > > > > > > Here is what RFC5661 and RFC7530 say. > > > > > > > > If the READ ended at the end-of-file (formally, in a correctly formed > > > > READ request, if offset + count is equal to the size of the file), or > > > > the READ request extends beyond the size of the file (if offset + > > > > count is greater than the size of the file), eof is returned as TRUE; > > > > otherwise, it is FALSE. A successful READ of an empty file will > > > > always return eof as TRUE. > > > > > > > > Here is what RFC1813 says: > > > > > > > > eof > > > > If the read ended at the end-of-file (formally, in a > > > > correctly formed READ request, if READ3args.offset plus > > > > READ3resok.count is equal to the size of the file), eof > > > > is returned as TRUE; otherwise it is FALSE. A > > > > successful READ of an empty file will always return eof > > > > as TRUE. > > > > > > > > Where does it say that the eof determination is allowed to be > > > > non-atomic? Unlike structures such as change_info4, there isn't an > > > > "atomic" flag to allow the server to communicate to the client that it > > > > cannot rely on the eof flag. Since the determination is part of the > > > > same READ operation, you can't point to the "COMPOUNDS are not atomic" > > > > either. > > > > > > I wonder why READ has eof at all, instead of adopting read()'s > > > convention that 0 means eof? > > > > > > In any case, our server should be able to count on that convention > > > internally, so if getting a real "eof" out of the filesystem looks > > > daunting, why not for now just check for that case in > > > nfsd4_encode_readv()?: > > > > It's probably not really that daunting. Couldn't we check to see if inode's > > size has increased, and if so do the read again? > > Does the client really have trouble with short reads in general, or only > with zero-length short reads? Just the zero-length short read. > > The concept of READ and eof atomicity should also include a concept of a > > boundary or barrier. The operation is atomic within what bound - the > > server's response processing or the underlying filesystem? > > Ideally the read data and eof both represent the same state of the file > with no file-modifying operations intervening between the two. > > I could come up with artificial examples that would allow clients to > detect that we don't meet this standard. I can't think of one that > would occur in practice (may just be my lack of imagination). So, sounds like fixing this is a good idea on the server. I hope Trond will let us know if he still feels that the client ought not to be changed since it seems an easy enough fix to avoid a similar problem on another server. Perhaps there's a downside I'm not seeing on the client. Or maybe the convention of read() returning 0 meaning eof is global enough to cause it to be acceptible behavior -- we really should treat a zero-length read response without eof as an error. My lack of experience is showing.. :) Ben