Return-Path: Received: from aserp1040.oracle.com ([141.146.126.69]:24362 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753866AbbHFAPk convert rfc822-to-8bit (ORCPT ); Wed, 5 Aug 2015 20:15:40 -0400 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: [PATCH] NFS: Add OTW write barrier before may-open test From: Chuck Lever In-Reply-To: Date: Wed, 5 Aug 2015 20:15:32 -0400 Cc: Linux NFS Mailing List Message-Id: <8F6C1893-11A1-485C-A9EA-0ABBCC4D4286@oracle.com> References: <20150805182305.11796.75476.stgit@manet.1015granger.net> <359CCE70-6056-418F-B627-688BE27C7EA8@oracle.com> To: Trond Myklebust Sender: linux-nfs-owner@vger.kernel.org List-ID: On Aug 5, 2015, at 6:27 PM, Trond Myklebust wrote: > On Wed, Aug 5, 2015 at 2:44 PM, Chuck Lever wrote: >> >> On Aug 5, 2015, at 2:34 PM, Chuck Lever wrote: >> >>> Commit 14546c337588 ("NFS: Don't do a full flush to disk on close() >>> if we hold a delegation") added an optimization. When an NFSv4 write >>> delegation is present, close(2) does not wait while a file's dirty >>> data is flushed to the NFS server. >>> >>> However, if the application workload immediately re-opens that file, >>> nfs_may_open() can perform an ACCESS and GETATTR which runs >>> concurrently with the flushing WRITE. If the flushing WRITE and >>> GETATTR complete out of order on the server, the file size cached on >>> the client will go backwards, possibly resulting in new writes going >>> to the wrong file offset. >>> >>> Add a write barrier before the access check to ensure the server's >>> idea of the file's size is properly up to date. >>> >>> The downside of this approach is that each fresh open(2) of a dirty >>> file results in an extra flush. It seems to me that _any_ open(2) >>> done while there is dirty data waiting on the client could result in >>> a file size roll back. However, I see bad behavior only when the >>> client holds a write delegation. >>> >>> Fixes: 14546c337588 ("NFS: Don't do a full flush to disk on . . .") >>> Signed-off-by: Chuck Lever >>> --- >>> fs/nfs/dir.c | 9 +++++++++ >>> 1 file changed, 9 insertions(+) >>> >>> I'm not certain this is a good long term fix. Some other possible >>> solutions include: >>> >>> - Not performing the access check if the client holds a delegation >>> - Not performing a GETATTR as part of the ACCESS check >>> - Simply marking the file attributes stale instead of using the >>> returned file size >>> - Reverting commit 14546c337588 >> >> OK. If the client holds a write delegation, then it shouldn't care >> about the server's file size at all until it has flushed all dirty >> data and returned the delegation. So flushing here is probably wrong. >> >> But the incoming file size in the GETATTR is definitely screwing up >> the cached file size. >> > > In which kernels are you seeing the race? For recent kernels (v4.0+) > the write code should be calling nfs_fattr_set_barrier() in order to > prevent the result from the ACCESS from overwriting the new file size. I'm testing on 4.2-rc4. -- Chuck Lever