2008-09-22 16:35:28

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

On Mon, 2008-09-22 at 18:05 +0200, Hans-Peter Jansen wrote:
> For what is worth, this behavior is visible in bog standard writing/reading
> files, (log files in my case, via the python logging package). It obviously
> deviates from local filesystem behavior, and former state of the linux
> nfs-client. Should we add patches to less, tail, and all other instruments
> for watching/analysing log files (just to pick the tip of the ice rock) in
> order to throw away runs of zeros, when reading from nfs mounted files? Or
> should we ask their maintainers to add locking code for the nfs "read
> files, which are written at the same time" case, just to work around
> __some__ of the consequences of this bug? Imagine, how ugly this is going
> to look!
>
> The whole issue is what I call a major regression, thus I strongly ask for a
> reply from Trond on this matter.
>
> I even vote for sending a revert request for this hunk to the stable team,
> where it is applicable, after Trond sorted it out (for 2.6.27?).
>
> Thanks, Aaron and Chuck for the detailed analysis - it demystified a wired
> behavior, I observed here. When you're in a process to get real work done
> in a fixed timeline, such things could make you mad..

Revert _what_ exactly?

Please assume that I've been travelling for the past 5 weeks, and have
only a sketchy idea of what has been going on.
My understanding was that this is a consequence of unordered writes
causing the file to be extended while some other task is reading.
AFAICS, this sort of behaviour has _always_ been possible. I can't see
how reverting anything will fix it.

Trond



2008-09-22 17:04:17

by Aaron Straus

[permalink] [raw]
Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

Hi,

On Sep 22 12:35 PM, Trond Myklebust wrote:
> Revert _what_ exactly?

Yep. I narrowed the problem down to an offending hunk in a particular
patch. Removing that hunk did eliminate the problem. However,
reverting that hunk is likely wrong and the code has changed _a lot_
since that commit.

> My understanding was that this is a consequence of unordered writes
> causing the file to be extended while some other task is reading.

Yes. I added some debugging statements to look at the writeout path.

I think the following happens:

- page 0 gets partially written to by app
- VM writes out partial dirty page 0
- page 0 gets fully written by app
- page 1 gets partially written by app
- VM writes out dirty page 1

At this point there is a hole in the file. The tail end of page 0 is
still not written to server.

- VM writes out dirty page 0
...

> AFAICS, this sort of behaviour has _always_ been possible. I can't see
> how reverting anything will fix it.

Here is the crux. It was possible previously but unlikely e.g. our app
never saw this behavior. The new writeout semantics causes visible
holes in files often.

Anyway, I agree the new writeout semantics are allowed and possibly
saner than the previous writeout path. The problem is that it is
__annoying__ for this use case (log files).

I'm not sure if there is an easy solution. We want the VM to writeout
the address space in order. Maybe we can start the scan for dirty
pages at the last page we wrote out i.e. page 0 in the example above?

=a=



--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]

2008-09-22 18:45:52

by Hans-Peter Jansen

[permalink] [raw]
Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

Am Montag, 22. September 2008 schrieb Trond Myklebust:
> On Mon, 2008-09-22 at 18:05 +0200, Hans-Peter Jansen wrote:
> > For what is worth, this behavior is visible in bog standard
> > writing/reading files, (log files in my case, via the python logging
> > package). It obviously deviates from local filesystem behavior, and
> > former state of the linux nfs-client. Should we add patches to less,
> > tail, and all other instruments for watching/analysing log files (just
> > to pick the tip of the ice rock) in order to throw away runs of zeros,
> > when reading from nfs mounted files? Or should we ask their maintainers
> > to add locking code for the nfs "read files, which are written at the
> > same time" case, just to work around __some__ of the consequences of
> > this bug? Imagine, how ugly this is going to look!
> >
> > The whole issue is what I call a major regression, thus I strongly ask
> > for a reply from Trond on this matter.
> >
> > I even vote for sending a revert request for this hunk to the stable
> > team, where it is applicable, after Trond sorted it out (for 2.6.27?).
> >
> > Thanks, Aaron and Chuck for the detailed analysis - it demystified a
> > wired behavior, I observed here. When you're in a process to get real
> > work done in a fixed timeline, such things could make you mad..
>
> Revert _what_ exactly?
For your convenience, important parts inlined here:

>From Aarons message: Tue, 9 Sep 2008 12:46:44 -0700 in this thread. << EOM

Of the bisected offending commit:

commit e261f51f25b98c213e0b3d7f2109b117d714f69d
Author: Trond Myklebust <[email protected]>
Date: Tue Dec 5 00:35:41 2006 -0500

NFS: Make nfs_updatepage() mark the page as dirty.

This will ensure that we can call set_page_writeback() from within
nfs_writepage(), which is always called with the page lock set.

Signed-off-by: Trond Myklebust <[email protected]>


It seems to be this hunk which introduces the problem:


@@ -628,7 +667,6 @@ static struct nfs_page * nfs_update_request(struct
nfs_open_context* ctx,
return ERR_PTR(error);
}
spin_unlock(&nfsi->req_lock);
- nfs_mark_request_dirty(new);
return new;
}
spin_unlock(&nfsi->req_lock);


If I add that function call back in... the problem disappears. I don't
know if this just papers over the real problem though?

EOM

This commit happened between 2.6.19 and 2.6.20, btw.

> Please assume that I've been travelling for the past 5 weeks, and have
> only a sketchy idea of what has been going on.

Ahh, I see, that explains, why you didn't responded earlier.

> My understanding was that this is a consequence of unordered writes
> causing the file to be extended while some other task is reading.
> AFAICS, this sort of behaviour has _always_ been possible. I can't see
> how reverting anything will fix it.

Hopefully, this helps you to remember the purpose of that change.

Cheers,
Pete