2008-09-05 19:19:54

by Aaron Straus

Subject: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that [email protected] is being discontinued.
Please subscribe to [email protected] instead.
http://vger.kernel.org/vger-lists.html#linux-nfs


Attachments:
(No filename) (1.76 kB)
reader.py (844.00 B)
writer.py (257.00 B)
(No filename) (363.00 B)
(No filename) (362.00 B)

2008-09-08 19:02:13

by Aaron Straus

Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

Hi,

On Sep 05 03:56 PM, Chuck Lever wrote:
> Comparing a wire trace with strace output, starting with the writing
> client, might also be illuminating. We prefer wireshark as it uses
> good default trace settings, parses the wire bytes and displays them
> coherently, and allows you to sort the frames in various useful ways.

OK, in addition to the bisection, I've collected trace data for the good
(commit 4d770ccf4257b23a7ca2a85de1b1c22657b581d8) and bad (commit
e261f51f25b98c213e0b3d7f2109b117d714f69d) cases.

Attached is a file called trace.tar.bz2. Inside you'll find four files,
two for each session:

bad-wireshark
bad-strace

good-wireshark
good-strace

From a quick glance, the difference seems to be that the bad case issues
an UNSTABLE NFS WRITE call. I don't really know what that means or what
its semantics are, but that bad commit does seem to introduce this
regression.
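For readers unfamiliar with the term: in NFSv3 (RFC 1813), every WRITE call carries a `stable` argument of type `stable_how`, and the reply reports in `committed` how far the server actually committed the data. With UNSTABLE, the server may reply before the data reaches stable storage, and the client is expected to send a later COMMIT. A minimal Python model of those rules, for illustration only: the enum values follow the RFC, while the two helper functions are hypothetical names, not part of any NFS library.

```python
from enum import IntEnum


class StableHow(IntEnum):
    """NFSv3 stable_how values used by the WRITE procedure (RFC 1813)."""
    UNSTABLE = 0   # server may reply before data reaches stable storage
    DATA_SYNC = 1  # data, plus metadata needed to retrieve it, committed first
    FILE_SYNC = 2  # data and all file metadata committed before the reply


def needs_commit(committed: StableHow) -> bool:
    """Hypothetical helper: only an UNSTABLE reply obliges the client
    to follow up with a COMMIT call."""
    return committed == StableHow.UNSTABLE


def reply_is_valid(requested: StableHow, committed: StableHow) -> bool:
    """Hypothetical helper: per RFC 1813 the server may commit more
    strongly than requested, never less (e.g. FILE_SYNC in, FILE_SYNC out)."""
    return committed >= requested
```

The `>=` comparison works because the values are ordered from the weakest to the strongest guarantee.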

Anything else I can provide?

Thanks!

=a=


--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]


Attachments:
(No filename) (1.00 kB)
trace.tar.bz2 (13.65 kB)

2008-09-05 19:57:19

by Chuck Lever

Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

[ replacing cc: [email protected] with [email protected], and neil's
old address with his current one ]

On Sep 5, 2008, at 3:19 PM, Aaron Straus wrote:
> Hi all,
>
> We're hitting some bad behavior in NFS v3. The situation is this:
>
> machine A - NFS server
>
> machine B - NFS client (writer)
> machine C - NFS client (reader)
>
> (all machines x86 SMP)
>
> machine A exports a directory on ext3 filesystem:
>
> /srv/home 192.168.0.0/24(rw,sync,no_subtree_check)
>
> machines B and C mount that directory normally
>
> mount A:/srv/home /mntpnt
>
> machine B opens a file and writes to it (think a log file)
>
> machine C stats that file, opens it and reads it (think tailing the
> log file)
>
>
> The issue is that machine C will often see large blocks of NULLs
> (zeros) in the file. If you do the same read again just after you see
> the block of NULLs, you will see the proper data.
>
> Attached are two simple python programs that demonstrate the problem.
>
> To use them (they will write to a file called test-nfs in CWD):
>
> (on machine B in one window)
>
> python writer.py
>
> (on machine C in another window)
>
> python reader.py
>
>
> reader.py will die when it sees NULLs in the file. Usually for us
> this happens after about 60s (two timeouts, I think). The first NULL
> is usually either at index 4000 or 8000, depending on the kernel.
>
>
> Now the version of the kernel the server is running doesn't seem to
> matter. The reader also doesn't seem to matter (though I didn't test
> this completely). The writer seems to be the issue:
>
> Writer_Version Outcome:
> <= 2.6.19 OK
> >= 2.6.20 BAD

Up to which kernel? Recent ones may address this issue already.

> I've tested both vanilla kernel.org kernels and Ubuntu 8.04 kernels.
>
> I can try to bisect between 2.6.19 <-> 2.6.20.

That's a good start.

Comparing a wire trace with strace output, starting with the writing
client, might also be illuminating. We prefer wireshark as it uses
good default trace settings, parses the wire bytes and displays them
coherently, and allows you to sort the frames in various useful ways.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-09-05 20:04:55

by Aaron Straus

Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

Hi,

On Sep 05 03:56 PM, Chuck Lever wrote:
> [ replacing cc: [email protected] with [email protected], and neil's
> old address with his current one ]

Sorry, I probably grabbed an old MAINTAINERS file.

> On Sep 5, 2008, at 3:19 PM, Aaron Straus wrote:
> > Writer_Version Outcome:
> > <= 2.6.19 OK
> > >= 2.6.20 BAD
>
> Up to which kernel? Recent ones may address this issue already.

BAD up to 2.6.27-rc?

I have to see exactly which is the last rc version I tested.

> > I can try to bisect between 2.6.19 <-> 2.6.20.
>
> That's a good start.

OK, I will try to bisect.

> Comparing a wire trace with strace output, starting with the writing
> client, might also be illuminating. We prefer wireshark as it uses
> good default trace settings, parses the wire bytes and displays them
> coherently, and allows you to sort the frames in various useful ways.

OK. Could you also try to reproduce on your side using those python
programs? I want to make sure it's not something specific with our
mounts, etc.

Thanks!
=a=


--
===================
Aaron Straus
[email protected]

2008-09-05 20:41:32

by Chuck Lever

Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

On Sep 5, 2008, at 4:04 PM, Aaron Straus wrote:
> Hi,
>
> On Sep 05 03:56 PM, Chuck Lever wrote:
>> [ replacing cc: [email protected] with [email protected], and neil's
>> old address with his current one ]
>
> Sorry, I probably grabbed an old MAINTAINERS file.
>
>> On Sep 5, 2008, at 3:19 PM, Aaron Straus wrote:
>>> Writer_Version Outcome:
>>> <= 2.6.19 OK
>>> >= 2.6.20 BAD
>>
>> Up to which kernel? Recent ones may address this issue already.
>
> BAD up to 2.6.27-rc?
>
> I have to see exactly which is the last rc version I tested.
>
>>> I can try to bisect between 2.6.19 <-> 2.6.20.
>>
>> That's a good start.
>
> OK, I will try to bisect.
>
>> Comparing a wire trace with strace output, starting with the writing
>> client, might also be illuminating. We prefer wireshark as it uses
>> good default trace settings, parses the wire bytes and displays them
>> coherently, and allows you to sort the frames in various useful ways.
>
> OK. Could you also try to reproduce on your side using those python
> programs? I want to make sure it's not something specific with our
> mounts, etc.

I have the latest Fedora 9 kernels on two clients, mounting via NFSv3
using "actimeo=600" (for other reasons). The server is OpenSolaris
2008.05.

reader.py reported zeroes in the test file after about 5 minutes.

Looking at the file a little later, I don't see any problems with it.

Since your scripts are not using any kind of serialization (i.e. file
locking) between the clients, I wonder if non-deterministic behavior is
to be expected.
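A minimal sketch of what such serialization could look like in Python, using POSIX record locks through `fcntl.lockf()`. It assumes the mount supports locking (for NFSv3 that goes through the separate NLM protocol) and that the Linux client flushes dirty data before releasing a lock and revalidates its cache after acquiring one, as described in nfs(5); check that against your kernel. The function names are hypothetical and are not taken from the attached reader.py/writer.py.

```python
import fcntl
import os


def locked_append(path: str, data: bytes) -> None:
    """Hypothetical writer side: append under an exclusive record lock."""
    with open(path, "ab") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until no conflicting lock is held
        try:
            f.write(data)
            f.flush()                  # move Python's buffer into the kernel
            os.fsync(f.fileno())       # ask the kernel to write the data back
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def locked_read(path: str, offset: int = 0) -> bytes:
    """Hypothetical reader side: read offset..EOF under a shared record lock."""
    with open(path, "rb") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)  # a shared lock needs the file open for reading
        try:
            f.seek(offset)
            return f.read()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

The cost is that every append and every poll becomes a lock round trip to the server, which is the trade-off a log-tailing setup would be weighing.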

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-09-05 22:14:36

by Aaron Straus

Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

Hi,

On Sep 05 04:36 PM, Chuck Lever wrote:
> I have the latest Fedora 9 kernels on two clients, mounting via NFSv3
> using "actimeo=600" (for other reasons). The server is OpenSolaris
> 2008.05.
>
> reader.py reported zeroes in the test file after about 5 minutes.

Awesome. Thanks for testing! Our actimeo setting is much shorter, which
is probably why it happens sooner for us.

> Looking at the file a little later, I don't see any problems with it.
>
> Since your scripts are not using any kind of serialization (i.e. file
> locking) between the clients, I wonder if non-deterministic behavior is
> to be expected.

Hmm... yep. I don't know what guarantees we want to make. The
behavior doesn't seem to be consistent with older kernels, though, so
I'm thinking it might be a bug.

We hit this particular issue because we have scripts which essentially
'tail -f' log files looking for errors. They miss log messages (and
see corrupted ones) because of the NULLs. That's also why there is no
serialization: we don't need it when grep'ing through log messages.
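A minimal sketch of such a 'tail -f'-style follower in Python, with a check for the runs of NULs described in this thread. It is not the attached reader.py; the function names and the "stop at the first NUL" policy are assumptions for illustration. It reopens the file on every poll on the assumption that, with the Linux client's default close-to-open behavior, open() is where cached attributes get revalidated.

```python
import time
from typing import Iterator, Optional, Tuple


def first_nul_run(chunk: bytes) -> Optional[Tuple[int, int]]:
    """Return (start, length) of the first run of NUL bytes in chunk, or None."""
    start = chunk.find(b"\x00")
    if start == -1:
        return None
    end = start
    while end < len(chunk) and chunk[end] == 0:  # indexing bytes yields ints
        end += 1
    return start, end - start


def follow(path: str, offset: int = 0, poll: float = 1.0,
           max_polls: Optional[int] = None) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, new_bytes) whenever the file grows, like 'tail -f'.

    Raises ValueError at the first run of NULs. max_polls=None polls forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        # Reopen each time so the NFS client revalidates attributes on open().
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
        if chunk:
            run = first_nul_run(chunk)
            if run is not None:
                start, length = run
                raise ValueError(
                    f"{length} NUL bytes at file offset {offset + start}")
            yield offset, chunk
            offset += len(chunk)
        polls += 1
        time.sleep(poll)
```

Raising mirrors what reader.py is described as doing (dying at the first NULL). A log scanner could instead wait and re-read from the reported offset, since the thread reports that an immediate second read returns the proper data.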

I'm bisecting now. I see a block of intricate-looking NFS patches; I'll
try to narrow it down to a particular commit.

I'll also get the wireshark data at that point.

Thanks,
=a=


--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]


Attachments:
(No filename) (1.31 kB)
signature.asc (191.00 B)
Digital signature

2008-09-06 00:03:51

by Aaron Straus

Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

On Sep 05 03:56 PM, Chuck Lever wrote:
> > I can try to bisect between 2.6.19 <-> 2.6.20.
>
> That's a good start.

Hi,

OK. Bisected.

This is the commit where we start to see blocks of NULLs in NFS files.

e261f51f25b98c213e0b3d7f2109b117d714f69d is first bad commit
commit e261f51f25b98c213e0b3d7f2109b117d714f69d
Author: Trond Myklebust <[email protected]>
Date:   Tue Dec 5 00:35:41 2006 -0500

    NFS: Make nfs_updatepage() mark the page as dirty.

    This will ensure that we can call set_page_writeback() from within
    nfs_writepage(), which is always called with the page lock set.

    Signed-off-by: Trond Myklebust <[email protected]>



Thanks,
=a=



--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]


Attachments:
(No filename) (792.00 B)
signature.asc (191.00 B)
Digital signature