2008-12-03 00:05:25

by Russell Miller

[permalink] [raw]
Subject: NFS directory listing hang on write.

Hi, all. We're having some NFS problems here that I don't understand.

The arguments are standard. wsize=32768,rsize=32768,tcp. Nothing
else out of the ordinary. Test case is easy too:

mount a filesystem.
cd into the directory.
create a 1G file and background the creation, so 1G is being written
into the directory.
Then, try to ls the directory.

The ls will hang, sometimes for a few seconds, sometimes for the
entire length of the write. This happens on every NFS server I've
tried, including an acopia, bluearc, onstor, and a generic linux box
running on the other end.

I have seen this behavior on multiple systems and using multiple
kernels. Starting with the latest centos 5.2 kernel, but I've also
reproduced it on the stock 2.6.27.7. I don't really understand what's
going on, but I think it has something to do with the attribute cache,
as setting actimeo=1 seems to have a positive effect. Setting it to
zero seems to have no effect.

It does not seem to be happening with centos 4.x kernels, which are
based on 2.6.9.

I have turned on nfs and rpc debugging, and it's not telling me
anything useful, which is what I would expect if the attribute cache
is borking somehow. I don't think there's any debugging for the cache
itself.

Can someone please give me an idea of how to proceed in debugging
this? I'm not a stranger to the kernel, but this is deeper in than
I've gone before, and I am frankly a little over my head here.

Thanks,


2008-12-03 00:22:17

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS directory listing hang on write.

On Tue, 2008-12-02 at 16:05 -0800, Russell Miller wrote:
> Hi, all. We're having some NFS problems here that I don't understand.
>
> The arguments are standard. wsize=32768,rsize=32768,tcp. Nothing
> else out of the ordinary. Test case is easy too:
>
> mount a filesystem.
> cd into the directory.
> create a 1G file and background the creation, so 1G is being written
> into the directory.
> Then, try to ls the directory.
>
> The ls will hang, sometimes for a few seconds, sometimes for the
> entire length of the write. This happens on every NFS server I've
> tried, including an acopia, bluearc, onstor, and a generic linux box
> running on the other end.

That is known and expected behaviour. It is not a bug.

The reason is that your 'ls' is also trying to stat() the file, and so
the client has to flush out all those writes in order to ensure that the
mtime/ctime are posixly correct.
By this, I mean that POSIX requires the filesystem to update the
mtime/ctime every time you issue a write() call. In the case of NFS,
this update is actually done when the client sends the WRITE rpc call to
to the server. IOW: in order to ensure that your mtime/ctime doesn't
change after you've issued the stat() call, we actually need to flush
out those backgrounded writes and wait for them to complete.

Cheers
Trond