2008-05-12 01:54:05

by NeilBrown

Subject: Re: Many open/close on same files yields "No such file or directory".

On Friday May 9, [email protected] wrote:
>
> When I disabled the NFS-server and ran my "real-world" program on a
> single processor (make -j 1), it ran through fine. It basically
> gets around 20 million chunks out of different files and assembles the
> chunks into a few other files. It processes more or less 5 individual
> sections, so make can effectively run with a concurrency of 5.

(For linux-nfs readers: the problem is that repeatedly opening a given
file sometimes returns ENOENT - http://lkml.org/lkml/2008/5/9/15).

The mention of an NFS-server made my ears prick up...

Do I understand correctly that the problem only occurs when you have
48 clients hammering away at the filesystem in question?

Could the clients be accessing the same file that you are experiencing
problems with? Or one of the directories in the path (if so, how
deep)?

How many different files do these 20 million chunks come from? And
how does that number compare with the first number from
  grep dentry /proc/slabinfo
?
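
Something like this pulls out just that first number (a minimal sketch,
assuming the usual /proc/slabinfo layout, where the second field of the
dentry line is the count of active objects):

  # number of active dentry objects (second field of the dentry line)
  grep '^dentry' /proc/slabinfo | awk '{print $2}'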

The NFS server does some slightly strange things with the dcache if the
object being accessed is not in the cache.

Also, can you get a few instances of
  grep '^fh' /proc/net/rpc/nfsd

while things are going strange? The fields are:
* fh <stale> <total-lookups> <anonlookups> <dir-not-in-dcache> <nondir-not-in-dcache>

That will show us if it is looking for things that aren't in the
dcache.
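
A simple way to collect a few samples (a rough sketch; the interval is
arbitrary):

  # print the nfsd filehandle stats every 5 seconds while the workload runs
  while sleep 5; do grep '^fh' /proc/net/rpc/nfsd; done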

Finally, is the filesystem exported with "subtree_check" or
"no_subtree_check"?
Does it make a difference if you switch this setting and
re-export?
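
For reference, the flag lives in /etc/exports; something like the
following, where the export path and client spec are hypothetical:

  # /etc/exports -- toggle between subtree_check and no_subtree_check
  /export/data  *(rw,sync,no_subtree_check)

  exportfs -ra    # then re-export with the changed options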

NeilBrown


2008-05-12 06:41:02

by Jesper Krogh

Subject: Re: Many open/close on same files yields "No such file or directory".

Neil Brown wrote:
> On Friday May 9, [email protected] wrote:
>> When I disabled the NFS-server and ran my "real-world" program on a
>> single processor (make -j 1), it ran through fine. It basically
>> gets around 20 million chunks out of different files and assembles the
>> chunks into a few other files. It processes more or less 5 individual
>> sections, so make can effectively run with a concurrency of 5.
>
> (For linux-nfs readers: the problem is that repeatedly opening a given
> file sometimes returns ENOENT - http://lkml.org/lkml/2008/5/9/15).

This thing really, really irritated me, but I must admit that Andrew
Morton was quite right that this was "not very likely" to be a kernel
bug.

It seems that our central configuration handling system (slack) was being
way too aggressive about updating symlinks in the paths of the filesystems
that I was testing on. That explains why I couldn't reproduce it
on the internal volumes, nor on any of the volumes I created only
for testing purposes. Sometimes you just get too blind..
(I haven't been able to reproduce it for 12 hours now.)
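
For what it's worth, that kind of race is easy to provoke by hand. A
hypothetical sketch (coreutils' "ln -sfn" unlinks the old symlink and
then creates the new one, so there is a window where the path does not
exist):

  # hypothetical sketch: churn a symlink while another process opens through it
  mkdir -p dir1 dir2 && touch dir1/file dir2/file
  ( while true; do ln -sfn dir1 current; ln -sfn dir2 current; done ) &
  while true; do cat current/file >/dev/null 2>&1 || echo ENOENT; done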

Just to answer your questions: yes, the 48 clients do hammer on the NFS
server, and now it all seems to work fine.

Sorry for all the noise.

--
Jesper