From: "Talpey, Thomas" <Thomas.Talpey@netapp.com>
Subject: Re: Linux client cache corruption,
 system call returning  incorrectly
Date: Fri, 02 Mar 2007 08:45:19 -0500
Message-ID: <EXNANE01XvpFVjCRGry00000034@exnane01.hq.netapp.com>
References: <45E796B7.9010707@ilm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: nfs@lists.sourceforge.net
To: Eli Stair <estair@ilm.com>
In-Reply-To: <45E796B7.9010707@ilm.com>
Sender: nfs-bounces@lists.sourceforge.net
Errors-To: nfs-bounces@lists.sourceforge.net

Do you mean ENFILE? If so, what's your process's file descriptor limit
(ulimit -n)? Since stat(2) doesn't open anything, in theory it shouldn't
return this. fstat(2) takes a file descriptor but again, doesn't open
anything.
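
A minimal sketch of that check, assuming Python 3 on a Linux client. The
function names fd_limits and stat_sweep are illustrative, not taken from
any existing tool. It reports the per-process descriptor limit that
'ulimit -n' shows and counts stat() failures by errno name, so ENFILE
(system-wide file table full), EMFILE (per-process limit reached) and
ENOENT can be told apart:

```python
import errno
import os
import resource
from collections import Counter


def fd_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair; 'ulimit -n' shows the soft value."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def stat_sweep(paths, rounds=1):
    """Call stat() on every path 'rounds' times and count failures by errno name."""
    failures = Counter()
    for _ in range(rounds):
        for path in paths:
            try:
                os.stat(path)  # stat() does not allocate a file descriptor
            except OSError as exc:
                # Map the numeric errno to its symbolic name, e.g. 2 -> "ENOENT".
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                failures[name] += 1
    return failures
```

Calling stat_sweep() with the list of files the application touches and a
large 'rounds' value may help turn the access pattern into a standalone
testcase. Whether it reproduces the failure at all is an open question.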

I wouldn't recommend using rpcdebug, since this is probably a local
issue. Rpcdebug will show over-the-wire stuff, mostly. Strace will
probably be the best tool.
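
As a rough sketch of how the strace output could be reduced to just the
failing calls, assuming Python 3 and a log captured with something like
'strace -f -o trace.log <command>'. The name failed_calls and the default
syscall list are illustrative assumptions, and the pattern only covers
the common single-line form of strace output:

```python
import re
from collections import Counter

# Matches a failed call as strace prints it, for example:
#   stat("/mnt/a", 0x7ffc0000) = -1 ENOENT (No such file or directory)
# search() is used below so a leading PID or timestamp column is tolerated.
FAILED_CALL = re.compile(
    r'(?P<syscall>\w+)\((?P<args>.*)\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)'
)


def failed_calls(lines, syscalls=("stat", "lstat", "fstat", "access", "open")):
    """Count failed calls per (syscall, errno name) in an iterable of strace lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_CALL.search(line)
        if match and match.group("syscall") in syscalls:
            counts[(match.group("syscall"), match.group("errno"))] += 1
    return counts
```

Passing an open log file, as in failed_calls(open("trace.log")), yields a
per-errno tally. Depending on the architecture the calls may show up
under other names (stat64, for instance), so the syscall list may need
adjusting.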

Tom.

At 10:15 PM 3/1/2007, Eli Stair wrote:
>
>I'm having a serious client cache issue on recent kernels.  On 2.6.18 
>and 2.6.20 (but /not/ 2.6.15.4) clients I'm seeing periodic file GETATTR 
>or ACCESS calls that return fine from the server yet pass an ENOFILE up to 
>the application.  It occurs against all NFS servers I've tested (2.6.18 
>knfsd, OnTap 10.0.1, SpinOS 2.5.5p8).
>
>The triggering usage pattern is repeated stat'ing of several hundred
>files that are opened read-only (without close()'ing or open()'ing them
>again) during runtime.  I have been unable to reduce the usage to a
>bug-triggering testcase yet, but it is very easily triggered by an
>internal app.  Mounting the NFS filesystems with 'nocto' appears to
>mitigate the issue by about 50%, but does not completely get rid of it.
>Also, using 2.6.20 plus Trond's NFS_ALL patches and the one you supplied
>slows the rate of errors, but does not eliminate them.
>
>I'm rigging the application with an strace harness so I can track down
>specifically what ops are failing in production.  I can confirm that
>the errors I have witnessed under debug are NOT caused by an NFS call
>returning access denied, or by an open(); it appears to be the stat()
>calls on the files (usually several dozen or hundreds in sequence) that
>return ENOFILE, though the calls should return success.
>
>Any tips on using rpcdebug effectively?  I'm getting tremendous levels 
>of info output with '-m nfs -s all', too much to parse well.
>
>I'll update with some more hard data as I get further along, but want to
>see if a) anyone else has noticed this and is working on a fix, and b) if
>there are any suggestions on getting more useful data than what I'm
>working towards.
>
>Reverting to 2.6.15.4 (which doesn't exhibit this particular bug) isn't 
>a direct solution even temporarily, as that has a nasty NFS fseek bug 
>(seek to EOF goes to wrong offset).
>
>Cheers,
>
>
>/eli


_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs