2008-07-11 11:31:15

by Tom H

Subject: [NFS] Stale NFS file handle error


Hi,

I have been intermittently seeing a "Stale NFS file handle" error from a
Java app we use to resize images and save them to a file on an NFS
mounted directory.

An example of the Java code:

File outputFile = new File(outputDir, filePrefix + ifile.getName());
BufferedImage large = ImageIO.read(inputFile);
BufferedImage scaled = scale(large, width, height);
ImageIO.write(scaled, "jpg", outputFile);

About once in every 10,000 writes, it throws the following error:

java.io.IOException: Stale NFS file handle
at java.io.RandomAccessFile.close(RandomAccessFile.java:573)
at javax.imageio.stream.FileImageOutputStream.close(FileImageOutputStream.java:160)
at javax.imageio.ImageIO.write(ImageIO.java:1519)
at com.hz.pagemill.process.ImageScaler.scale(ImageScaler.java:47)

The NFSv3 protocol specification (http://tools.ietf.org/html/rfc1813)
defines the stale file handle error as follows:

NFS3ERR_STALE
Invalid file handle. The file handle given in the
arguments was invalid. The file referred to by that file
handle no longer exists or access to it has been
revoked.

The way the code works prevents two processes from creating the same
file, so I am pretty sure that whatever is moving, renaming or deleting
the file is not part of the Java app.

We are running the default NFS server with RHEL4:
[[email protected] ~]# modinfo nfsd
filename: /lib/modules/2.6.9-55.0.9.ELsmp/kernel/fs/nfsd/nfsd.ko
license: GPL
author: Olaf Kirch <[email protected]>
depends: sunrpc,exportfs,lockd,nfs_acl
vermagic: 2.6.9-55.0.9.ELsmp SMP gcc-3.4

with the following export options:
[[email protected] ~]# cat /etc/exports
/myshareZZZ *(rw,sync,no_root_squash,no_subtree_check)

(I have tried it with, and without no_subtree_check)

I was hoping to get some pointers on debugging the issue.

Many Thanks,

Tom H




-------------------------------------------------------------------------
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that [email protected] is being discontinued.
Please subscribe to [email protected] instead.
http://vger.kernel.org/vger-lists.html#linux-nfs



2008-07-11 12:50:48

by Peter Staubach

Subject: Re: [NFS] Stale NFS file handle error

Tom H wrote:
> Hi,
>
> I have been intermittently seeing a "Stale NFS file handle" error from a
> Java app we use to resize images and save them to a file on an NFS
> mounted directory.
>
> An example of the Java code:
>
> File outputFile = new File(outputDir, filePrefix + ifile.getName());
> BufferedImage large = ImageIO.read(inputFile);
> BufferedImage scaled = scale(large, width, height);
> ImageIO.write(scaled, "jpg", outputFile);
>
> About once in every 10,000 writes, it throws the following error:
>
> java.io.IOException: Stale NFS file handle
> at java.io.RandomAccessFile.close(RandomAccessFile.java:573)
> at javax.imageio.stream.FileImageOutputStream.close(FileImageOutputStream.java:160)
> at javax.imageio.ImageIO.write(ImageIO.java:1519)
> at com.hz.pagemill.process.ImageScaler.scale(ImageScaler.java:47)
>
> The NFSv3 protocol specification (http://tools.ietf.org/html/rfc1813)
> defines the stale file handle error as follows:
>
> NFS3ERR_STALE
> Invalid file handle. The file handle given in the
> arguments was invalid. The file referred to by that file
> handle no longer exists or access to it has been
> revoked.
>
> The way the code works prevents two processes from creating the same
> file, so I am pretty sure that whatever is moving, renaming or deleting
> the file is not part of the Java app.
>
> We are running the default NFS server with RHEL4:
> [[email protected] ~]# modinfo nfsd
> filename: /lib/modules/2.6.9-55.0.9.ELsmp/kernel/fs/nfsd/nfsd.ko
> license: GPL
> author: Olaf Kirch <[email protected]>
> depends: sunrpc,exportfs,lockd,nfs_acl
> vermagic: 2.6.9-55.0.9.ELsmp SMP gcc-3.4
>
> with the following export options:
> [[email protected] ~]# cat /etc/exports
> /myshareZZZ *(rw,sync,no_root_squash,no_subtree_check)
>
> (I have tried it with, and without no_subtree_check)
>
> I was hoping to get some pointers on debugging the issue.

Are you sure that you don't have two (or more) different NFS
clients working in the same directory?

The ESTALE error usually occurs when a change is made on the
server to something that a client is accessing, perhaps
directly on the server or more commonly, by a different NFS
client.

ps



2008-07-11 13:23:32

by Tom H

Subject: Re: [NFS] Stale NFS file handle error

Peter Staubach wrote:
> Are you sure that you don't have two (or more) different NFS
> clients working in the same directory?
>
> The ESTALE error usually occurs when a change is made on the
> server to something that a client is accessing, perhaps
> directly on the server or more commonly, by a different NFS
> client.

I'm as sure as I can be that the app is not opening the file multiple
times. The previous lines of code actually generate the folder, and it's
all sequential. The file names are all deterministic, only one process
is running, and nothing else accesses those files on the server. I have
checked for things like backup processes.

It happens so infrequently that I get the feeling that it's more likely
something else.

I guess that I am looking for ways to troubleshoot the problem, maybe
with debug tools or whatever the NFS gurus would use...

Tom



2008-07-11 13:38:52

by Peter Staubach

Subject: Re: [NFS] Stale NFS file handle error

Tom H wrote:
> Peter Staubach wrote:
>
>> Are you sure that you don't have two (or more) different NFS
>> clients working in the same directory?
>>
>> The ESTALE error usually occurs when a change is made on the
>> server to something that a client is accessing, perhaps
>> directly on the server or more commonly, by a different NFS
>> client.
>>
>
> I'm as sure as I can be that the app is not opening the file multiple
> times. The previous lines of code actually generate the folder, and it's
> all sequential. The file names are all deterministic, only one process
> is running, and nothing else accesses those files on the server. I have
> checked for things like backup processes.
>
> It happens so infrequently that I get the feeling that it's more likely
> something else.
>
> I guess that I am looking for ways to troubleshoot the problem, maybe
> with debug tools or whatever the NFS gurus would use...

It sounds like you are using only one NFS client? Do you have
the same file system from the server mounted in more than one
place on this client?

You could try using something like tshark or tcpdump to capture
the network traffic when the error occurs, but I suspect that
if it really only happens infrequently, then you would end up
with gigabytes of capture and may or may not actually be able to
identify the traffic when the problem occurs.
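
To make a large capture searchable, one option is to log a precise
timestamp and the target path whenever the write fails, so that the
matching packets can be looked up by time. A minimal sketch, assuming
Java 8 or later; StaleWriteLogger, IoAction and the log format are
names made up for illustration and are not part of the app above:

```java
import java.io.IOException;
import java.time.Instant;

class StaleWriteLogger {
    /** Hypothetical callback wrapping the I/O call under suspicion. */
    interface IoAction {
        void run() throws IOException;
    }

    /** Builds one log line; the format is an arbitrary choice. */
    static String formatFailure(Instant when, String target, IOException e) {
        return when + " write failed target=" + target
                + " error=" + e.getMessage();
    }

    /** Runs the action; on failure logs a UTC timestamp, then rethrows. */
    static void runLogged(String target, IoAction action) throws IOException {
        try {
            action.run();
        } catch (IOException e) {
            System.err.println(formatFailure(Instant.now(), target, e));
            throw e;
        }
    }
}
```

It could be wrapped around the existing call, for example
StaleWriteLogger.runLogged(outputFile.getPath(),
() -> ImageIO.write(scaled, "jpg", outputFile)). The timestamps only
help if the clocks on the client and the capturing host are in sync.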

Alternatively, when the problem occurs, check whether the file
being accessed still exists and is the same file, i.e. that it
wasn't removed and then replaced by a new file with the same
name.
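
That check can be sketched in Java by remembering the file's identity
right after it is created and comparing it again when the error fires.
This assumes Java 7 or later for java.nio.file; FileIdentityCheck,
keyOf and check are names invented here, and fileKey() is only expected
to carry device and inode information on Unix-like platforms (it may be
null elsewhere):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

class FileIdentityCheck {
    /** Returns the platform's file key, or null if it provides none. */
    static Object keyOf(Path path) throws IOException {
        return Files.readAttributes(path, BasicFileAttributes.class).fileKey();
    }

    /**
     * Compares the file's current identity with a key recorded earlier.
     * Returns "missing", "same", "replaced", or "unknown" when either
     * key is unavailable.
     */
    static String check(Path path, Object keyBefore) throws IOException {
        final BasicFileAttributes attrs;
        try {
            attrs = Files.readAttributes(path, BasicFileAttributes.class);
        } catch (NoSuchFileException e) {
            return "missing";
        }
        Object keyNow = attrs.fileKey();
        if (keyBefore == null || keyNow == null) {
            return "unknown";
        }
        return keyBefore.equals(keyNow) ? "same" : "replaced";
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("identity", ".jpg");
        Object key = keyOf(p);
        System.out.println(check(p, key)); // "same", or "unknown" without a file key
        Files.delete(p);
        System.out.println(check(p, key)); // "missing"
    }
}
```

On an NFS mount the attribute lookup may itself fail with an
IOException if the handle is stale, which would be worth logging too.
A "same" result is also not conclusive, since an inode number can be
reused after a delete.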

ps
