2006-02-03 18:05:31

by Brian D. McGrew

Subject: Stale NFS File Handle

Good morning all (kind of a long winded mail, please have patience!)

I've got an FC3 server running a 2.6.9 kernel and sharing about 500GB of
disk space on a RAID5 array via NFS. This box has been running fine for
over a year now but in the last three weeks or so I'm seeing a ton of
Stale NFS File Handle errors; especially in my overnight builds.

Most of my clients are FC3 and a couple of Solaris boxes running a stock
configuration. All we're doing is serving up NFS and compiling with
GCC. We're seeing this error more and more and the harder I try to
track it down, the more we're seeing it (ok, maybe that's my
imagination).

I'm guessing that the problem has to be somewhere in the FC3 server
because I've still got some Solaris NFS servers that have been running
for years with no problems.

What should I be looking for in tracking this error down? Should I
upgrade my kernel? Should I throw away FC3 and go to Enterprise Linux?
I'm at the end of my rope here because this is now causing a major set
back to our development team!

Please help!

-brian

Brian D. McGrew { [email protected] || [email protected] }
--
> Those of you who think you know it all,
really annoy those of us who do!


2006-02-03 19:09:14

by Trond Myklebust

Subject: Re: Stale NFS File Handle

On Fri, 2006-02-03 at 10:05 -0800, Brian D. McGrew wrote:
> Good morning all (kind of a long winded mail, please have patience!)
>
> I've got an FC3 server running a 2.6.9 kernel and sharing about 500GB of
> disk space on a RAID5 array via NFS. This box has been running fine for
> over a year now but in the last three weeks or so I'm seeing a ton of
> Stale NFS File Handle errors; especially in my overnight builds.
>
> Most of my clients are FC3 and a couple of Solaris boxes running a stock
> configuration. All we're doing is serving up NFS and compiling with
> GCC. We're seeing this error more and more and the harder I try to
> track it down, the more we're seeing it (ok, maybe that's my
> imagination).
>
> I'm guessing that the problem has to be somewhere in the FC3 server
> because I've still got some Solaris NFS servers that have been running
> for years with no problems.
>
> What should I be looking for in tracking this error down? Should I
> upgrade my kernel? Should I throw away FC3 and go to Enterprise Linux?
> I'm at the end of my rope here because this is now causing a major set
> back to our development team!

Kernels prior to 2.6.12 (if memory serves me correctly) had a series of
errors in the code that converts filehandles into valid dentries on the
server. Upgrading to the FC4 kernel, which I believe to be 2.6.14 based,
is therefore very likely to solve your problem.
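
A quick way to check a server against that threshold is sketched below.
This is only an illustration: the 2.6.12 cut-off is the "if memory serves"
figure from the paragraph above, not a verified changelog entry, and the
function names and sample release strings are made up for the example.

```python
import re

# Cut-off quoted from the message above ("prior to 2.6.12, if memory
# serves"); treat it as an assumption, not a verified changelog entry.
FIXED_IN = (2, 6, 12)


def parse_kernel_release(release):
    """Return the leading numeric part of a `uname -r` string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing third component (e.g. "3.10") is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def predates_filehandle_fixes(release, fixed_in=FIXED_IN):
    """True if the release sorts before the assumed fix version."""
    return parse_kernel_release(release) < fixed_in


# Illustrative release strings only; real ones come from `uname -r`.
for sample in ("2.6.9-example", "2.6.11-example", "2.6.14-example"):
    print(sample, predates_filehandle_fixes(sample))
```

On a live Linux box the string to feed in would come from `uname -r`, or
from platform.release() in Python.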

Cheers,
Trond

2006-02-03 19:14:27

by Roger Heflin

Subject: RE: Stale NFS File Handle



> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Brian D. McGrew
> Sent: Friday, February 03, 2006 12:06 PM
> To: [email protected]
> Subject: Stale NFS File Handle
>
> Good morning all (kind of a long winded mail, please have patience!)
>
> I've got an FC3 server running a 2.6.9 kernel and sharing
> about 500GB of disk space on a RAID5 array via NFS. This box
> has been running fine for over a year now but in the last
> three weeks or so I'm seeing a ton of Stale NFS File Handle
> errors; especially in my overnight builds.
>
> Most of my clients are FC3 and a couple of Solaris boxes
> running a stock configuration. All we're doing is serving up
> NFS and compiling with GCC. We're seeing this error more and
> more and the harder I try to track it down, the more we're
> seeing it (ok, maybe that's my imagination).
>
> I'm guessing that the problem has to be somewhere in the FC3
> server because I've still got some Solaris NFS servers that
> have been running for years with no problems.
>
> What should I be looking for in tracking this error down?
> Should I upgrade my kernel? Should I throw away FC3 and go
> to Enterprise Linux?
> I'm at the end of my rope here because this is now causing a
> major set back to our development team!
>
> Please help!


Brian,

That is an ancient kernel, well over a year old; I would try a
later kernel.

At a minimum put on a later kernel, and maybe move to FC4, as there
are several different kernels to choose from there, some
of which may have issues, others of which may work.

You might also check when and how you are running "exportfs -r"
and other exportfs-type commands, because I have seen this command
cause interesting race conditions (i.e. there is a window
where the clients apparently get a failure response). To get those
messages, my setup required a busy machine, updating
/etc/exports from cron, and rerunning exportfs often; even with
all of that the failures were pretty rare, and only affected
some nodes on a given failure.

I don't know if the bug is still around, but it is something
to check.
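
If a cron job does rewrite /etc/exports, one way to narrow that window
is to rerun "exportfs -r" only when the file has actually changed. The
sketch below is an assumption about how such a job might be structured,
not a fix for the race itself; the function names and the state-file
path are hypothetical.

```python
import hashlib
import subprocess
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of the file's current contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def reexport_if_changed(exports_path, state_path, run=subprocess.run):
    """Run `exportfs -r` only if exports_path differs from the last run.

    state_path is a hypothetical file that stores the digest recorded
    after the previous successful re-export. `run` is injectable so the
    logic can be exercised without touching the real NFS server.
    """
    current = file_digest(exports_path)
    state = Path(state_path)
    previous = state.read_text().strip() if state.exists() else None
    if current == previous:
        return False  # nothing changed, leave the export table alone
    run(["exportfs", "-r"], check=True)
    # Record the digest only after exportfs succeeded.
    state.write_text(current + "\n")
    return True
```

A cron job could call something like
reexport_if_changed("/etc/exports", "/var/run/exports.sha256"), where
the second path is just an example location. This only reduces how often
the re-export happens; whether it avoids the stale-handle errors depends
on the kernel in use.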

Roger



2006-02-03 19:18:10

by Roger Heflin

Subject: RE: Stale NFS File Handle



>
> Kernels prior to 2.6.12 (if memory serves me correctly) had a
> series of errors in the code that converts filehandles into
> valid dentries on the server. Upgrading to the FC4 kernel,
> which I believe to be 2.6.14 based, is therefore very likely
> to solve your problem.
>
> Cheers,
> Trond

Default FC4 is 2.6.11, so he would need to install one of the
updated kernels on FC4.

Roger