2006-01-16 09:01:45

by Michael Loftis

Subject: Linux 2.6.8 NFS not stateless and random failures?

(Sincere apologies if this gets posted twice, old habits die hard, and I
was posting to rutgers instead of kernel.org!)

We recently attempted to upgrade a completely working Linux 2.4 NFS
environment to a 2.6-based server. Nothing else has changed at all, just
the server.

On with the show: when did the 2.6 series NFS lose its stateless ability?
Whenever I update NFS exports or reboot the NFS server, I now have to
remount or reboot all NFS clients. I thought part of the whole point of
NFS was that it is stateless. Indeed, we didn't have this behavior before
2.6...

Secondly, we're getting weird intermittent failures, most easily seen on
the webservers, with log entries along the lines of the one below. They are
apparently random and inconsistent. I removed the particular path and
client from the log entry below. There is NOT a permissions problem on
these elements. Subsequent accesses will (usually) succeed. Right after a
reboot everything will be fine for a while...then after a bit the webserver
starts to get these errors intermittently, with no apparent reason behind
them. Again, with 2.4 we had nothing of the sort going on except in the
(very limited and few) legitimate cases caused by customers setting
incorrect perms.

[Sun Jan 15 12:14:00 2006] [error] [client a.b.c.d] (13)Permission denied:
access to /path... failed because search permissions are missing on a
component of the path
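
One way to check whether these errors really do cluster some time after a
reboot is to bucket them by hour. The following is only a minimal sketch,
not something taken from the setup described here. It assumes the Apache
error log keeps each entry on a single line in the format shown above (the
entry above is wrapped only for mail), and the names parse_line and
errors_per_hour are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: [timestamp] [error] [client addr] (13)Permission denied: ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[error\] \[client (?P<client>[^\]]+)\] "
    r"\(13\)Permission denied: (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, client) for a permission-denied entry, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Timestamp format as in the quoted entry, e.g. "Sun Jan 15 12:14:00 2006".
    ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    return ts, m.group("client")


def errors_per_hour(lines):
    """Count permission-denied entries per clock hour."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            ts, _client = parsed
            # Truncate to the start of the hour so entries share a bucket.
            counts[ts.replace(minute=0, second=0)] += 1
    return counts
```

Passing the lines of the error log to errors_per_hour (for example an open
file object) gives a per-hour count that can be compared against the time of
the last NFS server reboot or exports change.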

Debian 3.1, kernel 2.6.8-2-686-smp, with ReiserFS on LVM on a QLogic
QLA2342 (2312) based PCI-X/133MHz card.

No errors or other oddities at all are being logged on the NFS server; it
is completely quiet.

We will probably have to roll back to 2.4, esp. since I can't reproduce
this.

--
"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler


2006-01-16 15:43:13

by Trond Myklebust

Subject: Re: Linux 2.6.8 NFS not stateless and random failures?

On Mon, 2006-01-16 at 02:01 -0700, Michael Loftis wrote:
> (Sincere apologies if this gets posted twice, old habits die hard, and I
> was posting to rutgers instead of kernel.org!)
>
> We recently attempted to upgrade a completely working Linux 2.4 NFS
> environment to a 2.6-based server. Nothing else has changed at all, just
> the server.
>
> On with the show: when did the 2.6 series NFS lose its stateless ability?
> Whenever I update NFS exports or reboot the NFS server, I now have to
> remount or reboot all NFS clients. I thought part of the whole point of
> NFS was that it is stateless. Indeed, we didn't have this behavior before
> 2.6...
>
> Secondly, we're getting weird intermittent failures, most easily seen on
> the webservers, with log entries along the lines of the one below. They are
> apparently random and inconsistent. I removed the particular path and
> client from the log entry below. There is NOT a permissions problem on
> these elements. Subsequent accesses will (usually) succeed. Right after a
> reboot everything will be fine for a while...then after a bit the webserver
> starts to get these errors intermittently, with no apparent reason behind
> them. Again, with 2.4 we had nothing of the sort going on except in the
> (very limited and few) legitimate cases caused by customers setting
> incorrect perms.
>
> [Sun Jan 15 12:14:00 2006] [error] [client a.b.c.d] (13)Permission denied:
> access to /path... failed because search permissions are missing on a
> component of the path
>
> Debian 3.1, kernel 2.6.8-2-686-smp, with ReiserFS on LVM on a QLogic
> QLA2342 (2312) based PCI-X/133MHz card.

AFAIK, most of these bugs have been fixed.

Please try to reproduce the problems on a more recent kernel, or get
Debian to backport the fixes.

Cheers,
Trond

2006-01-16 21:44:24

by Michael Loftis

Subject: Re: Linux 2.6.8 NFS not stateless and random failures?



--On January 16, 2006 10:42:48 AM -0500 Trond Myklebust wrote:


> AFAIK, most of these bugs have been fixed.
>
> Please try to reproduce the problems on a more recent kernel, or get
> Debian to backport the fixes.


I'll see if I can find a more specific version...I already know we can't
use anything from 2.6.13 through 2.6.15, maybe including 2.6.15.1 (I'd have
to test that), because of some sort of SCSI or aic7xxx driver problem. Even
with 2.6.8 I occasionally have to cycle our tape library and reset the tape
server when, for some reason, the kernel manages to deadlock a tape drive
while we have more than one tape drive going at once.