2005-03-08 04:53:50

by Bernardo Innocenti

Subject: NFS client bug in 2.6.8-2.6.11

Hello,

This problem was previously described by Neil Conway.
All the relevant information is here:

http://lkml.org/lkml/2005/2/10/97


I still see this very same problem on 2.6.11 vanilla and in
Fedora/RawHide kernels. It has haunted me for a couple of
months on several Fedora clients. Strangely, a Gentoo
client isn't affected, but I couldn't investigate further.

When the current directory becomes inaccessible, it remains
so until I cd somewhere else and then cd back to it.
Sometimes I must wait a few seconds before cd succeeds.
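The manual workaround above (leave the directory, then re-enter it, retrying if the path is still unreachable) can be scripted. The following is only a minimal sketch: the helper name `recd` and its `attempts`/`delay` parameters are hypothetical, and it assumes that re-resolving the path by name is what clears the condition, as the `cd` experiment suggests.

```python
import os
import time


def recd(path: str, attempts: int = 5, delay: float = 1.0) -> bool:
    """Hypothetical helper mimicking the manual workaround: step out of the
    directory, then try to re-enter it by name, retrying a few times because
    the path may stay unreachable for a few seconds.

    Returns True once the directory is entered and readable; on failure the
    process is left in "/"."""
    for attempt in range(attempts):
        os.chdir("/")  # step away, like "cd" to another directory
        try:
            os.chdir(path)   # re-resolve the path by name
            os.listdir(".")  # confirm "." is readable again
            return True
        except OSError:
            if attempt + 1 < attempts:
                time.sleep(delay)  # the session shows a short wait can help
    return False
```

Note that this changes the calling process's working directory, just as the interactive `cd` does.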

Here's a sample session:

[executing a find / in another shell to trigger the bug]
beetle:/pub/linux/distro/fedora-devel# ll
ls: .: No such file or directory
beetle:/pub/linux/distro/fedora-devel# cd -
/
beetle:/# cd -
bash: cd: /pub/linux/distro/fedora-devel: No such file or directory
beetle:/#
[...a few seconds later...]
beetle:/# cd -
/pub/linux/distro/fedora-devel


Appears to be a client bug. The problem only happens
when there's heavy filesystem activity on other
filesystems (local or NFS).
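A reproduction along the lines described (generate heavy filesystem activity while repeatedly reading the current directory) might be scripted as below. This is a sketch under assumptions: the function name `reproduce` and its parameters are hypothetical, and it probes `"."` after entering the directory once, because the failing `ls` in the session was reading `.` rather than re-resolving the full path.

```python
import errno
import os
import subprocess
import time
from typing import Optional, Sequence


def reproduce(directory: str, load_cmd: Sequence[str],
              duration: float = 60.0, interval: float = 0.5) -> Optional[str]:
    """Hypothetical harness: enter `directory`, start `load_cmd` in the
    background, then repeatedly list "." until a listing fails or
    `duration` seconds pass. Returns the errno name of the first failure
    (for example "ENOENT" or "ESTALE"), or None if every listing worked."""
    original = os.getcwd()
    os.chdir(directory)  # probe via ".", like the failing `ls`
    load = None
    try:
        load = subprocess.Popen(load_cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            try:
                os.listdir(".")
            except OSError as exc:
                # Map the numeric errno to its symbolic name when known.
                return errno.errorcode.get(exc.errno, str(exc.errno))
            time.sleep(interval)
        return None
    finally:
        if load is not None:
            load.kill()  # stop the background load generator
            load.wait()  # reap it so no zombie is left behind
        os.chdir(original)
```

For example, `reproduce("/pub/linux/distro/fedora-devel", ["find", "/"])` mirrors the session above, with `find /` supplying the load.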

NFS mount options: rw,_netdev,rsize=32768,wsize=32768,hard,intr,proto=udp,addr=10.3.3.1

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/


2005-03-08 06:38:23

by Bernardo Innocenti

Subject: Re: NFS client bug in 2.6.8-2.6.11

Trond Myklebust wrote:
> On Tue, 08.03.2005 at 05:53 (+0100), Bernardo Innocenti wrote:
>
>>Appears to be a client bug.
>
> Why?

Two clients started showing the problem after
being upgraded from FC2 to FC3, while the server
remained unchanged.

I also can't reproduce the problem on an older
client running 2.4.21.

I'll test with 2.6.7 as soon as I can reboot the
client I'm using right now.

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

2005-03-08 06:48:17

by Trond Myklebust

Subject: Re: NFS client bug in 2.6.8-2.6.11

On Tue, 08.03.2005 at 07:38 (+0100), Bernardo Innocenti wrote:
> Trond Myklebust wrote:
> > On Tue, 08.03.2005 at 05:53 (+0100), Bernardo Innocenti wrote:
> >
> >>Appears to be a client bug.
> >
> > Why?
>
> Two clients started showing the problem after
> being upgraded from FC2 to FC3, while the server
> remained unchanged.

Can you produce tcpdumps to back that up?

Neil's problem appeared rather to be server-related. Neither of us could
reproduce his problem when the server was exporting an XFS partition.

The other thing to try is to turn off subtree checking on the server.

Cheers,
Trond

--
Trond Myklebust <[email protected]>



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2005-03-08 07:03:54

by Bernardo Innocenti

Subject: Re: NFS client bug in 2.6.8-2.6.11

Bernardo Innocenti wrote:
> Trond Myklebust wrote:
>
> I also can't reproduce the problem on an older
> client running 2.4.21.

Well, actually I tried harder with the 2.4.21
client and I obtained a similar effect:

naraku:/pub/linux/distro/fedora-devel# ll
ls: .: Stale NFS file handle
naraku:/pub/linux/distro/fedora-devel# cd -
/arc/linux
naraku:/arc/linux# cd -
/pub/linux/distro/fedora-devel
naraku:/pub/linux/distro/fedora-devel# ll
... (lots of files)


So, instead of ENOENT I get ESTALE on 2.4.21.

May well be a server bug then. The server is running
2.6.10-1.766_FC3. Do you think I should try installing
a vanilla kernel on the server?

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

2005-03-08 08:56:40

by Anders Saaby

Subject: Re: NFS client bug in 2.6.8-2.6.11

On Tuesday 08 March 2005 08:03, Bernardo Innocenti wrote:
> Bernardo Innocenti wrote:
> > Trond Myklebust wrote:
> >
> > I also can't reproduce the problem on an older
> > client running 2.4.21.
>
> Well, actually I tried harder with the 2.4.21
> client and I obtained a similar effect:
>
> So, instead of ENOENT I get ESTALE on 2.4.21.
>
> May well be a server bug then. The server is running
> 2.6.10-1.766_FC3. Do you think I should try installing
> a vanilla kernel on the server?

We have seen lots of ESTALEs/ENOENTs when the server is running 2.6.10
(vanilla). Don't know if this was supposed to be fixed in the 2.6.10-FC
kernels, but vanilla 2.6.11 doesn't seem to have this bug at all.

You mention a lot of kernel versions including 2.6.11, and I can't really
figure out whether you are talking about the clients or the server. -
Anyways if your server has only run with 2.6.10 - try 2.6.11.

- Apologies if I missed something obvious.

--
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: [email protected] - http://www.cohaesio.com
------------------------------------------------

2005-03-08 09:26:09

by Bernardo Innocenti

Subject: Re: NFS client bug in 2.6.8-2.6.11

Trond Myklebust wrote:
> On Tue, 08.03.2005 at 07:38 (+0100), Bernardo Innocenti wrote:
>>
>>Two clients started showing the problem after
>>being upgraded from FC2 to FC3, while the server
>>remained unchanged.
>
> Can you produce tcpdumps to back that up?
>
> Neil's problem appeared rather to be server-related. Neither of us could
> reproduce his problem when the server was exporting an XFS partition.

Actually, I was mistaken: running a background "find / >/dev/null"
triggers the problem even on the old RedHat (2.4.26) and
Gentoo (2.6.11) clients.


> The other thing to try is to turn off subtree checking on the server.

It's already turned off on all shares. For the record, this is the
contents of my /etc/exports:

/home gss/krb5(rw,no_root_squash,no_subtree_check,async) beetle(rw,no_root_squash,no_subtree_check,async) deimos(rw,async,no_subtree_check,anonuid=134,anongid=100) haring(rw,async,no_subtree_check,anonuid=127,anongid=100) murphy(rw,async,no_subtree_check,anonuid=158,anongid=100) daneel(rw,async,no_subtree_check,anonuid=100,anongid=100) 10.0.0.0/8(rw,no_subtree_check,async)
/arc 10.0.0.0/8(rw,no_root_squash,no_subtree_check,async,anonuid=14,anongid=113)

#
# NFSv4
#
/export beetle(rw,fsid=0,no_root_squash,insecure,no_subtree_check,async)
/export 10.0.0.0/8(rw,fsid=0,insecure,no_subtree_check,async)
/export gss/krb5(rw,fsid=0,insecure,no_subtree_check,async)
/export/home beetle(rw,nohide,no_root_squash,insecure,no_subtree_check,async)
/export/home 10.0.0.0/8(rw,nohide,insecure,no_subtree_check,async)
/export/home gss/krb5(rw,nohide,no_root_squash,insecure,no_subtree_check,async)
/export/arc 10.0.0.0/8(rw,nohide,no_root_squash,insecure,no_subtree_check,async,anonuid=14,anongid=113)
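To double-check Trond's suggestion across a long exports file like this one, a quick scan can list every client entry that does not explicitly set `no_subtree_check`. This is a hypothetical helper with deliberately simplified parsing (it ignores line continuations, quoted paths and the `-options` default-options syntax), not a replacement for `exportfs -v` on the server.

```python
from typing import List, Tuple


def entries_with_subtree_check(exports_text: str) -> List[Tuple[str, str]]:
    """Return (export path, client spec) pairs whose option list does not
    explicitly contain no_subtree_check. Simplified parsing: comments are
    stripped, but continuations and quoted paths are not handled."""
    flagged = []
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path, *clients = line.split()
        for client in clients:
            # "host(opt1,opt2)" -> host, "opt1,opt2)"
            host, _, opts = client.partition("(")
            options = opts.rstrip(")").split(",") if opts else []
            if "no_subtree_check" not in options:
                flagged.append((path, host))
    return flagged
```

Usage: `entries_with_subtree_check(open("/etc/exports").read())` returns an empty list for the file above, since every entry sets `no_subtree_check`.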

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

2005-03-08 22:25:07

by Bernardo Innocenti

Subject: Re: NFS client bug in 2.6.8-2.6.11

Anders Saaby wrote:
> On Tuesday 08 March 2005 08:03, Bernardo Innocenti wrote:
>
>>Bernardo Innocenti wrote:
>>
>>>Trond Myklebust wrote:
>>>
>>>I also can't reproduce the problem on an older
>>>client running 2.4.21.
>>
>>Well, actually I tried harder with the 2.4.21
>>client and I obtained a similar effect:
>>
>>So, instead of ENOENT I get ESTALE on 2.4.21.
>>
>>May well be a server bug then. The server is running
>>2.6.10-1.766_FC3. Do you think I should try installing
>>a vanilla kernel on the server?
>
>
> We have seen lots of ESTALEs/ENOENTs when the server is running 2.6.10
> (vanilla). Don't know if this was supposed to be fixed in the 2.6.10-FC
> kernels, but vanilla 2.6.11 doesn't seem to have this bug at all.
>
> You mention a lot of kernel versions including 2.6.11, and I can't really
> figure out whether you are talking about the clients or the server. -
> Anyways if your server has only run with 2.6.10 - try 2.6.11.

Thank you, I've finally nailed it down by upgrading the
*server* kernel from 2.6.10-1.766_FC3 to 2.6.10-1.770_FC3.

The latter is basically 2.6.10-ac12 plus a bunch of vendor
specific patches.


> - Apologies if I missed something obvious.

No, *I* did. All the clues I had led me to the client
side, while the problem was in the server instead.

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

2005-03-08 05:30:54

by Trond Myklebust

Subject: Re: NFS client bug in 2.6.8-2.6.11

On Tue, 08.03.2005 at 05:53 (+0100), Bernardo Innocenti wrote:

> Appears to be a client bug.

Why?

--
Trond Myklebust <[email protected]>
