2012-10-09 14:07:21

by David Werner

Subject: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

Hi,

We have a serious problem using NFSv4 with recent openSUSE versions.
Provided that I have write permission on the directory I touch, I can do
the following to get a stale file handle:

> touch ..
> ls
ls: cannot open directory .: Stale NFS file handle

This makes the filesystem quite unusable, as anything that touches a
directory above (like writing a file) can stop an application.
If I leave the directory and cd into it again, its content is listable.
From our computing center, which runs the server, I got the information
that it is NetApp ONTAP 7.3.6P4.
I observed the problem here with openSUSE-11.4 (kernel: 2.6.37.6-0.20-desktop),
openSUSE-12.1 (kernel: 3.4.4-1-desktop) and openSUSE-12.2 (kernel: 3.4.6-2.10-desktop),
all on the x86_64 architecture, but _not_ with openSUSE-11.3 (kernel: 2.6.34.10-0.6-desktop).
We use NFSv4 without encryption and user data from NIS.
I'm looking for suggestions to resolve this problem.
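
For anyone who wants to test other clients or mount options, the two commands above can be wrapped in a small script. This is only a sketch: the function name listing_goes_stale is made up for illustration, and it assumes that the failing `ls` surfaces as an OSError with errno ESTALE, as the error message suggests.

```python
import errno
import os


def listing_goes_stale(directory):
    """Return True if listing `directory` fails with ESTALE after touching its parent.

    Hypothetical helper; it mirrors `touch ..` followed by `ls` run from
    inside `directory`.
    """
    previous = os.getcwd()
    os.chdir(directory)
    try:
        os.utime("..")  # like `touch ..`: update the parent's timestamps
        try:
            os.listdir(".")  # like `ls` in the current directory
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                return True
            raise  # any other error is unrelated to this problem
        return False
    finally:
        os.chdir(previous)  # always restore the caller's working directory
```

Calling listing_goes_stale() on a directory of the NFS mount would be expected to return True on an affected client and False on a local filesystem.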


Best regards, David

--
David Werner
Universitaet Stuttgart
Institut für Wasser- und Umweltsystemmodellierung
Lehrstuhl fuer Hydromechanik & Hydrosystemmodellierung
Pfaffenwaldring 61 ** 70569 Stuttgart
Tel.: ++49-711-685 67010 ** Fax: ++49-711-685 60430
[email protected]
http://www.hydrosys.uni-stuttgart.de/


2012-10-09 15:10:01

by David Werner

Subject: Re: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

Hi Emmanuel,

Mount shows the following options which resulted from "defaults,_netdev"
in fstab:

rus4iws.rus.uni-stuttgart.de:/vol/rus4iws_data0/ on /home type nfs4
(rw,relatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=129.69.98.101,local_lock=none,addr=129.69.201.103,_netdev)

I have now also tried with "noac" and without _netdev (I have forgotten what it is for; I think it was a recommendation to circumvent some systemd boot problem),
which gives the following:

rus4iws.rus.uni-stuttgart.de:/vol/rus4iws_data0/ on /home type nfs4
(rw,relatime,sync,vers=4.0,rsize=65536,wsize=65536,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=129.69.98.101,local_lock=none,addr=129.69.201.103)

But it did not resolve anything.
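
To compare two such option lists without reading them character by character, they can be split into key/value pairs. A minimal sketch, assuming the parenthesised, comma-separated form that mount prints above; the function name parse_mount_options is hypothetical.

```python
def parse_mount_options(text):
    """Split a mount option string such as "(rw,vers=4.0,noac)" into a dict.

    Flags without a value (rw, hard, noac, ...) map to True.
    """
    options = {}
    for item in text.strip().strip("()").split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


# Shortened versions of the two option lists shown above.
default_mount = parse_mount_options("(rw,relatime,vers=4.0,hard,proto=tcp,_netdev)")
noac_mount = parse_mount_options("(rw,relatime,sync,vers=4.0,acdirmax=0,hard,noac,proto=tcp)")

# Options present only in the second mount.
added = sorted(set(noac_mount) - set(default_mount))
print(added)
```

With these inputs the difference consists of acdirmax, noac and sync, which matches what the noac mount adds in the full output.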

Best regards,
David

2012-10-09 19:27:27

by David Werner

Subject: Re: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

On Tue, Oct 09, 2012 at 06:15:11PM +0200, Emmanuel Florac wrote:
> On Tue, 9 Oct 2012 17:09:59 +0200
> David Werner <[email protected]> wrote:
>
> > rus4iws.rus.uni-stuttgart.de:/vol/rus4iws_data0/ on /home
> > type nfs4
> > (rw,relatime,sync,vers=4.0,rsize=65536,wsize=65536,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=129.69.98.101,local_lock=none,addr=129.69.201.103)
> >
> > But it did not resolve anything.
>
> Is the NetApp exported volume compressed by any chance? What does the
> output from "showmount -e <filer>" look like for the related exports?

output of showmount -e:

Export list for rus4iws.rus.uni-stuttgart.de:
/vol/rus4iws_data0 129.69.201.31,129.69.201.39,ikearw,ikea1,ikea2,ikea3,teleloch
/vol/rus4iws_vol0 129.69.201.31,129.69.201.39

The first line contains our netgroups and is the directory we mount.
I think some deduplication is enabled.
Today I ran a test with an ubuntu-12.04.1 client and saw the same problem.

--
David Werner
Universitaet Stuttgart
Institut für Wasser- und Umweltsystemmodellierung
Lehrstuhl fuer Hydromechanik & Hydrosystemmodellierung
Pfaffenwaldring 61 ** 70569 Stuttgart
Tel.: ++49-711-685 67010 ** Fax: ++49-711-685 60430
[email protected]

2012-10-17 14:50:39

by David Werner

Subject: Re: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

Hi,

I have now also tried the mount option 'nordirplus', which the shipped
man page lists only for NFSv3, but which, according to this mailing list, is
also available for NFSv4. At first glance this also seems to resolve the problem,
and it seems to give better performance than the previously mentioned 'lookupcache=none'.

Best regards, David

--
David Werner
Universitaet Stuttgart
Institut für Wasser- und Umweltsystemmodellierung
Lehrstuhl fuer Hydromechanik & Hydrosystemmodellierung
Pfaffenwaldring 61 ** 70569 Stuttgart
Tel.: ++49-711-685 67010 ** Fax: ++49-711-685 60430
[email protected]
http://www.hydrosys.uni-stuttgart.de/

2012-10-09 14:21:32

by Emmanuel Florac

Subject: Re: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

On Tue, 9 Oct 2012 15:59:19 +0200
David Werner <[email protected]> wrote:

> I'm looking for suggestions to resolve this problem.

Did you try mounting the export with the noac option?

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------

2012-10-09 16:15:15

by Emmanuel Florac

Subject: Re: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

On Tue, 9 Oct 2012 17:09:59 +0200
David Werner <[email protected]> wrote:

> rus4iws.rus.uni-stuttgart.de:/vol/rus4iws_data0/ on /home
> type nfs4
> (rw,relatime,sync,vers=4.0,rsize=65536,wsize=65536,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=129.69.98.101,local_lock=none,addr=129.69.201.103)
>
> But it did not resolve anything.

Is the NetApp exported volume compressed by any chance? What does the
output from "showmount -e <filer>" look like for the related exports?

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------

2012-10-10 15:04:51

by David Werner

Subject: Re: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

Hi,

To provide more details, I captured a tcpdump of the
"touch .." on two hosts. The output files, for inspection with Wireshark
and similar tools, are at:

http://maultier.iws.uni-stuttgart.de:8080/nfs

But I do not understand much of the NFS protocol.

Best regards, David

2012-10-10 16:06:09

by Malahal Naineni

Subject: Re: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

David Werner [[email protected]] wrote:
> Hi,
>
> To provide more details, I captured a tcpdump of the
> "touch .." on two hosts. The output files, for inspection with Wireshark
> and similar tools, are at:
>
> http://maultier.iws.uni-stuttgart.de:8080/nfs
>
> But I do not understand much of the NFS protocol.

I took a quick look and didn't see anything wrong in the NFS trace.
Are there any syslog messages on the client? The ESTALE error must be
generated on the client.

Regards, Malahal.


2012-10-11 14:20:38

by David Werner

Subject: Re: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

Some more news about my problem. Thanks to all who read this thread and
gave it some thought.

* I did not find any significant kernel messages regarding stale
handles in syslog or dmesg.

* The question about the "noac" mount parameter led me to check more mount
options. The parameter lookupcache is quite significant. If I set it
to "none", the problem disappears, while with the values "positive" or "all"
the problem persists.

- However, if I mount first with "none" and later with "all",
the problem stays away, but only until the next reboot, when the filesystem
is mounted with "all" from the start.
- I checked this under openSUSE-12.2.
- I made a quick check with openSUSE 11.4 to see whether "none" also
takes away the problem there.
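
For rolling the workaround out to several clients, the matching fstab line can be generated rather than typed by hand. This is only a sketch: the helper name fstab_entry is made up, the "defaults,_netdev" base options are the ones mentioned earlier in the thread, and the trailing "0 0" dump/pass fields are an assumption.

```python
def fstab_entry(device, mountpoint, extra_options=()):
    """Build one nfs4 fstab line; extra_options are appended to defaults,_netdev."""
    options = ["defaults", "_netdev", *extra_options]
    # fstab fields: device, mount point, type, options, dump, pass
    return "\t".join([device, mountpoint, "nfs4", ",".join(options), "0", "0"])


print(fstab_entry(
    "rus4iws.rus.uni-stuttgart.de:/vol/rus4iws_data0/",
    "/home",
    ["lookupcache=none"],
))
```

The printed line carries "defaults,_netdev,lookupcache=none" in the options field; any other option under test can be passed in the same way.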

Best regards, David

2012-10-09 19:44:57

by Emmanuel Florac

Subject: Re: Problem with NFSv4 (server: NetApp-Filer, client openSUSE-11.4, 12.1 and 12.2)

On Tue, 9 Oct 2012 21:27:24 +0200 you wrote:

> The first line contains our netgroups and is the directory we mount.

Yes, nothing special apparently.

> I think some deduplication is enabled.
> Today I ran a test with an ubuntu-12.04.1 client and saw the same problem.
>

So this may be a kernel bug. I'll have a quick look.

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------