2005-09-20 15:17:18

by Reuti

Subject: NFS client got stale NFS handle after server reboot

Hi all,

I set up an NFS server using the kernel NFS in kernel 2.6.11.4-21.9 from
SuSE. I defined three directories to be exported using netgroups, and all
is working fine: they can be mounted as intended by all the nodes in
the cluster.

Up to now I was used to the NFS feature that a "hard" mount on the
nodes survives a reboot of the NFS server. This is no longer the
case. If I reboot the server, all the nodes get a stale NFS handle. I
get the same result by simply stopping and starting nfsd.

As we use a RAID card for the disks in the server, I suspected changed
major/minor device numbers and tried setting an explicit "fsid=" for
each of the three exports in the "exports" file. (Although
starting/stopping nfsd won't change the major/minor numbers, I
think.) No change.

Now the funny part: log in to a node. Choose any of the three imported
directories. Run "umount" and "mount" on this dir, and the other two
dirs also reappear and are accessible again.
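
To see which imported directories are affected before remounting one of them, a client-side check along these lines may help. It is only a sketch: it assumes the stale handle surfaces as errno ESTALE from stat(), and the mount points in the example are hypothetical placeholders, not the ones from this setup.

```python
import errno
import os


def is_stale(path, stat=os.stat):
    """Return True if stat() on path fails with ESTALE (stale NFS file handle)."""
    try:
        stat(path)
    except OSError as exc:
        return exc.errno == errno.ESTALE
    return False


def stale_mounts(paths, stat=os.stat):
    """Return the subset of paths that currently report a stale handle."""
    return [p for p in paths if is_stale(p, stat)]


if __name__ == "__main__":
    # Hypothetical mount points; replace them with the three imported dirs.
    for mount_point in stale_mounts(["/home", "/opt/shared", "/scratch"]):
        print(f"stale NFS handle: {mount_point}")
```

The stat function is passed in as a parameter only so that the check can be tried out without a real stale mount.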

Does anyone have an explanation and/or a solution for this setup, to get
the surviving behavior back?

CU - Reuti

(BTW: all partitions are ext2 on Linux x86 machines, mounted with NFSv3)



-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server.
Download it for free - -and be entered to win a 42" plasma tv or your very
own Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-09-26 08:33:00

by Reuti

Subject: Re: [CIRCUMVENTED] NFS client got stale NFS handle after server reboot

...and then this one.

Hi Vincent,

I had a short look at the source of nfs-utils, but I realized that
most of it is in the kernel. But I saw a line checking for the DNS names and
sometimes skipping it (if all mount points are unexported).

And this is another thing: we don't have DNS in the cluster, but use
only the /etc/hosts file and distribute it to the nodes with NIS. And in
the rmtab file, I saw some entries with the node name, and another one
with the TCP/IP address of the same node. Does NFS rely on DNS in the new
mode?
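
The rmtab observation can be checked mechanically. The sketch below assumes rmtab lines have the form client:path:count and that /etc/hosts is the only name source, as described above; the function names are made up for illustration. It reports every address that shows up in rmtab both as a literal IP and under one of its host names.

```python
import ipaddress
from collections import defaultdict


def parse_hosts(hosts_text):
    """Map each name in /etc/hosts-style text to its IP address."""
    name_to_ip = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            name_to_ip[name] = ip
    return name_to_ip


def is_ip(text):
    """Return True if text parses as an IP address."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def mixed_clients(rmtab_text, name_to_ip):
    """Return IPs that appear in rmtab both literally and via a host name."""
    seen = defaultdict(set)  # ip -> subset of {"ip", "name"}
    for line in rmtab_text.splitlines():
        # Assumed line format: client:path:count
        client = line.split(":", 1)[0].strip()
        if not client:
            continue
        if is_ip(client):
            seen[client].add("ip")
        elif client in name_to_ip:
            seen[name_to_ip[client]].add("name")
    return sorted(ip for ip, kinds in seen.items() if kinds == {"ip", "name"})
```

Feeding it the contents of /etc/hosts and of the rmtab file would list the nodes that are recorded under both forms.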

Cheers - Reuti


Vincent Roqueta wrote:
> On Thursday 22 September 2005 16:52, Reuti wrote:
>
>>Finally I could track down the odd behavior (this time in top posting):
>>
>>If you have this combination:
>>
>>+ Linux kernel 2.6 nfsd on server
>>+ 64 bit i386, i.e. x86_64 server
>>+ using netgroups in /etc/exports
>>+ operate mountd/exportfs in new mode
>>
>>you will face the problem of stale NFS handle after a stop/start of
>>the NFS service. As we want to use 64 bit of course, and enjoy netgroups
>>for easier administration, I got a working setup by *not* mounting
>>/proc/fs/nfsd and forcing this way mountd/exportfs into legacy mode also
>>on a 2.6 kernel.
>>
>>Cheers - Reuti
>
> There are several problems using x86_64 Linux NFS (v4) machines as servers,
> and there seem to be problems on 64-bit architectures in general (at
> least with x86_64 and ppc64, and surely other architectures).
>
> For now these issues are known (Linux 2.6.13+CITI, Linux 2.6.14-rc*+CITI):
> -> Socket error -11 closing a big file (server: x86_64, clients: x86, ppc64)
> -> wrong lock behaviours (client: ppc64, servers: ppc64, x86)
> -> Input/output error closing a big file (server: ppc64, client: x86_64)
> -> Oops (client: x86_64, server: x86_64) - Linux 2.6.13+CITI
>
> However, these issues concern NFSv4. I am going to try to reproduce your
> problem with NFSv3 and NFSv4.
>
> Vincent




2005-09-21 16:49:54

by Reuti

Subject: Re: NFS client got stale NFS handle after server reboot

Hi again,

As a follow-up to my own posting:

I did some more testing: I downloaded the latest stable kernel (2.6.13.2)
and compiled it for server and client. No success.

Then I installed a conventional IDE hard disk: same result. So AACRAID
is not to blame.

Then I tried another server, this time a 32-bit machine. With this one it
worked.

So my conclusion:

With a 64-bit server and a 32-bit client, the mounts will not survive a
stop/start of the NFS server. But if you unmount/mount one of the imported
dirs, all of them reappear, as already stated.

Is this a known issue, and is there any solution to it?

Tomorrow I'll try to get another 64-bit machine as client, to check
whether it behaves the same with the 64-bit server.

TIA - Reuti




2005-09-22 09:24:35

by Reuti

Subject: Re: NFS client got stale NFS handle after server reboot

Okay, here is the result:

A 64-bit server with a 64-bit client has the same issue of a stale NFS
handle after a stop/start of the NFS daemon. To me it looks like a bug, as
it was working in a pure 32-bit setup. Where exactly should I report this
as a bug? - Reuti




2005-09-22 14:52:57

by Reuti

Subject: Re: [CIRCUMVENTED] NFS client got stale NFS handle after server reboot

I could finally track down the odd behavior (top-posting this time):

If you have this combination:

+ Linux kernel 2.6 nfsd on server
+ 64 bit i386, i.e. x86_64 server
+ using netgroups in /etc/exports
+ operate mountd/exportfs in new mode

you will face the problem of a stale NFS handle after a stop/start of
the NFS service. As we want to use 64 bit of course, and enjoy netgroups
for easier administration, I got a working setup by *not* mounting
/proc/fs/nfsd, thereby forcing mountd/exportfs into legacy mode even
on a 2.6 kernel.
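
To confirm which mode a server is actually in, one could check whether the nfsd filesystem is mounted. The following is a rough sketch, assuming (as described above) that mountd/exportfs pick the new mode whenever a filesystem of type "nfsd" is mounted and fall back to legacy mode otherwise; the function names are illustrative only.

```python
def nfsd_fs_mounted(mounts_text):
    """Return True if a filesystem of type 'nfsd' appears in /proc/mounts text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "nfsd":
            return True
    return False


def export_mode(mounts_text):
    """Guess the mountd/exportfs mode from the mount table (assumption, see text)."""
    return "new" if nfsd_fs_mounted(mounts_text) else "legacy"


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as fh:
            print(export_mode(fh.read()))
    except OSError:
        print("cannot read /proc/mounts (not a Linux system?)")
```

Run on the server after the NFS service has started, it should print "legacy" for the workaround described here.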

Cheers - Reuti






2005-09-22 15:18:15

by Vincent Roqueta

Subject: Re: [CIRCUMVENTED] NFS client got stale NFS handle after server reboot

On Thursday 22 September 2005 16:52, Reuti wrote:
> Finally I could track down the odd behavior (this time in top posting):
>
> If you have this combination:
>
> + Linux kernel 2.6 nfsd on server
> + 64 bit i386, i.e. x86_64 server
> + using netgroups in /etc/exports
> + operate mountd/exportfs in new mode
>
> you will face the problem of stale NFS handle after a stop/start of
> the NFS service. As we want to use 64 bit of course, and enjoy netgroups
> for easier administration, I got a working setup by *not* mounting
> /proc/fs/nfsd and forcing this way mountd/exportfs into legacy mode also
> on a 2.6 kernel.
>
> Cheers - Reuti


There are several problems using x86_64 Linux NFS (v4) machines as servers,
and there seem to be problems on 64-bit architectures in general (at
least with x86_64 and ppc64, and surely other architectures).

For now these issues are known (Linux 2.6.13+CITI, Linux 2.6.14-rc*+CITI):
-> Socket error -11 closing a big file (server: x86_64, clients: x86, ppc64)
-> wrong lock behaviours (client: ppc64, servers: ppc64, x86)
-> Input/output error closing a big file (server: ppc64, client: x86_64)
-> Oops (client: x86_64, server: x86_64) - Linux 2.6.13+CITI

However, these issues concern NFSv4. I am going to try to reproduce your
problem with NFSv3 and NFSv4.


Vincent



2005-09-22 16:28:27

by Chris Penney

Subject: Re: [CIRCUMVENTED] NFS client got stale NFS handle after server reboot

On 9/22/05, Vincent Roqueta <[email protected]> wrote:
>
> There are several problems using x86_64 Linux NFS (v4) machines as servers,
> and there seem to be problems on 64-bit architectures in general (at
> least with x86_64 and ppc64, and surely other architectures).
>
> For now these issues are known (Linux 2.6.13+CITI, Linux 2.6.14-rc*+CITI):
> -> Socket error -11 closing a big file (server: x86_64, clients: x86, ppc64)
> -> wrong lock behaviours (client: ppc64, servers: ppc64, x86)
> -> Input/output error closing a big file (server: ppc64, client: x86_64)
> -> Oops (client: x86_64, server: x86_64) - Linux 2.6.13+CITI
>
> However, these issues concern NFSv4. I am going to try to reproduce your
> problem with NFSv3 and NFSv4.
>
I've not heard anything negative about x86_64 systems as NFS servers, but
this note is somewhat concerning to me as I've got some new AMD Opteron
hardware coming to replace some Pentium 4 NFS servers. We only use NFS v3
(mainly tcp). Should I be concerned? Should I use i386 mode instead of
x86_64?
Chris


