2004-03-09 19:42:29

by Wade Hampton

Subject: df hangs on down nfs server mounted with hard,intr, can't kill

[I posted this to the Fedora list yesterday.]

I have a Fedora server with kernel 2.4.22-1-2163 SMP mounting a
remote Solaris server (hence the choice of options):

rsize=32768,ro,hard,intr,tcp,nfsvers=3

When the remote is down or disconnected, a "df" hangs (as expected),
but I can't kill it, even as root or with kill -9. The docs for mount
indicate that the intr option should allow for killing apps blocked on
a filesystem mounted with hard. Is this a bug (glibc, 2.4 kernel, NFS,
or Fedora's kernel)?

I also coded a test program that calls statvfs(3) and it hangs on
the statvfs(3) call when run against a down NFS server. It too
can't be interrupted or killed.

My questions are:

1) Is there a safe and reliable means to check for a down NFS server
(e.g., is showmount -e <server> safe enough?)

2) Is the non-interruptible operation (even with the intr option)
a bug or a feature?

3) Is there a simple kernel call, /proc entry, or similar that can
be used to reliably check for free/used disk space and for a down
host, without hanging my application?

A showmount -e followed by a statvfs() might work, but
there is the possibility of losing the host between the two
calls, resulting in an application hang.

4) Is there a perl module to accomplish this?

This would be very useful for network monitoring, e.g., when the
server goes down and stays down for >1 minute, generate an SNMP
trap and write to a log file. It would help when you can't put an SNMP
agent on the server, but only on the client. It is also useful for writing
a highly reliable client application.
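
The ">1 minute" rule can be sketched independently of how the server is probed. What follows is a minimal C sketch, not part of the original post: the names down_tracker and down_tracker_update are hypothetical, and the caller is assumed to supply the probe result and the current time, then send the SNMP trap and write the log entry itself whenever the function returns true.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical tracker: remembers when the current outage started. */
struct down_tracker {
    bool   down;        /* true while the server is considered down */
    bool   alerted;     /* true once an alert was raised for this outage */
    time_t down_since;  /* time of the first failed probe of this outage */
};

/*
 * Feed one probe result. Returns true exactly once per outage, at the
 * first probe where the server has been continuously down for more
 * than `threshold` seconds. A successful probe resets the tracker.
 */
bool down_tracker_update(struct down_tracker *t, bool server_up,
                         time_t now, time_t threshold)
{
    if (server_up) {
        t->down = false;
        t->alerted = false;
        return false;
    }
    if (!t->down) {
        t->down = true;
        t->down_since = now;
    }
    if (!t->alerted && now - t->down_since > threshold) {
        t->alerted = true;
        return true;
    }
    return false;
}
```

Keeping the threshold logic separate from the probe means the probe can be swapped (showmount, an RPC ping, a bounded statvfs) without touching the alerting rule.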

As I have no control over the remote system, when it went down,
I had to do a hard reboot of my Linux box to stop the hung apps. This
is a Windows solution, not a Linux solution....

Note, I found this when writing some scripts for MRTG to check
the disk utilization of partitions. My df's hung so I didn't even get
the proper values for my local partitions. After a few days, I had
LOTS of hung MRTG apps and had to reboot (this test server is
down for a week or two).

Thanks
--
Wade Hampton


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-03-09 20:00:36

by Lever, Charles

Subject: RE: df hangs on down nfs server mounted with hard,intr, can't kill

hi wade-

the fact that "intr" doesn't work as expected is a bug, and
folks are attempting to address this at least partially in 2.6.

if you want a way to do a "df" without hanging your client,
try using a soft mount with a short-ish timeout for your
df requests.

caveat: read the Linux NFS FAQ for more on using "soft" safely.

> -----Original Message-----
> From: Wade Hampton [mailto:[email protected]]
> Sent: Tuesday, March 09, 2004 2:33 PM
> To: [email protected]
> Subject: [NFS] df hangs on down nfs server mounted with
> hard,intr, can't kill
>
>
> [I posted this to the Fedora list yesterday.]
>
> I have a Fedora server with kernel 2.4.22-1-2163 SMP mounting
> a remote Solaris server (hence the choice of options):
>
> rsize=32768,ro,hard,intr,tcp,nfsvers=3
>
> When the remote is down or disconnected, a "df" hangs (as
> expected), but I can't kill it, even as root or with kill -9.
> The docs for mount indicate that the intr option should allow
> for killing apps blocked on a filesystem mounted with hard.
> Is this a bug (glibc, 2.4 kernel, NFS, or Fedora's kernel)?
>
> I also coded a test program that calls statvfs(3) and it
> hangs on the statvfs(3) call when run against a down NFS
> server. It too can't be interrupted or killed.
>
> My questions are:
>
> 1) Is there a safe and reliable means to check for a down NFS server
> (e.g., is showmount -e <server> safe enough?)
>
> 2) Is the non-interruptible operation (even with the intr option)
> a bug or a feature?
>
> 3) Is there a simple kernel call, /proc entry, or similar that can
> be used to reliably check for free/used disk space and for a down
> host, without hanging my application?
>
> A showmount -e followed by a statvfs() might work, but
> there is the possibility of losing the host between the two
> calls, resulting in an application hang.
>
> 4) Is there a perl module to accomplish this?
>
> This would be very useful for network monitoring, e.g., when
> the server goes down and stays down for >1 minute, generate
> an SNMP trap and write to a log file. It would help when
> you can't put an SNMP agent on the server, but only on the
> client. It is also useful for writing a highly reliable
> client application.
>
> As I have no control over the remote system, when it went
> down, I had to do a hard reboot of my Linux box to stop the
> hung apps. This is a Windows solution, not a Linux solution....
>
> Note, I found this when writing some scripts for MRTG to
> check the disk utilization of partitions. My df's hung so I
> didn't even get the proper values for my local partitions.
> After a few days, I had LOTS of hung MRTG apps and had to
> reboot (this test server is down for a week or two).
>
> Thanks
> --
> Wade Hampton
>

2004-03-09 20:18:29

by Wade Hampton

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

Lever, Charles wrote:

>hi wade-
>
>the fact that "intr" doesn't work as expected is a bug, and
>folks are attempting to address this at least partially in 2.6.
>
>
Thanks. While they are addressing it, it would be nice to have
a means of calling statvfs with a timeout, or to have statvfs always
interruptible (even if mounted hard without intr).
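
Until the kernel side is fixed, one user-space approximation of "statvfs with a timeout" is to let a helper process make the blocking call and wait for its answer with a deadline. The sketch below is an illustration under stated assumptions, not something posted in this thread: statvfs_timeout is a hypothetical name, and the helper process stays blocked until the server answers. Only the caller is protected from hanging.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/statvfs.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run statvfs() in a detached worker process and
 * wait at most timeout_ms for its answer. Returns 0 on success, -1 on
 * failure with errno set (ETIMEDOUT if the worker did not answer in time).
 */
int statvfs_timeout(const char *path, struct statvfs *out, int timeout_ms)
{
    struct reply { int err; struct statvfs sv; } r;
    int fds[2];
    int err;

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        err = errno;
        close(fds[0]);
        close(fds[1]);
        errno = err;
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        /* Fork again so the worker is adopted by init; the caller then
         * never has to reap a process that is stuck inside statvfs(). */
        pid_t worker = fork();
        if (worker != 0)
            _exit(worker < 0 ? 1 : 0);

        memset(&r, 0, sizeof r);
        if (statvfs(path, &r.sv) != 0)   /* may block on a dead server */
            r.err = errno;
        if (write(fds[1], &r, sizeof r) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    waitpid(pid, NULL, 0);   /* the intermediate child exits at once */

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready > 0) {
        ssize_t n = read(fds[0], &r, sizeof r);
        err = (n == (ssize_t)sizeof r) ? r.err : EIO;
        if (err == 0)
            *out = r.sv;
    } else if (ready == 0) {
        err = ETIMEDOUT;
    } else {
        err = errno;
    }
    close(fds[0]);

    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

A monitoring script would still need to avoid starting a new worker on every polling interval while an earlier one is stuck; otherwise blocked workers pile up much like the hung df processes described at the start of the thread.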

One kernel goal should be to allow someone to lose a mount or USB device
and still be able to kill any running jobs/modules without having to
reboot the computer (NFS yesterday on one machine, USB today on another).

>if you want a way to do a "df" without hanging your client,
>try using a soft mount with a short-ish timeout for your
>df requests.
>
>
I had to go with the hard mount command due to the type of NFS
server I have to access (legacy system).

>caveat: read the Linux NFS FAQ for more on using "soft" safely.
>
>
Thanks. I will.

Do you think that doing a showmount before doing a df would be
a safe approach for now?

    showmount -e <server>
    if it fails
        return error code
    else
        df <mount point>

Thanks,
--
Wade Hampton



2004-03-10 02:45:38

by Ian Kent

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

On Tue, 9 Mar 2004, Wade Hampton wrote:

> [I posted this to the Fedora list yesterday.]
>
> I have a Fedora server with kernel 2.4.22-1-2163 SMP mounting a
> remote Solaris server (hence the choice of options):
>
> rsize=32768,ro,hard,intr,tcp,nfsvers=3
>
> When the remote is down or disconnected, a "df" hangs (as expected),
> but I can't kill it, even as root or with kill -9. The docs for mount
> indicate that the intr option should allow for killing apps blocked on
> a filesystem mounted with hard. Is this a bug (glibc, 2.4 kernel, NFS,
> or Fedora's kernel)?
>

snip ...

>
> 4) Is there a perl module to accomplish this?

Perhaps you could convert this bit of code to Perl. There would surely
be access to the RPC libraries from Perl.

If the server is down, the first call to the routine takes several
seconds longer than the requested timeout (anyone care to comment as to
why?), but at least it returns.

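#include <rpc/rpc.h>
#include <nfs/nfs.h>
#include <linux/nfs2.h>
#include <rpc/xdr.h>

/*
 * Send an NFS NULL procedure call to host over UDP.
 * Returns 1 if the server answers within the timeout, 0 otherwise.
 */
int rpc_ping(const char *host, long seconds, long micros)
{
    CLIENT *client;
    struct timeval tout;
    enum clnt_stat stat;

    /* Note: clnt_create() may contact the server's portmapper itself,
     * before the timeouts set below take effect. */
    client = clnt_create(host, NFS_PROGRAM, NFS2_VERSION, "udp");
    if (client == NULL) {
        return 0;
    }

    tout.tv_sec = seconds;
    tout.tv_usec = micros;

    /* Use the same value for the total and the per-retry timeout. */
    clnt_control(client, CLSET_TIMEOUT, (char *)&tout);
    clnt_control(client, CLSET_RETRY_TIMEOUT, (char *)&tout);

    /* The NULL procedure takes no arguments and returns no results. */
    stat = clnt_call(client, NFSPROC_NULL,
                     (xdrproc_t)xdr_void, NULL,
                     (xdrproc_t)xdr_void, NULL, tout);

    clnt_destroy(client);

    if (stat != RPC_SUCCESS) {
        return 0;
    }

    return 1;
}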
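
Because the mount in this thread uses tcp, a different bounded check is a plain TCP connect with a deadline. This is not the RPC ping above and not a replacement for it. It is a minimal sketch, assuming the server accepts NFS over TCP on the conventional port 2049; tcp_probe is a hypothetical name. Name resolution in getaddrinfo() can itself block, so passing a numeric address is the safer choice, and the timeout applies to each resolved address in turn.

```c
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if a TCP connection to host:port can be
 * established within timeout_ms (per resolved address), 0 otherwise.
 */
int tcp_probe(const char *host, const char *port, int timeout_ms)
{
    struct addrinfo hints, *res, *ai;
    int ok = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return 0;

    for (ai = res; ai != NULL && !ok; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Non-blocking mode makes connect() return immediately. */
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fd);
            continue;
        }

        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            ok = 1;
        } else if (errno == EINPROGRESS) {
            /* Wait for the handshake to finish or the deadline to pass. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, timeout_ms) > 0) {
                int soerr = 0;
                socklen_t len = sizeof soerr;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) == 0
                    && soerr == 0)
                    ok = 1;
            }
        }
        close(fd);
    }

    freeaddrinfo(res);
    return ok;
}
```

A call such as tcp_probe("192.0.2.10", "2049", 2000) (the address is a placeholder) only shows that something accepts connections on that port; it does not prove the NFS service answers requests, so it narrows the hang window described earlier in the thread without closing it.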

Ian




2004-03-10 08:18:54

by Olaf Kirch

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

On Tue, Mar 09, 2004 at 11:51:10AM -0800, Lever, Charles wrote:
> the fact that "intr" doesn't work as expected is a bug, and
> folks are attempting to address this at least partially in 2.6.

FWIW, attached you'll find the patch I'm currently using.
I thought I had posted it to this list already.

BTW what is the "right" way to get stuff like this into the mainline
kernel? Trond, are you collecting stuff and pushing it to Andrew, or
should I be sending patches to him directly?

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+


Attachments:
(No filename) (616.00 B)
nfs-interruptible (1.54 kB)

2004-03-10 19:28:50

by Trond Myklebust

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

On Wed, 10/03/2004 at 03:09, Olaf Kirch wrote:
> On Tue, Mar 09, 2004 at 11:51:10AM -0800, Lever, Charles wrote:
> > the fact that "intr" doesn't work as expected is a bug, and
> > folks are attempting to address this at least partially in 2.6.
>
> FWIW, attached you'll find the patch I'm currently using.
> I thought I had posted it to this list already.
>
> BTW what is the "right" way to get stuff like this into the mainline
> kernel? Trond, are you collecting stuff and pushing it to Andrew, or
> should I be sending patches to him directly?

I've got your patch queued in the latest NFS4_ALL, and I believe I
already pushed it to Andrew (so it should already be in the latest
2.6.3-mm). I'm aiming to push it into 2.6.5-preX as soon as Linus
releases the final version of 2.6.4...

You are of course free to send stuff directly to Andrew/Linus, but I
personally would prefer if you attempt to queue it through me, since
that allows for a more ordered development process. Fixing up a set of
30 patches in order to accommodate some minor change that went directly
into the main kernel can be very time consuming, so it makes it easier
for me if we just append stuff to the NFS4_ALL patches (unless, of
course, we are talking *critical* fixes).

FYI: you can for the moment assume that *all* patches that are in the
http://www.fys.uio.no/~trondmy/src/Linux-2.6.x subdirectories are being
queued for inclusion in Linus' kernel. Only if a patch shows serious
problems under testing will it be dropped.

Cheers,
Trond



2004-03-11 09:42:07

by Olaf Kirch

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

On Wed, Mar 10, 2004 at 02:18:46PM -0500, Trond Myklebust wrote:
> You are of course free to send stuff directly to Andrew/Linus, but I
> personally would prefer if you attempt to queue it through me, since
> that allows for a more ordered development process. Fixing up a set of
> 30 patches in order to accommodate some minor change that went directly
> into the main kernel can be very time consuming,

Very much agreed...

This arrangement is fine with me; I just want to avoid a situation
where my patches go to the bit bucket because you think I will
push them to Andrew and I think you will do so :)

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+



2004-03-11 19:54:43

by Trond Myklebust

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

On Thu, 11/03/2004 at 04:31, Olaf Kirch wrote:

> This arrangement is fine with me; I just want to avoid a situation
> where my patches go to the bit bucket because you think I will
> push them to Andrew and I think you will do so :)

No. I've been assuming you would allow me to continue to play at being
the NFS client maintainer. ;-)

Cheers,
Trond



2004-03-12 09:07:25

by Olaf Kirch

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

On Thu, Mar 11, 2004 at 02:44:05PM -0500, Trond Myklebust wrote:
> On Thu, 11/03/2004 at 04:31, Olaf Kirch wrote:
>
> > This arrangement is fine with me; I just want to avoid a situation
> > where my patches go to the bit bucket because you think I will
> > push them to Andrew and I think you will do so :)
>
> No. I've been assuming you would allow me to continue to play at being
> the NFS client maintainer. ;-)

Don't you worry, I'm not interested in wearing your hat. Been there,
done that, failed :)

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+



2004-03-15 05:21:14

by Yusuf Goolamabbas

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

> I've got your patch queued in the latest NFS4_ALL, and I believe I
> already pushed it to Andrew (so it should already be in the latest
> 2.6.3-mm). I'm aiming to push it into 2.6.5-preX as soon as Linus
> releases the final version of 2.6.4...
>
> FYI: you can for the moment assume that *all* patches that are in the
> http://www.fys.uio.no/~trondmy/src/Linux-2.6.x subdirectories are being
> queued for inclusion in Linus' kernel. Only if a patch shows serious
> problems under testing will it be dropped.
>

Trond, looking at the latest bk/web, it seems some of your NFS4_ALL
patches got into mainline, whilst 2.6.4-mm2 has some patches which akpm
calls

- Some significant NFS client enhancements: reads smaller than PAGE_SIZE
are no longer synchronous, support for smaller-than-PAGE_SIZE reads,
etc.

http://marc.theaimsgroup.com/?l=linux-kernel&m=107931425909684&w=2

Are these going to mature in -mm for a while, or are they going into
2.6.5 as well?

Also, what's the status of the nfszerostats patch from Steve Dickson
of Red Hat?

http://people.redhat.com/steved/NFS/nfszerostats/

Regards, Yusuf



2004-03-15 17:03:33

by Trond Myklebust

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

On Mon, 15/03/2004 at 00:21, Yusuf Goolamabbas wrote:

> Trond, looking at the latest bk/web, it seems some of your NFS4_ALL
> patches got into mainline, whilst 2.6.4-mm2 has some patches which akpm
> calls
>
> - Some significant NFS client enhancements: reads smaller than PAGE_SIZE
> are no longer synchronous, support for smaller-than-PAGE_SIZE reads,
> etc.
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107931425909684&w=2
>
> Are these going to mature in -mm for a while, or are they going into
> 2.6.5 as well?

That's up to Andrew to decide: I've pushed them to him, and as
maintainer of the 2.6 tree, he gets to decide when they go in...

> Also, what's the status of the nfszerostats patch from Steve Dickson
> of Red Hat?
>
> http://people.redhat.com/steved/NFS/nfszerostats/

Chuck is currently working on some enhanced NFS statistics patches. I
imagine he will take steps to include Steve's patches in his work, and
then push them to me.

Cheers,
Trond



2004-03-15 18:17:09

by Steve Dickson

Subject: Re: df hangs on down nfs server mounted with hard,intr, can't kill

Trond Myklebust wrote:

>On Mon, 15/03/2004 at 00:21, Yusuf Goolamabbas wrote:
>
>>Also, what's the status of the nfszerostats patch from Steve Dickson
>>of Red Hat?
>>
>>http://people.redhat.com/steved/NFS/nfszerostats/
>>
>
>Chuck is currently working on some enhanced NFS statistics patches. I
>imagine he will take steps to include Steve's patches in his work, and
>then push them to me.
>
FYI... The nfs-utils code to support nfszerostats is already
in the FC2 rawhides... I'm hopeful Neil will include this
patch in the next nfs-utils release....

SteveD.



