2005-05-18 14:29:18

by Henrik Schmiediche

Subject: Extremely high load on NFS clients when the NFS server is down...


Hello,
I have two Redhat AS3 servers. One of them (among other things) serves NFS
file systems to my other systems including to the other server. When my NFS
file server goes down or is restarted the load on the other AS3 server
increases to the point it is completely useless (it goes to 50+ in a minute
or two). I never observed this behavior when I was serving NFS file systems
using a Solaris system.

Has anyone observed this phenomenon? Any solution to it?

Sincerely,

- Henrik




-------------------------------------------------------
This SF.Net email is sponsored by Oracle Space Sweepstakes
Want to be the first software developer in space?
Enter now for the Oracle Space Sweepstakes!
http://ads.osdn.com/?ad_id=7412&alloc_id=16344&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-05-18 15:49:47

by Vincent Roqueta

Subject: Re: Extremely high load on NFS clients when the NFS server is down...

On Wednesday 18 May 2005 16:28, Henrik Schmiediche wrote:
> Hello,
> I have two Redhat AS3 servers. One of them (among other things) serves NFS
> file systems to my other systems including to the other server. When my NFS
> file server goes down or is restarted the load on the other AS3 server
> increases to the point it is completely useless (it goes to 50+ in a minute
> or two). I never observed this behavior when I was serving NFS file systems
> using a Solaris system.
>
> Has anyone observed this phenomenon? Any solution to it?
Which process is using so much CPU? rpc.mountd?


Regards,
Vincent



2005-05-18 18:01:23

by Steve Dickson

Subject: Re: Extremely high load on NFS clients when the NFS server is down...



Henrik Schmiediche wrote:
> I have two Redhat AS3 servers. One of them (among other things) serves NFS
> file systems to my other systems including to the other server. When my NFS
> file server goes down or is restarted the load on the other AS3 server
> increases to the point it is completely useless (it goes to 50+ in a minute
> or two). I never observed this behavior when I was serving NFS file systems
> using a Solaris system.
>
> Has anyone observed this phenomenon? Any solution to it?
No... Please define "completely useless". Is there a ton of
network traffic? If so, please produce a bzip2-compressed binary tethereal
trace of the traffic. If the CPU is pinned or the system hangs, please
produce a system trace (i.e. echo t > /proc/sysrq-trigger).
Also, what is the exact kernel version (i.e. uname -r)?

steved.



2005-05-18 18:52:21

by Dan Stromberg

Subject: Re: Extremely high load on NFS clients when the NFS server is down...

On Wed, 2005-05-18 at 09:28 -0500, Henrik Schmiediche wrote:
> Hello,
> I have two Redhat AS3 servers. One of them (among other things) serves NFS
> file systems to my other systems including to the other server. When my NFS
> file server goes down or is restarted the load on the other AS3 server
> increases to the point it is completely useless (it goes to 50+ in a minute
> or two). I never observed this behavior when I was serving NFS file systems
> using a Solaris system.
>
> Has anyone observed this phenomenon? Any solution to it?
>
> Sincerely,
>
> - Henrik

I've seen this a number of times. In fact, I just sorted out such a
situation again over the weekend by running John the Ripper to get a
list of accounts with bad passwords, and digging up one that had a local
shell, rather than a shell on NFS.

Far from a fix, but this program can help a lot when the load on a
system gets so high, and the NFS timeouts are so bad, that you cannot
ssh in or get any other form of interactive shell:

http://dcs.nac.uci.edu/~strombrg/fallback-reboot/

In fact, it can often even allow you to reboot a system when the system's
hard disks have gone temporarily useless.

(Yes, it's a shameless plug of my own program :)



2005-05-18 19:12:33

by Henrik Schmiediche

Subject: RE: Extremely high load on NFS clients when the NFS server is down...

Steve,
Thanks for your response. The command 'echo t > /proc/sysrq-trigger' does
nothing that I can tell. Is there something I need to do to enable this
feature?

Completely useless means that the load continues to increase on the server
that has the NFS file systems mounted, to the point that it becomes sluggish
and ultimately stops responding altogether. At first, as the load increases,
the system is still responsive, so I can monitor it. 'top' suggests there
is no process eating up a significant amount of CPU as the load is
increasing. I do not know if the load ever plateaus; the system stops
responding before that point.

The kernel is 2.6.11.4-20a-smp --- the latest one as far as I know.

There is a lot of network traffic on the server. I will get an ethereal dump
when I can.

Sincerely,

- Henrik







2005-05-18 19:15:11

by Henrik Schmiediche

Subject: RE: Extremely high load on NFS clients when the NFS server is down...


Thanks for the info. I will check out your program.

One more thing: once the NFS server comes back up, the load on the NFS
client system returns to normal levels and it becomes responsive again.

Sincerely,

- Henrik


2005-05-18 19:20:38

by J. Bruce Fields

Subject: Re: Extremely high load on NFS clients when the NFS server is down...

On Wed, May 18, 2005 at 02:11:58PM -0500, Henrik Schmiediche wrote:
> Steve,
> Thanks for your response. The command 'echo t > /proc/sysrq-trigger' does
> nothing that I can tell.

Check your logs--it should have dumped a bunch of task information into
/var/log/messages.

--b.



2005-05-18 19:31:28

by Henrik Schmiediche

Subject: RE: Extremely high load on NFS clients when the NFS server is down...


Nope. Nothing there or in any of the log files in /var/log/.

- Henrik



2005-05-18 19:32:18

by Lever, Charles

Subject: RE: Extremely high load on NFS clients when the NFS server is down...

> On Wed, May 18, 2005 at 02:11:58PM -0500, Henrik Schmiediche wrote:
> > Steve,
> > Thanks for your response. The command 'echo t > /proc/sysrq-trigger' does
> > nothing that I can tell.
>
> Check your logs--it should have dumped a bunch of task
> information into /var/log/messages.

only if you have enabled sysrq first:

sudo sysctl -w kernel.sysrq=1



2005-05-18 19:36:43

by Joshua Baker-LePain

Subject: RE: Extremely high load on NFS clients when the NFS server is down...

On Wed, 18 May 2005 at 2:11pm, Henrik Schmiediche wrote

> Thanks for your response. The command 'echo t > /proc/sysrq-trigger' does
> nothing that I can tell. Is there something I need to do to enable this
> feature?

echo 1 > /proc/sys/kernel/sysrq

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University



2005-05-18 19:51:52

by Henrik Schmiediche

Subject: RE: Extremely high load on NFS clients when the NFS server is down...


That did it. Thanks.

- Henrik



2005-05-18 20:14:54

by Eric S. Johnson

Subject: Re: Extremely high load on NFS clients when the NFS server is down...


Henrik,

I've seen this behavior, somewhat... When an NFS server goes down, a
large number of processes go into uninterruptible sleep, waiting on
unanswerable NFS RPC calls. This drives the load average up, BUT does
not really have any effect on response time *for processes that don't
try to access NFS-mounted files*. The load is not real; it's just a
count of processes in uninterruptible sleep. You can see all these
processes with ps: the STAT column says D.

kill -9 to the rpciod process will free up some of the processes
that are stuck in disk wait, until they re-issue the NFS call...

I have not gone much more into the why of this... But sometimes
a root shell (with a local $HOME and no NFS-mounted directories in
the path) and a bunch of kill -9's to the rpciod process will help you
regain control of things enough to do further diagnosis and recovery
steps.

E
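
[Editor's sketch] The check described above (looking for STAT "D" in ps
output) could be scripted roughly as follows. This is only an illustration:
it assumes a procps-style ps that accepts "-eo stat,pid,comm", and the
function names d_state_procs and d_state_count are made up here, not part
of any package. Both functions read ps-style output on stdin, so they can
be tried on saved output as well as on a live client.

```shell
#!/bin/sh
# Print "PID COMMAND" for every process whose STAT field starts with D
# (uninterruptible sleep). Expects "ps -eo stat,pid,comm" style input;
# the header line is ignored because "STAT" does not start with D.
d_state_procs() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Count those processes. On Linux, tasks in uninterruptible sleep are
# included in the load average, so this number helps explain a high load
# with little CPU use.
d_state_count() {
    d_state_procs | wc -l | tr -d ' '
}

# Live usage (hypothetical invocation):
#   ps -eo stat,pid,comm | d_state_procs
#   ps -eo stat,pid,comm | d_state_count
```

If the count tracks the load average while 'top' shows the CPUs mostly
idle, that would be consistent with the explanation above.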




2005-05-18 20:30:05

by Henrik Schmiediche

Subject: RE: Extremely high load on NFS clients when the NFS server is down...


Thanks for this. I will experiment with your suggestion. The load may be
artificial, but my system definitely becomes sluggish (after a while) to
the point of non-responsiveness.

Sincerely,

- Henrik



2005-05-18 20:50:34

by Dan Stromberg

Subject: RE: Extremely high load on NFS clients when the NFS server is down...


Agreed. It feels like "false load" initially, but after a while, things
go kerflooey.

On Wed, 2005-05-18 at 15:29 -0500, Henrik Schmiediche wrote:
> Thanks for this. I will experiment with your suggestion. The load may be
> artificial, but my system definitely becomes sluggish (after a while) to
> the point of non-responsiveness.

2005-05-18 21:39:42

by Steve Dickson

Subject: Re: Extremely high load on NFS clients when the NFS server is down...

Lever, Charles wrote:
> only if you have enabled sysrq first:
>
> sudo sysctl -w kernel.sysrq=1
You can also set 'kernel.sysrq = 1' in /etc/sysctl.conf
so it's always set.

steved.
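
[Editor's sketch] Putting the sysrq suggestions from this thread together,
the steps might look like the script below. It is a sketch under stated
assumptions, not a tested procedure for any particular kernel: the function
names sysrq_fully_enabled and dump_tasks and the SYSRQ_FILE variable are
hypothetical, and the script assumes that a value of 1 in
/proc/sys/kernel/sysrq enables all SysRq functions. dump_tasks has to be
run as root on the affected client.

```shell
#!/bin/sh
# The path is a variable so the check can be tried against a scratch file.
SYSRQ_FILE=${SYSRQ_FILE:-/proc/sys/kernel/sysrq}

# Succeeds only when the given file holds exactly "1".
sysrq_fully_enabled() {
    [ "$(cat "$1" 2>/dev/null)" = "1" ]
}

# Enable SysRq if it is not already on, request a task dump, then show
# the tail of the kernel log, where the dump should appear.
dump_tasks() {
    sysrq_fully_enabled "$SYSRQ_FILE" || echo 1 > "$SYSRQ_FILE" || return 1
    echo t > /proc/sysrq-trigger || return 1
    dmesg | tail -n 50
}

# Usage (as root, hypothetical invocation):
#   dump_tasks
```

The dump can be long, so the 50-line tail is only a quick check that
something was written; the full output would be in the kernel log.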


