2003-05-05 04:39:49

by Zeev Fisher

Subject: processes stuck in D state

Hi!

I have a recurring problem with unkillable processes stuck in D state
(uninterruptible sleep) on my Linux servers.
It happens at random, each time on a different server and in a different
process (all the servers are configured identically, with the 2.4.18-10
kernel). Here's an example:

[root@lnx35 /]# ps -el|grep D
  F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
000 D   911 29327     1  0  75   0    -  9382 lock_p ?        00:00:00 calibre
000 D   894 30049 15854  0  75   0    -  8995 lock_p ?        00:00:01 calibrewb
000 D   894 30092  8661  0  75   0    -  8995 lock_p ?        00:00:01 calibrewb
000 D   894 29773 26052  0  75   0    -  8977 lock_p ?        00:00:01 calibrewb


They were probably stuck while trying to get a lock (which was
certainly free) on an NFS volume mounted from a NetApp server.
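
A listing like the ps output above can also be gathered by reading
/proc/<pid>/stat directly, which is handy for a periodic check. The
following is only a minimal sketch: the function names parse_stat and
d_state_processes are made up for illustration, and the field layout
(pid, command name in parentheses, one-letter state) is the one described
in proc(5).

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' instead of on whitespace.
    """
    left, _, rest = line.rpartition(")")
    pid_text, _, comm = left.partition("(")
    state = rest.split()[0]
    return int(pid_text), comm, state


def d_state_processes(proc_root="/proc"):
    """Return [(pid, comm)] for every process currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # no procfs here (not Linux, or not mounted)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        path = os.path.join(proc_root, entry, "stat")
        try:
            with open(path, errors="replace") as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

A process that shows up in this list on several consecutive runs, hours
apart, is a candidate for the kind of permanent hang described here; a
single sighting is normal for any process doing disk or network I/O.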

Enabling debug mode on RPC ( echo '65535' >/proc/sys/sunrpc/rpc_debug )
didn't give me any clue.
Tracing the PID of a stuck process doesn't produce any output.

Those processes have been there for a few days already and will stay
there until the next reboot.

The load average is now 4 (although the machine is 100% idle) and the
system seems to work fine.
If other programs are started again, they run and use the same mounts
that the processes above are stuck on.
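
The load figure is consistent with the four stuck processes: on Linux the
load average counts tasks in uninterruptible sleep (D) as well as runnable
ones (R), so four permanently blocked processes would be expected to hold
it near 4 on an otherwise idle machine. A small sketch of that accounting
follows; the function name load_contributors and the sample state list are
illustrative assumptions, not output from the affected servers.

```python
def load_contributors(states):
    """Count tasks that Linux includes in the load average.

    `states` is an iterable of one-letter process states, as shown in
    the S column of `ps -el`. Runnable (R) and uninterruptible (D)
    tasks both count; sleeping (S), stopped (T) and zombie (Z) tasks
    do not.
    """
    return sum(1 for state in states if state in ("R", "D"))


# Hypothetical snapshot: the four stuck processes plus idle sleepers.
snapshot = ["D", "D", "D", "D", "S", "S", "S"]
print(load_contributors(snapshot))  # 4
```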

Another detail: I think these problems started when I added the 'intr'
option to my NFS-mounted filesystems, but I'm not sure. Also, I can't
easily check that, since the problem is not reproducible.

Has anyone noticed the same behavior ? Is this a well known problem ?


Thanks for your help.

--
Zeev Fisher - Unix System Administrator
Marvell Semiconductor Israel Ltd
Moshav Manof, D.N. Misgav 20184, ISRAEL
Email - [email protected]
Tel - + 972 4 9091402
Cell - + 972 54 995402
Fax - + 972 4 9091501
WWW Page: http://www.marvell.com


2003-05-05 14:56:25

by Michael Buesch

Subject: Re: processes stuck in D state

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 05 May 2003 07:52, Zeev Fisher wrote:
> Hi!

Hi Zeev!

> I have a recurring problem with unkillable processes stuck in D state
> (uninterruptible sleep) on my Linux servers.
> It happens at random, each time on a different server and in a different
> process (all the servers are configured identically, with the 2.4.18-10
> kernel). Here's an example:
[snip]
> Has anyone noticed the same behavior ? Is this a well known problem ?

I've had the same problem twice (or maybe more often, I don't remember)
with some 2.4.21-preX kernel on one of my machines.
IMHO it has something to do with NFS (I'm using this box as an NFS client).
I wish I could reproduce it one more time to do some traces etc. on it,
but I have not found a way to reproduce it yet.

- --
Regards Michael Büsch
http://www.8ung.at/tuxsoft
16:50:44 up 52 min, 1 user, load average: 1.00, 1.00, 0.94
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+tnugoxoigfggmSgRAt8BAJ0deufnL/E6acpz4pIPZll8f48TIgCfWmcI
auSRmi6oyrTbqMVe+MrfuV4=
=ahIZ
-----END PGP SIGNATURE-----

2003-05-05 15:13:56

by Mike Waychison

Subject: Re: processes stuck in D state



On Mon, 5 May 2003, Michael Buesch wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Monday 05 May 2003 07:52, Zeev Fisher wrote:
> > Hi!
>
> Hi Zeev!
>
> > I have a recurring problem with unkillable processes stuck in D state
> > (uninterruptible sleep) on my Linux servers.
> > It happens at random, each time on a different server and in a different
> > process (all the servers are configured identically, with the 2.4.18-10
> > kernel). Here's an example:
> [snip]
> > Has anyone noticed the same behavior ? Is this a well known problem ?
>
> I've had the same problem twice (or maybe more often, I don't remember)
> with some 2.4.21-preX kernel on one of my machines.
> IMHO it has something to do with NFS (I'm using this box as an NFS client).
> I wish I could reproduce it one more time to do some traces etc. on it,
> but I have not found a way to reproduce it yet.
>

This happens when you mount an NFS filesystem with the 'hard' option (the
default) and a mount's handle expires incorrectly (e.g. after a server crash).
Read the mount manpage for an explanation of the downsides of using
the 'soft' option.


Mike Waychison


2003-05-05 16:14:34

by Michael Buesch

Subject: Re: processes stuck in D state

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 05 May 2003 17:24, Mike Waychison wrote:
> On Mon, 5 May 2003, Michael Buesch wrote:
> > On Monday 05 May 2003 07:52, Zeev Fisher wrote:
> > > Hi!
> >
> > Hi Zeev!
> >
> > > I have a recurring problem with unkillable processes stuck in D state
> > > (uninterruptible sleep) on my Linux servers.
> > > It happens at random, each time on a different server and in a different
> > > process (all the servers are configured identically, with the 2.4.18-10
> > > kernel). Here's an example:
> >
> > [snip]
> >
> > > Has anyone noticed the same behavior ? Is this a well known problem ?
> >
> > I've had the same problem twice (or maybe more often, I don't remember)
> > with some 2.4.21-preX kernel on one of my machines.
> > IMHO it has something to do with NFS (I'm using this box as an NFS
> > client). I wish I could reproduce it one more time to do some traces
> > etc. on it, but I have not found a way to reproduce it yet.
>
> This happens when you mount an NFS filesystem with the 'hard' option (the
> default) and a mount's handle expires incorrectly (e.g. after a server crash).
> Read the mount manpage for an explanation of the downsides of using
> the 'soft' option.
>
>
> Mike Waychison

my fstab-entry:
192.168.0.50:/mnt/nfs_1 /mnt/nfs_1 nfs rw,hard,intr,user,nodev,nosuid,exec 0 0

from man mount:
[snip] The process cannot be interrupted or killed unless you also specify intr. [/snip]

I can't interrupt any process that was accessing the NFS server
while the server was being shut down, although intr is specified.
_That's_ my problem. :)
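
To rule out a typo in the options, the fstab entry can be checked
mechanically. This is a rough sketch only: nfs_mount_options and
retry_mode are illustrative names, and the rule that a mount without
'soft' behaves as 'hard' simply restates the default mentioned earlier in
this thread.

```python
def nfs_mount_options(fstab_line):
    """Return the option set of an fstab(5) line, or None if not NFS.

    Fields are: device, mount point, filesystem type, options, dump
    frequency, fsck pass number.
    """
    fields = fstab_line.split()
    if len(fields) < 4 or not fields[2].startswith("nfs"):
        return None
    return set(fields[3].split(","))


def retry_mode(options):
    """'hard' is the default, so only an explicit 'soft' changes it."""
    return "soft" if "soft" in options else "hard"


line = ("192.168.0.50:/mnt/nfs_1 /mnt/nfs_1 nfs "
        "rw,hard,intr,user,nodev,nosuid,exec 0 0")
opts = nfs_mount_options(line)
print(retry_mode(opts), "intr" in opts)  # hard True
```

So the entry really does request hard,intr, which is what makes the
uninterruptible hang surprising.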

- --
Regards Michael Büsch
http://www.8ung.at/tuxsoft
18:23:58 up 48 min, 3 users, load average: 1.20, 1.05, 0.93
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+tpCZoxoigfggmSgRAkmdAJwM/L8mZpS+DE2WzjzrXuRdxuY98QCgin1l
aKik6/WGFwWXMjd8pjwHIXw=
=akJd
-----END PGP SIGNATURE-----

2003-05-05 22:03:29

by jw schultz

Subject: Re: processes stuck in D state

On Mon, May 05, 2003 at 06:25:48PM +0200, Michael Buesch wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Monday 05 May 2003 17:24, Mike Waychison wrote:
> > On Mon, 5 May 2003, Michael Buesch wrote:
> > > On Monday 05 May 2003 07:52, Zeev Fisher wrote:
> > > > Hi!
> > >
> > > Hi Zeev!
> > >
> > > > I have a recurring problem with unkillable processes stuck in D state
> > > > (uninterruptible sleep) on my Linux servers.
> > > > It happens at random, each time on a different server and in a different
> > > > process (all the servers are configured identically, with the 2.4.18-10
> > > > kernel). Here's an example:
> > >
> > > [snip]
> > >
> > > > Has anyone noticed the same behavior ? Is this a well known problem ?
> > >
> > > I've had the same problem twice (or maybe more often, I don't remember)
> > > with some 2.4.21-preX kernel on one of my machines.
> > > IMHO it has something to do with NFS (I'm using this box as an NFS
> > > client). I wish I could reproduce it one more time to do some traces
> > > etc. on it, but I have not found a way to reproduce it yet.
> >
> > This happens when you mount an NFS filesystem with the 'hard' option (the
> > default) and a mount's handle expires incorrectly (e.g. after a server crash).
> > Read the mount manpage for an explanation of the downsides of using
> > the 'soft' option.
> >
> >
> > Mike Waychison
>
> my fstab-entry:
> 192.168.0.50:/mnt/nfs_1 /mnt/nfs_1 nfs rw,hard,intr,user,nodev,nosuid,exec 0 0
>
> from man mount:
> [snip] The process cannot be interrupted or killed unless you also specify intr. [/snip]
>
> I can't interrupt any process that was accessing the NFS server
> while the server was being shut down, although intr is specified.
> _That's_ my problem. :)

I had a similar problem with SuSE's 2.4.18. Random processes
seemed to go into D state, where intr is useless.

I rebuilt the kernel with NFSv3 disabled and that problem
went away. The logs are full of
May 5 14:54:15 duncan kernel: NFS: NFSv3 not supported.
May 5 14:54:15 duncan kernel: nfs warning: mount version older than kernel
but that I can live with. Processes hanging and umount failing
I cannot abide.

If there is a better answer, I'm listening.


--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt