2003-05-06 14:52:10

by Michael Buesch

Subject: processes stuck in D state

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Please take a look at this problem:

[linux-kernel-mailing-list thread]
http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2

thanks.
Please cc me, as I'm not subscribed to the nfs-list.

- --
Regards Michael Büsch
http://www.8ung.at/tuxsoft
16:28:34 up 20 min, 1 user, load average: 1.07, 0.97, 0.66
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+t8wdoxoigfggmSgRAm5aAJsGJLPe9yUd4sqah4yiU0GsMIAGzACfSa2+
gAMZvSHQirHmE8yZChpgH/8=
=pka2
-----END PGP SIGNATURE-----



-------------------------------------------------------
Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
The only event dedicated to issues related to Linux enterprise solutions
http://www.enterpriselinuxforum.com

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-05-06 15:18:34

by Lever, Charles

Subject: RE: processes stuck in D state

hi michael-

i'm not sure why you mailed neilb -- this appears to
be NFS client related, not server related.

can you spell out the sequence of events that leads
to the stuck processes? it looks like the client
is working-as-designed, but if you can provide more
details, we can verify what's going on.

> -----Original Message-----
> From: Michael Buesch [mailto:[email protected]]
> Sent: Tuesday, May 06, 2003 10:52 AM
> To: [email protected]
> Cc: [email protected]
> Subject: [NFS] processes stuck in D state
>
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi!
>
> Please take a look at this problem:
>
> [linux-kernel-mailing-list thread]
> http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2
>
> thanks.
> Please cc me, as I'm not subscribed to the nfs-list.
>
> - --
> Regards Michael Büsch
> http://www.8ung.at/tuxsoft
> 16:28:34 up 20 min, 1 user, load average: 1.07, 0.97, 0.66
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.1 (GNU/Linux)
>
> iD8DBQE+t8wdoxoigfggmSgRAm5aAJsGJLPe9yUd4sqah4yiU0GsMIAGzACfSa2+
> gAMZvSHQirHmE8yZChpgH/8=
> =pka2
> -----END PGP SIGNATURE-----



2003-05-06 15:20:46

by Trond Myklebust

Subject: Re: processes stuck in D state

>>>>> " " == Michael Buesch <[email protected]> writes:

> Hi! Please take a look at this problem:

> [linux-kernel-mailing-list thread]
> http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2

If I can hazard a guess: someone is firewalling the lockd port and/or
the statd port.

Either mount using the 'nolock' option, or fix the firewall (see the
HOWTO and/or FAQ).

Cheers,
Trond
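
To check this guess before changing mount options, one can ask the server's portmapper which ports lockd and statd registered and compare them with the firewall rules. The sketch below only parses the text that `rpcinfo -p <server>` prints; the function name `lock_service_ports` and the sample listing are illustrative assumptions, not output from the machines in this thread.

```python
def lock_service_ports(rpcinfo_text):
    """Map 'nlockmgr' (lockd) and 'status' (statd) to registered (proto, port) pairs.

    Expects the column layout printed by `rpcinfo -p`:
    program  vers  proto  port  service
    """
    wanted = {"nlockmgr", "status"}
    found = {name: set() for name in wanted}
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 5-column entry.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        _program, _vers, proto, port, service = fields
        if service in wanted:
            found[service].add((proto, int(port)))
    return found


# Hypothetical listing, for illustration only.
SAMPLE = """\
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100021    1   udp  32769  nlockmgr
    100021    3   udp  32769  nlockmgr
    100003    3   udp   2049  nfs
"""

if __name__ == "__main__":
    for service, endpoints in sorted(lock_service_ports(SAMPLE).items()):
        print(service, sorted(endpoints))
```

An empty set for either service suggests that it is not registered or that the portmapper could not be reached. Mounting with 'nolock' avoids the dependency on both services.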



2003-05-06 15:42:54

by Michael Buesch

Subject: Re: processes stuck in D state

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 06 May 2003 17:20, Trond Myklebust wrote:
> >>>>> " " == Michael Buesch <[email protected]> writes:
> > Hi! Please take a look at this problem:
> >
> > [linux-kernel-mailing-list thread]
> > http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2
>
> If I can hazard a guess: someone is firewalling the lockd port and/or
> the statd port.
>
> Either mount using the 'nolock' option, or fix the firewall (see the
> HOWTO and/or FAQ).

To reproduce the problem:
- - mount some nfs from a server in your lan.
- - Open an app that uses the mounted fs. I've simply opened a
konqueror-window for the directory where the nfs is mounted.
- - shut down or crash the server or just pull the network-cable.
- - Now the konqueror-process is nonkillable in D state. There's no
chance to kill it.

I've tried it with all firewalls disabled, but the problem persists.

> Cheers,
> Trond

@linux-kernel-mailing-list: I've posted a thread to nfs-mailing list with
the same topic as in lkml. IMHO this is the better list for this problem. :)

- --
Regards Michael Büsch
http://www.8ung.at/tuxsoft
17:34:35 up 1:26, 5 users, load average: 1.52, 1.32, 1.13
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+t9fWoxoigfggmSgRAvrcAJ4i+i3V+kcRd+kLHS7cb2WDZDHKsQCfWljd
rwtAFK4ONkJHzVck03t7F5U=
=gHuP
-----END PGP SIGNATURE-----




2003-05-06 15:47:42

by Lever, Charles

Subject: RE: processes stuck in D state

> To reproduce the problem:
> - - mount some nfs from a server in your lan.
> - - Open an app, that uses the mounted fs. I've simply opened a
> konqueror-window for the directory where the nfs is mounted.
> - - shut down or crash the server or just pull the network-cable.
> - - Now the konqueror-process is nonkillable in D state. There's no
> chance to kill it.

does the problem persist after you reconnect the network cable?
what happens when the server becomes available again?
are you mounting with UDP or TCP?



2003-05-06 15:56:24

by Michael Buesch

Subject: Re: [NFS] processes stuck in D state

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 06 May 2003 17:47, Lever, Charles wrote:
> does the problem persist after you reconnect the network cable?
> what happens when the server becomes available again?

no. If server is available again, the process wakes up from D.

But like man mount says:
[snip] The process cannot be interrupted or killed unless you also specify intr. [/snip]
The process should be killable while the cable is pulled.
But that's not the case, although intr is in fstab.

> are you mounting with UDP or TCP?

uh. How to find it out? :)

- --
Regards Michael Büsch
http://www.8ung.at/tuxsoft
17:53:19 up 1:44, 5 users, load average: 1.04, 1.05, 1.06
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+t9s0oxoigfggmSgRAkDPAKCFMeEGvS3KUhwn0bNQngKRK6h2fwCdEcv/
U2ttfZ6Mm8Sazuksfn4UUrY=
=M5Kh
-----END PGP SIGNATURE-----
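
One way to answer the UDP-or-TCP question is to read the mount options the kernel reports in /proc/mounts. A minimal sketch, assuming the transport shows up either as a `proto=` option or as a bare `udp`/`tcp` flag (the exact spelling varies between kernel versions); `nfs_transports` is a hypothetical helper name and the sample text is made up for illustration.

```python
def nfs_transports(mounts_text):
    """Return {mount_point: transport} for NFS entries in /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        transport = "unknown"
        for opt in fields[3].split(","):
            if opt.startswith("proto="):
                transport = opt.split("=", 1)[1]
            elif opt in ("udp", "tcp") and transport == "unknown":
                transport = opt
        result[fields[1]] = transport
    return result


# Hypothetical /proc/mounts content, for illustration only.
SAMPLE = (
    "srv:/export /mnt/a nfs rw,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=srv 0 0\n"
    "srv:/home /mnt/b nfs rw,vers=3,hard,intr,proto=tcp,addr=srv 0 0\n"
    "/dev/hda1 / ext3 rw 0 0\n"
)

if __name__ == "__main__":
    # On a live client, pass open("/proc/mounts").read() instead of SAMPLE.
    print(nfs_transports(SAMPLE))
```

The same option string also shows whether 'intr' actually took effect on the mount, which is worth confirming alongside the fstab entry.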

2003-05-06 16:05:40

by Trond Myklebust

Subject: Re: processes stuck in D state

>>>>> " " == Michael Buesch <[email protected]> writes:


> To reproduce the problem:
> - - mount some nfs from a server in your lan.
> - - Open an app, that uses the mounted fs. I've simply opened a
> konqueror-window for the directory where the nfs is mounted.
> - - shut down or crash the server or just pull the
> network-cable.
> - - Now the konqueror-process is nonkillable in D
> state. There's no
> chance to kill it.

Unless you are using the 'intr' or 'soft' mount flags, then that is
*documented and expected* behaviour.

It is true that even when using the 'intr' mount flag, you don't
always succeed in killing a task that is hanging on NFS. That is
usually due to the fact that it is waiting on some semaphore that is
held by another process. Semaphores always sleep in the
TASK_UNINTERRUPTIBLE state, so they cannot be signalled.
Linus has suggested a solution to this problem: to set up a special
class of semaphores that are killable with 'SIGKILL', but doing that
(and then replacing all those semaphores in the VFS and VM) is not
going to happen before 2.7.x. at the earliest.

However, as I've mentioned on this list *many* times before: there
exists a workaround if you are wanting to kill all processes in order
to unmount the partition:
kill -9 all the processes.
kill -9 rpciod.

Cheers,
Trond
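
The uninterruptible state described above is visible from user space: the state field in /proc/<pid>/stat reads `D` for such a task. A minimal sketch for listing these tasks follows; `parse_stat` and `d_state_pids` are hypothetical helper names, and the /proc layout is assumed to be the one documented in proc(5).

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def d_state_pids(proc_root="/proc"):
    """Collect (pid, comm) for every task currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling `d_state_pids()` on the client while the server is unreachable should list the hung application; the result is only a snapshot, since tasks enter and leave D state all the time during normal I/O.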



2003-05-06 16:37:59

by Michael Buesch

Subject: Re: processes stuck in D state

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 06 May 2003 18:05, Trond Myklebust wrote:
> >>>>> " " == Michael Buesch <[email protected]> writes:
> > To reproduce the problem:
> > - - mount some nfs from a server in your lan.
> > - - Open an app, that uses the mounted fs. I've simply opened a
> > konqueror-window for the directory where the nfs is mounted.
> > - - shut down or crash the server or just pull the
> > network-cable.
> > - - Now the konqueror-process is nonkillable in D
> > state. There's no
> > chance to kill it.
>
> Unless you are using the 'intr' or 'soft' mount flags, then that is
> *documented and expected* behaviour.

I'm using intr.

> However, as I've mentioned on this list *many* times before: there
> exists a workaround if you are wanting to kill all processes in order
> to unmount the partition:
> kill -9 all the processes.
> kill -9 rpciod.

kill -9 doesn't work for me to kill the app.

- --
Regards Michael Büsch
http://www.8ung.at/tuxsoft
18:28:55 up 2:20, 5 users, load average: 1.02, 1.06, 1.06
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+t+MhoxoigfggmSgRAkeqAJ0c71DxLZ13/CHqUXlTa8TvjAt2iwCeLO34
s7crt56Gr8JyKxCLZMbrNvc=
=z8EU
-----END PGP SIGNATURE-----




2003-05-06 16:54:58

by Trond Myklebust

Subject: Re: processes stuck in D state

>>>>> " " == Michael Buesch <[email protected]> writes:

>> kill -9 all the processes. kill -9 rpciod.

> kill -9 doesn't work for me to kill the app.

I didn't say kill the app. I said signal it with -9, then signal
rpciod.

Cheers,
Trond
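
The order is the point of this workaround: SIGKILL is sent to the stuck tasks first and only then to rpciod, presumably so that the tasks already have the signal pending when their RPC calls are aborted. A minimal sketch of that sequence follows, assuming the pids have been looked up beforehand (for example with ps) and that it runs as root on a kernel of the era discussed here; `signal_then_rpciod` is a hypothetical helper name.

```python
import os
import signal


def signal_then_rpciod(stuck_pids, rpciod_pid, send=os.kill):
    """Send SIGKILL to each stuck task, then to rpciod, in that order.

    `send` is injectable so the ordering can be exercised without
    signalling real processes. Returns the pids in the order signalled.
    """
    order = []
    for pid in list(stuck_pids) + [rpciod_pid]:
        try:
            send(pid, signal.SIGKILL)
        except ProcessLookupError:
            continue  # already gone
        order.append(pid)
    return order
```

Passing a recording function as `send` allows a dry run that shows the intended order without signalling anything.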



2003-05-06 17:09:32

by pwitting

Subject: RE: processes stuck in D state

Actually, I have seen this on the server before, specifically
when using an older version of IBM's JFS for Linux with RH 7.3.

I worked with the JFS team and JFS v1.1.0 and kernel 2.4.20
seemed to stabilize it, at least I've seen no more "freezes"
as a result. Judging by your mail address this might be your
problem as well.

One noticeable symptom is that cd'ing to an affected dir and
attempting an ls would also freeze (I had a large (20GB+) file
copy going, so I usually knew what the affected dir was).

Two other things that helped:
1) increasing the # of nfs threads (120 or more)
2) ensuring the uid/gid the remote thread was using existed on
the server. (sounds stupid but it helped)

neither "cured" the issue, but it went from being reproducible
to being occasional.

Good Luck.

> From: "Lever, Charles" <[email protected]>
>
> i'm not sure why you mailed neilb -- this appears to
> be NFS client related, not server related.
>
> can you spell out the sequence of events that leads
> to the stuck processes? it looks like the client
> is working-as-designed, but if you can provide more
> details, we can verify what's going on.
>
>> -----Original Message-----
>> From: Michael Buesch [mailto:[email protected]]
>> Sent: Tuesday, May 06, 2003 10:52 AM
>> To: [email protected]
>> Cc: [email protected]
>> Subject: [NFS] processes stuck in D state
>>
>> Please take a look at this problem:
>>
>> [linux-kernel-mailing-list thread]
>> http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2
>>
>> thanks.
>> Please cc me, as I'm not subscribed to the nfs-list.




2003-05-06 17:32:29

by Michael Buesch

Subject: Re: [NFS] processes stuck in D state

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 06 May 2003 18:54, Trond Myklebust wrote:
> >>>>> " " == Michael Buesch <[email protected]> writes:
> >> kill -9 all the processes. kill -9 rpciod.
> >>
> > kill -9 doesn't work for me to kill the app.
>
> I didn't say kill the app. I said signal it with -9, then signal
> rpciod.

Ah, I understand. :)

> Cheers,
> Trond

- --
Regards Michael Büsch
http://www.8ung.at/tuxsoft
19:31:20 up 3:22, 2 users, load average: 1.23, 1.09, 1.04
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+t/G3oxoigfggmSgRAq5BAJ0SezM+y1LFnwglArReHERXb2VJZQCeKKd0
Sx6RqCkOvm4FvgTCVyx2gCE=
=K8c7
-----END PGP SIGNATURE-----

2003-05-06 17:34:38

by Michael Buesch

Subject: Re: processes stuck in D state

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 06 May 2003 19:09, [email protected] wrote:
> Actually, I have seen this on the server before, specifically
> When using an older version of IBM's JFS for Linux with RH 7.3

I'm running ext3 on the server.

> I worked with the JFS team and JFS v1.1.0 and kernel 2.4.20
> seemed to stabilize it, at least I've seen no more "freezes"
> as a result.

server-kernel is 2.4.21-pre6.

> Judging by your mail address this might be your
> problem as well.

You refer to the "fs" in my e-mail address? :)
The "fs" stands for "FreeSoftware".

> One noticeable symptom is that cd'ing to an affected dir and
> attempting an ls would also freeze (I had a large (20GB+) file
> copy going, so I usually knew what the affected dir was.
>
> Two other things that helped:
> 1) increasing the # of nfs threads (120 or more)
> 2) ensuring the uid/gid the remote thread was using existed on
> the server. (sounds stupid but it helped)
>
> neither "cured" the issue, but it went from being reproducible
> to being occasional.
>
> Good Luck.

thanks. :)

- --
Regards Michael Büsch
http://www.8ung.at/tuxsoft
19:26:58 up 3:18, 2 users, load average: 1.01, 1.08, 1.05
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+t/FhoxoigfggmSgRAqgLAJ9hGo5l2c1y/oiZxamh/M88avZYIwCggzTc
lgyMmTHSGnxK17Xfq1Vb/eg=
=Ni5P
-----END PGP SIGNATURE-----




2003-05-06 18:32:55

by Guolin Cheng

Subject: RE: processes stuck in D state

Hi, all,

We are encountering the same problem here as well:

[root@arc100 root]# ps auxw
......
andy 9087 0.0 3.2 12900 10312 ? D May05 0:02
/0/tmp/av_explore .index /net/arc295/0/DQ_crawl26.20030416034937.arc.gz
......

although the nfs server has no problem at all. If I try to "ls
/net/arc295/0" then the new "ls" process will hang as well.
The method I followed to fix the problem is:

[root@arc100 root]# umount -f /net/arc295/0
umount2: Device or resource busy
umount: /net/arc295/0: Illegal seek
[root@arc100 root]# /etc/init.d/amd restart
Stopping amd: [ OK ]
Starting amd: [ OK ]
[root@arc100 root]# cd /net/arc295/0
[root@arc100 0]# ls

My nfs clients/servers have the same configuration:

Redhat 8.0
General Linux Kernel 2.4.20 ("nfs over tcp" is enabled)
gcc-3.2-7
amd ( am-utils-6.0.7-9 )
amd mount options in map amd.master
(opts:=rw,intr,nfsv3,tcp,nosuid,nodev,noresvport)
The real amd mount status for the nfs directory:
arc295:/0 on /.amd_mnt/arc295/host/0 type nfs
(rw,intr,nfsv3,tcp,nosuid,nodev,noresvport,dev=0000f10e,vers=3,proto=tcp)

If anyone needs more info to troubleshoot this sort of problem, let me know.

Thanks.
--Guolin Cheng



