2003-07-03 12:33:48

by Brian Ristuccia

Subject: 2.4.21-rmap15j: sometimes processes stuck in state D, WCHAN 'down'

We've been seeing processes which occasionally get stuck in state D, WCHAN
'down'. If I look in /proc/pid/fd or /proc/pid/cwd/, I can usually find an
open file or directory that will hang any new process that attempts to
access it. The affected files and directories have been on a local ext3
filesystem. The stuck processes are unkillable, even with signal 9.
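
For anyone wanting to check their own machines for the same symptom, a rough sketch of listing tasks in state D along with their wait channel might look like the following. It is illustrative only: the function names parse_stat and find_blocked are made up here, and it assumes the kernel exposes /proc/<pid>/wchan, which may not be the case on 2.4.21 (there, ps resolves WCHAN from an address using System.map). When the file is missing, the wait channel is reported as "?".

```python
import os

def parse_stat(text):
    """Split the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, _, right = text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    return int(pid_str), comm, right.split()[0]

def find_blocked(proc_root="/proc"):
    """Return a sorted list of (pid, comm, wchan) for tasks in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the stat file was unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # assumption: wchan file may not exist on this kernel
        blocked.append((pid, comm, wchan))
    return sorted(blocked)

if __name__ == "__main__":
    for pid, comm, wchan in find_blocked():
        print(pid, comm, wchan)
```

Reading the stat and wchan files does not touch the files the stuck process has open, so unlike poking around in /proc/pid/fd it should not hang the process doing the scan.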

If I attempt to attach to one of the hung processes with strace, there is no
output from strace, and the strace process becomes hung and unkillable as
well.

Is anyone else seeing this problem with stock 2.4.21 or 2.4.21-rmap15j?

--
Brian Ristuccia
[email protected]


2003-07-04 03:15:43

by Rik van Riel

Subject: Re: 2.4.21-rmap15j: sometimes processes stuck in state D, WCHAN 'down'

On Thu, 3 Jul 2003, Brian Ristuccia wrote:

> Is anyone else seeing this problem with stock 2.4.21 or 2.4.21-rmap15j?

I haven't heard of anything like this. Could you please get
a backtrace of all the stuck processes with alt+sysrq+t and
decode the backtraces with ksymoops ?

thanks,

Rik
--
Great minds drink alike.

2003-07-10 14:28:26

by Brian Ristuccia

Subject: Re: 2.4.21-rmap15j: sometimes processes stuck in state D, WCHAN 'down'

On Thu, Jul 03, 2003 at 11:30:09PM -0400, Rik van Riel wrote:
> On Thu, 3 Jul 2003, Brian Ristuccia wrote:
>
> > Is anyone else seeing this problem with stock 2.4.21 or 2.4.21-rmap15j?
>
> I haven't heard of anything like this. Could you please get
> a backtrace of all the stuck processes with alt+sysrq+t and
> decode the backtraces with ksymoops ?
>

This is actually the backtrace from Ctrl+Scroll Lock or whatever - the
machine where it happened didn't have magic sysrq. If the difference is
important, I can put a magic sysrq kernel everywhere and try to reproduce it
again.

Jul 10 10:18:19 grunt1 kernel: ld D D6E5C848 0 25131 1
25142 25147 (NOTLB)
Jul 10 10:18:19 grunt1 kernel: Call Trace: [dput+25/340] [__down+108/200]
[__down_failed+8/12] [.text.lock.namei+53/1246] [link_path_walk+1786/2460]
Jul 10 10:18:19 grunt1 kernel: ld D D6E5C848 0 25142 1
25131 (NOTLB)
Jul 10 10:18:19 grunt1 kernel: Call Trace: [dput+25/340] [__down+108/200]
[__down_failed+8/12] [.text.lock.namei+53/1246] [link_path_walk+1786/2460]
Jul 10 10:18:19 grunt1 kernel: ld D D6E5C848 0 25143 1
25144 7575 (NOTLB)
Jul 10 10:18:19 grunt1 kernel: Call Trace: [dput+25/340] [__down+108/200]
[__down_failed+8/12] [.text.lock.namei+53/1246] [link_path_walk+1786/2460]
Jul 10 10:18:19 grunt1 kernel: ld D D6E5C848 0 25144 1
25147 25143 (NOTLB)
Jul 10 10:18:19 grunt1 kernel: Call Trace: [dput+25/340] [__down+108/200]
[__down_failed+8/12] [.text.lock.namei+53/1246] [link_path_walk+1786/2460]
Jul 10 10:18:19 grunt1 kernel: ld D D7A6A000 408 25147 1
25131 25144 (NOTLB)
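
To make it easier to see that the stuck tasks share one call path, a small sketch like the one below could pull the [symbol+offset/size] entries out of a dump such as the one above and count identical traces. It is only an illustration: parse_frames and count_traces are hypothetical names, and it assumes the offset and size are printed in decimal, as they appear in this log.

```python
import re
from collections import Counter

# Matches entries such as [dput+25/340] or [.text.lock.namei+53/1246].
FRAME = re.compile(r"\[([\w.]+)\+(\d+)/(\d+)\]")

def parse_frames(text):
    """Return a list of (symbol, offset, size) for each frame found in text."""
    return [(sym, int(off), int(size)) for sym, off, size in FRAME.findall(text)]

def count_traces(log):
    """Count how many 'Call Trace:' blocks share the same symbol sequence."""
    counts = Counter()
    # Everything after each 'Call Trace:' marker, up to the next marker,
    # is treated as one trace; task header lines contain no bracketed frames.
    for chunk in log.split("Call Trace:")[1:]:
        symbols = tuple(sym for sym, _, _ in parse_frames(chunk))
        counts[symbols] += 1
    return counts

if __name__ == "__main__":
    import sys
    for symbols, n in count_traces(sys.stdin.read()).most_common():
        print(n, " <- ".join(symbols))
```

Fed the log above on standard input, this would group the four complete traces under a single symbol sequence; the last task shown has no Call Trace line in the excerpt and so is not counted.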

--
Brian Ristuccia
[email protected]