2008-02-22 23:08:50

by Stephen Oberholtzer

Subject: How to diagnose a process stuck in D state?

First off: I'm not subscribed to the list (I don't think I could
handle the volume), so please make sure you CC me if you reply.

I run an application on one of my machines; it often hangs, with the
process stuck in D state. When this happens, the process sticks around
until I reboot the machine.

I have tried the following to start diagnosing the problem:

* Running 'ps -axl' shows "-" in the wchan column.
* The contents of /proc/pid/wchan say "_stext".
* 'strace -p pid' says "Process pid attached - interrupt to quit" and
stops responding. Sending SIGINT or SIGTERM has no effect on the
strace process, although kill -11 (SIGSEGV, my personal favorite) does
work.
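The first two checks in the list above can be scripted. A minimal Python sketch, assuming the usual Linux /proc layout, where the field after the parenthesized command name in /proc/<pid>/stat is the one-letter state and /proc/<pid>/wchan holds the wait-channel symbol; the helper names `parse_stat` and `find_d_state` are hypothetical:

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]  # one-letter state: R, S, D, Z, T, ...
    return int(pid_str), comm, state


def find_d_state(proc="/proc"):
    """Yield (pid, comm, wchan) for every task currently in D state."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # the process exited or is unreadable; skip it
        yield pid, comm, wchan or "-"
```

Iterating over `find_d_state()` lists the stuck tasks, much like filtering `ps -axl` output. As the list above shows, wchan alone may be uninformative (here it reports `_stext`), so this only tells you *which* task is stuck, not why.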

This is very confusing. I would greatly appreciate it if someone
could tell me how a process can enter D state without being in a
syscall, and what I can do to start tracking down the cause.

(By the way: This is on amd64, 2.6.23. I'm updating to 2.6.24.2 right
now, on the off chance that whatever was causing the problem has been
fixed.)

--
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE


2008-02-23 00:31:26

by Lee Revell

Subject: Re: How to diagnose a process stuck in D state?

On Fri, Feb 22, 2008 at 6:08 PM, Stephen Oberholtzer
<[email protected]> wrote:
> First off: I'm not subscribed to the list (I don't think I could
> handle the volume), so please make sure you CC me if you reply.
>
> I run an application on one of my machines; it often hangs, with the
> process stuck in D state. When this happens, the process sticks around
> until I reboot the machine.
>

Run "dmesg". Often when this happens you'll find that the kernel has Oopsed.
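A rough Python sketch of that check, assuming `dmesg` is on the PATH and readable by the current user. The strings in the hypothetical `OOPS_MARKERS` tuple are common prefixes of oops/BUG reports, not an exhaustive list, so adjust them for your kernel:

```python
import subprocess

# Hypothetical marker list: substrings that commonly appear in the first
# lines of a kernel oops or BUG report. Not exhaustive.
OOPS_MARKERS = ("Oops", "BUG:", "kernel BUG at", "Unable to handle kernel")


def find_oops_lines(log_lines):
    """Return (index, line) pairs for log lines containing an oops marker."""
    return [(i, line) for i, line in enumerate(log_lines)
            if any(marker in line for marker in OOPS_MARKERS)]


def read_dmesg():
    """Return the kernel ring buffer as a list of lines by running dmesg."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True,
                            check=True)
    return result.stdout.splitlines()
```

`find_oops_lines(read_dmesg())` returns the candidate lines with their positions, so you can print the surrounding context (the register dump and call trace usually follow the first marker).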

Lee

2008-02-23 13:37:59

by Daniel J Blueman

Subject: Re: How to diagnose a process stuck in D state?

Stephen,

On 22 Feb, 23:10, "Stephen Oberholtzer" <[email protected]> wrote:
> First off: I'm not subscribed to the list (I don't think I could
> handle the volume), so please make sure you CC me if you reply.
>
> I run an application on one of my machines; it often hangs, with the
> process stuck in D state. When this happens, the process sticks around
> until I reboot the machine.
>
> I have tried the following to start diagnosing the problem:
>
> * Running 'ps -axl' shows "-" in the wchan column.
> * The contents of /proc/pid/wchan say "_stext".
> * 'strace -p pid' says "Process pid attached - interrupt to quit" and
> stops responding. Sending SIGINT or SIGTERM has no effect on the
> strace process, although kill -11 (SIGSEGV, my personal favorite) does
> work.
>
> This is very confusing. I would greatly appreciate it if someone
> could tell me how a process can enter D state without being in a
> syscall, and what I can do to start tracking down the cause.

wchan shows where the process is sleeping, probably on a kernel mutex or
lock here. That alone doesn't help much, but the stack trace will.

Enable and use the magic SysRq mechanism (via terminal, keyboard or
serial console), e.g. SysRq-T to dump the stack traces of all
processes, including this one.

Unsurprisingly, it's documented in the kernel Documentation directory
(Documentation/sysrq.txt).
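For a box without easy keyboard or serial access, the same SysRq-T dump can be requested from a root shell through /proc. A minimal Python sketch, assuming root privileges and the standard /proc/sys/kernel/sysrq and /proc/sysrq-trigger files; the `sysrq` helper and its parameters are hypothetical, and the paths are parameters only so the sketch can be exercised against ordinary files:

```python
def sysrq(key, sysctl_path="/proc/sys/kernel/sysrq",
          trigger_path="/proc/sysrq-trigger", enable=True):
    """Optionally enable all SysRq functions, then send one SysRq command.

    Writing "1" to /proc/sys/kernel/sysrq enables every SysRq function;
    writing a command key (e.g. "t") to /proc/sysrq-trigger runs that
    command as if Alt+SysRq+<key> had been pressed. Both writes normally
    require root.
    """
    if len(key) != 1:
        raise ValueError("SysRq commands are single characters")
    if enable:
        with open(sysctl_path, "w") as f:
            f.write("1\n")
    with open(trigger_path, "w") as f:
        f.write(key)
```

For example, `sysrq("t")` run as root sends each task's state and stack trace to the kernel log; read it back with `dmesg` and look for the entry of the stuck PID.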

> (By the way: This is on amd64, 2.6.23. I'm updating to 2.6.24.2 right
> now, on the off chance that whatever was causing the problem has been
> fixed.)

Daniel
--
Daniel J Blueman