2011-04-22 23:09:32

by Ben Greear

Subject: Debugging hung tasks?

I am testing lots of NFS traffic against an over-loaded and slow file server.

I enabled the hung-task detection logic, and it's hitting after 180
seconds.
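Assuming the 180-second threshold comes from the hung_task_timeout_secs sysctl (its compiled-in default is set by CONFIG_DEFAULT_HUNG_TASK_TIMEOUT), here is a minimal C sketch for checking the hung-task knobs on the client. The helper names read_sysctl_ulong() and parse_sysctl_ulong() are hypothetical; only the /proc/sys/kernel/hung_task_timeout_secs and /proc/sys/kernel/hung_task_panic files are real kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse the contents of a numeric sysctl file such
 * as "180\n". Accepts only a plain decimal number, optionally followed by
 * whitespace; returns 0 on success and -1 otherwise. */
int parse_sysctl_ulong(const char *text, unsigned long *out)
{
    char *end;
    unsigned long value;

    assert(text != NULL && out != NULL);
    /* Reject empty input, leading blanks and signs before strtoul sees them. */
    if (*text < '0' || *text > '9')
        return -1;
    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno == ERANGE)
        return -1;
    /* The kernel terminates the value with a newline; skip trailing blanks. */
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;
    if (*end != '\0')
        return -1;
    *out = value;
    return 0;
}

/* Hypothetical helper: read one numeric knob, for example
 * "/proc/sys/kernel/hung_task_timeout_secs". Returns -1 if the file is
 * missing (for instance on a kernel built without hung-task detection)
 * or does not hold a plain number. */
int read_sysctl_ulong(const char *path, unsigned long *out)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_sysctl_ulong(buf, out);
}
```

On the NFS client, calling read_sysctl_ulong("/proc/sys/kernel/hung_task_panic", &v) would show whether a hung task is only reported (0) or also panics the machine (1), and the same call on hung_task_timeout_secs would confirm where the 180 seconds comes from.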

First: Is there any valid reason to have funky NFS cause a hung task?

Second: Why doesn't the hung-task panic logic print the stack trace of
the hung task?
Is this an option that can be enabled?

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2011-04-23 03:55:49

by Randy Dunlap

Subject: Re: Debugging hung tasks?

On Fri, 22 Apr 2011 16:09:29 -0700 Ben Greear wrote:

> I am testing lots of NFS traffic against an over-loaded and slow file server.
>
> I enabled the hung-task detection logic, and it's hitting after 180
> seconds.
>
> First: Is there any valid reason to have funky NFS cause a hung task?
>
> Second: Why doesn't the hung-task panic logic print the stack trace of
> the hung task?
> Is this an option that can be enabled?

hung_task.c::check_hung_task() always calls sched_show_task() and
optionally does the panic:

	if (sysctl_hung_task_panic)
		panic("hung_task: blocked tasks");

sched.c::sched_show_task() calls show_stack(), which should be doing what
you are asking for AFAICT. What kernel version are you using?
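The control flow described above can be sketched as a small user-space model. This is not kernel code: struct model_task, model_check_hung_task(), model_hung_task_panic and enum hung_action are hypothetical names, and the model leaves out the kernel's frozen-task check, its warning budget and the held-lock dump. It only illustrates the ordering: the report (where the kernel calls sched_show_task()) always happens, and the panic is an optional last step.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, much-simplified stand-in for the kernel's task_struct. */
struct model_task {
    const char *comm;               /* command name, e.g. "btserver" */
    int pid;
    unsigned long switch_count;     /* models nvcsw + nivcsw */
    unsigned long last_switch_count;
};

/* Plays the role of sysctl_hung_task_panic (0 = report only, 1 = panic). */
int model_hung_task_panic = 0;

enum hung_action { HUNG_NONE, HUNG_REPORTED, HUNG_PANIC };

/* One check of a task found in uninterruptible sleep, as the watchdog
 * thread does every interval. Returns what the kernel would have done. */
enum hung_action model_check_hung_task(struct model_task *t,
                                       unsigned long timeout)
{
    assert(t != NULL);

    /* A task that has never been switched out is not checked. */
    if (t->switch_count == 0)
        return HUNG_NONE;

    /* It was scheduled since the last check, so it is making progress:
     * remember the new count and move on. */
    if (t->switch_count != t->last_switch_count) {
        t->last_switch_count = t->switch_count;
        return HUNG_NONE;
    }

    /* Stuck for a whole interval: the report and stack dump come first. */
    printf("INFO: task %s:%d blocked for more than %lu seconds.\n",
           t->comm, t->pid, timeout);
    printf("  <sched_show_task() would dump this task's stack here>\n");

    /* Only afterwards, and only if enabled, does the kernel call panic(). */
    return model_hung_task_panic ? HUNG_PANIC : HUNG_REPORTED;
}
```

Checking the same stuck task twice with model_hung_task_panic set to 1 prints the INFO line and the stack placeholder before HUNG_PANIC is returned. Because panic() is then called from the watchdog thread itself, the "Call Trace" printed by the panic belongs to khungtaskd, not to the blocked task.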


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-04-25 17:38:36

by Ben Greear

Subject: Re: Debugging hung tasks?

On 04/22/2011 08:55 PM, Randy Dunlap wrote:
> On Fri, 22 Apr 2011 16:09:29 -0700 Ben Greear wrote:
>
>> I am testing lots of NFS traffic against an over-loaded and slow file server.
>>
>> I enabled the hung-task detection logic, and it's hitting after 180
>> seconds.
>>
>> First: Is there any valid reason to have funky NFS cause a hung task?
>>
>> Second: Why doesn't the hung-task panic logic print the stack trace of
>> the hung task?
>> Is this an option that can be enabled?
>
> hung_task.c::check_hung_task() always calls sched_show_task() and
> optionally does the panic:
>
> 	if (sysctl_hung_task_panic)
> 		panic("hung_task: blocked tasks");
>
> sched.c::sched_show_task() calls show_stack(), which should be doing what
> you are asking for AFAICT. What kernel version are you using?

Here's one of the panics, for instance (captured on the serial console).

There is a lockdep splat early on in 2.6.36.4 (a known bug, but not
fixed since that kernel is EOL), so that is probably why no locking
info is printed. But I was expecting a more useful stack trace, since
it appears to be our user-space application (btserver) that is hung.

Apr 22 15:57:38 localhost kernel: nfs: server 192.168.100.19 not responding, still trying
Apr 22 15:57:38 localhost kernel: nfs: server 192.168.100.19 OK
Apr 22 15:59:08 localhost kernel: INFO: task btserver:15212 blocked for more than 180 seconds.
Apr 22 15:59:08 localhost kernel: "echo 0 > /proc/sys/kernel/hun
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 58, comm: khungtaskd Not tainted 2.6.36.4+ #1
Call Trace:
 [<ffffffff8140174a>] panic+0x96/0x1ae
 [<ffffffff81093106>] watchdog+0x1b1/0x1f9
 [<ffffffff81092f55>] ? watchdog+0x0/0x1f9
 [<ffffffff8105c774>] kthread+0x7d/0x85
 [<ffffffff8100a8e4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81404a54>] ? restore_args+0x0/0x30
 [<ffffffff8105c6f7>] ? kthread+0x0/0x85
 [<ffffffff8100a8e0>] ? kernel_thread_helper+0x0/0x10
panic occurred, switching back to text console
Rebooting in 10 seconds..^C

We're testing 2.6.38.4 now and haven't seen this problem again,
so maybe it's fixed anyway.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com