2008-08-04 12:47:57

by Bernd Schubert

Subject: sysrq+t doesn't work for some threads

Hello,

I need to debug a Lustre problem where Lustre threads take 100% CPU time and
also leak memory. The problem is that sysrq+t doesn't work for these
threads. It nicely shows all the other stack traces, but the troublesome
threads don't show theirs:

[69338.858825] ll_mdt_36 R running task 0 21679 2 (L-TLB)
[69338.865689] ll_mdt_37 S 0000000000000000 0 21680 2 (L-TLB)
[69338.872676] ffff8102e6d01dd0 0000000000000046 ffffffff88160325 00000000ffffffed
[69338.880544] 0000000000000000 ffffffff8815e016 000000000000000a ffff810322ef4ea0
[69338.888318] ffff81031e7477b0 00003f0797609cef 000000000003434e ffff810322ef5050
[69338.895957] Call Trace:
[69338.898787] [<ffffffff8828cab5>] :ptlrpc:ptlrpc_main+0xa55/0x1ce0
[69338.905297] [<ffffffff8020a2f8>] child_rip+0xa/0x12


But I really need the trace of ll_mdt_36, which is one of the troublesome
threads. Might this be caused by the x86_64-mm-unwinder.patch, which we
always apply since it provides much better traces?
This is with 2.6.22.19.

Any help would be appreciated.


Thanks,
Bernd

PS: I'm going to apply kdb now; maybe I'll get traces when I stop the system
with kdb.


--
Bernd Schubert
Q-Leap Networks GmbH


2008-08-04 19:46:40

by Robert Hancock

Subject: Re: sysrq+t doesn't work for some threads

Bernd Schubert wrote:
> Hello,
>
> I need to debug a Lustre problem where Lustre threads take 100% CPU time and
> also leak memory. The problem is that sysrq+t doesn't work for these
> threads. It nicely shows all the other stack traces, but the troublesome
> threads don't show theirs:
>
> [69338.858825] ll_mdt_36 R running task 0 21679 2 (L-TLB)

This means the task is currently running, so you won't get a stack trace for
it. You'd likely have to stop it somehow. Is this a kernel thread?
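
To pick out the affected tasks from a long dump, here is a minimal sketch, assuming the sysrq+t output has been saved as text in the format quoted in this thread. The function name `tasks_without_trace` and both regular expressions are illustrative assumptions, not part of any kernel tooling, and the header layout may differ on other kernel versions.

```python
import re

# Optional printk timestamp such as "[69338.858825] " at the start of a line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Task header as seen in this thread, e.g.
#   ll_mdt_36 R running task 0 21679 2 (L-TLB)
#   ll_mdt_37 S 0000000000000000 0 21680 2 (L-TLB)
# Assumed layout: name, state letter, "running task" or a hex value,
# free stack, PID, parent PID.
TASK_HEADER = re.compile(
    r"^(?P<name>\S+)\s+(?P<state>[RSDTZX])\s+"
    r"(?:running task|[0-9a-f]{8,16})\s+\d+\s+(?P<pid>\d+)\s+\d+"
)


def tasks_without_trace(log_text):
    """Return (name, pid, state) for every dumped task with no 'Call Trace:'."""
    tasks = []
    current = None
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw.strip())
        match = TASK_HEADER.match(line)
        if match:
            # A new task header starts a new record.
            current = {
                "name": match["name"],
                "state": match["state"],
                "pid": int(match["pid"]),
                "has_trace": False,
            }
            tasks.append(current)
        elif current is not None and line.startswith("Call Trace:"):
            current["has_trace"] = True
    return [(t["name"], t["pid"], t["state"]) for t in tasks if not t["has_trace"]]


# Excerpt of the dump posted in this thread.
SAMPLE = """\
[69338.858825] ll_mdt_36 R running task 0 21679 2 (L-TLB)
[69338.865689] ll_mdt_37 S 0000000000000000 0 21680 2 (L-TLB)
[69338.872676] ffff8102e6d01dd0 0000000000000046 ffffffff88160325 00000000ffffffed
[69338.895957] Call Trace:
[69338.898787] [<ffffffff8828cab5>] :ptlrpc:ptlrpc_main+0xa55/0x1ce0
[69338.905297] [<ffffffff8020a2f8>] child_rip+0xa/0x12
"""

if __name__ == "__main__":
    for name, pid, state in tasks_without_trace(SAMPLE):
        print(f"{name} (pid {pid}, state {state}): no call trace")
```

This only reports which tasks were dumped without a trace; it does not recover the missing trace itself.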

> [69338.865689] ll_mdt_37 S 0000000000000000 0 21680 2 (L-TLB)
> [69338.872676] ffff8102e6d01dd0 0000000000000046 ffffffff88160325 00000000ffffffed
> [69338.880544] 0000000000000000 ffffffff8815e016 000000000000000a ffff810322ef4ea0
> [69338.888318] ffff81031e7477b0 00003f0797609cef 000000000003434e ffff810322ef5050
> [69338.895957] Call Trace:
> [69338.898787] [<ffffffff8828cab5>] :ptlrpc:ptlrpc_main+0xa55/0x1ce0
> [69338.905297] [<ffffffff8020a2f8>] child_rip+0xa/0x12
>
>
> But I really need the trace of ll_mdt_36, which is one of the troublesome
> threads. Might this be caused by the x86_64-mm-unwinder.patch, which we
> always apply since it provides much better traces?
> This is with 2.6.22.19.
>
> Any help would be appreciated.

2008-08-04 19:51:34

by Bernd Schubert

Subject: Re: sysrq+t doesn't work for some threads

On Monday 04 August 2008 21:46:30 Robert Hancock wrote:
> Bernd Schubert wrote:
> > Hello,
> >
> > I need to debug a Lustre problem where Lustre threads take 100% CPU time
> > and also leak memory. The problem is that sysrq+t doesn't work for these
> > threads. It nicely shows all the other stack traces, but the troublesome
> > threads don't show theirs:
> >
> > [69338.858825] ll_mdt_36 R running task 0 21679 2 (L-TLB)
>
> This means the task is currently running, so you won't get a stack trace for
> it. You'd likely have to stop it somehow. Is this a kernel thread?

Yes, it is a kernel thread. How can I stop it? Is there a way to stop kernel
threads at all? kill -STOP doesn't work.


Thanks,
Bernd



--
Bernd Schubert
Q-Leap Networks GmbH