2010-06-04 11:50:11

by Jeff Layton

Subject: Re: wait_even_interruptible_timeout(), signal, spin_lock() = system hang

On Fri, 28 May 2010 11:44:46 -0500
Shirish Pargaonkar <[email protected]> wrote:

> After this sequence of calls, the system hangs (SMP x86 box running a
> .34 kernel); it still responds to ping but to nothing else.
> I have not been able to break in with Alt-SysRq-t; I am working on that.
>
>         rc = wait_event_interruptible_timeout(ses->server->response_q,
>                         (midQ->midState != MID_REQUEST_SUBMITTED), timeout);
>         if (rc < 0) {
>                 cFYI(1, ("command 0x%x interrupted", midQ->command));
>                 return -1;
>         }
>
> When the invoking function, after the wait comes out with ERESTARTSYS
> (I kill the command with Ctrl-C), then calls
>         spin_lock(&GlobalMid_Lock);
>
> the system hangs. If I sleep before the return -1 (e.g. msleep(1)), there is no hang.
>

Sounds like a race of some sort, but could also be that msleep() is
doing something (perhaps relating to the pending signal) that prevents
the hang. Without some sort of clue as to what the box is hung on at
the time there is no way to know.

> I do not have to use wait_event_interruptible_timeout, and there are no such
> problems with wait_event_timeout; it is only when a signal/interrupt is
> involved that I run into this problem.
>
> Any pointers or ideas about what could be happening would be really appreciated.
>

No idea right offhand. I'd suggest getting a core or sysrq data to see
what it's doing.

--
Jeff Layton <[email protected]>
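
For reference, the return value of wait_event_interruptible_timeout() has three cases: a negative value (-ERESTARTSYS) when a signal interrupted the sleep, 0 when the timeout elapsed with the condition still false, and a positive value (remaining jiffies, at least 1) when the condition became true. The sketch below is a userspace illustration only; classify_wait_rc() and the wait_outcome enum are hypothetical names, not part of the kernel or of CIFS, and ERESTARTSYS is defined locally because the kernel does not export it to userspace.

```c
/* Kernel-internal errno value (include/linux/errno.h); mirrored here
 * because userspace headers do not define it. */
#define ERESTARTSYS 512

/* Hypothetical names, used only for this illustration. */
enum wait_outcome {
        WAIT_INTERRUPTED,       /* a signal arrived, rc == -ERESTARTSYS */
        WAIT_TIMED_OUT,         /* timeout elapsed, condition still false */
        WAIT_CONDITION_MET      /* condition true, rc == remaining jiffies */
};

/* Map the return value of wait_event_interruptible_timeout() to an outcome. */
enum wait_outcome classify_wait_rc(long rc)
{
        if (rc < 0)
                return WAIT_INTERRUPTED;
        if (rc == 0)
                return WAIT_TIMED_OUT;
        return WAIT_CONDITION_MET;
}
```

In the quoted snippet only the rc < 0 branch is shown, so the Ctrl-C case corresponds to WAIT_INTERRUPTED here.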


2010-06-04 12:20:57

by Shirish Pargaonkar

Subject: Re: wait_even_interruptible_timeout(), signal, spin_lock() = system hang

On Fri, Jun 4, 2010 at 6:51 AM, Jeff Layton <[email protected]> wrote:
> No idea right offhand. I'd suggest getting a core or sysrq data and see
> what it's doing.

Jeff, thanks. The system hangs really hard. It does not respond to the
Alt-ScrLk or Ctrl-ScrLk key sequences at the text-mode console, i.e.
nothing gets logged in /var/log/messages.

Regards,

Shirish

2010-06-05 13:57:36

by Shirish Pargaonkar

Subject: Re: wait_even_interruptible_timeout(), signal, spin_lock() = system hang

On Fri, Jun 4, 2010 at 7:13 AM, Shirish Pargaonkar
<[email protected]> wrote:
> Jeff, Thanks. The system hangs really hard. It does not respond to
> Alt ScrLk Ctrl ScrLk
> key sequence at the text mode console i.e. nothing gets logged in
> /var/log/messages.

I think this is what is happening:

When one fsstress command gets killed, one of the numerous processes
being killed holds the mid lock and dies before releasing it. The
others, when interrupted in the wait, return with ERESTARTSYS and then
spin forever while attempting to take the mid lock, causing the system
hang.
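
The suspected pattern (a path that leaves with the lock still held, so every later spin_lock() caller spins forever) can be sketched in userspace. This is only an analogy under stated assumptions: mid_lock is a hypothetical C11 atomic_flag stand-in for GlobalMid_Lock, leaky_path() and balanced_path() are made-up names, and none of this is the actual CIFS code. Whether the real hang arises this way is not confirmed in this thread. The probe uses a single try-lock so that the example reports the leak instead of hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for GlobalMid_Lock (not a kernel spinlock_t). */
static atomic_flag mid_lock = ATOMIC_FLAG_INIT;

/* One acquisition attempt; a real spin_lock() loops on this until it succeeds. */
static bool mid_trylock(void)
{
        return !atomic_flag_test_and_set_explicit(&mid_lock, memory_order_acquire);
}

static void mid_unlock(void)
{
        atomic_flag_clear_explicit(&mid_lock, memory_order_release);
}

/* Buggy pattern: the interrupted branch returns without unlocking. */
static int leaky_path(bool interrupted)
{
        while (!mid_trylock())
                ;               /* spin until the lock is ours */
        if (interrupted)
                return -1;      /* BUG: lock is still held on this exit */
        mid_unlock();
        return 0;
}

/* Balanced pattern: a single exit that always drops the lock. */
static int balanced_path(bool interrupted)
{
        int rc = 0;

        while (!mid_trylock())
                ;
        if (interrupted)
                rc = -1;
        mid_unlock();
        return rc;
}

/* Run path(true) and report whether the lock was left held afterwards. */
bool lock_leaked_after(int (*path)(bool))
{
        bool leaked;

        atomic_flag_clear(&mid_lock);   /* start from an unlocked state */
        path(true);
        leaked = !mid_trylock();        /* a failed try means someone leaked it */
        mid_unlock();                   /* leave the lock free either way */
        return leaked;
}
```

With these definitions, lock_leaked_after(leaky_path) reports a leak and lock_leaked_after(balanced_path) does not. In the kernel, a waiter calling spin_lock() on a leaked lock would have no try-lock escape and would spin with preemption disabled, which is consistent with a box that answers ping but nothing else.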