2005-12-28 00:40:28

by KUROSAWA Takahiro

Subject: oom-killer causes lockups in cpuset_excl_nodes_overlap()

The oom-killer causes lockups because it calls
cpuset_excl_nodes_overlap() with tasklist_lock read-locked.
cpuset_excl_nodes_overlap() takes the cpuset_sem semaphore
(callback_sem in later Linux versions) with down(), which calls
might_sleep() even if the semaphore could be taken without
sleeping. If processes call exit() or fork() while the oom-killer
sleeps in down(), they lock up because they call
write_lock_irq(&tasklist_lock).

The lockup occurred on linux-2.6.14. The problem also seems to exist
in linux-2.6.15-rc5-mm3 and linux-2.6.15-rc7.

Regards,

--
KUROSAWA, Takahiro


2005-12-31 13:24:20

by Kirill Korotaev

Subject: Re: oom-killer causes lockups in cpuset_excl_nodes_overlap()

Yes, we found the same problem while looking at the code.
And this is not the only cpuset function that might sleep but is
called from atomic context... :(

> The oom-killer causes lockups because it calls
> cpuset_excl_nodes_overlap() with tasklist_lock read-locked.
> cpuset_excl_nodes_overlap() takes the cpuset_sem semaphore
> (callback_sem in later Linux versions) with down(), which calls
> might_sleep() even if the semaphore could be taken without
> sleeping. If processes call exit() or fork() while the oom-killer
> sleeps in down(), they lock up because they call
> write_lock_irq(&tasklist_lock).
>
> The lockup occurred on linux-2.6.14. The problem also seems to exist
> in linux-2.6.15-rc5-mm3 and linux-2.6.15-rc7.
>
> Regards,
>

2006-01-03 21:35:04

by Paul Jackson

Subject: Re: oom-killer causes lockups in cpuset_excl_nodes_overlap()

KUROSAWA Takahiro wrote:
>
> The oom-killer causes lockups because it calls
> cpuset_excl_nodes_overlap() with tasklist_lock read-locked.
> cpuset_excl_nodes_overlap() takes the cpuset_sem semaphore
> (callback_sem in later Linux versions) with down(), which calls
> might_sleep() even if the semaphore could be taken without
> sleeping.

Thank you for catching this. My apologies for not responding sooner.
I was off the air for a week. I will look at this now.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401