The oom-killer causes lockups because it calls
cpuset_excl_nodes_overlap() with tasklist_lock read-locked.
cpuset_excl_nodes_overlap() takes the cpuset_sem semaphore (callback_sem
in later Linux versions) with down(), which is a might_sleep()
operation even when the semaphore could be acquired without sleeping.
If processes call exit() or fork() while the oom-killer sleeps in
down(), they lock up because they call write_lock_irq(&tasklist_lock).

The lockup occurred on linux-2.6.14. The problem also seems to exist
in linux-2.6.15-rc5-mm3 and linux-2.6.15-rc7.
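
The call pattern described above can be sketched in user space. This is
only an illustrative model, not kernel code: every sim_* name is a
hypothetical stand-in (for read_lock(), might_sleep(), down(), and
cpuset_excl_nodes_overlap()), and the "overlap" result is a placeholder.
It shows why the check fires on every call made under the read lock,
whether or not the semaphore is contended.

```c
#include <assert.h>
#include <stdio.h>

/* All sim_* names are hypothetical user-space stand-ins, not kernel APIs. */

static int sim_atomic_depth;     /* > 0 while a simulated spinning lock is held */
static int sim_sleep_violations; /* number of "may sleep in atomic context" hits */
static int sim_sem_count = 1;    /* simulated semaphore; 1 means free */

/* Stand-in for read_lock(&tasklist_lock): enters atomic context. */
static void sim_read_lock(void)
{
    sim_atomic_depth++;
}

/* Stand-in for read_unlock(&tasklist_lock): leaves atomic context. */
static void sim_read_unlock(void)
{
    assert(sim_atomic_depth > 0);
    sim_atomic_depth--;
}

/* Stand-in for might_sleep(): complains whenever called in atomic context. */
static void sim_might_sleep(const char *caller)
{
    if (sim_atomic_depth > 0) {
        sim_sleep_violations++;
        fprintf(stderr, "%s: may sleep while a spinning lock is held\n", caller);
    }
}

/* Stand-in for down(): the check runs even if the semaphore is free. */
static void sim_down(void)
{
    sim_might_sleep("sim_down");
    sim_sem_count--; /* a real down() would block here if the count were 0 */
}

/* Stand-in for up(): releases the simulated semaphore. */
static void sim_up(void)
{
    sim_sem_count++;
}

/* Stand-in for cpuset_excl_nodes_overlap(): takes the semaphore. */
static int sim_excl_nodes_overlap(void)
{
    sim_down();
    int overlap = 1; /* placeholder; the real function compares cpusets */
    sim_up();
    return overlap;
}

/* Stand-in for the oom-killer scanning tasks under the read lock. */
static int sim_oom_scan(void)
{
    sim_read_lock();
    int overlap = sim_excl_nodes_overlap();
    sim_read_unlock();
    return overlap;
}
```

Calling sim_oom_scan() records one violation, while calling
sim_excl_nodes_overlap() outside the simulated lock records none. In the
real kernel the consequence is worse than a warning: if down() actually
sleeps, the read lock stays held and writers spin on it.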
Regards,
--
KUROSAWA, Takahiro
Yes, we found the same problem while looking at the code.
And this is not the only cpuset function that might sleep but is
called from atomic context... :(
> The oom-killer causes lockups because it calls
> cpuset_excl_nodes_overlap() with tasklist_lock read-locked.
> cpuset_excl_nodes_overlap() gets cpuset_sem (or callback_sem in
> later linux versions) semaphore, which might_sleep even if the
> semaphore could be down without sleeping. If processes call
> exit() or fork() when the oom-killer sleeps in the down(), they
> lockup because they call write_lock_irq(&tasklist_lock).
>
> The lockup occurred on linux-2.6.14. The problem also seems to exist
> in linux-2.6.15-rc5-mm3 and linux-2.6.15-rc7.
KUROSAWA Takahiro wrote:
>
> The oom-killer causes lockups because it calls
> cpuset_excl_nodes_overlap() with tasklist_lock read-locked.
> cpuset_excl_nodes_overlap() gets cpuset_sem (or callback_sem in
> later linux versions) semaphore, which might_sleep even if the
> semaphore could be down without sleeping.
Thank you for catching this. My apologies for not responding sooner.
I was off the air for a week. I will look at this now.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401