After selecting a task to kill, the oom killer iterates all processes and
kills all other user threads that share the same mm_struct in different
thread groups.
But in some extreme cases, the selected task happens to be a vfork child
of the init process that still shares init's mm_struct, so init gets
killed along with it and the kernel panics. This panic was observed on a
busybox system where busybox itself is init, with a kthread that keeps
consuming memory.
Signed-off-by: Ming Liu <[email protected]>
---
mm/oom_kill.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 314e9d2..7db4881 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -479,17 +479,17 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
task_unlock(victim);
/*
- * Kill all user processes sharing victim->mm in other thread groups, if
- * any. They don't get access to memory reserves, though, to avoid
- * depletion of all memory. This prevents mm->mmap_sem livelock when an
- * oom killed thread cannot exit because it requires the semaphore and
- * its contended by another thread trying to allocate memory itself.
- * That thread will now get access to memory reserves since it has a
- * pending fatal signal.
+ * Kill all user processes except init sharing victim->mm in other
+ * thread groups, if any. They don't get access to memory reserves,
+ * though, to avoid depletion of all memory. This prevents mm->mmap_sem
+ * livelock when an oom killed thread cannot exit because it requires
+ * the semaphore and it is contended by another thread trying to allocate
+ * memory itself. That thread will now get access to memory reserves
+ * since it has a pending fatal signal.
*/
for_each_process(p)
if (p->mm == mm && !same_thread_group(p, victim) &&
- !(p->flags & PF_KTHREAD)) {
+ !(p->flags & PF_KTHREAD) && !is_global_init(p)) {
if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
continue;
--
1.7.0.4
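For readers following the thread, the filter in the hunk above can be modelled in user space. This is only a sketch under stated assumptions, not kernel code: `struct model_task`, `MODEL_PF_KTHREAD`, `model_is_global_init()` and `kill_mm_sharers()` are hypothetical stand-ins, pid 1 stands in for the global init, and "killing" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PF_KTHREAD flag. */
#define MODEL_PF_KTHREAD 0x1u

/* Simplified user-space model of a task; NOT the kernel's task_struct. */
struct model_task {
    int pid;             /* pid 1 models the global init process */
    int tgid;            /* thread group id */
    const void *mm;      /* address-space identity; compared, never dereferenced */
    unsigned int flags;
    bool oom_disabled;   /* models oom_score_adj == OOM_SCORE_ADJ_MIN */
    bool killed;         /* set instead of delivering a fatal signal */
};

/* Assumption of this model: init is simply the task with pid 1. */
static bool model_is_global_init(const struct model_task *t)
{
    return t->pid == 1;
}

/*
 * Mark every task that shares the victim's mm from a different thread
 * group. With protect_init == false this mirrors the old condition,
 * with protect_init == true it mirrors the condition after the patch.
 * Returns the number of tasks marked as killed.
 */
static size_t kill_mm_sharers(struct model_task *tasks, size_t n,
                              const struct model_task *victim,
                              bool protect_init)
{
    size_t killed = 0;

    assert(tasks != NULL && victim != NULL);
    for (size_t i = 0; i < n; i++) {
        struct model_task *p = &tasks[i];

        if (p->mm != victim->mm || p->tgid == victim->tgid)
            continue;   /* different mm, or the victim's own thread group */
        if (p->flags & MODEL_PF_KTHREAD)
            continue;   /* kernel threads are never targeted */
        if (protect_init && model_is_global_init(p))
            continue;   /* the check added by the patch */
        if (p->oom_disabled)
            continue;   /* task opted out of OOM killing */
        p->killed = true;
        killed++;
    }
    return killed;
}
```

With a pid 1 task and a second task in another thread group pointing at the same `mm`, calling `kill_mm_sharers()` with the second task as the victim marks pid 1 as killed when `protect_init` is false and leaves it alone when it is true.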
On Mon, 23 Sep 2013, Ming Liu wrote:
> After selecting a task to kill, the oom killer iterates all processes and
> kills all other user threads that share the same mm_struct in different
> thread groups.
>
> But in some extreme cases, the selected task happens to be a vfork child
> of init process sharing the same mm_struct with it, which causes kernel
> panic on init getting killed. This panic is observed in a busybox shell
> that busybox itself is init, with a kthread keeps consuming memories.
>
We shouldn't be selecting a process where mm == init_mm in the first
place, so this wouldn't fix the issue entirely.
On 09/25/2013 10:34 AM, David Rientjes wrote:
> On Mon, 23 Sep 2013, Ming Liu wrote:
>
>> After selecting a task to kill, the oom killer iterates all processes and
>> kills all other user threads that share the same mm_struct in different
>> thread groups.
>>
>> But in some extreme cases, the selected task happens to be a vfork child
>> of init process sharing the same mm_struct with it, which causes kernel
>> panic on init getting killed. This panic is observed in a busybox shell
>> that busybox itself is init, with a kthread keeps consuming memories.
>>
> We shouldn't be selecting a process where mm == init_mm in the first
> place, so this wouldn't fix the issue entirely.
But if we add a check for "mm == init_mm" up front (i.e. in
oom_unkillable_task), that would prevent any process sharing its mm with
init from being selected. Is that reasonable? My fix is only meant to
protect the init process from being killed when its vfork child is
selected, and I think that is the only place where the risk exists. If my
understanding is wrong, please correct me.
Thanks,
Ming Liu
On Wed, 25 Sep 2013, Ming Liu wrote:
> > We shouldn't be selecting a process where mm == init_mm in the first
> > place, so this wouldn't fix the issue entirely.
>
> But if we add a control point for "mm == init_mm" in the first place(ie. in
> oom_unkillable_task), that would forbid the processes sharing mm with init to
> be selected, is that reasonable? Actually my fix is just to protect init
> process to be killed for its vfork child being selected and I think it's the
> only place where there is the risk. If my understanding is wrong, pls correct
> me.
>
We never want to select a process where task->mm == init_mm because if we
kill it we won't free any memory, regardless of vfork(). The goal of the
oom killer is solely to free memory, so it always tries to avoid needless
killing.
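A selection-time check of the kind discussed here might look roughly like the user-space sketch below. Everything in it is an assumption made for illustration: `struct candidate`, `candidate_unkillable()`, `select_victim()` and the `badness` field are hypothetical names, and `excluded_mm` stands for whatever mm the thread refers to as init_mm. It is not the kernel's oom_unkillable_task().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space model of an OOM candidate; NOT a kernel structure. */
struct candidate {
    int pid;                /* pid 1 models the global init process */
    const void *mm;         /* address-space identity; compared only */
    bool is_kthread;
    unsigned long badness;  /* hypothetical score; higher means better victim */
};

/* True if the candidate must never be picked as the OOM victim. */
static bool candidate_unkillable(const struct candidate *c,
                                 const void *excluded_mm)
{
    return c->pid == 1 || c->is_kthread || c->mm == excluded_mm;
}

/*
 * Return the eligible candidate with the highest badness, or NULL if
 * every candidate is filtered out. A task whose mm equals excluded_mm
 * is skipped before it can become the victim.
 */
static const struct candidate *select_victim(const struct candidate *tasks,
                                             size_t n,
                                             const void *excluded_mm)
{
    const struct candidate *best = NULL;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (candidate_unkillable(&tasks[i], excluded_mm))
            continue;
        if (best == NULL || tasks[i].badness > best->badness)
            best = &tasks[i];
    }
    return best;
}
```

In this model, a task that shares the excluded mm never reaches the kill path at all, so the later loop over mm sharers is never entered on its behalf. That is the difference from filtering only at kill time.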
On 09/26/2013 01:56 AM, David Rientjes wrote:
> On Wed, 25 Sep 2013, Ming Liu wrote:
>
>>> We shouldn't be selecting a process where mm == init_mm in the first
>>> place, so this wouldn't fix the issue entirely.
>> But if we add a control point for "mm == init_mm" in the first place(ie. in
>> oom_unkillable_task), that would forbid the processes sharing mm with init to
>> be selected, is that reasonable? Actually my fix is just to protect init
>> process to be killed for its vfork child being selected and I think it's the
>> only place where there is the risk. If my understanding is wrong, pls correct
>> me.
>>
> We never want to select a process where task->mm == init_mm because if we
> kill it we won't free any memory, regardless of vfork(). The goal of the
> oom killer is solely to free memory, so it always tries to avoid needless
> killing.
Yes, that makes sense. I will send the V1 patch.
All the best,
Thank you