Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933448AbcJTXJA (ORCPT ); Thu, 20 Oct 2016 19:09:00 -0400 Received: from mail-vk0-f42.google.com ([209.85.213.42]:34587 "EHLO mail-vk0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932776AbcJTXI6 (ORCPT ); Thu, 20 Oct 2016 19:08:58 -0400 MIME-Version: 1.0 In-Reply-To: <20160921154350.13128-2-roman.penyaev@profitbricks.com> References: <20160921154350.13128-1-roman.penyaev@profitbricks.com> <20160921154350.13128-2-roman.penyaev@profitbricks.com> From: Andy Lutomirski Date: Thu, 20 Oct 2016 16:08:37 -0700 Message-ID: Subject: Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead To: Roman Pen , Oleg Nesterov Cc: Andy Lutomirski , Josh Poimboeuf , Borislav Petkov , Brian Gerst , Denys Vlasenko , "H . Peter Anvin" , Peter Zijlstra , Thomas Gleixner , Ingo Molnar , Tejun Heo , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4110 Lines: 103 On Wed, Sep 21, 2016 at 8:43 AM, Roman Pen wrote: > If panic_on_oops is not set and oops happens inside workqueue kthread, > kernel kills this kthread. Current patch fixes recursive GPF which > happens in that case with the following stack: Oleg, can you take a look at this? --Andy > > [] dump_stack+0x68/0x93 > [] ? do_exit+0x7ab/0xc10 > [] __schedule_bug+0x83/0xe0 > [] __schedule+0x7ea/0xba0 > [] ? vprintk_default+0x1f/0x30 > [] ? printk+0x48/0x50 > [] schedule+0x40/0x90 > [] do_exit+0x9ca/0xc10 > [] ? kmsg_dump+0x11d/0x190 > [] ? kmsg_dump+0x17/0x190 > [] oops_end+0x99/0xd0 > [] no_context+0x185/0x3e0 > [] __bad_area_nosemaphore+0x83/0x1c0 > [] ? vprintk_emit+0x25e/0x530 > [] bad_area_nosemaphore+0x14/0x20 > [] __do_page_fault+0xac/0x570 > [] ? console_trylock+0x1e/0xe0 > [] ? trace_hardirqs_off_thunk+0x1a/0x1c > [] do_page_fault+0xc/0x10 > [] page_fault+0x22/0x30 > [] ? kthread_data+0x33/0x40 > [] ? wq_worker_sleeping+0xe/0x80 > [] __schedule+0x47b/0xba0 > [] schedule+0x40/0x90 > [] do_exit+0x7dd/0xc10 > [] oops_end+0x99/0xd0 > > The root cause is that zeroed task->vfork_done member is accessed from > wq_worker_sleeping() hook. The zeroing out happens on the following > path: > > oops_end() > do_exit() > exit_mm() > mm_release() > complete_vfork_done() > > In order to fix a bug dead tasks must be ignored. > > Signed-off-by: Roman Pen > Cc: Andy Lutomirski > Cc: Josh Poimboeuf > Cc: Borislav Petkov > Cc: Brian Gerst > Cc: Denys Vlasenko > Cc: H. Peter Anvin > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Tejun Heo > Cc: linux-kernel@vger.kernel.org > --- > kernel/sched/core.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 2c303e7..50772e5 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt) > * If a worker went to sleep, notify and ask workqueue > * whether it wants to wake up a task to maintain > * concurrency. > + * > + * Also the following stack is possible: > + * oops_end() > + * do_exit() > + * schedule() > + * > + * If panic_on_oops is not set and oops happens on > + * a workqueue execution path, thread will be killed. > + * That is definitly sad, but not to make the situation > + * even worse we have to ignore dead tasks in order not > + * to step on zeroed out members (e.g. t->vfork_done is > + * already NULL on that path, since we were called by > + * do_exit())) > */ > - if (prev->flags & PF_WQ_WORKER) { > + if (prev->flags & PF_WQ_WORKER && > + prev->state != TASK_DEAD) { > struct task_struct *to_wakeup; > > to_wakeup = wq_worker_sleeping(prev); > -- > 2.9.3 > -- Andy Lutomirski AMA Capital Management, LLC