From: Roman Penyaev
Date: Mon, 24 Oct 2016 18:01:51 +0200
Subject: Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead
To: Oleg Nesterov
Cc: Andy Lutomirski, Josh Poimboeuf, Borislav Petkov, Brian Gerst, Denys Vlasenko,
    "H. Peter Anvin", Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Tejun Heo,
    linux-kernel@vger.kernel.org

On Fri, Oct 21, 2016 at 5:47 PM, Oleg Nesterov wrote:
> On 10/20, Andy Lutomirski wrote:
>>
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
>> >  	 * If a worker went to sleep, notify and ask workqueue
>> >  	 * whether it wants to wake up a task to maintain
>> >  	 * concurrency.
>> > +	 *
>> > +	 * Also the following stack is possible:
>> > +	 *    oops_end()
>> > +	 *      do_exit()
>> > +	 *        schedule()
>> > +	 *
>> > +	 * If panic_on_oops is not set and an oops happens on
>> > +	 * a workqueue execution path, the thread will be killed.
>> > +	 * That is definitely sad, but, not to make the situation
>> > +	 * even worse, we have to ignore dead tasks in order not
>> > +	 * to step on zeroed-out members (e.g. t->vfork_done is
>> > +	 * already NULL on that path, since we were called by
>> > +	 * do_exit())
>
> And we have more problems like this. Say, if blk_flush_plug_list()
> crashes it will likely crash again and again recursively.

I will send a patch if I reproduce it :)

>
>> >  	 */
>> > -	if (prev->flags & PF_WQ_WORKER) {
>> > +	if (prev->flags & PF_WQ_WORKER &&
>> > +	    prev->state != TASK_DEAD) {
>
> I don't think we should change __schedule()... Can't we simply clear
> PF_WQ_WORKER in complete_vfork_done()? Or add the PF_EXITING checks
> into wq_worker_sleeping() and wq_worker_waking_up().

Yeah, probably handling this corner case in wq_worker_sleeping()
is much better.

>
> Or perhaps something like the change below.

That's nice stuff, thanks Oleg, I simply did not know about these
callbacks. But the big problem is that after commit 2deb4be28 by
Andy Lutomirski we can't use the stack when we are already in
do_exit(). And putting this callback head inside the worker
structure is bloat.

I will resend this with a simple task state check in
wq_worker_sleeping().
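Something along these lines (untested, just to show the idea; whether
to key on TASK_DEAD or on the PF_EXITING check you suggest can be
decided on the actual patch):

struct task_struct *wq_worker_sleeping(struct task_struct *task)
{
	/*
	 * A dying task can get here via oops_end() -> do_exit() ->
	 * schedule() when the oops happened on the workqueue
	 * execution path.  Its worker/vfork state is already being
	 * torn down, so do not try to wake up another worker to
	 * maintain concurrency.
	 */
	if (task->state == TASK_DEAD)
		return NULL;

	/* ... rest of wq_worker_sleeping() unchanged ... */
}

The nice part is that __schedule() itself stays untouched.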
--
Roman

>
> Oleg.
>
> --- x/kernel/workqueue.c
> +++ x/kernel/workqueue.c
> @@ -2157,6 +2157,14 @@ static void process_scheduled_works(stru
>  	}
>  }
>
> +static void oops_handler(struct callback_head *oops_work)
> +{
> +	if (!(current->flags & PF_WQ_WORKER))
> +		return;
> +
> +	clear PF_WQ_WORKER, probably do more cleanups
> +}
> +
>  /**
>   * worker_thread - the worker thread function
>   * @__worker: self
> @@ -2171,11 +2179,14 @@ static void process_scheduled_works(stru
>   */
>  static int worker_thread(void *__worker)
>  {
> +	struct callback_head oops_work;
>  	struct worker *worker = __worker;
>  	struct worker_pool *pool = worker->pool;
>
>  	/* tell the scheduler that this is a workqueue worker */
>  	worker->task->flags |= PF_WQ_WORKER;
> +	init_task_work(&oops_work, oops_handler);
> +	task_work_add(current, &oops_work, false);
>  woke_up:
>  	spin_lock_irq(&pool->lock);
>
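(For the record, I read the "clear PF_WQ_WORKER, probably do more
cleanups" placeholder roughly as below; this is only my illustration,
not your code. The part that worries me is that oops_work lives on the
worker's stack, and as I understand commit 2deb4be28 the oops path
rewinds that stack before calling do_exit(), so we can't rely on it
from there.)

static void oops_handler(struct callback_head *oops_work)
{
	/* Called from task_work when the task exits; only workers care. */
	if (!(current->flags & PF_WQ_WORKER))
		return;

	/*
	 * Detach the dying task from the workqueue sleep/wakeup hooks
	 * so __schedule() stops calling wq_worker_sleeping() on
	 * half-destroyed worker state.
	 */
	current->flags &= ~PF_WQ_WORKER;
}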