Date: Fri, 21 Oct 2016 17:47:36 +0200
From: Oleg Nesterov
To: Andy Lutomirski
Cc: Roman Pen, Andy Lutomirski, Josh Poimboeuf, Borislav Petkov,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Tejun Heo, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead
Message-ID: <20161021154735.GA22949@redhat.com>
References: <20160921154350.13128-1-roman.penyaev@profitbricks.com>
	<20160921154350.13128-2-roman.penyaev@profitbricks.com>

On 10/20, Andy Lutomirski wrote:
>
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
> >  		 * If a worker went to sleep, notify and ask workqueue
> >  		 * whether it wants to wake up a task to maintain
> >  		 * concurrency.
> > +		 *
> > +		 * Also the following stack is possible:
> > +		 *   oops_end()
> > +		 *   do_exit()
> > +		 *   schedule()
> > +		 *
> > +		 * If panic_on_oops is not set and the oops happens on
> > +		 * a workqueue execution path, the thread will be killed.
> > +		 * That is definitely sad, but to avoid making the situation
> > +		 * even worse we have to ignore dead tasks so that we do not
> > +		 * step on zeroed-out members (e.g. t->vfork_done is
> > +		 * already NULL on that path, since we were called by
> > +		 * do_exit()).

And we have more problems like this. Say, if blk_flush_plug_list()
crashes, it will likely crash again and again, recursively.

> >  		 */
> > -		if (prev->flags & PF_WQ_WORKER) {
> > +		if (prev->flags & PF_WQ_WORKER &&
> > +		    prev->state != TASK_DEAD) {

I don't think we should change __schedule()... Can't we simply clear
PF_WQ_WORKER in complete_vfork_done()?

Or add the PF_EXITING checks into wq_worker_sleeping() and
wq_worker_waking_up().

Or perhaps something like the change below. (do_exit() runs the queued
task_works via exit_task_work() well before the final schedule(), so the
flag would already be gone by the time the dying worker hits the sleep
hook.)

Oleg.

--- x/kernel/workqueue.c
+++ x/kernel/workqueue.c
@@ -2157,6 +2157,14 @@ static void process_scheduled_works(stru
 	}
 }
 
+static void oops_handler(struct callback_head *oops_work)
+{
+	if (!(current->flags & PF_WQ_WORKER))
+		return;
+
+	current->flags &= ~PF_WQ_WORKER;	/* and probably do more cleanups */
+}
+
 /**
  * worker_thread - the worker thread function
  * @__worker: self
@@ -2171,11 +2179,14 @@ static void process_scheduled_works(stru
  */
 static int worker_thread(void *__worker)
 {
+	struct callback_head oops_work;
 	struct worker *worker = __worker;
 	struct worker_pool *pool = worker->pool;
 
 	/* tell the scheduler that this is a workqueue worker */
 	worker->task->flags |= PF_WQ_WORKER;
+	init_task_work(&oops_work, oops_handler);
+	task_work_add(current, &oops_work, false);
 
 woke_up:
 	spin_lock_irq(&pool->lock);
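
For completeness, the wq_worker_sleeping() variant of the PF_EXITING check
mentioned above could look roughly like the sketch below. It is untested,
the hunk offsets are omitted, and the function signature is only assumed
to match a ~v4.8 tree, so treat it as an illustration rather than a patch:

--- x/kernel/workqueue.c
+++ x/kernel/workqueue.c
@@ ... @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
+	/*
+	 * A worker that oopsed gets here from do_exit()->schedule(); its
+	 * workqueue/task state may already be torn down, so do not touch
+	 * it and do not try to wake up another worker.
+	 */
+	if (task->flags & PF_EXITING)
+		return NULL;
+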