From: Roman Penyaev
Date: Mon, 24 Oct 2016 18:01:51 +0200
Subject: Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead
To: Oleg Nesterov
Cc: Andy Lutomirski, Josh Poimboeuf, Borislav Petkov, Brian Gerst, Denys Vlasenko,
    "H. Peter Anvin", Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Tejun Heo,
    linux-kernel@vger.kernel.org

On Fri, Oct 21, 2016 at 5:47 PM, Oleg Nesterov wrote:
> On 10/20, Andy Lutomirski wrote:
>>
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
>> >  	 * If a worker went to sleep, notify and ask workqueue
>> >  	 * whether it wants to wake up a task to maintain
>> >  	 * concurrency.
>> > +	 *
>> > +	 * Also the following stack is possible:
>> > +	 *    oops_end()
>> > +	 *      do_exit()
>> > +	 *        schedule()
>> > +	 *
>> > +	 * If panic_on_oops is not set and an oops happens on
>> > +	 * a workqueue execution path, the thread will be killed.
>> > +	 * That is definitely sad, but, not to make the situation
>> > +	 * even worse, we have to ignore dead tasks in order not
>> > +	 * to step on zeroed-out members (e.g. t->vfork_done is
>> > +	 * already NULL on that path, since we were called by
>> > +	 * do_exit())
>
> And we have more problems like this. Say, if blk_flush_plug_list()
> crashes it will likely crash again and again recursively.

I will send a patch if I reproduce it :)

>
>> >  	 */
>> > -	if (prev->flags & PF_WQ_WORKER) {
>> > +	if (prev->flags & PF_WQ_WORKER &&
>> > +	    prev->state != TASK_DEAD) {
>
> I don't think we should change __schedule()... Can't we simply clear
> PF_WQ_WORKER in complete_vfork_done()? Or add the PF_EXITING checks
> into wq_worker_sleeping() and wq_worker_waking_up().

Yeah, probably handling this corner case in wq_worker_sleeping()
is much better.

>
> Or perhaps something like the change below.

That's nice stuff, thanks Oleg, I simply did not know about these
callbacks. But the big problem is that after commit 2deb4be28 by
Andy Lutomirski we can't use the stack when we are already in
do_exit(). And putting this callback head inside the worker
structure is bloat.

I will resend this with a simple task state check in
wq_worker_sleeping().
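Something along these lines (untested, just to show the idea; whether
to key on TASK_DEAD or on the PF_EXITING check you suggest can be
decided on the actual patch):

struct task_struct *wq_worker_sleeping(struct task_struct *task)
{
	/*
	 * A dying task can get here via oops_end() -> do_exit() ->
	 * schedule() when the oops happened on the workqueue
	 * execution path.  Its worker/vfork state is already being
	 * torn down, so do not try to wake up another worker to
	 * maintain concurrency.
	 */
	if (task->state == TASK_DEAD)
		return NULL;

	/* ... rest of wq_worker_sleeping() unchanged ... */
}

The nice part is that __schedule() itself stays untouched.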
--
Roman

>
> Oleg.
>
> --- x/kernel/workqueue.c
> +++ x/kernel/workqueue.c
> @@ -2157,6 +2157,14 @@ static void process_scheduled_works(stru
>  	}
>  }
>
> +static void oops_handler(struct callback_head *oops_work)
> +{
> +	if (!(current->flags & PF_WQ_WORKER))
> +		return;
> +
> +	clear PF_WQ_WORKER, probably do more cleanups
> +}
> +
>  /**
>   * worker_thread - the worker thread function
>   * @__worker: self
> @@ -2171,11 +2179,14 @@ static void process_scheduled_works(stru
>   */
>  static int worker_thread(void *__worker)
>  {
> +	struct callback_head oops_work;
>  	struct worker *worker = __worker;
>  	struct worker_pool *pool = worker->pool;
>
>  	/* tell the scheduler that this is a workqueue worker */
>  	worker->task->flags |= PF_WQ_WORKER;
> +	init_task_work(&oops_work, oops_handler);
> +	task_work_add(current, &oops_work, false);
>  woke_up:
>  	spin_lock_irq(&pool->lock);
>
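(For the record, I read the "clear PF_WQ_WORKER, probably do more
cleanups" placeholder roughly as below; this is only my illustration,
not your code. The part that worries me is that oops_work lives on the
worker's stack, and as I understand commit 2deb4be28 the oops path
rewinds that stack before calling do_exit(), so we can't rely on it
from there.)

static void oops_handler(struct callback_head *oops_work)
{
	/* Called from task_work when the task exits; only workers care. */
	if (!(current->flags & PF_WQ_WORKER))
		return;

	/*
	 * Detach the dying task from the workqueue sleep/wakeup hooks
	 * so __schedule() stops calling wq_worker_sleeping() on
	 * half-destroyed worker state.
	 */
	current->flags &= ~PF_WQ_WORKER;
}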