Date: Tue, 25 Oct 2016 16:19:20 +0200
From: Oleg Nesterov <oleg@redhat.com>
To: Roman Penyaev <roman.penyaev@profitbricks.com>
Cc: Andy Lutomirski <luto@kernel.org>,
        Peter Zijlstra <peterz@infradead.org>,
        Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
        Tejun Heo <tj@kernel.org>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue
        sleep hook
Message-ID: <20161025141920.GC4326@redhat.com>
References: <20161025110357.8821-1-roman.penyaev@profitbricks.com> <20161025125615.GA4326@redhat.com> <CAJrWOzBCa5fqV0N_OK65F4wqJWyXLs+J_m4e8_VWP3BAPMMTxw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAJrWOzBCa5fqV0N_OK65F4wqJWyXLs+J_m4e8_VWP3BAPMMTxw@mail.gmail.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1883
Lines: 54

On 10/25, Roman Penyaev wrote:
>
> On Tue, Oct 25, 2016 at 2:56 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 10/25, Roman Pen wrote:
> >>
> >>  struct task_struct *wq_worker_sleeping(struct task_struct *task)
> >>  {
> >> -     struct worker *worker = kthread_data(task), *to_wakeup = NULL;
> >> +     struct worker *worker, *to_wakeup = NULL;
> >>       struct worker_pool *pool;
> >>
> >> +
> >> +     if (task->state == TASK_DEAD) {
> >> +             /*
> >> +              * Here we try to catch the following path before
> >> +              * accessing NULL kthread->vfork_done ptr thru
> >> +              * kthread_data():
> >> +              *
> >> +              *    oops_end()
> >> +              *    do_exit()
> >> +              *    schedule()
> >> +              *
> >> +              * If panic_on_oops is not set and oops happens on
> >> +              * a workqueue execution path, thread will be killed.
> >> +              * That is definitely sad, but not to make the situation
> >> +              * even worse we have to ignore dead tasks in order not
> >> +              * to step on zeroed out members (e.g. t->vfork_done is
> >> +              * already NULL on that path, since we were called by
> >> +              * do_exit())).
> >> +              */
> >> +             return NULL;
> >> +     }
> >
> > I still think that PF_EXITING check makes more sense than TASK_DEAD,
> > but I won't insist.
>
> Why?  I probably do not see the corner cases, so, please, explain.

If nothing else, the crashed worker can schedule() before do_task_dead(),
while its state is not TASK_DEAD yet, so the TASK_DEAD check misses that case.

But mainly, to me PF_EXITING just looks better. TASK_DEAD is a very
special state, only sched/core.c should use it.
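
To illustrate the window between the two checks, here is a small userspace
sketch, not kernel code. The struct, the flag and state values, and all
function names are stand-ins invented for the example; it only assumes that
PF_EXITING is set early in do_exit() and that the state becomes TASK_DEAD
only in do_task_dead().

```c
#include <stdbool.h>

/* Illustrative stand-ins; these values are not copied from kernel headers. */
#define PF_EXITING    0x4u
#define TASK_RUNNING  0L
#define TASK_DEAD     64L

/* Hypothetical, minimal model of the two task fields being discussed. */
struct task_model {
	unsigned int flags;
	long state;
};

/* Models the early part of do_exit(): the task is marked as exiting. */
static void model_exit_begin(struct task_model *t)
{
	t->flags |= PF_EXITING;
}

/* Models do_task_dead(): only here does the state become TASK_DEAD. */
static void model_task_dead(struct task_model *t)
{
	t->state = TASK_DEAD;
}

/* The check from the patch: skip the sleep hook only for dead tasks. */
static bool skip_by_task_dead(const struct task_model *t)
{
	return t->state == TASK_DEAD;
}

/* The alternative: skip the sleep hook for any exiting task. */
static bool skip_by_pf_exiting(const struct task_model *t)
{
	return (t->flags & PF_EXITING) != 0;
}
```

In this model, a task that has passed model_exit_begin() but not yet
model_task_dead() is skipped by the PF_EXITING check and not by the
TASK_DEAD one.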

and... perhaps we can just add

	void oops_end_exit(void)
	{
		current->flags &= ~PF_WQ_WORKER;
		/* perhaps something else */
	}

called by oops_end() before rewind_stack_do_exit()?
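
A userspace sketch of what clearing the flag would buy us, again not kernel
code. It assumes the scheduler calls the workqueue sleep hook only for tasks
that have PF_WQ_WORKER set; the struct, the flag value and the function names
are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Illustrative stand-in; the value is not copied from kernel headers. */
#define PF_WQ_WORKER  0x20u

/* Hypothetical, minimal model of a task carrying only its flags. */
struct task_model {
	unsigned int flags;
};

/* Models the proposed helper: the task stops being a workqueue worker. */
static void model_oops_end_exit(struct task_model *t)
{
	t->flags &= ~PF_WQ_WORKER;
}

/* Models the assumed gate in the scheduler in front of the sleep hook. */
static bool sleep_hook_would_run(const struct task_model *t)
{
	return (t->flags & PF_WQ_WORKER) != 0;
}
```

With the flag cleared before the exit path runs, the hook is never entered
for the crashed worker, so no check inside wq_worker_sleeping() is needed
in this model.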

Oleg.