2016-10-25 11:04:06

by Roman Pen

[permalink] [raw]
Subject: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook

If panic_on_oops is not set and oops happens inside workqueue kthread,
kernel kills this kthread. Current patch fixes recursive GPF which
happens when wq_worker_sleeping() function unconditionally accesses
the NULL kthread->vfork_done ptr thru kthread_data() -> to_kthread().

The stack is the following:

[<ffffffff81397f75>] dump_stack+0x68/0x93
[<ffffffff8106954b>] ? do_exit+0x7ab/0xc10
[<ffffffff8108fd73>] __schedule_bug+0x83/0xe0
[<ffffffff81716d5a>] __schedule+0x7ea/0xba0
[<ffffffff810c864f>] ? vprintk_default+0x1f/0x30
[<ffffffff8116a63c>] ? printk+0x48/0x50
[<ffffffff81717150>] schedule+0x40/0x90
[<ffffffff8106976a>] do_exit+0x9ca/0xc10
[<ffffffff810c8e3d>] ? kmsg_dump+0x11d/0x190
[<ffffffff810c8d37>] ? kmsg_dump+0x17/0x190
[<ffffffff81021ee9>] oops_end+0x99/0xd0
[<ffffffff81052da5>] no_context+0x185/0x3e0
[<ffffffff81053083>] __bad_area_nosemaphore+0x83/0x1c0
[<ffffffff810c820e>] ? vprintk_emit+0x25e/0x530
[<ffffffff810531d4>] bad_area_nosemaphore+0x14/0x20
[<ffffffff8105355c>] __do_page_fault+0xac/0x570
[<ffffffff810c66fe>] ? console_trylock+0x1e/0xe0
[<ffffffff81002036>] ? trace_hardirqs_off_thunk+0x1a/0x1c
[<ffffffff81053a2c>] do_page_fault+0xc/0x10
[<ffffffff8171f812>] page_fault+0x22/0x30
[<ffffffff81089bc3>] ? kthread_data+0x33/0x40
[<ffffffff8108427e>] ? wq_worker_sleeping+0xe/0x80
[<ffffffff817169eb>] __schedule+0x47b/0xba0
[<ffffffff81717150>] schedule+0x40/0x90
[<ffffffff8106957d>] do_exit+0x7dd/0xc10
[<ffffffff81021ee9>] oops_end+0x99/0xd0

kthread->vfork_done is zeroed out on the following path:

do_exit()
exit_mm()
mm_release()
complete_vfork_done()

In order to fix a bug dead tasks must be ignored.

Signed-off-by: Roman Pen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: [email protected]
---
v3:
o minor comment and coding style fixes.

v2:
o put a task->state check directly into a wq_worker_sleeping() function
instead of changing the __schedule().

kernel/workqueue.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9dc7ac5101e0..1525456b18c6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -875,9 +875,33 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
*/
struct task_struct *wq_worker_sleeping(struct task_struct *task)
{
- struct worker *worker = kthread_data(task), *to_wakeup = NULL;
+ struct worker *worker, *to_wakeup = NULL;
struct worker_pool *pool;

+
+ if (task->state == TASK_DEAD) {
+ /*
+ * Here we try to catch the following path before
+ * accessing NULL kthread->vfork_done ptr thru
+ * kthread_data():
+ *
+ * oops_end()
+ * do_exit()
+ * schedule()
+ *
+ * If panic_on_oops is not set and oops happens on
+ * a workqueue execution path, thread will be killed.
+ * That is definitly sad, but not to make the situation
+ * even worse we have to ignore dead tasks in order not
+ * to step on zeroed out members (e.g. t->vfork_done is
+ * already NULL on that path, since we were called by
+ * do_exit())).
+ */
+ return NULL;
+ }
+
+ worker = kthread_data(task);
+
/*
* Rescuers, which may not have all the fields set up like normal
* workers, also reach here, let's not access anything before
--
2.9.3


2016-10-25 12:57:51

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook

On 10/25, Roman Pen wrote:
>
> struct task_struct *wq_worker_sleeping(struct task_struct *task)
> {
> - struct worker *worker = kthread_data(task), *to_wakeup = NULL;
> + struct worker *worker, *to_wakeup = NULL;
> struct worker_pool *pool;
>
> +
> + if (task->state == TASK_DEAD) {
> + /*
> + * Here we try to catch the following path before
> + * accessing NULL kthread->vfork_done ptr thru
> + * kthread_data():
> + *
> + * oops_end()
> + * do_exit()
> + * schedule()
> + *
> + * If panic_on_oops is not set and oops happens on
> + * a workqueue execution path, thread will be killed.
> + * That is definitly sad, but not to make the situation
> + * even worse we have to ignore dead tasks in order not
> + * to step on zeroed out members (e.g. t->vfork_done is
> + * already NULL on that path, since we were called by
> + * do_exit())).
> + */
> + return NULL;
> + }

I still think that PF_EXITING check makes more sense than TASK_DEAD,
but I won't insist.

Oleg.

2016-10-25 13:01:32

by Roman Pen

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook

On Tue, Oct 25, 2016 at 2:56 PM, Oleg Nesterov <[email protected]> wrote:
> On 10/25, Roman Pen wrote:
>>
>> struct task_struct *wq_worker_sleeping(struct task_struct *task)
>> {
>> - struct worker *worker = kthread_data(task), *to_wakeup = NULL;
>> + struct worker *worker, *to_wakeup = NULL;
>> struct worker_pool *pool;
>>
>> +
>> + if (task->state == TASK_DEAD) {
>> + /*
>> + * Here we try to catch the following path before
>> + * accessing NULL kthread->vfork_done ptr thru
>> + * kthread_data():
>> + *
>> + * oops_end()
>> + * do_exit()
>> + * schedule()
>> + *
>> + * If panic_on_oops is not set and oops happens on
>> + * a workqueue execution path, thread will be killed.
>> + * That is definitly sad, but not to make the situation
>> + * even worse we have to ignore dead tasks in order not
>> + * to step on zeroed out members (e.g. t->vfork_done is
>> + * already NULL on that path, since we were called by
>> + * do_exit())).
>> + */
>> + return NULL;
>> + }
>
> I still think that PF_EXITING check makes more sense than TASK_DEAD,
> but I won't insist.

Why? I probably do not see the corner cases, so, please, explain.

--
Roman

2016-10-25 14:21:00

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook

On 10/25, Roman Penyaev wrote:
>
> On Tue, Oct 25, 2016 at 2:56 PM, Oleg Nesterov <[email protected]> wrote:
> > On 10/25, Roman Pen wrote:
> >>
> >> struct task_struct *wq_worker_sleeping(struct task_struct *task)
> >> {
> >> - struct worker *worker = kthread_data(task), *to_wakeup = NULL;
> >> + struct worker *worker, *to_wakeup = NULL;
> >> struct worker_pool *pool;
> >>
> >> +
> >> + if (task->state == TASK_DEAD) {
> >> + /*
> >> + * Here we try to catch the following path before
> >> + * accessing NULL kthread->vfork_done ptr thru
> >> + * kthread_data():
> >> + *
> >> + * oops_end()
> >> + * do_exit()
> >> + * schedule()
> >> + *
> >> + * If panic_on_oops is not set and oops happens on
> >> + * a workqueue execution path, thread will be killed.
> >> + * That is definitly sad, but not to make the situation
> >> + * even worse we have to ignore dead tasks in order not
> >> + * to step on zeroed out members (e.g. t->vfork_done is
> >> + * already NULL on that path, since we were called by
> >> + * do_exit())).
> >> + */
> >> + return NULL;
> >> + }
> >
> > I still think that PF_EXITING check makes more sense than TASK_DEAD,
> > but I won't insist.
>
> Why? I probably do not see the corner cases, so, please, explain.

If nothing else the crashed worker can schedule() before do_task_dead(),

But mainly, to me PF_EXITING just looks better. TASK_DEAD is the very
special state, only sched/core.c should use it.

and... perhaps we can just add

void oops_end_exit(void)
{
current->flags &= ~PF_WQ_WORKER;
perhaps sonething else;
}

called by oops_end() before rewind_stack_do_exit() ?

Oleg.

2016-10-25 15:58:48

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook

On 10/25, Oleg Nesterov wrote:
>
> void oops_end_exit(void)
> {
> current->flags &= ~PF_WQ_WORKER;
> perhaps sonething else;
> }
>
> called by oops_end() before rewind_stack_do_exit() ?

and "perhaps sonething else" above should probably clear current->plug,
it likely points to "struct blk_plug" on stack.

and perhaps call task_work_run(). Currently only irq_thread() uses the
"destructor" work on stack, but think can have more users.

Oleg.

2016-10-25 16:09:41

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook

sorry for noise, forgot to mention...

On 10/25, Oleg Nesterov wrote:
>
> On 10/25, Oleg Nesterov wrote:
> >
> > void oops_end_exit(void)
> > {
> > current->flags &= ~PF_WQ_WORKER;
> > perhaps sonething else;
> > }
> >
> > called by oops_end() before rewind_stack_do_exit() ?
>
> and "perhaps sonething else" above should probably clear current->plug,
> it likely points to "struct blk_plug" on stack.
>
> and perhaps call task_work_run(). Currently only irq_thread() uses the
> "destructor" work on stack, but think can have more users.

and, probably absorb some code from do_exit(), say set_fs(USER_DS) and/or
PF_EXITING check.

Oleg.

2016-10-25 18:17:20

by Roman Pen

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook

On Tue, Oct 25, 2016 at 6:08 PM, Oleg Nesterov <[email protected]> wrote:
> sorry for noise, forgot to mention...
>
> On 10/25, Oleg Nesterov wrote:
>>
>> On 10/25, Oleg Nesterov wrote:
>> >
>> > void oops_end_exit(void)
>> > {
>> > current->flags &= ~PF_WQ_WORKER;
>> > perhaps sonething else;
>> > }
>> >
>> > called by oops_end() before rewind_stack_do_exit() ?
>>
>> and "perhaps sonething else" above should probably clear current->plug,
>> it likely points to "struct blk_plug" on stack.
>>
>> and perhaps call task_work_run(). Currently only irq_thread() uses the
>> "destructor" work on stack, but think can have more users.
>
> and, probably absorb some code from do_exit(), say set_fs(USER_DS) and/or
> PF_EXITING check.

I agree that for stack which was rewound we need more cleanups. But in the
current patch I do not want to mix several cases. Here I try to avoid access
to a NULL pointer of a dead task. That is not related to a corrupted stack.

Regarding the PF_WQ_WORKER flags. I liked the idea that workqueue itself
should be able to detect that task is dead. Spreading the code which does
flags cleanup for a special workqueue case IMO is not nice.

And, indeed, PF_EXITING looks more generic. Thanks. I will resend this.

--
Roman