2018-06-29 21:19:01

by Isaac J. Manjarres

Subject: [PATCH v2] stop_machine: Disable preemption when waking two stopper threads

When cpu_stop_queue_two_works() begins to wake the stopper
threads, it does so without preemption disabled, which leads
to the following race condition:

The source CPU calls cpu_stop_queue_two_works(), with cpu1
as the source CPU, and cpu2 as the destination CPU. When
adding the stopper threads to the wake queue used in this
function, the source CPU stopper thread is added first,
and the destination CPU stopper thread is added last.

When wake_up_q() is invoked to wake the stopper threads, the
threads are woken up in the order that they are queued in,
so the source CPU's stopper thread is woken up first, and
it preempts the thread running on the source CPU.

The stopper thread will then execute on the source CPU,
disable preemption, and begin executing multi_cpu_stop(),
and wait for an ack from the destination CPU's stopper thread,
with preemption still disabled. Since the worker thread that
woke up the stopper thread on the source CPU is affine to the
source CPU, and preemption is disabled on the source CPU, that
thread will never run to dequeue the destination CPU's stopper
thread from the wake queue, and thus, the destination CPU's
stopper thread will never run, causing the source CPU's stopper
thread to wait forever, and stall.

Disable preemption when waking the stopper threads in
cpu_stop_queue_two_works() to ensure that the worker thread
that is waking up the stopper threads isn't preempted
by the source CPU's stopper thread, and permanently
scheduled out, leaving the remaining stopper thread asleep
in the wake queue.

Co-developed-by: Pavankumar Kondeti <[email protected]>
Signed-off-by: Prasad Sodagudi <[email protected]>
Signed-off-by: Pavankumar Kondeti <[email protected]>
Signed-off-by: Isaac J. Manjarres <[email protected]>
---
kernel/stop_machine.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index f89014a..1ff523d 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -270,7 +270,11 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 		goto retry;
 	}

-	wake_up_q(&wakeq);
+	if (!err) {
+		preempt_disable();
+		wake_up_q(&wakeq);
+		preempt_enable();
+	}

 	return err;
 }
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
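
The race described in the commit message can be sketched as a small userspace toy model. This is only an illustration under simplifying assumptions, not kernel code: every `toy_*` name is hypothetical, the "scheduler" is reduced to a single rule (a stopper woken on the waker's own CPU takes that CPU and never gives it back unless the waker has preemption disabled), and the wake queue is a plain array in queueing order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_state { TOY_ASLEEP, TOY_RUNNING };

struct toy_task {
	int cpu;		/* CPU the stopper thread is bound to */
	enum toy_state state;
};

/*
 * Wake the queued stoppers in order, from a waker pinned to waker_cpu.
 * Returns how many stoppers were woken before the waker lost its CPU.
 */
static size_t toy_wake_up_q(struct toy_task *q, size_t n, int waker_cpu,
			    bool waker_preempt_disabled)
{
	size_t woken = 0;

	for (size_t i = 0; i < n; i++) {
		q[i].state = TOY_RUNNING;
		woken++;

		/*
		 * A stopper woken on the waker's own CPU preempts the waker
		 * unless the waker disabled preemption. In this model the
		 * stopper then spins with preemption disabled, so the waker
		 * never resumes and the rest of the queue stays asleep.
		 */
		if (q[i].cpu == waker_cpu && !waker_preempt_disabled)
			break;
	}
	return woken;
}

/* True if every queued stopper is running, so the two-CPU stop can finish. */
static bool toy_all_running(const struct toy_task *q, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (q[i].state != TOY_RUNNING)
			return false;
	return true;
}

/* Replays the scenario above; returns true if both stoppers end up running. */
bool toy_scenario(bool waker_preempt_disabled)
{
	const int src_cpu = 0, dst_cpu = 1;
	struct toy_task q[2] = {
		{ .cpu = src_cpu, .state = TOY_ASLEEP },	/* queued first */
		{ .cpu = dst_cpu, .state = TOY_ASLEEP },	/* queued last */
	};
	size_t woken = toy_wake_up_q(q, 2, src_cpu, waker_preempt_disabled);

	assert(woken >= 1 && woken <= 2);
	return toy_all_running(q, 2);
}
```

In this model `toy_scenario(false)` leaves the destination stopper asleep, which corresponds to the stall, while `toy_scenario(true)` wakes both stoppers, which corresponds to wrapping `wake_up_q()` in `preempt_disable()`/`preempt_enable()` as the patch does.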



2018-07-02 04:53:00

by Pavankumar Kondeti

Subject: Re: [PATCH v2] stop_machine: Disable preemption when waking two stopper threads

Hi Isaac,

On Fri, Jun 29, 2018 at 01:55:12PM -0700, Isaac J. Manjarres wrote:
> When cpu_stop_queue_two_works() begins to wake the stopper
> threads, it does so without preemption disabled, which leads
> to the following race condition:
>
> The source CPU calls cpu_stop_queue_two_works(), with cpu1
> as the source CPU, and cpu2 as the destination CPU. When
> adding the stopper threads to the wake queue used in this
> function, the source CPU stopper thread is added first,
> and the destination CPU stopper thread is added last.
>
> When wake_up_q() is invoked to wake the stopper threads, the
> threads are woken up in the order that they are queued in,
> so the source CPU's stopper thread is woken up first, and
> it preempts the thread running on the source CPU.
>
> The stopper thread will then execute on the source CPU,
> disable preemption, and begin executing multi_cpu_stop(),
> and wait for an ack from the destination CPU's stopper thread,
> with preemption still disabled. Since the worker thread that
> woke up the stopper thread on the source CPU is affine to the
> source CPU, and preemption is disabled on the source CPU, that
> thread will never run to dequeue the destination CPU's stopper
> thread from the wake queue, and thus, the destination CPU's
> stopper thread will never run, causing the source CPU's stopper
> thread to wait forever, and stall.
>
> Disable preemption when waking the stopper threads in
> cpu_stop_queue_two_works() to ensure that the worker thread
> that is waking up the stopper threads isn't preempted
> by the source CPU's stopper thread, and permanently
> scheduled out, leaving the remaining stopper thread asleep
> in the wake queue.
>
> Co-developed-by: Pavankumar Kondeti <[email protected]>
> Signed-off-by: Prasad Sodagudi <[email protected]>
> Signed-off-by: Pavankumar Kondeti <[email protected]>
> Signed-off-by: Isaac J. Manjarres <[email protected]>
> ---

You might want to add the below Fixes tag and CC stable.

Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")

> kernel/stop_machine.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index f89014a..1ff523d 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -270,7 +270,11 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
>  		goto retry;
>  	}
>
> -	wake_up_q(&wakeq);
> +	if (!err) {
> +		preempt_disable();
> +		wake_up_q(&wakeq);
> +		preempt_enable();
> +	}
>
>  	return err;
>  }

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


2018-07-02 12:18:44

by Peter Zijlstra

Subject: Re: [PATCH v2] stop_machine: Disable preemption when waking two stopper threads

On Fri, Jun 29, 2018 at 01:55:12PM -0700, Isaac J. Manjarres wrote:
> When cpu_stop_queue_two_works() begins to wake the stopper
> threads, it does so without preemption disabled, which leads
> to the following race condition:
>
> The source CPU calls cpu_stop_queue_two_works(), with cpu1
> as the source CPU, and cpu2 as the destination CPU. When
> adding the stopper threads to the wake queue used in this
> function, the source CPU stopper thread is added first,
> and the destination CPU stopper thread is added last.
>
> When wake_up_q() is invoked to wake the stopper threads, the
> threads are woken up in the order that they are queued in,
> so the source CPU's stopper thread is woken up first, and
> it preempts the thread running on the source CPU.
>
> The stopper thread will then execute on the source CPU,
> disable preemption, and begin executing multi_cpu_stop(),
> and wait for an ack from the destination CPU's stopper thread,
> with preemption still disabled. Since the worker thread that
> woke up the stopper thread on the source CPU is affine to the
> source CPU, and preemption is disabled on the source CPU, that
> thread will never run to dequeue the destination CPU's stopper
> thread from the wake queue, and thus, the destination CPU's
> stopper thread will never run, causing the source CPU's stopper
> thread to wait forever, and stall.
>
> Disable preemption when waking the stopper threads in
> cpu_stop_queue_two_works() to ensure that the worker thread
> that is waking up the stopper threads isn't preempted
> by the source CPU's stopper thread, and permanently
> scheduled out, leaving the remaining stopper thread asleep
> in the wake queue.
>
> Co-developed-by: Pavankumar Kondeti <[email protected]>
> Signed-off-by: Prasad Sodagudi <[email protected]>
> Signed-off-by: Pavankumar Kondeti <[email protected]>
> Signed-off-by: Isaac J. Manjarres <[email protected]>

That SoB chain is broken; if Prasad wrote the patch then there needs to
be a From: line somewhere.

But yes, that looks about right.

> ---
> kernel/stop_machine.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index f89014a..1ff523d 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -270,7 +270,11 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
>  		goto retry;
>  	}
>
> -	wake_up_q(&wakeq);
> +	if (!err) {
> +		preempt_disable();
> +		wake_up_q(&wakeq);
> +		preempt_enable();
> +	}
>
>  	return err;
>  }
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

2018-07-03 05:54:41

by Isaac J. Manjarres

Subject: Re: [PATCH v2] stop_machine: Disable preemption when waking two stopper threads

Hi Peter,

Thanks for the feedback. I'll make sure to incorporate it into my next
patch, and send that soon.

Thanks,
Isaac Manjarres

On 2018-07-02 05:15, Peter Zijlstra wrote:
> On Fri, Jun 29, 2018 at 01:55:12PM -0700, Isaac J. Manjarres wrote:
>> When cpu_stop_queue_two_works() begins to wake the stopper
>> threads, it does so without preemption disabled, which leads
>> to the following race condition:
>>
>> The source CPU calls cpu_stop_queue_two_works(), with cpu1
>> as the source CPU, and cpu2 as the destination CPU. When
>> adding the stopper threads to the wake queue used in this
>> function, the source CPU stopper thread is added first,
>> and the destination CPU stopper thread is added last.
>>
>> When wake_up_q() is invoked to wake the stopper threads, the
>> threads are woken up in the order that they are queued in,
>> so the source CPU's stopper thread is woken up first, and
>> it preempts the thread running on the source CPU.
>>
>> The stopper thread will then execute on the source CPU,
>> disable preemption, and begin executing multi_cpu_stop(),
>> and wait for an ack from the destination CPU's stopper thread,
>> with preemption still disabled. Since the worker thread that
>> woke up the stopper thread on the source CPU is affine to the
>> source CPU, and preemption is disabled on the source CPU, that
>> thread will never run to dequeue the destination CPU's stopper
>> thread from the wake queue, and thus, the destination CPU's
>> stopper thread will never run, causing the source CPU's stopper
>> thread to wait forever, and stall.
>>
>> Disable preemption when waking the stopper threads in
>> cpu_stop_queue_two_works() to ensure that the worker thread
>> that is waking up the stopper threads isn't preempted
>> by the source CPU's stopper thread, and permanently
>> scheduled out, leaving the remaining stopper thread asleep
>> in the wake queue.
>>
>> Co-developed-by: Pavankumar Kondeti <[email protected]>
>> Signed-off-by: Prasad Sodagudi <[email protected]>
>> Signed-off-by: Pavankumar Kondeti <[email protected]>
>> Signed-off-by: Isaac J. Manjarres <[email protected]>
>
> That SoB chain is broken; if Prasad wrote the patch then there needs to
> be a From: line somewhere.
>
> But yes, that looks about right.
>
>> ---
>> kernel/stop_machine.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index f89014a..1ff523d 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -270,7 +270,11 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
>>  		goto retry;
>>  	}
>>
>> -	wake_up_q(&wakeq);
>> +	if (!err) {
>> +		preempt_disable();
>> +		wake_up_q(&wakeq);
>> +		preempt_enable();
>> +	}
>>
>>  	return err;
>>  }
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
>> Forum,
>> a Linux Foundation Collaborative Project
>>