2015-08-28 04:03:01

by Wanpeng Li

Subject: [PATCH v2] sched: fix calling do_set_cpus_allowed() without tsk->pi_lock held

------------[ cut here ]------------
WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
Modules linked in:
CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
Call Trace:
dump_stack+0x4b/0x75
warn_slowpath_common+0x8b/0xc0
? do_set_cpus_allowed+0x7e/0x80
? do_set_cpus_allowed+0x7e/0x80
warn_slowpath_null+0x22/0x30
do_set_cpus_allowed+0x7e/0x80
cpuset_cpus_allowed_fallback+0x7c/0x170
? cpuset_cpus_allowed+0x180/0x180
select_fallback_rq+0x221/0x280
migration_call+0xe3/0x250
notifier_call_chain+0x53/0x70
__raw_notifier_call_chain+0x1e/0x30
cpu_notify+0x28/0x50
take_cpu_down+0x22/0x40
multi_cpu_stop+0xd5/0x140
? __stop_cpus+0x80/0x80
cpu_stopper_thread+0xbc/0x170
? preempt_count_sub+0x9/0x50
? _raw_spin_unlock_irq+0x37/0x50
? _raw_spin_unlock_irqrestore+0x55/0x70
? trace_hardirqs_on_caller+0x144/0x1e0
? cpu_stop_should_run+0x35/0x40
? preempt_count_sub+0x9/0x50
? _raw_spin_unlock_irqrestore+0x41/0x70
smpboot_thread_fn+0x174/0x2f0
? sort_range+0x30/0x30
kthread+0xc4/0xe0
ret_from_kernel_thread+0x21/0x30
? kthread_create_on_node+0x180/0x180
---[ end trace 15f4c86d404693b0 ]---
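
The warning comes from do_set_cpus_allowed() itself, which in this tree
(note the -g25834c7 version string) sanity-checks the affinity-locking
rules with lockdep. Paraphrased, not the exact upstream code, the check
amounts to:

	/*
	 * Paraphrase of the rules Peter spells out below: p->pi_lock
	 * must be held, and rq->lock as well while the task is still
	 * queued on a runqueue.
	 */
	lockdep_assert_held(&p->pi_lock);
	if (task_on_rq_queued(p))
		lockdep_assert_held(&task_rq(p)->lock);

migrate_tasks() reaches do_set_cpus_allowed() through
select_fallback_rq() -> cpuset_cpus_allowed_fallback() while holding
only rq->lock, not tsk->pi_lock, so the assertion trips.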

As Peterz pointed out:

| So the normal rules for changing task_struct::cpus_allowed are holding
| both pi_lock and rq->lock, such that holding either stabilizes the mask.
|
| This is so that wakeup can happen without rq->lock and load-balance
| without pi_lock.
|
| From this we already get the relaxation that we can omit acquiring
| rq->lock if the task is not on the rq, because in that case
| load-balancing will not apply to it.
|
| ** these are the rules currently tested in do_set_cpus_allowed() **
|
| Now, since __set_cpus_allowed_ptr() uses task_rq_lock() which
| unconditionally acquires both locks, we could get away with holding just
| rq->lock when on_rq for modification because that'd still exclude
| __set_cpus_allowed_ptr(), it would also work against
| __kthread_bind_mask() because that assumes !on_rq.
|
| That said, this is all somewhat fragile.
|
| Now, I don't think dropping rq->lock is quite as disastrous as it
| usually is because !cpu_active at this point, which means load-balance
| will not interfere, but that too is somewhat fragile.
|
| So we end up with a choice of two fragile..
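
For reference, a minimal sketch of the safe pattern Peter describes:
task_rq_lock() takes p->pi_lock and then the lock of the task's current
rq, retrying if the task migrates in between, so both locks are held
across the mask update. The function name here is made up, and the
task_rq_lock() signature assumed is the 4.2-era one:

	static void set_cpus_allowed_sketch(struct task_struct *p,
					    const struct cpumask *new_mask)
	{
		unsigned long flags;
		struct rq *rq;

		/* Acquires p->pi_lock, then the owning rq->lock. */
		rq = task_rq_lock(p, &flags);
		do_set_cpus_allowed(p, new_mask);
		task_rq_unlock(rq, p, &flags);	/* Releases both. */
	}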

This patch fixes it by following the rules for changing task_struct::cpus_allowed:
hold both pi_lock and rq->lock across the update.

Reported-by: kernel test robot <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
v1 -> v2:
* fix the silly double lock stuff
* follow the rules for changing task_struct::cpus_allowed

kernel/sched/core.c | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b3386c6..8cf87e3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5186,6 +5186,27 @@ static void migrate_tasks(struct rq *dead_rq)
BUG_ON(!next);
next->sched_class->put_prev_task(rq, next);

+ /*
+ * Rules for changing task_struct::cpus_allowed are holding
+ * both pi_lock and rq->lock, such that holding either
+ * stabilizes the mask.
+ *
+ * Dropping rq->lock is not quite as disastrous as it usually is
+ * because !cpu_active at this point, which means load-balance
+ * will not interfere.
+ */
+ lockdep_unpin_lock(&rq->lock);
+ raw_spin_unlock(&rq->lock);
+ raw_spin_lock(&next->pi_lock);
+ raw_spin_lock(&rq->lock);
+ lockdep_pin_lock(&rq->lock);
+ if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
+ lockdep_unpin_lock(&rq->lock);
+ raw_spin_unlock(&rq->lock);
+ raw_spin_unlock(&next->pi_lock);
+ continue;
+ }
+
/* Find suitable destination for @next, with force if needed. */
dest_cpu = select_fallback_rq(dead_rq->cpu, next);

@@ -5196,6 +5217,7 @@ static void migrate_tasks(struct rq *dead_rq)
rq = dead_rq;
raw_spin_lock(&rq->lock);
}
+ raw_spin_unlock(&next->pi_lock);
}

rq->stop = stop;
--
1.7.1


2015-08-28 06:33:57

by Boqun Feng

Subject: Re: [PATCH v2] sched: fix calling do_set_cpus_allowed() without tsk->pi_lock held

Hi Wanpeng,

On Fri, Aug 28, 2015 at 12:02:47PM +0800, Wanpeng Li wrote:
<snip>
> This patch fixes it by following the rules for changing task_struct::cpus_allowed:
> hold both pi_lock and rq->lock across the update.
>
> Reported-by: kernel test robot <[email protected]>
> Reported-by: Sasha Levin <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> v1 -> v2:
> * fix the silly double lock stuff
> * follow the rules for changing task_struct::cpus_allowed
>
> kernel/sched/core.c | 22 ++++++++++++++++++++++
> 1 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b3386c6..8cf87e3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5186,6 +5186,27 @@ static void migrate_tasks(struct rq *dead_rq)
> BUG_ON(!next);
> next->sched_class->put_prev_task(rq, next);
>
> + /*
> + * Rules for changing task_struct::cpus_allowed are holding
> + * both pi_lock and rq->lock, such that holding either
> + * stabilizes the mask.
> + *
> + * Dropping rq->lock is not quite as disastrous as it usually is
> + * because !cpu_active at this point, which means load-balance
> + * will not interfere.
> + */
> + lockdep_unpin_lock(&rq->lock);
> + raw_spin_unlock(&rq->lock);
> + raw_spin_lock(&next->pi_lock);
> + raw_spin_lock(&rq->lock);
> + lockdep_pin_lock(&rq->lock);
> + if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
> + lockdep_unpin_lock(&rq->lock);
> + raw_spin_unlock(&rq->lock);

Dropping rq->lock here means we will continue the loop without
rq->lock held, right? But there is a lockdep_pin_lock(&rq->lock) at
the beginning of every iteration, which expects rq->lock to be held.
So can we really release rq->lock here?

Regards,
Boqun

> + raw_spin_unlock(&next->pi_lock);
> + continue;
> + }
> +
> /* Find suitable destination for @next, with force if needed. */
> dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>
> @@ -5196,6 +5217,7 @@ static void migrate_tasks(struct rq *dead_rq)
> rq = dead_rq;
> raw_spin_lock(&rq->lock);
> }
> + raw_spin_unlock(&next->pi_lock);
> }
>
> rq->stop = stop;
> --
> 1.7.1



2015-08-28 06:43:37

by Wanpeng Li

Subject: Re: [PATCH v2] sched: fix calling do_set_cpus_allowed() without tsk->pi_lock held

Hi Boqun,
On 8/28/15 2:33 PM, Boqun Feng wrote:
> Hi Wanpeng,
>
> On Fri, Aug 28, 2015 at 12:02:47PM +0800, Wanpeng Li wrote:
> <snip>
>> This patch fixes it by following the rules for changing task_struct::cpus_allowed:
>> hold both pi_lock and rq->lock across the update.
>>
>> Reported-by: kernel test robot <[email protected]>
>> Reported-by: Sasha Levin <[email protected]>
>> Signed-off-by: Wanpeng Li <[email protected]>
>> ---
>> v1 -> v2:
>> * fix the silly double lock stuff
>> * follow the rules for changing task_struct::cpus_allowed
>>
>> kernel/sched/core.c | 22 ++++++++++++++++++++++
>> 1 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index b3386c6..8cf87e3 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5186,6 +5186,27 @@ static void migrate_tasks(struct rq *dead_rq)
>> BUG_ON(!next);
>> next->sched_class->put_prev_task(rq, next);
>>
>> + /*
>> + * Rules for changing task_struct::cpus_allowed are holding
>> + * both pi_lock and rq->lock, such that holding either
>> + * stabilizes the mask.
>> + *
>> + * Dropping rq->lock is not quite as disastrous as it usually is
>> + * because !cpu_active at this point, which means load-balance
>> + * will not interfere.
>> + */
>> + lockdep_unpin_lock(&rq->lock);
>> + raw_spin_unlock(&rq->lock);
>> + raw_spin_lock(&next->pi_lock);
>> + raw_spin_lock(&rq->lock);
>> + lockdep_pin_lock(&rq->lock);
>> + if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
>> + lockdep_unpin_lock(&rq->lock);
>> + raw_spin_unlock(&rq->lock);
> Dropping rq->lock here means we will continue the loop without the
> rq->lock, right? But we do have a lockdep_pin_lock(&rq->lock) in the
> beginning of every iteration. So can we release rq->lock here?

Good catch! I think there is no need to lockdep_unpin and unlock
rq->lock on that path; we can keep rq->lock held and only release
pi_lock before continuing.
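
To make that concrete, a sketch of the corrected early-exit path,
assuming a follow-up version would do it this way (this is not the
posted patch):

	if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
		/*
		 * The task was migrated or dequeued while rq->lock was
		 * dropped: keep rq->lock held and pinned, as the next
		 * loop iteration expects, and only release pi_lock.
		 */
		raw_spin_unlock(&next->pi_lock);
		continue;
	}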

Regards,
Wanpeng Li

>
> Regards,
> Boqun
>
>> + raw_spin_unlock(&next->pi_lock);
>> + continue;
>> + }
>> +
>> /* Find suitable destination for @next, with force if needed. */
>> dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>>
>> @@ -5196,6 +5217,7 @@ static void migrate_tasks(struct rq *dead_rq)
>> rq = dead_rq;
>> raw_spin_lock(&rq->lock);
>> }
>> + raw_spin_unlock(&next->pi_lock);
>> }
>>
>> rq->stop = stop;
>> --
>> 1.7.1