------------[ cut here ]------------
WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
Modules linked in:
CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
Call Trace:
dump_stack+0x4b/0x75
warn_slowpath_common+0x8b/0xc0
? do_set_cpus_allowed+0x7e/0x80
? do_set_cpus_allowed+0x7e/0x80
warn_slowpath_null+0x22/0x30
do_set_cpus_allowed+0x7e/0x80
cpuset_cpus_allowed_fallback+0x7c/0x170
? cpuset_cpus_allowed+0x180/0x180
select_fallback_rq+0x221/0x280
migration_call+0xe3/0x250
notifier_call_chain+0x53/0x70
__raw_notifier_call_chain+0x1e/0x30
cpu_notify+0x28/0x50
take_cpu_down+0x22/0x40
multi_cpu_stop+0xd5/0x140
? __stop_cpus+0x80/0x80
cpu_stopper_thread+0xbc/0x170
? preempt_count_sub+0x9/0x50
? _raw_spin_unlock_irq+0x37/0x50
? _raw_spin_unlock_irqrestore+0x55/0x70
? trace_hardirqs_on_caller+0x144/0x1e0
? cpu_stop_should_run+0x35/0x40
? preempt_count_sub+0x9/0x50
? _raw_spin_unlock_irqrestore+0x41/0x70
smpboot_thread_fn+0x174/0x2f0
? sort_range+0x30/0x30
kthread+0xc4/0xe0
ret_from_kernel_thread+0x21/0x30
? kthread_create_on_node+0x180/0x180
---[ end trace 15f4c86d404693b0 ]---
As Peterz pointed out:
| So the normal rules for changing task_struct::cpus_allowed are holding
| both pi_lock and rq->lock, such that holding either stabilizes the mask.
|
| This is so that wakeup can happen without rq->lock and load-balance
| without pi_lock.
|
| From this we already get the relaxation that we can omit acquiring
| rq->lock if the task is not on the rq, because in that case
| load-balancing will not apply to it.
|
| ** these are the rules currently tested in do_set_cpus_allowed() **
|
| Now, since __set_cpus_allowed_ptr() uses task_rq_lock() which
| unconditionally acquires both locks, we could get away with holding just
| rq->lock when on_rq for modification because that'd still exclude
| __set_cpus_allowed_ptr(), it would also work against
| __kthread_bind_mask() because that assumes !on_rq.
|
| That said, this is all somewhat fragile.
|
| Now, I don't think dropping rq->lock is quite as disastrous as it
| usually is because !cpu_active at this point, which means load-balance
| will not interfere, but that too is somewhat fragile.
|
| So we end up with a choice of two fragile..
This patch fix it by following the rules for changing task_struct::cpus_allowed
w/ both pi_lock and rq->lock are held.
Reported-by: kernel test robot <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
v1 -> v2:
* fix the silly double lock stuff
* follow the rules for changing task_struct::cpus_allowed
kernel/sched/core.c | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b3386c6..8cf87e3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5186,6 +5186,27 @@ static void migrate_tasks(struct rq *dead_rq)
BUG_ON(!next);
next->sched_class->put_prev_task(rq, next);
+ /*
+ * Rules for changing task_struct::cpus_allowed are holding
+ * both pi_lock and rq->lock, such that holding either
+ * stabilizes the mask.
+ *
+ * Drop rq->lock is not quite as disastrous as it usually is
+ * because !cpu_active at this point, which means load-balance
+ * will not interfere.
+ */
+ lockdep_unpin_lock(&rq->lock);
+ raw_spin_unlock(&rq->lock);
+ raw_spin_lock(&next->pi_lock);
+ raw_spin_lock(&rq->lock);
+ lockdep_pin_lock(&rq->lock);
+ if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
+ lockdep_unpin_lock(&rq->lock);
+ raw_spin_unlock(&rq->lock);
+ raw_spin_unlock(&next->pi_lock);
+ continue;
+ }
+
/* Find suitable destination for @next, with force if needed. */
dest_cpu = select_fallback_rq(dead_rq->cpu, next);
@@ -5196,6 +5217,7 @@ static void migrate_tasks(struct rq *dead_rq)
rq = dead_rq;
raw_spin_lock(&rq->lock);
}
+ raw_spin_unlock(&next->pi_lock);
}
rq->stop = stop;
--
1.7.1
Hi Wanpeng,
On Fri, Aug 28, 2015 at 12:02:47PM +0800, Wanpeng Li wrote:
<snip>
> This patch fix it by following the rules for changing task_struct::cpus_allowed
> w/ both pi_lock and rq->lock are held.
>
> Reported-by: kernel test robot <[email protected]>
> Reported-by: Sasha Levin <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> v1 -> v2:
> * fix the silly double lock stuff
> * follow the rules for changing task_struct::cpus_allowed
>
> kernel/sched/core.c | 22 ++++++++++++++++++++++
> 1 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b3386c6..8cf87e3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5186,6 +5186,27 @@ static void migrate_tasks(struct rq *dead_rq)
> BUG_ON(!next);
> next->sched_class->put_prev_task(rq, next);
>
> + /*
> + * Rules for changing task_struct::cpus_allowed are holding
> + * both pi_lock and rq->lock, such that holding either
> + * stabilizes the mask.
> + *
> + * Drop rq->lock is not quite as disastrous as it usually is
> + * because !cpu_active at this point, which means load-balance
> + * will not interfere.
> + */
> + lockdep_unpin_lock(&rq->lock);
> + raw_spin_unlock(&rq->lock);
> + raw_spin_lock(&next->pi_lock);
> + raw_spin_lock(&rq->lock);
> + lockdep_pin_lock(&rq->lock);
> + if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
> + lockdep_unpin_lock(&rq->lock);
> + raw_spin_unlock(&rq->lock);
Dropping rq->lock here means we will continue the loop without the
rq->lock, right? But we do have a lockdep_pin_lock(&rq->lock) in the
beginning of every iteration. So can we release rq->lock here?
Regards,
Boqun
> + raw_spin_unlock(&next->pi_lock);
> + continue;
> + }
> +
> /* Find suitable destination for @next, with force if needed. */
> dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>
> @@ -5196,6 +5217,7 @@ static void migrate_tasks(struct rq *dead_rq)
> rq = dead_rq;
> raw_spin_lock(&rq->lock);
> }
> + raw_spin_unlock(&next->pi_lock);
> }
>
> rq->stop = stop;
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Hi Boqun,
On 8/28/15 2:33 PM, Boqun Feng wrote:
> Hi Wanpeng,
>
> On Fri, Aug 28, 2015 at 12:02:47PM +0800, Wanpeng Li wrote:
> <snip>
>> This patch fix it by following the rules for changing task_struct::cpus_allowed
>> w/ both pi_lock and rq->lock are held.
>>
>> Reported-by: kernel test robot <[email protected]>
>> Reported-by: Sasha Levin <[email protected]>
>> Signed-off-by: Wanpeng Li <[email protected]>
>> ---
>> v1 -> v2:
>> * fix the silly double lock stuff
>> * follow the rules for changing task_struct::cpus_allowed
>>
>> kernel/sched/core.c | 22 ++++++++++++++++++++++
>> 1 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index b3386c6..8cf87e3 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5186,6 +5186,27 @@ static void migrate_tasks(struct rq *dead_rq)
>> BUG_ON(!next);
>> next->sched_class->put_prev_task(rq, next);
>>
>> + /*
>> + * Rules for changing task_struct::cpus_allowed are holding
>> + * both pi_lock and rq->lock, such that holding either
>> + * stabilizes the mask.
>> + *
>> + * Drop rq->lock is not quite as disastrous as it usually is
>> + * because !cpu_active at this point, which means load-balance
>> + * will not interfere.
>> + */
>> + lockdep_unpin_lock(&rq->lock);
>> + raw_spin_unlock(&rq->lock);
>> + raw_spin_lock(&next->pi_lock);
>> + raw_spin_lock(&rq->lock);
>> + lockdep_pin_lock(&rq->lock);
>> + if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
>> + lockdep_unpin_lock(&rq->lock);
>> + raw_spin_unlock(&rq->lock);
> Dropping rq->lock here means we will continue the loop without the
> rq->lock, right? But we do have a lockdep_pin_lock(&rq->lock) in the
> beginning of every iteration. So can we release rq->lock here?
Good catch! There is no need to lockdep_unpin and unlock rq->lock I think.
Regards,
Wanpeng Li
>
> Regards,
> Boqun
>
>> + raw_spin_unlock(&next->pi_lock);
>> + continue;
>> + }
>> +
>> /* Find suitable destination for @next, with force if needed. */
>> dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>>
>> @@ -5196,6 +5217,7 @@ static void migrate_tasks(struct rq *dead_rq)
>> rq = dead_rq;
>> raw_spin_lock(&rq->lock);
>> }
>> + raw_spin_unlock(&next->pi_lock);
>> }
>>
>> rq->stop = stop;
>> --
>> 1.7.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/