From: Wanpeng Li <wanpeng.li@hotmail.com>
To: Ingo Molnar, Peter Zijlstra
Cc: Sasha Levin, kernel test robot, Boqun Feng, linux-kernel@vger.kernel.org, Wanpeng Li
Subject: [PATCH v3] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
Date: Fri, 28 Aug 2015 14:55:56 +0800

------------[ cut here ]------------
WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
Modules linked in:
CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
Call Trace:
 dump_stack+0x4b/0x75
 warn_slowpath_common+0x8b/0xc0
 ? do_set_cpus_allowed+0x7e/0x80
 ? do_set_cpus_allowed+0x7e/0x80
 warn_slowpath_null+0x22/0x30
 do_set_cpus_allowed+0x7e/0x80
 cpuset_cpus_allowed_fallback+0x7c/0x170
 ? cpuset_cpus_allowed+0x180/0x180
 select_fallback_rq+0x221/0x280
 migration_call+0xe3/0x250
 notifier_call_chain+0x53/0x70
 __raw_notifier_call_chain+0x1e/0x30
 cpu_notify+0x28/0x50
 take_cpu_down+0x22/0x40
 multi_cpu_stop+0xd5/0x140
 ? __stop_cpus+0x80/0x80
 cpu_stopper_thread+0xbc/0x170
 ? preempt_count_sub+0x9/0x50
 ? _raw_spin_unlock_irq+0x37/0x50
 ? _raw_spin_unlock_irqrestore+0x55/0x70
 ? trace_hardirqs_on_caller+0x144/0x1e0
 ? cpu_stop_should_run+0x35/0x40
 ? preempt_count_sub+0x9/0x50
 ? _raw_spin_unlock_irqrestore+0x41/0x70
 smpboot_thread_fn+0x174/0x2f0
 ? sort_range+0x30/0x30
 kthread+0xc4/0xe0
 ret_from_kernel_thread+0x21/0x30
 ? kthread_create_on_node+0x180/0x180
---[ end trace 15f4c86d404693b0 ]---

As Peterz pointed out:

| So the normal rules for changing task_struct::cpus_allowed are holding
| both pi_lock and rq->lock, such that holding either stabilizes the mask.
|
| This is so that wakeup can happen without rq->lock and load-balance
| without pi_lock.
|
| From this we already get the relaxation that we can omit acquiring
| rq->lock if the task is not on the rq, because in that case
| load-balancing will not apply to it.
|
| ** these are the rules currently tested in do_set_cpus_allowed() **
|
| Now, since __set_cpus_allowed_ptr() uses task_rq_lock() which
| unconditionally acquires both locks, we could get away with holding just
| rq->lock when on_rq for modification because that'd still exclude
| __set_cpus_allowed_ptr(), it would also work against
| __kthread_bind_mask() because that assumes !on_rq.
|
| That said, this is all somewhat fragile.
|
| Now, I don't think dropping rq->lock is quite as disastrous as it
| usually is because !cpu_active at this point, which means load-balance
| will not interfere, but that too is somewhat fragile.
|
| So we end up with a choice of two fragile..

This patch fixes it by following the rules for changing
task_struct::cpus_allowed, i.e. by holding both pi_lock and rq->lock.
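To make the invariant concrete: a writer of the affinity mask must hold
both locks, so that a reader holding either one sees a stable mask. As a
sketch (illustrative only -- this is not the literal check at
kernel/sched/core.c:1156 that fires in the trace above):

	/*
	 * Illustrative sketch of the rule quoted above; not the literal
	 * upstream check. A writer of task_struct::cpus_allowed must hold
	 * both p->pi_lock and the runqueue lock, so a reader holding
	 * either lock observes a stable mask.
	 */
	static void assert_cpus_allowed_writable(struct task_struct *p,
						 struct rq *rq)
	{
		lockdep_assert_held(&p->pi_lock);  /* excludes wakeups */
		lockdep_assert_held(&rq->lock);    /* excludes load-balance */
	}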
Reported-by: kernel test robot
Reported-by: Sasha Levin
Signed-off-by: Wanpeng Li
---
v2 -> v3:
 * drop the unnecessary lockdep_unpin and unlock rq->lock
v1 -> v2:
 * fix the silly double lock stuff
 * follow the rules for changing task_struct::cpus_allowed

 kernel/sched/core.c | 20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b3386c6..56d19cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5186,6 +5186,25 @@ static void migrate_tasks(struct rq *dead_rq)
 		BUG_ON(!next);
 		next->sched_class->put_prev_task(rq, next);
 
+		/*
+		 * Rules for changing task_struct::cpus_allowed are holding
+		 * both pi_lock and rq->lock, such that holding either
+		 * stabilizes the mask.
+		 *
+		 * Dropping rq->lock is not quite as disastrous as it usually
+		 * is because !cpu_active at this point, which means
+		 * load-balance will not interfere.
+		 */
+		lockdep_unpin_lock(&rq->lock);
+		raw_spin_unlock(&rq->lock);
+		raw_spin_lock(&next->pi_lock);
+		raw_spin_lock(&rq->lock);
+		lockdep_pin_lock(&rq->lock);
+		if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
+			raw_spin_unlock(&next->pi_lock);
+			continue;
+		}
+
 		/* Find suitable destination for @next, with force if needed. */
 		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
 
@@ -5196,6 +5215,7 @@ static void migrate_tasks(struct rq *dead_rq)
 			rq = dead_rq;
 			raw_spin_lock(&rq->lock);
 		}
+		raw_spin_unlock(&next->pi_lock);
 	}
 
 	rq->stop = stop;
-- 
1.7.1
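P.S. For reference, the lock order the patch recreates by hand (pi_lock
first, then rq->lock, revalidating afterwards) is the same one encoded in
task_rq_lock(), which the quoted discussion mentions. A simplified,
from-memory sketch of the 4.2-era helper (not a verbatim copy):

	/*
	 * Simplified sketch of the 4.2-era task_rq_lock(): take
	 * p->pi_lock first, then the rq lock, and retry if the task
	 * moved runqueues in between. migrate_tasks() cannot call this
	 * because it already holds rq->lock when it discovers it also
	 * needs pi_lock, hence the unlock/relock dance in the hunk above.
	 */
	static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
	{
		struct rq *rq;

		for (;;) {
			raw_spin_lock_irqsave(&p->pi_lock, *flags);
			rq = task_rq(p);
			raw_spin_lock(&rq->lock);
			if (likely(rq == task_rq(p) && !task_on_rq_migrating(p)))
				return rq;	/* both locks held, task stable */
			raw_spin_unlock(&rq->lock);
			raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
		}
	}

This is also why the hunk re-checks task_rq(next) == rq &&
task_on_rq_queued(next) after retaking the locks: once rq->lock is
dropped, @next may have been woken on, or migrated to, another CPU.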