Subject: Re: [PATCH v3] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
From: Wanpeng Li <wanpeng.li@hotmail.com>
To: Peter Zijlstra
CC: Ingo Molnar, Sasha Levin, kernel test robot, Boqun Feng, linux-kernel@vger.kernel.org
Date: Fri, 28 Aug 2015 21:58:05 +0800
List-ID: linux-kernel@vger.kernel.org

On 8/28/15 9:29 PM, Peter Zijlstra wrote:
> On Fri, Aug 28, 2015 at 02:55:56PM +0800, Wanpeng Li wrote:
>> This patch fixes it by following the rules for changing
>> task_struct::cpus_allowed, with both pi_lock and rq->lock held.
> Thanks, I made that the below. There was a pin leak, and I turned the
> safety check into a WARN_ON() because it really should not happen.
>
> I also munged some of the comments a bit and did some slight edits to
> the changelog.

Cool, thanks for the help.
:-)

Regards,
Wanpeng Li

> ---
> Subject: sched: 'Annotate' migrate_tasks()
> From: Wanpeng Li
> Date: Fri, 28 Aug 2015 14:55:56 +0800
>
> | WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
> | Modules linked in:
> | CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
> | Call Trace:
> |  dump_stack+0x4b/0x75
> |  warn_slowpath_common+0x8b/0xc0
> |  warn_slowpath_null+0x22/0x30
> |  do_set_cpus_allowed+0x7e/0x80
> |  cpuset_cpus_allowed_fallback+0x7c/0x170
> |  select_fallback_rq+0x221/0x280
> |  migration_call+0xe3/0x250
> |  notifier_call_chain+0x53/0x70
> |  __raw_notifier_call_chain+0x1e/0x30
> |  cpu_notify+0x28/0x50
> |  take_cpu_down+0x22/0x40
> |  multi_cpu_stop+0xd5/0x140
> |  cpu_stopper_thread+0xbc/0x170
> |  smpboot_thread_fn+0x174/0x2f0
> |  kthread+0xc4/0xe0
> |  ret_from_kernel_thread+0x21/0x30
>
> As Peterz pointed out:
>
> | So the normal rules for changing task_struct::cpus_allowed are holding
> | both pi_lock and rq->lock, such that holding either stabilizes the mask.
> |
> | This is so that wakeup can happen without rq->lock and load-balance
> | without pi_lock.
> |
> | From this we already get the relaxation that we can omit acquiring
> | rq->lock if the task is not on the rq, because in that case
> | load-balancing will not apply to it.
> |
> | ** these are the rules currently tested in do_set_cpus_allowed() **
> |
> | Now, since __set_cpus_allowed_ptr() uses task_rq_lock(), which
> | unconditionally acquires both locks, we could get away with holding just
> | rq->lock when on_rq for modification, because that'd still exclude
> | __set_cpus_allowed_ptr(); it would also work against
> | __kthread_bind_mask(), because that assumes !on_rq.
> |
> | That said, this is all somewhat fragile.
> |
> | Now, I don't think dropping rq->lock is quite as disastrous as it
> | usually is, because !cpu_active at this point, which means load-balance
> | will not interfere, but that too is somewhat fragile.
> |
> | So we end up with a choice between two fragile options.
>
> This patch fixes it by following the rules for changing
> task_struct::cpus_allowed, with both pi_lock and rq->lock held.
>
> Cc: Ingo Molnar
> Reported-by: kernel test robot
> Reported-by: Sasha Levin
> Signed-off-by: Wanpeng Li
> [ Modified changelog and patch. ]
> Signed-off-by: Peter Zijlstra (Intel)
> Link: http://lkml.kernel.org/r/BLU436-SMTP1660820490DE202E3934ED3806E0@phx.gbl
> ---
>  kernel/sched/core.c | 29 ++++++++++++++++++++++++++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5178,24 +5178,47 @@ static void migrate_tasks(struct rq *dea
>  			break;
>  
>  		/*
> -		 * Ensure rq->lock covers the entire task selection
> -		 * until the migration.
> +		 * pick_next_task assumes pinned rq->lock.
>  		 */
>  		lockdep_pin_lock(&rq->lock);
>  		next = pick_next_task(rq, &fake_task);
>  		BUG_ON(!next);
>  		next->sched_class->put_prev_task(rq, next);
>  
> +		/*
> +		 * Rules for changing task_struct::cpus_allowed are holding
> +		 * both pi_lock and rq->lock, such that holding either
> +		 * stabilizes the mask.
> +		 *
> +		 * Dropping rq->lock is not quite as disastrous as it usually
> +		 * is because !cpu_active at this point, which means
> +		 * load-balance will not interfere. Also, stop-machine.
> +		 */
> +		lockdep_unpin_lock(&rq->lock);
> +		raw_spin_unlock(&rq->lock);
> +		raw_spin_lock(&next->pi_lock);
> +		raw_spin_lock(&rq->lock);
> +
> +		/*
> +		 * Since we're inside stop-machine, _nothing_ should have
> +		 * changed the task; WARN if weird stuff happened, because in
> +		 * that case the above rq->lock drop is a fail too.
> +		 */
> +		if (WARN_ON(task_rq(next) != rq || !task_on_rq_queued(next))) {
> +			raw_spin_unlock(&next->pi_lock);
> +			continue;
> +		}
> +
>  		/* Find suitable destination for @next, with force if needed. */
>  		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>  
> -		lockdep_unpin_lock(&rq->lock);
>  		rq = __migrate_task(rq, next, dest_cpu);
>  		if (rq != dead_rq) {
>  			raw_spin_unlock(&rq->lock);
>  			rq = dead_rq;
>  			raw_spin_lock(&rq->lock);
>  		}
> +		raw_spin_unlock(&next->pi_lock);
>  	}
>  
>  	rq->stop = stop;