Date: Mon, 25 Sep 2017 23:18:43 -0400
From: Steven Rostedt
To: zhouchengming
Subject: Re: [PATCH] sched/rt.c: pick and check task if double_lock_balance() unlock the rq
Message-ID: <20170925231843.140d3a21@vmware.local.home>
In-Reply-To: <59C9AC08.9000302@huawei.com>
References: <1505112709-102019-1-git-send-email-zhouchengming1@huawei.com> <20170925154057.191e3fd1@vmware.local.home> <59C9AC08.9000302@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 26 Sep 2017 09:23:20 +0800
zhouchengming wrote:

> On 2017/9/26 3:40, Steven Rostedt wrote:
> > On Mon, 11 Sep 2017 14:51:49 +0800
> > Zhou Chengming wrote:
> >
> >> push_rt_task() pick the first pushable task and find an eligible
> >> lowest_rq, then double_lock_balance(rq, lowest_rq). So if
> >> double_lock_balance() unlock the rq (when double_lock_balance() return 1),
> >> we have to check if this task is still on the rq.
> >>
> >> The problem is that the check conditions are not sufficient:
> >>
> >> if (unlikely(task_rq(task) != rq ||
> >>              !cpumask_test_cpu(lowest_rq->cpu,&task->cpus_allowed) ||
> >>              task_running(rq, task) ||
> >>              !rt_task(task) ||
> >>              !task_on_rq_queued(task))) {
> >>
> >> cpu2                            cpu1                        cpu0
> >> push_rt_task(rq1)
> >>   pick task_A on rq1
> >>   find rq0
> >>     double_lock_balance(rq1, rq0)
> >>       unlock(rq1)
> >>                                 rq1 __schedule
> >>                                   pick task_A run
> >>                                 task_A sleep (dequeued)
> >>       lock(rq0)
> >>       lock(rq1)
> >>     do_above_check(task_A)
> >>       task_rq(task_A) == rq1
> >>       cpus_allowed unchanged
> >>       task_running == false
> >>       rt_task(task_A) == true
> >>                                                             try_to_wake_up(task_A)
> >>                                                               select_cpu = cpu3
> >>                                                             enqueue(rq3, task_A)
> > How can this happen? The try_to_wake_up(task_A) needs to grab the rq
> > that task A is on, and we have that rq lock.
> >
> > /me confused.
> >
> > -- Steve
>
> Thanks for the reply!
> After the task_A sleep on cpu1, the try_to_wake_up(task_A) on cpu0 select a different cpu3,
> so it will grab the rq3 lock, not the rq1 lock.

Ah crap. This is caused by 7608dec2ce20 ("sched: Drop the rq argument to
sched_class::select_task_rq()"). Because this code depends on
try_to_wake_up() grabbing the task's rq lock. But it no longer does that,
and it causes this race.

OK, I need to look at this deeper when I'm not so jetlagged and typing this
because I can't sleep at 5am.

Thanks for pointing this out! It may be fixed by simply grabbing the run
queue lock on migration, as that would sync things up.

Peter?

-- Steve
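
For anyone following along, here is a rough user-space replay of the interleaving in the quoted diagram. It is only a single-threaded sketch, not kernel code: struct task, struct rq, replay_race() and the wakeup_serialized flag are made-up names for illustration, and it assumes the last test in the condition, !task_on_rq_queued(task), is evaluated after the wakeup on cpu0 has enqueued task_A on rq3, which is where the quoted diagram appears to be heading.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures, for illustration only. */
struct rq { int cpu; };

struct task {
	struct rq *rq;             /* models task_rq(task) */
	unsigned int cpus_allowed; /* bitmask, models task->cpus_allowed */
	bool running;              /* models task_running(rq, task) */
	bool rt;                   /* models rt_task(task) */
	bool on_rq;                /* models task_on_rq_queued(task) */
};

/* First four tests of the quoted condition; true means "abandon the push". */
static bool early_tests_reject(const struct task *t, const struct rq *rq,
			       const struct rq *lowest_rq)
{
	return t->rq != rq ||
	       !(t->cpus_allowed & (1u << lowest_rq->cpu)) ||
	       t->running ||
	       !t->rt;
}

/* Last test of the quoted condition. */
static bool queued_test_rejects(const struct task *t)
{
	return !t->on_rq;
}

/*
 * Replays the diagram step by step. Returns true if push_rt_task() would go
 * ahead and push task_A even though it is no longer on rq1.
 * wakeup_serialized models a wakeup that has to wait for the rq1 lock, so it
 * cannot land in the middle of the check.
 */
bool replay_race(bool wakeup_serialized)
{
	struct rq rq0 = { .cpu = 0 }, rq1 = { .cpu = 1 }, rq3 = { .cpu = 3 };
	struct task a = { .rq = &rq1, .cpus_allowed = 0xfu,
			  .running = false, .rt = true, .on_rq = true };
	bool reject;

	assert(a.rq == &rq1 && a.on_rq); /* task_A starts queued on rq1 */

	/* cpu2 dropped rq1 in double_lock_balance(); cpu1 ran task_A, which slept. */
	a.on_rq = false;

	/* cpu2 holds rq0 and rq1 again and starts the check. */
	reject = early_tests_reject(&a, &rq1, &rq0);

	if (!wakeup_serialized) {
		/* cpu0: try_to_wake_up() picks cpu3 and only takes the rq3 lock. */
		a.rq = &rq3;
		a.on_rq = true;
	}

	reject = reject || queued_test_rejects(&a);

	if (wakeup_serialized) {
		/* The wakeup only proceeds once the check is over. */
		a.rq = &rq3;
		a.on_rq = true;
	}

	return !reject;
}
```

In this model replay_race(false) returns true (the check passes and the push goes ahead on a task that now sits on rq3), while replay_race(true) returns false because the check still sees task_A as dequeued.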