Date: Fri, 4 Mar 2016 07:18:24 -0800
From: "Paul E. McKenney"
To: Ross Green
Cc: Mathieu Desnoyers, John Stultz, Thomas Gleixner, Peter Zijlstra,
    lkml, Ingo Molnar, Lai Jiangshan, dipankar@in.ibm.com, Andrew Morton,
    Josh Triplett, rostedt, David Howells, Eric Dumazet, Darren Hart,
    Frédéric Weisbecker, Oleg Nesterov, pranith kumar
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160304151824.GR3577@linux.vnet.ibm.com>

On Fri, Mar 04, 2016 at 04:30:12PM +1100, Ross Green wrote:
> On Fri, Feb 26, 2016 at 12:35 PM, Paul E. McKenney wrote:

[ . . . ]

> >> OK, so what wakeup path omits the sched_wakeup event?
> >>
> >> The sched_waking event looks to occur once in try_to_wake_up() and
> >> once in try_to_wake_up_local().
> >> Starting with try_to_wake_up():
> >>
> >> o   If the task is ->on_rq, ttwu_remote() is invoked:
> >>
> >>     o   This acquires the runqueue lock, then if
> >>         task_on_rq_queued() invokes ttwu_do_wakeup().  This
> >>         unconditionally does sched_wakeup, so we didn't go that
> >>         way.  (And this path skips the bulk of try_to_wake_up()
> >>         on return.)
> >>
> >>     o   Otherwise, we release the runqueue lock and return zero.
> >>
> >> o   There is some ordering checking, runqueue selection, and then
> >>     p->state is set to TASK_WAKING.  And we apparently are not getting
> >>     here, either.  But I don't see any other way out.
> >>
> >>     Ignoring this for the moment...
> >>
> >>     We eventually reach the call to ttwu_queue().
> >>
> >>     o   Here the TTWU_QUEUE path seems to avoid doing a
> >>         sched_wakeup event -- and since we are trying to wake
> >>         CPU 0 from CPU 4, they don't share cache (x86).
> >>
> >>     o   This invokes ttwu_queue_remote(), which sends an IPI
> >>         unless polling is in effect.  I would need to enable
> >>         trace_sched_wake_idle_without_ipi() to see whether or
> >>         not the IPI was actually sent.
> >>
> >>         If the target CPU was offline, we should have seen the
> >>         cpu_is_offline() WARN_ON().  I suppose that the CPU might
> >>         go offline between the check and the ->send_IPI_mask(),
> >>         but only once.  And we are trying to wake up on CPU 0
> >>         quite a few times.
> >>
> >>     Any thoughts on what to look for?
> >>
> >> Next, try_to_wake_up_local():
> >>
> >> o   After doing several checks, it does the sched_waking event.
> >>
> >> o   If the task is already queued, it calls ttwu_activate().
> >>
> >> o   It then invokes ttwu_do_wakeup(), which unconditionally
> >>     does the sched_wakeup() event.
> >>
> >> So this path looks unlikely, even ignoring the fact that
> >> the waking CPU in the traces above is always different than
> >> the CPU to be awakened on.
> >>
> >> Any thoughts?
> >>
> >>                                                 Thanx, Paul
>
> G'day,
>
> Here is a series of rcu_preempt stall events (5) from the linux-4.5-rc6
> release.
>
> Again, same testing procedure: boot, run a series of brief benchmarks,
> and then leave idle.
> The first stall event appeared quite quickly - within hours, the rest
> at what appear to be random intervals after that.
>
> I thought I might give Daniel's patch set a try and see how that goes!

Looks like the same issue from dmesg.

For my part, I added more tracing, which seems to have further decreased
the probability of occurrence.  The sched_wake_idle_without_ipi event
did not appear.

My next step is to try writing a torture test focused specifically on
this issue.  We need a faster reproducer to make decent progress.

                                                        Thanx, Paul
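
One way to hunt for the wakeup path that omits sched_wakeup is to scan a
saved trace for sched_waking events that never get a matching
sched_wakeup for the same task.  The following is a minimal Python
sketch, not a kernel tool.  The function name unmatched_wakings(), the
regular expression, and the sample lines are all illustrative
assumptions, and the assumed text layout should be checked against the
actual trace output before relying on it.

```python
import re

# Assumed text layout of a trace line (check against the real trace):
#   <task>-<pid> [cpu] flags timestamp: event: comm=... pid=N ...
EVENT_RE = re.compile(
    r"(?P<ts>\d+\.\d+): (?P<event>sched_waking|sched_wakeup): .*?\bpid=(?P<pid>\d+)"
)


def unmatched_wakings(lines):
    """Return sorted (timestamp, pid) pairs for every sched_waking that is
    not followed by a sched_wakeup for the same pid before the next
    sched_waking for that pid (or before the end of the trace)."""
    pending = {}   # pid -> timestamp of a sched_waking still awaiting sched_wakeup
    missing = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue                      # not one of the two events of interest
        ts, pid = float(m.group("ts")), int(m.group("pid"))
        if m.group("event") == "sched_waking":
            if pid in pending:            # previous waking never completed
                missing.append((pending[pid], pid))
            pending[pid] = ts
        else:                             # sched_wakeup completes the pending waking
            pending.pop(pid, None)
    missing.extend((ts, pid) for pid, ts in pending.items())
    return sorted(missing)


# Made-up sample lines for illustration only; these are not from the real traces.
SAMPLE = [
    "<idle>-0 [004] d.h. 100.000001: sched_waking: comm=rcu_preempt pid=7 prio=120 target_cpu=000",
    "<idle>-0 [000] dNh. 100.000009: sched_wakeup: comm=rcu_preempt pid=7 prio=120 target_cpu=000",
    "<idle>-0 [004] d.h. 200.000001: sched_waking: comm=rcu_preempt pid=7 prio=120 target_cpu=000",
    "<idle>-0 [004] d.h. 300.000001: sched_waking: comm=rcu_preempt pid=7 prio=120 target_cpu=000",
]

if __name__ == "__main__":
    for ts, pid in unmatched_wakings(SAMPLE):
        print(f"pid {pid}: sched_waking at {ts:.6f} has no matching sched_wakeup")
```

Each flagged timestamp marks a place in the trace to check which of the
wakeup paths described above was taken, for example whether an IPI was
sent at that point.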