Date: Sun, 6 Dec 2015 12:56:03 -0800
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: tglx@linutronix.de, peterz@infradead.org, preeti@linux.vnet.ibm.com, viresh.kumar@linaro.org, mtosatti@redhat.com, fweisbec@gmail.com
Cc: linux-kernel@vger.kernel.org, sasha.levin@oracle.com
Subject: Re: Possible issue with commit 4961b6e11825?
Message-ID: <20151206205603.GA9008@linux.vnet.ibm.com>
References: <20151204232022.GA15891@linux.vnet.ibm.com> <20151205190124.GA1990@linux.vnet.ibm.com>
In-Reply-To: <20151205190124.GA1990@linux.vnet.ibm.com>

On Sat, Dec 05, 2015 at 11:01:24AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 04, 2015 at 03:20:22PM -0800, Paul E. McKenney wrote:
> > Hello!
> >
> > Are there any known issues with commit 4961b6e11825 (sched: core: Use
> > hrtimer_start[_expires]())?
> >
> > The reason that I ask is that I am about 90% sure that an rcutorture
> > failure bisects to that commit. I will be running more tests on
> > 3497d206c4d9 (perf: core: Use hrtimer_start()), which is the predecessor
> > of 4961b6e11825, and which, unlike 4961b6e11825, passes a 12-hour
> > rcutorture test with scenario TREE03. In contrast, 4961b6e11825 gets
> > 131 RCU CPU stall warnings, 132 reports of one of RCU's grace-period
> > kthreads being starved, and 525 reports of one of rcutorture's kthreads
> > being starved. Most of the test runs hang on shutdown, which is no
> > surprise if an RCU CPU stall is happening at about that time.
> >
> > But perhaps 3497d206c4d9 was just getting lucky, hence additional testing
> > over the weekend.
>
> And it was getting lucky. In a set of 24 two-hour runs (triple parallel)
> on an earlier commit (not 3497d206c4d9, no clue what I was thinking) got
> me two failed runs, for a total of 49 reports of one of RCU's grace-period
> kthreads being starved, no reports of rcutorture's kthreads being starved,
> and no hangs on shutdown. So much lower failure rate, but still failures.
>
> At this point, I am a bit disgusted with bisection, so my next test cycle
> (36 two-hour runs on a system capable of doing three concurrently) is on
> the most recent -rcu, but with CPU hotplug disabled. If that shows failures,
> then I hammer 3497d206c4d9 hard.

And no failures on current -rcu with CPU hotplug disabled. So this seems
to be specific to CPU hotplug. So my next step is to fix some remaining
known CPU-hotplug issues in RCU. And Thomas, when you get those CPU-hotplug
patches ready, I have a testcase for you! ;-)

							Thanx, Paul