Date: Mon, 21 Mar 2016 10:26:16 -0700
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Jacob Pan
Cc: Josh Triplett, Ross Green, Mathieu Desnoyers, John Stultz,
    Thomas Gleixner, Peter Zijlstra, lkml, Ingo Molnar, Lai Jiangshan,
    dipankar@in.ibm.com, Andrew Morton, rostedt, David Howells,
    Eric Dumazet, Darren Hart, Frédéric Weisbecker, Oleg Nesterov,
    pranith kumar, "Chatre, Reinette"
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160321172616.GU4287@linux.vnet.ibm.com>
In-Reply-To: <20160321092230.75f23fa9@yairi>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 21, 2016 at 09:22:30AM -0700, Jacob Pan wrote:
> On Fri, 18 Mar 2016 16:56:41 -0700
> "Paul E. McKenney" wrote:
>
> > On Fri, Mar 18, 2016 at 02:00:11PM -0700, Josh Triplett wrote:
> > > On Thu, Feb 25, 2016 at 04:56:38PM -0800, Paul E. McKenney wrote:

[ . . . ]

> > > We're seeing a similar stall (~60 seconds) on an x86 development
> > > system here.  Any luck tracking down the cause of this?  If not, any
> > > suggestions for traces that might be helpful?
> >
> > The dmesg containing the stall, the kernel version, and the .config
> > would be helpful!  Working on a torture test specific to this bug...
> >
> > 							Thanx, Paul
> >
> +Reinette, she has the system that can reproduce the issue. I
> believe she is having some other problems with it at the moment. But
> the .config should be available. Version is v4.5.

A couple of additional questions:

1.	Is the test running on bare metal or virtualized?  If the latter,
	what is the host?

2.	Does the workload involve CPU hotplug?

3.	Are you seeing things like this in dmesg?

	"rcu_preempt kthread starved for 21033 jiffies"
	"rcu_sched kthread starved for 32103 jiffies"
	"rcu_bh kthread starved for 84031 jiffies"

If not, you are probably facing some other bug, and should proceed
debugging as described in Documentation/RCU/stallwarn.txt.

							Thanx, Paul
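
For question 3, one way to check a saved dmesg for these messages is
sketched below.  This is only an illustration: the regular expression is
inferred from the three example strings quoted above, the exact wording
may differ between kernel versions, and the function name find_starved
is made up for this sketch.

```python
import re

# Pattern inferred from the three example messages quoted in this mail;
# it is an assumption, not taken from the kernel source.
STARVED_RE = re.compile(r"(rcu_\w+) kthread starved for (\d+) jiffies")


def find_starved(log_text):
    """Return a list of (rcu_flavor, jiffies) for each starvation message."""
    return [(m.group(1), int(m.group(2)))
            for m in STARVED_RE.finditer(log_text)]


# Sample input built from the strings in this mail, plus one unrelated line.
sample = "\n".join([
    "rcu_preempt kthread starved for 21033 jiffies",
    "usb 1-1: new high-speed USB device",
    "rcu_bh kthread starved for 84031 jiffies",
])

for flavor, jiffies in find_starved(sample):
    print(f"{flavor}: starved for {jiffies} jiffies")
```

To run it against a real log, the text of a saved dmesg file could be
passed to find_starved() in place of the sample string.  An empty result
would suggest the "some other bug" case above.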