Date: Mon, 28 Mar 2016 15:07:36 +0000 (UTC)
From: Mathieu Desnoyers
To: "Paul E. McKenney", Thomas Gleixner
Cc: Peter Zijlstra, Jacob Pan, Josh Triplett, Ross Green, John Stultz, lkml, Ingo Molnar, Lai Jiangshan, dipankar@in.ibm.com, Andrew Morton, rostedt, David Howells, Eric Dumazet, Darren Hart, Frédéric Weisbecker, Oleg Nesterov, pranith kumar, "Chatre, Reinette"
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

----- On Mar 28, 2016, at 9:29 AM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:

> On Mon, Mar 28, 2016 at 08:28:51AM +0200, Peter Zijlstra wrote:
>> On Sun, Mar 27, 2016 at 02:09:14PM -0700, Paul E. McKenney wrote:
>>
>> > > Does that system have MONITOR/MWAIT errata?
>> >
>> > On the off-chance that this question was also directed at me,
>>
>> Hehe, it wasn't, however, since we're here..
>>
>> > here is
>> > what I am running on. I am running in a qemu/KVM virtual machine, in
>> > case that matters.
>>
>> Have you actually tried on real proper hardware? Does it still reproduce
>> there?
>
> Ross has, but I have not, given that I have a shared system on the one
> hand and a single-socket (four core, eight hardware thread) laptop on
> the other that has even longer reproduction times. The repeat-by is
> as follows:
>
> o   Build a kernel with the following Kconfigs:
>
>         CONFIG_SMP=y
>         CONFIG_NR_CPUS=16
>         CONFIG_PREEMPT_NONE=n
>         CONFIG_PREEMPT_VOLUNTARY=n
>         CONFIG_PREEMPT=y
>         # This should result in CONFIG_PREEMPT_RCU=y
>         CONFIG_HZ_PERIODIC=y
>         CONFIG_NO_HZ_IDLE=n
>         CONFIG_NO_HZ_FULL=n
>         CONFIG_RCU_TRACE=y
>         CONFIG_HOTPLUG_CPU=y
>         CONFIG_RCU_FANOUT=2
>         CONFIG_RCU_FANOUT_LEAF=2
>         CONFIG_RCU_NOCB_CPU=n
>         CONFIG_DEBUG_LOCK_ALLOC=n
>         CONFIG_RCU_BOOST=y
>         CONFIG_RCU_KTHREAD_PRIO=2
>         CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
>         CONFIG_RCU_EXPERT=y
>         CONFIG_RCU_TORTURE_TEST=y
>         CONFIG_PRINTK_TIME=y
>         CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP=y
>         CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y
>         CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT=y
>
>     If desired, you can instead build with CONFIG_RCU_TORTURE_TEST=m
>     and modprobe/insmod the module manually.
>
> o   Find a two-socket x86 system or larger, with at least 16 CPUs.
>
> o   Boot the kernel with the following kernel boot parameters:
>
>         rcutorture.onoff_interval=1 rcutorture.onoff_holdoff=30
>
>     The onoff_holdoff is only needed for CONFIG_RCU_TORTURE_TEST=y.
>     When manually setting up the module, you get the holdoff for
>     free, courtesy of human timescales.
>
> In the absence of instrumentation, I get failures usually within a
> couple of hours, though sometimes much longer. With instrumentation,
> the sky appears to be the limit.
> :-/
>
> Ross is running on bare metal with no CPU hotplug, so perhaps his setup
> is of more immediate interest. He is seeing the same symptoms that I am,
> namely a task being repeatedly awakened without actually coming out of
> TASK_INTERRUPTIBLE state, let alone running. As you pointed out earlier,
> he cannot be seeing the same bug that my crude patch suppresses, but
> given that I still see a few failures with that crude patch, it is quite
> possible that there is still a common bug.

With respect to bare metal vs. KVM guest, I have previously reported an issue where the TSC is inaccurately detected as an unreliable time source on a KVM guest. The basic setup is to overcommit CPU use across the entire host, thus leading to preemption of the guest. The guest's TSC watchdog then falsely assumes that the TSC is unreliable, because the guest gets preempted for a long time (e.g. 0.5 seconds) between reading the HPET and the TSC, which makes the two clocks appear to disagree.

Ref. http://lkml.iu.edu/hypermail/linux/kernel/1509.1/00379.html

I'm wondering whether what Paul is observing in the KVM setup might be caused by long preemptions of the guest by the host. One way to stress test this is to run parallel kernel builds on the host (or in another guest) while the guest is running, thus overcommitting the host CPUs.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
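The TSC false positive described in the reply can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the kernel's clocksource watchdog code. Both clocks are modelled as perfect. A check is assumed to run every 0.5 s. The 62.5 ms tolerance is modelled on WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4) in kernel/time/clocksource.c. The only error source is host preemption of the guest between the two clock reads of a single sample; the read order does not matter for this sketch. The helper names take_sample() and first_false_positive() are hypothetical.

```python
# A rough simulation of a clocksource-watchdog style check (a sketch, not the
# kernel's code). Both clocks are modelled as perfect: each simply reports the
# true time at which it is read. The only error source is a delay between the
# two reads of one sample, e.g. the host descheduling the guest's vCPU.

NSEC_PER_SEC = 1_000_000_000

# Assumption: tolerance modelled on WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4,
# i.e. 62.5 ms) from kernel/time/clocksource.c.
THRESHOLD_NS = NSEC_PER_SEC >> 4

# Assumption: one watchdog check every 0.5 s.
INTERVAL_NS = NSEC_PER_SEC // 2


def take_sample(now_ns, preempt_ns):
    """Hypothetical helper: read the TSC, then the HPET.

    preempt_ns models the guest being preempted between the two reads.
    Returns (tsc_reading, hpet_reading, true_time_after_sample).
    """
    tsc = now_ns            # perfect TSC reads the true time
    now_ns += preempt_ns    # guest vCPU descheduled by the host here
    hpet = now_ns           # perfect HPET reads the (later) true time
    return tsc, hpet, now_ns


def first_false_positive(preemptions_ns):
    """Hypothetical helper: run one check per entry in preemptions_ns.

    Returns the index of the first check whose TSC and HPET intervals
    differ by more than THRESHOLD_NS, or None if the TSC is never flagged.
    """
    prev_tsc, prev_hpet, now = take_sample(0, 0)
    for i, preempt_ns in enumerate(preemptions_ns):
        now += INTERVAL_NS
        tsc, hpet, now = take_sample(now, preempt_ns)
        tsc_interval = tsc - prev_tsc
        hpet_interval = hpet - prev_hpet
        if abs(tsc_interval - hpet_interval) > THRESHOLD_NS:
            return i  # the watchdog would wrongly mark the TSC unstable
        prev_tsc, prev_hpet = tsc, hpet
    return None


if __name__ == "__main__":
    print(first_false_positive([0, 10_000_000, 0]))         # prints None
    print(first_false_positive([0, 0, NSEC_PER_SEC // 2]))  # prints 2
```

In this model a 10 ms preemption stays under the tolerance. A single 0.5 s preemption on the third check inflates the HPET interval well past it, even though neither clock drifted. That is the kind of spurious "TSC unstable" verdict an overcommitted host could provoke in a guest.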