Date: Mon, 28 Mar 2016 09:29:54 -0700
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Mathieu Desnoyers
Cc: Thomas Gleixner, Peter Zijlstra, Jacob Pan, Josh Triplett, Ross Green,
    John Stultz, lkml, Ingo Molnar, Lai Jiangshan, dipankar@in.ibm.com,
    Andrew Morton, rostedt, David Howells, Eric Dumazet, Darren Hart,
    Frédéric Weisbecker, Oleg Nesterov, pranith kumar, "Chatre, Reinette"
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160328162954.GK4287@linux.vnet.ibm.com>
In-Reply-To: <214640813.37884.1459181539377.JavaMail.zimbra@efficios.com>

On Mon, Mar 28, 2016 at 04:12:19PM +0000, Mathieu Desnoyers wrote:
> ----- On
> Mar 28, 2016, at 11:56 AM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:
>
> > On Mon, Mar 28, 2016 at 03:07:36PM +0000, Mathieu Desnoyers wrote:
> >> ----- On Mar 28, 2016, at 9:29 AM, Paul E. McKenney paulmck@linux.vnet.ibm.com
> >> wrote:
> >>
> >> > On Mon, Mar 28, 2016 at 08:28:51AM +0200, Peter Zijlstra wrote:
> >> >> On Sun, Mar 27, 2016 at 02:09:14PM -0700, Paul E. McKenney wrote:
> >> >>
> >> >> > > Does that system have MONITOR/MWAIT errata?
> >> >> >
> >> >> > On the off-chance that this question was also directed at me,
> >> >>
> >> >> Hehe, it wasn't, however, since we're here..
> >> >>
> >> >> > here is
> >> >> > what I am running on.  I am running in a qemu/KVM virtual machine, in
> >> >> > case that matters.
> >> >>
> >> >> Have you actually tried on real proper hardware? Does it still reproduce
> >> >> there?
> >> >
> >> > Ross has, but I have not, given that I have a shared system on the one
> >> > hand and a single-socket (four core, eight hardware thread) laptop on
> >> > the other that has even longer reproduction times.
> >> > The repeat-by is as follows:
> >> >
> >> > o   Build a kernel with the following Kconfigs:
> >> >
> >> >         CONFIG_SMP=y
> >> >         CONFIG_NR_CPUS=16
> >> >         CONFIG_PREEMPT_NONE=n
> >> >         CONFIG_PREEMPT_VOLUNTARY=n
> >> >         CONFIG_PREEMPT=y
> >> >         # This should result in CONFIG_PREEMPT_RCU=y
> >> >         CONFIG_HZ_PERIODIC=y
> >> >         CONFIG_NO_HZ_IDLE=n
> >> >         CONFIG_NO_HZ_FULL=n
> >> >         CONFIG_RCU_TRACE=y
> >> >         CONFIG_HOTPLUG_CPU=y
> >> >         CONFIG_RCU_FANOUT=2
> >> >         CONFIG_RCU_FANOUT_LEAF=2
> >> >         CONFIG_RCU_NOCB_CPU=n
> >> >         CONFIG_DEBUG_LOCK_ALLOC=n
> >> >         CONFIG_RCU_BOOST=y
> >> >         CONFIG_RCU_KTHREAD_PRIO=2
> >> >         CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> >> >         CONFIG_RCU_EXPERT=y
> >> >         CONFIG_RCU_TORTURE_TEST=y
> >> >         CONFIG_PRINTK_TIME=y
> >> >         CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP=y
> >> >         CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y
> >> >         CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT=y
> >> >
> >> >     If desired, you can instead build with CONFIG_RCU_TORTURE_TEST=m
> >> >     and modprobe/insmod the module manually.
> >> >
> >> > o   Find a two-socket x86 system or larger, with at least 16 CPUs.
> >> >
> >> > o   Boot the kernel with the following kernel boot parameters:
> >> >
> >> >         rcutorture.onoff_interval=1 rcutorture.onoff_holdoff=30
> >> >
> >> >     The onoff_holdoff is only needed for CONFIG_RCU_TORTURE_TEST=y.
> >> >     When manually setting up the module, you get the holdoff for
> >> >     free, courtesy of human timescales.
> >> >
> >> > In the absence of instrumentation, I get failures usually within a
> >> > couple of hours, though sometimes much longer.  With instrumentation,
> >> > the sky appears to be the limit.  :-/
> >> >
> >> > Ross is running on bare metal with no CPU hotplug, so perhaps his setup
> >> > is of more immediate interest.  He is seeing the same symptoms that I am,
> >> > namely a task being repeatedly awakened without actually coming out of
> >> > TASK_INTERRUPTIBLE state, let alone running.
> >> > As you pointed out earlier,
> >> > he cannot be seeing the same bug that my crude patch suppresses, but
> >> > given that I still see a few failures with that crude patch, it is quite
> >> > possible that there is still a common bug.
> >>
> >> With respect to bare metal vs KVM guest, I've reported an issue with
> >> inaccurate detection of TSC as being an unreliable time source on a
> >> KVM guest. The basic setup is to overcommit the CPU use across the
> >> entire host, thus leading to preemption of the guest. The guest TSC
> >> watchdog then falsely assumes that TSC is unreliable, because it gets
> >> preempted for a long time (e.g. 0.5 second) between reading the HPET
> >> and the TSC.
> >>
> >> Ref. http://lkml.iu.edu/hypermail/linux/kernel/1509.1/00379.html
> >>
> >> I'm wondering if what Paul is observing in the KVM setup might be
> >> caused by long preemption by the host. One way to stress test this
> >> is to run parallel kernel builds on the host (or in another guest)
> >> while the guest is running, thus over-committing the CPU use.
> >>
> >> Thoughts ?
> >
> > If I run NO_HZ_FULL, I do get warnings about unstable timesources.
> >
> > And certainly guest VCPUs can be preempted.  However, if they were
> > preempted for the lengths of time I am seeing, I should also see
> > softlockup warnings on the host, which I do not see.
>
> Why would you see softlockup warnings on the host ?
>
> I expect the priority at which the kvm vcpu runs is much lower than
> the priority of the rcu worker threads on the host. Therefore, you
> might very well have long preemption delays for kvm vcpus while the
> rcu worker threads run fine on the host kernel because they have
> a higher priority.
>
> Am I missing something ?

Right, host/guest confusion on my part.  I should expect softlockups on
the -guest- because rcutorture runs almost entirely in kernel mode.
I don't see them there, either.
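
For reference, the false-positive scenario Mathieu describes above can be
sketched numerically.  This is a minimal model, not the kernel's clocksource
watchdog code: the function name, the 0.5-second check period, and the
62.5 ms threshold (NSEC_PER_SEC >> 4) are assumptions made for illustration.
Both clocks are taken to be perfectly accurate, so any apparent skew comes
only from the vCPU being preempted between the two reads.

```python
NSEC_PER_SEC = 1_000_000_000
# Assumed threshold for illustration (62.5 ms); check the kernel you run.
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4


def tsc_flagged_unstable(period_ns, preempt_ns):
    """Model one watchdog interval with two perfectly accurate clocks.

    The reference clock (HPET) is read at time period_ns; the guest is then
    preempted for preempt_ns before it gets to read the TSC.
    """
    hpet_prev = tsc_prev = 0          # previous reads happened back-to-back
    hpet_now = period_ns              # reference read happens on schedule
    tsc_now = period_ns + preempt_ns  # TSC read is delayed by the preemption
    hpet_delta = hpet_now - hpet_prev
    tsc_delta = tsc_now - tsc_prev
    # The TSC is blamed when the two measured intervals disagree too much.
    return abs(tsc_delta - hpet_delta) > WATCHDOG_THRESHOLD_NS


if __name__ == "__main__":
    half_second = NSEC_PER_SEC // 2
    print(tsc_flagged_unstable(half_second, 0))            # no preemption
    print(tsc_flagged_unstable(half_second, half_second))  # 0.5 s preemption
```

In this model the TSC is never actually wrong; the whole preemption delay is
attributed to it because the reference read landed before the delay and the
TSC read landed after it.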

Another reason I do not expect preemption to be the problem is that I
don't see it with waketorture, which has a large number of tasks
periodically waking up.  Yet another reason is that I can get things moving
again by doing periodic wakeups from the scheduling-clock interrupt.
(Why those wakeups make things go but those from the timers don't is a
mystery to me!)  Please see commit 56ef96ac25c3 (DIAGS: Another horrible
exploratory hack) for one piece of this.  This would presumably have no
effect on preemption at the host level.

							Thanx, Paul

> Thanks,
>
> Mathieu
>
> >
> > That said, perhaps I should cobble together something to force short
> > repeated preemptions at the host level.  Maybe that would get the
> > reproduction rate sufficiently high to enable less-dainty debugging.
> >
> > 							Thanx, Paul
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
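
The Kconfig list in the repeat-by quoted earlier in this message is long
enough that a mechanical check of the resulting .config may be worthwhile.
The following is a rough sketch only: the helper names are hypothetical, and
it assumes the usual .config convention that a disabled option shows up as
"# CONFIG_FOO is not set" or is simply absent, both of which are treated
as "n" here.

```python
# Options taken from the repeat-by; "n" means the option must be disabled.
REQUIRED = {
    "CONFIG_SMP": "y", "CONFIG_NR_CPUS": "16",
    "CONFIG_PREEMPT_NONE": "n", "CONFIG_PREEMPT_VOLUNTARY": "n",
    "CONFIG_PREEMPT": "y", "CONFIG_HZ_PERIODIC": "y",
    "CONFIG_NO_HZ_IDLE": "n", "CONFIG_NO_HZ_FULL": "n",
    "CONFIG_RCU_TRACE": "y", "CONFIG_HOTPLUG_CPU": "y",
    "CONFIG_RCU_FANOUT": "2", "CONFIG_RCU_FANOUT_LEAF": "2",
    "CONFIG_RCU_NOCB_CPU": "n", "CONFIG_DEBUG_LOCK_ALLOC": "n",
    "CONFIG_RCU_BOOST": "y", "CONFIG_RCU_KTHREAD_PRIO": "2",
    "CONFIG_DEBUG_OBJECTS_RCU_HEAD": "n", "CONFIG_RCU_EXPERT": "y",
    "CONFIG_RCU_TORTURE_TEST": "y", "CONFIG_PRINTK_TIME": "y",
    "CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP": "y",
    "CONFIG_RCU_TORTURE_TEST_SLOW_INIT": "y",
    "CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT": "y",
}


def parse_dotconfig(text):
    """Return {option: value}; options that are not set are left out."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, comment, or "# CONFIG_FOO is not set"
        name, sep, value = line.partition("=")
        if sep:
            values[name] = value
    return values


def mismatches(text, required=REQUIRED):
    """Return {option: actual value} for every option that differs."""
    have = parse_dotconfig(text)
    bad = {}
    for name, want in required.items():
        got = have.get(name, "n")  # an absent option counts as disabled
        if got != want:
            bad[name] = got
    return bad
```

Calling mismatches() on the text of the built kernel's .config should return
an empty dictionary when the build matches the list; anything else names the
options that were overridden, for example by Kconfig dependencies.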