Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752900AbcC0UqP (ORCPT ); Sun, 27 Mar 2016 16:46:15 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:43902 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752045AbcC0UqN (ORCPT );
	Sun, 27 Mar 2016 16:46:13 -0400
Date: Sun, 27 Mar 2016 22:45:59 +0200
From: Peter Zijlstra
To: "Paul E. McKenney"
Cc: Mathieu Desnoyers, "Chatre, Reinette", Jacob Pan, Josh Triplett,
	Ross Green, John Stultz, Thomas Gleixner, lkml, Ingo Molnar,
	Lai Jiangshan, dipankar@in.ibm.com, Andrew Morton, rostedt,
	David Howells, Eric Dumazet, Darren Hart, Frédéric Weisbecker,
	Oleg Nesterov, pranith kumar
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160327204559.GV6356@twins.programming.kicks-ass.net>
References: <20160318235641.GH4287@linux.vnet.ibm.com>
	<0D818C7A2259ED42912C1E04120FDE26712E676E@ORSMSX111.amr.corp.intel.com>
	<20160325214623.GR4287@linux.vnet.ibm.com>
	<1370753660.36931.1458995371427.JavaMail.zimbra@efficios.com>
	<20160326152816.GW4287@linux.vnet.ibm.com>
	<20160326184940.GA23851@linux.vnet.ibm.com>
	<706246733.37102.1459030977316.JavaMail.zimbra@efficios.com>
	<20160327013456.GX4287@linux.vnet.ibm.com>
	<702204510.37291.1459086535844.JavaMail.zimbra@efficios.com>
	<20160327154018.GA4287@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160327154018.GA4287@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1709
Lines: 39

On Sun, Mar 27, 2016 at 08:40:18AM -0700, Paul E. McKenney wrote:
> Oh, and the patch I am running with is below. I am running x86, and so
> some other architectures would of course need the corresponding patch
> on that architecture.
> -#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
> +/* #define TIF_POLLING_NRFLAG 21 idle is polling for TIF_NEED_RESCHED */

x86 is the only arch that really uses this heavily IIRC.

Most of the other archs need interrupts to wake up remote cores.

So what we try to do is avoid sending IPIs when the CPU is idle. For the
remote wakeup case we use set_nr_if_polling(), which sets
TIF_NEED_RESCHED if TIF_POLLING_NRFLAG was set. If it wasn't, we'll send
the IPI. Otherwise we rely on the idle loop to do sched_ttwu_pending()
when it breaks out of the loop due to TIF_NEED_RESCHED.

But, you need hotplug for this to happen, right?

We should not be migrating towards, or waking on, CPUs no longer present
in cpu_active_map, and there is a rcu/sched_sync() after clearing that
bit. Furthermore, migration_call() does a sched_ttwu_pending() (waking
any remaining stragglers) before we migrate all runnable tasks off the
dying CPU.

The other interesting case would be resched_cpu(), which uses
set_nr_and_not_polling() to kick a remote CPU to call schedule(). It
atomically sets TIF_NEED_RESCHED and returns whether TIF_POLLING_NRFLAG
was not set. If indeed not, it will send an IPI.

This assumes the idle 'exit' path will do the same as the IPI does; and
if you look at cpu_idle_loop() it does indeed do both
preempt_fold_need_resched() and sched_ttwu_pending().

Note that one cannot rely on irq_enter()/irq_exit() being called for the
scheduler IPI.
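
For reference, the two flag helpers described above can be modelled in
user space roughly as follows. This is only a sketch using C11 atomics,
not the kernel source: the thread_flags_t type, the model_ prefix and
the TIF_NEED_RESCHED bit position are assumptions made for
illustration (only TIF_POLLING_NRFLAG's bit 21 comes from the quoted
patch), and the real code operates on the task's thread_info flags
with the kernel's own atomic primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit positions are illustrative; only bit 21 appears in the quoted patch. */
#define TIF_NEED_RESCHED   (1u << 3)
#define TIF_POLLING_NRFLAG (1u << 21)

/* Hypothetical stand-in for the remote task's thread_info flags word. */
typedef struct {
	atomic_uint flags;
} thread_flags_t;

/*
 * Remote wakeup case: set TIF_NEED_RESCHED only if the target is polling.
 * Returns true when the polling idle loop will notice the flag on its own,
 * false when the caller has to fall back to sending an IPI.
 */
static bool model_set_nr_if_polling(thread_flags_t *ti)
{
	unsigned int old = atomic_load(&ti->flags);

	for (;;) {
		if (!(old & TIF_POLLING_NRFLAG))
			return false;	/* not polling: flags untouched */
		if (old & TIF_NEED_RESCHED)
			return true;	/* already marked */
		/* On failure the CAS reloads 'old' and we retry. */
		if (atomic_compare_exchange_weak(&ti->flags, &old,
						 old | TIF_NEED_RESCHED))
			return true;
	}
}

/*
 * resched_cpu() case: always set TIF_NEED_RESCHED, atomically.
 * Returns true when the target was NOT polling, i.e. an IPI is needed.
 */
static bool model_set_nr_and_not_polling(thread_flags_t *ti)
{
	unsigned int old = atomic_fetch_or(&ti->flags, TIF_NEED_RESCHED);

	return !(old & TIF_POLLING_NRFLAG);
}
```

The difference between the two is what happens when the target is not
polling: the first leaves the flags alone and lets the caller send the
IPI, the second has already set TIF_NEED_RESCHED by the time it reports
that an IPI is required.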