Date: Mon, 28 Mar 2016 17:25:18 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra
Cc: Mathieu Desnoyers, "Chatre, Reinette", Jacob Pan, Josh Triplett,
	Ross Green, John Stultz, Thomas Gleixner, lkml, Ingo Molnar,
	Lai Jiangshan, dipankar@in.ibm.com, Andrew Morton, rostedt,
	David Howells, Eric Dumazet, Darren Hart, Frédéric Weisbecker,
	Oleg Nesterov, pranith kumar
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160329002518.GA13058@linux.vnet.ibm.com>
In-Reply-To: <20160328130841.GE4287@linux.vnet.ibm.com>

On Mon, Mar 28, 2016 at 06:08:41AM -0700, Paul E. McKenney wrote:
> On Mon, Mar 28, 2016 at 08:25:47AM +0200, Peter Zijlstra wrote:
> > On Sun, Mar 27, 2016 at 02:06:41PM -0700, Paul E. McKenney wrote:

[ . . . ]

> > > OK, so I should instrument migration_call() if I get the repro rate up?
> >
> > Can do, maybe try the below first. (yes I know how long it all takes :/)
>
> OK, will run this today, then run calibration for last night's run this
> evening.

And there was one failure out of ten runs.  If last night's failure rate
was typical (7 of 24), then I believe we can be about 87% confident that
this change helped.  That isn't all that confident, but...

Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

So what to run tonight?  The sanest approach would be to run stock in
order to get a baseline failure rate.  It is tempting to run more of
Peter's patch, but part of the problem is that we don't know the current
baseline.  So baseline it is...

							Thanx, Paul

> Speaking of which, last night's run (disabling TIF_POLLING_NRFLAG)
> consisted of 24 two-hour runs.  Six of them had hard hangs, and another
> had a hang that eventually unhung of its own accord.  I believe that this
> is significantly fewer failures than from a stock kernel, but I could
> be wrong, and it will take some serious testing to give statistical
> confidence for whatever conclusion is correct.
>
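As an aside on the arithmetic above: one way to sanity-check the "about
87% confident" figure is a quick binomial computation, assuming
independent runs and taking 7-of-24 as the true baseline failure rate.
A standalone sketch (illustrative only, not part of any patch; the exact
confidence depends on which statistical test is used):

	/*
	 * Back-of-the-envelope check of the "87% confident" estimate:
	 * if the per-run failure probability really is the baseline
	 * 7/24, how often would 10 runs show at most one failure by
	 * chance alone?  Assumes independent runs; illustrative only.
	 *
	 * Build: gcc -O2 -o binom binom.c -lm
	 */
	#include <stdio.h>
	#include <math.h>

	/* n choose k, in floating point; fine for small n */
	static double choose(int n, int k)
	{
		double c = 1.0;
		int i;

		for (i = 0; i < k; i++)
			c = c * (n - i) / (i + 1);
		return c;
	}

	int main(void)
	{
		double p = 7.0 / 24.0;	/* assumed baseline failure rate */
		int n = 10;		/* runs with the change applied */
		double tail = 0.0;
		int k;

		for (k = 0; k <= 1; k++)	/* observed: at most 1 failure */
			tail += choose(n, k) * pow(p, k) * pow(1.0 - p, n - k);

		printf("P(<=1 failure in %d runs | p=7/24) = %.3f\n", n, tail);
		printf("=> roughly %.0f%% confidence the change helped\n",
		       100.0 * (1.0 - tail));
		return 0;
	}

This simple one-sided check lands in the mid-80s; the 87% above
presumably comes from a slightly different test, but the ballpark
matches.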
> > > > The other interesting case would be resched_cpu(), which uses
> > > > set_nr_and_not_polling() to kick a remote cpu to call schedule(). It
> > > > atomically sets TIF_NEED_RESCHED and returns if TIF_POLLING_NRFLAG was
> > > > not set. If indeed not, it will send an IPI.
> > > >
> > > > This assumes the idle 'exit' path will do the same as the IPI does; and
> > > > if you look at cpu_idle_loop() it does indeed do both
> > > > preempt_fold_need_resched() and sched_ttwu_pending().
> > > >
> > > > Note that one cannot rely on irq_enter()/irq_exit() being called for the
> > > > scheduler IPI.
> > >
> > > OK, thank you for the info!  Any specific debug actions?
> >
> > Dunno, something like the below should bring visibility into the
> > (lockless) wake_list thingy.
> >
> > So these trace_printk()s should happen between trace_sched_waking() and
> > trace_sched_wakeup() (I've not fully read the thread, but ISTR you had
> > some traces with these here thingies on).
> >
> > ---
> >  arch/x86/include/asm/bitops.h | 6 ++++--
> >  kernel/sched/core.c           | 9 +++++++++
> >  2 files changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> > index 7766d1cf096e..5345784d5e41 100644
> > --- a/arch/x86/include/asm/bitops.h
> > +++ b/arch/x86/include/asm/bitops.h
> > @@ -112,11 +112,13 @@ clear_bit(long nr, volatile unsigned long *addr)
> >  	if (IS_IMMEDIATE(nr)) {
> >  		asm volatile(LOCK_PREFIX "andb %1,%0"
> >  			: CONST_MASK_ADDR(nr, addr)
> > -			: "iq" ((u8)~CONST_MASK(nr)));
> > +			: "iq" ((u8)~CONST_MASK(nr))
> > +			: "memory");
> >  	} else {
> >  		asm volatile(LOCK_PREFIX "btr %1,%0"
> >  			: BITOP_ADDR(addr)
> > -			: "Ir" (nr));
> > +			: "Ir" (nr)
> > +			: "memory");
> >  	}
> > }
>
> Is the above addition of "memory" strictly for the debug below, or is
> it also a potential fix?
>
> Starting it up regardless, but figured I should ask!
>
> 							Thanx, Paul
>
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 0b21e7a724e1..b446f73c530d 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1669,6 +1669,7 @@ void sched_ttwu_pending(void)
> >  	while (llist) {
> >  		p = llist_entry(llist, struct task_struct, wake_entry);
> >  		llist = llist_next(llist);
> > +		trace_printk("waking %d\n", p->pid);
> >  		ttwu_do_activate(rq, p, 0);
> >  	}
> >
> > @@ -1719,6 +1720,7 @@ static void ttwu_queue_remote(struct task_struct *p, int cpu)
> >  	struct rq *rq = cpu_rq(cpu);
> >
> >  	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
> > +		trace_printk("queued %d for waking on %d\n", p->pid, cpu);
> >  		if (!set_nr_if_polling(rq->idle))
> >  			smp_send_reschedule(cpu);
> >  		else
> > @@ -5397,10 +5399,17 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> >  		migrate_tasks(rq);
> >  		BUG_ON(rq->nr_running != 1); /* the migration thread */
> >  		raw_spin_unlock_irqrestore(&rq->lock, flags);
> > +
> > +		/* really bad m'kay */
> > +		WARN_ON(!llist_empty(&rq->wake_list));
> > +
> >  		break;
> >
> >  	case CPU_DEAD:
> >  		calc_load_migrate(rq);
> > +
> > +		/* more bad */
> > +		WARN_ON(!llist_empty(&rq->wake_list));
> >  		break;
> >  #endif
> >  	}
> >
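Returning to Peter's point about set_nr_and_not_polling() above: the
reason the waker can sometimes skip the IPI is that a polling idle CPU
is already watching its thread-info flags word.  The toy userspace model
below shows the shape of that handshake; the TIF_* values, thread
layout, and function body are illustrative stand-ins (C11 atomics, not
the kernel implementation):

	/*
	 * Toy model of the TIF_POLLING_NRFLAG handshake: a polling idle
	 * CPU spins watching its flags, so a waker that finds POLLING
	 * set can skip the IPI, because the atomic fetch-or that sets
	 * NEED_RESCHED is enough to break the poll loop.
	 *
	 * Build: gcc -O2 -pthread -o poll poll.c
	 */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	#define TIF_NEED_RESCHED	(1u << 0)
	#define TIF_POLLING_NRFLAG	(1u << 1)

	static atomic_uint flags = TIF_POLLING_NRFLAG;	/* idle CPU is polling */

	/*
	 * Waker side, mirroring what the discussion above describes:
	 * set NEED_RESCHED and report whether an IPI is needed (that
	 * is, whether the target was not polling).
	 */
	static bool set_nr_and_not_polling(void)
	{
		unsigned int old = atomic_fetch_or(&flags, TIF_NEED_RESCHED);

		return !(old & TIF_POLLING_NRFLAG);
	}

	/* Idle side: poll until NEED_RESCHED appears, then stop polling. */
	static void *idle_loop(void *arg)
	{
		(void)arg;
		while (!(atomic_load(&flags) & TIF_NEED_RESCHED))
			;	/* the kernel would mwait/cpu_relax() here */
		atomic_fetch_and(&flags, ~TIF_POLLING_NRFLAG);
		puts("idle: saw NEED_RESCHED, leaving the poll loop");
		return NULL;
	}

	int main(void)
	{
		pthread_t idle;

		pthread_create(&idle, NULL, idle_loop, NULL);

		if (set_nr_and_not_polling())
			puts("waker: target was not polling -> would send IPI");
		else
			puts("waker: target was polling -> no IPI needed");

		pthread_join(idle, NULL);
		return 0;
	}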
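On the "memory" clobber question: without the clobber, the compiler
treats the asm as touching only its named operands, so it may keep
unrelated memory values cached in registers, or move surrounding loads
and stores across the bit operation.  With it, the asm acts as a full
compiler barrier.  A minimal illustration with hypothetical helper
names (x86-64 only, compiles with gcc -c; whether this matters for the
stall is exactly the open question above):

	/* Hypothetical helpers showing the compiler-visible difference. */
	static inline void clear_flag_no_clobber(unsigned long *word, long nr)
	{
		/*
		 * Compiler knows only *word changes; it may cache or
		 * reorder accesses to *other* memory across this.
		 */
		asm volatile("lock btr %1, %0"
			     : "+m" (*word)
			     : "r" (nr));
	}

	static inline void clear_flag_clobber(unsigned long *word, long nr)
	{
		/*
		 * The "memory" clobber makes this a compiler barrier:
		 * cached values are discarded, and no memory access is
		 * moved across the instruction.
		 */
		asm volatile("lock btr %1, %0"
			     : "+m" (*word)
			     : "r" (nr)
			     : "memory");
	}

Note that either way the LOCK prefix gives the CPU-level atomicity; the
clobber only constrains the compiler.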
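And for the "(lockless) wake_list thingy" that the trace_printk()s
instrument: remote wakeups push the task onto a per-CPU lock-free list,
only the push that finds the list empty needs to kick the target CPU,
and the target drains the whole list in one go.  A self-contained
userspace sketch of that protocol, with C11 atomics standing in for the
kernel's llist primitives (single-threaded here just to show the two
sides; names are illustrative):

	/* Build: gcc -O2 -std=c11 -o wakelist wakelist.c */
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stddef.h>
	#include <stdio.h>

	struct task {
		int pid;
		struct task *next;	/* wake_entry analogue */
	};

	static _Atomic(struct task *) wake_list = NULL;

	/* Push; returns true if the list was empty (a kick is needed),
	 * like llist_add() in ttwu_queue_remote() above. */
	static bool wake_list_add(struct task *p)
	{
		struct task *head = atomic_load(&wake_list);

		do {
			p->next = head;
		} while (!atomic_compare_exchange_weak(&wake_list, &head, p));
		return head == NULL;
	}

	/* Drain everything in one atomic exchange, like llist_del_all()
	 * feeding the loop in sched_ttwu_pending(). */
	static struct task *wake_list_del_all(void)
	{
		return atomic_exchange(&wake_list, NULL);
	}

	int main(void)
	{
		struct task a = { .pid = 100 }, b = { .pid = 101 };
		struct task *p;

		if (wake_list_add(&a))
			puts("queued 100: list was empty -> kick the CPU");
		if (!wake_list_add(&b))
			puts("queued 101: kick already pending, no extra IPI");

		for (p = wake_list_del_all(); p; p = p->next)
			printf("waking %d\n", p->pid);
		return 0;
	}

The WARN_ON()s in the migration_call() hunk then check the invariant
this protocol implies: by the time a CPU is torn down, nothing should
still be sitting on its wake_list.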