Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935057AbcJHLfO (ORCPT ); Sat, 8 Oct 2016 07:35:14 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:46160 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934899AbcJHLfF (ORCPT ); Sat, 8 Oct 2016 07:35:05 -0400 Date: Sat, 8 Oct 2016 13:32:06 +0200 (CEST) From: Thomas Gleixner To: Rich Felker cc: Daniel Lezcano , devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, linux-sh@vger.kernel.org, Rob Herring , Mark Rutland , "Paul E. McKenney" Subject: Re: [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver In-Reply-To: <20161008013210.GX19318@brightrain.aerifal.cx> Message-ID: References: <22c1ee0f908fe3bf8b70f5e87d659ceb29af1434.1474693319.git.dalias@libc.org> <20160926210716.GA12855@brightrain.aerifal.cx> <4b02ba7d-4a31-297a-bbbd-be26da615e7b@linaro.org> <20160926222352.GE19318@brightrain.aerifal.cx> <20160927004258.GF19318@brightrain.aerifal.cx> <20160927220820.GH19318@brightrain.aerifal.cx> <20161008013210.GX19318@brightrain.aerifal.cx> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3484 Lines: 81 On Fri, 7 Oct 2016, Rich Felker wrote: > > If I'm not mistaken, the bug is in tick_nohz_restart. According to the I think you are mistaken. Let's look at CPU0 only: -0 [000] d... 150.829698: __tick_nohz_idle_enter: can stop idle tick -0 [000] d... 150.829774: __tick_nohz_idle_enter: entering stop_sched_tick -0 [000] d... 150.830001: tick_stop: success=1 dependency=NONE -0 [000] d... 150.830127: hrtimer_cancel: hrtimer=109e949c -0 [000] d... 150.830380: jcore_pit_set_next_event: pit 109eccb0 0 set delta 82124660 en=4410000 cntr=75 throt=82124660 -0 [000] d... 150.830498: hrtimer_start: hrtimer=109e949c function=tick_sched_timer expires=150840000000 softexpires=150840000000 -0 [000] d... 150.830720: jcore_pit_set_next_event: pit 109eccb0 0 set delta 539823 en=4410000 cntr=34 throt=539823 Tick is programmed to fire at 150.84 -0 [000] d... 150.830822: __tick_nohz_idle_enter: returning from stop_sched_tick -0 [000] d... 150.830940: rcu_dyntick: Start 140000000000000 0 -0 [000] d... 150.831072: rcu_dyntick: End 0 1 -0 [000] d... 150.831165: cpu_idle: state=0 cpu_id=0 -0 [000] d... 150.831269: rcu_dyntick: Start 1 0 -0 [000] .... 150.831368: cpu_startup_entry: enabled irqs, spinning CPU spins and waits for an interrupt to happen -0 [000] d... 150.841530: rcu_dyntick: End 0 1 Dropping out of the spin about the time we expect the PIT interrupt to fire. And an interrupt is the reason why we drop out because the 'need resched' flag is not set and we end up in: -0 [000] d.s. 150.841724: tick_irq_enter: update jiffies via irq which is called from irq_enter() -0 [000] d.s. 150.841918: tick_do_update_jiffies64: finished do_timer(1) -0 [000] d.s. 150.842348: tick_do_update_jiffies64: finished updating jiffies So here we would expect: irq_handler_entry: irq=16 name=jcore_pit ... irq_handler_exit ..... or any other interrupt being invoked. But we see this here: -0 [000] d... 150.842603: __tick_nohz_idle_enter: can stop idle tick And that's just wrong. Now looking at CPU1 at the same time: -0 [001] d.h. 150.841829: irq_handler_entry: irq=16 name=jcore_ -0 [001] d.h. 150.842054: hrtimer_cancel: hrtimer=109f449c -0 [001] d.h. 150.842218: hrtimer_expire_entry: hrtimer=109f449c function=tick_sched_timer now=150840579940 So there the PIT interrupt enters and runs the interrupt handler. Now looking deeper at the PIT interrupts: kworker/0:1-208 [000] d.h. 150.641822: irq_handler_entry: irq=16 name=jcore_pit kworker/0:1-208 [000] d.h. 150.644928: irq_handler_exit: irq=16 ret=handled -0 [001] dnh. 150.645370: irq_handler_entry: irq=16 name=jcore_pit -0 [001] dnh. 150.647444: irq_handler_exit: irq=16 ret=handled Both CPUs have the same interrupt number for their per cpu PIT interrupt. IIRC you said earlier when the percpu interrupt discussion happened, that your per cpu PITs have distinct interrupt numbers, but that does not seem the case here. That's where you have to dig. Thanks, tglx