Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757194Ab3CSRDL (ORCPT ); Tue, 19 Mar 2013 13:03:11 -0400
Received: from cantor2.suse.de ([195.135.220.15]:49067 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755440Ab3CSRDJ (ORCPT ); Tue, 19 Mar 2013 13:03:09 -0400
Date: Tue, 19 Mar 2013 18:03:06 +0100
From: Jiri Bohac
To: Thomas Gleixner , Dimitri Sivanich
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH][RFC] specific do_timer_cpu value for nohz off mode
Message-ID: <20130319170306.GA23272@midget.suse.cz>
References: <20111108191149.GA7236@sgi.com> <20120215143710.GA10543@sgi.com>
	<20120215153438.GB11343@sgi.com> <20120216145900.GA3772@sgi.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120216145900.GA3772@sgi.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3694
Lines: 93

Hi,

following up on a very old thread:
http://thread.gmane.org/gmane.linux.kernel/1212777

On Thu, Feb 16, 2012 at 08:59:00AM -0600, Dimitri Sivanich wrote:
> On Wed, Feb 15, 2012 at 09:36:47PM +0100, Thomas Gleixner wrote:
> > On Wed, 15 Feb 2012, Dimitri Sivanich wrote:
> > > On Wed, Feb 15, 2012 at 03:52:06PM +0100, Thomas Gleixner wrote:
> > > > So the first CPU which registers a clock event device takes it. That's
> > > > the boot CPU, no matter what.
> > > >
> > > Both kernel tracing and the original patch that I proposed for this
> > > showed plainly (at the time) that the tick_do_timer_cpu was not always cpu 0
> > > prior to modifying it for nohz=off. Maybe that is no longer the case?
> >
> > This logic has not been changed in years.
>
> I did some tracing of all points where tick_do_timer_cpu is set in the
> 3.3.0-rc3+ kernel.
> >
> > tick_do_timer_cpu is initialized to TICK_DO_TIMER_BOOT and the first
> > cpu which registers either a global or a per cpu clock event device
> > takes it over. This is at least on x86 always the boot cpu, i.e. cpu0.
> > After that point nothing touches that variable when nohz is disabled
> > (runtime or compile time).
>
> At that point it is set to cpu 0. However, when we go into highres mode
> it does change. Below are the two places it was set during boot with
> nohz=off on one of our x86 based machines.
>
> [    0.000000] tick_setup_device: tick_do_timer_cpu 0
> [    1.924098] tick_broadcast_setup_oneshot: tick_do_timer_cpu 17
>
> So in this example it's now cpu 17, and it stays that way from that point on.
>
> A traceback at that point shows tick_init_highres is indeed initiating this:
>
> [    1.924863] [] tick_broadcast_setup_oneshot+0x71/0x160
> [    1.924863] [] tick_broadcast_switch_to_oneshot+0x33/0x50
> [    1.924863] [] tick_switch_to_oneshot+0x81/0xd0
> [    1.924863] [] tick_init_highres+0x10/0x20
> [    1.924863] [] hrtimer_run_pending+0x71/0xd0
> >
> > So I really want to see proper proof why that would not be the
> > case. If it really happens then we fix the root cause instead of
> > adding random sysfs interfaces.

As Dimitri wrote above, the switch away from cpu 0 is done by
tick_broadcast_setup_oneshot. The first CPU switching to highres
takes the broadcast responsibility and also sets tick_do_timer_cpu to
itself. This behaviour was introduced by 7300711e
(clockevents: broadcast fixup possible waiters).

I don't see a good reason to assign tick_do_timer_cpu to the CPU
doing the one-shot timer broadcasts. The timer interrupt will be
generated on any other CPU as well, be it through the broadcast
IPI or a per-CPU clockevent device. Any online CPU can do that job,
so how about just dropping the assignment?

The do_timer() code should not suffer from the jitter introduced
by the interrupt being generated by the broadcast, should it?
Signed-off-by: Jiri Bohac

--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -572,9 +572,6 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 
 		bc->event_handler = tick_handle_oneshot_broadcast;
 
-		/* Take the do_timer update */
-		tick_do_timer_cpu = cpu;
-
 		/*
 		 * We must be careful here. There might be other CPUs
 		 * waiting for periodic broadcast. We need to set the

-- 
Jiri Bohac
SUSE Labs, SUSE CZ