2012-02-15 14:16:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] specific do_timer_cpu value for nohz off mode

On Tue, 8 Nov 2011, Dimitri Sivanich wrote:
>
> Allow manual override of the tick_do_timer_cpu.
>
> While not necessarily harmful, doing jiffies updates on an application cpu
> does cause some extra overhead that HPC benchmarking people notice. They
> prefer to have OS activity isolated to certain cpus. They like reproducibility
> of results, and having jiffies updates bouncing around introduces variability.

I really wonder about this changelog. The only case where jiffies
updates bounces around is the NOHZ case. In all other modes (periodic
or highres) the boot cpu gets the do_timer() duty and it's never
assigned to any other cpu.

So what's the point of this exercise? Moving it away from CPU0 for
acedemic reasons or what?

Thanks,

tglx


2012-02-15 14:37:16

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] specific do_timer_cpu value for nohz off mode

On Wed, Feb 15, 2012 at 03:16:34PM +0100, Thomas Gleixner wrote:
> On Tue, 8 Nov 2011, Dimitri Sivanich wrote:
> >
> > Allow manual override of the tick_do_timer_cpu.
> >
> > While not necessarily harmful, doing jiffies updates on an application cpu
> > does cause some extra overhead that HPC benchmarking people notice. They
> > prefer to have OS activity isolated to certain cpus. They like reproducibility
> > of results, and having jiffies updates bouncing around introduces variability.
>
> I really wonder about this changelog. The only case where jiffies
> updates bounces around is the NOHZ case. In all other modes (periodic
> or highres) the boot cpu gets the do_timer() duty and it's never
> assigned to any other cpu.
>
> So what's the point of this exercise? Moving it away from CPU0 for
> acedemic reasons or what?
>
I wasn't specifically trying to move it away from CPU0 (having jiffies updates
on CPU0 was and would be just fine for the nohz=off case). The issue was
that the tick_do_timer_cpu could be any cpu even in the nohz=off case (maybe
something has changed that since?). After the point of assignment it is
static, but you never know which cpu it is.

2012-02-15 14:53:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] specific do_timer_cpu value for nohz off mode

On Wed, 15 Feb 2012, Dimitri Sivanich wrote:

> On Wed, Feb 15, 2012 at 03:16:34PM +0100, Thomas Gleixner wrote:
> > On Tue, 8 Nov 2011, Dimitri Sivanich wrote:
> > >
> > > Allow manual override of the tick_do_timer_cpu.
> > >
> > > While not necessarily harmful, doing jiffies updates on an application cpu
> > > does cause some extra overhead that HPC benchmarking people notice. They
> > > prefer to have OS activity isolated to certain cpus. They like reproducibility
> > > of results, and having jiffies updates bouncing around introduces variability.
> >
> > I really wonder about this changelog. The only case where jiffies
> > updates bounces around is the NOHZ case. In all other modes (periodic
> > or highres) the boot cpu gets the do_timer() duty and it's never
> > assigned to any other cpu.
> >
> > So what's the point of this exercise? Moving it away from CPU0 for
> > acedemic reasons or what?
> >
> I wasn't specifically trying to move it away from CPU0 (having jiffies updates
> on CPU0 was and would be just fine for the nohz=off case). The issue was
> that the tick_do_timer_cpu could be any cpu even in the nohz=off case (maybe
> something has changed that since?). After the point of assignment it is
> static, but you never know which cpu it is.

It's always the boot cpu and that has been there from day one of that code.

tick_setup_device()
{
/*
* First device setup ?
*/
if (!td->evtdev) {
/*
* If no cpu took the do_timer update, assign it to
* this cpu:
*/
if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
tick_do_timer_cpu = cpu;

So the first CPU which registers a clock event device takes it. That's
the boot CPU, no matter what.

Thanks,

tglx

2012-02-15 15:35:06

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] specific do_timer_cpu value for nohz off mode

On Wed, Feb 15, 2012 at 03:52:06PM +0100, Thomas Gleixner wrote:
> On Wed, 15 Feb 2012, Dimitri Sivanich wrote:
>
> > On Wed, Feb 15, 2012 at 03:16:34PM +0100, Thomas Gleixner wrote:
> > > On Tue, 8 Nov 2011, Dimitri Sivanich wrote:
> > > >
> > > > Allow manual override of the tick_do_timer_cpu.
> > > >
> > > > While not necessarily harmful, doing jiffies updates on an application cpu
> > > > does cause some extra overhead that HPC benchmarking people notice. They
> > > > prefer to have OS activity isolated to certain cpus. They like reproducibility
> > > > of results, and having jiffies updates bouncing around introduces variability.
> > >
> > > I really wonder about this changelog. The only case where jiffies
> > > updates bounces around is the NOHZ case. In all other modes (periodic
> > > or highres) the boot cpu gets the do_timer() duty and it's never
> > > assigned to any other cpu.
> > >
> > > So what's the point of this exercise? Moving it away from CPU0 for
> > > acedemic reasons or what?
> > >
> > I wasn't specifically trying to move it away from CPU0 (having jiffies updates
> > on CPU0 was and would be just fine for the nohz=off case). The issue was
> > that the tick_do_timer_cpu could be any cpu even in the nohz=off case (maybe
> > something has changed that since?). After the point of assignment it is
> > static, but you never know which cpu it is.
>
> It's always the boot cpu and that has been there from day one of that code.
>
> tick_setup_device()
> {
> /*
> * First device setup ?
> */
> if (!td->evtdev) {
> /*
> * If no cpu took the do_timer update, assign it to
> * this cpu:
> */
> if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> tick_do_timer_cpu = cpu;
>
> So the first CPU which registers a clock event device takes it. That's
> the boot CPU, no matter what.
>
Both kernel tracing and the original patch that I proposed for this
showed plainly (at the time) that the tick_do_timer_cpu was not always cpu 0
prior to modifying it for nohz=off. Maybe that is no longer the case?

2012-02-15 20:36:50

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] specific do_timer_cpu value for nohz off mode

On Wed, 15 Feb 2012, Dimitri Sivanich wrote:
> On Wed, Feb 15, 2012 at 03:52:06PM +0100, Thomas Gleixner wrote:
> > So the first CPU which registers a clock event device takes it. That's
> > the boot CPU, no matter what.
> >
> Both kernel tracing and the original patch that I proposed for this
> showed plainly (at the time) that the tick_do_timer_cpu was not always cpu 0
> prior to modifying it for nohz=off. Maybe that is no longer the case?

This logic has not been changed in years.

tick_do_timer_cpu is initialized to TICK_DO_TIMER_BOOT and the first
cpu which registers either a global or a per cpu clock event device
takes it over. This is at least on x86 always the boot cpu, i.e. cpu0.
After that point nothing touches that variable when nohz is disabled
(runtime or compile time).

So I really want to see proper proof why that would not be the
case. If it really happens then we fix the root cause instead of
adding random sysfs interfaces.

Thanks,

tglx

2012-02-16 14:59:06

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] specific do_timer_cpu value for nohz off mode

On Wed, Feb 15, 2012 at 09:36:47PM +0100, Thomas Gleixner wrote:
> On Wed, 15 Feb 2012, Dimitri Sivanich wrote:
> > On Wed, Feb 15, 2012 at 03:52:06PM +0100, Thomas Gleixner wrote:
> > > So the first CPU which registers a clock event device takes it. That's
> > > the boot CPU, no matter what.
> > >
> > Both kernel tracing and the original patch that I proposed for this
> > showed plainly (at the time) that the tick_do_timer_cpu was not always cpu 0
> > prior to modifying it for nohz=off. Maybe that is no longer the case?
>
> This logic has not been changed in years.

I did some tracing of all points where tick_do_timer_cpu is set in the
3.3.0-rc3+ kernel.

>
> tick_do_timer_cpu is initialized to TICK_DO_TIMER_BOOT and the first
> cpu which registers either a global or a per cpu clock event device
> takes it over. This is at least on x86 always the boot cpu, i.e. cpu0.
> After that point nothing touches that variable when nohz is disabled
> (runtime or compile time).

At that point it is set to cpu 0. However, when we go into highres mode
it does change. Below are the two places it was set during boot with
nohz=off on one of our x86 based machines.

[ 0.000000] tick_setup_device: tick_do_timer_cpu 0
[ 1.924098] tick_broadcast_setup_oneshot: tick_do_timer_cpu 17

So in this example it's now cpu 17, and it stays that way from that point on.

A traceback at that point shows tick_init_highres is indeed initiating this:

[ 1.924863] [<ffffffff81087e01>] tick_broadcast_setup_oneshot+0x71/0x160
[ 1.924863] [<ffffffff81087f23>] tick_broadcast_switch_to_oneshot+0x33/0x50
[ 1.924863] [<ffffffff81088841>] tick_switch_to_oneshot+0x81/0xd0
[ 1.924863] [<ffffffff810888a0>] tick_init_highres+0x10/0x20
[ 1.924863] [<ffffffff81061e71>] hrtimer_run_pending+0x71/0xd0

>
> So I really want to see proper proof why that would not be the
> case. If it really happens then we fix the root cause instead of
> adding random sysfs interfaces.

If you're willing to consider it a bug that tick_do_timer_cpu is not cpu 0,
than I'm very much in agreement.

>
> Thanks,
>
> tglx

2013-03-19 17:03:11

by Jiri Bohac

[permalink] [raw]
Subject: Re: [PATCH][RFC] specific do_timer_cpu value for nohz off mode

Hi,

following up on a very old thread:
http://thread.gmane.org/gmane.linux.kernel/1212777

On Thu, Feb 16, 2012 at 08:59:00AM -0600, Dimitri Sivanich wrote:
> On Wed, Feb 15, 2012 at 09:36:47PM +0100, Thomas Gleixner wrote:
> > On Wed, 15 Feb 2012, Dimitri Sivanich wrote:
> > > On Wed, Feb 15, 2012 at 03:52:06PM +0100, Thomas Gleixner wrote:
> > > > So the first CPU which registers a clock event device takes it. That's
> > > > the boot CPU, no matter what.
> > > >
> > > Both kernel tracing and the original patch that I proposed for this
> > > showed plainly (at the time) that the tick_do_timer_cpu was not always cpu 0
> > > prior to modifying it for nohz=off. Maybe that is no longer the case?
> >
> > This logic has not been changed in years.
>
> I did some tracing of all points where tick_do_timer_cpu is set in the
> 3.3.0-rc3+ kernel.
>
> >
> > tick_do_timer_cpu is initialized to TICK_DO_TIMER_BOOT and the first
> > cpu which registers either a global or a per cpu clock event device
> > takes it over. This is at least on x86 always the boot cpu, i.e. cpu0.
> > After that point nothing touches that variable when nohz is disabled
> > (runtime or compile time).
>
> At that point it is set to cpu 0. However, when we go into highres mode
> it does change. Below are the two places it was set during boot with
> nohz=off on one of our x86 based machines.
>
> [ 0.000000] tick_setup_device: tick_do_timer_cpu 0
> [ 1.924098] tick_broadcast_setup_oneshot: tick_do_timer_cpu 17
>
> So in this example it's now cpu 17, and it stays that way from that point on.
>
> A traceback at that point shows tick_init_highres is indeed initiating this:
>
> [ 1.924863] [<ffffffff81087e01>] tick_broadcast_setup_oneshot+0x71/0x160
> [ 1.924863] [<ffffffff81087f23>] tick_broadcast_switch_to_oneshot+0x33/0x50
> [ 1.924863] [<ffffffff81088841>] tick_switch_to_oneshot+0x81/0xd0
> [ 1.924863] [<ffffffff810888a0>] tick_init_highres+0x10/0x20
> [ 1.924863] [<ffffffff81061e71>] hrtimer_run_pending+0x71/0xd0
>
> >
> > So I really want to see proper proof why that would not be the
> > case. If it really happens then we fix the root cause instead of
> > adding random sysfs interfaces.

As Dimitri wrote above, the switch from cpu 0 is done by
tick_broadcast_setup_oneshot. The first CPU switching to highres
takes the broadcast responsibility and also sets
tick_do_timer_cpu to itself.

This behaviour has been introduced by 7300711e
(clockevents: broadcast fixup possible waiters).

I don't see a good reason assign tick_do_timer_cpu to the CPU
doing the one-shot timer broadcasts. The timer interrupt will be
generated on any other CPU as well, be it through the broadcast
IPI or a per-CPU clockevent device. Any online CPU can do that
job, so how about just dropping the assignment?

The do_timer() code should not suffer from the jitter introduced
by the interrupt being generated by the broadcast, should it?

Signed-off-by: Jiri Bohac <[email protected]>

--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -572,9 +572,6 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)

bc->event_handler = tick_handle_oneshot_broadcast;

- /* Take the do_timer update */
- tick_do_timer_cpu = cpu;
-
/*
* We must be careful here. There might be other CPUs
* waiting for periodic broadcast. We need to set the



--
Jiri Bohac <[email protected]>
SUSE Labs, SUSE CZ