2014-04-09 13:52:04

by Viresh Kumar

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner <[email protected]> wrote:
> Subject: NOHZ: Check for nohz active instead of nohz enabled
>
> RCU and the fine grained idle time accounting functions check
> tick_nohz_enabled. But that variable is merily telling that NOHZ has
> been enabled in the config and not been disabled on the command line.
>
> But it does not tell anything about nohz being active. That's what all
> this should check for.
>
> Matthew reported, that the idle accounting on his old P1 machine
> showed bogus values, when he enabled NOHZ in the config and did not
> disable it on the kernel command line. The reason is that his machine
> uses (refined) jiffies as a clocksource which explains why the "fine"
> grained accounting went into lala land, because it depends on when the
> system goes and leaves idle relative to the jiffies increment.
>
> Provide a tick_nohz_active indicator and let RCU and the accounting
> code use this instead of tick_nohz_enable.

> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
> struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> ktime_t next;
>
> - if (!tick_nohz_enabled)
> + if (!tick_nohz_active)
> return;

Considering the impressive list of Reviewed-by and people involved
in this patch, I am not sure I am reading the code well here.

The above change isn't required as per my understanding. Otherwise
we will never pass that check. tick_nohz_active is initialized as zero
and so we will keep on returning for ever and wouldn't be able to set
it to 1 ever.

I have a patch to fix it up, but wanted to know your opinion before
sending it.

> local_irq_disable();
> @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
> local_irq_enable();
> return;
> }
> -
> + tick_nohz_active = 1;
> ts->nohz_mode = NOHZ_MODE_LOWRES;
>
> /*


2014-04-09 14:31:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On Wed, 9 Apr 2014 19:21:53 +0530
Viresh Kumar <[email protected]> wrote:

> On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner <[email protected]> wrote:
> > Subject: NOHZ: Check for nohz active instead of nohz enabled
> >
> > RCU and the fine grained idle time accounting functions check
> > tick_nohz_enabled. But that variable is merily telling that NOHZ has
> > been enabled in the config and not been disabled on the command line.
> >
> > But it does not tell anything about nohz being active. That's what all
> > this should check for.
> >
> > Matthew reported, that the idle accounting on his old P1 machine
> > showed bogus values, when he enabled NOHZ in the config and did not
> > disable it on the kernel command line. The reason is that his machine
> > uses (refined) jiffies as a clocksource which explains why the "fine"
> > grained accounting went into lala land, because it depends on when the
> > system goes and leaves idle relative to the jiffies increment.
> >
> > Provide a tick_nohz_active indicator and let RCU and the accounting
> > code use this instead of tick_nohz_enable.
>
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
> > struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> > ktime_t next;
> >
> > - if (!tick_nohz_enabled)
> > + if (!tick_nohz_active)
> > return;
>
> Considering the impressive list of Reviewed-by and people involved
> in this patch, I am not sure I am reading the code well here.
>
> The above change isn't required as per my understanding. Otherwise
> we will never pass that check. tick_nohz_active is initialized as zero
> and so we will keep on returning for ever and wouldn't be able to set
> it to 1 ever.
>
> I have a patch to fix it up, but wanted to know your opinion before
> sending it.

Ouch! You are correct, this part of the patch makes no sense. That's
what I get for reviewing a patch and not looking at all the code around
the changes. (another kernel developer hangs head in shame :-( )

I think that if statement should be nuked.

-- Steve

>
> > local_irq_disable();
> > @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
> > local_irq_enable();
> > return;
> > }
> > -
> > + tick_nohz_active = 1;
> > ts->nohz_mode = NOHZ_MODE_LOWRES;
> >
> > /*

2014-04-09 15:11:22

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On Wed, Apr 09, 2014 at 07:21:53PM +0530, Viresh Kumar wrote:
> On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner <[email protected]> wrote:
> > Subject: NOHZ: Check for nohz active instead of nohz enabled
> >
> > RCU and the fine grained idle time accounting functions check
> > tick_nohz_enabled. But that variable is merily telling that NOHZ has
> > been enabled in the config and not been disabled on the command line.
> >
> > But it does not tell anything about nohz being active. That's what all
> > this should check for.
> >
> > Matthew reported, that the idle accounting on his old P1 machine
> > showed bogus values, when he enabled NOHZ in the config and did not
> > disable it on the kernel command line. The reason is that his machine
> > uses (refined) jiffies as a clocksource which explains why the "fine"
> > grained accounting went into lala land, because it depends on when the
> > system goes and leaves idle relative to the jiffies increment.
> >
> > Provide a tick_nohz_active indicator and let RCU and the accounting
> > code use this instead of tick_nohz_enable.
>
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
> > struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> > ktime_t next;
> >
> > - if (!tick_nohz_enabled)
> > + if (!tick_nohz_active)
> > return;
>
> Considering the impressive list of Reviewed-by and people involved
> in this patch, I am not sure I am reading the code well here.
>
> The above change isn't required as per my understanding. Otherwise
> we will never pass that check. tick_nohz_active is initialized as zero
> and so we will keep on returning for ever and wouldn't be able to set
> it to 1 ever.
>
> I have a patch to fix it up, but wanted to know your opinion before
> sending it.

Ouch, bad thing.

I'm also disovering this whole thread and patch only today, since nobody Cc'ed me :-(

> > local_irq_disable();
> > @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
> > local_irq_enable();
> > return;
> > }
> > -
> > + tick_nohz_active = 1;
> > ts->nohz_mode = NOHZ_MODE_LOWRES;
> >
> > /*
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2014-04-09 15:21:06

by Viresh Kumar

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On 9 April 2014 20:01, Steven Rostedt <[email protected]> wrote:
> Ouch! You are correct, this part of the patch makes no sense. That's
> what I get for reviewing a patch and not looking at all the code around
> the changes. (another kernel developer hangs head in shame :-( )
>
> I think that if statement should be nuked.

Hmm, my opinion differs here :)

If we completely remove this statement, we will run
tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for
enabled must stay.

2014-04-09 15:29:53

by Steven Rostedt

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On Wed, 9 Apr 2014 20:50:59 +0530
Viresh Kumar <[email protected]> wrote:

> On 9 April 2014 20:01, Steven Rostedt <[email protected]> wrote:
> > Ouch! You are correct, this part of the patch makes no sense. That's
> > what I get for reviewing a patch and not looking at all the code around
> > the changes. (another kernel developer hangs head in shame :-( )
> >
> > I think that if statement should be nuked.
>
> Hmm, my opinion differs here :)
>
> If we completely remove this statement, we will run
> tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for
> enabled must stay.

Do we? This is only called by tick_check_oneshot_change() which has the
following:

int tick_check_oneshot_change(int allow_nohz)
{
struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

if (!test_and_clear_bit(0, &ts->check_clocks))
return 0;

if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
return 0;

if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
return 0;

if (!allow_nohz)
return 1;

tick_nohz_switch_to_nohz();
return 0;
}

How often does it make it to that last check?

-- Steve

2014-04-09 15:31:47

by Steven Rostedt

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On Wed, 9 Apr 2014 11:29:50 -0400
Steven Rostedt <[email protected]> wrote:

> On Wed, 9 Apr 2014 20:50:59 +0530
> Viresh Kumar <[email protected]> wrote:
>
> > On 9 April 2014 20:01, Steven Rostedt <[email protected]> wrote:
> > > Ouch! You are correct, this part of the patch makes no sense. That's
> > > what I get for reviewing a patch and not looking at all the code around
> > > the changes. (another kernel developer hangs head in shame :-( )
> > >
> > > I think that if statement should be nuked.
> >
> > Hmm, my opinion differs here :)
> >
> > If we completely remove this statement, we will run
> > tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for
> > enabled must stay.
>
> Do we? This is only called by tick_check_oneshot_change() which has the
> following:
>
> int tick_check_oneshot_change(int allow_nohz)
> {
> struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
>
> if (!test_and_clear_bit(0, &ts->check_clocks))
> return 0;
>
> if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
> return 0;
>
> if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
> return 0;
>
> if (!allow_nohz)
> return 1;
>
> tick_nohz_switch_to_nohz();
> return 0;
> }
>
> How often does it make it to that last check?


Hmm, looking at the code, I see it probably should still do the check.

OK, nevermind ;-)

-- Steve

2014-04-09 15:34:48

by Viresh Kumar

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On 9 April 2014 21:01, Steven Rostedt <[email protected]> wrote:
>> Do we? This is only called by tick_check_oneshot_change() which has the
>> following:
>>
>> int tick_check_oneshot_change(int allow_nohz)
>> {
>> struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
>>
>> if (!test_and_clear_bit(0, &ts->check_clocks))
>> return 0;
>>
>> if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
>> return 0;
>>
>> if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
>> return 0;
>>
>> if (!allow_nohz)
>> return 1;
>>
>> tick_nohz_switch_to_nohz();
>> return 0;
>> }
>>
>> How often does it make it to that last check?

Probably we will reach here only once per boot per cpu.

> Hmm, looking at the code, I see it probably should still do the check.

But still we need it for that one time. :)

2014-04-09 15:39:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On Wed, 9 Apr 2014 11:31:43 -0400
Steven Rostedt <[email protected]> wrote:

>
> Hmm, looking at the code, I see it probably should still do the check.
>
> OK, nevermind ;-)

Reading even more of the code, now I'm totally confused :-)

When tick_setup_sched_timer() is called, if tick_nohz_enabled is set,
then we set tick_nohz_active.

This gets called by hrtimer_switch_to_hres(), and before that is
called, the tick_check_oneshot_changed() will never get to the
tick_nohz_switch_to_nohz() call.

Looks to me, the real answer is to nuke both the if statement *and* the
setting of the tick_nohz_active in that function. Both looks a bit
redundant to me.

-- Steve

2014-04-09 15:56:56

by Viresh Kumar

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On 9 April 2014 21:09, Steven Rostedt <[email protected]> wrote:
> Reading even more of the code, now I'm totally confused :-)

:)

> When tick_setup_sched_timer() is called, if tick_nohz_enabled is set,
> then we set tick_nohz_active.

correct.

> This gets called by hrtimer_switch_to_hres(), and before that is
> called, the tick_check_oneshot_changed() will never get to the
> tick_nohz_switch_to_nohz() call.

If hrtimer_switch_to_hres() is called or HRES is enabled, we will
never ever call tick_nohz_switch_to_nohz().

> Looks to me, the real answer is to nuke both the if statement *and* the
> setting of the tick_nohz_active in that function. Both looks a bit
> redundant to me.

When HRES isn't enabled and NOHZ isn't enabled as well, in that
case we should stick to the periodic code from tick-common.c and
the oneshot options of tick_nohz_switch_to_nohz() or
hrtimer_switch_to_hres() shouldn't be used. And so, we still need
those checks, as per my understanding. :)

Lets see what others have for this discussion :)

2014-04-09 16:15:36

by Steven Rostedt

[permalink] [raw]
Subject: Re: nohz problem with idle time on old hardware

On Wed, 9 Apr 2014 21:26:51 +0530
Viresh Kumar <[email protected]> wrote:


> When HRES isn't enabled and NOHZ isn't enabled as well, in that
> case we should stick to the periodic code from tick-common.c and
> the oneshot options of tick_nohz_switch_to_nohz() or
> hrtimer_switch_to_hres() shouldn't be used. And so, we still need
> those checks, as per my understanding. :)

The one thing we can agree on is that the current code is wrong :-)

>
> Lets see what others have for this discussion :)

Yeah, those that actually wrote this code should have a say in this.

-- Steve