2012-10-04 17:46:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

On Thu, Oct 04, 2012 at 12:11:01PM +0800, Huacai Chen wrote:
> Hi, Greg
>
> I found that Linux-3.5.5 accept this commit "sched: Add missing call
> to calc_load_exit_idle()" but I think this isn't needed. Because
> "5167e8d5417b sched/nohz: Rewrite and fix load-avg computation --
> again not fully applied" is true for 3.6 branch, but not for 3.5
> branch.

But 5167e8d5417b is in 3.5, so shouldn't this commit still be necessary?

> In 3.5 branch, calc_load_exit_idle() is already called in
> tick_nohz_idle_exit(), it doesn't need to be called at
> tick_nohz_update_jiffies() again. In 3.6 branch, some code of
> tick_nohz_idle_exit() is splitted to tick_nohz_restart_sched_tick()
> and calc_load_exit_idle() is missing by accident, so commit "sched:
> Add missing call to calc_load_exit_idle()" is needed.

So this really should be dropped from 3.5? Charles, Peter, Ingo, any
thoughts here?

thanks,

greg k-h


2012-10-04 18:32:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

On Thu, 2012-10-04 at 10:46 -0700, Greg Kroah-Hartman wrote:
> On Thu, Oct 04, 2012 at 12:11:01PM +0800, Huacai Chen wrote:
> > Hi, Greg
> >
> > I found that Linux-3.5.5 accept this commit "sched: Add missing call
> > to calc_load_exit_idle()" but I think this isn't needed. Because
> > "5167e8d5417b sched/nohz: Rewrite and fix load-avg computation --
> > again not fully applied" is true for 3.6 branch, but not for 3.5
> > branch.
>
> But 5167e8d5417b is in 3.5, so shouldn't this commit still be necessary?
>
> > In 3.5 branch, calc_load_exit_idle() is already called in
> > tick_nohz_idle_exit(), it doesn't need to be called at
> > tick_nohz_update_jiffies() again. In 3.6 branch, some code of
> > tick_nohz_idle_exit() is splitted to tick_nohz_restart_sched_tick()
> > and calc_load_exit_idle() is missing by accident, so commit "sched:
> > Add missing call to calc_load_exit_idle()" is needed.
>
> So this really should be dropped from 3.5? Charles, Peter, Ingo, any
> thoughts here?

Bah, lots of code movement there recently.. let me try and untangle all
that afresh.. /me checks out v3.5.5.

OK, assuming ->tick_stopped means what the label says, then we only want
to call calc_load_enter_idle() when it flips to 1 and
calc_load_exit_idle() when it flips back to 0, such that when an actual
tick happens its got correct state.

Now the actual patch "5167e8d5417b sched/nohz: Rewrite and fix load-avg
computation -- again not fully applied" modifies
tick_nohz_restart_sched_tick() which doesn't appear to exist in v3.5.5
and the patch fobbed it into tick_nohz_update_jiffies() which is called
from interrupt entry when nohz-idle so that the interrupt (and possible
tailing softirq) see a valid jiffies count.

However, since we don't restart the tick, we won't be sampling load muck
and calling calc_load_exit_idle() from there is bound to confuse state.

I hope.. damn this code ;-)

I can't find wtf went wrong either, the initial patch 5167e8d5417bf5c
contains both hunks, but in that case the fixup 749c8814f0 doesn't make
sense, not can I find anything in merge commits using:

git log -S calc_load_exit_idle kernel/time/tick-sched.c

/me puzzled

2012-10-04 22:28:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

On Thu, Oct 04, 2012 at 08:31:59PM +0200, Peter Zijlstra wrote:
> On Thu, 2012-10-04 at 10:46 -0700, Greg Kroah-Hartman wrote:
> > On Thu, Oct 04, 2012 at 12:11:01PM +0800, Huacai Chen wrote:
> > > Hi, Greg
> > >
> > > I found that Linux-3.5.5 accept this commit "sched: Add missing call
> > > to calc_load_exit_idle()" but I think this isn't needed. Because
> > > "5167e8d5417b sched/nohz: Rewrite and fix load-avg computation --
> > > again not fully applied" is true for 3.6 branch, but not for 3.5
> > > branch.
> >
> > But 5167e8d5417b is in 3.5, so shouldn't this commit still be necessary?
> >
> > > In 3.5 branch, calc_load_exit_idle() is already called in
> > > tick_nohz_idle_exit(), it doesn't need to be called at
> > > tick_nohz_update_jiffies() again. In 3.6 branch, some code of
> > > tick_nohz_idle_exit() is splitted to tick_nohz_restart_sched_tick()
> > > and calc_load_exit_idle() is missing by accident, so commit "sched:
> > > Add missing call to calc_load_exit_idle()" is needed.
> >
> > So this really should be dropped from 3.5? Charles, Peter, Ingo, any
> > thoughts here?
>
> Bah, lots of code movement there recently.. let me try and untangle all
> that afresh.. /me checks out v3.5.5.
>
> OK, assuming ->tick_stopped means what the label says, then we only want
> to call calc_load_enter_idle() when it flips to 1 and
> calc_load_exit_idle() when it flips back to 0, such that when an actual
> tick happens its got correct state.
>
> Now the actual patch "5167e8d5417b sched/nohz: Rewrite and fix load-avg
> computation -- again not fully applied" modifies
> tick_nohz_restart_sched_tick() which doesn't appear to exist in v3.5.5
> and the patch fobbed it into tick_nohz_update_jiffies() which is called
> from interrupt entry when nohz-idle so that the interrupt (and possible
> tailing softirq) see a valid jiffies count.
>
> However, since we don't restart the tick, we won't be sampling load muck
> and calling calc_load_exit_idle() from there is bound to confuse state.
>
> I hope.. damn this code ;-)
>
> I can't find wtf went wrong either, the initial patch 5167e8d5417bf5c
> contains both hunks, but in that case the fixup 749c8814f0 doesn't make
> sense, not can I find anything in merge commits using:
>
> git log -S calc_load_exit_idle kernel/time/tick-sched.c
>
> /me puzzled

I'm puzzled as well. Any ideas if I should do anything here or not?

greg k-h

2012-10-05 01:39:56

by Jonathan Nieder

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

Peter Zijlstra wrote:

> I can't find wtf went wrong either, the initial patch 5167e8d5417bf5c
> contains both hunks, but in that case the fixup 749c8814f0 doesn't make
> sense, not can I find anything in merge commits using:
>
> git log -S calc_load_exit_idle kernel/time/tick-sched.c

git log -m -p --first-parent -Scalc_load_exit_idle -- kernel/time/tick-sched.c

finds 3992c0321258 ("Merge branch 'timers-core-for-linus'",
2012-07-22), which seems to have mismerged 2ac0d98fd624 ("nohz: Make
nohz API agnostic against idle ticks cputime accounting").

Thanks,
Jonathan

2012-10-05 10:21:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
> I'm puzzled as well. Any ideas if I should do anything here or not?

So I think the current v3.5.5 code is fine. I'm just not smart enough to
figure out how 3.6 got fuzzed, this git thing is confusing as hell.

2012-10-05 16:17:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

On Fri, Oct 05, 2012 at 12:21:06PM +0200, Peter Zijlstra wrote:
> On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
> > I'm puzzled as well. Any ideas if I should do anything here or not?
>
> So I think the current v3.5.5 code is fine. I'm just not smart enough to
> figure out how 3.6 got fuzzed, this git thing is confusing as hell.

Thanks for letting me know, and digging through this, I'll leave things
as-is.

greg k-h

2012-10-05 17:10:19

by Jonathan Nieder

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

Peter Zijlstra wrote:
> On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:

>> I'm puzzled as well. Any ideas if I should do anything here or not?
>
> So I think the current v3.5.5 code is fine.

Now I'm puzzled. You wrote:

| However, since we don't restart the tick, we won't be sampling load muck
| and calling calc_load_exit_idle() from there is bound to confuse state.

Doesn't that mean 900404e5d201 "sched: Add missing call to
calc_load_exit_idle()" which is part of 3.5.5 was problematic? Or
did I just miscount the number of "not"s?

2012-10-05 17:24:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

On Fri, 2012-10-05 at 10:10 -0700, Jonathan Nieder wrote:
> Peter Zijlstra wrote:
> > On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
>
> >> I'm puzzled as well. Any ideas if I should do anything here or not?
> >
> > So I think the current v3.5.5 code is fine.
>
> Now I'm puzzled. You wrote:
>
> | However, since we don't restart the tick, we won't be sampling load muck
> | and calling calc_load_exit_idle() from there is bound to confuse state.
>
> Doesn't that mean 900404e5d201 "sched: Add missing call to
> calc_load_exit_idle()" which is part of 3.5.5 was problematic? Or
> did I just miscount the number of "not"s?


Argh, yeah, so now I've managed to confuse everyone I'm afraid.

You are right, v3.5.5 has one calc_load_exit_idle() too many, the one in
tick_nohz_update_jiffies() needs to go.

Sorry.. I got entirely confused figuring out wth happened with 3.6.

2012-10-08 11:23:39

by Charles Wang

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

On 10/05/2012 09:39 AM, Jonathan Nieder wrote:
> Peter Zijlstra wrote:
>
>> I can't find wtf went wrong either, the initial patch 5167e8d5417bf5c
>> contains both hunks, but in that case the fixup 749c8814f0 doesn't make
>> sense, not can I find anything in merge commits using:
>>
>> git log -S calc_load_exit_idle kernel/time/tick-sched.c
>
> git log -m -p --first-parent -Scalc_load_exit_idle -- kernel/time/tick-sched.c
>
> finds 3992c0321258 ("Merge branch 'timers-core-for-linus'",
> 2012-07-22), which seems to have mismerged 2ac0d98fd624 ("nohz: Make
> nohz API agnostic against idle ticks cputime accounting").
That's it. Patch 2ac0d98fd624 and 19f5f7364("nohz: Separate idle
sleeping time accounting from nohz logic") are produced on 2011-07-28,
merged on 2012-07-22, right after 5167e8d5417bf5c, but applied before
5167e8d5417bf5c.
These two patches changed tick_nohz_idle_exit, which causing Peter's
patch 5167e8d5417bf5c couldn't fully be applied.
There should be conflict being reported, but why we don't get is really
confused.
>
> Thanks,
> Jonathan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2012-10-08 11:33:46

by Charles Wang

[permalink] [raw]
Subject: Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch

On 10/06/2012 01:23 AM, Peter Zijlstra wrote:
> On Fri, 2012-10-05 at 10:10 -0700, Jonathan Nieder wrote:
>> Peter Zijlstra wrote:
>>> On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
>>
>>>> I'm puzzled as well. Any ideas if I should do anything here or not?
>>>
>>> So I think the current v3.5.5 code is fine.
>>
>> Now I'm puzzled. You wrote:
>>
>> | However, since we don't restart the tick, we won't be sampling load muck
>> | and calling calc_load_exit_idle() from there is bound to confuse state.
>>
>> Doesn't that mean 900404e5d201 "sched: Add missing call to
>> calc_load_exit_idle()" which is part of 3.5.5 was problematic? Or
>> did I just miscount the number of "not"s?
>
>
> Argh, yeah, so now I've managed to confuse everyone I'm afraid.
>
> You are right, v3.5.5 has one calc_load_exit_idle() too many, the one in
> tick_nohz_update_jiffies() needs to go.
>
> Sorry.. I got entirely confused figuring out wth happened with 3.6.
>
High loadavg reported with v3.6, and I just checked the upstream code,
which puzzled many people. Sorry for that~