On Thu, Oct 04, 2012 at 12:11:01PM +0800, Huacai Chen wrote:
> Hi, Greg
>
> I found that Linux-3.5.5 accept this commit "sched: Add missing call
> to calc_load_exit_idle()" but I think this isn't needed. Because
> "5167e8d5417b sched/nohz: Rewrite and fix load-avg computation --
> again not fully applied" is true for 3.6 branch, but not for 3.5
> branch.
But 5167e8d5417b is in 3.5, so shouldn't this commit still be necessary?
> In 3.5 branch, calc_load_exit_idle() is already called in
> tick_nohz_idle_exit(), it doesn't need to be called at
> tick_nohz_update_jiffies() again. In 3.6 branch, some code of
> tick_nohz_idle_exit() is splitted to tick_nohz_restart_sched_tick()
> and calc_load_exit_idle() is missing by accident, so commit "sched:
> Add missing call to calc_load_exit_idle()" is needed.
So this really should be dropped from 3.5? Charles, Peter, Ingo, any
thoughts here?
thanks,
greg k-h
On Thu, 2012-10-04 at 10:46 -0700, Greg Kroah-Hartman wrote:
> On Thu, Oct 04, 2012 at 12:11:01PM +0800, Huacai Chen wrote:
> > Hi, Greg
> >
> > I found that Linux-3.5.5 accept this commit "sched: Add missing call
> > to calc_load_exit_idle()" but I think this isn't needed. Because
> > "5167e8d5417b sched/nohz: Rewrite and fix load-avg computation --
> > again not fully applied" is true for 3.6 branch, but not for 3.5
> > branch.
>
> But 5167e8d5417b is in 3.5, so shouldn't this commit still be necessary?
>
> > In 3.5 branch, calc_load_exit_idle() is already called in
> > tick_nohz_idle_exit(), it doesn't need to be called at
> > tick_nohz_update_jiffies() again. In 3.6 branch, some code of
> > tick_nohz_idle_exit() is splitted to tick_nohz_restart_sched_tick()
> > and calc_load_exit_idle() is missing by accident, so commit "sched:
> > Add missing call to calc_load_exit_idle()" is needed.
>
> So this really should be dropped from 3.5? Charles, Peter, Ingo, any
> thoughts here?
Bah, lots of code movement there recently.. let me try and untangle all
that afresh.. /me checks out v3.5.5.
OK, assuming ->tick_stopped means what the label says, then we only want
to call calc_load_enter_idle() when it flips to 1 and
calc_load_exit_idle() when it flips back to 0, such that when an actual
tick happens its got correct state.
Now the actual patch "5167e8d5417b sched/nohz: Rewrite and fix load-avg
computation -- again not fully applied" modifies
tick_nohz_restart_sched_tick() which doesn't appear to exist in v3.5.5
and the patch fobbed it into tick_nohz_update_jiffies() which is called
from interrupt entry when nohz-idle so that the interrupt (and possible
tailing softirq) see a valid jiffies count.
However, since we don't restart the tick, we won't be sampling load muck
and calling calc_load_exit_idle() from there is bound to confuse state.
I hope.. damn this code ;-)
I can't find wtf went wrong either, the initial patch 5167e8d5417bf5c
contains both hunks, but in that case the fixup 749c8814f0 doesn't make
sense, not can I find anything in merge commits using:
git log -S calc_load_exit_idle kernel/time/tick-sched.c
/me puzzled
On Thu, Oct 04, 2012 at 08:31:59PM +0200, Peter Zijlstra wrote:
> On Thu, 2012-10-04 at 10:46 -0700, Greg Kroah-Hartman wrote:
> > On Thu, Oct 04, 2012 at 12:11:01PM +0800, Huacai Chen wrote:
> > > Hi, Greg
> > >
> > > I found that Linux-3.5.5 accept this commit "sched: Add missing call
> > > to calc_load_exit_idle()" but I think this isn't needed. Because
> > > "5167e8d5417b sched/nohz: Rewrite and fix load-avg computation --
> > > again not fully applied" is true for 3.6 branch, but not for 3.5
> > > branch.
> >
> > But 5167e8d5417b is in 3.5, so shouldn't this commit still be necessary?
> >
> > > In 3.5 branch, calc_load_exit_idle() is already called in
> > > tick_nohz_idle_exit(), it doesn't need to be called at
> > > tick_nohz_update_jiffies() again. In 3.6 branch, some code of
> > > tick_nohz_idle_exit() is splitted to tick_nohz_restart_sched_tick()
> > > and calc_load_exit_idle() is missing by accident, so commit "sched:
> > > Add missing call to calc_load_exit_idle()" is needed.
> >
> > So this really should be dropped from 3.5? Charles, Peter, Ingo, any
> > thoughts here?
>
> Bah, lots of code movement there recently.. let me try and untangle all
> that afresh.. /me checks out v3.5.5.
>
> OK, assuming ->tick_stopped means what the label says, then we only want
> to call calc_load_enter_idle() when it flips to 1 and
> calc_load_exit_idle() when it flips back to 0, such that when an actual
> tick happens its got correct state.
>
> Now the actual patch "5167e8d5417b sched/nohz: Rewrite and fix load-avg
> computation -- again not fully applied" modifies
> tick_nohz_restart_sched_tick() which doesn't appear to exist in v3.5.5
> and the patch fobbed it into tick_nohz_update_jiffies() which is called
> from interrupt entry when nohz-idle so that the interrupt (and possible
> tailing softirq) see a valid jiffies count.
>
> However, since we don't restart the tick, we won't be sampling load muck
> and calling calc_load_exit_idle() from there is bound to confuse state.
>
> I hope.. damn this code ;-)
>
> I can't find wtf went wrong either, the initial patch 5167e8d5417bf5c
> contains both hunks, but in that case the fixup 749c8814f0 doesn't make
> sense, not can I find anything in merge commits using:
>
> git log -S calc_load_exit_idle kernel/time/tick-sched.c
>
> /me puzzled
I'm puzzled as well. Any ideas if I should do anything here or not?
greg k-h
Peter Zijlstra wrote:
> I can't find wtf went wrong either, the initial patch 5167e8d5417bf5c
> contains both hunks, but in that case the fixup 749c8814f0 doesn't make
> sense, not can I find anything in merge commits using:
>
> git log -S calc_load_exit_idle kernel/time/tick-sched.c
git log -m -p --first-parent -Scalc_load_exit_idle -- kernel/time/tick-sched.c
finds 3992c0321258 ("Merge branch 'timers-core-for-linus'",
2012-07-22), which seems to have mismerged 2ac0d98fd624 ("nohz: Make
nohz API agnostic against idle ticks cputime accounting").
Thanks,
Jonathan
On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
> I'm puzzled as well. Any ideas if I should do anything here or not?
So I think the current v3.5.5 code is fine. I'm just not smart enough to
figure out how 3.6 got fuzzed, this git thing is confusing as hell.
On Fri, Oct 05, 2012 at 12:21:06PM +0200, Peter Zijlstra wrote:
> On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
> > I'm puzzled as well. Any ideas if I should do anything here or not?
>
> So I think the current v3.5.5 code is fine. I'm just not smart enough to
> figure out how 3.6 got fuzzed, this git thing is confusing as hell.
Thanks for letting me know, and digging through this, I'll leave things
as-is.
greg k-h
Peter Zijlstra wrote:
> On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
>> I'm puzzled as well. Any ideas if I should do anything here or not?
>
> So I think the current v3.5.5 code is fine.
Now I'm puzzled. You wrote:
| However, since we don't restart the tick, we won't be sampling load muck
| and calling calc_load_exit_idle() from there is bound to confuse state.
Doesn't that mean 900404e5d201 "sched: Add missing call to
calc_load_exit_idle()" which is part of 3.5.5 was problematic? Or
did I just miscount the number of "not"s?
On Fri, 2012-10-05 at 10:10 -0700, Jonathan Nieder wrote:
> Peter Zijlstra wrote:
> > On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
>
> >> I'm puzzled as well. Any ideas if I should do anything here or not?
> >
> > So I think the current v3.5.5 code is fine.
>
> Now I'm puzzled. You wrote:
>
> | However, since we don't restart the tick, we won't be sampling load muck
> | and calling calc_load_exit_idle() from there is bound to confuse state.
>
> Doesn't that mean 900404e5d201 "sched: Add missing call to
> calc_load_exit_idle()" which is part of 3.5.5 was problematic? Or
> did I just miscount the number of "not"s?
Argh, yeah, so now I've managed to confuse everyone I'm afraid.
You are right, v3.5.5 has one calc_load_exit_idle() too many, the one in
tick_nohz_update_jiffies() needs to go.
Sorry.. I got entirely confused figuring out wth happened with 3.6.
On 10/05/2012 09:39 AM, Jonathan Nieder wrote:
> Peter Zijlstra wrote:
>
>> I can't find wtf went wrong either, the initial patch 5167e8d5417bf5c
>> contains both hunks, but in that case the fixup 749c8814f0 doesn't make
>> sense, not can I find anything in merge commits using:
>>
>> git log -S calc_load_exit_idle kernel/time/tick-sched.c
>
> git log -m -p --first-parent -Scalc_load_exit_idle -- kernel/time/tick-sched.c
>
> finds 3992c0321258 ("Merge branch 'timers-core-for-linus'",
> 2012-07-22), which seems to have mismerged 2ac0d98fd624 ("nohz: Make
> nohz API agnostic against idle ticks cputime accounting").
That's it. Patch 2ac0d98fd624 and 19f5f7364("nohz: Separate idle
sleeping time accounting from nohz logic") are produced on 2011-07-28,
merged on 2012-07-22, right after 5167e8d5417bf5c, but applied before
5167e8d5417bf5c.
These two patches changed tick_nohz_idle_exit, which causing Peter's
patch 5167e8d5417bf5c couldn't fully be applied.
There should be conflict being reported, but why we don't get is really
confused.
>
> Thanks,
> Jonathan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
On 10/06/2012 01:23 AM, Peter Zijlstra wrote:
> On Fri, 2012-10-05 at 10:10 -0700, Jonathan Nieder wrote:
>> Peter Zijlstra wrote:
>>> On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
>>
>>>> I'm puzzled as well. Any ideas if I should do anything here or not?
>>>
>>> So I think the current v3.5.5 code is fine.
>>
>> Now I'm puzzled. You wrote:
>>
>> | However, since we don't restart the tick, we won't be sampling load muck
>> | and calling calc_load_exit_idle() from there is bound to confuse state.
>>
>> Doesn't that mean 900404e5d201 "sched: Add missing call to
>> calc_load_exit_idle()" which is part of 3.5.5 was problematic? Or
>> did I just miscount the number of "not"s?
>
>
> Argh, yeah, so now I've managed to confuse everyone I'm afraid.
>
> You are right, v3.5.5 has one calc_load_exit_idle() too many, the one in
> tick_nohz_update_jiffies() needs to go.
>
> Sorry.. I got entirely confused figuring out wth happened with 3.6.
>
High loadavg reported with v3.6, and I just checked the upstream code,
which puzzled many people. Sorry for that~