LinuxLists.cc - Very high CPU values in top on idle system (3.0-rc1)

2011-05-30 17:39:19

Subject: Very high CPU values in top on idle system (3.0-rc1)

I get very high CPU values when I run 'top' on my (mostly) idle system
with 3.0-rc1. For example mpd was always in the 1-2% range and is now
constantly over 50%.

This is caused by:

commit 317f394160e9beb97d19a84c39b7e5eb3d7815a8
Author: Peter Zijlstra <[email protected]>
Date: Tue Apr 5 17:23:58 2011 +0200

sched: Move the second half of ttwu() to the remote cpu

When I revert the above I see sane CPU values again.
--
Markus

2011-05-30 18:06:27

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Very high CPU values in top on idle system (3.0-rc1)

On Mon, 2011-05-30 at 19:39 +0200, Markus Trippelsdorf wrote:
> I get very high CPU values when I run 'top' on my (mostly) idle system
> with 3.0-rc1. For example mpd was always in the 1-2% range and is now
> constantly over 50%.
>
> This is caused by:
>
> commit 317f394160e9beb97d19a84c39b7e5eb3d7815a8
> Author: Peter Zijlstra <[email protected]>
> Date: Tue Apr 5 17:23:58 2011 +0200
>
> sched: Move the second half of ttwu() to the remote cpu
>
> When I revert the above I see sane CPU values again.

So: echo NO_TTWU_QUEUE > /debug/sched_features, also cures it?

What architecture, what .config, and can you see it with anything other
than mpd (yum search mpd, only seems to result in mpd clients not the
actual server).

2011-05-30 18:24:00

On Tue, 2011-06-07 at 15:12 +0200, Borislav Petkov wrote:
> On Thu, Jun 02, 2011 at 11:48:31AM -0400, Peter Zijlstra wrote:
> > Crap.. you're right. And I bet other archs don't do that either. With
> > NO_HZ you really need irq_enter() for pretty much all interrupts so I
> > was assuming the resched IPI had it, but its been special and never
> > really needed it. If it would wake an idle cpu the idle loop exit would
> > deal with it, if it interrupted userspace the thing was running and
> > NO_HZ wasn't relevant.
> >
> > Damn.
> >
> > And yes, the only reason I didn't see this on my dev box was because we
> > do indeed set that sched_clock_stable thing on wsm. And I never noticed
> > on my desktop because firefox/X/etc. consuming heaps of CPU isn't weird
> > at all.
> >
> > Adding it to all resched int handlers is of course a possibility but
> > would slow down the thing, although with the new code, most users are
> > now indeed wakeups (excepting weird and wonderful users like KVM).
>
> FWIW, we could set the sched_clock_stable on AMD too - at least on F10h
> and later. This will take care of the problem at hand and defer the
> issue of slowing down the resched ipi handlers.
>
> I dunno, however, whether we still would need the proper ->tick_gtod
> update for correct ttwu accounting regardless of sched_clock_stable on
> K8 (unstable TSCs) and maybe even other arches.

Yeah, I think I'm going to commit this extra irq_enter() thing for now,
and then slowly go through all the arches again and remove this one here
once all archs are sorted.