2008-08-31 15:43:32

by Avi Kivity

Subject: [REGRESSION] High, likely incorrect process cpu usage counters with kvm and 2.6.2[67]

Running an idle Windows VM on Linux 2.6.26+ with kvm, one sees high
values for the kvm process in top (30%-70% cpu), where one would
normally expect 0%-1%. Surprisingly, the per-cpu system counters show
almost 100% idle, leading me to believe this is an accounting error and
that the process does not actually consume this much cpu.
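One way to compare the two views is to read both sets of counters from /proc over the same interval. The following is only a minimal sketch, not something taken from this report: the helper names are hypothetical, and it assumes the field layout documented in proc(5) (utime and stime as fields 14 and 15 of /proc/<pid>/stat, idle as the fourth value on the aggregate "cpu" line of /proc/stat).

```python
import sys
import time


def parse_pid_cpu_ticks(stat_line):
    """Return utime + stime (clock ticks) from a /proc/<pid>/stat line."""
    # The comm field may contain spaces and parentheses, so split only
    # after the last ')'. The remainder starts at field 3 (state), which
    # puts utime (field 14) at index 11 and stime (field 15) at index 12.
    rest = stat_line[stat_line.rfind(")") + 2:].split()
    return int(rest[11]) + int(rest[12])


def parse_cpu_line(cpu_line):
    """Return (total_ticks, idle_ticks) from the 'cpu' line of /proc/stat."""
    values = [int(v) for v in cpu_line.split()[1:]]
    # Sum only the first eight counters (user .. steal); guest time, where
    # present, is already included in user time.
    return sum(values[:8]), values[3]


def compare(proc_before, proc_after, cpu_before, cpu_after):
    """Return (process %, idle %) of all CPU time between two samples."""
    d_proc = parse_pid_cpu_ticks(proc_after) - parse_pid_cpu_ticks(proc_before)
    total_b, idle_b = parse_cpu_line(cpu_before)
    total_a, idle_a = parse_cpu_line(cpu_after)
    d_total = total_a - total_b
    return 100.0 * d_proc / d_total, 100.0 * (idle_a - idle_b) / d_total


def read_first_line(path):
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    pid = sys.argv[1]  # pid of the kvm process to inspect
    p0 = read_first_line("/proc/%s/stat" % pid)
    c0 = read_first_line("/proc/stat")
    time.sleep(5)
    p1 = read_first_line("/proc/%s/stat" % pid)
    c1 = read_first_line("/proc/stat")
    print("process %.1f%%, idle %.1f%%" % compare(p0, p1, c0, c1))
```

Both percentages are relative to the combined time of all cpus, so on an affected system the two figures would be expected to add up to noticeably more than 100%.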

I bisected this to a scheduler change, namely

commit 3e51f33fcc7f55e6df25d15b55ed10c8b4da84cd
Author: Peter Zijlstra <[email protected]>
Date: Sat May 3 18:29:28 2008 +0200

sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

this replaces the rq->clock stuff (and possibly cpu_clock()).

- architectures that have an 'imperfect' hardware clock can set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

- the 'jiffie' window might be superfulous when we update tick_gtod
before the __update_sched_clock() call in sched_clock_tick()

- cpu_clock() might be implemented as:

sched_clock_cpu(smp_processor_id())

if the accuracy proves good enough - how far can TSC drift in a
single jiffie when considering the filtering and idle hooks?

[ [email protected]: various fixes and cleanups ]

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

Which is a bit too complex for me to work out.

Further information:
- the kvm thread which has the incorrect counter is the one that
actually executes guest code
- this thread mostly sleeps in schedule(), as one would expect
- it is periodically woken up by a timer; perhaps the problem is that
the process is sampled using the same timer, so it always shows as
running (though I'd expect it to report 100% cpu in that case).
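The sampling hypothesis in the last point can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of the kernel's accounting code: it assumes each tick charges one whole period to whatever is running at the sample instant, and that the thread wakes at a fixed offset from the tick and runs briefly. All names and numbers are made up for illustration.

```python
def sampled_vs_true(period_us, run_us, offset_us, ticks):
    """Return (sampled share, true share) of cpu for a periodic task.

    The task wakes every period_us at the given offset from the tick and
    runs for run_us. At every tick, a whole period is charged to the task
    if it happens to be running at that instant.
    """
    charged = 0
    for k in range(ticks):
        sample_time = k * period_us
        # Start of the most recent run at or before the sample instant.
        start = (sample_time - offset_us) // period_us * period_us + offset_us
        if start <= sample_time < start + run_us:
            charged += 1
    return charged / ticks, run_us / period_us


# Woken exactly on the tick: always caught running, charged 100%.
print(sampled_vs_true(1000, 10, 0, 1000))
# Woken just after the tick: never caught running, charged 0%.
print(sampled_vs_true(1000, 10, 5, 1000))
```

In this simplified model a perfectly synchronized thread is charged either everything or nothing, which matches the expectation of 100% stated above. An intermediate figure such as 30%-70% would need something extra, for example wakeups that only sometimes coincide with the sample.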

Any help will be appreciated (or provided).

--
error compiling committee.c: too many arguments to function


2008-08-31 18:10:06

by Parag Warudkar

Subject: Re: [REGRESSION] High, likely incorrect process cpu usage counters with kvm and 2.6.2[67]

On Sun, Aug 31, 2008 at 11:43 AM, Avi Kivity <[email protected]> wrote:
> Running an idle Windows VM on Linux 2.6.26+ with kvm, one sees high values
> for the kvm process in top (30%-70% cpu), where one would normally expect
> 0%-1%. Surprisingly, the per-cpu system counters show almost 100% idle,
> leading me to believe this is an accounting error and that the process does
> not actually consume this much cpu.

Busted process accounting. This looks the same as
http://bugzilla.kernel.org/show_bug.cgi?id=11209 .
Please verify. Peter's patch in latest git stops showing "incorrect
looking" CPU usage, but the process times at least are still horribly
wrong.
In fact the CPU usage figure in -rc5 is likely also incorrect, but I
need to analyze that bit a little more.

From today's git -

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12961 parag     20   0 83000 8908 6628 R    0  0.1 5124415h npviewer.bin

>
> I bisected this to a scheduler change, namely
>
> commit 3e51f33fcc7f55e6df25d15b55ed10c8b4da84cd
> Author: Peter Zijlstra <[email protected]>
> Date: Sat May 3 18:29:28 2008 +0200
>
> sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> this replaces the rq->clock stuff (and possibly cpu_clock()).
> - architectures that have an 'imperfect' hardware clock can set
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> - the 'jiffie' window might be superfulous when we update tick_gtod
> before the __update_sched_clock() call in sched_clock_tick()
> - cpu_clock() might be implemented as:
> sched_clock_cpu(smp_processor_id())
> if the accuracy proves good enough - how far can TSC drift in a
> single jiffie when considering the filtering and idle hooks?
> [ [email protected]: various fixes and cleanups ]
> Signed-off-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>

That patch sounds like it had open questions?
Really, given that this is a long-standing, bad regression, all the
offending patches should be reverted in the absence of a fix, no?

Parag