2011-02-01 15:53:47

by Glauber Costa

Subject: Re: [PATCH v2 4/6] KVM-GST: KVM Steal time registration

On Sun, 2011-01-30 at 15:16 +0200, Avi Kivity wrote:
> On 01/28/2011 09:52 PM, Glauber Costa wrote:
> > Register steal time within KVM. Every time we sample the steal time
> > information, we update a local variable that tells what was the
> > last time read. We then account the difference.
> >
> >
> >
> > static void kvm_guest_cpu_offline(void *dummy)
> > {
> >         kvm_pv_disable_apf(NULL);
> > +       native_write_msr(MSR_KVM_STEAL_TIME, 0, 0);
> >         apf_task_wake_all();
> > }
>
> Don't use the native_ versions, they override the pvops implementation.
> It doesn't matter for kvm, but we're not supposed to know this.

Fair.

> > + /*
> > + * using nanoseconds introduces noise, which accumulates easily
> > + * leading to big steal time values. We want, however, to keep the
> > + * interface nanosecond-based for future-proofness. The hypervisor may
> > + * adopt a similar strategy, but we can't rely on that.
> > + */
> > + delta /= NSEC_PER_MSEC;
> > + delta *= NSEC_PER_MSEC;
>
> You're working around this problem both in the guest and host. So even
> if we fix it in one, it will still be broken in the other.

And, if you notice, in two different ways: I am (was) forcing usecs in
the host and msecs in the guest.

One of the problems here is that if we account steal time, we refrain
from accounting user/system time. The reason is that if we accounted
both, we'd end up with more than HZ ticks per second, since the same
ticks would be counted as both steal and real.

And since the granularity of the cpu accounting is too coarse, we end up
with much more steal time than we should, because things that are less
than 1 unit of cputime are often rounded up to 1 unit of cputime.
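
To make the rounding effect concrete, here is a minimal user-space sketch. It is not kernel code: TICK_NS (a 1 ms tick, i.e. HZ = 1000) and all function names are assumptions made only for illustration. It compares rounding every small steal delta up to a whole tick against summing the nanoseconds first and converting once.

```c
#include <stdint.h>

/* Assumed tick length for illustration only: HZ = 1000, 1 tick = 1,000,000 ns. */
#define TICK_NS 1000000ULL

/* Convert a nanosecond delta to ticks, rounding any remainder up. */
static uint64_t ns_to_ticks_round_up(uint64_t ns)
{
    return (ns + TICK_NS - 1) / TICK_NS;
}

/* Each sample is converted on its own, so every sub-tick delta costs a full tick. */
static uint64_t steal_ticks_rounded_per_sample(const uint64_t *delta_ns, int n)
{
    uint64_t ticks = 0;
    int i;

    for (i = 0; i < n; i++)
        ticks += ns_to_ticks_round_up(delta_ns[i]);
    return ticks;
}

/* The raw nanoseconds are summed first and converted once, rounding down. */
static uint64_t steal_ticks_from_total(const uint64_t *delta_ns, int n)
{
    uint64_t total_ns = 0;
    int i;

    for (i = 0; i < n; i++)
        total_ns += delta_ns[i];
    return total_ns / TICK_NS;
}
```

With ten samples of 300 usecs each, the per-sample version reports 10 ticks of steal time while the summed version reports 3, which is the kind of inflation described above.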

Now, I've already said that I will investigate further, and I'm ready to
back off from all of this. But assuming my analysis is right so far, what
if we keep things in nsecs or msecs, and only convert to cputime at
read time? This would allow us to just subtract steal time from
user/system time, in a more fine-grained way.


2011-02-01 16:16:37

by Peter Zijlstra

Subject: Re: [PATCH v2 4/6] KVM-GST: KVM Steal time registration

On Tue, 2011-02-01 at 13:53 -0200, Glauber Costa wrote:
>
> And since the granularity of the cpu accounting is too coarse, we end up
> with much more steal time than we should, because things that are less
> than 1 unit of cputime are often rounded up to 1 unit of cputime.

See, that! is the problem, don't round up like that.

What you can do is: steal_ticks = steal_time_clock() / TICK_NSEC, or
simply keep a steal time delta and every time it overflows
cputime_one_jiffy insert a steal-time tick.

Venki might have created some infrastructure for doing this with the
IRQ_TIME accounting mess, but irqtime_account_process_tick() still gives
me a headache.
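
The second suggestion (keep a steal time delta and insert a tick whenever it overflows one jiffy) could look roughly like the user-space sketch below. It is an illustration, not the kernel implementation: struct steal_acct, steal_acct_add() and the 1 ms TICK_NS are hypothetical names and values.

```c
#include <stdint.h>

/* Assumed tick length for illustration only: HZ = 1000, 1 tick = 1,000,000 ns. */
#define TICK_NS 1000000ULL

/* Hypothetical per-cpu accumulator state. */
struct steal_acct {
    uint64_t pending_ns;   /* steal time seen but not yet reported as ticks */
    uint64_t steal_ticks;  /* whole steal ticks reported so far */
};

/*
 * Fold a new steal-time delta into the accumulator. Only whole ticks are
 * released; the sub-tick remainder stays pending for the next call, so
 * nothing is rounded up and nothing is lost.
 */
static uint64_t steal_acct_add(struct steal_acct *a, uint64_t delta_ns)
{
    uint64_t ticks;

    a->pending_ns += delta_ns;
    ticks = a->pending_ns / TICK_NS;
    a->pending_ns -= ticks * TICK_NS;
    a->steal_ticks += ticks;
    return ticks;
}
```

Feeding it 300 usecs at a time releases no tick for the first three calls and one tick on the fourth, leaving 200 usecs pending.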


2011-02-01 17:00:58

by Glauber Costa

Subject: Re: [PATCH v2 4/6] KVM-GST: KVM Steal time registration


>
> See, that! is the problem, don't round up like that.

Yeah, I was using usecs_to_cputime(), believing this was the standard
interface. By the way, one of the things that also led to better results
was just forcing it to 0 every time we had steal == 1 in the end. But
*that* was a real hack =)

> What you can do is: steal_ticks = steal_time_clock() / TICK_NSEC, or
> simply keep a steal time delta and every time it overflows
> cputime_one_jiffy insert a steal-time tick.

What do you think about keeping the accounting in msec/usec resolution
(thus allowing us to charge half a tick to user/system and the other
half to steal time) and only converting it to cputime at the last minute?

2011-02-01 17:43:17

by Peter Zijlstra

Subject: Re: [PATCH v2 4/6] KVM-GST: KVM Steal time registration

On Tue, 2011-02-01 at 15:00 -0200, Glauber Costa wrote:
>
> > What you can do is: steal_ticks = steal_time_clock() / TICK_NSEC, or
> > simply keep a steal time delta and every time it overflows
> > cputime_one_jiffy insert a steal-time tick.
>
> What do you think about keeping accounting in msec/usec resolution and
> (thus allowing us to compute half a tick to user/system, other half to
> steal time) only change it to cputime in the last minute?

It's only accounting full ticks.

2011-02-01 20:20:46

by Venkatesh Pallipadi

Subject: Re: [PATCH v2 4/6] KVM-GST: KVM Steal time registration

On Tue, Feb 1, 2011 at 9:44 AM, Peter Zijlstra wrote:
> On Tue, 2011-02-01 at 15:00 -0200, Glauber Costa wrote:
>>
>> > What you can do is: steal_ticks = steal_time_clock() / TICK_NSEC, or
>> > simply keep a steal time delta and every time it overflows
>> > cputime_one_jiffy insert a steal-time tick.
>>
>> What do you think about keeping accounting in msec/usec resolution and
>> (thus allowing us to compute half a tick to user/system, other half to
>> steal time) only change it to cputime in the last minute?
>
> It's only accounting full ticks.
>

Yes. The way we ended up dealing with this in the irq time case is to
track it at fine granularity and accumulate it over time (internally),
but account it (make it visible externally) only in terms of ticks, and
only when the accumulated value crosses a tick boundary. This does have
a hole: if we spend 99% of a tick in irq and the last 1%, just before
the tick, in system, then the whole tick is accounted as system. If the
next tick then has 1% irq and 99% system, it is accounted as irq,
because our accumulated value crosses the tick boundary then. But such
holes should, on average, be no worse than not having fine granularity.
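
The 99%/1% case can be walked through with a small user-space sketch. This only models the behaviour described above and is not the actual irq time code: account_tick(), enum tick_class and the 1 ms TICK_NS are hypothetical.

```c
#include <stdint.h>

/* Assumed tick length for illustration only: HZ = 1000, 1 tick = 1,000,000 ns. */
#define TICK_NS 1000000ULL

enum tick_class { TICK_SYSTEM, TICK_IRQ };

/*
 * Classify one tick. irq time observed during the tick is added to a
 * running total; the tick is reported as irq only when that total has
 * reached a full tick, otherwise it is reported as system.
 */
static enum tick_class account_tick(uint64_t *pending_irq_ns,
                                    uint64_t irq_ns_this_tick)
{
    *pending_irq_ns += irq_ns_this_tick;
    if (*pending_irq_ns >= TICK_NS) {
        *pending_irq_ns -= TICK_NS;
        return TICK_IRQ;
    }
    return TICK_SYSTEM;
}
```

A first tick with 990 usecs of irq time is reported as system (the total is still below one tick), and a second tick with only 10 usecs of irq time is reported as irq. Each tick is individually mislabelled, yet the two-tick totals (one irq tick, one system tick) match the time actually spent.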

Thanks,
Venki