Message-ID: <4C758194.5060203@redhat.com>
Date: Wed, 25 Aug 2010 10:48:20 -1000
From: Zachary Amsden <zamsden@redhat.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100621 Fedora/3.0.5-1.fc13 Thunderbird/3.0.5
MIME-Version: 1.0
To: Marcelo Tosatti <mtosatti@redhat.com>
CC: kvm@vger.kernel.org, Avi Kivity <avi@redhat.com>,
        Glauber Costa <glommer@redhat.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        John Stultz <johnstul@us.ibm.com>, linux-kernel@vger.kernel.org
Subject: Re: [KVM timekeeping 25/35] Add clock catchup mode
References: <1282291669-25709-1-git-send-email-zamsden@redhat.com> <1282291669-25709-26-git-send-email-zamsden@redhat.com> <20100825172718.GA28380@amt.cnet>
In-Reply-To: <20100825172718.GA28380@amt.cnet>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5013
Lines: 126

On 08/25/2010 07:27 AM, Marcelo Tosatti wrote:
> On Thu, Aug 19, 2010 at 10:07:39PM -1000, Zachary Amsden wrote:
>    
>> Make the clock update handler handle generic clock synchronization,
>> not just KVM clock.  We add a catchup mode which keeps passthrough
>> TSC in line with absolute guest TSC.
>>
>> Signed-off-by: Zachary Amsden<zamsden@redhat.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |    1 +
>>   arch/x86/kvm/x86.c              |   55 ++++++++++++++++++++++++++------------
>>   2 files changed, 38 insertions(+), 18 deletions(-)
>>
>>      
>    
>>   	kvm_x86_ops->vcpu_load(vcpu, cpu);
>> -	if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>> +	if (unlikely(vcpu->cpu != cpu) || vcpu->arch.tsc_rebase) {
>>   		/* Make sure TSC doesn't go backwards */
>>   		s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
>>   				native_read_tsc() - vcpu->arch.last_host_tsc;
>>   		if (tsc_delta<  0)
>>   			mark_tsc_unstable("KVM discovered backwards TSC");
>> -		if (check_tsc_unstable())
>> +		if (check_tsc_unstable()) {
>>   			kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>> -		kvm_migrate_timers(vcpu);
>> +			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>> +		}
>> +		if (vcpu->cpu != cpu)
>> +			kvm_migrate_timers(vcpu);
>>   		vcpu->cpu = cpu;
>> +		vcpu->arch.tsc_rebase = 0;
>>   	}
>>   }
>>
>> @@ -1947,6 +1961,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>   	kvm_x86_ops->vcpu_put(vcpu);
>>   	kvm_put_guest_fpu(vcpu);
>>   	vcpu->arch.last_host_tsc = native_read_tsc();
>> +
>> +	/* For unstable TSC, force compensation and catchup on next CPU */
>> +	if (check_tsc_unstable()) {
>> +		vcpu->arch.tsc_rebase = 1;
>> +		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>> +	}
>>      
> The mix between catchup,trap versus stable,unstable TSC is confusing and
> difficult to grasp. Can you please introduce all the infrastructure
> first, then control usage of them in centralized places? Examples:
>
> +static void kvm_update_tsc_trapping(struct kvm *kvm)
> +{
> +       int trap, i;
> +       struct kvm_vcpu *vcpu;
> +
> +       trap = check_tsc_unstable()&&  atomic_read(&kvm->online_vcpus)>  1;
> +       kvm_for_each_vcpu(i, vcpu, kvm)
> +               kvm_x86_ops->set_tsc_trap(vcpu, trap&&  !vcpu->arch.time_page);
> +}
>
> +       /* For unstable TSC, force compensation and catchup on next CPU */
> +       if (check_tsc_unstable()) {
> +               vcpu->arch.tsc_rebase = 1;
> +               kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> +       }
>
>
> kvm_guest_time_update is becoming very confusing too. I understand this
> is due to the many cases its dealing with, but please make it as simple
> as possible.
>    

I tried to comment as best as I could.  I think the whole 
"kvm_update_tsc_trapping" thing is probably a poor design choice.  It 
works, but it's thoroughly unintelligible right now without spending 
some days figuring out why.

I'll rework the tail series of patches to try to make them more clear.

> +       /*
> +        * If we are trapping and no longer need to, use catchup to
> +        * ensure passthrough TSC will not be less than trapped TSC
> +        */
> +       if (vcpu->tsc_mode == TSC_MODE_PASSTHROUGH&&  vcpu->tsc_trapping&&
> +           ((this_tsc_khz<= v->kvm->arch.virtual_tsc_khz || kvmclock))) {
> +               catchup = 1;
>
> What, TSC trapping with kvmclock enabled?
>    

Transitioning to kvmclock after a cold boot means we may have been 
trapping up to that point, and from then on we will not be.
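
To make the intent concrete, here is a rough standalone sketch of what 
catchup has to guarantee at that transition.  This is not code from the 
series; the struct, its fields and the function name are made up for 
illustration, and it ignores rate scaling entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified per-vcpu state; not the fields from the patch. */
struct sketch_vcpu {
	uint64_t last_guest_tsc; /* last value handed to the guest while trapping */
	int64_t  tsc_offset;     /* added to the host TSC in passthrough mode */
};

/*
 * Called when a vcpu stops trapping TSC reads.  Moves the passthrough
 * offset forward if needed so that host_tsc + offset is never below the
 * last trapped value.  Returns 1 if the offset was moved, 0 otherwise.
 */
static int sketch_leave_trapping(struct sketch_vcpu *v, uint64_t host_tsc)
{
	uint64_t passthrough;

	assert(v);
	passthrough = host_tsc + (uint64_t)v->tsc_offset;

	/* Signed difference so the comparison survives counter wraparound. */
	if ((int64_t)(passthrough - v->last_guest_tsc) >= 0)
		return 0; /* passthrough is already at or ahead of trapped TSC */

	v->tsc_offset += (int64_t)(v->last_guest_tsc - passthrough);
	return 1;
}
```

The only point is that the guest must never observe the TSC going 
backwards when it switches from trapped reads to passthrough reads.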

> For both catchup and trapping the resolution of the host clock is
> important, as Glauber commented for kvmclock. Can you comment on the
> problems that arrive from a low res clock for both modes?
>
> Similarly for catchup mode, the effect of exit frequency. No need for
> any guarantees?
>    

The scheduler will arrange to get an IRQ at whatever resolution it 
uses for its timeslice.  That guarantees an exit per timeslice, so 
we'll never be behind by more than one slice while scheduling.  While 
not scheduling, we're dormant anyway, waiting on either an IRQ or a 
shared memory variable change.  Local timers could end up behind when 
dormant.
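
As a back-of-the-envelope illustration of that bound (my own arithmetic, 
not anything in the patches): if the guest TSC only advances at the host 
rate between exits, the deficit built up over one slice is the rate 
difference times the slice length.  The function name and parameters 
below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case number of guest cycles the passthrough TSC can fall behind
 * the virtual TSC within one timeslice, assuming exactly one exit per
 * slice and a host TSC slower than the virtual rate.
 * kHz is cycles per millisecond, i.e. cycles per 1,000,000 ns.
 */
static uint64_t sketch_max_lag_cycles(uint32_t virtual_khz,
				      uint32_t host_khz,
				      uint64_t slice_ns)
{
	/* Keep the 64-bit product below from overflowing. */
	assert(slice_ns <= UINT32_MAX);

	if (host_khz >= virtual_khz)
		return 0; /* host is not slower, nothing to catch up */

	return (uint64_t)(virtual_khz - host_khz) * slice_ns / 1000000ULL;
}
```

For example, a 3 GHz virtual rate on a 2 GHz host with a 10 ms slice 
gives a worst case of 10,000,000 cycles under these assumptions.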

We may need a hack to accelerate firing of timers in such a case, or 
perhaps bounds on when to use catchup mode and when not to.

Partly, the lack of implementation is deliberate; the logic for setting 
such bounds, and the judgment of whether to set them at all, most 
likely belongs in a policy agent in userspace, in our case qemu.  In 
the end, that is what has full control over whether the guest TSC rate 
is set and which TSC mode is chosen.

What's lacking is the ability to force the use of a certain mode.  I 
think it's clear now that this needs to be a per-VM choice, not a 
global one.
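
Something along these lines is what I have in mind, purely as a sketch. 
None of these names exist in the series; the automatic fallback just 
mirrors the condition in the kvm_update_tsc_trapping hunk quoted above 
(unstable TSC, more than one online vcpu, no kvmclock).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode values; AUTO means "let the kernel decide". */
enum sketch_tsc_mode {
	SKETCH_TSC_AUTO,
	SKETCH_TSC_PASSTHROUGH,
	SKETCH_TSC_CATCHUP,
	SKETCH_TSC_TRAP,
};

/* Hypothetical per-VM state; forced_mode would be set from userspace. */
struct sketch_vm {
	enum sketch_tsc_mode forced_mode;
	int online_vcpus;
};

/* A forced per-VM mode wins; otherwise fall back to the automatic rule. */
static enum sketch_tsc_mode
sketch_effective_mode(const struct sketch_vm *vm, bool tsc_unstable,
		      bool uses_kvmclock)
{
	assert(vm);

	if (vm->forced_mode != SKETCH_TSC_AUTO)
		return vm->forced_mode;

	if (tsc_unstable && vm->online_vcpus > 1 && !uses_kvmclock)
		return SKETCH_TSC_TRAP;

	return SKETCH_TSC_PASSTHROUGH;
}
```

The policy for picking forced_mode would stay in userspace; the kernel 
side would only honor it.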

Zach