Subject: Re: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode
To: Andy Lutomirski, Wanpeng Li, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Thomas Gleixner, X86 ML
Cc: Radim Krčmář, Yunhong Jiang, Wanpeng Li
References: <1476188240-3502-1-git-send-email-wanpeng.li@hotmail.com>
From: Paolo Bonzini
Date: Tue, 11 Jul 2017 09:43:14 +0200

On 11/07/2017 02:13, Andy Lutomirski wrote:
> On 10/11/2016 05:17 AM, Wanpeng Li wrote:
>> Most windows guests which I have on hand currently still utilize APIC
>> Timer periodic/oneshot mode instead of APIC Timer tsc-deadline mode:
>> - windows 2008 server r2
>> - windows 2012 server r2
>> - windows 7
>> - windows 10
>>
>> This patchset adds the support using the VMX
>> preemption timer for APIC Timer periodic/oneshot mode.
>>
>> I add a print in oneshot mode testcase of kvm-unit-tests/apic.flat and
>> observed that w/ patch the latency is ~2% of w/o patch. I think maybe
>> something is still not right in the patchset, in addition, tmcct in
>> apic_get_tmcct() maybe is not calculated correctly. Your comments to
>> improve the patchset is a great appreciated.
>>
>> Wanpeng Li (2):
>>   KVM: lapic: Extract start_sw_period() to handle oneshot/periodic mode
>>   KVM: x86: Support using the vmx preemption timer for APIC Timer
>>     periodic/one mode
>>
>>  arch/x86/kvm/lapic.c | 162 ++++++++++++++++++++++++++++++---------------------
>>  1 file changed, 95 insertions(+), 67 deletions(-)
>>
>
> I think this is a step in the right direction, but I think there's a
> different approach that would be much, much faster: use the VMX
> preemption timer for *host* preemption. Specifically, do this:
>
> 1. Refactor the host TSC deadline timer a bit to allow the TSC deadline
> timer to be "borrow". It might look something like this:
>
> u64 borrow_tsc_deadline(void (*timer_callback)());
>
> The caller is now permitted to use the TSC deadline timer for its own
> nefarious purposes. The caller promises to call return_tsc_deadline()
> in a timely manner if the TSC exceeds the return value while the
> deadline timer is borrowed.
>
> If the TSC deadline fires while it's borrowed, timer_callback() will be
> called.
>
> void return_tsc_deadline(bool timer_fired);
>
> The caller is done borrowing the TSC deadline timer. The caller need
> not reset the TSC deadline timer MSR to its previous value before
> calling this. It must be called with IRQs on and preemption off.
>
> Getting this to work cleanly without races may be a bit tricky. So be it.
>
> 2. Teach KVM to use the VMX preemption timer as a substitute deadline
> timer while in guest mode.
> Specifically, KVM will borrow_tsc_deadline() (if TSC deadline is
> enabled) when entering guest mode and return_tsc_deadline() when
> returning out of guest mode.
>
> 3. Now KVM can change its MSR bitmaps to allow the guest to program the
> TSC deadline MSR directly. No exit at all needed to handle guest writes
> to the deadline timer.

This assumes that the TSC deadline MSR observes the guest TSC offset,
which I'm not at all sure of. If it doesn't, you break live migration.

Also, while it would halve the cost of a guest's programming of the
timer tick, you would still incur the cost of a vmexit to call
timer_callback (it would be different if you could program the TSC
deadline timer to send a posted interrupt, of course). Things would be
half as slow, but still a far cry from bare metal.

Really, we should just ask Intel to virtualize the TSC deadline MSR.

Paolo
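
For readers following the thread, the borrow/return contract quoted above can be modelled in user space roughly as below. This is only a sketch under stated assumptions, not kernel code and not anything from the patchset: the two function names come from the quoted proposal, while the state variables, run_guest_once(), simulate_deadline_irq(), count_callback() and the fixed host deadline of 1000 ticks are hypothetical. In particular, treating timer_fired == true as "the host deadline passed, so host timer work runs now" is an assumed reading of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*timer_callback_t)(void);

/* Hypothetical model state: a single host deadline, in TSC ticks. */
static uint64_t host_deadline_tsc = 1000;
static bool deadline_borrowed;
static timer_callback_t borrowed_callback;
static int host_timer_runs;  /* host timer handling performed by the model */
static int callback_runs;    /* times the borrower's callback was invoked */

/* Example borrower callback (hypothetical): just counts invocations. */
static void count_callback(void)
{
    callback_runs++;
}

/* Borrow the timer; returns the TSC value by which it must be given back. */
uint64_t borrow_tsc_deadline(timer_callback_t timer_callback)
{
    assert(!deadline_borrowed);
    deadline_borrowed = true;
    borrowed_callback = timer_callback;
    return host_deadline_tsc;
}

/* Give the timer back. Assumption: timer_fired means the host deadline
 * passed while borrowed, so the model runs the host's timer work now. */
void return_tsc_deadline(bool timer_fired)
{
    assert(deadline_borrowed);
    deadline_borrowed = false;
    borrowed_callback = NULL;
    if (timer_fired)
        host_timer_runs++;
}

/* Model of the deadline firing while borrowed: the callback is called. */
void simulate_deadline_irq(void)
{
    if (deadline_borrowed && borrowed_callback)
        borrowed_callback();
}

/* Model of one guest entry. The "preemption timer" forces an exit at the
 * host deadline even if the guest wanted to run longer. Returns the TSC
 * value at which the guest exited. */
uint64_t run_guest_once(uint64_t now_tsc, uint64_t guest_wants_until_tsc)
{
    uint64_t limit = borrow_tsc_deadline(NULL);
    uint64_t exit_tsc =
        guest_wants_until_tsc < limit ? guest_wants_until_tsc : limit;

    if (exit_tsc < now_tsc)
        exit_tsc = now_tsc;
    return_tsc_deadline(exit_tsc >= limit);
    return exit_tsc;
}
```

The model only captures the ordering of borrow, run and return. It says nothing about the races mentioned in the proposal, nor about the vmexit cost of delivering timer_callback.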