Date: Thu, 16 Mar 2017 09:24:27 -0400
From: "Gabriel L. Somlo"
To: "Michael S. Tsirkin"
Cc: linux-kernel@vger.kernel.org, Paolo Bonzini, Radim Krčmář,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
	x86@kernel.org, Joerg Roedel, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
Message-ID: <20170316132426.GB4085@HEDWIG.INI.CMU.EDU>
References: <1489612895-12799-1-git-send-email-mst@redhat.com>
	<20170315233534.GG2239@HEDWIG.INI.CMU.EDU>
	<20170316013903-mutt-send-email-mst@kernel.org>
In-Reply-To: <20170316013903-mutt-send-email-mst@kernel.org>

On Thu, Mar 16, 2017 at 01:41:28AM +0200, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2017 at 07:35:34PM -0400, Gabriel L. Somlo wrote:
> > On Wed, Mar 15, 2017 at 11:22:18PM +0200, Michael S. Tsirkin wrote:
> > > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
> > > unless explicitly provided with kernel command line argument
> > > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
> > > without checking CPUID.
> > >
> > > We currently emulate that as a NOP but on VMX we can do better: let
> > > guest stop the CPU until timer, IPI or memory change. CPU will be busy
> > > but that isn't any worse than a NOP emulation.
> > >
> > > Note that mwait within guests is not the same as on real hardware
> > > because halt causes an exit while mwait doesn't. For this reason it
> > > might not be a good idea to use the regular MWAIT flag in CPUID to
> > > signal this capability. Add a flag in the hypervisor leaf instead.
> > >
> > > Additionally, we add a capability for QEMU - e.g. if it knows there's an
> > > isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
> > > to improve guest behaviour.
> >
> > Same behavior (on the mac pro 1,1 running F22 with custom-compiled
> > kernel from kvm git master, plus this patch on top).
> >
> > The OS X 10.7 kernel hangs (or at least progresses extremely slowly)
> > on boot, does not bring up guest graphical interface within the first
> > 10 minutes that I waited for it. That, in contrast with the default
> > nop-based emulation where the guest comes up within 30 seconds.
>
>
> Thanks a lot, meanwhile I'll try to write a unit-test and experiment
> with various behaviours.
>
> > I will run another round of tests on a newer Mac (4-year-old macbook
> > air) and report back tomorrow.
> >
> > Going off on a tangent, why would encouraging otherwise well-behaved
> > guests (like linux ones, for example) to use MWAIT be desirable to
> > begin with ? Is it a matter of minimizing the overhead associated with
> > exiting and re-entering L1 ? Because if so, AFAIR staying inside L1 and
> > running guest-mode MWAIT in a tight loop will actually waste the host
> > CPU without the opportunity to yield to some other L0 thread. Sorry if
> > I fell into the middle of an ongoing conversation on this and missed
> > most of the relevant context, in which case please feel free to ignore
> > me... :)
> >
> > Thanks,
> > --G
>
> It's just some experiments I'm running, I'm not ready to describe it
> yet. I thought this part might be useful to at least some guests, so
> trying to upstream it right now.

OK, so on a macbook air running F25 and the latest kvm git master plus
your v5 patch (4.11.0-rc2+), things appear to work.

host-side cpuid output:
eax=0x000040 ebx=0x000040 ecx=0x000003 edx=0x021120

guest-side cpuid output:
eax=00000000 ebx=00000000 ecx=0x000003 edx=00000000

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
stepping	: 7
microcode	: 0x29
cpu MHz		: 1157.849
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs		:
bogomips	: 3604.68
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

After studying your patch a bit more carefully (sorry, it's crazy
around here right now :) ) I realized you're simply trying to
(selectively) decide when to exit L1 and emulate as NOP vs. when to
just allow L1 to execute MONITOR & MWAIT natively.

Is that right? Because if so, the issues I saw on my MacPro1,1 are
weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT
natively was one of the options Alex Graf and Rene Rebe used back in
the very early days of OS X on QEMU, at the time I got involved with
that project.
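
For reference, the distinction between the two behaviours comes down to
two bits in the VMX primary processor-based execution controls. Below is
a minimal sketch of that choice, not code from the v5 patch: the bit
values follow the Intel SDM numbering (bit 7 HLT, bit 10 MWAIT, bit 29
MONITOR), while the helper name and its mwait_in_guest argument are made
up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary processor-based VM-execution control bits (Intel SDM numbering). */
#define CPU_BASED_HLT_EXITING      0x00000080u  /* bit 7  */
#define CPU_BASED_MWAIT_EXITING    0x00000400u  /* bit 10 */
#define CPU_BASED_MONITOR_EXITING  0x20000000u  /* bit 29 */

/*
 * Hypothetical helper: return the control word to use for a vCPU.
 * With the two bits set, MONITOR/MWAIT exit to the host (which may then
 * emulate them as NOPs); with the bits clear, the guest executes them
 * natively. All other control bits are passed through unchanged.
 */
static uint32_t exec_controls_for_mwait(uint32_t ctls, bool mwait_in_guest)
{
	const uint32_t mask = CPU_BASED_MWAIT_EXITING | CPU_BASED_MONITOR_EXITING;

	return mwait_in_guest ? (ctls & ~mask) : (ctls | mask);
}
```

Calling exec_controls_for_mwait(ctls, true) leaves HLT exiting untouched,
so only MONITOR/MWAIT stop trapping; HLT would still cause an exit.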

Here's part of an out of tree patch against 3.4 which did just that,
and worked as far as I remember on *any* MWAIT capable intel chip I had
access to back in 2010:

##############################################################################
# 99-mwait.patch.kvm-kmod (Rene Rebe) 2010-04-27
##############################################################################
diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c
--- linux-3.4/arch/x86/kvm/cpuid.c	2012-05-20 18:29:13.000000000 -0400
+++ linux-3.4-mac/arch/x86/kvm/cpuid.c	2012-10-09 11:42:59.921215750 -0400
@@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid
 		f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
 		F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp |
 		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
 	/* cpuid 1.ecx */
 	const u32 kvm_supported_word4_x86_features =
-		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
+		F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ |
 		0 /* DS-CPL, VMX, SMX, EST */ |
 		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
 		F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
 		0 /* Reserved, DCA */ | F(XMM4_1) |
 		F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c
--- linux-3.4/arch/x86/kvm/svm.c	2012-05-20 18:29:13.000000000 -0400
+++ linux-3.4-mac/arch/x86/kvm/svm.c	2012-10-09 11:44:41.598997481 -0400
@@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s
 	set_intercept(svm, INTERCEPT_VMSAVE);
 	set_intercept(svm, INTERCEPT_STGI);
 	set_intercept(svm, INTERCEPT_CLGI);
 	set_intercept(svm, INTERCEPT_SKINIT);
 	set_intercept(svm, INTERCEPT_WBINVD);
-	set_intercept(svm, INTERCEPT_MONITOR);
-	set_intercept(svm, INTERCEPT_MWAIT);
 	set_intercept(svm, INTERCEPT_XSETBV);
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __pa(svm->msrpm);
 	control->int_ctl = V_INTR_MASKING_MASK;
diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c
--- linux-3.4/arch/x86/kvm/vmx.c	2012-05-20 18:29:13.000000000 -0400
+++ linux-3.4-mac/arch/x86/kvm/vmx.c	2012-10-09 11:42:59.925215977 -0400
@@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls
 		nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
 	nested_vmx_procbased_ctls_low = 0;
 	nested_vmx_procbased_ctls_high &=
 		CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING |
 		CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING |
-		CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING |
+		CPU_BASED_CR3_LOAD_EXITING |
 		CPU_BASED_CR3_STORE_EXITING |
 #ifdef CONFIG_X86_64
 		CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING |
 #endif
 		CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
@@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru
 	      CPU_BASED_CR3_LOAD_EXITING |
 	      CPU_BASED_CR3_STORE_EXITING |
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING |
-	      CPU_BASED_MWAIT_EXITING |
-	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_RDPMC_EXITING;
 
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |

If all you're trying to do is (selectively) revert to this behavior,
that "shouldn't" mess it up for the MacPro either, so I'm thoroughly
confused at this point :)

Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP,
didn't power down the physical CPU, just immediately moved on to the
next instruction. As such, there was no power saving and no opportunity
to yield to another L0 thread either, unlike with NOP emulation at L0.

Did that change on newer Intel chips (i.e., is guest-mode MWAIT now
doing something smarter than just acting as a guest-mode NOP)?

Thanks,
--Gabriel
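
P.S. A small aid for reading the host-side and guest-side cpuid values
quoted earlier in this message. This is only a sketch, and it assumes
those four registers are the output of CPUID leaf 5 (the MONITOR/MWAIT
leaf), which the message does not state explicitly. The field layout
follows the Intel SDM description of that leaf; the struct and function
names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of CPUID leaf 5 (MONITOR/MWAIT); names are illustrative. */
struct mwait_leaf {
	unsigned smallest_line;  /* EAX[15:0]: smallest monitor-line size, bytes */
	unsigned largest_line;   /* EBX[15:0]: largest monitor-line size, bytes */
	bool     extensions;     /* ECX[0]: MONITOR/MWAIT extensions enumerated */
	bool     intr_break;     /* ECX[1]: interrupts break MWAIT even if masked */
	unsigned substates[8];   /* EDX nibbles: MWAIT sub-states for C0..C7 */
};

static struct mwait_leaf decode_mwait_leaf(uint32_t eax, uint32_t ebx,
					   uint32_t ecx, uint32_t edx)
{
	struct mwait_leaf m;

	m.smallest_line = eax & 0xffff;
	m.largest_line = ebx & 0xffff;
	m.extensions = (ecx & 0x1) != 0;
	m.intr_break = (ecx & 0x2) != 0;

	/* Each 4-bit field of EDX counts the sub-states of one C-state. */
	for (int i = 0; i < 8; i++)
		m.substates[i] = (edx >> (4 * i)) & 0xf;

	return m;
}
```

Under that assumption, the host values decode to 64-byte monitor lines
with sub-state counts of 2, 1, 1 and 2 for C1 through C4, while the
guest values decode to zero line sizes and no sub-states, with both ECX
bits still set.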