Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754137AbdCPO7S (ORCPT ); Thu, 16 Mar 2017 10:59:18 -0400 Received: from mail-qk0-f194.google.com ([209.85.220.194]:35938 "EHLO mail-qk0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753936AbdCPO7K (ORCPT ); Thu, 16 Mar 2017 10:59:10 -0400 Date: Thu, 16 Mar 2017 10:58:20 -0400 From: "Gabriel L. Somlo" To: "Michael S. Tsirkin" Cc: linux-kernel@vger.kernel.org, Paolo Bonzini , Radim =?utf-8?B?S3LEjW3DocWZ?= , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, Joerg Roedel , kvm@vger.kernel.org, linux-doc@vger.kernel.org Subject: Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests Message-ID: <20170316145819.GC4085@HEDWIG.INI.CMU.EDU> References: <1489612895-12799-1-git-send-email-mst@redhat.com> <20170315233534.GG2239@HEDWIG.INI.CMU.EDU> <20170316013903-mutt-send-email-mst@kernel.org> <20170316132426.GB4085@HEDWIG.INI.CMU.EDU> <20170316155550-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20170316155550-mutt-send-email-mst@kernel.org> X-Clacks-Overhead: GNU Terry Pratchett User-Agent: Mutt/1.7.1 (2016-10-04) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6930 Lines: 136 On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote: > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote: > > After studying your patch a bit more carefully (sorry, it's crazy > > around here right now :) ) I realized you're simply trying to > > (selectively) decide when to exit L1 and emulate as NOP vs. when to > > just allow L1 to execute MONITOR & MWAIT natively. > > > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > > natively was one of the options Alex Graf and Rene Rebe used back in > > the very early days of OS X on QEMU, at the time I got involved with > > that project. Here's part of an out of tree patch against 3.4 which did > > just that, and worked as far as I remember on *any* MWAIT capable > > intel chip I had access to back in 2010: > > > > ############################################################################## > > # 99-mwait.patch.kvm-kmod (Rene Rebe ) 2010-04-27 > > ############################################################################## > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400 > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400 > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > > /* cpuid 1.ecx */ > > const u32 kvm_supported_word4_x86_features = > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ | > > 0 /* DS-CPL, VMX, SMX, EST */ | > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > > 0 /* Reserved, DCA */ | F(XMM4_1) | > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400 > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > > set_intercept(svm, INTERCEPT_VMSAVE); > > set_intercept(svm, INTERCEPT_STGI); > > set_intercept(svm, INTERCEPT_CLGI); > > set_intercept(svm, INTERCEPT_SKINIT); > > set_intercept(svm, INTERCEPT_WBINVD); > > - set_intercept(svm, INTERCEPT_MONITOR); > > - set_intercept(svm, INTERCEPT_MWAIT); > > set_intercept(svm, INTERCEPT_XSETBV); > > > > control->iopm_base_pa = iopm_base; > > control->msrpm_base_pa = __pa(svm->msrpm); > > control->int_ctl = V_INTR_MASKING_MASK; > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400 > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > nested_vmx_procbased_ctls_low = 0; > > nested_vmx_procbased_ctls_high &= > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING | > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > > + CPU_BASED_CR3_LOAD_EXITING | > > CPU_BASED_CR3_STORE_EXITING | > > #ifdef CONFIG_X86_64 > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING | > > #endif > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > > CPU_BASED_CR3_LOAD_EXITING | > > CPU_BASED_CR3_STORE_EXITING | > > CPU_BASED_USE_IO_BITMAPS | > > CPU_BASED_MOV_DR_EXITING | > > CPU_BASED_USE_TSC_OFFSETING | > > - CPU_BASED_MWAIT_EXITING | > > - CPU_BASED_MONITOR_EXITING | > > CPU_BASED_INVLPG_EXITING | > > CPU_BASED_RDPMC_EXITING; > > > > opt = CPU_BASED_TPR_SHADOW | > > CPU_BASED_USE_MSR_BITMAPS | > > > > If all you're trying to do is (selectively) revert to this behavior, > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > > confused at this point :) > > Yes. Me too. Want to try that other patch and see what happens? You mean the old 3.4 patch against current KVM ? I'll try to do that, might take me a while :) > > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, > > didn't power down the physical CPU, just immediately moved on to the > > next instruction. As such, there was no power saving and no > > opportunity to yield to another L0 thread either, unlike with NOP > > emulation at L0. > > > > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now > > doing something smarter than just acting as a guest-mode NOP) ? > > > > Thanks, > > --Gabriel > > Interesting. What it seems to say is this: > > MWAIT. Behavior of the MWAIT instruction (which always causes an invalid- > opcode exception—#UD—if CPL > 0) is determined by the setting of the “MWAIT > exiting” VM-execution control: > — If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit > (see Section 22.1.3). > — If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if > any of the following is true: (1) the “interrupt-window exiting” VM-execution > control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1. > — If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window > exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT > does not cause the processor to enter an implementation-dependent > optimized state; instead, control passes to the instruction following the > MWAIT instruction. > > > And since interrupt-window exiting is 0 most of the time for KVM, > I would expect MWAIT to behave normally. The intel manual said the same thing back in 2010 as well. However, regardless of how any flags were set, interrupt-window exiting or not, "normal" L1 MWAIT behavior was that it woke up immediately regardless. Remember, never going to sleep is still correct ("normal" ?) behavior per the ISA definition of MWAIT :) Also, when I tested your patch on the macbook air (where it worked), not only was the host reporting 400% CPU for qemu (which is to be expected), but the thermal fan/cooling thing also shifted up into high gear, which means the physical CPU got hot, which it shouldn't have if the guest-mode MWAIT actually did put the host CPU into low power. So at least on this 4-year-old core-I7 chip, the story Intel tells in its manual still doesn't check out. I could never get any clarification on what they mean by "operates normally" :)