Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753419AbdCOU3T (ORCPT ); Wed, 15 Mar 2017 16:29:19 -0400
Received: from mail-qk0-f194.google.com ([209.85.220.194]:34759 "EHLO
	mail-qk0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751765AbdCOU3R (ORCPT ); Wed, 15 Mar 2017 16:29:17 -0400
Date: Wed, 15 Mar 2017 16:21:41 -0400
From: "Gabriel L. Somlo"
To: Radim Krčmář
Cc: "Michael S. Tsirkin", linux-kernel@vger.kernel.org, Paolo Bonzini,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
	x86@kernel.org, Joerg Roedel, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH v4] kvm: better MWAIT emulation for guests
Message-ID: <20170315202140.GD2239@HEDWIG.INI.CMU.EDU>
References: <1489605443-21045-1-git-send-email-mst@redhat.com>
	<20170315201348.GA14076@potion>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20170315201348.GA14076@potion>
X-Clacks-Overhead: GNU Terry Pratchett
User-Agent: Mutt/1.7.1 (2016-10-04)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4796
Lines: 137

On Wed, Mar 15, 2017 at 09:13:49PM +0100, Radim Krčmář wrote:
> 2017-03-15 21:28+0200, Michael S. Tsirkin:
> > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
> > unless explicitly provided with kernel command line argument
> > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
> > without checking CPUID.
> >
> > We currently emulate that as a NOP but on VMX we can do better: let
> > guest stop the CPU until timer, IPI or memory change. CPU will be busy
> > but that isn't any worse than a NOP emulation.
> >
> > Note that mwait within guests is not the same as on real hardware
> > because halt causes an exit while mwait doesn't.
> > For this reason it might not be a good idea to use the regular MWAIT
> > flag in CPUID to signal this capability. Add a flag in the hypervisor
> > leaf instead.
> >
> > Additionally, we add a capability for QEMU - e.g. if it knows there's an
> > isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
> > to improve guest behaviour.
> >
> > Reported-by: "Gabriel L. Somlo"
> > Signed-off-by: Michael S. Tsirkin
> > ---
> >
> > Note: SVM bits are untested at this point. Seems pretty
> > obvious though.
> >
> > changes from v3:
> > - don't enable capability if cli+mwait blocks interrupts
> > - doc typo fixes (drop drop ppc doc)
> >
> > changes from v2:
> > - add a capability to allow host userspace to detect new kernels
> > - more documentation to clarify the semantics of the feature flag
> >   and why it's useful
> > - svm support as suggested by Radim
> >
> > changes from v1:
> > - typo fix resulting in rest of leaf flags being overwritten
> >   Reported by: Wanpeng Li
> > - updated commit log with data about guests helped by this feature
> > - better document differences between mwait and halt for guests
> >
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > @@ -212,4 +213,28 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
> >  	__rem; \
> >  })
> >
> > +static bool kvm_mwait_in_guest(void)
> > +{
> > +	unsigned int eax, ebx, ecx;
> > +
> > +	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
> > +		return -ENODEV;
> > +
> > +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> > +		return -ENODEV;
> > +
> > +	/*
> > +	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
> > +	 * they would allow guest to stop the CPU completely by disabling
> > +	 * interrupts then invoking MWAIT.
> > +	 */
> > +	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> > +		return -ENODEV;
> > +
> > +	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &mwait_substates);
> > +
> > +	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
> > +		return -ENODEV;
>
> The guest is still able to set ecx=0 with MWAIT, which should be the
> same as not having the CPUID flag, so I'm wondering how this check
> prevents anything harmful ... is it really a cpu "feature"?
>
> If we somehow report ecx bit 1 in CPUID[5], then the guest might try to
> set ecx bit 0 for MWAIT, which will cause #GP(0) and could explain the
> hang that Gabriel is hitting.
>
> Gabriel,
>
>  - do you see VM exits on the "hung" VCPU?

How would I go about looking?

>  - what is your CPU model?

$ cat /proc/cpuinfo
...
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping        : 6
microcode       : 0xd2
cpu MHz         : 2659.966
cache size      : 4096 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow dtherm
bugs            :
bogomips        : 5320.04
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

(this is 2x dual-core Xeon on a Mac Pro 1,1 -- all I had to spare for
testing, to avoid having to reboot my primary desktop :)

>  - what do you get after running this C program on host and guest?
>
> #include <stdint.h>
> #include <stdio.h>
>
> int main(void) {
>     uint32_t eax = 5, ebx, ecx = 0, edx;
>     asm ("cpuid" : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
>
>     printf("eax=%#08x ebx=%#08x ecx=%#08x edx=%#08x\n", eax, ebx, ecx, edx);
>
>     return 0;
> }

eax=0x000040 ebx=0x000040 ecx=0x000003 edx=0x000020

HTH,
--G