Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754189AbdCOXae (ORCPT ); Wed, 15 Mar 2017 19:30:34 -0400 Received: from mail-qk0-f194.google.com ([209.85.220.194]:35271 "EHLO mail-qk0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753983AbdCOX3Y (ORCPT ); Wed, 15 Mar 2017 19:29:24 -0400 Date: Wed, 15 Mar 2017 19:22:56 -0400 From: "Gabriel L. Somlo" To: Radim =?utf-8?B?S3LEjW3DocWZ?= Cc: "Michael S. Tsirkin" , linux-kernel@vger.kernel.org, Paolo Bonzini , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, Joerg Roedel , kvm@vger.kernel.org, linux-doc@vger.kernel.org Subject: Re: [PATCH v4] kvm: better MWAIT emulation for guests Message-ID: <20170315232255.GF2239@HEDWIG.INI.CMU.EDU> References: <1489605443-21045-1-git-send-email-mst@redhat.com> <20170315201348.GA14076@potion> <20170315202140.GD2239@HEDWIG.INI.CMU.EDU> <20170315204618.GB14076@potion> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20170315204618.GB14076@potion> X-Clacks-Overhead: GNU Terry Pratchett User-Agent: Mutt/1.7.1 (2016-10-04) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5933 Lines: 162 On Wed, Mar 15, 2017 at 09:46:18PM +0100, Radim Krčmář wrote: > 2017-03-15 16:21-0400, Gabriel L. Somlo: > > On Wed, Mar 15, 2017 at 09:13:49PM +0100, Radim Krčmář wrote: > >> 2017-03-15 21:28+0200, Michael S. Tsirkin: > >> > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem: > >> > unless explicitly provided with kernel command line argument > >> > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability, > >> > without checking CPUID. > >> > > >> > We currently emulate that as a NOP but on VMX we can do better: let > >> > guest stop the CPU until timer, IPI or memory change. CPU will be busy > >> > but that isn't any worse than a NOP emulation. > >> > > >> > Note that mwait within guests is not the same as on real hardware > >> > because halt causes an exit while mwait doesn't. For this reason it > >> > might not be a good idea to use the regular MWAIT flag in CPUID to > >> > signal this capability. Add a flag in the hypervisor leaf instead. > >> > > >> > Additionally, we add a capability for QEMU - e.g. if it knows there's an > >> > isolated CPU dedicated for the VCPU it can set the standard MWAIT flag > >> > to improve guest behaviour. > >> > > >> > Reported-by: "Gabriel L. Somlo" > >> > Signed-off-by: Michael S. Tsirkin > >> > --- > >> > > >> > Note: SVM bits are untested at this point. Seems pretty > >> > obvious though. > >> > > >> > changes from v3: > >> > - don't enable capability if cli+mwait blocks interrupts > >> > - doc typo fixes (drop drop ppc doc) > >> > > >> > changes from v2: > >> > - add a capability to allow host userspace to detect new kernels > >> > - more documentation to clarify the semantics of the feature flag > >> > and why it's useful > >> > - svm support as suggested by Radim > >> > > >> > changes from v1: > >> > - typo fix resulting in rest of leaf flags being overwritten > >> > Reported by: Wanpeng Li > >> > - updated commit log with data about guests helped by this feature > >> > - better document differences between mwait and halt for guests > >> > > >> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h > >> > @@ -212,4 +213,28 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec) > >> > __rem; \ > >> > }) > >> > > >> > +static bool kvm_mwait_in_guest(void) > >> > +{ > >> > + unsigned int eax, ebx, ecx; > >> > + > >> > + if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT)) > >> > + return -ENODEV; > >> > + > >> > + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) > >> > + return -ENODEV; > >> > + > >> > + /* > >> > + * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as > >> > + * they would allow guest to stop the CPU completely by disabling > >> > + * interrupts then invoking MWAIT. > >> > + */ > >> > + if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF) > >> > + return -ENODEV; > >> > + > >> > + cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &mwait_substates); > >> > + > >> > + if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK)) > >> > + return -ENODEV; > >> > >> The guest is still able to set ecx=0 with MWAIT, which should be the > >> same as not having the CPUID flag, so I'm wondering how this check > >> prevents anything harmful ... is it really a cpu "feature"? > >> > >> If we somehow report ecx bit 1 in CPUID[5], then the guest might try to > >> set ecx bit 0 for MWAIT, which will cause #GP(0) and could explain the > >> hang that Gabriel is hitting. > >> > >> Gabriel, > >> > >> - do you see VM exits on the "hung" VCPU? > > > > how would I go about looking ? > > Probably the simplest would be to try install trace-cmd and do > > trace-cmd record -e kvm:kvm_entry > > for a while, then killing it with ^C and calling > > trace-cmd report > > it will have lines like: > > CPU-3729 [021] 5046.222480: kvm_entry: vcpu 7 > > the first column is task id, the last one is VCPU id, so you'd verify that all > VCPUs eventually enter. > You could either look at the taks id of VCPUs that are running 100% of the time > or just pipe through `grep -o 'vcpu.*' | sort -u` and hope that all of them are > running, so you'd see nice list of them all. :) so, with Michael's patch v5 (same behavior as earlier btw), I get: # trace-cmd report | head -10000 | grep -o 'vcpu.*' | sort -u vcpu 0 vcpu 1 vcpu 2 vcpu 3 Which means they're all there. Also, 'top' shows 400% CPU for qemu-system-x86_64, which means I have 4 threads idle-looping at 100% each, which is consistent with a guest using an MWAIT-based idle loop. > If you don't see some VCPUs there, they might have been halted, which would > show on > > virsh qemu-monitor-command rhel7 --hmp info cpus > > If there are running, but never entering VCPUs, then running > > trace-cmd record -e kvm:\* > > to capture all kvm events could tell us why it is stuck. > > >> - what is your CPU model? > > > > $ cat /proc/cpuinfo > > model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz > > > > (this is 2x dual-core Xeon on a Mac Pro 1,1 -- all I had to spare for > > testing, to avoid having to reboot my primary desktop :) > > Oh, ancient when it comes to VMX. :) I also have a somewhat more recent (only 4 year old) Macbook Air, I'll try to redo the whole set of tests on that one as well. > > >> - what do you get after running this C program on host and guest? > > > > eax=0x000040 ebx=0x000040 ecx=0x000003 edx=0x000020 > > Your CPU has the CPUID5_ECX_INTERRUPT_BREAK, thanks. > The guest should see QEMU's default, which is > > eax=0x000000 ebx=0x000000 ecx=0x000003 edx=0x000000 That's right, I finally managed to transfer a linux guest image over to the box, and indeed that's what it produces. More tomorrow once I have a chance to test on the Macbook Air (need to compile kvm git master on that one first, then transfer the os x qcow image, etc... Thanks, --Gabriel