Date: Tue, 4 Apr 2017 14:39:16 +0200
From: Radim Krčmář
To: Alexander Graf
Cc: Jim Mattson, "Michael S. Tsirkin", LKML, "Gabriel L. Somlo", Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", the arch/x86 maintainers, Joerg Roedel, kvm list, linux-doc@vger.kernel.org
Subject: Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
Message-ID: <20170404123915.GA9525@potion>
References: <1489612895-12799-1-git-send-email-mst@redhat.com> <87f187de-64ef-22a2-7714-a811883bce02@suse.de> <20170328142837.GA21738@potion> <20170329121147.GA5129@potion>
X-Mailing-List: linux-kernel@vger.kernel.org

2017-04-03 12:04+0200, Alexander Graf:
> On 03/29/2017 02:11 PM, Radim Krčmář wrote:
>> 2017-03-28 13:35-0700, Jim Mattson:
>> > On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář wrote:
>> > > 2017-03-27 15:34+0200, Alexander Graf:
>> > > > On 15/03/2017 22:22, Michael S.
Tsirkin wrote:
>> > > > > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
>> > > > > unless explicitly provided with kernel command line argument
>> > > > > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
>> > > > > without checking CPUID.
>> > > > >
>> > > > > We currently emulate that as a NOP but on VMX we can do better: let
>> > > > > guest stop the CPU until timer, IPI or memory change. CPU will be busy
>> > > > > but that isn't any worse than a NOP emulation.
>> > > > >
>> > > > > Note that mwait within guests is not the same as on real hardware
>> > > > > because halt causes an exit while mwait doesn't. For this reason it
>> > > > > might not be a good idea to use the regular MWAIT flag in CPUID to
>> > > > > signal this capability. Add a flag in the hypervisor leaf instead.
>> > > > So imagine we had proper MWAIT emulation capabilities based on page faults.
>> > > > In that case, we could do something as fancy as
>> > > >
>> > > > Treat MWAIT as pass-through by default
>> > > >
>> > > > Have a per-vcpu monitor timer 10 times a second in the background that
>> > > > checks which instruction we're in
>> > > >
>> > > > If we're in mwait for the last - say - 1 second, switch to emulated MWAIT,
>> > > > if $IP was in non-mwait within that time, reset counter.
>> > > Or we could reuse external interrupts for sampling. Exits triggered by
>> > > them would check for current instruction (probably would be best to
>> > > limit just to timer tick) and a sufficient ratio (> 0?) of other exits
>> > > would imply that MWAIT is not used.
>> > >
>> > > > Or instead maybe just reuse the adaptive hlt logic?
>> > > Emulated MWAIT is very similar to emulated HLT, so reusing the logic
>> > > makes sense. We would just add new wakeup methods.
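
For illustration, the sampling heuristic quoted above could be sketched
roughly as below. This is only a sketch under stated assumptions: every
name in it (mwait_sampler, mwait_sampler_tick, the constants) is
hypothetical and not existing KVM code, and falling back to pass-through
on the first non-MWAIT sample is my reading of "reset counter", not
something the thread settled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; none of this is existing KVM code. */
#define SAMPLES_PER_SEC   10               /* "10 times a second" from the thread */
#define MWAIT_SAMPLES_MAX SAMPLES_PER_SEC  /* about 1 second of MWAIT samples */

enum mwait_mode { MWAIT_PASSTHROUGH, MWAIT_EMULATED };

struct mwait_sampler {
	unsigned int consecutive;  /* samples in a row that found the vCPU in MWAIT */
	enum mwait_mode mode;
};

static void mwait_sampler_init(struct mwait_sampler *s)
{
	s->consecutive = 0;
	s->mode = MWAIT_PASSTHROUGH;  /* pass-through is the default */
}

/*
 * Called once per sample. in_mwait tells whether the sampled instruction
 * pointer was at an MWAIT; how that is detected is outside this sketch.
 */
static enum mwait_mode mwait_sampler_tick(struct mwait_sampler *s, bool in_mwait)
{
	if (!in_mwait) {
		/* Assumption: any other instruction resets the counter and mode. */
		s->consecutive = 0;
		s->mode = MWAIT_PASSTHROUGH;
	} else {
		if (s->consecutive < MWAIT_SAMPLES_MAX)
			s->consecutive++;
		if (s->consecutive == MWAIT_SAMPLES_MAX)
			s->mode = MWAIT_EMULATED;
	}
	return s->mode;
}
```

A vCPU that sits in MWAIT for ten samples in a row would flip to the
emulated path on the tenth sample, and a single sample outside MWAIT
would put it back on pass-through.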
>> > >
>> > > > Either way, with that we should be able to get super low latency IPIs
>> > > > running while still maintaining some sanity on systems which don't have
>> > > > dedicated CPUs for workloads.
>> > > >
>> > > > And we wouldn't need guest modifications, which is a great plus. So older
>> > > > guests (and Windows?) could benefit from mwait as well.
>> > > There is no need for guest modifications -- it could be exposed as standard
>> > > MWAIT feature to the guest, with responsibilities for guest/host-impact
>> > > on the user.
>> > >
>> > > I think that the page-fault based MWAIT would require paravirt if it
>> > > should be enabled by default, because of performance concerns:
>> > > Enabling write protection on a page needs a VM exit on all other VCPUs
>> > > when beginning monitoring (to reload page permissions and prevent missed
>> > > writes).
>> > > We'd want to keep trapping writes to the page all the time because
>> > > toggling is slow, but this could regress performance for an OS that has
>> > > other data accessed by other VCPUs in that page.
>> > > No current interface can tell the guest that it should reserve the whole
>> > > page instead of what CPUID[5] says and that writes to the monitored page
>> > > are not "cheap", but can trigger a VM exit ...
>> > CPUID.05H:EBX is supposed to address the false sharing issue. IIRC,
>> > VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX
>> > when running Mac OS X guests. Per Intel's SDM volume 3, section
>> > 8.10.5, "To avoid false wake-ups; use the largest monitor line size to
>> > pad the data structure used to monitor writes. Software must make sure
>> > that beyond the data structure, no unrelated data variable exists in
>> > the triggering area for MWAIT. A pad may be needed to avoid this
>> > situation." Unfortunately, most operating systems do not follow this
>> > advice.
>> Right, EBX provides what we need to expose that the whole page is
>> monitored, thanks!
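
To make the padding advice concrete, here is a minimal sketch of a guest
data structure padded to the largest monitor line size. It assumes the
4096 that Jim recalls for CPUID.05H:EBX; real code would have to read
the value from CPUID instead of hard-coding it. The names monitored_flag
and shares_monitor_line are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: largest monitor line size as CPUID.05H:EBX would report it. */
#define MONITOR_LINE_MAX 4096

/*
 * The monitored word sits alone in its monitor line; the pad keeps any
 * unrelated variable out of the triggering area.
 */
struct monitored_flag {
	_Alignas(MONITOR_LINE_MAX) volatile uint32_t wakeup;  /* MONITOR target */
	uint8_t pad[MONITOR_LINE_MAX - sizeof(uint32_t)];
};

_Static_assert(sizeof(struct monitored_flag) == MONITOR_LINE_MAX,
	       "monitored_flag must cover exactly one monitor line");

/* Returns nonzero if a write to b could wake a waiter monitoring a. */
static int shares_monitor_line(const volatile void *a, const volatile void *b)
{
	return ((uintptr_t)a / MONITOR_LINE_MAX) ==
	       ((uintptr_t)b / MONITOR_LINE_MAX);
}
```

With this layout two adjacent monitored_flag objects never share a
monitor line, so a write to one cannot cause a false wake-up (or, with
page-fault based emulation, a needless VM exit) for a waiter on the
other.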
>
> So coming back to the original patch, is there anything that should keep us
> from exposing MWAIT straight into the guest at all times?

Just minor issues:
 * OS X on Core 2 fails for an unknown reason if we disable the instruction
   trapping, which is an argument against doing it by default
 * idling guests would consume host CPU, which is a significant change in
   behavior and shouldn't be done without userspace's involvement

I think the best compromise is to add a capability for the MWAIT VM-exit
controls and let userspace expose MWAIT if it wishes to. Will send a
patch.