Date: Wed, 12 Apr 2017 19:29:28 +0300
From: "Michael S. Tsirkin"
To: Alexander Graf
Cc: Jim Mattson, kvm list, Radim Krčmář, LKML, "Gabriel L. Somlo",
    Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
    "H. Peter Anvin", the arch/x86 maintainers, Joerg Roedel,
    linux-doc@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [PATCH v6] kvm: better MWAIT emulation for guests
Message-ID: <20170412185249-mutt-send-email-mst@kernel.org>
References: <1491911135-216950-1-git-send-email-agraf@suse.de>
    <4622E361-52AB-40F2-9915-45C48F0DBCD2@suse.de>
    <204f274d-697d-f9c6-8719-9bf91105f8b9@suse.de>
In-Reply-To: <204f274d-697d-f9c6-8719-9bf91105f8b9@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 12, 2017 at 04:54:10PM +0200, Alexander Graf wrote:
>
>
> On 12.04.17 16:34, Jim Mattson wrote:
> > Actually, we have rejected commit 87c00572ba05aa8c ("kvm: x86: emulate
> > monitor and mwait instructions as nop"), so when we intercept
> > MONITOR/MWAIT, we synthesize #UD. Perhaps it is this difference from
> > vanilla kvm that motivates the following idea...
>
> So you're not running upstream kvm? In that case, you can just not take this
> patch either :).
>
> > Since we're still not going to report MONITOR support in CPUID, the
> > only guests of consequence are paravirtual guests. What if a
>
> Only if someone actually implemented something for PV guests, yes.
>
> The real motivation is to allow user space to force-set the MONITOR CPUID
> flag. That way an admin can - if he really wants to - dedicate pCPUs to the
> VM.
>
> I agree that we don't need the kvm pv flag for that. I'd be happy to drop
> that if everyone agrees.

I don't really agree that we do not need the PV flag. mwait on kvm is
different from mwait on bare metal in that you are heavily penalized by
the scheduler for polling unless you configure the host just so.
HLT lets you give up the host CPU if you know you won't need
it for a long time.

So while many people can get by with monitor cpuid (those that isolate
host CPUs) and it's a valuable option to have, I think a PV flag is also
a valuable option and can be set for more configurations.

The guest then has an idle driver calling mwait on short waits and halt
on longer ones. I'm in fact testing an idle driver using such a PV flag
and will post when ready (after vacation, ~3 weeks from now probably).

> > paravirtual guest was aware of the fact that sometimes MONITOR/MWAIT
> > would work as architected, and sometimes they would raise #UD (or do
> > something else that's guest-visible, to indicate that the hypervisor is
> > intercepting the instructions). Such a guest could first try a
> > MONITOR/MWAIT-based idle loop and then fall back on a HLT-based idle
> > loop if the hypervisor rejected its use of MONITOR/MWAIT.
>
> How would that work? That guest would have to atomically notify all other
> vCPUs that wakeup notifications now go via IPIs instead of cache line
> dirtying.
>
> That's probably as much work to get right as it would be to just emulate
> MWAIT inside kvm ;).
>
> > We already have the loose concept of "this pCPU has other things to
> > do," which is encoded in the variable-sized PLE window. With
> > MONITOR/MWAIT, the choice is binary, but a simple implementation could
> > tie the two together, by allowing the guest to use MONITOR/MWAIT
> > whenever the PLE window exceeds a certain threshold. Or the decision
> > could be left to the userspace agent.
>
> I agree, and that's basically the idea I mentioned earlier with MWAIT
> emulation. We could (for well-behaved guests) switch between emulating MWAIT
> and running native MWAIT.
>
>
>
> Alex
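
For illustration only, the idle policy described in the reply above (mwait
on short waits, halt on longer ones, gated by a PV flag) could be sketched
roughly as below. This is not the driver mentioned in the thread. The names
pv_mwait_hint, mwait_max_ns and pick_idle_method() are hypothetical, and so
is the idea of comparing a predicted idle time against a fixed threshold.
The sketch only picks a method; it does not execute MWAIT or HLT, which
require ring 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two ways a guest vCPU can idle in this sketch. */
enum idle_method { IDLE_MWAIT, IDLE_HLT };

/* Hypothetical policy inputs; none of these names come from the thread. */
struct idle_policy {
    bool pv_mwait_hint;     /* assumed PV flag: host says mwait polling is acceptable */
    uint64_t mwait_max_ns;  /* assumed cutoff between a "short" and a "long" wait */
};

/*
 * Choose how to idle for a predicted wait of predicted_idle_ns.
 * Without the PV hint, always use HLT so the host CPU is given up.
 * With the hint, poll via mwait for short waits and use HLT otherwise.
 */
static enum idle_method pick_idle_method(const struct idle_policy *p,
                                         uint64_t predicted_idle_ns)
{
    if (!p->pv_mwait_hint)
        return IDLE_HLT;

    return predicted_idle_ns <= p->mwait_max_ns ? IDLE_MWAIT : IDLE_HLT;
}
```

With an assumed cutoff of 20000 ns, a 5000 ns predicted wait would select
mwait and a 50000 ns wait would select HLT. A guest without the hint would
always halt, which matches the concern above about being penalized by the
host scheduler for polling.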