Date: Thu, 17 Aug 2017 15:51:31 +0200
From: Radim Krčmář <rkrcmar@redhat.com>
To: "Michael S. Tsirkin"
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] kvm: x86: disable KVM_FAST_MMIO_BUS
Message-ID: <20170817135130.GC2566@flask>
In-Reply-To: <20170817011815-mutt-send-email-mst@kernel.org>

2017-08-17 01:31+0300, Michael S. Tsirkin:
> On Wed, Aug 16, 2017 at 11:25:35PM +0200, Paolo Bonzini wrote:
> > On 16/08/2017 21:59, Michael S. Tsirkin wrote:
> > > On Wed, Aug 16, 2017 at 09:03:17PM +0200, Radim Krčmář wrote:
> > >>>> how about we blacklist nested virt for this optimization?
> > >>
> > >> Not every hypervisor can be easily detected ...
> > >
> > > Hypervisors that don't set a hypervisor bit in CPUID are violating the
> > > spec themselves, aren't they?  Anyway, we can add a management option
> > > for use in a nested scenario.
> >
> > No, the hypervisor bit only says that CPUID leaf 0x40000000 is defined.
> > See for example
> > https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458:
> > "Intel and AMD have also reserved CPUID leaves 0x40000000 - 0x400000FF
> > for software use.  Hypervisors can use these leaves to provide an
> > interface to pass information from the hypervisor to the guest operating
> > system running inside a virtual machine.  The hypervisor bit indicates
> > the presence of a hypervisor and that it is safe to test these
> > additional software leaves".
>
> Looks like it's not a bug then.  Still, most hypervisors do have this
> leaf, so it's a reasonable way that will catch most issues.  We can
> always blacklist more as they are found.  Additionally, let's go ahead
> and add the ability for userspace to disable fast MMIO for the
> hypervisors we fail to detect.

In the worst case, I'd make fast MMIO an opt-in unsafe feature
regardless of what we run on.  Users that just want KVM to work get the
default, and people who care about utmost performance can jump through
hoops.
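
For concreteness, here is a minimal userspace sketch of the detection
Paolo and Michael are discussing above.  It is illustrative only, not
part of any patch in this thread; it assumes a GCC or Clang toolchain,
whose <cpuid.h> provides __get_cpuid() and the raw __cpuid() macro.
It checks the hypervisor bit (CPUID.1:ECX[31]) and, only if that is
set, reads the software leaf 0x40000000 for the vendor signature:

	#include <stdio.h>
	#include <string.h>
	#include <cpuid.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;
		char sig[12 + 1] = { 0 };

		/* CPUID.1:ECX[31] is the "hypervisor present" bit. */
		if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) ||
		    !(ecx & (1u << 31))) {
			puts("no hypervisor bit set");
			return 0;
		}

		/*
		 * Only now is leaf 0x40000000 guaranteed to be meaningful;
		 * EBX/ECX/EDX carry a 12-byte vendor signature such as
		 * "KVMKVMKVM\0\0\0".  Use the raw __cpuid() macro because
		 * __get_cpuid() rejects leaves above the basic maximum.
		 */
		__cpuid(0x40000000, eax, ebx, ecx, edx);
		memcpy(sig + 0, &ebx, 4);
		memcpy(sig + 4, &ecx, 4);
		memcpy(sig + 8, &edx, 4);
		printf("hypervisor signature: \"%s\"\n", sig);
		return 0;
	}
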
> > >> KVM uses standard features and the SDM clearly says that the
> > >> instruction length field is undefined.
> > >
> > > True.  Let's see whether Intel can commit to a stronger definition.
> > > I don't think there's any rush to make this change.
> >
> > I disagree.  Relying on undefined processor features is a bad idea.
>
> Maybe it was a bad idea 3 years ago, yes.  In 2012 I posted "kvm_para:
> add mmio word store hypercall" as an alternative.  It was nacked, as
> MMIO was seen as safer and better.  By now many people rely on MMIO
> being fast.  Let's talk to the hardware guys to define the feature
> before we give up and spend years designing an alternative.

The change is not backward-compatible wrt. the SDM, but all processors
might actually be behaving like we want ...
(I'd assert undefined behavior and add a vm-exit flag if I were to
allow it, though.)

> > > It's just that this has been there for 3 years and people have built
> > > a product around this.
> >
> > Around 700 clock cycles?
> >
> > Paolo
>
> About 30% of the cost of an exit, isn't it?  There are definitely
> workloads where the cost of exits gates performance.  We didn't work on
> fast MMIO based on theoretical assumptions.  But maybe I am wrong.
> We'll see.  Jason here volunteered to test your patch and we'll see
> what comes out of it.  If I'm wrong and it's about 1%, I won't split
> hairs.

I'm OK with waiting for the numbers, as I hope that we won't have to
resort to adding special cases.
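
For readers without the tree handy, the shortcut being debated looks
roughly like the sketch below.  It is a condensed, from-memory
paraphrase of the VMX EPT-misconfig handling, not a verbatim copy of
arch/x86/kvm/vmx.c; fast_mmio_sketch() is an invented name, while
kvm_io_bus_write(), vmcs_read32()/vmcs_read64(), kvm_rip_read()/
kvm_rip_write() and VM_EXIT_INSTRUCTION_LEN are existing KVM/VMX
helpers it leans on:

	/*
	 * Condensed sketch of the fast-MMIO path (not a verbatim copy of
	 * arch/x86/kvm/vmx.c); assumes the surrounding vmx.c context for
	 * the vmcs_read*() accessors.
	 */
	static int fast_mmio_sketch(struct kvm_vcpu *vcpu)
	{
		gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);

		/*
		 * If a device (e.g. a virtio ioeventfd) is registered on the
		 * fast MMIO bus for this GPA, complete the write with length
		 * 0 and without decoding the guest instruction at all.
		 */
		if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
			/*
			 * Skip the instruction using the exit instruction
			 * length -- the field the SDM leaves undefined for
			 * EPT misconfiguration exits.
			 */
			kvm_rip_write(vcpu, kvm_rip_read(vcpu) +
					    vmcs_read32(VM_EXIT_INSTRUCTION_LEN));
			return 1;
		}

		/* Slow path: full MMIO emulation, which does decode the insn. */
		return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
	}

The contentious part is the single vmcs_read32(VM_EXIT_INSTRUCTION_LEN)
read: the SDM does not define that field for EPT misconfiguration
exits, and it is what lets KVM skip the roughly 700-cycle instruction
decode mentioned above.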