Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752491AbdHPWbb (ORCPT ); Wed, 16 Aug 2017 18:31:31 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34942 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752376AbdHPWb3 (ORCPT ); Wed, 16 Aug 2017 18:31:29 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 7561C7EABC Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=mst@redhat.com Date: Thu, 17 Aug 2017 01:31:27 +0300 From: "Michael S. Tsirkin" To: Paolo Bonzini Cc: Radim =?utf-8?B?S3LEjW3DocWZ?= , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, stable@vger.kernel.org Subject: Re: [PATCH] kvm: x86: disable KVM_FAST_MMIO_BUS Message-ID: <20170817011815-mutt-send-email-mst@kernel.org> References: <20170816112249.28939-1-pbonzini@redhat.com> <20170816155132-mutt-send-email-mst@kernel.org> <9de5ebf5-457d-2a34-0314-c6c612ddb2e9@redhat.com> <20170816161301-mutt-send-email-mst@kernel.org> <20170816194342-mutt-send-email-mst@kernel.org> <81dabc78-edfd-32fc-024c-c57330386a51@redhat.com> <20170816190316.GA2566@flask> <20170816224815-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 16 Aug 2017 22:31:29 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4317 Lines: 92 On Wed, Aug 16, 2017 at 11:25:35PM +0200, Paolo Bonzini wrote: > On 16/08/2017 21:59, Michael S. Tsirkin wrote: > > On Wed, Aug 16, 2017 at 09:03:17PM +0200, Radim Krčmář wrote: > >> 2017-08-16 19:19+0200, Paolo Bonzini: > >>> On 16/08/2017 18:50, Michael S. Tsirkin wrote: > >>>> On Wed, Aug 16, 2017 at 03:30:31PM +0200, Paolo Bonzini wrote: > >>>>> While you can filter out instruction fetches, that's not enough. A data > >>>>> read could happen because someone pointed the IDT to MMIO area, and who > >>>>> knows what the VM-exit instruction length points to in that case. > >>>> > >>>> Thinking more about it, I don't really see how anything > >>>> legal guest might be doing with virtio would trigger anything > >>>> but a fault after decoding the instruction. How does > >>>> skipping instruction even make sense in the example you give? > >>> > >>> There's no such thing as a legal guest. Anything that the hypervisor > >>> does, that differs from real hardware, is a possible escalation path. > >>> > >>> This in fact makes me doubt the EMULTYPE_SKIP patch too. > >> > >> The main hack is that we expect EPT misconfig within a given range to be > >> a MMIO NULL write. I think it is fine -- EMULTYPE_SKIP is a common path > >> that should have well tested error paths and, IIUC, virtio doesn't allow > >> any other access, so it is a problem of the guest if a buggy/malicious > >> application can access virtio memory. > > Yes, I agree. EMULTYPE_SKIP is fine because failed decoding still > causes an exception to be injected. Maybe it's better to gate the > EMULTYPE_SKIP emulation on the exit qualification saying this is a write I thought it's already limited to writes. I agree that's a reasonable limitation in any case. > and also not a page table walk---just in case. I still don't get it, sorry. Let's assume for the sake of argument that it's a PT walk causing the MMIO access. Just why do you think that it makes sense to skip the instruction that caused the walk? > >>>> how about we blacklist nested virt for this optimization? > >> > >> Not every hypervisor can be easily detected ... > > > > Hypervisors that don't set a hypervisor bit in CPUID are violating the > > spec themselves, aren't they? Anyway, we can add a management option > > for use in a nested scenario. > > No, the hypervisor bit only says that CPUID leaf 0x40000000 is defined. > See for example > https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458: > "Intel and AMD have also reserved CPUID leaves 0x40000000 - 0x400000FF > for software use. Hypervisors can use these leaves to provide an > interface to pass information from the hypervisor to the guest operating > system running inside a virtual machine. The hypervisor bit indicates > the presence of a hypervisor and that it is safe to test these > additional software leaves". Looks like it's not a bug then. Still, most hypervisors do have this leaf so it's a reasonable way that will catch most issues. We can always blacklist more as they are found. Additionally let's go ahead and add ability for userspace to disable fast MMIO for these hypervisors we failed to detect. > >> KVM uses standard features and SDM clearly says that the > >> instruction length field is undefined. > > > > True. Let's see whether intel can commit to a stronger definition. > > I don't think there's any rush to make this change. > > I disagree. Relying on undefined processor features is a bad idea. Maybe it was a bad idea 3 years ago, yes. In 2012 I posted "kvm_para: add mmio word store hypercall" as an alternative. Was nacked as MMIO was seen as safer and better. By now many people rely on mmio being fast. Let's talk to hardware guys to define the feature before we give up and spend years designing an alternative. > > It's just that this has been there for 3 years and people have built a > > product around this. > > Around 700 clock cycles? > > Paolo About 30% the cost of exit, isn't it? There are definitely workloads where cost of exit gates performance. We didn't work on fast mmio based on theoretical assumptions. But maybe I am wrong. We'll see. Jason here volunteered to test your patch and we'll see what comes out of it. If I'm wrong and it's about 1%, I won't split hairs. -- MST