Date: Thu, 25 Jan 2018 15:16:21 +0100
From: Radim Krčmář
To: Liran Alon
Cc: vkuznets@redhat.com, x86@kernel.org, pbonzini@redhat.com,
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    "Michael S. Tsirkin", Jason Wang
Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested
Message-ID: <20180125141620.GA7663@flask>
In-Reply-To: <6690c53c-fc99-44ea-9090-6e7438c1bc98@default>
X-Mailing-List: linux-kernel@vger.kernel.org

2018-01-25 01:55-0800, Liran Alon:
> ----- vkuznets@redhat.com wrote:
> > I was investigating an issue with SeaBIOS >= 1.10, which stopped working
> > for nested KVM on Hyper-V. The problem appears to be in the
> > handle_ept_misconfig() function: when we do fast MMIO we need to skip
> > the instruction, so we call kvm_skip_emulated_instruction(). This,
> > however, depends on the VM_EXIT_INSTRUCTION_LEN field being set
> > correctly in the VMCS, and that is not always the case.
> >
> > Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> > EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> > some hypervisors follow the spec and don't set it, so we end up
> > advancing IP by some random value.
> >
> > I checked with Microsoft and they confirmed they don't fill
> > VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
> >
> > Fix the issue by disabling fast MMIO when running nested.
> >
> > Signed-off-by: Vitaly Kuznetsov
> > ---
> >  arch/x86/kvm/vmx.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index c829d89e2e63..54afb446f38e 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> >  	/*
> >  	 * A nested guest cannot optimize MMIO vmexits, because we have an
> >  	 * nGPA here instead of the required GPA.
> > +	 * Skipping instruction below depends on undefined behavior: Intel's
> > +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> > +	 * when EPT MISCONFIG occurs and while on real hardware it was observed
> > +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> > +	 * up advancing IP with some random value. Disable fast mmio when
> > +	 * running nested and keep it for real hardware in hope that
> > +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>
> If Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in
> the VMCS on EPT_MISCONFIG, I don't think we should rely on it on real
> hardware either.

Neither do I, but see the last discussion on this topic:
https://patchwork.kernel.org/patch/9903811/

In short, we agreed to limit the hack to real hardware and wait for
changes from Intel or virtio.

Michael and Jason, any progress on implementing a fast virtio mechanism
that doesn't rely on undefined behavior?  (For example: encode the length
of the writing instruction into the last 4 bits of the MMIO address, use
a side channel to declare that accesses to the MMIO area always use a
certain instruction length, use a hypercall, ...)

Thanks.
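Radim's first suggestion, encoding the instruction length in the low 4 bits of the MMIO address, could look roughly like the following. This is a hypothetical sketch. The doorbell layout and the functions `doorbell_encode()`, `doorbell_base()` and `doorbell_insn_len()` are invented for illustration and are not part of virtio or KVM. The scheme relies on x86 instructions being at most 15 bytes long, so the length fits in 4 bits. It also assumes the guest knows the exact encoded length of the store it issues, for example because it uses a fixed inline-assembly sequence.

```c
#include <assert.h>
#include <stdint.h>

/* x86 instructions are at most 15 bytes long, so the length fits in 4 bits. */
#define INSN_LEN_BITS 4u
#define INSN_LEN_MASK ((1u << INSN_LEN_BITS) - 1u)   /* 0xF */

/* Guest side (hypothetical): choose the address to write for a doorbell
 * at a 16-byte-aligned base, given the length of the store instruction. */
static uint64_t doorbell_encode(uint64_t base, unsigned insn_len)
{
    assert((base & INSN_LEN_MASK) == 0);      /* base must be 16-byte aligned */
    assert(insn_len >= 1 && insn_len <= 15);  /* valid x86 instruction length */
    return base | insn_len;
}

/* Host side (hypothetical): recover the doorbell base from the faulting
 * guest-physical address. */
static uint64_t doorbell_base(uint64_t gpa)
{
    return gpa & ~(uint64_t)INSN_LEN_MASK;
}

/* Host side (hypothetical): recover the skip distance from the address
 * alone, without consulting VM_EXIT_INSTRUCTION_LEN. A result of 0 means
 * the guest did not encode a length, so the host must fall back to
 * full emulation. */
static unsigned doorbell_insn_len(uint64_t gpa)
{
    return (unsigned)(gpa & INSN_LEN_MASK);
}
```

In this sketch, a guest that rings a doorbell at 0xfe000000 with a 6-byte store would write to 0xfe000006. The host would then recover 0xfe000000 as the doorbell and 6 as the skip distance. The trade-off is that each doorbell consumes 16 bytes of guest-physical address space.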