Date: Thu, 25 Jan 2018 15:34:40 +0100
From: Radim Krčmář
To: Vitaly Kuznetsov
Cc: kvm@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini
Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested
Message-ID: <20180125143439.GA19884@flask>
In-Reply-To: <20180124151234.32329-1-vkuznets@redhat.com>

2018-01-24 16:12+0100, Vitaly Kuznetsov:
> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
>
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
>
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>
> Fix the issue by disabling fast mmio when running nested.
>
> Signed-off-by: Vitaly Kuznetsov
> ---
>  arch/x86/kvm/vmx.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..54afb446f38e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>  	/*
>  	 * A nested guest cannot optimize MMIO vmexits, because we have an
>  	 * nGPA here instead of the required GPA.
> +	 * Skipping instruction below depends on undefined behavior: Intel's
> +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> +	 * when EPT MISCONFIG occurs and while on real hardware it was observed
> +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> +	 * up advancing IP with some random value. Disable fast mmio when
> +	 * running nested and keep it for real hardware in hope that
> +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>  	 */
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> -	if (!is_guest_mode(vcpu) &&
> +	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&

I realized that Paolo kept a minor optimization while getting rid of the
undefined behavior (https://patchwork.kernel.org/patch/9903811/).

Please do the same trick that signals kvm_io_bus_write() before going to
x86_emulate_instruction(... EMULTYPE_SKIP ...), but add a branch to use
kvm_skip_emulated_instruction() for bare-metal, thanks.

>  	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>  		trace_kvm_fast_mmio(gpa);
>  		return kvm_skip_emulated_instruction(vcpu);
> --
> 2.14.3
>