From: Vitaly Kuznetsov
To: kvm@vger.kernel.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini,
 Radim Krčmář, Wanpeng Li, Liran Alon, "Michael S.
 Tsirkin", Jason Wang
Subject: [PATCH v2] x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
Date: Thu, 25 Jan 2018 16:37:07 +0100
Message-Id: <20180125153707.29981-1-vkuznets@redhat.com>

I was investigating an issue with SeaBIOS >= 1.10, which stopped working
for nested KVM on Hyper-V. The problem appears to be in the
handle_ept_misconfig() function: when we do fast MMIO we need to skip the
instruction, so we call kvm_skip_emulated_instruction(). This, however,
depends on the VM_EXIT_INSTRUCTION_LEN field being set correctly in the
VMCS, which is not always the case.

Intel's manual doesn't mandate that VM_EXIT_INSTRUCTION_LEN be set when an
EPT MISCONFIG occurs. While on real hardware it was observed to be set,
some hypervisors follow the spec and don't set it, so we end up advancing
IP by some random value.

I checked with Microsoft and they confirmed that they don't fill
VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.

Fix the issue by skipping the instruction through the emulator when
running nested.
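
For illustration only, here is a minimal user-space sketch of the failure mode described above. It is not KVM code: struct toy_vcpu, toy_guest_code and all toy_* functions are hypothetical names, and the "ISA" is an assumption made up for the example (the first byte of each instruction is its total length). It only shows why advancing RIP by an unset length field goes wrong while decoding the instruction does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model only: none of these names exist in KVM.
 * Assumed toy ISA: the first byte of every instruction is its total length.
 */
struct toy_vcpu {
	uint64_t rip;                  /* guest instruction pointer */
	uint32_t exit_instruction_len; /* stand-in for VM_EXIT_INSTRUCTION_LEN */
	const uint8_t *code;           /* guest code, mapped at guest address 0 */
};

/* Two toy instructions: 3 bytes at offset 0, then 2 bytes at offset 3. */
static const uint8_t toy_guest_code[] = { 3, 0xaa, 0xbb, 2, 0xcc };

/* Fast path: trust whatever value was left in the length field. */
static void toy_skip_by_exit_len(struct toy_vcpu *vcpu)
{
	vcpu->rip += vcpu->exit_instruction_len;
}

/* Slow path: decode the instruction at RIP and advance by its real length. */
static void toy_skip_by_decode(struct toy_vcpu *vcpu)
{
	uint8_t len = vcpu->code[vcpu->rip];

	assert(len != 0); /* a zero length would never make progress */
	vcpu->rip += len;
}

/* Same shape as the patched branch: decode only when running nested. */
static void toy_fast_mmio_skip(struct toy_vcpu *vcpu, bool running_nested)
{
	if (!running_nested)
		toy_skip_by_exit_len(vcpu);
	else
		toy_skip_by_decode(vcpu);
}
```

With exit_instruction_len left at a stale value such as 7, calling toy_fast_mmio_skip() with running_nested == false moves RIP from 0 to 7, past the end of the toy code, while running_nested == true moves it to 3, the start of the next toy instruction.
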

Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
Suggested-by: Radim Krčmář
Suggested-by: Paolo Bonzini
Signed-off-by: Vitaly Kuznetsov
---
v1 -> v2: inlay X86_FEATURE_HYPERVISOR case with EMULTYPE_SKIP optimization
	  [Paolo Bonzini, Radim Krčmář]
---
 arch/x86/kvm/vmx.c | 16 +++++++++++++++-
 arch/x86/kvm/x86.c |  3 ++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c829d89e2e63..e105b439c372 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6563,7 +6563,21 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 	if (!is_guest_mode(vcpu) &&
 	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
 		trace_kvm_fast_mmio(gpa);
-		return kvm_skip_emulated_instruction(vcpu);
+		/*
+		 * Doing kvm_skip_emulated_instruction() depends on undefined
+		 * behavior: Intel's manual doesn't mandate
+		 * VM_EXIT_INSTRUCTION_LEN to be set in VMCS when EPT MISCONFIG
+		 * occurs and while on real hardware it was observed to be set,
+		 * other hypervisors (namely Hyper-V) don't set it, we end up
+		 * advancing IP with some random value. Disable fast mmio when
+		 * running nested and keep it for real hardware in hope that
+		 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
+		 */
+		if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+			return kvm_skip_emulated_instruction(vcpu);
+		else
+			return x86_emulate_instruction(vcpu, gpa, EMULTYPE_SKIP,
+						       NULL, 0) == EMULATE_DONE;
 	}
 
 	ret = kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1cec2c62a0b0..930aba87a723 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5703,7 +5703,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 		 * handle watchpoints yet, those would be handled in
 		 * the emulate_ops.
 		 */
-		if (kvm_vcpu_check_breakpoint(vcpu, &r))
+		if (!(emulation_type & EMULTYPE_SKIP) &&
+		    kvm_vcpu_check_breakpoint(vcpu, &r))
 			return r;
 
 		ctxt->interruptibility = 0;
-- 
2.14.3