Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested
To: Radim Krčmář, Liran Alon
Cc: vkuznets@redhat.com, x86@kernel.org, pbonzini@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, "Michael S. Tsirkin"
References: <6690c53c-fc99-44ea-9090-6e7438c1bc98@default> <20180125141620.GA7663@flask>
From: Jason Wang
Date: Thu, 25 Jan 2018 22:39:14 +0800
In-Reply-To: <20180125141620.GA7663@flask>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018-01-25 22:16, Radim Krčmář wrote:
> 2018-01-25 01:55-0800, Liran Alon:
>> ----- vkuznets@redhat.com wrote:
>>> I was investigating an issue with seabios >= 1.10 which stopped working
>>> for nested KVM on Hyper-V. The problem appears to be in
>>> handle_ept_violation() function: when we do fast mmio we need to skip
>>> the instruction so we do kvm_skip_emulated_instruction(). This, however,
>>> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
>>> However, this is not the case.
>>>
>>> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
>>> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
>>> some hypervisors follow the spec and don't set it; we end up advancing
>>> IP with some random value.
>>>
>>> I checked with Microsoft and they confirmed they don't fill
>>> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>>>
>>> Fix the issue by disabling fast mmio when running nested.
>>>
>>> Signed-off-by: Vitaly Kuznetsov
>>> ---
>>>  arch/x86/kvm/vmx.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c829d89e2e63..54afb446f38e 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>>>  	/*
>>>  	 * A nested guest cannot optimize MMIO vmexits, because we have an
>>>  	 * nGPA here instead of the required GPA.
>>> +	 * Skipping instruction below depends on undefined behavior: Intel's
>>> +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
>>> +	 * when EPT MISCONFIG occurs and while on real hardware it was observed
>>> +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we end
>>> +	 * up advancing IP with some random value. Disable fast mmio when
>>> +	 * running nested and keep it for real hardware in hope that
>>> +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>> If Intel manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS on EPT_MISCONFIG,
>> I don't think we should do this on real-hardware as-well.
> Neither do I, but you can see the last discussion on this topic,
> https://patchwork.kernel.org/patch/9903811/. In short, we've agreed to
> limit the hack to real hardware and wait for Intel or virtio changes.
>
> Michael and Jason, any progress on implementing a fast virtio mechanism
> that doesn't rely on undefined behavior?
>
> (Encode writing instruction length into last 4 bits of MMIO address,
> side-channel say that accesses to the MMIO area always use certain
> instruction length, use hypercall, ...)
>
> Thanks.

No progress from my side.

But we can use PIO for virtio 1.0, and it's faster than fast MMIO (QEMU
supports a modern PIO notification BAR, and we can make it the default).

It looks to me that neither the encoding nor a hypercall will work for a
real hardware virtio device.

Thanks
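
For reference, the "encode the instruction length into the last 4 bits of
the MMIO address" idea quoted above could look roughly like the sketch
below. This is only an illustration under assumptions, not code from KVM,
QEMU or the virtio spec: the helper names are hypothetical, and it assumes
the notify region is 16-byte aligned so the low 4 bits of the address are
free. Four bits are enough because an x86 instruction is at most 15 bytes
long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: low 4 bits of the notify address carry the length of the
 * guest's store instruction (x86 instructions are 1..15 bytes long). */
#define INSN_LEN_MASK 0xfULL

/* Guest side: pick the address to write to for a store of insn_len bytes.
 * Assumes base is 16-byte aligned (an assumption of this sketch). */
static uint64_t encode_notify_addr(uint64_t base, unsigned int insn_len)
{
	assert(insn_len >= 1 && insn_len <= 15);
	return (base & ~INSN_LEN_MASK) | insn_len;
}

/* Host side: recover the instruction length from the faulting address. */
static unsigned int decode_insn_len(uint64_t gpa)
{
	return (unsigned int)(gpa & INSN_LEN_MASK);
}

/* Host side: recover the aligned notify address for the device lookup. */
static uint64_t decode_base(uint64_t gpa)
{
	return gpa & ~INSN_LEN_MASK;
}

/* Host side: skip the instruction without reading VM_EXIT_INSTRUCTION_LEN. */
static uint64_t advance_rip(uint64_t rip, uint64_t gpa)
{
	return rip + decode_insn_len(gpa);
}
```

With this, a 3-byte store to a notify region at 0xfe000000 would target
0xfe000003, and the host would advance RIP by 3 from the address alone. It
also shows the limitation mentioned above: the device side has to
understand the encoding, which is a problem for a real hardware virtio
device.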