From: KarimAllah Ahmed
To: x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: KarimAllah Ahmed, Paolo Bonzini, Radim Krčmář, Thomas Gleixner,
 Ingo Molnar, "H . Peter Anvin"
Subject: [PATCH] X86/KVM: Update the exit_qualification access bits while
 walking an address
Date: Wed, 28 Feb 2018 19:06:48 +0100
Message-Id: <1519841208-23349-1-git-send-email-karahmed@amazon.de>
X-Mailer: git-send-email 2.7.4
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

... to avoid having a stale value when handling an EPT misconfig for MMIO
regions.

MMIO regions that are not passed-through to the guest are handled through
EPT misconfigs. The first time a certain MMIO page is touched it causes an
EPT violation, then KVM marks the EPT entry to cause an EPT misconfig
instead. Any subsequent accesses to the entry will generate an EPT
misconfig.

Things get slightly complicated with nested guest handling for MMIO
regions that are not passed through from L0 (i.e. emulated by L0
user-space).

An EPT violation for one of these MMIO regions from L2 exits to the L0
hypervisor. L0 would then look at the EPT12 mapping for the L1 hypervisor
and realize it is not present (or not sufficient to serve the request).
Then L0 injects an EPT violation to L1. L1 would then update its EPT
mappings.
The EXIT_QUALIFICATION value for L1 would come from the exit_qualification
variable in "struct vcpu". The problem is that this variable is only
updated on EPT violation and not on EPT misconfig. So if an EPT violation
caused by a read happens first and an EPT misconfig caused by a write
happens afterwards, the L0 hypervisor will still hold the
exit_qualification value from the previous read instead of the write, and
will end up injecting an EPT violation to the L1 hypervisor with an
out-of-date EXIT_QUALIFICATION.

The EPT violation that is injected from L0 to L1 needs to have the correct
EXIT_QUALIFICATION, especially for the access bits, because the individual
access bits for MMIO EPTs are updated only on actual access of this
specific type. So for the example above, the L1 hypervisor will keep
updating only the read bit in the EPT and then resume the L2 guest. The L2
guest would end up causing another exit where L0 will *again* inject
another EPT violation to the L1 hypervisor with *again* an out-of-date
exit_qualification which indicates a read and not a write. Then this
ping-pong just keeps happening without making any forward progress.

The behavior of mapping MMIO regions changed in:

   commit a340b3e229b24 ("kvm: Map PFN-type memory regions as writable (if possible)")

... where an EPT violation for a read would also fix up the write bits to
avoid another EPT violation, which by accident would fix the bug mentioned
above.

This commit fixes this situation and ensures that the access bits for the
exit_qualification are up to date. That ensures that even an L1 hypervisor
running with a KVM version before the commit mentioned above would still
work.

( The description above assumes EPT to be available and used by L1
  hypervisor + the L1 hypervisor is passing through the MMIO region to the L2
  guest while this MMIO region is emulated by the L0 user-space ).

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed
---
 arch/x86/kvm/paging_tmpl.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 5abae72..6288e9d 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -452,14 +452,21 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	 * done by is_rsvd_bits_set() above.
 	 *
 	 * We set up the value of exit_qualification to inject:
-	 * [2:0] - Derive from [2:0] of real exit_qualification at EPT violation
+	 * [2:0] - Derive from the access bits. The exit_qualification might be
+	 *         out of date if it is serving an EPT misconfiguration.
 	 * [5:3] - Calculated by the page walk of the guest EPT page tables
 	 * [7:8] - Derived from [7:8] of real exit_qualification
 	 *
 	 * The other bits are set to 0.
 	 */
 	if (!(errcode & PFERR_RSVD_MASK)) {
-		vcpu->arch.exit_qualification &= 0x187;
+		vcpu->arch.exit_qualification &= 0x180;
+		if (write_fault)
+			vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_WRITE;
+		if (user_fault)
+			vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_READ;
+		if (fetch_fault)
+			vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_INSTR;
 		vcpu->arch.exit_qualification |= (pte_access & 0x7) << 3;
 	}
 #endif
-- 
2.7.4
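
For illustration only, and not part of the patch: a minimal standalone C
sketch of the bit derivation the hunk above performs. The
EPT_VIOLATION_ACC_* values are redefined locally (assumed to be bits 0, 1
and 2, as in arch/x86/include/asm/vmx.h), and old_exit_qualification(),
new_exit_qualification() and replay_read_then_write() are hypothetical
helper names introduced here; the kernel does this inline in
FNAME(walk_addr_generic).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match arch/x86/include/asm/vmx.h; redefined so this builds standalone. */
#define EPT_VIOLATION_ACC_READ   (1ULL << 0)
#define EPT_VIOLATION_ACC_WRITE  (1ULL << 1)
#define EPT_VIOLATION_ACC_INSTR  (1ULL << 2)

/* Hypothetical helper: behaviour before the patch, bits [2:0] are inherited. */
static uint64_t old_exit_qualification(uint64_t stale, unsigned int pte_access)
{
	uint64_t qual = stale & 0x187;	/* keeps the stale access bits [2:0] */

	qual |= (uint64_t)(pte_access & 0x7) << 3;
	return qual;
}

/* Hypothetical helper: behaviour after the patch, bits [2:0] come from the fault. */
static uint64_t new_exit_qualification(uint64_t stale, bool write_fault,
				       bool user_fault, bool fetch_fault,
				       unsigned int pte_access)
{
	uint64_t qual = stale & 0x180;	/* keep only bits 7 and 8 */

	/* Same flag-to-bit mapping as the hunk above. */
	if (write_fault)
		qual |= EPT_VIOLATION_ACC_WRITE;
	if (user_fault)
		qual |= EPT_VIOLATION_ACC_READ;
	if (fetch_fault)
		qual |= EPT_VIOLATION_ACC_INSTR;

	qual |= (uint64_t)(pte_access & 0x7) << 3;
	return qual;
}

/* Replays the read-then-write sequence from the description. */
static void replay_read_then_write(void)
{
	/* Left over from the earlier EPT violation caused by a read. */
	uint64_t stale = EPT_VIOLATION_ACC_READ;

	/* A write on the misconfig path: the old mask still reports a read. */
	assert(old_exit_qualification(stale, 0) == EPT_VIOLATION_ACC_READ);

	/* Deriving from the fault type reports the write instead. */
	assert(new_exit_qualification(stale, true, false, false, 0) ==
	       EPT_VIOLATION_ACC_WRITE);
}
```

replay_read_then_write() feeds both helpers a value left behind by a read
violation while the current access is a write. The pre-patch mask still
reports a read, whereas the post-patch derivation reports the write, which
is what lets L1 make forward progress.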