Subject: Re: [PATCH 1/4] KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid()
From: Lai Jiangshan
To: Sean Christopherson
Cc: Lai Jiangshan, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
 "H. Peter Anvin"
Date: Thu, 21 Oct 2021 09:27:20 +0800
Message-ID: <55abc519-b528-ddaa-120d-8d157b520623@linux.alibaba.com>
References: <20211019110154.4091-1-jiangshanlai@gmail.com> <20211019110154.4091-2-jiangshanlai@gmail.com>

On 2021/10/21 02:26, Sean Christopherson wrote:
> On Wed, Oct 20, 2021, Lai Jiangshan wrote:
>> On 2021/10/19 23:25, Sean Christopherson wrote:
>> I just read some interception policy in vmx.c, if EPT=1 but vmx_need_pf_intercept()
>> return true for some reasons/configs, #PF is intercepted. But CR3 write is not
>> intercepted, which means there will be an EPT fault _after_ (IIUC) the CR3 write if
>> the GPA of the new CR3 exceeds the guest maxphyaddr limit. And kvm queues a fault to
>> the guest which is also _after_ the CR3 write, but the guest expects the fault before
>> the write.
>>
>> IIUC, it can be fixed by intercepting CR3 write or reversing the CR3 write in EPT
>> violation handler.
>
> KVM implicitly does the latter by emulating the faulting instruction.
>
> static int handle_ept_violation(struct kvm_vcpu *vcpu)
> {
> 	...
>
> 	/*
> 	 * Check that the GPA doesn't exceed physical memory limits, as that is
> 	 * a guest page fault. We have to emulate the instruction here, because
> 	 * if the illegal address is that of a paging structure, then
> 	 * EPT_VIOLATION_ACC_WRITE bit is set. Alternatively, if supported we
> 	 * would also use advanced VM-exit information for EPT violations to
> 	 * reconstruct the page fault error code.
> 	 */
> 	if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gpa)))
> 		return kvm_emulate_instruction(vcpu, 0);
>
> 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> }
>
> and injecting a #GP when kvm_set_cr3() fails.

I think the EPT violation happens *after* the CR3 write, so the instruction
being emulated is not the CR3 write. The emulation will queue a fault into the
guest, though, and then a recursive EPT violation happens because the new CR3
exceeds the guest MAXPHYADDR limit.

In this case, the guest is malicious/broken and gets to keep the pieces too.

>
> static int em_cr_write(struct x86_emulate_ctxt *ctxt)
> {
> 	if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
> 		return emulate_gp(ctxt, 0);
>
> 	/* Disable writeback. */
> 	ctxt->dst.type = OP_NONE;
> 	return X86EMUL_CONTINUE;
> }
>