Subject: Re: [PATCH 1/4] KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid()
To: Sean Christopherson
Cc: Lai Jiangshan, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
References: <20211019110154.4091-1-jiangshanlai@gmail.com> <20211019110154.4091-2-jiangshanlai@gmail.com> <55abc519-b528-ddaa-120d-8d157b520623@linux.alibaba.com>
From: Lai Jiangshan
Date: Fri, 22 Oct 2021 08:22:19 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/10/21 22:52, Sean Christopherson wrote:
> On Thu, Oct 21, 2021, Lai Jiangshan wrote:
>>
>>
>> On 2021/10/21 02:26, Sean Christopherson wrote:
>>> On Wed, Oct 20, 2021, Lai Jiangshan wrote:
>>>> On 2021/10/19 23:25, Sean Christopherson wrote:
>>>> I just read some interception policy in vmx.c, if EPT=1 but vmx_need_pf_intercept()
>>>> return true for some reasons/configs, #PF is intercepted. But CR3 write is not
>>>> intercepted, which means there will be an EPT fault _after_ (IIUC) the CR3 write if
>>>> the GPA of the new CR3 exceeds the guest maxphyaddr limit. And kvm queues a fault to
>>>> the guest which is also _after_ the CR3 write, but the guest expects the fault before
>>>> the write.
>>>>
>>>> IIUC, it can be fixed by intercepting CR3 write or reversing the CR3 write in EPT
>>>> violation handler.
>>>
>>> KVM implicitly does the latter by emulating the faulting instruction.
>>>
>>> static int handle_ept_violation(struct kvm_vcpu *vcpu)
>>> {
>>>     ...
>>>
>>>     /*
>>>      * Check that the GPA doesn't exceed physical memory limits, as that is
>>>      * a guest page fault. We have to emulate the instruction here, because
>>>      * if the illegal address is that of a paging structure, then
>>>      * EPT_VIOLATION_ACC_WRITE bit is set. Alternatively, if supported we
>>>      * would also use advanced VM-exit information for EPT violations to
>>>      * reconstruct the page fault error code.
>>>      */
>>>     if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gpa)))
>>>         return kvm_emulate_instruction(vcpu, 0);
>>>
>>>     return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
>>> }
>>>
>>> and injecting a #GP when kvm_set_cr3() fails.
>>
>> I think the EPT violation happens *after* the cr3 write. So the instruction to be
>> emulated is not "cr3 write". The emulation will queue fault into guest though,
>> recursive EPT violation happens since the cr3 exceeds maxphyaddr limit.
>
> Doh, you're correct. I think my mind wandered into thinking about what would
> happen with PDPTRs and forgot to get back to normal MOV CR3.
>
> So yeah, the only way to correctly handle this would be to intercept CR3 loads.
> I'm guessing that would have a noticeable impact on guest performance.

I think we can detect this in handle_ept_violation() by checking the CR3 value,
and make the guest triple-fault in that case so that the VMM can exit. I don't
think any OS relies on the reserved bits in CR3 and the corresponding #GP.

>
> Paolo, I'll leave this one for you to decide, we have pretty much written off
> allow_smaller_maxphyaddr :-)
>
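
To make the proposed check concrete, here is a minimal user-space sketch of the
decision described above. It is not KVM code and not a patch: the names
CR3_ADDR_MASK, gpa_exceeds_maxphyaddr(), classify_ept_violation() and the
ept_action enum are all hypothetical, and the mask assumes the long-mode CR3
layout in which bits 51:12 hold the page-table base. In the kernel, the
equivalent test would have to use whatever helpers KVM already provides for
reading the guest CR3 and its MAXPHYADDR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: long-mode CR3 layout, page-table base in bits 51:12. */
#define CR3_ADDR_MASK 0x000ffffffffff000ULL

/* Hypothetical result of the check; not a KVM type. */
enum ept_action {
	EPT_HANDLE_NORMALLY,      /* fall through to the usual fault handling */
	EPT_REQUEST_TRIPLE_FAULT  /* give up on the guest so the VMM can exit */
};

/* True if any address bit at or above the guest's MAXPHYADDR is set. */
static bool gpa_exceeds_maxphyaddr(uint64_t gpa, unsigned int maxphyaddr)
{
	if (maxphyaddr >= 64)
		return false;	/* avoid an undefined 64-bit shift */
	return (gpa >> maxphyaddr) != 0;
}

/*
 * Model of the idea: on an EPT violation, look at the guest CR3 itself.
 * If its base address is already illegal for the guest, the faulting
 * access comes from the page walk and no instruction emulation can fix
 * it, so request a triple fault instead of emulating.
 */
static enum ept_action classify_ept_violation(uint64_t guest_cr3,
					      unsigned int guest_maxphyaddr)
{
	uint64_t cr3_base = guest_cr3 & CR3_ADDR_MASK;

	if (gpa_exceeds_maxphyaddr(cr3_base, guest_maxphyaddr))
		return EPT_REQUEST_TRIPLE_FAULT;

	return EPT_HANDLE_NORMALLY;
}
```

For example, with a guest MAXPHYADDR of 40, a CR3 value of (1ULL << 40) | 0x1000
is classified as EPT_REQUEST_TRIPLE_FAULT, while 0x1000 is handled normally.
Bits outside the address field, such as bit 63, are masked off before the
comparison and do not affect the result.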