Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751922AbcCJOHe (ORCPT ); Thu, 10 Mar 2016 09:07:34 -0500 Received: from mx1.redhat.com ([209.132.183.28]:51122 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751517AbcCJOH0 (ORCPT ); Thu, 10 Mar 2016 09:07:26 -0500 Subject: Re: [RFC PATCH 1/2] KVM: MMU: precompute page fault error code To: Xiao Guangrong , linux-kernel@vger.kernel.org, kvm@vger.kernel.org References: <1457437538-65867-1-git-send-email-pbonzini@redhat.com> <1457437538-65867-2-git-send-email-pbonzini@redhat.com> <56E17E3C.30407@linux.intel.com> Cc: huaitong.han@intel.com From: Paolo Bonzini Message-ID: <56E17F9A.2010308@redhat.com> Date: Thu, 10 Mar 2016 15:07:22 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <56E17E3C.30407@linux.intel.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3602 Lines: 101 On 10/03/2016 15:01, Xiao Guangrong wrote: > > > On 03/08/2016 07:45 PM, Paolo Bonzini wrote: >> For the next patch, we will want to filter PFERR_FETCH_MASK away early, >> and not pass it to permission_fault if neither NX nor SMEP are enabled. >> Prepare for the change. > > Why it is needed? It is much easier to drop PFEC.F in > update_permission_bitmask(). It's just how I structured the patches. It's unrelated to update_permission_bimask. I wanted permission_fault to return a fault code, and while it's easy to add bits (such as PKERR_PK_MASK) it's harder to remove bits from the PFEC that permission_fault receives. So I'm doing it here. Feel free to instead drop PFEC.F in permission_fault() or even add a new member to kvm_mmu with the valid bits of a PFEC. This is just an RFC for you and Huaitong to adapt in the PK patchset. Sometimes things are easier to describe in patches than in English. :) Paolo >> >> Signed-off-by: Paolo Bonzini >> --- >> arch/x86/kvm/mmu.c | 2 +- >> arch/x86/kvm/paging_tmpl.h | 26 +++++++++++++++----------- >> 2 files changed, 16 insertions(+), 12 deletions(-) >> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c >> index 2463de0b935c..e57f7be061e3 100644 >> --- a/arch/x86/kvm/mmu.c >> +++ b/arch/x86/kvm/mmu.c >> @@ -3883,7 +3883,7 @@ static void update_permission_bitmask(struct >> kvm_vcpu *vcpu, >> u = bit & ACC_USER_MASK; >> >> if (!ept) { >> - /* Not really needed: !nx will cause pte.nx to fault */ >> + /* Not really needed: if !nx, ff will always be zero */ >> x |= !mmu->nx; >> /* Allow supervisor writes if !cr0.wp */ >> w |= !is_write_protection(vcpu) && !uf; >> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h >> index 6013f3685ef4..285858d3223b 100644 >> --- a/arch/x86/kvm/paging_tmpl.h >> +++ b/arch/x86/kvm/paging_tmpl.h >> @@ -272,13 +272,24 @@ static int FNAME(walk_addr_generic)(struct >> guest_walker *walker, >> gpa_t pte_gpa; >> int offset; >> const int write_fault = access & PFERR_WRITE_MASK; >> - const int user_fault = access & PFERR_USER_MASK; >> - const int fetch_fault = access & PFERR_FETCH_MASK; >> - u16 errcode = 0; >> + u16 errcode; >> gpa_t real_gpa; >> gfn_t gfn; >> >> trace_kvm_mmu_pagetable_walk(addr, access); >> + >> + /* >> + * Do not modify PFERR_FETCH_MASK in access. It is used later in >> the call to >> + * mmu->translate_gpa and, when nested virtualization is in use, >> the X or NX >> + * bit of nested page tables always applies---even if NX and SMEP >> are disabled >> + * in the guest. >> + * >> + * TODO: cache the result of the NX and SMEP test in struct kvm_mmu? >> + */ >> + errcode = access; >> + if (!(mmu->nx || kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))) >> + errcode &= ~PFERR_FETCH_MASK; >> + > > PFEC.P might be set since it is copied from @access. > > And the workload is moved from rare path to the common path... > >> retry_walk: >> walker->level = mmu->root_level; >> pte = mmu->get_cr3(vcpu); >> @@ -389,9 +400,7 @@ retry_walk: >> >> if (unlikely(!accessed_dirty)) { >> ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, >> write_fault); >> - if (unlikely(ret < 0)) >> - goto error; >> - else if (ret) >> + if (ret > 0) >> goto retry_walk; > > So it returns normally even if update_accessed_dirty_bits() failed.