2016-03-08 11:46:43

by Paolo Bonzini

Subject: [RFC PATCH 0/2] KVM: MMU: return page fault error code from permission_fault

Intel's new PKU extension introduces a bit of the page fault error code
that must be dynamically computed after the PTE lookup (unlike I/D, U/S
and W/R). To ease this, these two patches make permission_fault return
the page fault error code.

Right now it only adds the P bit to the input parameter "pfec", but PKU
can change that.

Paolo

Paolo Bonzini (2):
KVM: MMU: precompute page fault error code
KVM: MMU: return page fault error code from permission_fault

arch/x86/kvm/mmu.c | 2 +-
arch/x86/kvm/mmu.h | 15 ++++++++++-----
arch/x86/kvm/paging_tmpl.h | 31 +++++++++++++++++--------------
3 files changed, 28 insertions(+), 20 deletions(-)

--
1.8.3.1


2016-03-08 11:46:03

by Paolo Bonzini

Subject: [RFC PATCH 2/2] KVM: MMU: return page fault error code from permission_fault

Intel's new PKU extension introduces a bit of the page fault error code
that must be dynamically computed after the PTE lookup (unlike I/D, U/S
and W/R). To ease this, these two patches make permission_fault return
the page fault error code.

Right now it only adds the P bit to the input parameter "pfec", but PKU
can change that.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 15 ++++++++++-----
arch/x86/kvm/paging_tmpl.h | 5 ++---
2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 58fe98a0a526..8b2b3dfca1ae 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -141,11 +141,15 @@ static inline bool is_write_protection(struct kvm_vcpu *vcpu)
}

/*
- * Will a fault with a given page-fault error code (pfec) cause a permission
- * fault with the given access (in ACC_* format)?
+ * Check if a given access (described through the I/D, W/R and U/S bits of a
+ * page fault error code pfec) causes a permission fault with the given PTE
+ * access rights (in ACC_* format).
+ *
+ * Return zero if the access does not fault; return the page fault error code
+ * if the access faults.
*/
-static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- unsigned pte_access, unsigned pfec)
+static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+ unsigned pte_access, unsigned pfec)
{
int cpl = kvm_x86_ops->get_cpl(vcpu);
unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
@@ -169,7 +173,8 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,

WARN_ON(pfec & PFERR_RSVD_MASK);

- return (mmu->permissions[index] >> pte_access) & 1;
+ pfec |= PFERR_PRESENT_MASK;
+ return -((mmu->permissions[index] >> pte_access) & 1) & pfec;
}

void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 285858d3223b..4a6866dab848 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -370,10 +370,9 @@ retry_walk:
walker->ptes[walker->level - 1] = pte;
} while (!is_last_gpte(mmu, walker->level, pte));

- if (unlikely(permission_fault(vcpu, mmu, pte_access, access))) {
- errcode |= PFERR_PRESENT_MASK;
+ errcode = permission_fault(vcpu, mmu, pte_access, errcode);
+ if (unlikely(errcode))
goto error;
- }

gfn = gpte_to_gfn_lvl(pte, walker->level);
gfn += (addr & PT_LVL_OFFSET_MASK(walker->level)) >> PAGE_SHIFT;
--
1.8.3.1

2016-03-08 11:46:39

by Paolo Bonzini

Subject: [RFC PATCH 1/2] KVM: MMU: precompute page fault error code

For the next patch, we will want to filter PFERR_FETCH_MASK away early,
and not pass it to permission_fault if neither NX nor SMEP is enabled.
Prepare for the change.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.c | 2 +-
arch/x86/kvm/paging_tmpl.h | 26 +++++++++++++++-----------
2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2463de0b935c..e57f7be061e3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3883,7 +3883,7 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
u = bit & ACC_USER_MASK;

if (!ept) {
- /* Not really needed: !nx will cause pte.nx to fault */
+ /* Not really needed: if !nx, ff will always be zero */
x |= !mmu->nx;
/* Allow supervisor writes if !cr0.wp */
w |= !is_write_protection(vcpu) && !uf;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6013f3685ef4..285858d3223b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -272,13 +272,24 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
gpa_t pte_gpa;
int offset;
const int write_fault = access & PFERR_WRITE_MASK;
- const int user_fault = access & PFERR_USER_MASK;
- const int fetch_fault = access & PFERR_FETCH_MASK;
- u16 errcode = 0;
+ u16 errcode;
gpa_t real_gpa;
gfn_t gfn;

trace_kvm_mmu_pagetable_walk(addr, access);
+
+ /*
+ * Do not modify PFERR_FETCH_MASK in access. It is used later in the call to
+ * mmu->translate_gpa and, when nested virtualization is in use, the X or NX
+ * bit of nested page tables always applies---even if NX and SMEP are disabled
+ * in the guest.
+ *
+ * TODO: cache the result of the NX and SMEP test in struct kvm_mmu?
+ */
+ errcode = access;
+ if (!(mmu->nx || kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
+ errcode &= ~PFERR_FETCH_MASK;
+
retry_walk:
walker->level = mmu->root_level;
pte = mmu->get_cr3(vcpu);
@@ -389,9 +400,7 @@ retry_walk:

if (unlikely(!accessed_dirty)) {
ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault);
- if (unlikely(ret < 0))
- goto error;
- else if (ret)
+ if (ret > 0)
goto retry_walk;
}

@@ -402,11 +411,6 @@ retry_walk:
return 1;

error:
- errcode |= write_fault | user_fault;
- if (fetch_fault && (mmu->nx ||
- kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
- errcode |= PFERR_FETCH_MASK;
-
walker->fault.vector = PF_VECTOR;
walker->fault.error_code_valid = true;
walker->fault.error_code = errcode;
--
1.8.3.1


2016-03-10 14:02:12

by Xiao Guangrong

Subject: Re: [RFC PATCH 1/2] KVM: MMU: precompute page fault error code



On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
> For the next patch, we will want to filter PFERR_FETCH_MASK away early,
> and not pass it to permission_fault if neither NX nor SMEP are enabled.
> Prepare for the change.

Why is it needed? It is much easier to drop PFEC.F in
update_permission_bitmask().

>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 2 +-
> arch/x86/kvm/paging_tmpl.h | 26 +++++++++++++++-----------
> 2 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2463de0b935c..e57f7be061e3 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3883,7 +3883,7 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
> u = bit & ACC_USER_MASK;
>
> if (!ept) {
> - /* Not really needed: !nx will cause pte.nx to fault */
> + /* Not really needed: if !nx, ff will always be zero */
> x |= !mmu->nx;
> /* Allow supervisor writes if !cr0.wp */
> w |= !is_write_protection(vcpu) && !uf;
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 6013f3685ef4..285858d3223b 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -272,13 +272,24 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> gpa_t pte_gpa;
> int offset;
> const int write_fault = access & PFERR_WRITE_MASK;
> - const int user_fault = access & PFERR_USER_MASK;
> - const int fetch_fault = access & PFERR_FETCH_MASK;
> - u16 errcode = 0;
> + u16 errcode;
> gpa_t real_gpa;
> gfn_t gfn;
>
> trace_kvm_mmu_pagetable_walk(addr, access);
> +
> + /*
> + * Do not modify PFERR_FETCH_MASK in access. It is used later in the call to
> + * mmu->translate_gpa and, when nested virtualization is in use, the X or NX
> + * bit of nested page tables always applies---even if NX and SMEP are disabled
> + * in the guest.
> + *
> + * TODO: cache the result of the NX and SMEP test in struct kvm_mmu?
> + */
> + errcode = access;
> + if (!(mmu->nx || kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
> + errcode &= ~PFERR_FETCH_MASK;
> +

PFEC.P might be set since it is copied from @access.

And the workload is moved from a rare path to the common path...

> retry_walk:
> walker->level = mmu->root_level;
> pte = mmu->get_cr3(vcpu);
> @@ -389,9 +400,7 @@ retry_walk:
>
> if (unlikely(!accessed_dirty)) {
> ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault);
> - if (unlikely(ret < 0))
> - goto error;
> - else if (ret)
> + if (ret > 0)
> goto retry_walk;

So it returns normally even if update_accessed_dirty_bits() fails.

2016-03-10 14:04:31

by Xiao Guangrong

Subject: Re: [RFC PATCH 2/2] KVM: MMU: return page fault error code from permission_fault



On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
> Intel's new PKU extension introduces a bit of the page fault error code
> that must be dynamically computed after the PTE lookup (unlike I/D, U/S
> and W/R). To ease this, these two patches make permission_fault return
> the page fault error code.
>
> Right now it only adds the P bit to the input parameter "pfec", but PKU
> can change that.

Yep, I got the same idea when I reviewed the pkey patchset. This patch
looks good to me.

Reviewed-by: Xiao Guangrong <[email protected]>

2016-03-10 14:05:11

by Paolo Bonzini

Subject: Re: [RFC PATCH 2/2] KVM: MMU: return page fault error code from permission_fault



On 10/03/2016 15:03, Xiao Guangrong wrote:
>
>
> On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
>> Intel's new PKU extension introduces a bit of the page fault error code
>> that must be dynamically computed after the PTE lookup (unlike I/D, U/S
>> and W/R). To ease this, these two patches make permission_fault return
>> the page fault error code.
>>
>> Right now it only adds the P bit to the input parameter "pfec", but PKU
>> can change that.
>
> Yep, I got the same idea when I reviewed the pkey patchset. This patch
> looks good to me.
>
> Reviewed-by: Xiao Guangrong <[email protected]>

Great, feel free to pick it up in the pkey patchset. I'm replying to
1/2 now.

Paolo

2016-03-10 14:07:34

by Paolo Bonzini

Subject: Re: [RFC PATCH 1/2] KVM: MMU: precompute page fault error code



On 10/03/2016 15:01, Xiao Guangrong wrote:
>
>
> On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
>> For the next patch, we will want to filter PFERR_FETCH_MASK away early,
>> and not pass it to permission_fault if neither NX nor SMEP are enabled.
>> Prepare for the change.
>
> Why is it needed? It is much easier to drop PFEC.F in
> update_permission_bitmask().

It's just how I structured the patches. It's unrelated to
update_permission_bitmask.

I wanted permission_fault to return a fault code, and while it's easy to
add bits (such as PFERR_PK_MASK), it's harder to remove bits from the
PFEC that permission_fault receives. So I'm doing it here.

Feel free to instead drop PFEC.F in permission_fault() or even add a new
member to kvm_mmu with the valid bits of a PFEC. This is just an RFC
for you and Huaitong to adapt in the PK patchset. Sometimes things are
easier to describe in patches than in English. :)

Paolo

>>
>> Signed-off-by: Paolo Bonzini <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 2 +-
>> arch/x86/kvm/paging_tmpl.h | 26 +++++++++++++++-----------
>> 2 files changed, 16 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2463de0b935c..e57f7be061e3 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3883,7 +3883,7 @@ static void update_permission_bitmask(struct
>> kvm_vcpu *vcpu,
>> u = bit & ACC_USER_MASK;
>>
>> if (!ept) {
>> - /* Not really needed: !nx will cause pte.nx to fault */
>> + /* Not really needed: if !nx, ff will always be zero */
>> x |= !mmu->nx;
>> /* Allow supervisor writes if !cr0.wp */
>> w |= !is_write_protection(vcpu) && !uf;
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 6013f3685ef4..285858d3223b 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -272,13 +272,24 @@ static int FNAME(walk_addr_generic)(struct
>> guest_walker *walker,
>> gpa_t pte_gpa;
>> int offset;
>> const int write_fault = access & PFERR_WRITE_MASK;
>> - const int user_fault = access & PFERR_USER_MASK;
>> - const int fetch_fault = access & PFERR_FETCH_MASK;
>> - u16 errcode = 0;
>> + u16 errcode;
>> gpa_t real_gpa;
>> gfn_t gfn;
>>
>> trace_kvm_mmu_pagetable_walk(addr, access);
>> +
>> + /*
>> + * Do not modify PFERR_FETCH_MASK in access. It is used later in
>> the call to
>> + * mmu->translate_gpa and, when nested virtualization is in use,
>> the X or NX
>> + * bit of nested page tables always applies---even if NX and SMEP
>> are disabled
>> + * in the guest.
>> + *
>> + * TODO: cache the result of the NX and SMEP test in struct kvm_mmu?
>> + */
>> + errcode = access;
>> + if (!(mmu->nx || kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
>> + errcode &= ~PFERR_FETCH_MASK;
>> +
>
> PFEC.P might be set since it is copied from @access.
>
> And the workload is moved from rare path to the common path...
>
>> retry_walk:
>> walker->level = mmu->root_level;
>> pte = mmu->get_cr3(vcpu);
>> @@ -389,9 +400,7 @@ retry_walk:
>>
>> if (unlikely(!accessed_dirty)) {
>> ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker,
>> write_fault);
>> - if (unlikely(ret < 0))
>> - goto error;
>> - else if (ret)
>> + if (ret > 0)
>> goto retry_walk;
>
> So it returns normally even if update_accessed_dirty_bits() failed.