2015-08-04 11:05:03

by Xiao Guangrong

Subject: [PATCH 0/9] KVM: MMU: fix and improve validation of mmio page fault

The current code validating mmio #PF is buggy; the bug was spotted by
Pavel Shirshov. The symptom is that qemu complained with "KVM: unknown
exit, hardware reason 31" and KVM showed this info:
[84245.284948] EPT: Misconfiguration.
[84245.285056] EPT: GPA: 0xfeda848
[84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4
[84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3
[84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2
[84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1

This is because we got a mmio #PF and the handler saw that the mmio spte
had become a normal mapping (pointing to a ram page).

However, this is actually valid after the introduction of fast mmio spte
invalidation, which increases the generation number instead of zapping
mmio sptes. An example is as follows:
1. QEMU drops the mmio region by adding a new memslot
2. invalidate all mmio sptes
3.

        VCPU 0                          VCPU 1
   access the invalid mmio spte
                                        access the region that was
                                        originally mmio; set the spte
                                        to the normal ram mapping
   mmio #PF
   check the spte and see it has
   become a normal ram mapping !!!

The first patch simply fixes the bug by dropping the validation in the
mmio handler, which makes it suitable for backporting.

Later patches enable a full reserved-bits check for shadow page table
entries. Since shadow page tables and guest page tables have the same
format, these patches reuse the logic which checks reserved bits on
guest ptes to check sptes.
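
To give a rough feel for the mask-based check (the numbers below are
purely illustrative, assuming MAXPHYADDR = 40; rsvd_bits() is the helper
already defined in mmu.h):

	/* bits MAXPHYADDR..51 are reserved in a 64-bit leaf pte */
	u64 mask = rsvd_bits(40, 51);

	/* a pte whose physical address exceeds MAXPHYADDR ... */
	u64 pte = (1ull << 45) | 0x107;

	/* ... is flagged, which is what is_rsvd_bits_set() does per level */
	bool bad = (pte & mask) != 0;	/* true: a reserved bit is set */

The series builds one such mask table per paging mode for guest ptes
(guest_rsvd_check) and another for sptes (shadow_rsvd_check), and shares
a single helper to test entries against them.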

Xiao Guangrong (9):
KVM: MMU: fix validation of mmio page fault
KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c
KVM: MMU: introduce rsvd_bits_validate
KVM: MMU: split reset_rsvds_bits_mask
KVM: MMU: split reset_rsvds_bits_mask_ept
KVM: MMU: introduce the framework to check reserved bits on sptes
KVM: MMU: introduce is_shadow_rsvd_bits_set()
KVM: MMU: fully check reserved bits for sptes
KVM: VMX: drop ept misconfig check

arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/mmu.c | 284 ++++++++++++++++++++++++----------------
arch/x86/kvm/mmu.h | 4 +-
arch/x86/kvm/paging_tmpl.h | 13 +-
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx.c | 74 +----------
arch/x86/kvm/x86.c | 3 +-
7 files changed, 187 insertions(+), 201 deletions(-)

--
2.1.0


2015-08-04 11:05:01

by Xiao Guangrong

Subject: [PATCH 1/9] KVM: MMU: fix validation of mmio page fault

We hit a bug where qemu complained with "KVM: unknown exit, hardware
reason 31" and KVM showed this info:
[84245.284948] EPT: Misconfiguration.
[84245.285056] EPT: GPA: 0xfeda848
[84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4
[84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3
[84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2
[84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1

This is because we got a mmio #PF and the handler saw that the mmio spte
had become a normal mapping (pointing to a ram page).

However, this is actually valid after the introduction of fast mmio spte
invalidation, which increases the generation number instead of zapping
mmio sptes. An example is as follows:
1. QEMU drops the mmio region by adding a new memslot
2. invalidate all mmio sptes
3.

        VCPU 0                          VCPU 1
   access the invalid mmio spte
                                        access the region that was
                                        originally mmio; set the spte
                                        to the normal ram mapping
   mmio #PF
   check the spte and see it has
   become a normal ram mapping !!!

This patch fixes the bug just by dropping the check in the mmio handler;
it is suitable for backporting. The full check will be introduced in
later patches.

Reported-by: Pavel Shirshov <[email protected]>
Tested-by: Pavel Shirshov <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 45 ---------------------------------------------
1 file changed, 45 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6de896f..f432e9b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -357,12 +357,6 @@ static u64 __get_spte_lockless(u64 *sptep)
{
return ACCESS_ONCE(*sptep);
}
-
-static bool __check_direct_spte_mmio_pf(u64 spte)
-{
- /* It is valid if the spte is zapped. */
- return spte == 0ull;
-}
#else
union split_spte {
struct {
@@ -478,23 +472,6 @@ retry:

return spte.spte;
}
-
-static bool __check_direct_spte_mmio_pf(u64 spte)
-{
- union split_spte sspte = (union split_spte)spte;
- u32 high_mmio_mask = shadow_mmio_mask >> 32;
-
- /* It is valid if the spte is zapped. */
- if (spte == 0ull)
- return true;
-
- /* It is valid if the spte is being zapped. */
- if (sspte.spte_low == 0ull &&
- (sspte.spte_high & high_mmio_mask) == high_mmio_mask)
- return true;
-
- return false;
-}
#endif

static bool spte_is_locklessly_modifiable(u64 spte)
@@ -3299,21 +3276,6 @@ static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
return vcpu_match_mmio_gva(vcpu, addr);
}

-
-/*
- * On direct hosts, the last spte is only allows two states
- * for mmio page fault:
- * - It is the mmio spte
- * - It is zapped or it is being zapped.
- *
- * This function completely checks the spte when the last spte
- * is not the mmio spte.
- */
-static bool check_direct_spte_mmio_pf(u64 spte)
-{
- return __check_direct_spte_mmio_pf(spte);
-}
-
static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
{
struct kvm_shadow_walk_iterator iterator;
@@ -3356,13 +3318,6 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
}

/*
- * It's ok if the gva is remapped by other cpus on shadow guest,
- * it's a BUG if the gfn is not a mmio page.
- */
- if (direct && !check_direct_spte_mmio_pf(spte))
- return RET_MMIO_PF_BUG;
-
- /*
* If the page table is zapped by other cpus, let CPU fault again on
* the address.
*/
--
2.1.0

2015-08-04 11:08:47

by Xiao Guangrong

Subject: [PATCH 2/9] KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c

FNAME(is_rsvd_bits_set) does not depend on the guest mmu mode, so move
it to mmu.c to stop it from being compiled multiple times.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 8 ++++++++
arch/x86/kvm/paging_tmpl.h | 13 ++-----------
2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f432e9b..3f1c403 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3546,6 +3546,14 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gp
return mmu->last_pte_bitmap & (1 << index);
}

+static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
+{
+ int bit7 = (gpte >> 7) & 1, low6 = gpte & 0x3f;
+
+ return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) |
+ ((mmu->bad_mt_xwr & (1ull << low6)) != 0);
+}
+
#define PTTYPE_EPT 18 /* arbitrary */
#define PTTYPE PTTYPE_EPT
#include "paging_tmpl.h"
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0f67d7e..736e6ab 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -128,14 +128,6 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
*access &= mask;
}

-static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
-{
- int bit7 = (gpte >> 7) & 1, low6 = gpte & 0x3f;
-
- return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) |
- ((mmu->bad_mt_xwr & (1ull << low6)) != 0);
-}
-
static inline int FNAME(is_present_gpte)(unsigned long pte)
{
#if PTTYPE != PTTYPE_EPT
@@ -172,7 +164,7 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, u64 *spte,
u64 gpte)
{
- if (FNAME(is_rsvd_bits_set)(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL))
+ if (is_rsvd_bits_set(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL))
goto no_present;

if (!FNAME(is_present_gpte)(gpte))
@@ -353,8 +345,7 @@ retry_walk:
if (unlikely(!FNAME(is_present_gpte)(pte)))
goto error;

- if (unlikely(FNAME(is_rsvd_bits_set)(mmu, pte,
- walker->level))) {
+ if (unlikely(is_rsvd_bits_set(mmu, pte, walker->level))) {
errcode |= PFERR_RSVD_MASK | PFERR_PRESENT_MASK;
goto error;
}
--
2.1.0

2015-08-04 11:08:27

by Xiao Guangrong

Subject: [PATCH 3/9] KVM: MMU: introduce rsvd_bits_validate

The two fields rsvd_bits_mask and bad_mt_xwr in "struct kvm_mmu" are
used to check whether reserved bits are set on guest ptes. Move them
into a separate data struct so that the same approach can be applied to
check reserved bits on host shadow page table entries.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 +++--
arch/x86/kvm/mmu.c | 75 +++++++++++++++++++++++------------------
arch/x86/kvm/x86.c | 3 +-
3 files changed, 50 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6e851d5..3e33c0d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -252,6 +252,11 @@ struct kvm_pio_request {
int size;
};

+struct rsvd_bits_validate {
+ u64 rsvd_bits_mask[2][4];
+ u64 bad_mt_xwr;
+};
+
/*
* x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
* 32-bit). The kvm_mmu structure abstracts the details of the current mmu
@@ -289,8 +294,7 @@ struct kvm_mmu {

u64 *pae_root;
u64 *lm_root;
- u64 rsvd_bits_mask[2][4];
- u64 bad_mt_xwr;
+ struct rsvd_bits_validate guest_rsvd_check;

/*
* Bitmap: bit set = last pte in walk
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3f1c403..23633f5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3548,10 +3548,11 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gp

static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
{
+ struct rsvd_bits_validate *rsvd_check = &mmu->guest_rsvd_check;
int bit7 = (gpte >> 7) & 1, low6 = gpte & 0x3f;

- return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) |
- ((mmu->bad_mt_xwr & (1ull << low6)) != 0);
+ return (gpte & rsvd_check->rsvd_bits_mask[bit7][level-1]) |
+ ((rsvd_check->bad_mt_xwr & (1ull << low6)) != 0);
}

#define PTTYPE_EPT 18 /* arbitrary */
@@ -3570,12 +3571,13 @@ static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
{
+ struct rsvd_bits_validate *rsvd_check = &context->guest_rsvd_check;
int maxphyaddr = cpuid_maxphyaddr(vcpu);
u64 exb_bit_rsvd = 0;
u64 gbpages_bit_rsvd = 0;
u64 nonleaf_bit8_rsvd = 0;

- context->bad_mt_xwr = 0;
+ rsvd_check->bad_mt_xwr = 0;

if (!context->nx)
exb_bit_rsvd = rsvd_bits(63, 63);
@@ -3592,52 +3594,58 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
switch (context->root_level) {
case PT32_ROOT_LEVEL:
/* no rsvd bits for 2 level 4K page table entries */
- context->rsvd_bits_mask[0][1] = 0;
- context->rsvd_bits_mask[0][0] = 0;
- context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
+ rsvd_check->rsvd_bits_mask[0][1] = 0;
+ rsvd_check->rsvd_bits_mask[0][0] = 0;
+ rsvd_check->rsvd_bits_mask[1][0] =
+ rsvd_check->rsvd_bits_mask[0][0];

if (!is_pse(vcpu)) {
- context->rsvd_bits_mask[1][1] = 0;
+ rsvd_check->rsvd_bits_mask[1][1] = 0;
break;
}

if (is_cpuid_PSE36())
/* 36bits PSE 4MB page */
- context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
+ rsvd_check->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
else
/* 32 bits PSE 4MB page */
- context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21);
+ rsvd_check->rsvd_bits_mask[1][1] = rsvd_bits(13, 21);
break;
case PT32E_ROOT_LEVEL:
- context->rsvd_bits_mask[0][2] =
+ rsvd_check->rsvd_bits_mask[0][2] =
rsvd_bits(maxphyaddr, 63) |
rsvd_bits(5, 8) | rsvd_bits(1, 2); /* PDPTE */
- context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
+ rsvd_check->rsvd_bits_mask[0][1] = exb_bit_rsvd |
rsvd_bits(maxphyaddr, 62); /* PDE */
- context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
+ rsvd_check->rsvd_bits_mask[0][0] = exb_bit_rsvd |
rsvd_bits(maxphyaddr, 62); /* PTE */
- context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
+ rsvd_check->rsvd_bits_mask[1][1] = exb_bit_rsvd |
rsvd_bits(maxphyaddr, 62) |
rsvd_bits(13, 20); /* large page */
- context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
+ rsvd_check->rsvd_bits_mask[1][0] =
+ rsvd_check->rsvd_bits_mask[0][0];
break;
case PT64_ROOT_LEVEL:
- context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
- nonleaf_bit8_rsvd | rsvd_bits(7, 7) | rsvd_bits(maxphyaddr, 51);
- context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
- nonleaf_bit8_rsvd | gbpages_bit_rsvd | rsvd_bits(maxphyaddr, 51);
- context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
+ rsvd_check->rsvd_bits_mask[0][3] = exb_bit_rsvd |
+ nonleaf_bit8_rsvd | rsvd_bits(7, 7) |
rsvd_bits(maxphyaddr, 51);
- context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
+ rsvd_check->rsvd_bits_mask[0][2] = exb_bit_rsvd |
+ nonleaf_bit8_rsvd | gbpages_bit_rsvd |
rsvd_bits(maxphyaddr, 51);
- context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
- context->rsvd_bits_mask[1][2] = exb_bit_rsvd |
+ rsvd_check->rsvd_bits_mask[0][1] = exb_bit_rsvd |
+ rsvd_bits(maxphyaddr, 51);
+ rsvd_check->rsvd_bits_mask[0][0] = exb_bit_rsvd |
+ rsvd_bits(maxphyaddr, 51);
+ rsvd_check->rsvd_bits_mask[1][3] =
+ rsvd_check->rsvd_bits_mask[0][3];
+ rsvd_check->rsvd_bits_mask[1][2] = exb_bit_rsvd |
gbpages_bit_rsvd | rsvd_bits(maxphyaddr, 51) |
rsvd_bits(13, 29);
- context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
+ rsvd_check->rsvd_bits_mask[1][1] = exb_bit_rsvd |
rsvd_bits(maxphyaddr, 51) |
rsvd_bits(13, 20); /* large page */
- context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
+ rsvd_check->rsvd_bits_mask[1][0] =
+ rsvd_check->rsvd_bits_mask[0][0];
break;
}
}
@@ -3645,24 +3653,25 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
struct kvm_mmu *context, bool execonly)
{
+ struct rsvd_bits_validate *rsvd_check = &context->guest_rsvd_check;
int maxphyaddr = cpuid_maxphyaddr(vcpu);
int pte;

- context->rsvd_bits_mask[0][3] =
+ rsvd_check->rsvd_bits_mask[0][3] =
rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
- context->rsvd_bits_mask[0][2] =
+ rsvd_check->rsvd_bits_mask[0][2] =
rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
- context->rsvd_bits_mask[0][1] =
+ rsvd_check->rsvd_bits_mask[0][1] =
rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
- context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
+ rsvd_check->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);

/* large page */
- context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
- context->rsvd_bits_mask[1][2] =
+ rsvd_check->rsvd_bits_mask[1][3] = rsvd_check->rsvd_bits_mask[0][3];
+ rsvd_check->rsvd_bits_mask[1][2] =
rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 29);
- context->rsvd_bits_mask[1][1] =
+ rsvd_check->rsvd_bits_mask[1][1] =
rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 20);
- context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
+ rsvd_check->rsvd_bits_mask[1][0] = rsvd_check->rsvd_bits_mask[0][0];

for (pte = 0; pte < 64; pte++) {
int rwx_bits = pte & 7;
@@ -3670,7 +3679,7 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
if (mt == 0x2 || mt == 0x3 || mt == 0x7 ||
rwx_bits == 0x2 || rwx_bits == 0x6 ||
(rwx_bits == 0x4 && !execonly))
- context->bad_mt_xwr |= (1ull << pte);
+ rsvd_check->bad_mt_xwr |= (1ull << pte);
}
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ec7bf2b..e585007 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -525,7 +525,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
}
for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
if (is_present_gpte(pdpte[i]) &&
- (pdpte[i] & vcpu->arch.mmu.rsvd_bits_mask[0][2])) {
+ (pdpte[i] &
+ vcpu->arch.mmu.guest_rsvd_check.rsvd_bits_mask[0][2])) {
ret = 0;
goto out;
}
--
2.1.0

2015-08-04 11:07:58

by Xiao Guangrong

Subject: [PATCH 4/9] KVM: MMU: split reset_rsvds_bits_mask

Since softmmu & AMD nested shadow page tables and guest page tables have
the same format, split reset_rsvds_bits_mask so that the logic can be
reused by later patches which check reserved bits on sptes

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 23633f5..693d565 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3568,20 +3568,21 @@ static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
#include "paging_tmpl.h"
#undef PTTYPE

-static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
- struct kvm_mmu *context)
+static void
+__reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
+ struct rsvd_bits_validate *rsvd_check,
+ int maxphyaddr, int level, bool nx, bool gbpages,
+ bool pse)
{
- struct rsvd_bits_validate *rsvd_check = &context->guest_rsvd_check;
- int maxphyaddr = cpuid_maxphyaddr(vcpu);
u64 exb_bit_rsvd = 0;
u64 gbpages_bit_rsvd = 0;
u64 nonleaf_bit8_rsvd = 0;

rsvd_check->bad_mt_xwr = 0;

- if (!context->nx)
+ if (!nx)
exb_bit_rsvd = rsvd_bits(63, 63);
- if (!guest_cpuid_has_gbpages(vcpu))
+ if (!gbpages)
gbpages_bit_rsvd = rsvd_bits(7, 7);

/*
@@ -3591,7 +3592,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
if (guest_cpuid_is_amd(vcpu))
nonleaf_bit8_rsvd = rsvd_bits(8, 8);

- switch (context->root_level) {
+ switch (level) {
case PT32_ROOT_LEVEL:
/* no rsvd bits for 2 level 4K page table entries */
rsvd_check->rsvd_bits_mask[0][1] = 0;
@@ -3599,7 +3600,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
rsvd_check->rsvd_bits_mask[1][0] =
rsvd_check->rsvd_bits_mask[0][0];

- if (!is_pse(vcpu)) {
+ if (!pse) {
rsvd_check->rsvd_bits_mask[1][1] = 0;
break;
}
@@ -3650,6 +3651,15 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
}
}

+static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *context)
+{
+ __reset_rsvds_bits_mask(vcpu, &context->guest_rsvd_check,
+ cpuid_maxphyaddr(vcpu), context->root_level,
+ context->nx, guest_cpuid_has_gbpages(vcpu),
+ is_pse(vcpu));
+}
+
static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
struct kvm_mmu *context, bool execonly)
{
--
2.1.0

2015-08-04 11:05:07

by Xiao Guangrong

Subject: [PATCH 5/9] KVM: MMU: split reset_rsvds_bits_mask_ept

Since shadow ept page tables and Intel nested guest page tables have the
same format, split reset_rsvds_bits_mask_ept so that the logic can be
reused by later patches which check reserved bits on sptes.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 693d565..d11d212 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3660,11 +3660,10 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
is_pse(vcpu));
}

-static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
- struct kvm_mmu *context, bool execonly)
+static void
+__reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
+ int maxphyaddr, bool execonly)
{
- struct rsvd_bits_validate *rsvd_check = &context->guest_rsvd_check;
- int maxphyaddr = cpuid_maxphyaddr(vcpu);
int pte;

rsvd_check->rsvd_bits_mask[0][3] =
@@ -3693,6 +3692,13 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
}
}

+static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *context, bool execonly)
+{
+ __reset_rsvds_bits_mask_ept(&context->guest_rsvd_check,
+ cpuid_maxphyaddr(vcpu), execonly);
+}
+
static void update_permission_bitmask(struct kvm_vcpu *vcpu,
struct kvm_mmu *mmu, bool ept)
{
--
2.1.0

2015-08-04 11:07:36

by Xiao Guangrong

Subject: [PATCH 6/9] KVM: MMU: introduce the framework to check reserved bits on sptes

We have abstracted the data struct and functions used to check reserved
bits on guest page tables; now extend the logic to check reserved bits
on shadow page tables.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.c | 51 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/mmu.h | 3 +++
arch/x86/kvm/svm.c | 1 +
4 files changed, 56 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3e33c0d..8356259 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -294,6 +294,7 @@ struct kvm_mmu {

u64 *pae_root;
u64 *lm_root;
+ struct rsvd_bits_validate shadow_rsvd_check;
struct rsvd_bits_validate guest_rsvd_check;

/*
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d11d212..50fbc14 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3699,6 +3699,54 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
cpuid_maxphyaddr(vcpu), execonly);
}

+/*
+ * the page table on host is the shadow page table for the page
+ * table in guest or amd nested guest, its mmu features completely
+ * follow the features in guest.
+ */
+void
+reset_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+{
+ __reset_rsvds_bits_mask(vcpu, &context->shadow_rsvd_check,
+ boot_cpu_data.x86_phys_bits,
+ context->shadow_root_level, context->nx,
+ guest_cpuid_has_gbpages(vcpu), is_pse(vcpu));
+}
+EXPORT_SYMBOL_GPL(reset_shadow_rsvds_bits_mask);
+
+/*
+ * the direct page table on host, use as much mmu features as
+ * possible, however, kvm currently does not do execution-protection.
+ */
+static void
+reset_tdp_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *context)
+{
+ if (guest_cpuid_is_amd(vcpu))
+ __reset_rsvds_bits_mask(vcpu, &context->shadow_rsvd_check,
+ boot_cpu_data.x86_phys_bits,
+ context->shadow_root_level, false,
+ cpu_has_gbpages, true);
+ else
+ __reset_rsvds_bits_mask_ept(&context->shadow_rsvd_check,
+ boot_cpu_data.x86_phys_bits,
+ false);
+
+}
+
+/*
+ * as the comments in reset_shadow_rsvds_bits_mask() except it
+ * is the shadow page table for intel nested guest.
+ */
+static void
+reset_ept_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *context,
+ bool execonly)
+{
+ __reset_rsvds_bits_mask_ept(&context->shadow_rsvd_check,
+ boot_cpu_data.x86_phys_bits, execonly);
+}
+
static void update_permission_bitmask(struct kvm_vcpu *vcpu,
struct kvm_mmu *mmu, bool ept)
{
@@ -3877,6 +3925,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)

update_permission_bitmask(vcpu, context, false);
update_last_pte_bitmap(vcpu, context);
+ reset_tdp_shadow_rsvds_bits_mask(vcpu, context);
}

void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu)
@@ -3904,6 +3953,7 @@ void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu)
context->base_role.smap_andnot_wp
= smap && !is_write_protection(vcpu);
context->base_role.smm = is_smm(vcpu);
+ reset_shadow_rsvds_bits_mask(vcpu, context);
}
EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);

@@ -3927,6 +3977,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly)

update_permission_bitmask(vcpu, context, true);
reset_rsvds_bits_mask_ept(vcpu, context, execonly);
+ reset_ept_shadow_rsvds_bits_mask(vcpu, context, execonly);
}
EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 398d21c..96563be 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -53,6 +53,9 @@ static inline u64 rsvd_bits(int s, int e)
int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);

+void
+reset_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+
/*
* Return values of handle_mmio_page_fault_common:
* RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 568cd0f..1de4685 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2107,6 +2107,7 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
vcpu->arch.mmu.get_pdptr = nested_svm_get_tdp_pdptr;
vcpu->arch.mmu.inject_page_fault = nested_svm_inject_npf_exit;
vcpu->arch.mmu.shadow_root_level = get_npt_level();
+ reset_shadow_rsvds_bits_mask(vcpu, &vcpu->arch.mmu);
vcpu->arch.walk_mmu = &vcpu->arch.nested_mmu;
}

--
2.1.0

2015-08-04 11:05:17

by Xiao Guangrong

Subject: [PATCH 7/9] KVM: MMU: introduce is_shadow_rsvd_bits_set()

We now have the same data struct for checking reserved bits on guest
page tables and shadow page tables; split is_rsvd_bits_set() so that the
logic can be shared between these two paths.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 50fbc14..3f9ce29 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3268,6 +3268,25 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access, exception);
}

+static bool
+__is_rsvd_bits_set(struct rsvd_bits_validate *rsvd_check, u64 pte, int level)
+{
+ int bit7 = (pte >> 7) & 1, low6 = pte & 0x3f;
+
+ return (pte & rsvd_check->rsvd_bits_mask[bit7][level-1]) |
+ ((rsvd_check->bad_mt_xwr & (1ull << low6)) != 0);
+}
+
+static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
+{
+ return __is_rsvd_bits_set(&mmu->guest_rsvd_check, gpte, level);
+}
+
+static bool is_shadow_rsvd_bits_set(struct kvm_mmu *mmu, u64 spte, int level)
+{
+ return __is_rsvd_bits_set(&mmu->shadow_rsvd_check, spte, level);
+}
+
static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
{
if (direct)
@@ -3546,15 +3565,6 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gp
return mmu->last_pte_bitmap & (1 << index);
}

-static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
-{
- struct rsvd_bits_validate *rsvd_check = &mmu->guest_rsvd_check;
- int bit7 = (gpte >> 7) & 1, low6 = gpte & 0x3f;
-
- return (gpte & rsvd_check->rsvd_bits_mask[bit7][level-1]) |
- ((rsvd_check->bad_mt_xwr & (1ull << low6)) != 0);
-}
-
#define PTTYPE_EPT 18 /* arbitrary */
#define PTTYPE PTTYPE_EPT
#include "paging_tmpl.h"
--
2.1.0

2015-08-04 11:05:14

by Xiao Guangrong

Subject: [PATCH 8/9] KVM: MMU: fully check reserved bits for sptes

The #PF with PFEC.RSV = 1 is designed to speed up MMIO emulation;
however, it is possible that the RSV #PF is caused by a real BUG, i.e.
mis-configured shadow page table entries.

This patch enables a full check of the reserved bits on shadow page
table entries and dumps the shadow page table hierarchy if a real bug
is detected.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 41 +++++++++++++++++++++++++++++++++++------
1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3f9ce29..6b0e9c9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3295,31 +3295,60 @@ static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
return vcpu_match_mmio_gva(vcpu, addr);
}

-static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
+/* return true if reserved bit is detected on spte. */
+static bool
+walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
{
struct kvm_shadow_walk_iterator iterator;
- u64 spte = 0ull;
+ u64 sptes[PT64_ROOT_LEVEL], spte = 0ull;
+ int root = 0, leaf;
+ bool reserved = false;

if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
- return spte;
+ goto exit;

walk_shadow_page_lockless_begin(vcpu);
- for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
+ for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+ leaf = iterator.level;
+
+ if (!root)
+ root = leaf;
+
+ sptes[leaf - 1] = spte;
+
if (!is_shadow_present_pte(spte))
break;
+
+ reserved |= is_shadow_rsvd_bits_set(&vcpu->arch.mmu, spte,
+ leaf);
+ }
walk_shadow_page_lockless_end(vcpu);

- return spte;
+ if (reserved) {
+ pr_err("%s: detect reserved bits on spte, addr 0x%llx, dump hierarchy:\n",
+ __func__, addr);
+ while (root >= leaf) {
+ pr_err("------ spte 0x%llx level %d.\n",
+ sptes[root - 1], root);
+ root--;
+ }
+ }
+exit:
+ *sptep = spte;
+ return reserved;
}

int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
{
u64 spte;
+ bool reserved;

if (quickly_check_mmio_pf(vcpu, addr, direct))
return RET_MMIO_PF_EMULATE;

- spte = walk_shadow_page_get_mmio_spte(vcpu, addr);
+ reserved = walk_shadow_page_get_mmio_spte(vcpu, addr, &spte);
+ if (unlikely(reserved))
+ return RET_MMIO_PF_BUG;

if (is_mmio_spte(spte)) {
gfn_t gfn = get_mmio_spte_gfn(spte);
--
2.1.0

2015-08-04 11:05:59

by Xiao Guangrong

Subject: [PATCH 9/9] KVM: VMX: drop ept misconfig check

The logic used to check ept misconfig is completely contained in the
common reserved-bits check for sptes, so it can be removed.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 22 ----------------
arch/x86/kvm/mmu.h | 1 -
arch/x86/kvm/vmx.c | 74 ++----------------------------------------------------
3 files changed, 2 insertions(+), 95 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6b0e9c9..a15dc11 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4938,28 +4938,6 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
return nr_mmu_pages;
}

-int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
-{
- struct kvm_shadow_walk_iterator iterator;
- u64 spte;
- int nr_sptes = 0;
-
- if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
- return nr_sptes;
-
- walk_shadow_page_lockless_begin(vcpu);
- for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
- sptes[iterator.level-1] = spte;
- nr_sptes++;
- if (!is_shadow_present_pte(spte))
- break;
- }
- walk_shadow_page_lockless_end(vcpu);
-
- return nr_sptes;
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
-
void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
{
kvm_mmu_unload(vcpu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 96563be..67c4a03 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -50,7 +50,6 @@ static inline u64 rsvd_bits(int s, int e)
return ((1ULL << (e - s + 1)) - 1) << s;
}

-int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);

void
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 217f663..4cf25b9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5748,73 +5748,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
}

-static u64 ept_rsvd_mask(u64 spte, int level)
-{
- int i;
- u64 mask = 0;
-
- for (i = 51; i > boot_cpu_data.x86_phys_bits; i--)
- mask |= (1ULL << i);
-
- if (level == 4)
- /* bits 7:3 reserved */
- mask |= 0xf8;
- else if (spte & (1ULL << 7))
- /*
- * 1GB/2MB page, bits 29:12 or 20:12 reserved respectively,
- * level == 1 if the hypervisor is using the ignored bit 7.
- */
- mask |= (PAGE_SIZE << ((level - 1) * 9)) - PAGE_SIZE;
- else if (level > 1)
- /* bits 6:3 reserved */
- mask |= 0x78;
-
- return mask;
-}
-
-static void ept_misconfig_inspect_spte(struct kvm_vcpu *vcpu, u64 spte,
- int level)
-{
- printk(KERN_ERR "%s: spte 0x%llx level %d\n", __func__, spte, level);
-
- /* 010b (write-only) */
- WARN_ON((spte & 0x7) == 0x2);
-
- /* 110b (write/execute) */
- WARN_ON((spte & 0x7) == 0x6);
-
- /* 100b (execute-only) and value not supported by logical processor */
- if (!cpu_has_vmx_ept_execute_only())
- WARN_ON((spte & 0x7) == 0x4);
-
- /* not 000b */
- if ((spte & 0x7)) {
- u64 rsvd_bits = spte & ept_rsvd_mask(spte, level);
-
- if (rsvd_bits != 0) {
- printk(KERN_ERR "%s: rsvd_bits = 0x%llx\n",
- __func__, rsvd_bits);
- WARN_ON(1);
- }
-
- /* bits 5:3 are _not_ reserved for large page or leaf page */
- if ((rsvd_bits & 0x38) == 0) {
- u64 ept_mem_type = (spte & 0x38) >> 3;
-
- if (ept_mem_type == 2 || ept_mem_type == 3 ||
- ept_mem_type == 7) {
- printk(KERN_ERR "%s: ept_mem_type=0x%llx\n",
- __func__, ept_mem_type);
- WARN_ON(1);
- }
- }
- }
-}
-
static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
{
- u64 sptes[4];
- int nr_sptes, i, ret;
+ int ret;
gpa_t gpa;

gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
@@ -5835,13 +5771,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
return 1;

/* It is the real ept misconfig */
- printk(KERN_ERR "EPT: Misconfiguration.\n");
- printk(KERN_ERR "EPT: GPA: 0x%llx\n", gpa);
-
- nr_sptes = kvm_mmu_get_spte_hierarchy(vcpu, gpa, sptes);
-
- for (i = PT64_ROOT_LEVEL; i > PT64_ROOT_LEVEL - nr_sptes; --i)
- ept_misconfig_inspect_spte(vcpu, sptes[i-1], i);
+ WARN_ON(1);

vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_MISCONFIG;
--
2.1.0

2015-08-04 11:10:25

by Xiao Guangrong

Subject: Re: [PATCH 0/9] KVM: MMU: fix and improve validation of mmio page fault


CCed Pavel Shirshov <[email protected]>

Sorry, the git tool failed to CC the mail to the person tagged with
"Reported-by" and "Tested-by". :(

On 08/04/2015 06:59 PM, Xiao Guangrong wrote:
> The current code validating mmio #PF is buggy; the bug was spotted by
> Pavel Shirshov. The symptom is that qemu complained with "KVM: unknown
> exit, hardware reason 31" and KVM showed this info:
> [84245.284948] EPT: Misconfiguration.
> [84245.285056] EPT: GPA: 0xfeda848
> [84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4
> [84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3
> [84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2
> [84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1
>
> This is because we got a mmio #PF and the handler saw that the mmio spte
> had become a normal mapping (pointing to a ram page).
>
> However, this is actually valid after the introduction of fast mmio spte
> invalidation, which increases the generation number instead of zapping
> mmio sptes. An example is as follows:
> 1. QEMU drops the mmio region by adding a new memslot
> 2. invalidate all mmio sptes
> 3.
>
>         VCPU 0                          VCPU 1
>    access the invalid mmio spte
>                                         access the region that was
>                                         originally mmio; set the spte
>                                         to the normal ram mapping
>    mmio #PF
>    check the spte and see it has
>    become a normal ram mapping !!!
>
> The first patch simply fixes the bug by dropping the validation in the
> mmio handler, which makes it suitable for backporting.
>
> Later patches enable a full reserved-bits check for shadow page table
> entries. Since shadow page tables and guest page tables have the same
> format, these patches reuse the logic which checks reserved bits on
> guest ptes to check sptes.
>
> Xiao Guangrong (9):
> KVM: MMU: fix validation of mmio page fault
> KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c
> KVM: MMU: introduce rsvd_bits_validate
> KVM: MMU: split reset_rsvds_bits_mask
> KVM: MMU: split reset_rsvds_bits_mask_ept
> KVM: MMU: introduce the framework to check reserved bits on sptes
> KVM: MMU: introduce is_shadow_rsvd_bits_set()
> KVM: MMU: fully check reserved bits for sptes
> KVM: VMX: drop ept misconfig check
>
> arch/x86/include/asm/kvm_host.h | 9 +-
> arch/x86/kvm/mmu.c | 284 ++++++++++++++++++++++++----------------
> arch/x86/kvm/mmu.h | 4 +-
> arch/x86/kvm/paging_tmpl.h | 13 +-
> arch/x86/kvm/svm.c | 1 +
> arch/x86/kvm/vmx.c | 74 +----------
> arch/x86/kvm/x86.c | 3 +-
> 7 files changed, 187 insertions(+), 201 deletions(-)
>

2015-08-04 12:14:26

by Paolo Bonzini

Subject: Re: [PATCH 6/9] KVM: MMU: introduce the framework to check reserved bits on sptes



On 04/08/2015 12:59, Xiao Guangrong wrote:
> +/*
> + * the page table on host is the shadow page table for the page
> + * table in guest or amd nested guest, its mmu features completely
> + * follow the features in guest.
> + */
> +void
> +reset_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
> +{
> + __reset_rsvds_bits_mask(vcpu, &context->shadow_rsvd_check,
> + boot_cpu_data.x86_phys_bits,
> + context->shadow_root_level, context->nx,

This should be cpu_has_nx, I think.

> + guest_cpuid_has_gbpages(vcpu),

This should be cpu_has_gbpages.

> is_pse(vcpu));

This should be cpu_has_pse.

Paolo

> +}
> +EXPORT_SYMBOL_GPL(reset_shadow_rsvds_bits_mask);

2015-08-04 13:15:47

by Xiao Guangrong

Subject: Re: [PATCH 6/9] KVM: MMU: introduce the framework to check reserved bits on sptes



On 08/04/2015 08:14 PM, Paolo Bonzini wrote:
>
>
> On 04/08/2015 12:59, Xiao Guangrong wrote:
>> +/*
>> + * the page table on host is the shadow page table for the page
>> + * table in guest or amd nested guest, its mmu features completely
>> + * follow the features in guest.
>> + */
>> +void
>> +reset_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>> +{
>> + __reset_rsvds_bits_mask(vcpu, &context->shadow_rsvd_check,
>> + boot_cpu_data.x86_phys_bits,
>> + context->shadow_root_level, context->nx,
>
> This should be cpu_has_nx, I think.

cpu_has_nx() checks the feature on the host CPU; however, this is the
shadow page table, which completely follows the guest's features.

E.g. if the guest does not execution-protect the physical page, then
KVM does not do it either.

>
>> + guest_cpuid_has_gbpages(vcpu),
>
> This should be cpu_has_gbpages.

E.g. if the guest does not use the 1G page size, it is also not used in
the shadow page table.

>
>> is_pse(vcpu));
>
> This should be cpu_has_pse.

E.g. if the guest does not use the 4M page size, then KVM does not use
it either.

BTW, PSE only affects 32-bit page tables, which are not used for shadow
page tables (32-bit PAE and 64-bit long-mode formats are used for shadow
pages).

Only tdp follows the host CPU's features; KVM does not use NX to protect
the page, so I always mark it as false in
reset_tdp_shadow_rsvds_bits_mask().
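
For reference, the tdp path in patch 6 does exactly that (nx hard-coded
to false, pse to true, and only gbpages following the host CPU):

	__reset_rsvds_bits_mask(vcpu, &context->shadow_rsvd_check,
				boot_cpu_data.x86_phys_bits,
				context->shadow_root_level, false,
				cpu_has_gbpages, true);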

2015-08-04 13:23:07

by Paolo Bonzini

Subject: Re: [PATCH 6/9] KVM: MMU: introduce the framework to check reserved bits on sptes



On 04/08/2015 15:10, Xiao Guangrong wrote:
>>
>> This should be cpu_has_nx, I think.
>
> cpu_has_nx() checks the feature on host CPU, however, this is the shadow
> page table which completely follow guest's features.
>
> E.g, if guest does not execution-protect the physical page, then
> KVM does not do it either.

That's just true for current code. In principle you could add a memslot
flag for KVM_MEMSLOT_NO_EXECUTE, then NX would be true on an spte but
not on a PTE.

>>
>>> + guest_cpuid_has_gbpages(vcpu),
>>
>> This should be cpu_has_gbpages.
>
> E.g, if guest does not use 1G page size, it's also not used in shadow page
> table.

However, bit 7 in the shadow PDPTE is not reserved. If you're not
testing "is this bit reserved" but rather "should this bit be always
zero" in the SPTE, then checking guest_cpuid is okay. But in that case
shadow_rsvd_check is really more like shadow_always_zero_check.

>>
>>> is_pse(vcpu));
>>
>> This should be cpu_has_pse.
>
> E.g, guest does no use 4M page size, then KVM does not use it either.

Right, it should always be true, not cpu_has_pse, because PAE and 64-bit
page tables always support huge (2M) pages. Or as above, if you're
testing "should this bit be always zero" then it's a different story.

Paolo

> BTW, cpu_pse only hurts 32 bit page table which is not used by shadow
> page table (32 PAE and 64 Long mode are used in shadow page).
>
> Only tdp only follows host CPU's features, KVM does not use NX to
> protect the page, so i always mark it as false in
> reset_tdp_shadow_rsvds_bits_mask().

2015-08-04 13:40:17

by Xiao Guangrong

Subject: Re: [PATCH 6/9] KVM: MMU: introduce the framework to check reserved bits on sptes



On 08/04/2015 09:23 PM, Paolo Bonzini wrote:
>
>
> On 04/08/2015 15:10, Xiao Guangrong wrote:
>>>
>>> This should be cpu_has_nx, I think.
>>
>> cpu_has_nx() checks the feature on host CPU, however, this is the shadow
>> page table which completely follow guest's features.
>>
>> E.g, if guest does not execution-protect the physical page, then
>> KVM does not do it either.
>
> That's just true for current code. In principle you could add a memslot
> flag for KVM_MEMSLOT_NO_EXECUTE, then NX would be true on an spte but
> not on a PTE.

Yes, I agree. I would like to keep it as strict as possible to catch
potential bugs. We can relax it when KVM_MEMSLOT_NO_EXECUTE is
developed.

>
>>>
>>>> + guest_cpuid_has_gbpages(vcpu),
>>>
>>> This should be cpu_has_gbpages.
>>
>> E.g, if guest does not use 1G page size, it's also not used in shadow page
>> table.
>
> However, bit 7 in the shadow PDPTE is not reserved. If you're not
> testing "is this bit reserved" but rather "should this bit be always
> zero" in the SPTE, then checking guest_cpuid is okay. But in that case
> shadow_rsvd_check is really more like shadow_always_zero_check.

Yes, it is not reserved from the hardware's point of view.
shadow_always_zero_check() seems a more meaningful name; thanks for your
suggestion. :)

>
>>>
>>>> is_pse(vcpu));
>>>
>>> This should be cpu_has_pse.
>>
>> E.g, guest does no use 4M page size, then KVM does not use it either.
>
> Right, it should always be true, not cpu_has_pse, because PAE and 64-bit
> page tables always support huge (2M) pages. Or as above, if you're
> testing "should this bit be always zero" then it's a different story.

Yeah, I will rename the function.