2015-11-06 07:07:20

by Takuya Yoshikawa

Subject: [PATCH 0/5] KVM: x86: MMU: Clean up x86's mmu code for future work

Patches 1/2/3 are easy ones.

The following two, patches 4/5, may not be ideal solutions, but they at
least explain, or try to explain, the problems.

Takuya Yoshikawa (5):
KVM: x86: MMU: Remove unused parameter of __direct_map()
KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
KVM: x86: MMU: Make mmu_set_spte() return emulate value
KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

Documentation/virtual/kvm/mmu.txt | 4 +-
arch/x86/kvm/mmu.c | 118 ++++++++++++++++++++------------------
arch/x86/kvm/mmu_audit.c | 2 +-
arch/x86/kvm/paging_tmpl.h | 10 ++--
4 files changed, 70 insertions(+), 64 deletions(-)

--
2.1.0


2015-11-06 07:08:22

by Takuya Yoshikawa

Subject: [PATCH 1/5] KVM: x86: MMU: Remove unused parameter of __direct_map()

Signed-off-by: Takuya Yoshikawa <[email protected]>
---
arch/x86/kvm/mmu.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7d85bca..a76bc04 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
__direct_pte_prefetch(vcpu, sp, sptep);
}

-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
- int map_writable, int level, gfn_t gfn, pfn_t pfn,
- bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+ int level, gfn_t gfn, pfn_t pfn, bool prefault)
{
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -3018,8 +3017,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
- r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
- prefault);
+ r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);


@@ -3541,8 +3539,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
- r = __direct_map(vcpu, gpa, write, map_writable,
- level, gfn, pfn, prefault);
+ r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);

return r;
--
2.1.0

2015-11-06 07:09:12

by Takuya Yoshikawa

Subject: [PATCH 2/5] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap

Both __mmu_unsync_walk() and mmu_pages_clear_parents() have a three-line
block of code that clears a bit in the unsync child bitmap; the former
places it inside a loop and uses a few goto statements to jump to it.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.

Signed-off-by: Takuya Yoshikawa <[email protected]>
---
arch/x86/kvm/mmu.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a76bc04..a9622a2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
return (pvec->nr == KVM_PAGE_ARRAY_NR);
}

+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+ --sp->unsync_children;
+ WARN_ON((int)sp->unsync_children < 0);
+ __clear_bit(idx, sp->unsync_child_bitmap);
+}
+
static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
struct kvm_mmu_pages *pvec)
{
@@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
struct kvm_mmu_page *child;
u64 ent = sp->spt[i];

- if (!is_shadow_present_pte(ent) || is_large_pte(ent))
- goto clear_child_bitmap;
+ if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+ clear_unsync_child_bit(sp, i);
+ continue;
+ }

child = page_header(ent & PT64_BASE_ADDR_MASK);

@@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
return -ENOSPC;

ret = __mmu_unsync_walk(child, pvec);
- if (!ret)
- goto clear_child_bitmap;
- else if (ret > 0)
+ if (!ret) {
+ clear_unsync_child_bit(sp, i);
+ continue;
+ } else if (ret > 0) {
nr_unsync_leaf += ret;
- else
+ } else
return ret;
} else if (child->unsync) {
nr_unsync_leaf++;
if (mmu_pages_add(pvec, child, i))
return -ENOSPC;
} else
- goto clear_child_bitmap;
-
- continue;
-
-clear_child_bitmap:
- __clear_bit(i, sp->unsync_child_bitmap);
- sp->unsync_children--;
- WARN_ON((int)sp->unsync_children < 0);
+ clear_unsync_child_bit(sp, i);
}

-
return nr_unsync_leaf;
}

@@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
if (!sp)
return;

- --sp->unsync_children;
- WARN_ON((int)sp->unsync_children < 0);
- __clear_bit(idx, sp->unsync_child_bitmap);
+ clear_unsync_child_bit(sp, idx);
level++;
} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
}
--
2.1.0

2015-11-06 07:10:12

by Takuya Yoshikawa

Subject: [PATCH 3/5] KVM: x86: MMU: Make mmu_set_spte() return emulate value

mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero. In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing the emulate pointer and make mmu_set_spte() return the
emulate value instead to clean up this complex interface. Prefetch
functions can just throw away the return value.

Signed-off-by: Takuya Yoshikawa <[email protected]>
---
arch/x86/kvm/mmu.c | 27 ++++++++++++++-------------
arch/x86/kvm/paging_tmpl.h | 10 +++++-----
2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a9622a2..69e7d20 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
return ret;
}

-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
- unsigned pte_access, int write_fault, int *emulate,
- int level, gfn_t gfn, pfn_t pfn, bool speculative,
- bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
+ int write_fault, int level, gfn_t gfn, pfn_t pfn,
+ bool speculative, bool host_writable)
{
int was_rmapped = 0;
int rmap_count;
+ bool emulate = false;

pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
*sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
true, host_writable)) {
if (write_fault)
- *emulate = 1;
+ emulate = true;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}

- if (unlikely(is_mmio_spte(*sptep) && emulate))
- *emulate = 1;
+ if (unlikely(is_mmio_spte(*sptep)))
+ emulate = true;

pgprintk("%s: setting spte %llx\n", __func__, *sptep);
pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
}

kvm_release_pfn_clean(pfn);
+
+ return emulate;
}

static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;

for (i = 0; i < ret; i++, gfn++, start++)
- mmu_set_spte(vcpu, start, access, 0, NULL,
- sp->role.level, gfn, page_to_pfn(pages[i]),
- true, true);
+ mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+ page_to_pfn(pages[i]), true, true);

return 0;
}
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,

for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
if (iterator.level == level) {
- mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
- write, &emulate, level, gfn, pfn,
- prefault, map_writable);
+ emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+ write, level, gfn, pfn, prefault,
+ map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b41faa9..de24499 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* we call mmu_set_spte() with host_writable = true because
* pte_prefetch_gfn_to_pfn always gets a writable pfn.
*/
- mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
- gfn, pfn, true, true);
+ mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+ true, true);

return true;
}
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned direct_access, access = gw->pt_access;
- int top_level, emulate = 0;
+ int top_level, emulate;

direct_access = gw->pte_access;

@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
}

clear_sp_write_flooding_count(it.sptep);
- mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
- it.level, gw->gfn, pfn, prefault, map_writable);
+ emulate = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
+ it.level, gw->gfn, pfn, prefault, map_writable);
FNAME(pte_prefetch)(vcpu, gw, it.sptep);

return emulate;
--
2.1.0

2015-11-06 07:11:07

by Takuya Yoshikawa

Subject: [PATCH 4/5] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()

is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e5374110444
("[PATCH] KVM: MMU: Implement simple reverse mapping"). At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb902054195b ("KVM: Allow not-present guest page faults to
bypass kvm") to support the bypass_guest_pf optimization, which no longer
exists.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.

Since using both of them with no clear distinction just makes the code
confusing, remove is_rmap_spte().

Signed-off-by: Takuya Yoshikawa <[email protected]>
---
arch/x86/kvm/mmu.c | 13 ++++---------
arch/x86/kvm/mmu_audit.c | 2 +-
2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 69e7d20..c5e2363 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
return pte & PT_PAGE_SIZE_MASK;
}

-static int is_rmap_spte(u64 pte)
-{
- return is_shadow_present_pte(pte);
-}
-
static int is_last_spte(u64 pte, int level)
{
if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
u64 old_spte = *sptep;
bool ret = false;

- WARN_ON(!is_rmap_spte(new_spte));
+ WARN_ON(!is_shadow_present_pte(new_spte));

if (!is_shadow_present_pte(old_spte)) {
mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
else
old_spte = __update_clear_spte_slow(sptep, 0ull);

- if (!is_rmap_spte(old_spte))
+ if (!is_shadow_present_pte(old_spte))
return 0;

pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
*sptep, write_fault, gfn);

- if (is_rmap_spte(*sptep)) {
+ if (is_shadow_present_pte(*sptep)) {
/*
* If we overwrite a PTE page pointer with a 2MB PMD, unlink
* the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
* If the mapping has been changed, let the vcpu fault on the
* same address again.
*/
- if (!is_rmap_spte(spte)) {
+ if (!is_shadow_present_pte(spte)) {
ret = true;
goto exit;
}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 03d518e..90ee420 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
return;

for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
- if (!is_rmap_spte(sp->spt[i]))
+ if (!is_shadow_present_pte(sp->spt[i]))
continue;

inspect_spte_has_rmap(kvm, sp->spt + i);
--
2.1.0

2015-11-06 07:12:03

by Takuya Yoshikawa

Subject: [PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which should not
be found in the reverse-mapping list.

Move this check into rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, are checked the same
way. In addition, change the BUG_ON to a WARN_ON since killing the
whole host is the last thing that KVM should try.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes. The change will not break it
because parent sptes are present, at least until drop_parent_pte()
actually unlinks them, and are not mmio sptes.

Signed-off-by: Takuya Yoshikawa <[email protected]>
---
Documentation/virtual/kvm/mmu.txt | 4 ++--
arch/x86/kvm/mmu.c | 31 ++++++++++++++++++++++---------
2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
page cannot be destroyed. See role.invalid.
parent_ptes:
The reverse mapping for the pte/ptes pointing at this page's spt. If
- parent_ptes bit 0 is zero, only one spte points at this pages and
+ parent_ptes bit 0 is zero, only one spte points at this page and
parent_ptes points at this single spte, otherwise, there exists multiple
sptes pointing at this page and (parent_ptes & ~0x1) points at a data
- structure with a list of parent_ptes.
+ structure with a list of parent sptes.
unsync:
If true, then the translations in this page may not match the guest's
translation. This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c5e2363..353d752 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1099,17 +1099,28 @@ struct rmap_iterator {
*/
static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
{
+ u64 *sptep;
+
if (!rmap)
return NULL;

if (!(rmap & 1)) {
iter->desc = NULL;
- return (u64 *)rmap;
+ sptep = (u64 *)rmap;
+ goto out;
}

iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
iter->pos = 0;
- return iter->desc->sptes[iter->pos];
+ sptep = iter->desc->sptes[iter->pos];
+out:
+ /*
+ * Parent sptes found in sp->parent_ptes lists are also checked here
+ * since kvm_mmu_unlink_parents() uses this function. If the condition
+ * needs to be changed for them, make another wrapper function.
+ */
+ WARN_ON(!is_shadow_present_pte(*sptep));
+ return sptep;
}

/*
@@ -1119,14 +1130,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
*/
static u64 *rmap_get_next(struct rmap_iterator *iter)
{
+ u64 *sptep;
+
if (iter->desc) {
if (iter->pos < PTE_LIST_EXT - 1) {
- u64 *sptep;
-
++iter->pos;
sptep = iter->desc->sptes[iter->pos];
if (sptep)
- return sptep;
+ goto out;
}

iter->desc = iter->desc->more;
@@ -1134,17 +1145,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
if (iter->desc) {
iter->pos = 0;
/* desc->sptes[0] cannot be NULL */
- return iter->desc->sptes[iter->pos];
+ sptep = iter->desc->sptes[iter->pos];
+ goto out;
}
}

return NULL;
+out:
+ WARN_ON(!is_shadow_present_pte(*sptep));
+ return sptep;
}

#define for_each_rmap_spte(_rmap_, _iter_, _spte_) \
for (_spte_ = rmap_get_first(*_rmap_, _iter_); \
- _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;}); \
- _spte_ = rmap_get_next(_iter_))
+ _spte_; _spte_ = rmap_get_next(_iter_))

static void drop_spte(struct kvm *kvm, u64 *sptep)
{
@@ -1358,7 +1372,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
bool flush = false;

while ((sptep = rmap_get_first(*rmapp, &iter))) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);

drop_spte(kvm, sptep);
--
2.1.0

2015-11-09 10:14:51

by Paolo Bonzini

Subject: Re: [PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

On 06/11/2015 08:25, Takuya Yoshikawa wrote:
> At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
> placed right after the call to detect unrelated sptes which should not
> be found in the reverse-mapping list.
>
> Move this check into rmap_get_first/next() so that all call sites, not
> just the users of the for_each_rmap_spte() macro, are checked the same
> way. In addition, change the BUG_ON to a WARN_ON since killing the
> whole host is the last thing that KVM should try.
>
> One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
> rmap_get_first() to handle parent sptes. The change will not break it
> because parent sptes are present, at least until drop_parent_pte()
> actually unlinks them, and are not mmio sptes.

Can you also change kvm_mmu_mark_parents_unsync to use
for_each_rmap_spte instead of pte_list_walk? It is the last use of
pte_list_walk, and it's nice if we have two uses of for_each_rmap_spte
with parent_ptes as the argument.
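
Roughly something like this, I would guess (untested sketch only; it assumes
the for_each_rmap_spte() macro as modified by this series and reuses whatever
callback pte_list_walk currently takes, mark_unsync() if I remember
correctly):

static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
{
	u64 *sptep;
	struct rmap_iterator iter;

	/*
	 * Walk the parent_ptes chain with the rmap iterator instead of
	 * pte_list_walk(), calling mark_unsync() on each parent spte.
	 */
	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep)
		mark_unsync(sptep);
}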

BTW, on my todo list is to change the rmap items to a struct (with a
single u64 inside) for type safety. Since you are touching this code,
perhaps you can give it a shot?

Paolo


2015-11-09 10:16:49

by Paolo Bonzini

Subject: Re: [PATCH 0/5] KVM: x86: MMU: Clean up x86's mmu code for future work



On 06/11/2015 08:20, Takuya Yoshikawa wrote:
> Patches 1/2/3 are easy ones.
>
> The following two, patches 4/5, may not be ideal solutions, but they at
> least explain, or try to explain, the problems.

They are okay! I replied to patch 5 with a suggestion for further
cleanup. I'll apply them for 4.5.

Paolo

2015-11-10 09:04:29

by Takuya Yoshikawa

Subject: Re: [PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

On 2015/11/09 19:14, Paolo Bonzini wrote:
> Can you also change kvm_mmu_mark_parents_unsync to use
> for_each_rmap_spte instead of pte_list_walk? It is the last use of
> pte_list_walk, and it's nice if we have two uses of for_each_rmap_spte
> with parent_ptes as the argument.

No problem, I will do.

Since parent_ptes is also explained as the "reverse mapping" list of
parent sptes (in mmu.txt and kvm_host.h), using rmap helpers will not
look so strange.

> BTW, on my todo list is to change the rmap items to a struct (with a
> single u64 inside) for type safety. Since you are touching this code,
> perhaps you can give it a shot?

Yes, almost done here (assuming that you mean 'unsigned long').
But I have some candidates for its name in mind:

1. struct kvm_rmap { unsigned long val; };
2. struct kvm_rmap_head { unsigned long val; };
3. struct kvm_rmap_list_head { unsigned long val; };
4. struct kvm_spte_list_head { unsigned long val; };

Since this is the head of the reverse-mapping list of sptes, I thought
name 3 might be the best and first made a patch with it, but the name
turned out to be a bit longer than I had hoped.

I have changed it to name 2, and it looks a bit nicer now, but an even
shorter one, e.g. name 1, may be good as well.

Do you have any preference?

Takuya

2015-11-10 09:18:52

by Paolo Bonzini

Subject: Re: [PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes



On 10/11/2015 10:05, Takuya Yoshikawa wrote:
>
>
>> BTW, on my todo list is to change the rmap items to a struct (with a
>> single u64 inside) for type safety. Since you are touching this code,
>> perhaps you can give it a shot?
>
> Yes, almost done here (assuming that you mean 'unsigned long').

Yes, thanks!

> But I have some candidates for its name in mind:
>
> 1. struct kvm_rmap { unsigned long val; };
> 2. struct kvm_rmap_head { unsigned long val; };
> 3. struct kvm_rmap_list_head { unsigned long val; };
> 4. struct kvm_spte_list_head { unsigned long val; };
>
> Since this is the head of the reverse-mapping list of sptes, I thought
> name 3 might be the best and first made a patch with it, but the name
> turned out to be a bit longer than I had hoped.
>
> I have changed it to name 2, and it looks a bit nicer now, but an even
> shorter one, e.g. name 1, may be good as well.
>
> Do you have any preference?

I like kvm_rmap_head.
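
Something along these lines, then (just sketching the direction, untested):

struct kvm_rmap_head {
	unsigned long val;
};

/* e.g. the iterator helpers would then take the head explicitly */
static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
			   struct rmap_iterator *iter);

That way a bare unsigned long can no longer be passed where an rmap head
is expected, and the compiler will catch any place that still mixes them
up.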

Paolo