2010-07-16 03:23:56

by Xiao Guangrong

Subject: [PATCH v2 1/6] KVM: MMU: fix forgotten reserved bits check in speculative path

In the speculative path, we should check the guest pte's reserved bits just as
the real processor does.

Reported-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 9 ++++++++-
arch/x86/kvm/paging_tmpl.h | 5 +++--
2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d16efbe..1a4b42e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2693,6 +2693,9 @@ static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
return;
}

+ if (is_rsvd_bits_set(vcpu, *(u64 *)new, PT_PAGE_TABLE_LEVEL))
+ return;
+
++vcpu->kvm->stat.mmu_pte_updated;
if (!sp->role.cr4_pae)
paging32_update_pte(vcpu, sp, spte, new);
@@ -2771,6 +2774,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
bool guest_initiated)
{
gfn_t gfn = gpa >> PAGE_SHIFT;
+ union kvm_mmu_page_role mask = { .word = 0 };
struct kvm_mmu_page *sp;
struct hlist_node *node;
LIST_HEAD(invalid_list);
@@ -2845,6 +2849,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
}
}

+ mask.cr0_wp = mask.cr4_pae = mask.nxe = 1;
for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn, node) {
pte_size = sp->role.cr4_pae ? 8 : 4;
misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
@@ -2892,7 +2897,9 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
while (npte--) {
entry = *spte;
mmu_pte_write_zap_pte(vcpu, sp, spte);
- if (gentry)
+ if (gentry &&
+ !((sp->role.word ^ vcpu->arch.mmu.base_role.word)
+ & mask.word))
mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
if (!remote_flush && need_remote_flush(entry, *spte))
remote_flush = true;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a09e04c..231fce1 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -638,8 +638,9 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
return -EINVAL;

gfn = gpte_to_gfn(gpte);
- if (gfn != sp->gfns[i] ||
- !is_present_gpte(gpte) || !(gpte & PT_ACCESSED_MASK)) {
+ if (is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL)
+ || gfn != sp->gfns[i] || !is_present_gpte(gpte)
+ || !(gpte & PT_ACCESSED_MASK)) {
u64 nonpresent;

if (is_present_gpte(gpte) || !clear_unsync)
--
1.6.1.2
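
As a rough illustration of the check this patch adds, here is a minimal
userspace sketch; pte_has_rsvd_bits_set() is a hypothetical stand-in for the
kernel's is_rsvd_bits_set(), assuming 4-level paging and modelling only the
physical-address bits above MAXPHYADDR plus the XD bit:

#include <stdint.h>
#include <stdbool.h>

/* Simplified: ignores large-page (PS) and PAT encodings. */
static bool pte_has_rsvd_bits_set(uint64_t pte, int maxphyaddr, bool nxe)
{
        uint64_t rsvd = 0;

        if (maxphyaddr < 52)    /* bits 51..maxphyaddr must be zero */
                rsvd |= ((1ULL << (52 - maxphyaddr)) - 1) << maxphyaddr;
        if (!nxe)               /* bit 63 (XD) is reserved when EFER.NXE = 0 */
                rsvd |= 1ULL << 63;

        return (pte & rsvd) != 0;
}

With this in place, a speculatively-updated gpte that sets a bit the real
processor would fault on is simply skipped instead of being shadowed.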


2010-07-16 03:27:11

by Xiao Guangrong

Subject: [PATCH v2 2/6] KVM: MMU: fix page accessed tracking lost if EPT is enabled

In the current code, if EPT is enabled (shadow_accessed_mask == 0), page
accessed tracking is lost.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1a4b42e..5937054 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -687,7 +687,7 @@ static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
if (!is_rmap_spte(old_spte))
return;
pfn = spte_to_pfn(old_spte);
- if (old_spte & shadow_accessed_mask)
+ if (!shadow_accessed_mask || old_spte & shadow_accessed_mask)
kvm_set_pfn_accessed(pfn);
if (is_writable_pte(old_spte))
kvm_set_pfn_dirty(pfn);
@@ -815,7 +815,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
kvm_set_pfn_dirty(spte_to_pfn(*spte));
old_spte = __xchg_spte(spte, new_spte);
if (is_shadow_present_pte(old_spte)
- && (old_spte & shadow_accessed_mask))
+ && (!shadow_accessed_mask ||
+ old_spte & shadow_accessed_mask))
mark_page_accessed(pfn_to_page(spte_to_pfn(old_spte)));
spte = rmap_next(kvm, rmapp, spte);
}
--
1.6.1.2
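
The heart of the fix, as a minimal sketch (spte_was_accessed() is a
hypothetical helper, not kernel code): when EPT gives the spte no accessed
bit to consult, the only safe answer is to assume the page was accessed:

#include <stdint.h>
#include <stdbool.h>

static bool spte_was_accessed(uint64_t spte, uint64_t shadow_accessed_mask)
{
        if (!shadow_accessed_mask)      /* EPT: spte has no A bit */
                return true;            /* be conservative, report accessed */
        return (spte & shadow_accessed_mask) != 0;
}

Reporting "accessed" too often only costs some LRU precision; wrongly
reporting "not accessed" can get a hot page reclaimed.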

2010-07-16 03:29:18

by Xiao Guangrong

Subject: [PATCH v2 3/6] KVM: MMU: fix page dirty tracking lost while syncing pages

In the sync-page path, if spte.writable is changed, page dirty tracking is
lost. For example:

assume spte.writable = 0 in an unsync page; when it is synced, the spte is
mapped writable (spte.writable = 1). Later the guest writes to spte.gfn, so
the page at spte.gfn is dirty. Then the guest changes the mapping back to
read-only, and after the next sync, spte.writable = 0 again.

So when the host releases the spte, it sees spte.writable = 0 and does not
mark the page dirty.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5937054..b3896bf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1981,6 +1981,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
mark_page_dirty(vcpu->kvm, gfn);

set_pte:
+ if (is_writable_pte(*sptep) && !is_writable_pte(spte))
+ kvm_set_pfn_dirty(pfn);
update_spte(sptep, spte);
done:
return ret;
@@ -1994,7 +1996,6 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
bool reset_host_protection)
{
int was_rmapped = 0;
- int was_writable = is_writable_pte(*sptep);
int rmap_count;

pgprintk("%s: spte %llx access %x write_fault %d"
@@ -2044,15 +2045,10 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
page_header_update_slot(vcpu->kvm, sptep, gfn);
if (!was_rmapped) {
rmap_count = rmap_add(vcpu, sptep, gfn);
- kvm_release_pfn_clean(pfn);
if (rmap_count > RMAP_RECYCLE_THRESHOLD)
rmap_recycle(vcpu, sptep, gfn);
- } else {
- if (was_writable)
- kvm_release_pfn_dirty(pfn);
- else
- kvm_release_pfn_clean(pfn);
}
+ kvm_release_pfn_clean(pfn);
if (speculative) {
vcpu->arch.last_pte_updated = sptep;
vcpu->arch.last_pte_gfn = gfn;
--
1.6.1.2
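
The invariant the patch establishes, shown as a simplified single-threaded
model (bit layout and helper names here are illustrative, not the kernel's):
when a writable spte is replaced by a non-writable one, the page may already
have been dirtied through the old mapping, so record that before the
information disappears:

#include <stdint.h>
#include <stdio.h>

#define SPTE_WRITABLE   (1ULL << 1)     /* x86 R/W bit */

/* Stand-in for kvm_set_pfn_dirty(). */
static void set_pfn_dirty(uint64_t pfn)
{
        printf("pfn 0x%llx marked dirty\n", (unsigned long long)pfn);
}

static void set_spte_model(uint64_t *sptep, uint64_t new_spte)
{
        /* writable -> read-only: the guest may have written through
         * the old spte, so mark the page dirty now */
        if ((*sptep & SPTE_WRITABLE) && !(new_spte & SPTE_WRITABLE))
                set_pfn_dirty(*sptep >> 12);    /* simplified pfn extraction */
        *sptep = new_spte;
}

Replayed against the changelog's scenario, the sync that flips spte.writable
from 1 back to 0 now marks the gfn dirty instead of silently dropping it.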


2010-07-16 03:31:13

by Xiao Guangrong

Subject: [PATCH v2 4/6] KVM: MMU: don't atomically set spte if it's not present

If the old mapping is not present, spte.a cannot be lost, so no atomic
operation is needed to set the new spte.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b3896bf..4123e8e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -307,9 +307,10 @@ static void update_spte(u64 *sptep, u64 new_spte)
{
u64 old_spte;

- if (!shadow_accessed_mask || (new_spte & shadow_accessed_mask)) {
+ if (!shadow_accessed_mask || (new_spte & shadow_accessed_mask) ||
+ !is_rmap_spte(*sptep))
__set_spte(sptep, new_spte);
- } else {
+ else {
old_spte = __xchg_spte(sptep, new_spte);
if (old_spte & shadow_accessed_mask)
mark_page_accessed(pfn_to_page(spte_to_pfn(old_spte)));
--
1.6.1.2
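
A sketch of the decision this patch encodes, as a hypothetical predicate
(userspace model, not the kernel code): the atomic exchange is only required
when the processor could set the accessed bit concurrently and a plain store
would lose it:

#include <stdint.h>
#include <stdbool.h>

static bool need_atomic_update(uint64_t old_spte, uint64_t new_spte,
                               uint64_t accessed_mask, bool old_present)
{
        if (!accessed_mask)             /* EPT: no A bit to lose */
                return false;
        if (new_spte & accessed_mask)   /* new value already carries A */
                return false;
        if (!old_present)               /* hardware never sets A in a
                                         * non-present mapping */
                return false;
        return true;                    /* a concurrently-set A bit would
                                         * be lost by a plain store */
}

The new !is_rmap_spte(*sptep) clause is exactly the old_present test: a
non-present old spte cannot have its accessed bit set behind our back.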

2010-07-16 03:32:10

by Xiao Guangrong

Subject: [PATCH v2 5/6] KVM: MMU: clean up spte set and accessed/dirty tracking

Introduce set_spte_track_bits() to clean up the current code.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 17 ++++++++---------
1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4123e8e..fb3ae54 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -679,7 +679,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
}
}

-static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
+static void set_spte_track_bits(u64 *sptep, u64 new_spte)
{
pfn_t pfn;
u64 old_spte;
@@ -692,6 +692,11 @@ static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
kvm_set_pfn_accessed(pfn);
if (is_writable_pte(old_spte))
kvm_set_pfn_dirty(pfn);
+}
+
+static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
+{
+ set_spte_track_bits(sptep, new_spte);
rmap_remove(kvm, sptep);
}

@@ -791,7 +796,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
unsigned long data)
{
int need_flush = 0;
- u64 *spte, new_spte, old_spte;
+ u64 *spte, new_spte;
pte_t *ptep = (pte_t *)data;
pfn_t new_pfn;

@@ -812,13 +817,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
new_spte &= ~PT_WRITABLE_MASK;
new_spte &= ~SPTE_HOST_WRITEABLE;
new_spte &= ~shadow_accessed_mask;
- if (is_writable_pte(*spte))
- kvm_set_pfn_dirty(spte_to_pfn(*spte));
- old_spte = __xchg_spte(spte, new_spte);
- if (is_shadow_present_pte(old_spte)
- && (!shadow_accessed_mask ||
- old_spte & shadow_accessed_mask))
- mark_page_accessed(pfn_to_page(spte_to_pfn(old_spte)));
+ set_spte_track_bits(spte, new_spte);
spte = rmap_next(kvm, rmapp, spte);
}
}
--
1.6.1.2
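
A self-contained model of the consolidated helper's contract (illustrative
bit layout and stand-in callbacks; the EPT special case from patch 2 and the
atomic store refined in patch 6 are elided here): replace the spte, then
propagate whatever accessed/dirty state the old mapping carried:

#include <stdint.h>
#include <stdio.h>

#define SPTE_PRESENT    (1ULL << 0)
#define SPTE_WRITABLE   (1ULL << 1)
#define SPTE_ACCESSED   (1ULL << 5)

static void mark_accessed(uint64_t pfn)
{
        printf("pfn 0x%llx accessed\n", (unsigned long long)pfn);
}

static void mark_dirty(uint64_t pfn)
{
        printf("pfn 0x%llx dirty\n", (unsigned long long)pfn);
}

static void set_spte_track_bits_model(uint64_t *sptep, uint64_t new_spte)
{
        uint64_t old_spte = *sptep;

        *sptep = new_spte;
        if (!(old_spte & SPTE_PRESENT))
                return;                         /* nothing to track */
        if (old_spte & SPTE_ACCESSED)
                mark_accessed(old_spte >> 12);
        if (old_spte & SPTE_WRITABLE)
                mark_dirty(old_spte >> 12);
}

With both drop_spte() and kvm_set_pte_rmapp() funnelled through one helper,
the accessed/dirty bookkeeping only has to be right in one place.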

2010-07-16 03:34:19

by Xiao Guangrong

Subject: [PATCH v2 6/6] KVM: MMU: use __xchg_spte more smartly

Sometimes setting the spte atomically is not needed; this patch calls
__xchg_spte() only when it is.

Note: if the old mapping's accessed bit is already set, no atomic operation
is needed, since the accessed bit cannot be lost.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fb3ae54..71e9268 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -682,9 +682,14 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
static void set_spte_track_bits(u64 *sptep, u64 new_spte)
{
pfn_t pfn;
- u64 old_spte;
+ u64 old_spte = *sptep;
+
+ if (!shadow_accessed_mask || !is_shadow_present_pte(old_spte) ||
+ old_spte & shadow_accessed_mask) {
+ __set_spte(sptep, new_spte);
+ } else
+ old_spte = __xchg_spte(sptep, new_spte);

- old_spte = __xchg_spte(sptep, new_spte);
if (!is_rmap_spte(old_spte))
return;
pfn = spte_to_pfn(old_spte);
--
1.6.1.2
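
Putting patches 4-6 together, a compilable userspace sketch of the final
store policy (__atomic_exchange_n stands in for the kernel's __xchg_spte();
the bit layout is illustrative): take the atomic path only when a
concurrently-set accessed bit could otherwise be lost:

#include <stdint.h>

#define SPTE_PRESENT    (1ULL << 0)
#define SPTE_ACCESSED   (1ULL << 5)

static uint64_t shadow_accessed_mask = SPTE_ACCESSED;   /* 0 would model EPT */

static uint64_t write_spte(uint64_t *sptep, uint64_t new_spte)
{
        uint64_t old_spte = *sptep;

        if (!shadow_accessed_mask ||            /* EPT: no A bit to lose  */
            !(old_spte & SPTE_PRESENT) ||       /* nothing mapped yet     */
            (old_spte & SPTE_ACCESSED))         /* A bit already captured */
                *sptep = new_spte;              /* plain store suffices   */
        else
                old_spte = __atomic_exchange_n(sptep, new_spte,
                                               __ATOMIC_SEQ_CST);

        return old_spte;    /* caller propagates accessed/dirty from this */
}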

2010-07-21 00:40:24

by Xiao Guangrong

Subject: Re: [PATCH v2 1/6] KVM: MMU: fix forgotten reserved bits check in speculative path



Xiao Guangrong wrote:
> In the speculative path, we should check the guest pte's reserved bits just as
> the real processor does.
>

Ping......?

2010-07-21 18:34:38

by Avi Kivity

Subject: Re: [PATCH v2 3/6] KVM: MMU: fix page dirty tracking lost while syncing pages

On 07/16/2010 06:25 AM, Xiao Guangrong wrote:
> In the sync-page path, if spte.writable is changed, page dirty tracking is
> lost. For example:
>
> assume spte.writable = 0 in an unsync page; when it is synced, the spte is
> mapped writable (spte.writable = 1). Later the guest writes to spte.gfn, so
> the page at spte.gfn is dirty. Then the guest changes the mapping back to
> read-only, and after the next sync, spte.writable = 0 again.
>
> So when the host releases the spte, it sees spte.writable = 0 and does not
> mark the page dirty.
>
>

Subtle, good catch.

> set_pte:
> + if (is_writable_pte(*sptep) && !is_writable_pte(spte))
> + kvm_set_pfn_dirty(pfn);
> update_spte(sptep, spte);
>

I think this has to be done after the tlb flush, otherwise we have

set_pfn_dirty
(some other cpu) write out page, mark as clean
(some other vcpu writes through stale tlb entry)
update_spte
tlb flush

but perhaps mmu notifiers protect us here: if the cleaner wants to write
out the page, it has to clear the dirty bit in the sptes as well, and that
will block on mmu_lock.

Later on we can use the dirty bit instead of the writable bit, except on
EPT. But let's start with your fix.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2010-07-21 18:39:32

by Avi Kivity

Subject: Re: [PATCH v2 1/6] KVM: MMU: fix forgotten reserved bits check in speculative path

On 07/21/2010 03:36 AM, Xiao Guangrong wrote:
>
> Xiao Guangrong wrote:
>
>> In the speculative path, we should check the guest pte's reserved bits just as
>> the real processor does.
>>
>>
> Ping......?
>

Sorry. All looks good. I think no change is needed to patch 3; if we
agree on this, I'll apply it as is.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2010-07-22 01:21:59

by Xiao Guangrong

Subject: Re: [PATCH v2 3/6] KVM: MMU: fix page dirty tracking lost while syncing pages



Avi Kivity wrote:

> Later on we can use the dirty bit instead of writeable bit, except on
> EPT. But let's start with your fix.
>

I'm doing it, will post it... :-)

2010-07-22 01:22:52

by Xiao Guangrong

Subject: Re: [PATCH v2 1/6] KVM: MMU: fix forgotten reserved bits check in speculative path



Avi Kivity wrote:
> On 07/21/2010 03:36 AM, Xiao Guangrong wrote:
>>
>> Xiao Guangrong wrote:
>>
>>> In the speculative path, we should check the guest pte's reserved bits
>>> just as the real processor does.
>>>
>>>
>> Ping......?
>>
>
> Sorry. All looks good. I think no change is needed to patch 3; if we
> agree on this, I'll apply it as is.

Yeah, I agree, thanks, Avi.

2010-07-25 08:47:09

by Avi Kivity

Subject: Re: [PATCH v2 1/6] KVM: MMU: fix forgotten reserved bits check in speculative path

On 07/22/2010 04:18 AM, Xiao Guangrong wrote:
>
> Avi Kivity wrote:
>> On 07/21/2010 03:36 AM, Xiao Guangrong wrote:
>>> Xiao Guangrong wrote:
>>>
>>>> In the speculative path, we should check the guest pte's reserved bits
>>>> just as the real processor does.
>>>>
>>>>
>>> Ping......?
>>>
>> Sorry. All looks good. I think no change is needed to patch 3; if we
>> agree on this, I'll apply it as is.
> Yeah, I agree, thanks, Avi.

Now applied.

--
error compiling committee.c: too many arguments to function