2015-11-30 18:32:31

by Xiao Guangrong

Subject: [PATCH 00/11] KVM: x86: track guest page access

This patchset introduces a feature that allows us to track page
access in the guest. Currently, only write access tracking is implemented
in this version.

Four APIs are introduced (a short usage sketch follows the list):
- kvm_page_track_add_page(kvm, gfn, mode): the single guest page @gfn is
added to the tracking pool of the guest instance represented by @kvm;
@mode specifies which kind of access to @gfn is tracked

- kvm_page_track_remove_page(kvm, gfn, mode): the inverse operation of
kvm_page_track_add_page(), which removes @gfn from the tracking pool.
The gfn is no longer tracked after its last user is gone

- kvm_page_track_register_notifier(kvm, n): registers a notifier so that
events triggered by page tracking are received; at that point the
n->track_write() callback is invoked

- kvm_page_track_unregister_notifier(kvm, n): the inverse operation of
kvm_page_track_register_notifier(), which unlinks the notifier and
stops delivery of tracking events

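A minimal sketch of the add/remove pair (illustrative caller only, not part
of the series; @kvm and @gfn are assumed to be valid):

	/* start write tracking for one guest page */
	kvm_page_track_add_page(kvm, gfn, KVM_PAGE_TRACK_WRITE);

	/* ... @gfn is now write protected; guest writes are emulated ... */

	/* stop tracking once this user no longer needs it */
	kvm_page_track_remove_page(kvm, gfn, KVM_PAGE_TRACK_WRITE);
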
The first user of page tracking is non-leaf shadow page tables, as they are
always write protected. This also brings a performance improvement because
page tracking speeds up the page fault handler for tracked pages. The
kernel-build results are as follows:

        before     after
real    461.63     455.48
user    4529.55    4557.88
sys     1995.39    1922.57

Furthermore, it is the infrastructure for other kinds of shadow page tables,
such as the GPU shadow page table introduced by KVMGT (1) and native nested
IOMMU.

This patchset can be divided into two parts:
- patches 1 ~ 7 implement page tracking
- the remaining patches apply page tracking to non-leaf shadow page tables

(1): http://lkml.iu.edu/hypermail/linux/kernel/1510.3/01562.html

Xiao Guangrong (11):
KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed
KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage
KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect
KVM: page track: add the framework of guest page tracking
KVM: page track: introduce kvm_page_track_{add,remove}_page
KVM: MMU: let page fault handler be aware tracked page
KVM: page track: add notifier support
KVM: MMU: use page track for non-leaf shadow pages
KVM: MMU: simplify mmu_need_write_protect
KVM: MMU: clear write-flooding on the fast path of tracked page
KVM: MMU: apply page track notifier

Documentation/virtual/kvm/mmu.txt | 6 +-
arch/x86/include/asm/kvm_host.h | 12 +-
arch/x86/include/asm/kvm_page_track.h | 67 +++++++++
arch/x86/kvm/Makefile | 3 +-
arch/x86/kvm/mmu.c | 199 +++++++++++++++++++--------
arch/x86/kvm/mmu.h | 5 +
arch/x86/kvm/page_track.c | 252 ++++++++++++++++++++++++++++++++++
arch/x86/kvm/paging_tmpl.h | 5 +
arch/x86/kvm/x86.c | 27 ++--
9 files changed, 504 insertions(+), 72 deletions(-)
create mode 100644 arch/x86/include/asm/kvm_page_track.h
create mode 100644 arch/x86/kvm/page_track.c

--
1.8.3.1


2015-11-30 18:36:34

by Xiao Guangrong

Subject: [PATCH 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed

kvm_lpage_info->write_count is used to detect whether a large page mapping
for the gfn on the specified level is allowed; rename it to disallow_lpage
to reflect its purpose. Also rename has_wrprotected_page() to
mmu_gfn_lpage_is_disallowed() to make the code clearer.

Later we will extend this mechanism for page tracking: if the gfn is tracked,
then large page mappings for that gfn on any level are not allowed. The new
name is more straightforward.

Signed-off-by: Xiao Guangrong <[email protected]>
---
Documentation/virtual/kvm/mmu.txt | 6 +++---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 25 +++++++++++++------------
arch/x86/kvm/x86.c | 14 ++++++++------
4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index daf9c0f..dda2e93 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -391,11 +391,11 @@ To instantiate a large spte, four constraints must be satisfied:
write-protected pages
- the guest page must be wholly contained by a single memory slot

-To check the last two conditions, the mmu maintains a ->write_count set of
+To check the last two conditions, the mmu maintains a ->disallow_lpage set of
arrays for each memory slot and large page size. Every write protected page
-causes its write_count to be incremented, thus preventing instantiation of
+causes its disallow_lpage to be incremented, thus preventing instantiation of
a large spte. The frames at the end of an unaligned memory slot have
-artificially inflated ->write_counts so they can never be instantiated.
+artificially inflated ->disallow_lpages so they can never be instantiated.

Zapping all pages (page generation count)
=========================================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8140077..5aa2dcc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -606,7 +606,7 @@ struct kvm_vcpu_arch {
};

struct kvm_lpage_info {
- int write_count;
+ int disallow_lpage;
};

struct kvm_arch_memory_slot {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a1a3d19..61259ff 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -789,7 +789,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
slot = __gfn_to_memslot(slots, gfn);
for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
linfo = lpage_info_slot(gfn, slot, i);
- linfo->write_count += 1;
+ linfo->disallow_lpage += 1;
}
kvm->arch.indirect_shadow_pages++;
}
@@ -807,31 +807,32 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
slot = __gfn_to_memslot(slots, gfn);
for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
linfo = lpage_info_slot(gfn, slot, i);
- linfo->write_count -= 1;
- WARN_ON(linfo->write_count < 0);
+ linfo->disallow_lpage -= 1;
+ WARN_ON(linfo->disallow_lpage < 0);
}
kvm->arch.indirect_shadow_pages--;
}

-static int __has_wrprotected_page(gfn_t gfn, int level,
- struct kvm_memory_slot *slot)
+static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
+ struct kvm_memory_slot *slot)
{
struct kvm_lpage_info *linfo;

if (slot) {
linfo = lpage_info_slot(gfn, slot, level);
- return linfo->write_count;
+ return !!linfo->disallow_lpage;
}

- return 1;
+ return true;
}

-static int has_wrprotected_page(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
+static bool mmu_gfn_lpage_is_disallowed(struct kvm_vcpu *vcpu, gfn_t gfn,
+ int level)
{
struct kvm_memory_slot *slot;

slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
- return __has_wrprotected_page(gfn, level, slot);
+ return __mmu_gfn_lpage_is_disallowed(gfn, level, slot);
}

static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
@@ -897,7 +898,7 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
max_level = min(kvm_x86_ops->get_lpage_level(), host_level);

for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
- if (__has_wrprotected_page(large_gfn, level, slot))
+ if (__mmu_gfn_lpage_is_disallowed(large_gfn, level, slot))
break;

return level - 1;
@@ -2511,7 +2512,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
* be fixed if guest refault.
*/
if (level > PT_PAGE_TABLE_LEVEL &&
- has_wrprotected_page(vcpu, gfn, level))
+ mmu_gfn_lpage_is_disallowed(vcpu, gfn, level))
goto done;

spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
@@ -2775,7 +2776,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
level == PT_PAGE_TABLE_LEVEL &&
PageTransCompound(pfn_to_page(pfn)) &&
- !has_wrprotected_page(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
+ !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
unsigned long mask;
/*
* mmu_notifier_retry was successful and we hold the
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f1d6501..c04987e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7846,6 +7846,7 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
int i;

for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+ struct kvm_lpage_info *linfo;
unsigned long ugfn;
int lpages;
int level = i + 1;
@@ -7860,15 +7861,16 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
if (i == 0)
continue;

- slot->arch.lpage_info[i - 1] = kvm_kvzalloc(lpages *
- sizeof(*slot->arch.lpage_info[i - 1]));
- if (!slot->arch.lpage_info[i - 1])
+ linfo = kvm_kvzalloc(lpages * sizeof(*linfo));
+ if (!linfo)
goto out_free;

+ slot->arch.lpage_info[i - 1] = linfo;
+
if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
- slot->arch.lpage_info[i - 1][0].write_count = 1;
+ linfo[0].disallow_lpage = 1;
if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1))
- slot->arch.lpage_info[i - 1][lpages - 1].write_count = 1;
+ linfo[lpages - 1].disallow_lpage = 1;
ugfn = slot->userspace_addr >> PAGE_SHIFT;
/*
* If the gfn and userspace address are not aligned wrt each
@@ -7880,7 +7882,7 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
unsigned long j;

for (j = 0; j < lpages; ++j)
- slot->arch.lpage_info[i - 1][j].write_count = 1;
+ linfo[j].disallow_lpage = 1;
}
}

--
1.8.3.1

2015-11-30 18:32:33

by Xiao Guangrong

Subject: [PATCH 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage

Abstract the common operations out of account_shadowed() and
unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage()
and kvm_mmu_gfn_allow_lpage().

These two functions will be used by page tracking in a later patch.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 38 +++++++++++++++++++++++++-------------
arch/x86/kvm/mmu.h | 3 +++
2 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 61259ff..4b04d13 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -776,21 +776,39 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
return &slot->arch.lpage_info[level - 2][idx];
}

+static void update_gfn_disallow_lpage_count(struct kvm_memory_slot *slot,
+ gfn_t gfn, int count)
+{
+ struct kvm_lpage_info *linfo;
+ int i;
+
+ for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
+ linfo = lpage_info_slot(gfn, slot, i);
+ linfo->disallow_lpage += count;
+ WARN_ON(linfo->disallow_lpage < 0);
+ }
+}
+
+void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ update_gfn_disallow_lpage_count(slot, gfn, 1);
+}
+
+void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ update_gfn_disallow_lpage_count(slot, gfn, -1);
+}
+
static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *slot;
- struct kvm_lpage_info *linfo;
gfn_t gfn;
- int i;

gfn = sp->gfn;
slots = kvm_memslots_for_spte_role(kvm, sp->role);
slot = __gfn_to_memslot(slots, gfn);
- for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
- linfo = lpage_info_slot(gfn, slot, i);
- linfo->disallow_lpage += 1;
- }
+ kvm_mmu_gfn_disallow_lpage(slot, gfn);
kvm->arch.indirect_shadow_pages++;
}

@@ -798,18 +816,12 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *slot;
- struct kvm_lpage_info *linfo;
gfn_t gfn;
- int i;

gfn = sp->gfn;
slots = kvm_memslots_for_spte_role(kvm, sp->role);
slot = __gfn_to_memslot(slots, gfn);
- for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
- linfo = lpage_info_slot(gfn, slot, i);
- linfo->disallow_lpage -= 1;
- WARN_ON(linfo->disallow_lpage < 0);
- }
+ kvm_mmu_gfn_allow_lpage(slot, gfn);
kvm->arch.indirect_shadow_pages--;
}

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 55ffb7b..de92bed 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -174,4 +174,7 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,

void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
+void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
#endif
--
1.8.3.1

2015-11-30 18:35:53

by Xiao Guangrong

Subject: [PATCH 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect

Split rmap_write_protect() and introduce a function that abstracts the write
protection based on the slot.

This function will be used in a later patch.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 16 +++++++++++-----
arch/x86/kvm/mmu.h | 2 ++
2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4b04d13..39809b8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1336,23 +1336,29 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
}

-static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
+bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
+ struct kvm_memory_slot *slot, u64 gfn)
{
- struct kvm_memory_slot *slot;
struct kvm_rmap_head *rmap_head;
int i;
bool write_protected = false;

- slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-
for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
rmap_head = __gfn_to_rmap(gfn, i, slot);
- write_protected |= __rmap_write_protect(vcpu->kvm, rmap_head, true);
+ write_protected |= __rmap_write_protect(kvm, rmap_head, true);
}

return write_protected;
}

+static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
+{
+ struct kvm_memory_slot *slot;
+
+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+ return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn);
+}
+
static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
{
u64 *sptep;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index de92bed..58fe98a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -177,4 +177,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);

void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
+ struct kvm_memory_slot *slot, u64 gfn);
#endif
--
1.8.3.1

2015-11-30 18:32:36

by Xiao Guangrong

Subject: [PATCH 04/11] KVM: page track: add the framework of guest page tracking

The array gfn_track[mode][gfn] is introduced into the memory slot for every
guest page; it is the tracking count for the guest page in the different
modes. If the page is tracked, the count is incremented; the page is no
longer tracked once the count reaches zero.

Two functions, kvm_page_track_create_memslot() and
kvm_page_track_free_memslot(), are implemented in this patch; they are
used internally to initialize and reclaim the memory of the array.

Currently, only the write track mode is supported.

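Conceptually, each slot keeps one counter per tracking mode per 4K guest
page. A sketch of the lookup (illustrative only; the real accessors arrive
in later patches of this series):

	int index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
	/* a non-zero count means the page is tracked in that mode */
	bool tracked = slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE][index] > 0;
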
Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/include/asm/kvm_page_track.h | 13 ++++++++
arch/x86/kvm/Makefile | 3 +-
arch/x86/kvm/page_track.c | 58 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 5 +++
5 files changed, 80 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/include/asm/kvm_page_track.h
create mode 100644 arch/x86/kvm/page_track.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5aa2dcc..afff1f1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -32,6 +32,7 @@
#include <asm/mtrr.h>
#include <asm/msr-index.h>
#include <asm/asm.h>
+#include <asm/kvm_page_track.h>

#define KVM_MAX_VCPUS 255
#define KVM_SOFT_MAX_VCPUS 160
@@ -612,6 +613,7 @@ struct kvm_lpage_info {
struct kvm_arch_memory_slot {
struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
+ int *gfn_track[KVM_PAGE_TRACK_MAX];
};

/*
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
new file mode 100644
index 0000000..347d5c9
--- /dev/null
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_X86_KVM_PAGE_TRACK_H
+#define _ASM_X86_KVM_PAGE_TRACK_H
+
+enum kvm_page_track_mode {
+ KVM_PAGE_TRACK_WRITE,
+ KVM_PAGE_TRACK_MAX,
+};
+
+int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+ unsigned long npages);
+void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
+ struct kvm_memory_slot *dont);
+#endif
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index a1ff508..464fa47 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -13,9 +13,10 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o

kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
- hyperv.o
+ hyperv.o page_track.o

kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += assigned-dev.o iommu.o
+
kvm-intel-y += vmx.o pmu_intel.o
kvm-amd-y += svm.o pmu_amd.o

diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
new file mode 100644
index 0000000..0338d36
--- /dev/null
+++ b/arch/x86/kvm/page_track.c
@@ -0,0 +1,58 @@
+/*
+ * Support KVM guest page tracking
+ *
+ * This feature allows us to track page access in guest. Currently, only
+ * write access is tracked.
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ * Xiao Guangrong <[email protected]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_page_track.h>
+
+#include "mmu.h"
+
+static void page_track_slot_free(struct kvm_memory_slot *slot)
+{
+ int i;
+
+ for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
+ if (slot->arch.gfn_track[i]) {
+ kvfree(slot->arch.gfn_track[i]);
+ slot->arch.gfn_track[i] = NULL;
+ }
+}
+
+int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+ unsigned long npages)
+{
+ int i, pages = gfn_to_index(slot->base_gfn + npages - 1,
+ slot->base_gfn, PT_PAGE_TABLE_LEVEL) + 1;
+
+ for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
+ slot->arch.gfn_track[i] = kvm_kvzalloc(pages *
+ sizeof(*slot->arch.gfn_track[i]));
+ if (!slot->arch.gfn_track[i])
+ goto track_free;
+ }
+
+ return 0;
+
+track_free:
+ page_track_slot_free(slot);
+ return -ENOMEM;
+}
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
+ struct kvm_memory_slot *dont)
+{
+ if (!dont || free->arch.gfn_track != dont->arch.gfn_track)
+ page_track_slot_free(free);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c04987e..ad4888a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7838,6 +7838,8 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
free->arch.lpage_info[i - 1] = NULL;
}
}
+
+ kvm_page_track_free_memslot(free, dont);
}

int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -7886,6 +7888,9 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
}
}

+ if (kvm_page_track_create_memslot(slot, npages))
+ goto out_free;
+
return 0;

out_free:
--
1.8.3.1

2015-11-30 18:35:25

by Xiao Guangrong

Subject: [PATCH 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page

These two functions are the user APIs:
- kvm_page_track_add_page(): adds the page to the tracking pool; after
that, the specified access to the page will be tracked

- kvm_page_track_remove_page(): removes the page from the tracking pool;
the specified access to the page is no longer tracked after the last
user is gone

Both of these are called under the protection of kvm->srcu or
kvm->slots_lock, as in the sketch below.

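For example, a hypothetical caller outside of vcpu context could hold
kvm->slots_lock around the update (sketch only; @gfn is assumed valid):

	mutex_lock(&kvm->slots_lock);
	kvm_page_track_add_page(kvm, gfn, KVM_PAGE_TRACK_WRITE);
	mutex_unlock(&kvm->slots_lock);
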
Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 5 ++
arch/x86/kvm/page_track.c | 95 +++++++++++++++++++++++++++++++++++
2 files changed, 100 insertions(+)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 347d5c9..9cc17c6 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -10,4 +10,9 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
unsigned long npages);
void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
struct kvm_memory_slot *dont);
+
+void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
+ enum kvm_page_track_mode mode);
+void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
+ enum kvm_page_track_mode mode);
#endif
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index 0338d36..ad510db 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -56,3 +56,98 @@ void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
if (!dont || free->arch.gfn_track != dont->arch.gfn_track)
page_track_slot_free(free);
}
+
+static bool check_mode(enum kvm_page_track_mode mode)
+{
+ if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
+ return false;
+
+ return true;
+}
+
+static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
+ enum kvm_page_track_mode mode, int count)
+{
+ int index, val;
+
+ index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+
+ slot->arch.gfn_track[mode][index] += count;
+ val = slot->arch.gfn_track[mode][index];
+ WARN_ON(val < 0);
+}
+
+/*
+ * add guest page to the tracking pool so that corresponding access on that
+ * page will be intercepted.
+ *
+ * It should be called under the protection of kvm->srcu or kvm->slots_lock
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ * @mode: tracking mode, currently only write track is supported.
+ */
+void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
+ enum kvm_page_track_mode mode)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *slot;
+ int i;
+
+ WARN_ON(!check_mode(mode));
+
+ for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ slots = __kvm_memslots(kvm, i);
+ slot = __gfn_to_memslot(slots, gfn);
+
+ spin_lock(&kvm->mmu_lock);
+ update_gfn_track(slot, gfn, mode, 1);
+
+ /*
+ * new track stops large page mapping for the
+ * tracked page.
+ */
+ kvm_mmu_gfn_disallow_lpage(slot, gfn);
+
+ if (mode == KVM_PAGE_TRACK_WRITE)
+ if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
+ kvm_flush_remote_tlbs(kvm);
+ spin_unlock(&kvm->mmu_lock);
+ }
+}
+
+/*
+ * remove the guest page from the tracking pool which stops the interception
+ * of corresponding access on that page. It is the opposed operation of
+ * kvm_page_track_add_page().
+ *
+ * It should be called under the protection of kvm->srcu or kvm->slots_lock
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ * @mode: tracking mode, currently only write track is supported.
+ */
+void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
+ enum kvm_page_track_mode mode)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *slot;
+ int i;
+
+ WARN_ON(!check_mode(mode));
+
+ for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ slots = __kvm_memslots(kvm, i);
+ slot = __gfn_to_memslot(slots, gfn);
+
+ spin_lock(&kvm->mmu_lock);
+ update_gfn_track(slot, gfn, mode, -1);
+
+ /*
+ * allow large page mapping for the tracked page
+ * after the tracker is gone.
+ */
+ kvm_mmu_gfn_allow_lpage(slot, gfn);
+ spin_unlock(&kvm->mmu_lock);
+ }
+}
--
1.8.3.1

2015-11-30 18:32:42

by Xiao Guangrong

Subject: [PATCH 06/11] KVM: MMU: let page fault handler be aware tracked page

A page fault caused by write access to a write-tracked page cannot be
fixed; it always needs to be emulated. page_fault_handle_page_track()
is the fast path we introduce here to skip taking the mmu-lock and
walking the shadow page table.

However, if the page table is not present, it is worth making the page
table entry present and readonly so that read access still works.

mmu_need_write_protect() needs to be adjusted to avoid the page becoming
writable when making the page table present or when syncing/prefetching
shadow page table entries.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 2 ++
arch/x86/kvm/mmu.c | 44 +++++++++++++++++++++++++++++------
arch/x86/kvm/page_track.c | 14 +++++++++++
arch/x86/kvm/paging_tmpl.h | 3 +++
4 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 9cc17c6..f223201 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -15,4 +15,6 @@ void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
enum kvm_page_track_mode mode);
void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
enum kvm_page_track_mode mode);
+bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
+ enum kvm_page_track_mode mode);
#endif
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 39809b8..b23f9fc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -41,6 +41,7 @@
#include <asm/cmpxchg.h>
#include <asm/io.h>
#include <asm/vmx.h>
+#include <asm/kvm_page_track.h>

/*
* When setting this variable to true it enables Two-Dimensional-Paging
@@ -2456,25 +2457,29 @@ static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
}
}

-static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
- bool can_unsync)
+static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
+ bool can_unsync)
{
struct kvm_mmu_page *s;
bool need_unsync = false;

+ if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+ return true;
+
for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
if (!can_unsync)
- return 1;
+ return true;

if (s->role.level != PT_PAGE_TABLE_LEVEL)
- return 1;
+ return true;

if (!s->unsync)
need_unsync = true;
}
if (need_unsync)
kvm_unsync_pages(vcpu, gfn);
- return 0;
+
+ return false;
}

static bool kvm_is_mmio_pfn(pfn_t pfn)
@@ -3388,10 +3393,30 @@ int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
}
EXPORT_SYMBOL_GPL(handle_mmio_page_fault);

+static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
+ u32 error_code, gfn_t gfn)
+{
+ if (unlikely(error_code & PFERR_RSVD_MASK))
+ return false;
+
+ if (!(error_code & PFERR_PRESENT_MASK) ||
+ !(error_code & PFERR_WRITE_MASK))
+ return false;
+
+ /*
+ * the guest is writing a write-tracked page, which cannot be
+ * fixed by the page fault handler.
+ */
+ if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+ return true;
+
+ return false;
+}
+
static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code, bool prefault)
{
- gfn_t gfn;
+ gfn_t gfn = gva >> PAGE_SHIFT;
int r;

pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
@@ -3403,13 +3428,15 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
return r;
}

+ if (page_fault_handle_page_track(vcpu, error_code, gfn))
+ return 1;
+
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;

MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa));

- gfn = gva >> PAGE_SHIFT;

return nonpaging_map(vcpu, gva & PAGE_MASK,
error_code, gfn, prefault);
@@ -3493,6 +3520,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
return r;
}

+ if (page_fault_handle_page_track(vcpu, error_code, gfn))
+ return 1;
+
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index ad510db..dc2da12 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -151,3 +151,17 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
spin_unlock(&kvm->mmu_lock);
}
}
+
+/*
+ * check if the corresponding access on the specified guest page is tracked.
+ */
+bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
+ enum kvm_page_track_mode mode)
+{
+ struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+ int index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+
+ WARN_ON(!check_mode(mode));
+
+ return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
+}
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 91e939b..ac85682 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -735,6 +735,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
return 0;
}

+ if (page_fault_handle_page_track(vcpu, error_code, walker.gfn))
+ return 1;
+
vcpu->arch.write_fault_to_shadow_pgtable = false;

is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
--
1.8.3.1

2015-11-30 18:32:44

by Xiao Guangrong

Subject: [PATCH 07/11] KVM: page track: add notifier support

A notifier list is introduced so that any node that wants to receive
track events can register on the list.

Two APIs are introduced here:
- kvm_page_track_register_notifier(): registers the notifier to receive
track events

- kvm_page_track_unregister_notifier(): stops the delivery of track events
by unregistering the notifier

The node->track_write() callback is called when a write access to a
write-tracked page happens.

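As a sketch, a consumer could hook the event like this (hypothetical code,
not part of this patch; registration would happen at VM init, as patch 11
later does for the MMU):

	static void example_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
					const u8 *new, int bytes)
	{
		/* runs after write emulation on a tracked page completes */
	}

	static struct kvm_page_track_notifier_node example_node = {
		.track_write = example_track_write,
	};

	kvm_page_track_register_notifier(kvm, &example_node);
	/* ... */
	kvm_page_track_unregister_notifier(kvm, &example_node);
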
Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/kvm_page_track.h | 39 ++++++++++++++++++++
arch/x86/kvm/page_track.c | 67 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 4 +++
4 files changed, 111 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index afff1f1..0f7b940 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -658,6 +658,7 @@ struct kvm_arch {
*/
struct list_head active_mmu_pages;
struct list_head zapped_obsolete_pages;
+ struct kvm_page_track_notifier_head track_notifier_head;

struct list_head assigned_dev_head;
struct iommu_domain *iommu_domain;
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index f223201..6744234 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -6,6 +6,36 @@ enum kvm_page_track_mode {
KVM_PAGE_TRACK_MAX,
};

+/*
+ * The notifier represented by @kvm_page_track_notifier_node is linked into
+ * the head which will be notified when guest is triggering the track event.
+ *
+ * Write access on the head is protected by kvm->mmu_lock, read access
+ * is protected by track_srcu.
+ */
+struct kvm_page_track_notifier_head {
+ struct srcu_struct track_srcu;
+ struct hlist_head track_notifier_list;
+};
+
+struct kvm_page_track_notifier_node {
+ struct hlist_node node;
+
+ /*
+ * It is called when guest is writing the write-tracked page
+ * and write emulation is finished at that time.
+ *
+ * @vcpu: the vcpu where the write access happened.
+ * @gpa: the physical address written by guest.
+ * @new: the data was written to the address.
+ * @bytes: the written length.
+ */
+ void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+ int bytes);
+};
+
+void kvm_page_track_init(struct kvm *kvm);
+
int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
unsigned long npages);
void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
@@ -17,4 +47,13 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
enum kvm_page_track_mode mode);
bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
enum kvm_page_track_mode mode);
+
+void
+kvm_page_track_register_notifier(struct kvm *kvm,
+ struct kvm_page_track_notifier_node *n);
+void
+kvm_page_track_unregister_notifier(struct kvm *kvm,
+ struct kvm_page_track_notifier_node *n);
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+ int bytes);
#endif
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index dc2da12..84420df 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -165,3 +165,70 @@ bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,

return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
}
+
+void kvm_page_track_init(struct kvm *kvm)
+{
+ struct kvm_page_track_notifier_head *head;
+
+ head = &kvm->arch.track_notifier_head;
+ init_srcu_struct(&head->track_srcu);
+ INIT_HLIST_HEAD(&head->track_notifier_list);
+}
+
+/*
+ * register the notifier so that event interception for the tracked guest
+ * pages can be received.
+ */
+void
+kvm_page_track_register_notifier(struct kvm *kvm,
+ struct kvm_page_track_notifier_node *n)
+{
+ struct kvm_page_track_notifier_head *head;
+
+ head = &kvm->arch.track_notifier_head;
+
+ spin_lock(&kvm->mmu_lock);
+ hlist_add_head_rcu(&n->node, &head->track_notifier_list);
+ spin_unlock(&kvm->mmu_lock);
+}
+
+/*
+ * stop receiving the event interception. It is the opposed operation of
+ * kvm_page_track_register_notifier().
+ */
+void
+kvm_page_track_unregister_notifier(struct kvm *kvm,
+ struct kvm_page_track_notifier_node *n)
+{
+ struct kvm_page_track_notifier_head *head;
+
+ head = &kvm->arch.track_notifier_head;
+
+ spin_lock(&kvm->mmu_lock);
+ hlist_del_rcu(&n->node);
+ spin_unlock(&kvm->mmu_lock);
+ synchronize_srcu(&head->track_srcu);
+}
+
+/*
+ * Notify the node that write access is intercepted and write emulation is
+ * finished at this time.
+ *
+ * The node should figure out by itself whether the written page is
+ * one it is interested in.
+ */
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+ int bytes)
+{
+ struct kvm_page_track_notifier_head *head;
+ struct kvm_page_track_notifier_node *n;
+ int idx;
+
+ head = &vcpu->kvm->arch.track_notifier_head;
+
+ idx = srcu_read_lock(&head->track_srcu);
+ hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+ if (n->track_write)
+ n->track_write(vcpu, gpa, new, bytes);
+ srcu_read_unlock(&head->track_srcu, idx);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ad4888a..64dbc69 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4328,6 +4328,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
if (ret < 0)
return 0;
kvm_mmu_pte_write(vcpu, gpa, val, bytes);
+ kvm_page_track_write(vcpu, gpa, val, bytes);
return 1;
}

@@ -4586,6 +4587,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,

kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
kvm_mmu_pte_write(vcpu, gpa, new, bytes);
+ kvm_page_track_write(vcpu, gpa, new, bytes);

return X86EMUL_CONTINUE;

@@ -7691,6 +7693,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);

+ kvm_page_track_init(kvm);
+
return 0;
}

--
1.8.3.1

2015-11-30 18:34:50

by Xiao Guangrong

Subject: [PATCH 08/11] KVM: MMU: use page track for non-leaf shadow pages

Non-leaf shadow pages are always write protected, so they can be users
of page tracking.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 8 +++++
arch/x86/kvm/mmu.c | 26 +++++++++++++---
arch/x86/kvm/page_track.c | 58 +++++++++++++++++++++++------------
3 files changed, 67 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 6744234..3447dac 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -41,8 +41,16 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
struct kvm_memory_slot *dont);

+void
+kvm_slot_page_track_add_page_nolock(struct kvm *kvm,
+ struct kvm_memory_slot *slot, gfn_t gfn,
+ enum kvm_page_track_mode mode);
void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
enum kvm_page_track_mode mode);
+void kvm_slot_page_track_remove_page_nolock(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn,
+ enum kvm_page_track_mode mode);
void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
enum kvm_page_track_mode mode);
bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b23f9fc..5a2ca73 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -806,11 +806,17 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
struct kvm_memory_slot *slot;
gfn_t gfn;

+ kvm->arch.indirect_shadow_pages++;
gfn = sp->gfn;
slots = kvm_memslots_for_spte_role(kvm, sp->role);
slot = __gfn_to_memslot(slots, gfn);
+
+ /* non-leaf shadow pages are kept readonly. */
+ if (sp->role.level > PT_PAGE_TABLE_LEVEL)
+ return kvm_slot_page_track_add_page_nolock(kvm, slot, gfn,
+ KVM_PAGE_TRACK_WRITE);
+
kvm_mmu_gfn_disallow_lpage(slot, gfn);
- kvm->arch.indirect_shadow_pages++;
}

static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -819,11 +825,15 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
struct kvm_memory_slot *slot;
gfn_t gfn;

+ kvm->arch.indirect_shadow_pages--;
gfn = sp->gfn;
slots = kvm_memslots_for_spte_role(kvm, sp->role);
slot = __gfn_to_memslot(slots, gfn);
+ if (sp->role.level > PT_PAGE_TABLE_LEVEL)
+ return kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn,
+ KVM_PAGE_TRACK_WRITE);
+
kvm_mmu_gfn_allow_lpage(slot, gfn);
- kvm->arch.indirect_shadow_pages--;
}

static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
@@ -2140,12 +2150,18 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
hlist_add_head(&sp->hash_link,
&vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
if (!direct) {
- if (rmap_write_protect(vcpu, gfn))
+ /*
+ * we should do write protection before syncing pages,
+ * otherwise the content of the synced shadow page may
+ * be inconsistent with the guest page table.
+ */
+ account_shadowed(vcpu->kvm, sp);
+
+ if (level == PT_PAGE_TABLE_LEVEL &&
+ rmap_write_protect(vcpu, gfn))
kvm_flush_remote_tlbs(vcpu->kvm);
if (level > PT_PAGE_TABLE_LEVEL && need_sync)
kvm_sync_pages(vcpu, gfn);
-
- account_shadowed(vcpu->kvm, sp);
}
sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
init_shadow_page_table(sp);
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index 84420df..87554d3 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -77,6 +77,26 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
WARN_ON(val < 0);
}

+void
+kvm_slot_page_track_add_page_nolock(struct kvm *kvm,
+ struct kvm_memory_slot *slot, gfn_t gfn,
+ enum kvm_page_track_mode mode)
+{
+ WARN_ON(!check_mode(mode));
+
+ update_gfn_track(slot, gfn, mode, 1);
+
+ /*
+ * new track stops large page mapping for the
+ * tracked page.
+ */
+ kvm_mmu_gfn_disallow_lpage(slot, gfn);
+
+ if (mode == KVM_PAGE_TRACK_WRITE)
+ if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
+ kvm_flush_remote_tlbs(kvm);
+}
+
/*
* add guest page to the tracking pool so that corresponding access on that
* page will be intercepted.
@@ -101,21 +121,27 @@ void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
slot = __gfn_to_memslot(slots, gfn);

spin_lock(&kvm->mmu_lock);
- update_gfn_track(slot, gfn, mode, 1);
-
- /*
- * new track stops large page mapping for the
- * tracked page.
- */
- kvm_mmu_gfn_disallow_lpage(slot, gfn);
-
- if (mode == KVM_PAGE_TRACK_WRITE)
- if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
- kvm_flush_remote_tlbs(kvm);
+ kvm_slot_page_track_add_page_nolock(kvm, slot, gfn, mode);
spin_unlock(&kvm->mmu_lock);
}
}

+void kvm_slot_page_track_remove_page_nolock(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn,
+ enum kvm_page_track_mode mode)
+{
+ WARN_ON(!check_mode(mode));
+
+ update_gfn_track(slot, gfn, mode, -1);
+
+ /*
+ * allow large page mapping for the tracked page
+ * after the tracker is gone.
+ */
+ kvm_mmu_gfn_allow_lpage(slot, gfn);
+}
+
/*
* remove the guest page from the tracking pool which stops the interception
* of corresponding access on that page. It is the opposed operation of
@@ -134,20 +160,12 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
struct kvm_memory_slot *slot;
int i;

- WARN_ON(!check_mode(mode));
-
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
slots = __kvm_memslots(kvm, i);
slot = __gfn_to_memslot(slots, gfn);

spin_lock(&kvm->mmu_lock);
- update_gfn_track(slot, gfn, mode, -1);
-
- /*
- * allow large page mapping for the tracked page
- * after the tracker is gone.
- */
- kvm_mmu_gfn_allow_lpage(slot, gfn);
+ kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn, mode);
spin_unlock(&kvm->mmu_lock);
}
}
--
1.8.3.1

2015-11-30 18:32:45

by Xiao Guangrong

Subject: [PATCH 09/11] KVM: MMU: simplify mmu_need_write_protect

Now all non-leaf shadow pages are page tracked; if a gfn is not tracked,
then no non-leaf shadow page for that gfn exists, and we can directly
mark the shadow pages of the gfn unsync.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 26 ++++++++------------------
1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5a2ca73..f89e77f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2461,41 +2461,31 @@ static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
kvm_mmu_mark_parents_unsync(sp);
}

-static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
+static bool kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
+ bool can_unsync)
{
struct kvm_mmu_page *s;

for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
+ if (!can_unsync)
+ return true;
+
if (s->unsync)
continue;
WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
__kvm_unsync_page(vcpu, s);
}
+
+ return false;
}

static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
bool can_unsync)
{
- struct kvm_mmu_page *s;
- bool need_unsync = false;
-
if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
return true;

- for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
- if (!can_unsync)
- return true;
-
- if (s->role.level != PT_PAGE_TABLE_LEVEL)
- return true;
-
- if (!s->unsync)
- need_unsync = true;
- }
- if (need_unsync)
- kvm_unsync_pages(vcpu, gfn);
-
- return false;
+ return kvm_unsync_pages(vcpu, gfn, can_unsync);
}

static bool kvm_is_mmio_pfn(pfn_t pfn)
--
1.8.3.1

2015-11-30 18:34:00

by Xiao Guangrong

Subject: [PATCH 10/11] KVM: MMU: clear write-flooding on the fast path of tracked page

If the page fault is caused by write access to a write-tracked page, the
real shadow page walk is skipped and we lose the chance to clear write
flooding for the page structures the current vcpu is using.

Fix it by locklessly walking the shadow page table to clear write flooding
on the shadow page structures outside of the mmu-lock. To that end, change
the count to atomic_t.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 25 +++++++++++++++++++++----
arch/x86/kvm/paging_tmpl.h | 4 +++-
3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0f7b940..ea7907d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -252,7 +252,7 @@ struct kvm_mmu_page {
#endif

/* Number of writes since the last time traversal visited this page. */
- int write_flooding_count;
+ atomic_t write_flooding_count;
};

struct kvm_pio_request {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f89e77f..9f6a4ef 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2081,7 +2081,7 @@ static void init_shadow_page_table(struct kvm_mmu_page *sp)

static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
{
- sp->write_flooding_count = 0;
+ atomic_set(&sp->write_flooding_count, 0);
}

static void clear_sp_write_flooding_count(u64 *spte)
@@ -2461,8 +2461,7 @@ static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
kvm_mmu_mark_parents_unsync(sp);
}

-static bool kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
- bool can_unsync)
+static bool kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
{
struct kvm_mmu_page *s;

@@ -3419,6 +3418,23 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
return false;
}

+static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
+{
+ struct kvm_shadow_walk_iterator iterator;
+ u64 spte;
+
+ if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
+ return;
+
+ walk_shadow_page_lockless_begin(vcpu);
+ for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+ clear_sp_write_flooding_count(iterator.sptep);
+ if (!is_shadow_present_pte(spte))
+ break;
+ }
+ walk_shadow_page_lockless_end(vcpu);
+}
+
static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code, bool prefault)
{
@@ -4246,7 +4262,8 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
if (sp->role.level == PT_PAGE_TABLE_LEVEL)
return false;

- return ++sp->write_flooding_count >= 3;
+ atomic_inc(&sp->write_flooding_count);
+ return atomic_read(&sp->write_flooding_count) >= 3;
}

/*
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index ac85682..97fe5ac 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -735,8 +735,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
return 0;
}

- if (page_fault_handle_page_track(vcpu, error_code, walker.gfn))
+ if (page_fault_handle_page_track(vcpu, error_code, walker.gfn)) {
+ shadow_page_table_clear_flood(vcpu, addr);
return 1;
+ }

vcpu->arch.write_fault_to_shadow_pgtable = false;

--
1.8.3.1

2015-11-30 18:32:47

by Xiao Guangrong

Subject: [PATCH 11/11] KVM: MMU: apply page track notifier

Register the notifier to receive write track events so that we can update
our shadow page tables.

This makes kvm_mmu_pte_write() the callback of the notifier; no
functionality is changed.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 +++--
arch/x86/kvm/mmu.c | 19 +++++++++++++++++--
arch/x86/kvm/x86.c | 4 ++--
3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea7907d..698577a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -658,6 +658,7 @@ struct kvm_arch {
*/
struct list_head active_mmu_pages;
struct list_head zapped_obsolete_pages;
+ struct kvm_page_track_notifier_node mmu_sp_tracker;
struct kvm_page_track_notifier_head track_notifier_head;

struct list_head assigned_dev_head;
@@ -953,6 +954,8 @@ void kvm_mmu_module_exit(void);
void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
int kvm_mmu_create(struct kvm_vcpu *vcpu);
void kvm_mmu_setup(struct kvm_vcpu *vcpu);
+void kvm_mmu_init_vm(struct kvm *kvm);
+void kvm_mmu_uninit_vm(struct kvm *kvm);
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask);

@@ -1092,8 +1095,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);

void kvm_inject_nmi(struct kvm_vcpu *vcpu);

-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
- const u8 *new, int bytes);
int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9f6a4ef..a420c43 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4325,8 +4325,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
return spte;
}

-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
- const u8 *new, int bytes)
+static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+ const u8 *new, int bytes)
{
gfn_t gfn = gpa >> PAGE_SHIFT;
struct kvm_mmu_page *sp;
@@ -4540,6 +4540,21 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
init_kvm_mmu(vcpu);
}

+void kvm_mmu_init_vm(struct kvm *kvm)
+{
+ struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+ node->track_write = kvm_mmu_pte_write;
+ kvm_page_track_register_notifier(kvm, node);
+}
+
+void kvm_mmu_uninit_vm(struct kvm *kvm)
+{
+ struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+ kvm_page_track_unregister_notifier(kvm, node);
+}
+
/* The return value indicates if tlb flush on all vcpus is needed. */
typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64dbc69..adc031a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4327,7 +4327,6 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
ret = kvm_vcpu_write_guest(vcpu, gpa, val, bytes);
if (ret < 0)
return 0;
- kvm_mmu_pte_write(vcpu, gpa, val, bytes);
kvm_page_track_write(vcpu, gpa, val, bytes);
return 1;
}
@@ -4586,7 +4585,6 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CMPXCHG_FAILED;

kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
- kvm_mmu_pte_write(vcpu, gpa, new, bytes);
kvm_page_track_write(vcpu, gpa, new, bytes);

return X86EMUL_CONTINUE;
@@ -7694,6 +7692,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);

kvm_page_track_init(kvm);
+ kvm_mmu_init_vm(kvm);

return 0;
}
@@ -7821,6 +7820,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
+ kvm_mmu_uninit_vm(kvm);
}

void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
--
1.8.3.1

2015-12-01 10:17:36

by Paolo Bonzini

Subject: Re: [PATCH 00/11] KVM: x86: track guest page access



On 30/11/2015 19:26, Xiao Guangrong wrote:
> This patchset introduces the feature which allows us to track page
> access in guest. [...]

For KVM-GT, as far as I know Andrea Arcangeli is working on extending
userfaultfd to track write faults only. Perhaps KVM-GT can do
something similar, where KVM gets the write tracking functionality for
free through the MMU notifiers. Any thoughts on this?

Applying your technique to non-leaf shadow pages actually makes this
series quite interesting. :) Shadow paging is still in use for nested
EPT, so it's always a good idea to speed it up.

Paolo

2015-12-01 15:02:46

by Andrea Arcangeli

Subject: Re: [PATCH 00/11] KVM: x86: track guest page access

On Tue, Dec 01, 2015 at 11:17:30AM +0100, Paolo Bonzini wrote:
>
>
> On 30/11/2015 19:26, Xiao Guangrong wrote:
> > This patchset introduces the feature which allows us to track page
> > access in guest. [...]
>
> For KVM-GT, as far as I know Andrea Arcangeli is working on extending
> userfaultfd to tracking write faults only. Perhaps KVM-GT can do

I was a bit busy lately with the KSMscale design change and with fixing a
purely theoretical THP issue, but the userfaultfd write tracking has
already become available here:

http://www.spinics.net/lists/linux-mm/msg97422.html

I'll be merging it into my tree soon after a thorough review.

> something similar, where KVM gets the write tracking functionality for
> free through the MMU notifiers. Any thoughts on this?
>
> Applying your technique to non-leaf shadow pages actually makes this
> series quite interesting. :) Shadow paging is still in use for nested
> EPT, so it's always a good idea to speed it up.

I don't have the full picture of how userfaultfd write tracking could
also fit in the leaf/non-leaf shadow pagetable write tracking yet, but
it's good to think about it.

In the userfaultfd case the write notification would always arrive
first through the uffd and would be received by the qemu userfault
thread; it is then the uffd memory-protect ioctl invoked by the qemu
userfault thread (to handle the write fault in userland and wake up
the thread stuck in handle_userfault()) that would also flush the
secondary MMU TLB through the MMU notifier and get rid of the readonly
spte (or update it to read-write with change_pte in the best case).

Thanks,
Andrea

2015-12-01 15:08:34

by Paolo Bonzini

Subject: Re: [PATCH 00/11] KVM: x86: track guest page access



On 01/12/2015 16:02, Andrea Arcangeli wrote:
> > Applying your technique to non-leaf shadow pages actually makes this
> > series quite interesting. :) Shadow paging is still in use for nested
> > EPT, so it's always a good idea to speed it up.
>
> I don't have the full picture of how userfaultfd write tracking could
> also fit in the leaf/non-leaf shadow pagetable write tracking yet but
> it's good to think about it.

It's unrelated.

Xiao wrote this series for KVM-GT. I'm suggesting that he uses
userfaultfd write tracking (or similar techniques---but anyway
implemented out of KVM) for KVM-GT. The benefit is that KVM-GT is then
unrelated to KVM, similar to legacy KVM device assignment vs. VFIO.

However, he also applied this new API to shadow pagetable write
tracking and gets a measurable (~2%) performance improvement. We can
look separately at how to get a similar improvement, even if
KVM-GT will not use the new page tracking API.

Paolo

2015-12-01 17:07:19

by Xiao Guangrong

Subject: Re: [PATCH 00/11] KVM: x86: track guest page access



On 12/01/2015 06:17 PM, Paolo Bonzini wrote:
>
>
> On 30/11/2015 19:26, Xiao Guangrong wrote:
>> This patchset introduces the feature which allows us to track page
>> access in guest. [...]
>
> For KVM-GT, as far as I know Andrea Arcangeli is working on extending
> userfaultfd to tracking write faults only. Perhaps KVM-GT can do
> something similar, where KVM gets the write tracking functionality for
> free through the MMU notifiers. Any thoughts on this?

Userfaultfd is excellent and indeed has the ability to notify about
write events; however, it is not suitable for the shadow page use case.

On the performance side, the shadow GPU page table is performance
critical and must be switched frequently, so it is not good to handle it
in userspace. Also, a Windows guest has many GPU page tables and updates
them frequently; that means we need to write protect a huge number of
individual pages, and I am afraid userfaultfd cannot handle this case
efficiently.

On the functionality side, userfaultfd cannot fill the needs of shadow
pages because:
- the page must stay readonly, so userfaultfd cannot fix the fault and
let the vcpu make progress (a write access causes a writable gup).

- the access needs to be emulated; however, userfaultfd/the kernel does
not have the ability to emulate the access, as the access is triggered
by the guest and the instruction info is stored in the VMCS, so only KVM
can emulate it.

- the shadow page needs to be notified after the emulation is finished,
as it must know the new data written to the page in order to update its
page hierarchy (some hardware lacks the 'retry' ability, so the shadow
page table must reflect the guest table at all times); see the sketch
after this list.
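
In code terms, the last point is where this series hooks in: the
tracker is notified only after the emulator has committed the write. A
simplified sketch of what patch 7 does in the emulated write path (not
verbatim):

static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
			       const void *val, int bytes)
{
	int ret;

	ret = kvm_vcpu_write_guest(vcpu, gpa, val, bytes);
	if (ret < 0)
		return 0;

	/* emulation is done; every registered tracker now sees the new data */
	kvm_page_track_write(vcpu, gpa, val, bytes);
	return 1;
}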

>
> Applying your technique to non-leaf shadow pages actually makes this
> series quite interesting. :) Shadow paging is still in use for nested
> EPT, so it's always a good idea to speed it up.

Yes. Very glad to see you like it. :)

2015-12-05 17:03:12

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 00/11] KVM: x86: track guest page access


Ping...

Paolo, any comment?



On 12/02/2015 01:00 AM, Xiao Guangrong wrote:
>
>
> On 12/01/2015 06:17 PM, Paolo Bonzini wrote:
>>
>>
>> On 30/11/2015 19:26, Xiao Guangrong wrote:
>>> This patchset introduces the feature which allows us to track page
>>> access in guest. Currently, only write access tracking is implemented
>>> in this version.
>>>
>>> Four APIs are introduces:
>>> - kvm_page_track_add_page(kvm, gfn, mode), single guest page @gfn is
>>> added into the track pool of the guest instance represented by @kvm,
>>> @mode specifies which kind of access on the @gfn is tracked
>>>
>>> - kvm_page_track_remove_page(kvm, gfn, mode), is the opposed operation
>>> of kvm_page_track_add_page() which removes @gfn from the tracking pool.
>>> gfn is no tracked after its last user is gone
>>>
>>> - kvm_page_track_register_notifier(kvm, n), register a notifier so that
>>> the event triggered by page tracking will be received, at that time,
>>> the callback of n->track_write() will be called
>>>
>>> - kvm_page_track_unregister_notifier(kvm, n), does the opposed operation
>>> of kvm_page_track_register_notifier(), which unlinks the notifier and
>>> stops receiving the tracked event
>>>
>>> The first user of page track is non-leaf shadow page tables as they are
>>> always write protected. It also gains performance improvement because
>>> page track speeds up page fault handler for the tracked pages. The
>>> performance result of kernel building is as followings:
>>>
>>> before after
>>> real 461.63 real 455.48
>>> user 4529.55 user 4557.88
>>> sys 1995.39 sys 1922.57
>>
>> For KVM-GT, as far as I know Andrea Arcangeli is working on extending
>> userfaultfd to tracking write faults only. Perhaps KVM-GT can do
>> something similar, where KVM gets the write tracking functionality for
>> free through the MMU notifiers. Any thoughts on this?
>
> Userfaultfd is excellent and indeed has the ability to notify about
> write events; however, it is not suitable for the shadow page use case.
>
> On the performance side, the shadow GPU page table is performance
> critical and must be switched frequently, so it is not good to handle
> it in userspace. Also, a Windows guest has many GPU page tables and
> updates them frequently; that means we need to write protect a huge
> number of individual pages, and I am afraid userfaultfd cannot handle
> this case efficiently.
>
> On the functionality side, userfaultfd cannot fill the needs of shadow
> pages because:
> - the page must stay readonly, so userfaultfd cannot fix the fault and
> let the vcpu make progress (a write access causes a writable gup).
>
> - the access needs to be emulated; however, userfaultfd/the kernel does
> not have the ability to emulate the access, as the access is triggered
> by the guest and the instruction info is stored in the VMCS, so only
> KVM can emulate it.
>
> - the shadow page needs to be notified after the emulation is finished,
> as it must know the new data written to the page in order to update its
> page hierarchy (some hardware lacks the 'retry' ability, so the shadow
> page table must reflect the guest table at all times).
>
>>
>> Applying your technique to non-leaf shadow pages actually makes this
>> series quite interesting. :) Shadow paging is still in use for nested
>> EPT, so it's always a good idea to speed it up.
>
> Yes. Very glad to see you like it. :)
>
>

2015-12-15 07:11:15

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 04/11] KVM: page track: add the framework of guest page tracking

Hi Guangrong,

I am starting to review this series and should have some comments or
questions; you can determine whether they are valuable :)

See below.

On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
> The array, gfn_track[mode][gfn], is introduced in the memory slot for every
> guest page; this is the tracking count for the guest page in different
> modes. If the page is tracked then the count is increased; the page is
> no longer tracked after the count reaches zero
>
> Two callbacks, kvm_page_track_create_memslot() and
> kvm_page_track_free_memslot() are implemented in this patch, they are
> internally used to initialize and reclaim the memory of the array
>
> Currently, only write track mode is supported
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/include/asm/kvm_page_track.h | 13 ++++++++
> arch/x86/kvm/Makefile | 3 +-
> arch/x86/kvm/page_track.c | 58 +++++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 5 +++
> 5 files changed, 80 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/include/asm/kvm_page_track.h
> create mode 100644 arch/x86/kvm/page_track.c
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5aa2dcc..afff1f1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -32,6 +32,7 @@
> #include <asm/mtrr.h>
> #include <asm/msr-index.h>
> #include <asm/asm.h>
> +#include <asm/kvm_page_track.h>
>
> #define KVM_MAX_VCPUS 255
> #define KVM_SOFT_MAX_VCPUS 160
> @@ -612,6 +613,7 @@ struct kvm_lpage_info {
> struct kvm_arch_memory_slot {
> struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
> struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
> + int *gfn_track[KVM_PAGE_TRACK_MAX];
> };
>
> /*
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> new file mode 100644
> index 0000000..347d5c9
> --- /dev/null
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -0,0 +1,13 @@
> +#ifndef _ASM_X86_KVM_PAGE_TRACK_H
> +#define _ASM_X86_KVM_PAGE_TRACK_H
> +
> +enum kvm_page_track_mode {
> + KVM_PAGE_TRACK_WRITE,
> + KVM_PAGE_TRACK_MAX,
> +};
> +
> +int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
> + unsigned long npages);
> +void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
> + struct kvm_memory_slot *dont);
> +#endif
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index a1ff508..464fa47 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -13,9 +13,10 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
>
> kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
> i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
> - hyperv.o
> + hyperv.o page_track.o
>
> kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += assigned-dev.o iommu.o
> +
> kvm-intel-y += vmx.o pmu_intel.o
> kvm-amd-y += svm.o pmu_amd.o
>
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> new file mode 100644
> index 0000000..0338d36
> --- /dev/null
> +++ b/arch/x86/kvm/page_track.c
> @@ -0,0 +1,58 @@
> +/*
> + * Support KVM guest page tracking
> + *
> + * This feature allows us to track page access in guest. Currently, only
> + * write access is tracked.
> + *
> + * Copyright(C) 2015 Intel Corporation.
> + *
> + * Author:
> + * Xiao Guangrong <[email protected]>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_host.h>
> +#include <asm/kvm_page_track.h>
> +
> +#include "mmu.h"
> +
> +static void page_track_slot_free(struct kvm_memory_slot *slot)
> +{
> + int i;
> +
> + for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
> + if (slot->arch.gfn_track[i]) {
> + kvfree(slot->arch.gfn_track[i]);
> + slot->arch.gfn_track[i] = NULL;
> + }
> +}
> +
> +int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
> + unsigned long npages)
> +{
> + int i, pages = gfn_to_index(slot->base_gfn + npages - 1,
> + slot->base_gfn, PT_PAGE_TABLE_LEVEL) + 1;
> +
> + for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
> + slot->arch.gfn_track[i] = kvm_kvzalloc(pages *
> + sizeof(*slot->arch.gfn_track[i]));
> + if (!slot->arch.gfn_track[i])
> + goto track_free;
> + }
> +
> + return 0;
> +
> +track_free:
> + page_track_slot_free(slot);
> + return -ENOMEM;
> +}
Is it necessary to use the 'unsigned long npages' parameter? In my
understanding you are going to track all GFNs in the memory slot anyway,
right? If you want to keep npages, I think it's better to add a base_gfn
as well; otherwise you are assuming you are going to track the npages
GFNs starting from slot->base_gfn.

> +
> +void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
> + struct kvm_memory_slot *dont)
> +{
> + if (!dont || free->arch.gfn_track != dont->arch.gfn_track)
> + page_track_slot_free(free);
> +}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c04987e..ad4888a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7838,6 +7838,8 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
> free->arch.lpage_info[i - 1] = NULL;
> }
> }
> +
> + kvm_page_track_free_memslot(free, dont);
> }
>
> int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
> @@ -7886,6 +7888,9 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
> }
> }
>
> + if (kvm_page_track_create_memslot(slot, npages))
> + goto out_free;
> +
It looks like you are essentially allocating one int for every GFN of
the slot unconditionally. In my understanding most memory slots are not
going to be tracked, so isn't this wasteful of memory?

Thanks,
-Kai
> return 0;
>
> out_free:

2015-12-15 07:19:24

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page



On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
> These two functions are the user APIs:
> - kvm_page_track_add_page(): add the page to the tracking pool; after
> that, the specified access on that page will be tracked
>
> - kvm_page_track_remove_page(): remove the page from the tracking pool;
> the specified access on the page is no longer tracked after the last
> user is gone
>
> Both of these are called under the protection of kvm->srcu or
> kvm->slots_lock
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_page_track.h | 5 ++
> arch/x86/kvm/page_track.c | 95 +++++++++++++++++++++++++++++++++++
> 2 files changed, 100 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 347d5c9..9cc17c6 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -10,4 +10,9 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
> unsigned long npages);
> void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
> struct kvm_memory_slot *dont);
> +
> +void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
> + enum kvm_page_track_mode mode);
> +void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
> + enum kvm_page_track_mode mode);
> #endif
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> index 0338d36..ad510db 100644
> --- a/arch/x86/kvm/page_track.c
> +++ b/arch/x86/kvm/page_track.c
> @@ -56,3 +56,98 @@ void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
> if (!dont || free->arch.gfn_track != dont->arch.gfn_track)
> page_track_slot_free(free);
> }
> +
> +static bool check_mode(enum kvm_page_track_mode mode)
> +{
> + if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
> + return false;
> +
> + return true;
> +}
> +
> +static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
> + enum kvm_page_track_mode mode, int count)
> +{
> + int index, val;
> +
> + index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
> +
> + slot->arch.gfn_track[mode][index] += count;
> + val = slot->arch.gfn_track[mode][index];
> + WARN_ON(val < 0);
> +}
> +
> +/*
> + * add guest page to the tracking pool so that corresponding access on that
> + * page will be intercepted.
> + *
> + * It should be called under the protection of kvm->srcu or kvm->slots_lock
> + *
> + * @kvm: the guest instance we are interested in.
> + * @gfn: the guest page.
> + * @mode: tracking mode, currently only write track is supported.
> + */
> +void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
> + enum kvm_page_track_mode mode)
> +{
> + struct kvm_memslots *slots;
> + struct kvm_memory_slot *slot;
> + int i;
> +
> + WARN_ON(!check_mode(mode));
> +
> + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> + slots = __kvm_memslots(kvm, i);
> + slot = __gfn_to_memslot(slots, gfn);
> +
> + spin_lock(&kvm->mmu_lock);
> + update_gfn_track(slot, gfn, mode, 1);
> +
> + /*
> + * new track stops large page mapping for the
> + * tracked page.
> + */
> + kvm_mmu_gfn_disallow_lpage(slot, gfn);
Where is kvm_mmu_gfn_disallow_lpage? I see it neither in your patch
nor in my own latest KVM repo without your patch :)

> +
> + if (mode == KVM_PAGE_TRACK_WRITE)
> + if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
> + kvm_flush_remote_tlbs(kvm);
Nor can I find kvm_mmu_slot_gfn_write_protect. Did I miss something?

Thanks,
-Kai
> + spin_unlock(&kvm->mmu_lock);
> + }
> +}
> +
> +/*
> + * remove the guest page from the tracking pool which stops the interception
> + * of corresponding access on that page. It is the opposed operation of
> + * kvm_page_track_add_page().
> + *
> + * It should be called under the protection of kvm->srcu or kvm->slots_lock
> + *
> + * @kvm: the guest instance we are interested in.
> + * @gfn: the guest page.
> + * @mode: tracking mode, currently only write track is supported.
> + */
> +void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
> + enum kvm_page_track_mode mode)
> +{
> + struct kvm_memslots *slots;
> + struct kvm_memory_slot *slot;
> + int i;
> +
> + WARN_ON(!check_mode(mode));
> +
> + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> + slots = __kvm_memslots(kvm, i);
> + slot = __gfn_to_memslot(slots, gfn);
> +
> + spin_lock(&kvm->mmu_lock);
> + update_gfn_track(slot, gfn, mode, -1);
> +
> + /*
> + * allow large page mapping for the tracked page
> + * after the tracker is gone.
> + */
> + kvm_mmu_gfn_allow_lpage(slot, gfn);
> + spin_unlock(&kvm->mmu_lock);
> + }
> +}

2015-12-15 07:56:44

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 08/11] KVM: MMU: use page track for non-leaf shadow pages



On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
> non-leaf shadow pages are always write protected, so they can be users
> of page track
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_page_track.h | 8 +++++
> arch/x86/kvm/mmu.c | 26 +++++++++++++---
> arch/x86/kvm/page_track.c | 58 +++++++++++++++++++++++------------
> 3 files changed, 67 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 6744234..3447dac 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -41,8 +41,16 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
> void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
> struct kvm_memory_slot *dont);
>
> +void
> +kvm_slot_page_track_add_page_nolock(struct kvm *kvm,
> + struct kvm_memory_slot *slot, gfn_t gfn,
> + enum kvm_page_track_mode mode);
> void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
> enum kvm_page_track_mode mode);
> +void kvm_slot_page_track_remove_page_nolock(struct kvm *kvm,
> + struct kvm_memory_slot *slot,
> + gfn_t gfn,
> + enum kvm_page_track_mode mode);
> void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
> enum kvm_page_track_mode mode);
> bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b23f9fc..5a2ca73 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -806,11 +806,17 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
> struct kvm_memory_slot *slot;
> gfn_t gfn;
>
> + kvm->arch.indirect_shadow_pages++;
> gfn = sp->gfn;
> slots = kvm_memslots_for_spte_role(kvm, sp->role);
> slot = __gfn_to_memslot(slots, gfn);
> +
> + /* the non-leaf shadow pages are keeping readonly. */
> + if (sp->role.level > PT_PAGE_TABLE_LEVEL)
> + return kvm_slot_page_track_add_page_nolock(kvm, slot, gfn,
> + KVM_PAGE_TRACK_WRITE);
> +
> kvm_mmu_gfn_disallow_lpage(slot, gfn);
> - kvm->arch.indirect_shadow_pages++;
> }
>
> static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
> @@ -819,11 +825,15 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
> struct kvm_memory_slot *slot;
> gfn_t gfn;
>
> + kvm->arch.indirect_shadow_pages--;
> gfn = sp->gfn;
> slots = kvm_memslots_for_spte_role(kvm, sp->role);
> slot = __gfn_to_memslot(slots, gfn);
> + if (sp->role.level > PT_PAGE_TABLE_LEVEL)
> + return kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn,
> + KVM_PAGE_TRACK_WRITE);
> +
> kvm_mmu_gfn_allow_lpage(slot, gfn);
> - kvm->arch.indirect_shadow_pages--;
> }
>
> static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
> @@ -2140,12 +2150,18 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> hlist_add_head(&sp->hash_link,
> &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
> if (!direct) {
> - if (rmap_write_protect(vcpu, gfn))
> + /*
> + * we should do write protection before syncing pages
> + * otherwise the content of the synced shadow page may
> + * be inconsistent with guest page table.
> + */
> + account_shadowed(vcpu->kvm, sp);
> +
> + if (level == PT_PAGE_TABLE_LEVEL &&
> + rmap_write_protect(vcpu, gfn))
> kvm_flush_remote_tlbs(vcpu->kvm);
I think your modification is good, but I am a little bit confused here.
In account_shadowed, if sp->role.level > PT_PAGE_TABLE_LEVEL, the
sp->gfn is write protected, and this is reasonable. So why write protect
the gfn of PT_PAGE_TABLE_LEVEL here?

> if (level > PT_PAGE_TABLE_LEVEL && need_sync)
> kvm_sync_pages(vcpu, gfn);
> -
> - account_shadowed(vcpu->kvm, sp);
> }
> sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
> init_shadow_page_table(sp);
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> index 84420df..87554d3 100644
> --- a/arch/x86/kvm/page_track.c
> +++ b/arch/x86/kvm/page_track.c
> @@ -77,6 +77,26 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
> WARN_ON(val < 0);
> }
>
> +void
> +kvm_slot_page_track_add_page_nolock(struct kvm *kvm,
> + struct kvm_memory_slot *slot, gfn_t gfn,
> + enum kvm_page_track_mode mode)
> +{
> + WARN_ON(!check_mode(mode));
> +
> + update_gfn_track(slot, gfn, mode, 1);
> +
> + /*
> + * new track stops large page mapping for the
> + * tracked page.
> + */
> + kvm_mmu_gfn_disallow_lpage(slot, gfn);
> +
> + if (mode == KVM_PAGE_TRACK_WRITE)
> + if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
> + kvm_flush_remote_tlbs(kvm);
> +}
> +
> /*
> * add guest page to the tracking pool so that corresponding access on that
> * page will be intercepted.
> @@ -101,21 +121,27 @@ void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
> slot = __gfn_to_memslot(slots, gfn);
>
> spin_lock(&kvm->mmu_lock);
> - update_gfn_track(slot, gfn, mode, 1);
> -
> - /*
> - * new track stops large page mapping for the
> - * tracked page.
> - */
> - kvm_mmu_gfn_disallow_lpage(slot, gfn);
> -
> - if (mode == KVM_PAGE_TRACK_WRITE)
> - if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
> - kvm_flush_remote_tlbs(kvm);
> + kvm_slot_page_track_add_page_nolock(kvm, slot, gfn, mode);
> spin_unlock(&kvm->mmu_lock);
> }
> }
>
> +void kvm_slot_page_track_remove_page_nolock(struct kvm *kvm,
> + struct kvm_memory_slot *slot,
> + gfn_t gfn,
> + enum kvm_page_track_mode mode)
> +{
> + WARN_ON(!check_mode(mode));
> +
> + update_gfn_track(slot, gfn, mode, -1);
> +
> + /*
> + * allow large page mapping for the tracked page
> + * after the tracker is gone.
> + */
> + kvm_mmu_gfn_allow_lpage(slot, gfn);
> +}
> +
> /*
> * remove the guest page from the tracking pool which stops the interception
> * of corresponding access on that page. It is the opposed operation of
> @@ -134,20 +160,12 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
> struct kvm_memory_slot *slot;
> int i;
>
> - WARN_ON(!check_mode(mode));
> -
> for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> slots = __kvm_memslots(kvm, i);
> slot = __gfn_to_memslot(slots, gfn);
>
> spin_lock(&kvm->mmu_lock);
> - update_gfn_track(slot, gfn, mode, -1);
> -
> - /*
> - * allow large page mapping for the tracked page
> - * after the tracker is gone.
> - */
> - kvm_mmu_gfn_allow_lpage(slot, gfn);
> + kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn, mode);
It looks like you need to merge this part with patch 1, as you are
modifying kvm_page_track_{add,remove}_page here, which are introduced in
your patch 1.

Thanks,
-Kai
> spin_unlock(&kvm->mmu_lock);
> }
> }

2015-12-15 08:00:22

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page



On 12/15/2015 03:15 PM, Kai Huang wrote:
>
>
> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>> These two functions are the user APIs:
>> - kvm_page_track_add_page(): add the page to the tracking pool; after
>> that, the specified access on that page will be tracked
>>
>> - kvm_page_track_remove_page(): remove the page from the tracking pool;
>> the specified access on the page is no longer tracked after the last
>> user is gone
>>
>> Both of these are called under the protection of kvm->srcu or
>> kvm->slots_lock
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_page_track.h | 5 ++
>> arch/x86/kvm/page_track.c | 95
>> +++++++++++++++++++++++++++++++++++
>> 2 files changed, 100 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_page_track.h
>> b/arch/x86/include/asm/kvm_page_track.h
>> index 347d5c9..9cc17c6 100644
>> --- a/arch/x86/include/asm/kvm_page_track.h
>> +++ b/arch/x86/include/asm/kvm_page_track.h
>> @@ -10,4 +10,9 @@ int kvm_page_track_create_memslot(struct
>> kvm_memory_slot *slot,
>> unsigned long npages);
>> void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
>> struct kvm_memory_slot *dont);
>> +
>> +void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
>> + enum kvm_page_track_mode mode);
>> +void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
>> + enum kvm_page_track_mode mode);
>> #endif
>> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
>> index 0338d36..ad510db 100644
>> --- a/arch/x86/kvm/page_track.c
>> +++ b/arch/x86/kvm/page_track.c
>> @@ -56,3 +56,98 @@ void kvm_page_track_free_memslot(struct
>> kvm_memory_slot *free,
>> if (!dont || free->arch.gfn_track != dont->arch.gfn_track)
>> page_track_slot_free(free);
>> }
>> +
>> +static bool check_mode(enum kvm_page_track_mode mode)
>> +{
>> + if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
>> + return false;
>> +
>> + return true;
>> +}
>> +
>> +static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
>> + enum kvm_page_track_mode mode, int count)
>> +{
>> + int index, val;
>> +
>> + index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
>> +
>> + slot->arch.gfn_track[mode][index] += count;
>> + val = slot->arch.gfn_track[mode][index];
>> + WARN_ON(val < 0);
>> +}
>> +
>> +/*
>> + * add guest page to the tracking pool so that corresponding access
>> on that
>> + * page will be intercepted.
>> + *
>> + * It should be called under the protection of kvm->srcu or
>> kvm->slots_lock
>> + *
>> + * @kvm: the guest instance we are interested in.
>> + * @gfn: the guest page.
>> + * @mode: tracking mode, currently only write track is supported.
>> + */
>> +void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
>> + enum kvm_page_track_mode mode)
>> +{
>> + struct kvm_memslots *slots;
>> + struct kvm_memory_slot *slot;
>> + int i;
>> +
>> + WARN_ON(!check_mode(mode));
>> +
>> + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>> + slots = __kvm_memslots(kvm, i);
>> + slot = __gfn_to_memslot(slots, gfn);
>> +
>> + spin_lock(&kvm->mmu_lock);
>> + update_gfn_track(slot, gfn, mode, 1);
>> +
>> + /*
>> + * new track stops large page mapping for the
>> + * tracked page.
>> + */
>> + kvm_mmu_gfn_disallow_lpage(slot, gfn);
> Where is kvm_mmu_gfn_disallow_lpage? I see it neither in your
> patch nor in my own latest KVM repo without your patch :)
>
>> +
>> + if (mode == KVM_PAGE_TRACK_WRITE)
>> + if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
>> + kvm_flush_remote_tlbs(kvm);
> Nor can I find kvm_mmu_slot_gfn_write_protect. Did I miss something?
Please ignore this reply. It looks like my Thunderbird had a problem
and couldn't show your whole patch series, so I missed patches 1~3. Now
I see them. Sorry.

Thanks,
-Kai
>
> Thanks,
> -Kai
>> + spin_unlock(&kvm->mmu_lock);
>> + }
>> +}
>> +
>> +/*
>> + * remove the guest page from the tracking pool which stops the
>> interception
>> + * of corresponding access on that page. It is the opposed operation of
>> + * kvm_page_track_add_page().
>> + *
>> + * It should be called under the protection of kvm->srcu or
>> kvm->slots_lock
>> + *
>> + * @kvm: the guest instance we are interested in.
>> + * @gfn: the guest page.
>> + * @mode: tracking mode, currently only write track is supported.
>> + */
>> +void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
>> + enum kvm_page_track_mode mode)
>> +{
>> + struct kvm_memslots *slots;
>> + struct kvm_memory_slot *slot;
>> + int i;
>> +
>> + WARN_ON(!check_mode(mode));
>> +
>> + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>> + slots = __kvm_memslots(kvm, i);
>> + slot = __gfn_to_memslot(slots, gfn);
>> +
>> + spin_lock(&kvm->mmu_lock);
>> + update_gfn_track(slot, gfn, mode, -1);
>> +
>> + /*
>> + * allow large page mapping for the tracked page
>> + * after the tracker is gone.
>> + */
>> + kvm_mmu_gfn_allow_lpage(slot, gfn);
>> + spin_unlock(&kvm->mmu_lock);
>> + }
>> +}
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2015-12-15 08:03:46

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 08/11] KVM: MMU: use page track for non-leaf shadow pages



On 12/15/2015 03:52 PM, Kai Huang wrote:
>
>
> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>> non-leaf shadow pages are always write protected, so they can be users
>> of page track
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_page_track.h | 8 +++++
>> arch/x86/kvm/mmu.c | 26 +++++++++++++---
>> arch/x86/kvm/page_track.c | 58
>> +++++++++++++++++++++++------------
>> 3 files changed, 67 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_page_track.h
>> b/arch/x86/include/asm/kvm_page_track.h
>> index 6744234..3447dac 100644
>> --- a/arch/x86/include/asm/kvm_page_track.h
>> +++ b/arch/x86/include/asm/kvm_page_track.h
>> @@ -41,8 +41,16 @@ int kvm_page_track_create_memslot(struct
>> kvm_memory_slot *slot,
>> void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
>> struct kvm_memory_slot *dont);
>> +void
>> +kvm_slot_page_track_add_page_nolock(struct kvm *kvm,
>> + struct kvm_memory_slot *slot, gfn_t gfn,
>> + enum kvm_page_track_mode mode);
>> void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
>> enum kvm_page_track_mode mode);
>> +void kvm_slot_page_track_remove_page_nolock(struct kvm *kvm,
>> + struct kvm_memory_slot *slot,
>> + gfn_t gfn,
>> + enum kvm_page_track_mode mode);
>> void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
>> enum kvm_page_track_mode mode);
>> bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index b23f9fc..5a2ca73 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -806,11 +806,17 @@ static void account_shadowed(struct kvm *kvm,
>> struct kvm_mmu_page *sp)
>> struct kvm_memory_slot *slot;
>> gfn_t gfn;
>> + kvm->arch.indirect_shadow_pages++;
>> gfn = sp->gfn;
>> slots = kvm_memslots_for_spte_role(kvm, sp->role);
>> slot = __gfn_to_memslot(slots, gfn);
>> +
>> + /* the non-leaf shadow pages are keeping readonly. */
>> + if (sp->role.level > PT_PAGE_TABLE_LEVEL)
>> + return kvm_slot_page_track_add_page_nolock(kvm, slot, gfn,
>> + KVM_PAGE_TRACK_WRITE);
>> +
>> kvm_mmu_gfn_disallow_lpage(slot, gfn);
>> - kvm->arch.indirect_shadow_pages++;
>> }
>> static void unaccount_shadowed(struct kvm *kvm, struct
>> kvm_mmu_page *sp)
>> @@ -819,11 +825,15 @@ static void unaccount_shadowed(struct kvm *kvm,
>> struct kvm_mmu_page *sp)
>> struct kvm_memory_slot *slot;
>> gfn_t gfn;
>> + kvm->arch.indirect_shadow_pages--;
>> gfn = sp->gfn;
>> slots = kvm_memslots_for_spte_role(kvm, sp->role);
>> slot = __gfn_to_memslot(slots, gfn);
>> + if (sp->role.level > PT_PAGE_TABLE_LEVEL)
>> + return kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn,
>> + KVM_PAGE_TRACK_WRITE);
>> +
>> kvm_mmu_gfn_allow_lpage(slot, gfn);
>> - kvm->arch.indirect_shadow_pages--;
>> }
>> static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
>> @@ -2140,12 +2150,18 @@ static struct kvm_mmu_page
>> *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>> hlist_add_head(&sp->hash_link,
>> &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
>> if (!direct) {
>> - if (rmap_write_protect(vcpu, gfn))
>> + /*
>> + * we should do write protection before syncing pages
>> + * otherwise the content of the synced shadow page may
>> + * be inconsistent with guest page table.
>> + */
>> + account_shadowed(vcpu->kvm, sp);
>> +
>> + if (level == PT_PAGE_TABLE_LEVEL &&
>> + rmap_write_protect(vcpu, gfn))
>> kvm_flush_remote_tlbs(vcpu->kvm);
> I think your modification is good, but I am a little bit confused
> here. In account_shadowed, if sp->role.level > PT_PAGE_TABLE_LEVEL,
> the sp->gfn is write protected, and this is reasonable. So why write
> protect the gfn of PT_PAGE_TABLE_LEVEL here?
>
>> if (level > PT_PAGE_TABLE_LEVEL && need_sync)
>> kvm_sync_pages(vcpu, gfn);
>> -
>> - account_shadowed(vcpu->kvm, sp);
>> }
>> sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
>> init_shadow_page_table(sp);
>> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
>> index 84420df..87554d3 100644
>> --- a/arch/x86/kvm/page_track.c
>> +++ b/arch/x86/kvm/page_track.c
>> @@ -77,6 +77,26 @@ static void update_gfn_track(struct
>> kvm_memory_slot *slot, gfn_t gfn,
>> WARN_ON(val < 0);
>> }
>> +void
>> +kvm_slot_page_track_add_page_nolock(struct kvm *kvm,
>> + struct kvm_memory_slot *slot, gfn_t gfn,
>> + enum kvm_page_track_mode mode)
>> +{
>> + WARN_ON(!check_mode(mode));
>> +
>> + update_gfn_track(slot, gfn, mode, 1);
>> +
>> + /*
>> + * new track stops large page mapping for the
>> + * tracked page.
>> + */
>> + kvm_mmu_gfn_disallow_lpage(slot, gfn);
>> +
>> + if (mode == KVM_PAGE_TRACK_WRITE)
>> + if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
>> + kvm_flush_remote_tlbs(kvm);
>> +}
>> +
>> /*
>> * add guest page to the tracking pool so that corresponding access
>> on that
>> * page will be intercepted.
>> @@ -101,21 +121,27 @@ void kvm_page_track_add_page(struct kvm *kvm,
>> gfn_t gfn,
>> slot = __gfn_to_memslot(slots, gfn);
>> spin_lock(&kvm->mmu_lock);
>> - update_gfn_track(slot, gfn, mode, 1);
>> -
>> - /*
>> - * new track stops large page mapping for the
>> - * tracked page.
>> - */
>> - kvm_mmu_gfn_disallow_lpage(slot, gfn);
>> -
>> - if (mode == KVM_PAGE_TRACK_WRITE)
>> - if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
>> - kvm_flush_remote_tlbs(kvm);
>> + kvm_slot_page_track_add_page_nolock(kvm, slot, gfn, mode);
>> spin_unlock(&kvm->mmu_lock);
>> }
>> }
>> +void kvm_slot_page_track_remove_page_nolock(struct kvm *kvm,
>> + struct kvm_memory_slot *slot,
>> + gfn_t gfn,
>> + enum kvm_page_track_mode mode)
>> +{
>> + WARN_ON(!check_mode(mode));
>> +
>> + update_gfn_track(slot, gfn, mode, -1);
>> +
>> + /*
>> + * allow large page mapping for the tracked page
>> + * after the tracker is gone.
>> + */
>> + kvm_mmu_gfn_allow_lpage(slot, gfn);
>> +}
>> +
>> /*
>> * remove the guest page from the tracking pool which stops the
>> interception
>> * of corresponding access on that page. It is the opposed
>> operation of
>> @@ -134,20 +160,12 @@ void kvm_page_track_remove_page(struct kvm
>> *kvm, gfn_t gfn,
>> struct kvm_memory_slot *slot;
>> int i;
>> - WARN_ON(!check_mode(mode));
>> -
>> for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>> slots = __kvm_memslots(kvm, i);
>> slot = __gfn_to_memslot(slots, gfn);
>> spin_lock(&kvm->mmu_lock);
>> - update_gfn_track(slot, gfn, mode, -1);
>> -
>> - /*
>> - * allow large page mapping for the tracked page
>> - * after the tracker is gone.
>> - */
>> - kvm_mmu_gfn_allow_lpage(slot, gfn);
>> + kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn, mode);
> It looks like you need to merge this part with patch 1, as you are
> modifying kvm_page_track_{add,remove}_page here, which are introduced
> in your patch 1.
Should be patch 5. Sorry again.

Thanks,
-Kai
>
> Thanks,
> -Kai
>> spin_unlock(&kvm->mmu_lock);
>> }
>> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2015-12-15 08:15:25

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 06/11] KVM: MMU: let page fault handler be aware tracked page



On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
> The page fault caused by write access on the write tracked page cannot
> be fixed; it always needs to be emulated. page_fault_handle_page_track()
> is the fast path we introduce here to skip holding the mmu-lock and the
> shadow page table walking
Why can it be outside of the mmu-lock? Is it OK if some other thread
removes tracking of this page simultaneously? Should we assume the
emulation code handles this case?

Even if it works for write protection, is it OK for other tracking
purposes (as your tracking framework can be extended for other purposes)?
>
> However, if the page table is not present, it is worth making the page
> table entry present and readonly to make the read access happy
It's fine for tracking writes from the guest. But what if I want to
track every read from the guest? Probably I am exaggerating :)

Thanks,
-Kai
>
> mmu_need_write_protect() needs to be cooked to avoid the page becoming writable
> when making the page table present or when syncing/prefetching shadow page table entries
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_page_track.h | 2 ++
> arch/x86/kvm/mmu.c | 44 +++++++++++++++++++++++++++++------
> arch/x86/kvm/page_track.c | 14 +++++++++++
> arch/x86/kvm/paging_tmpl.h | 3 +++
> 4 files changed, 56 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 9cc17c6..f223201 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -15,4 +15,6 @@ void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
> enum kvm_page_track_mode mode);
> void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
> enum kvm_page_track_mode mode);
> +bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
> + enum kvm_page_track_mode mode);
> #endif
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 39809b8..b23f9fc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -41,6 +41,7 @@
> #include <asm/cmpxchg.h>
> #include <asm/io.h>
> #include <asm/vmx.h>
> +#include <asm/kvm_page_track.h>
>
> /*
> * When setting this variable to true it enables Two-Dimensional-Paging
> @@ -2456,25 +2457,29 @@ static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
> }
> }
>
> -static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
> - bool can_unsync)
> +static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
> + bool can_unsync)
> {
> struct kvm_mmu_page *s;
> bool need_unsync = false;
>
> + if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
> + return true;
> +
> for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
> if (!can_unsync)
> - return 1;
> + return true;
>
> if (s->role.level != PT_PAGE_TABLE_LEVEL)
> - return 1;
> + return true;
>
> if (!s->unsync)
> need_unsync = true;
> }
> if (need_unsync)
> kvm_unsync_pages(vcpu, gfn);
> - return 0;
> +
> + return false;
> }
>
> static bool kvm_is_mmio_pfn(pfn_t pfn)
> @@ -3388,10 +3393,30 @@ int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> }
> EXPORT_SYMBOL_GPL(handle_mmio_page_fault);
>
> +static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
> + u32 error_code, gfn_t gfn)
> +{
> + if (unlikely(error_code & PFERR_RSVD_MASK))
> + return false;
> +
> + if (!(error_code & PFERR_PRESENT_MASK) ||
> + !(error_code & PFERR_WRITE_MASK))
> + return false;
> +
> + /*
> + * guest is writing the page which is write tracked which can
> + * not be fixed by page fault handler.
> + */
> + if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
> + return true;
> +
> + return false;
> +}
> +
> static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
> u32 error_code, bool prefault)
> {
> - gfn_t gfn;
> + gfn_t gfn = gva >> PAGE_SHIFT;
> int r;
>
> pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
> @@ -3403,13 +3428,15 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
> return r;
> }
>
> + if (page_fault_handle_page_track(vcpu, error_code, gfn))
> + return 1;
> +
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> return r;
>
> MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
>
> - gfn = gva >> PAGE_SHIFT;
>
> return nonpaging_map(vcpu, gva & PAGE_MASK,
> error_code, gfn, prefault);
> @@ -3493,6 +3520,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
> return r;
> }
>
> + if (page_fault_handle_page_track(vcpu, error_code, gfn))
> + return 1;
> +
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> return r;
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> index ad510db..dc2da12 100644
> --- a/arch/x86/kvm/page_track.c
> +++ b/arch/x86/kvm/page_track.c
> @@ -151,3 +151,17 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
> spin_unlock(&kvm->mmu_lock);
> }
> }
> +
> +/*
> + * check if the corresponding access on the specified guest page is tracked.
> + */
> +bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
> + enum kvm_page_track_mode mode)
> +{
> + struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
> + int index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
> +
> + WARN_ON(!check_mode(mode));
> +
> + return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
> +}
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 91e939b..ac85682 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -735,6 +735,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
> return 0;
> }
>
> + if (page_fault_handle_page_track(vcpu, error_code, walker.gfn))
> + return 1;
> +
> vcpu->arch.write_fault_to_shadow_pgtable = false;
>
> is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,

2015-12-15 08:47:33

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 09/11] KVM: MMU: simplify mmu_need_write_protect



On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
> Now all non-leaf shadow pages are page tracked; if a gfn is not
> tracked, no non-leaf shadow page for that gfn exists, and we can
> directly make the shadow page of the gfn unsync
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 26 ++++++++------------------
> 1 file changed, 8 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 5a2ca73..f89e77f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2461,41 +2461,31 @@ static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> kvm_mmu_mark_parents_unsync(sp);
> }
>
> -static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
> +static bool kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
> + bool can_unsync)
> {
> struct kvm_mmu_page *s;
>
> for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
> + if (!can_unsync)
> + return true;
How about moving this right before for_each_gfn_indirect_valid_sp? As
can_unsync is passed as a parameter, there's no point in checking it
several times.

A further thought: can we move it to mmu_need_write_protect? Passing
can_unsync as a parameter to kvm_unsync_pages sounds a little bit odd.

> +
> if (s->unsync)
> continue;
> WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
What about large page mappings? For example, if the guest uses a 2M
mapping and its shadow is indirect, does the above WARN_ON still hold?
You removed the PT level check in mmu_need_write_protect.

Thanks,
-Kai
> __kvm_unsync_page(vcpu, s);
> }
> +
> + return false;
> }

>
> static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
> bool can_unsync)
> {
> - struct kvm_mmu_page *s;
> - bool need_unsync = false;
> -
> if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
> return true;
>
> - for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
> - if (!can_unsync)
> - return true;
> -
> - if (s->role.level != PT_PAGE_TABLE_LEVEL)
> - return true;
> -
> - if (!s->unsync)
> - need_unsync = true;
> - }
> - if (need_unsync)
> - kvm_unsync_pages(vcpu, gfn);
> -
> - return false;
> + return kvm_unsync_pages(vcpu, gfn, can_unsync);
> }
>
> static bool kvm_is_mmio_pfn(pfn_t pfn)

2015-12-15 08:53:17

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 04/11] KVM: page track: add the framework of guest page tracking



On 12/15/2015 03:06 PM, Kai Huang wrote:
> Hi Guangrong,
>
> I am starting to review this series and should have some comments or
> questions; you can determine whether they are valuable :)

Thank you very much for your review and for breaking the silence on this patchset. ;)


>> +static void page_track_slot_free(struct kvm_memory_slot *slot)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
>> + if (slot->arch.gfn_track[i]) {
>> + kvfree(slot->arch.gfn_track[i]);
>> + slot->arch.gfn_track[i] = NULL;
>> + }
>> +}
>> +
>> +int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
>> + unsigned long npages)
>> +{
>> + int i, pages = gfn_to_index(slot->base_gfn + npages - 1,
>> + slot->base_gfn, PT_PAGE_TABLE_LEVEL) + 1;
>> +
>> + for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
>> + slot->arch.gfn_track[i] = kvm_kvzalloc(pages *
>> + sizeof(*slot->arch.gfn_track[i]));
>> + if (!slot->arch.gfn_track[i])
>> + goto track_free;
>> + }
>> +
>> + return 0;
>> +
>> +track_free:
>> + page_track_slot_free(slot);
>> + return -ENOMEM;
>> +}
> Is it necessary to use the 'unsigned long npages' parameter? In my understanding you are going to

The type 'int' is used here because I followed the style of 'struct
kvm_lpage_info'.

4 bytes should be enough to track all users, and a signed type is good
for catching overflow.

> track all GFNs in the memory slot anyway, right? If you want to keep npages, I think it's better to
> add a base_gfn as well; otherwise you are assuming you are going to track the npages GFNs starting
> from slot->base_gfn.

Yes, any page in the memslot may be tracked, so there is an index for
every page; see the sketch below.
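
For reference, the per-page index is computed relative to
slot->base_gfn; at PT_PAGE_TABLE_LEVEL the existing helper reduces to a
plain offset. A sketch of gfn_to_index() as it exists today (from
memory, so treat it as illustrative):

static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
{
	/* KVM_HPAGE_GFN_SHIFT(PT_PAGE_TABLE_LEVEL) must be 0 */
	return (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
	       (base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
}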

>
>> +
>> +void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
>> + struct kvm_memory_slot *dont)
>> +{
>> + if (!dont || free->arch.gfn_track != dont->arch.gfn_track)
>> + page_track_slot_free(free);
>> +}
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c04987e..ad4888a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -7838,6 +7838,8 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
>> free->arch.lpage_info[i - 1] = NULL;
>> }
>> }
>> +
>> + kvm_page_track_free_memslot(free, dont);
>> }
>> int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
>> @@ -7886,6 +7888,9 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
>> }
>> }
>> + if (kvm_page_track_create_memslot(slot, npages))
>> + goto out_free;
>> +
> It looks like you are essentially allocating one int for every GFN of the slot unconditionally.
> In my understanding most memory slots are not going to be tracked, so isn't this wasteful of memory?
>

Yes, hmm... maybe we can make the index an "unsigned short"; then 1G of
memory only needs a 512k index buffer (see the arithmetic below). That
is not so unacceptable.
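
The arithmetic behind that estimate, assuming 4K pages and one counter
per page per tracking mode:

	1G / 4K                = 262144 pages per mode
	262144 * sizeof(int)   = 1M of index buffer
	262144 * sizeof(short) = 512K of index buffer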

2015-12-15 08:51:43

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 09/11] KVM: MMU: simplify mmu_need_write_protect



On 12/15/2015 04:43 PM, Kai Huang wrote:
>
>
> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>> Now all non-leaf shadow pages are page tracked; if a gfn is not
>> tracked, no non-leaf shadow page for that gfn exists, and we can
>> directly make the shadow page of the gfn unsync
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 26 ++++++++------------------
>> 1 file changed, 8 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 5a2ca73..f89e77f 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2461,41 +2461,31 @@ static void __kvm_unsync_page(struct kvm_vcpu
>> *vcpu, struct kvm_mmu_page *sp)
>> kvm_mmu_mark_parents_unsync(sp);
>> }
>> -static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
>> +static bool kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
>> + bool can_unsync)
>> {
>> struct kvm_mmu_page *s;
>> for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
>> + if (!can_unsync)
>> + return true;
> How about moving this right before for_each_gfn_indirect_valid_sp? As
> can_unsync is passed as a parameter, there's no point in checking it
> several times.
>
> A further thought: can we move it to mmu_need_write_protect? Passing
> can_unsync as a parameter to kvm_unsync_pages sounds a little bit odd.
>
>> +
>> if (s->unsync)
>> continue;
>> WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
> What about large page mappings? For example, if the guest uses a 2M
> mapping and its shadow is indirect, does the above WARN_ON still hold?
> You removed the PT level check in mmu_need_write_protect.
>
> Thanks,
> -Kai
Btw I also think this patch can be merged with patch 6.

Thanks,
-Kai
>> __kvm_unsync_page(vcpu, s);
>> }
>> +
>> + return false;
>> }
>
>> static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>> bool can_unsync)
>> {
>> - struct kvm_mmu_page *s;
>> - bool need_unsync = false;
>> -
>> if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
>> return true;
>> - for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
>> - if (!can_unsync)
>> - return true;
>> -
>> - if (s->role.level != PT_PAGE_TABLE_LEVEL)
>> - return true;
>> -
>> - if (!s->unsync)
>> - need_unsync = true;
>> - }
>> - if (need_unsync)
>> - kvm_unsync_pages(vcpu, gfn);
>> -
>> - return false;
>> + return kvm_unsync_pages(vcpu, gfn, can_unsync);
>> }
>> static bool kvm_is_mmio_pfn(pfn_t pfn)
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2015-12-15 09:10:30

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 06/11] KVM: MMU: let page fault handler be aware tracked page



On 12/15/2015 04:11 PM, Kai Huang wrote:
>
>
> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>> The page fault caused by write access on the write tracked page cannot
>> be fixed; it always needs to be emulated. page_fault_handle_page_track()
>> is the fast path we introduce here to skip holding the mmu-lock and the
>> shadow page table walking
> Why can it be outside of the mmu-lock? Is it OK if some other thread removes tracking of this page
> simultaneously? Should we assume the emulation code handles this case?
>

What you mentioned is the worst case: if that happens, the vcpu will
spend a longer time emulating the access rather than retrying it. That
is bad, but the contention is rare. It is worth speeding up the common
case.

> Even if it works for write protection, is it OK for other tracking purposes (as your tracking
> framework can be extended for other purposes)?

We only need to make sure that no track event is lost, i.e., we cannot
miss the case where the count changes from 0 to 1.

If we see the count is 0, the vcpu holds the mmu-lock and goes to the
slow path anyway, so no track event is lost; see the sketch below.
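
A minimal sketch of that lockless check, modeled on
kvm_page_track_check_mode() from patch 6 (the wrapper name here is
illustrative):

static bool fast_path_is_tracked(struct kvm_memory_slot *slot, gfn_t gfn,
				 enum kvm_page_track_mode mode)
{
	int index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);

	/*
	 * Racy read outside mmu_lock: a stale 0 sends the vcpu to the
	 * slow path, which re-checks under mmu_lock, so no track event
	 * is lost.  A stale non-zero value only costs an unnecessary
	 * emulation, the rare worst case mentioned above.
	 */
	return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
}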

>>
>> However, if the page table is not present, it is worth making the page
>> table entry present and readonly to make the read access happy
> It's fine for tracking writes from the guest. But what if I want to track every read from the guest?
> Probably I am exaggerating :)
>

Then we would not go to the real page fault handler; we would just keep
the shadow page entry non-present.

2015-12-15 09:17:02

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 08/11] KVM: MMU: use page track for non-leaf shadow pages



On 12/15/2015 03:52 PM, Kai Huang wrote:

>> static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
>> @@ -2140,12 +2150,18 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>> hlist_add_head(&sp->hash_link,
>> &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
>> if (!direct) {
>> - if (rmap_write_protect(vcpu, gfn))
>> + /*
>> + * we should do write protection before syncing pages
>> + * otherwise the content of the synced shadow page may
>> + * be inconsistent with guest page table.
>> + */
>> + account_shadowed(vcpu->kvm, sp);
>> +
>> + if (level == PT_PAGE_TABLE_LEVEL &&
>> + rmap_write_protect(vcpu, gfn))
>> kvm_flush_remote_tlbs(vcpu->kvm);
> I think your modification is good, but I am a little bit confused here. In account_shadowed, if
> sp->role.level > PT_PAGE_TABLE_LEVEL, the sp->gfn is write protected, and this is reasonable. So why
> write protect the gfn of PT_PAGE_TABLE_LEVEL here?

Because the shadow page will become 'sync', meaning it will be
synchronized with the page table in the guest. The shadow page therefore
needs to be write-protected so that the guest page table cannot change
while we are syncing the shadow page table; see kvm_sync_pages() being
called after rmap_write_protect().

>> /*
>> * remove the guest page from the tracking pool which stops the interception
>> * of corresponding access on that page. It is the opposed operation of
>> @@ -134,20 +160,12 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
>> struct kvm_memory_slot *slot;
>> int i;
>> - WARN_ON(!check_mode(mode));
>> -
>> for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>> slots = __kvm_memslots(kvm, i);
>> slot = __gfn_to_memslot(slots, gfn);
>> spin_lock(&kvm->mmu_lock);
>> - update_gfn_track(slot, gfn, mode, -1);
>> -
>> - /*
>> - * allow large page mapping for the tracked page
>> - * after the tracker is gone.
>> - */
>> - kvm_mmu_gfn_allow_lpage(slot, gfn);
>> + kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn, mode);
> It looks like you need to merge this part with patch 1, as you are modifying
> kvm_page_track_{add,remove}_page here, which are introduced in your patch 1.

Indeed, it is better.

2015-12-15 09:35:56

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 09/11] KVM: MMU: simplify mmu_need_write_protect



On 12/15/2015 04:43 PM, Kai Huang wrote:
>
>
> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>> Now all non-leaf shadow pages are page tracked; if a gfn is not
>> tracked, no non-leaf shadow page for that gfn exists, and we can
>> directly make the shadow page of the gfn unsync
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 26 ++++++++------------------
>> 1 file changed, 8 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 5a2ca73..f89e77f 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2461,41 +2461,31 @@ static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>> kvm_mmu_mark_parents_unsync(sp);
>> }
>> -static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
>> +static bool kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
>> + bool can_unsync)
>> {
>> struct kvm_mmu_page *s;
>> for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
>> + if (!can_unsync)
>> + return true;
> How about moving this right before for_each_gfn_indirect_valid_sp? As can_unsync is passed as
> a parameter, there's no point in checking it several times.
>

We cannot do this. What we are doing here is checking whether we have a
shadow page mapping for 'gfn':
a) if not, the page can be made writable;
b) if so, check 'can_unsync' to see if these shadow pages can be made
'unsync'.

Your suggestion would break point a); see the annotated sketch below.
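
An annotated form of the loop from this patch makes the two cases
explicit (comments added, code otherwise as posted):

	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
		/* b) a shadow page maps gfn: honour can_unsync */
		if (!can_unsync)
			return true;	/* caller must write protect */

		if (s->unsync)
			continue;
		WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
		__kvm_unsync_page(vcpu, s);
	}

	/* a) loop body never ran: no shadow page maps gfn, writable is OK */
	return false;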

> A further thought: can we move it to mmu_need_write_protect? Passing can_unsync as a parameter to
> kvm_unsync_pages sounds a little bit odd.
>
>> +
>> if (s->unsync)
>> continue;
>> WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
> What about large page mappings? For example, if the guest uses a 2M mapping and its shadow is
> indirect, does the above WARN_ON still hold? You removed the PT level check in mmu_need_write_protect.

The larger mappings are on the non-leaf shadow pages, which are caught
by kvm_page_track_check_mode() before we call this function.

2015-12-15 09:37:36

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 09/11] KVM: MMU: simplify mmu_need_write_protect



On 12/15/2015 04:47 PM, Kai Huang wrote:

>> A further thought: can we move it to mmu_need_write_protect? Passing can_unsync as a parameter to
>> kvm_unsync_pages sounds a little bit odd.
>>
>>> +
>>> if (s->unsync)
>>> continue;
>>> WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
>> What about large page mappings? For example, if the guest uses a 2M mapping and its shadow is
>> indirect, does the above WARN_ON still hold? You removed the PT level check in mmu_need_write_protect.
>>
>> Thanks,
>> -Kai
> Btw I also think this patch can be merged with patch 6.

We cannot, as it depends on patch 8. ;)

2015-12-16 05:54:25

by Jike Song

[permalink] [raw]
Subject: Re: [PATCH 07/11] KVM: page track: add notifier support

On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
> A notifier list is introduced so that any node that wants to receive
> track events can register on the list
>
> Two APIs are introduced here:
> - kvm_page_track_register_notifier(): register the notifier to receive
> track event
>
> - kvm_page_track_unregister_notifier(): stop receiving track events by
> unregistering the notifier
>
> The callback node->track_write() is called when a write access on the
> write-tracked page happens
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/include/asm/kvm_page_track.h | 39 ++++++++++++++++++++
> arch/x86/kvm/page_track.c | 67 +++++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 4 +++
> 4 files changed, 111 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index afff1f1..0f7b940 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -658,6 +658,7 @@ struct kvm_arch {
> */
> struct list_head active_mmu_pages;
> struct list_head zapped_obsolete_pages;
> + struct kvm_page_track_notifier_head track_notifier_head;
>
> struct list_head assigned_dev_head;
> struct iommu_domain *iommu_domain;
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index f223201..6744234 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -6,6 +6,36 @@ enum kvm_page_track_mode {
> KVM_PAGE_TRACK_MAX,
> };
>
> +/*
> + * The notifier represented by @kvm_page_track_notifier_node is linked into
> + * the head which will be notified when guest is triggering the track event.
> + *
> + * Write access on the head is protected by kvm->mmu_lock, read access
> + * is protected by track_srcu.
> + */
> +struct kvm_page_track_notifier_head {
> + struct srcu_struct track_srcu;
> + struct hlist_head track_notifier_list;
> +};
> +
> +struct kvm_page_track_notifier_node {
> + struct hlist_node node;
> +
> + /*
> + * It is called when guest is writing the write-tracked page
> + * and write emulation is finished at that time.
> + *
> + * @vcpu: the vcpu where the write access happened.
> + * @gpa: the physical address written by guest.
> + * @new: the data was written to the address.
> + * @bytes: the written length.
> + */
> + void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> + int bytes);

Sir, is it possible to make this non-void? As you described below, the
callback may find this gpa isn't the page being tracked, so it probably
wants to return something to indicate: not my business, continue :)
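
For example (purely illustrative, not part of this patch):

	/* return false for "not my page, please continue" */
	bool (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa,
			    const u8 *new, int bytes);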

> +};
> +
> +void kvm_page_track_init(struct kvm *kvm);
> +
> int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
> unsigned long npages);
> void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
> @@ -17,4 +47,13 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
> enum kvm_page_track_mode mode);
> bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
> enum kvm_page_track_mode mode);
> +
> +void
> +kvm_page_track_register_notifier(struct kvm *kvm,
> + struct kvm_page_track_notifier_node *n);
> +void
> +kvm_page_track_unregister_notifier(struct kvm *kvm,
> + struct kvm_page_track_notifier_node *n);
> +void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> + int bytes);
> #endif
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> index dc2da12..84420df 100644
> --- a/arch/x86/kvm/page_track.c
> +++ b/arch/x86/kvm/page_track.c
> @@ -165,3 +165,70 @@ bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
>
> return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
> }
> +
> +void kvm_page_track_init(struct kvm *kvm)
> +{
> + struct kvm_page_track_notifier_head *head;
> +
> + head = &kvm->arch.track_notifier_head;
> + init_srcu_struct(&head->track_srcu);
> + INIT_HLIST_HEAD(&head->track_notifier_list);
> +}
> +
> +/*
> + * register the notifier so that event interception for the tracked guest
> + * pages can be received.
> + */
> +void
> +kvm_page_track_register_notifier(struct kvm *kvm,
> + struct kvm_page_track_notifier_node *n)
> +{
> + struct kvm_page_track_notifier_head *head;
> +
> + head = &kvm->arch.track_notifier_head;
> +
> + spin_lock(&kvm->mmu_lock);
> + hlist_add_head_rcu(&n->node, &head->track_notifier_list);
> + spin_unlock(&kvm->mmu_lock);
> +}
> +
> +/*
> + * stop receiving the event interception. It is the opposed operation of
> + * kvm_page_track_register_notifier().
> + */
> +void
> +kvm_page_track_unregister_notifier(struct kvm *kvm,
> + struct kvm_page_track_notifier_node *n)
> +{
> + struct kvm_page_track_notifier_head *head;
> +
> + head = &kvm->arch.track_notifier_head;
> +
> + spin_lock(&kvm->mmu_lock);
> + hlist_del_rcu(&n->node);
> + spin_unlock(&kvm->mmu_lock);
> + synchronize_srcu(&head->track_srcu);
> +}
> +
> +/*
> + * Notify the node that write access is intercepted and write emulation is
> + * finished at this time.
> + *
> + * The node should figure out if the written page is the one that node is
> + * interested in by itself.
> + */
> +void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> + int bytes)
> +{
> + struct kvm_page_track_notifier_head *head;
> + struct kvm_page_track_notifier_node *n;
> + int idx;
> +
> + head = &vcpu->kvm->arch.track_notifier_head;
> +
> + idx = srcu_read_lock(&head->track_srcu);
> + hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
> + if (n->track_write)
> + n->track_write(vcpu, gpa, new, bytes);
> + srcu_read_unlock(&head->track_srcu, idx);
> +}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ad4888a..64dbc69 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4328,6 +4328,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
> if (ret < 0)
> return 0;
> kvm_mmu_pte_write(vcpu, gpa, val, bytes);
> + kvm_page_track_write(vcpu, gpa, val, bytes);
> return 1;
> }
>
> @@ -4586,6 +4587,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
>
> kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
> kvm_mmu_pte_write(vcpu, gpa, new, bytes);
> + kvm_page_track_write(vcpu, gpa, new, bytes);
>
> return X86EMUL_CONTINUE;
>
> @@ -7691,6 +7693,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
> INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
>
> + kvm_page_track_init(kvm);
> +
> return 0;
> }
>
>
--
Thanks,
Jike

2015-12-16 06:34:01

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 07/11] KVM: page track: add notifier support



On 12/16/2015 01:53 PM, Jike Song wrote:
> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>> A notifier list is introduced so that any node that wants to receive the
>> track event can register itself on the list
>>
>> Two APIs are introduced here:
>> - kvm_page_track_register_notifier(): register the notifier to receive
>> track events
>>
>> - kvm_page_track_unregister_notifier(): stop receiving track events by
>> unregistering the notifier
>>
>> The callback node->track_write() is called when a write access on a
>> write-tracked page happens
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/include/asm/kvm_page_track.h | 39 ++++++++++++++++++++
>> arch/x86/kvm/page_track.c | 67 +++++++++++++++++++++++++++++++++++
>> arch/x86/kvm/x86.c | 4 +++
>> 4 files changed, 111 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index afff1f1..0f7b940 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -658,6 +658,7 @@ struct kvm_arch {
>> */
>> struct list_head active_mmu_pages;
>> struct list_head zapped_obsolete_pages;
>> + struct kvm_page_track_notifier_head track_notifier_head;
>>
>> struct list_head assigned_dev_head;
>> struct iommu_domain *iommu_domain;
>> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
>> index f223201..6744234 100644
>> --- a/arch/x86/include/asm/kvm_page_track.h
>> +++ b/arch/x86/include/asm/kvm_page_track.h
>> @@ -6,6 +6,36 @@ enum kvm_page_track_mode {
>> KVM_PAGE_TRACK_MAX,
>> };
>>
>> +/*
>> + * The notifier represented by @kvm_page_track_notifier_node is linked into
>> + * the head which will be notified when guest is triggering the track event.
>> + *
>> + * Write access on the head is protected by kvm->mmu_lock, read access
>> + * is protected by track_srcu.
>> + */
>> +struct kvm_page_track_notifier_head {
>> + struct srcu_struct track_srcu;
>> + struct hlist_head track_notifier_list;
>> +};
>> +
>> +struct kvm_page_track_notifier_node {
>> + struct hlist_node node;
>> +
>> + /*
>> + * It is called when guest is writing the write-tracked page
>> + * and write emulation is finished at that time.
>> + *
>> + * @vcpu: the vcpu where the write access happened.
>> + * @gpa: the physical address written by guest.
>> + * @new: the data was written to the address.
>> + * @bytes: the written length.
>> + */
>> + void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>> + int bytes);
>
> Sir, is it possible to make this non-void? As you described below, the
> callback may find this gpa isn't the page being tracked, so it probably
> wants to return something to indicate: not my business, continue :)

Currently the return value would be unused, and as this is not an ABI we
can extend it if it is needed in the future.

2015-12-16 07:36:14

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 06/11] KVM: MMU: let page fault handler be aware tracked page



On 12/15/2015 05:03 PM, Xiao Guangrong wrote:
>
>
> On 12/15/2015 04:11 PM, Kai Huang wrote:
>>
>>
>> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>>> A page fault caused by write access on a write-tracked page can not
>>> be fixed, it always needs to be emulated. page_fault_handle_page_track()
>>> is the fast path we introduce here to skip holding mmu-lock and shadow
>>> page table walking
>> Why can it be outside of mmu-lock? Is it OK that some other thread
>> removes tracking of this page simultaneously? Shall we assume the
>> emulation code should handle this case?
>>
>
> What you mentioned is the worst case; if that happens the vcpu will spend
> a longer time emulating the access rather than retrying it. That is bad,
> but such contention is rare. It is worth speeding up the common case.
My concern is whether, when this case happens, emulating the access is
still the right behavior, you know, after another thread has removed the
GFN from tracking.
And as the notifier's track_write callback will be called in the
emulating code, won't there be a problem if the GFN has been removed from
tracking by another thread?

Thanks,
-Kai
>
>> Even if it works for write protection, is it OK for other kinds of
>> tracking (as your tracking framework
>> can be extended for other purposes)?
>
> We only need to make sure that no track event is lost, i.e., we can not
> miss the case where the index changes from 0 to 1.
>
> If we see the index is 0, the vcpu can hold the mmu-lock and go to the
> slow path anyway, so no track event will be lost.
>
>>>
>>> However, if the page table is not present, it is worth making the page
>>> table entry present and readonly to make the read access happy
>> It's fine for tracking writes from the guest. But what if I want to
>> track every read from the guest?
>> Probably I am exaggerating :)
>>
>
> Then we do not go to the real page fault handler, just keep the shadow
> page entry non-present.

2015-12-16 07:38:41

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 04/11] KVM: page track: add the framework of guest page tracking



On 12/15/2015 04:46 PM, Xiao Guangrong wrote:
>
>
> On 12/15/2015 03:06 PM, Kai Huang wrote:
>> Hi Guangrong,
>>
>> I am starting to review this series, and should have some comments or
>> questions; you can decide whether they are valuable :)
>
> Thank you very much for your review and for breaking the silence on this
> patchset. ;)
>
>
>>> +static void page_track_slot_free(struct kvm_memory_slot *slot)
>>> +{
>>> + int i;
>>> +
>>> + for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
>>> + if (slot->arch.gfn_track[i]) {
>>> + kvfree(slot->arch.gfn_track[i]);
>>> + slot->arch.gfn_track[i] = NULL;
>>> + }
>>> +}
>>> +
>>> +int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
>>> + unsigned long npages)
>>> +{
>>> + int i, pages = gfn_to_index(slot->base_gfn + npages - 1,
>>> + slot->base_gfn, PT_PAGE_TABLE_LEVEL) + 1;
>>> +
>>> + for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
>>> + slot->arch.gfn_track[i] = kvm_kvzalloc(pages *
>>> + sizeof(*slot->arch.gfn_track[i]));
>>> + if (!slot->arch.gfn_track[i])
>>> + goto track_free;
>>> + }
>>> +
>>> + return 0;
>>> +
>>> +track_free:
>>> + page_track_slot_free(slot);
>>> + return -ENOMEM;
>>> +}
>> Is it necessary to use the 'unsigned long npages' parameter? In my
>> understanding you are going to
>
> The type, 'int', is used here as I followed the style of 'struct
> kvm_lpage_info'.
>
> 4 bytes should be enough to track all users, and a signed type is good
> for catching overflow.
>
>> track all GFNs in the memory slot anyway, right? If you want to keep
>> npages, I think it's better to add a base_gfn as well; otherwise you are
>> assuming you are going to track the npages GFNs starting from
>> slot->base_gfn.
>
> Yes, any page in the memslot may be tracked, so there is an index for
> every page.
>
>>
>>> +
>>> +void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
>>> + struct kvm_memory_slot *dont)
>>> +{
>>> + if (!dont || free->arch.gfn_track != dont->arch.gfn_track)
>>> + page_track_slot_free(free);
>>> +}
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index c04987e..ad4888a 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -7838,6 +7838,8 @@ void kvm_arch_free_memslot(struct kvm *kvm,
>>> struct kvm_memory_slot *free,
>>> free->arch.lpage_info[i - 1] = NULL;
>>> }
>>> }
>>> +
>>> + kvm_page_track_free_memslot(free, dont);
>>> }
>>> int kvm_arch_create_memslot(struct kvm *kvm, struct
>>> kvm_memory_slot *slot,
>>> @@ -7886,6 +7888,9 @@ int kvm_arch_create_memslot(struct kvm *kvm,
>>> struct kvm_memory_slot *slot,
>>> }
>>> }
>>> + if (kvm_page_track_create_memslot(slot, npages))
>>> + goto out_free;
>>> +
>> Looks like essentially you are allocating one int per GFN of the slot
>> unconditionally. In my understanding, for most memory slots we are not
>> going to track them, so isn't that going to be wasteful of memory?
>>
>
> Yes, hmm... maybe we can make the index an "unsigned short"; then 1G of
> memory (1G / 4K = 262144 gfns x 2 bytes) only needs a 512K index buffer.
> It is not so unacceptable.
Those comments are really minor, so don't bother about this :)

Thanks,
-Kai

2015-12-16 07:55:33

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 08/11] KVM: MMU: use page track for non-leaf shadow pages



On 12/15/2015 05:10 PM, Xiao Guangrong wrote:
>
>
> On 12/15/2015 03:52 PM, Kai Huang wrote:
>
>>> static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
>>> @@ -2140,12 +2150,18 @@ static struct kvm_mmu_page
>>> *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>> hlist_add_head(&sp->hash_link,
>>> &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
>>> if (!direct) {
>>> - if (rmap_write_protect(vcpu, gfn))
>>> + /*
>>> + * we should do write protection before syncing pages
>>> + * otherwise the content of the synced shadow page may
>>> + * be inconsistent with guest page table.
>>> + */
>>> + account_shadowed(vcpu->kvm, sp);
>>> +
>>> + if (level == PT_PAGE_TABLE_LEVEL &&
>>> + rmap_write_protect(vcpu, gfn))
>>> kvm_flush_remote_tlbs(vcpu->kvm);
>> I think your modification is good but I am a little bit confused here.
>> In account_shadowed, if sp->role.level > PT_PAGE_TABLE_LEVEL, the
>> sp->gfn is write protected, and this is reasonable. So why write
>> protect the gfn of PT_PAGE_TABLE_LEVEL here?
>
> Because the shadow page will become 'sync', which means the shadow page
> will be synced with the page table in the guest. So the gfn needs to be
> write-protected so that the guest page table is not changed while we do
> the 'sync' thing. See kvm_sync_pages() after doing
> rmap_write_protect().
I see. So why are you treating the PT_PAGE_TABLE_LEVEL gfn separately here?
Why can this not be done in account_shadowed, as you did for upper level sps?
Actually I am wondering whether account_shadowed is overdoing things.
From its name it should only *account* the shadow sp, but now it also does
write protection and disables large page mapping.

Thanks,
-Kai
>
>>> /*
>>> * remove the guest page from the tracking pool which stops the
>>> interception
>>> * of corresponding access on that page. It is the opposed
>>> operation of
>>> @@ -134,20 +160,12 @@ void kvm_page_track_remove_page(struct kvm
>>> *kvm, gfn_t gfn,
>>> struct kvm_memory_slot *slot;
>>> int i;
>>> - WARN_ON(!check_mode(mode));
>>> -
>>> for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>>> slots = __kvm_memslots(kvm, i);
>>> slot = __gfn_to_memslot(slots, gfn);
>>> spin_lock(&kvm->mmu_lock);
>>> - update_gfn_track(slot, gfn, mode, -1);
>>> -
>>> - /*
>>> - * allow large page mapping for the tracked page
>>> - * after the tracker is gone.
>>> - */
>>> - kvm_mmu_gfn_allow_lpage(slot, gfn);
>>> + kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn, mode);
>> Looks like you need to merge this part with patch 1, as you are modifying
>> kvm_page_track_{add,remove}_page here, which are introduced in your
>> patch 1.
>
> Indeed, it is better.
>
>

2015-12-16 08:10:19

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 09/11] KVM: MMU: simplify mmu_need_write_protect



On 12/15/2015 05:25 PM, Xiao Guangrong wrote:
>
>
> On 12/15/2015 04:43 PM, Kai Huang wrote:
>>
>>
>> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>>> Now that all non-leaf shadow pages are page tracked, if a gfn is not
>>> tracked then no non-leaf shadow page for that gfn exists, so we can
>>> directly make the shadow pages of the gfn unsync
>>>
>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>> ---
>>> arch/x86/kvm/mmu.c | 26 ++++++++------------------
>>> 1 file changed, 8 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 5a2ca73..f89e77f 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -2461,41 +2461,31 @@ static void __kvm_unsync_page(struct
>>> kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>>> kvm_mmu_mark_parents_unsync(sp);
>>> }
>>> -static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
>>> +static bool kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
>>> + bool can_unsync)
>>> {
>>> struct kvm_mmu_page *s;
>>> for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
>>> + if (!can_unsync)
>>> + return true;
>> How about moving this right before for_each_gfn_indirect_valid_sp? As
>> can_unsync is passed as a parameter, there's no point checking it
>> several times.
>>
>
> We can not do this. What we are doing here is checking if we have a
> shadow page mapping for 'gfn':
> a) if no, it can be made writable.
I think in this case you should also check whether the GFN is being
write-protection tracked. E.g., the spte may never have existed when you
added the GFN to write-protection tracking, and in this case I think
mmu_need_write_protect should also report true. Right?

> b) if yes, check 'can_unsync' to see if these shadow pages can be made
> 'unsync'.
>
> Your suggestion would break point a).
>
>> A further thought: can we move it to mmu_need_write_protect? Passing
>> can_unsync as a parameter to kvm_unsync_pages sounds a little bit odd.
>>
>>> +
>>> if (s->unsync)
>>> continue;
>>> WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
>> How about large page mappings? Say the guest uses a 2M mapping and its
>> shadow is indirect: does the above WARN_ON still hold? As you removed
>> the PT level check in mmu_need_write_protect.
>
> The larger mappings are on non-leaf shadow pages, which are caught by
> kvm_page_track_check_mode() before we call this function.
Actually I do not quite understand how large page mapping is
implemented. I see in kvm_mmu_get_page that when an sp is allocated,
large page mapping is disabled for it, but I think we do support large
shadow mappings, right? I mean, theoretically if the guest uses a 2M
mapping then the shadow mapping can certainly use a 2M mapping as well,
and the 2M shadow mapping can also be 'unsynced' (as a leaf mapping
table). But in your series I see that if we write protect some GFN, the
shadow large page mapping is always disabled.

Am I wrong?

Thanks,
-Kai
>
>

2015-12-16 08:30:21

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 06/11] KVM: MMU: let page fault handler be aware tracked page



On 12/16/2015 03:31 PM, Kai Huang wrote:
>
>
> On 12/15/2015 05:03 PM, Xiao Guangrong wrote:
>>
>>
>> On 12/15/2015 04:11 PM, Kai Huang wrote:
>>>
>>>
>>> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>>>> A page fault caused by write access on a write-tracked page can not
>>>> be fixed, it always needs to be emulated. page_fault_handle_page_track()
>>>> is the fast path we introduce here to skip holding mmu-lock and shadow
>>>> page table walking
>>> Why can it be outside of mmu-lock? Is it OK that some other thread removes tracking of this page
>>> simultaneously? Shall we assume the emulation code should handle this case?
>>>
>>
>> What you mentioned is the worst case; if that happens the vcpu will spend a
>> longer time emulating the access rather than retrying it. That is bad, but
>> such contention is rare. It is worth speeding up the common case.
> My concern is whether, when this case happens, emulating the access is still the right behavior,
> you know, after another thread has removed the GFN from tracking.
> And as the notifier's track_write callback will be called in the emulating code, won't there be a
> problem if the GFN has been removed from tracking by another thread?
>

When the notify callback is called, the tracker should check by itself whether
the gfn is one it is interested in. If it sees the gfn is not tracked anymore,
it can simply ignore the callback.
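
Something like this, for example (my_tracker_has_gfn() and
my_update_shadow() are made-up names, just to illustrate the idea):

static void my_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
			   const u8 *new, int bytes)
{
	gfn_t gfn = gpa >> PAGE_SHIFT;

	/*
	 * Another thread may have just removed this gfn from tracking;
	 * simply ignore writes we no longer care about.
	 */
	if (!my_tracker_has_gfn(vcpu->kvm, gfn))
		return;

	/* update the tracker's own shadow structure for this write */
	my_update_shadow(vcpu->kvm, gpa, new, bytes);
}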

2015-12-16 08:46:19

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 08/11] KVM: MMU: use page track for non-leaf shadow pages



On 12/16/2015 03:51 PM, Kai Huang wrote:
>
>
> On 12/15/2015 05:10 PM, Xiao Guangrong wrote:
>>
>>
>> On 12/15/2015 03:52 PM, Kai Huang wrote:
>>
>>>> static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
>>>> @@ -2140,12 +2150,18 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>> hlist_add_head(&sp->hash_link,
>>>> &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
>>>> if (!direct) {
>>>> - if (rmap_write_protect(vcpu, gfn))
>>>> + /*
>>>> + * we should do write protection before syncing pages
>>>> + * otherwise the content of the synced shadow page may
>>>> + * be inconsistent with guest page table.
>>>> + */
>>>> + account_shadowed(vcpu->kvm, sp);
>>>> +
>>>> + if (level == PT_PAGE_TABLE_LEVEL &&
>>>> + rmap_write_protect(vcpu, gfn))
>>>> kvm_flush_remote_tlbs(vcpu->kvm);
>>> I think your modification is good but I am a little bit confused here. In account_shadowed, if
>>> sp->role.level > PT_PAGE_TABLE_LEVEL, the sp->gfn is write protected, and this is reasonable. So why
>>> write protect the gfn of PT_PAGE_TABLE_LEVEL here?
>>
>> Because the shadow page will become 'sync', which means the shadow page will be synced
>> with the page table in the guest. So the gfn needs to be write-protected so that the
>> guest page table is not changed while we do the 'sync' thing. See kvm_sync_pages()
>> after doing rmap_write_protect().
> I see. So why are you treating the PT_PAGE_TABLE_LEVEL gfn separately here? Why can this not be
> done in account_shadowed, as you did for upper level sps?

Non-leaf shadow pages are kept write-protected, so the page fault handler can not fix write
access on them. Leaf shadow pages are not.
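
In other words, annotating the hunk quoted above:

	if (!direct) {
		/*
		 * account_shadowed() puts the gfns of non-leaf page
		 * tables (sp->role.level > PT_PAGE_TABLE_LEVEL) into
		 * write tracking, so they stay write-protected.
		 */
		account_shadowed(vcpu->kvm, sp);

		/*
		 * Leaf page tables are only write-protected here for
		 * the sync; the fault handler may unsync them later.
		 */
		if (level == PT_PAGE_TABLE_LEVEL &&
		    rmap_write_protect(vcpu, gfn))
			kvm_flush_remote_tlbs(vcpu->kvm);
	}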

> Actually I am wondering whether account_shadowed is
> overdoing things. From its name it should only *account* the shadow sp,
> but now it also does write
> protection and disables large page mapping.

Hmm.. disabling large page mapping is already in the current code... I think account_shadowed() can
be understood as "a new page is taken into account", so protection things are needed there.

But I am not good at naming functions and my English is not good enough either; any better name
is welcome. ;)

2015-12-16 08:55:02

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 09/11] KVM: MMU: simplify mmu_need_write_protect



On 12/16/2015 04:05 PM, Kai Huang wrote:
>
>
> On 12/15/2015 05:25 PM, Xiao Guangrong wrote:
>>
>>
>> On 12/15/2015 04:43 PM, Kai Huang wrote:
>>>
>>>
>>> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>>>> Now that all non-leaf shadow pages are page tracked, if a gfn is not
>>>> tracked then no non-leaf shadow page for that gfn exists, so we can
>>>> directly make the shadow pages of the gfn unsync
>>>>
>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>> ---
>>>> arch/x86/kvm/mmu.c | 26 ++++++++------------------
>>>> 1 file changed, 8 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>> index 5a2ca73..f89e77f 100644
>>>> --- a/arch/x86/kvm/mmu.c
>>>> +++ b/arch/x86/kvm/mmu.c
>>>> @@ -2461,41 +2461,31 @@ static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page
>>>> *sp)
>>>> kvm_mmu_mark_parents_unsync(sp);
>>>> }
>>>> -static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
>>>> +static bool kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
>>>> + bool can_unsync)
>>>> {
>>>> struct kvm_mmu_page *s;
>>>> for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
>>>> + if (!can_unsync)
>>>> + return true;
>>> How about moving this right before for_each_gfn_indirect_valid_sp? As can_unsync is passed as
>>> a parameter, there's no point checking it several times.
>>>
>>
>> We can not do this. What we are doing here is checking if we have a shadow page mapping
>> for 'gfn':
>> a) if no, it can be made writable.
> I think in this case you should also check whether the GFN is being write-protection tracked. E.g.,
> the spte may never have existed when you added the GFN to write-protection tracking, and in this
> case I think mmu_need_write_protect should also report true. Right?

We have already checked it:

static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
bool can_unsync)
{
if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
return true;

return kvm_unsync_pages(vcpu, gfn, can_unsync);
}


>
>> b) if yes, check 'can_unsync' to see if these shadow pages can be made 'unsync'.
>>
>> Your suggestion would break point a).
>>
>>> A further thought: can we move it to mmu_need_write_protect? Passing can_unsync as a parameter to
>>> kvm_unsync_pages sounds a little bit odd.
>>>
>>>> +
>>>> if (s->unsync)
>>>> continue;
>>>> WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
>>> How about large page mappings? Say the guest uses a 2M mapping and its shadow is indirect: does
>>> the above WARN_ON still hold? As you removed the PT level check in mmu_need_write_protect.
>>
>> The larger mappings are on non-leaf shadow pages, which are caught by
>> kvm_page_track_check_mode() before we call this function.
> Actually I do not quite understand how large page mapping is implemented. I see in
> kvm_mmu_get_page that when an sp is allocated, large page mapping is disabled for it, but I think
> we do support large shadow mappings, right? I mean, theoretically if the guest uses a 2M mapping
> then the shadow mapping can certainly use a 2M mapping as well, and the 2M shadow mapping can also
> be 'unsynced' (as a leaf mapping table). But in your series I see that if we write protect some
> GFN, the shadow large page mapping is always disabled.
>
> Am I wrong?

If the large page contains a page which is used as a page table, kvm does not map a large page for
it; the reason is we track the 4k page instead of the whole large page to reduce write emulation
(with a write-protected 2M mapping, a write anywhere in the 2M region would trap and be emulated,
not just writes to the 4k page-table page itself).

2015-12-17 02:48:33

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 08/11] KVM: MMU: use page track for non-leaf shadow pages



On 12/16/2015 04:39 PM, Xiao Guangrong wrote:
>
>
> On 12/16/2015 03:51 PM, Kai Huang wrote:
>>
>>
>> On 12/15/2015 05:10 PM, Xiao Guangrong wrote:
>>>
>>>
>>> On 12/15/2015 03:52 PM, Kai Huang wrote:
>>>
>>>>> static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
>>>>> @@ -2140,12 +2150,18 @@ static struct kvm_mmu_page
>>>>> *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>> hlist_add_head(&sp->hash_link,
>>>>> &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
>>>>> if (!direct) {
>>>>> - if (rmap_write_protect(vcpu, gfn))
>>>>> + /*
>>>>> + * we should do write protection before syncing pages
>>>>> + * otherwise the content of the synced shadow page may
>>>>> + * be inconsistent with guest page table.
>>>>> + */
>>>>> + account_shadowed(vcpu->kvm, sp);
>>>>> +
>>>>> + if (level == PT_PAGE_TABLE_LEVEL &&
>>>>> + rmap_write_protect(vcpu, gfn))
>>>>> kvm_flush_remote_tlbs(vcpu->kvm);
>>>> I think your modification is good but I am a little bit confused
>>>> here. In account_shadowed, if
>>>> sp->role.level > PT_PAGE_TABLE_LEVEL, the sp->gfn is write
>>>> protected, and this is reasonable. So why
>>>> write protect the gfn of PT_PAGE_TABLE_LEVEL here?
>>>
>>> Because the shadow page will become 'sync', which means the shadow
>>> page will be synced with the page table in the guest. So the gfn
>>> needs to be write-protected so that the guest page table is not
>>> changed while we do the 'sync' thing. See kvm_sync_pages() after
>>> doing rmap_write_protect().
>> I see. So why are you treating the PT_PAGE_TABLE_LEVEL gfn separately
>> here? Why can this not be done in
>> account_shadowed, as you did for upper level sps?
>
> Non-leaf shadow pages are kept write-protected, so the page fault
> handler can not fix write
> access on them. Leaf shadow pages are not.
My point is that the original code didn't separate the two cases, so I am
not sure why you need to separate them. Perhaps you want account_shadowed
to imply that the non-leaf guest page table is write-protected while the
leaf page table is not.

Thanks,
-Kai
>> Actually I am wondering whether account_shadowed is
>> overdoing things. From its name it should only *account* the shadow sp,
>> but now it also does write
>> protection and disables large page mapping.
>>
>
> Hmm.. disabling large page mapping is already in the current code... I
> think account_shadowed() can
> be understood as "a new page is taken into account", so protection things
> are needed there.
>
> But I am not good at naming functions and my English is not good enough
> either; any better name
> is welcome. ;)
>
>

2015-12-17 02:56:08

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 09/11] KVM: MMU: simplify mmu_need_write_protect



On 12/16/2015 04:48 PM, Xiao Guangrong wrote:
>
>
> On 12/16/2015 04:05 PM, Kai Huang wrote:
>>
>>
>> On 12/15/2015 05:25 PM, Xiao Guangrong wrote:
>>>
>>>
>>> On 12/15/2015 04:43 PM, Kai Huang wrote:
>>>>
>>>>
>>>> On 12/01/2015 02:26 AM, Xiao Guangrong wrote:
>>>>> Now that all non-leaf shadow pages are page tracked, if a gfn is not
>>>>> tracked then no non-leaf shadow page for that gfn exists, so we can
>>>>> directly make the shadow pages of the gfn unsync
>>>>>
>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>>> ---
>>>>> arch/x86/kvm/mmu.c | 26 ++++++++------------------
>>>>> 1 file changed, 8 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>>> index 5a2ca73..f89e77f 100644
>>>>> --- a/arch/x86/kvm/mmu.c
>>>>> +++ b/arch/x86/kvm/mmu.c
>>>>> @@ -2461,41 +2461,31 @@ static void __kvm_unsync_page(struct
>>>>> kvm_vcpu *vcpu, struct kvm_mmu_page
>>>>> *sp)
>>>>> kvm_mmu_mark_parents_unsync(sp);
>>>>> }
>>>>> -static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
>>>>> +static bool kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
>>>>> + bool can_unsync)
>>>>> {
>>>>> struct kvm_mmu_page *s;
>>>>> for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
>>>>> + if (!can_unsync)
>>>>> + return true;
>>>> How about moving this right before for_each_gfn_indirect_valid_sp?
>>>> As can_unsync is passed as
>>>> a parameter, there's no point checking it several times.
>>>>
>>>
>>> We can not do this. What we are doing here is checking if we have a
>>> shadow page mapping
>>> for 'gfn':
>>> a) if no, it can be made writable.
>> I think in this case you should also check whether the GFN is being
>> write-protection tracked. E.g., the spte may never have existed when you
>> added the GFN to write-protection tracking, and in this case I think
>> mmu_need_write_protect should also report true. Right?
>
> We have already checked it:
>
> static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
> bool can_unsync)
> {
> if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
> return true;
>
> return kvm_unsync_pages(vcpu, gfn, can_unsync);
> }
Oh sorry I missed this :)

>
>
>>
>>> b) if yes, check 'can_unsync' to see if these shadow pages can be made
>>> 'unsync'.
>>>
>>> Your suggestion would break point a).
>>>
>>>> A further thought: can we move it to mmu_need_write_protect?
>>>> Passing can_unsync as a parameter to
>>>> kvm_unsync_pages sounds a little bit odd.
>>>>
>>>>> +
>>>>> if (s->unsync)
>>>>> continue;
>>>>> WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
>>>> How about large page mappings? Say the guest uses a 2M mapping and
>>>> its shadow is indirect: does
>>>> the above WARN_ON still hold? As you removed the PT level check in
>>>> mmu_need_write_protect.
>>>
>>> The larger mappings are on non-leaf shadow pages, which are caught by
>>> kvm_page_track_check_mode() before we call this function.
>> Actually I do not quite understand how large page mapping is
>> implemented. I see in kvm_mmu_get_page that when an sp is allocated,
>> large page mapping is disabled for it, but I think we do support large
>> shadow mappings, right? I mean, theoretically if the guest uses a 2M
>> mapping then the shadow mapping can certainly use a 2M mapping as well,
>> and the 2M shadow mapping can also be 'unsynced' (as a leaf mapping
>> table). But in your series I see that if we write protect some GFN, the
>> shadow large page mapping is always disabled.
>>
>> Am I wrong?
>
> If the large page contains a page which is used as a page table, kvm
> does not map a large page for
> it; the reason is we track the 4k page instead of the whole large page
> to reduce write emulation.
I don't know why breaking a large page into 4K mappings can reduce write
emulation, but this explanation works for me. I guess KVM-GT doesn't
care about it either. :)

Thanks,
-Kai

2015-12-17 04:14:25

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 08/11] KVM: MMU: use page track for non-leaf shadow pages



On 12/17/2015 10:44 AM, Kai Huang wrote:
>
>
> On 12/16/2015 04:39 PM, Xiao Guangrong wrote:
>>
>>
>> On 12/16/2015 03:51 PM, Kai Huang wrote:
>>>
>>>
>>> On 12/15/2015 05:10 PM, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 12/15/2015 03:52 PM, Kai Huang wrote:
>>>>
>>>>>> static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
>>>>>> @@ -2140,12 +2150,18 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>> hlist_add_head(&sp->hash_link,
>>>>>> &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
>>>>>> if (!direct) {
>>>>>> - if (rmap_write_protect(vcpu, gfn))
>>>>>> + /*
>>>>>> + * we should do write protection before syncing pages
>>>>>> + * otherwise the content of the synced shadow page may
>>>>>> + * be inconsistent with guest page table.
>>>>>> + */
>>>>>> + account_shadowed(vcpu->kvm, sp);
>>>>>> +
>>>>>> + if (level == PT_PAGE_TABLE_LEVEL &&
>>>>>> + rmap_write_protect(vcpu, gfn))
>>>>>> kvm_flush_remote_tlbs(vcpu->kvm);
>>>>> I think your modification is good but I am a little bit confused here. In account_shadowed, if
>>>>> sp->role.level > PT_PAGE_TABLE_LEVEL, the sp->gfn is write protected, and this is reasonable.
>>>>> So why
>>>>> write protect the gfn of PT_PAGE_TABLE_LEVEL here?
>>>>
>>>> Because the shadow page will become 'sync', which means the shadow page will be synced
>>>> with the page table in the guest. So the gfn needs to be write-protected so that the
>>>> guest page table is not changed while we do the 'sync' thing. See kvm_sync_pages()
>>>> after doing rmap_write_protect().
>>> I see. So why are you treating the PT_PAGE_TABLE_LEVEL gfn separately here? Why can this not be
>>> done in account_shadowed, as you did for upper level sps?
>>
>> Non-leaf shadow pages are kept write-protected, so the page fault handler can not fix write
>> access on them. Leaf shadow pages are not.
> My point is that the original code didn't separate the two cases, so I am not sure why you need to
> separate them. Perhaps you want account_shadowed to imply that the non-leaf guest page table is
> write-protected while the leaf page table is not.

That is why we get an improvement with this patchset: we speed up the case where the write access
happens on non-leaf page tables. ;)