2012-08-21 02:57:17

by Xiao Guangrong

Subject: [PATCH v6 00/12] KVM: introduce readonly memslot

Changelog:
- fix endless retrying of unhandleable instructions which access readonly
host memory
- divide slot->flags 16:16: the lower part is visible to userspace, the
rest is used internally by kvm; document this in the code
- check slot->flags in gfn_to_hva_memslot

The test case can be found at:
http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2

In the current code, if we map a readonly memory space from host to guest
and the page is not currently mapped in the host, we get a fault pfn; since
async page faults are not allowed there, the vm crashes.

Following Avi's suggestion, we introduce readonly memory regions to map
ROM/ROMD into the guest: read access to a readonly memslot works as usual,
while write access to a readonly memslot causes a KVM_EXIT_MMIO exit.
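
For concreteness, here is a minimal userspace sketch (not part of this
series; the slot number, helper name and error handling are illustrative)
of registering a ROM region with the new flag:

#include <linux/kvm.h>
#include <sys/ioctl.h>

static int map_rom(int vm_fd, void *rom, __u64 gpa, __u64 size)
{
        struct kvm_userspace_memory_region region = {
                .slot            = 1,                /* hypothetical slot id */
                .flags           = KVM_MEM_READONLY, /* guest writes -> KVM_EXIT_MMIO */
                .guest_phys_addr = gpa,
                .memory_size     = size,
                .userspace_addr  = (__u64)(unsigned long)rom,
        };

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}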


2012-08-21 02:57:54

by Xiao Guangrong

Subject: [PATCH v6 01/12] KVM: x86: fix possible infinite loop caused by reexecute_instruction

Currently, we reexecute all unhandleable instructions as long as they do
not access mmio; however, this does not work if the host maps readonly
memory into the guest. If the instruction tries to write this kind of
memory, it faults again when the guest retries it, and we enter an
infinite loop: retry instruction -> write #PF -> emulation fail ->
retry instruction -> ...

Fix it by retrying the instruction only when it faults on writable
memory.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb0d937..704680d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4473,6 +4473,7 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
{
gpa_t gpa;
+ pfn_t pfn;

if (tdp_enabled)
return false;
@@ -4490,8 +4491,17 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
if (gpa == UNMAPPED_GVA)
return true; /* let cpu generate fault */

- if (!kvm_is_error_hva(gfn_to_hva(vcpu->kvm, gpa >> PAGE_SHIFT)))
+ /*
+ * Do not retry the unhandleable instruction if it faults on the
+ * readonly host memory, otherwise it will enter an infinite loop:
+ * retry instruction -> write #PF -> emulation fail -> retry
+ * instruction -> ...
+ */
+ pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
+ if (!is_error_pfn(pfn)) {
+ kvm_release_pfn_clean(pfn);
return true;
+ }

return false;
}
--
1.7.7.6

2012-08-21 02:58:24

by Xiao Guangrong

Subject: [PATCH v6 02/12] KVM: fix missing check for memslot flags

Check the flags when a memslot is registered from userspace, as Avi suggested

Signed-off-by: Xiao Guangrong <[email protected]>
---
virt/kvm/kvm_main.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a2e85af..7b94d70 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -678,6 +678,14 @@ void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
slots->generation++;
}

+static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
+{
+ if (mem->flags & ~KVM_MEM_LOG_DIRTY_PAGES)
+ return -EINVAL;
+
+ return 0;
+}
+
/*
* Allocate some memory and give it an address in the guest physical address
* space.
@@ -698,6 +706,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
struct kvm_memory_slot old, new;
struct kvm_memslots *slots, *old_memslots;

+ r = check_memory_region_flags(mem);
+ if (r)
+ goto out;
+
r = -EINVAL;
/* General sanity checks */
if (mem->memory_size & (PAGE_SIZE - 1))
--
1.7.7.6

2012-08-21 02:58:57

by Xiao Guangrong

Subject: [PATCH v6 03/12] KVM: hide KVM_MEMSLOT_INVALID from userspace

Quote Avi's comment:
| KVM_MEMSLOT_INVALID is actually an internal symbol, not used by
| userspace. Please move it to kvm_host.h.

Also, we divide memslot->flags into two parts: the lower 16 bits are
visible to userspace, the higher 16 bits are used internally by
kvm
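
As a small illustration of the 16:16 split (not from the patch; the mask
and helper names are hypothetical, the real check lives in
check_memory_region_flags from patch 02):

#define KVM_MEMSLOT_USER_MASK   ((1UL << 16) - 1)        /* bits 0..15  */
#define KVM_MEMSLOT_KERNEL_MASK (~KVM_MEMSLOT_USER_MASK) /* bits 16..31 */

/* userspace may only set bits in the low half */
static inline int memslot_flags_from_user_valid(unsigned long flags)
{
        return (flags & KVM_MEMSLOT_KERNEL_MASK) == 0;
}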

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/linux/kvm.h | 7 +++++--
include/linux/kvm_host.h | 7 +++++++
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..2de335d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -101,9 +101,12 @@ struct kvm_userspace_memory_region {
__u64 userspace_addr; /* start of the userspace allocated memory */
};

-/* for kvm_memory_region::flags */
+/*
+ * Bits 0 ~ 15 of kvm_memory_region::flags are visible to userspace;
+ * the other bits are reserved for kvm internal use and are defined in
+ * include/linux/kvm_host.h.
+ */
#define KVM_MEM_LOG_DIRTY_PAGES 1UL
-#define KVM_MEMSLOT_INVALID (1UL << 1)

/* for KVM_IRQ_LINE */
struct kvm_irq_level {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d2b897e..d4bd4d4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -36,6 +36,13 @@
#endif

/*
+ * Bits 16 ~ 31 of kvm_memory_region::flags are used internally in
+ * kvm; the other bits are visible to userspace and are defined in
+ * include/linux/kvm.h.
+ */
+#define KVM_MEMSLOT_INVALID (1UL << 16)
+
+/*
* If we support unaligned MMIO, at most one fragment will be split into two:
*/
#ifdef KVM_UNALIGNED_MMIO
--
1.7.7.6

2012-08-21 02:59:25

by Xiao Guangrong

Subject: [PATCH v6 04/12] KVM: introduce gfn_to_pfn_memslot_atomic

It replaces hva_to_pfn_atomic, which is removed

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 5 +----
include/linux/kvm_host.h | 3 ++-
virt/kvm/kvm_main.c | 14 ++++++++------
3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9651c2c..5548971 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2510,15 +2510,12 @@ static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
bool no_dirty_log)
{
struct kvm_memory_slot *slot;
- unsigned long hva;

slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
if (!slot)
return KVM_PFN_ERR_FAULT;

- hva = gfn_to_hva_memslot(slot, gfn);
-
- return hva_to_pfn_atomic(hva);
+ return gfn_to_pfn_memslot_atomic(slot, gfn);
}

static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d4bd4d4..52c86e4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -462,7 +462,6 @@ void kvm_release_page_dirty(struct page *page);
void kvm_set_page_dirty(struct page *page);
void kvm_set_page_accessed(struct page *page);

-pfn_t hva_to_pfn_atomic(unsigned long addr);
pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn);
pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async,
bool write_fault, bool *writable);
@@ -470,6 +469,8 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable);
pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
+pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn);
+
void kvm_release_pfn_dirty(pfn_t pfn);
void kvm_release_pfn_clean(pfn_t pfn);
void kvm_set_pfn_dirty(pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7b94d70..543f9b7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1102,12 +1102,6 @@ static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
return pfn;
}

-pfn_t hva_to_pfn_atomic(unsigned long addr)
-{
- return hva_to_pfn(addr, true, NULL, true, NULL);
-}
-EXPORT_SYMBOL_GPL(hva_to_pfn_atomic);
-
static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async,
bool write_fault, bool *writable)
{
@@ -1155,6 +1149,14 @@ pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
return hva_to_pfn(addr, false, NULL, true, NULL);
}

+pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ unsigned long addr = gfn_to_hva_memslot(slot, gfn);
+
+ return hva_to_pfn(addr, true, NULL, true, NULL);
+}
+EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
+
int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
int nr_pages)
{
--
1.7.7.6

2012-08-21 03:00:07

by Xiao Guangrong

Subject: [PATCH v6 05/12] KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic

This set of functions is only used to read data from host space. In a
later patch, gfn_to_hva_read will only return a readonly hva, and the
function name is a good hint that gfn_to_hva_read pairs with
kvm_read_hva()/kvm_read_hva_atomic()

Signed-off-by: Xiao Guangrong <[email protected]>
---
virt/kvm/kvm_main.c | 27 +++++++++++++++++++++++----
1 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 543f9b7..6e3ea15 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1002,6 +1002,25 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(gfn_to_hva);

+/*
+ * The hva returned by this function is only allowed to be read.
+ * It should pair with kvm_read_hva() or kvm_read_hva_atomic().
+ */
+static unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn)
+{
+ return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
+}
+
+static int kvm_read_hva(void *data, void __user *hva, int len)
+{
+ return __copy_from_user(data, hva, len);
+}
+
+static int kvm_read_hva_atomic(void *data, void __user *hva, int len)
+{
+ return __copy_from_user_inatomic(data, hva, len);
+}
+
int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int write, struct page **page)
{
@@ -1274,10 +1293,10 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
int r;
unsigned long addr;

- addr = gfn_to_hva(kvm, gfn);
+ addr = gfn_to_hva_read(kvm, gfn);
if (kvm_is_error_hva(addr))
return -EFAULT;
- r = __copy_from_user(data, (void __user *)addr + offset, len);
+ r = kvm_read_hva(data, (void __user *)addr + offset, len);
if (r)
return -EFAULT;
return 0;
@@ -1312,11 +1331,11 @@ int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
gfn_t gfn = gpa >> PAGE_SHIFT;
int offset = offset_in_page(gpa);

- addr = gfn_to_hva(kvm, gfn);
+ addr = gfn_to_hva_read(kvm, gfn);
if (kvm_is_error_hva(addr))
return -EFAULT;
pagefault_disable();
- r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len);
+ r = kvm_read_hva_atomic(data, (void __user *)addr + offset, len);
pagefault_enable();
if (r)
return -EFAULT;
--
1.7.7.6

2012-08-21 03:00:40

by Xiao Guangrong

Subject: [PATCH v6 06/12] KVM: reorganize hva_to_pfn

We do too many things in hva_to_pfn. This patch reorganizes the code
to make it more readable
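
For orientation, a condensed sketch of the control flow this produces; it
paraphrases the diff below (the hwpoison/VMA fallback is elided) and adds
nothing new:

static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
                        bool write_fault, bool *writable)
{
        pfn_t pfn;

        /* lock-free __get_user_pages_fast() attempt first */
        if (hva_to_pfn_fast(addr, atomic, async, write_fault, writable, &pfn))
                return pfn;

        if (atomic)
                return KVM_PFN_ERR_FAULT;       /* the slow path may sleep */

        /* sleeping get_user_pages() path */
        if (hva_to_pfn_slow(addr, async, write_fault, writable, &pfn) == 1)
                return pfn;

        /* ... otherwise inspect the VMA for PFNMAP/hwpoison/async cases */
        return KVM_PFN_ERR_FAULT;
}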

Signed-off-by: Xiao Guangrong <[email protected]>
---
virt/kvm/kvm_main.c | 159 +++++++++++++++++++++++++++++++--------------------
1 files changed, 97 insertions(+), 62 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6e3ea15..aa4a38a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1041,83 +1041,118 @@ static inline int check_user_page_hwpoison(unsigned long addr)
return rc == -EHWPOISON;
}

-static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
- bool write_fault, bool *writable)
+/*
+ * The atomic path to get the writable pfn which will be stored in @pfn;
+ * true indicates success, otherwise false is returned.
+ */
+static bool hva_to_pfn_fast(unsigned long addr, bool atomic, bool *async,
+ bool write_fault, bool *writable, pfn_t *pfn)
{
struct page *page[1];
- int npages = 0;
- pfn_t pfn;
+ int npages;

- /* we can do it either atomically or asynchronously, not both */
- BUG_ON(atomic && async);
+ if (!(async || atomic))
+ return false;

- BUG_ON(!write_fault && !writable);
+ npages = __get_user_pages_fast(addr, 1, 1, page);
+ if (npages == 1) {
+ *pfn = page_to_pfn(page[0]);

- if (writable)
- *writable = true;
+ if (writable)
+ *writable = true;
+ return true;
+ }
+
+ return false;
+}

- if (atomic || async)
- npages = __get_user_pages_fast(addr, 1, 1, page);
+/*
+ * The slow path to get the pfn of the specified host virtual address;
+ * 1 indicates success, -errno is returned if an error is detected.
+ */
+static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
+ bool *writable, pfn_t *pfn)
+{
+ struct page *page[1];
+ int npages = 0;

- if (unlikely(npages != 1) && !atomic) {
- might_sleep();
+ might_sleep();

- if (writable)
- *writable = write_fault;
-
- if (async) {
- down_read(&current->mm->mmap_sem);
- npages = get_user_page_nowait(current, current->mm,
- addr, write_fault, page);
- up_read(&current->mm->mmap_sem);
- } else
- npages = get_user_pages_fast(addr, 1, write_fault,
- page);
-
- /* map read fault as writable if possible */
- if (unlikely(!write_fault) && npages == 1) {
- struct page *wpage[1];
-
- npages = __get_user_pages_fast(addr, 1, 1, wpage);
- if (npages == 1) {
- *writable = true;
- put_page(page[0]);
- page[0] = wpage[0];
- }
- npages = 1;
+ if (writable)
+ *writable = write_fault;
+
+ if (async) {
+ down_read(&current->mm->mmap_sem);
+ npages = get_user_page_nowait(current, current->mm,
+ addr, write_fault, page);
+ up_read(&current->mm->mmap_sem);
+ } else
+ npages = get_user_pages_fast(addr, 1, write_fault,
+ page);
+ if (npages != 1)
+ return npages;
+
+ /* map read fault as writable if possible */
+ if (unlikely(!write_fault)) {
+ struct page *wpage[1];
+
+ npages = __get_user_pages_fast(addr, 1, 1, wpage);
+ if (npages == 1) {
+ *writable = true;
+ put_page(page[0]);
+ page[0] = wpage[0];
}
+
+ npages = 1;
}
+ *pfn = page_to_pfn(page[0]);
+ return npages;
+}
+
+static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
+ bool write_fault, bool *writable)
+{
+ struct vm_area_struct *vma;
+ pfn_t pfn = 0;
+ int npages;

- if (unlikely(npages != 1)) {
- struct vm_area_struct *vma;
+ /* we can do it either atomically or asynchronously, not both */
+ BUG_ON(atomic && async);

- if (atomic)
- return KVM_PFN_ERR_FAULT;
+ BUG_ON(!write_fault && !writable);

- down_read(&current->mm->mmap_sem);
- if (npages == -EHWPOISON ||
- (!async && check_user_page_hwpoison(addr))) {
- up_read(&current->mm->mmap_sem);
- return KVM_PFN_ERR_HWPOISON;
- }
+ if (hva_to_pfn_fast(addr, atomic, async, write_fault, writable, &pfn))
+ return pfn;

- vma = find_vma_intersection(current->mm, addr, addr+1);
-
- if (vma == NULL)
- pfn = KVM_PFN_ERR_FAULT;
- else if ((vma->vm_flags & VM_PFNMAP)) {
- pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) +
- vma->vm_pgoff;
- BUG_ON(!kvm_is_mmio_pfn(pfn));
- } else {
- if (async && (vma->vm_flags & VM_WRITE))
- *async = true;
- pfn = KVM_PFN_ERR_FAULT;
- }
- up_read(&current->mm->mmap_sem);
- } else
- pfn = page_to_pfn(page[0]);
+ if (atomic)
+ return KVM_PFN_ERR_FAULT;

+ npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
+ if (npages == 1)
+ return pfn;
+
+ down_read(&current->mm->mmap_sem);
+ if (npages == -EHWPOISON ||
+ (!async && check_user_page_hwpoison(addr))) {
+ pfn = KVM_PFN_ERR_HWPOISON;
+ goto exit;
+ }
+
+ vma = find_vma_intersection(current->mm, addr, addr + 1);
+
+ if (vma == NULL)
+ pfn = KVM_PFN_ERR_FAULT;
+ else if ((vma->vm_flags & VM_PFNMAP)) {
+ pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) +
+ vma->vm_pgoff;
+ BUG_ON(!kvm_is_mmio_pfn(pfn));
+ } else {
+ if (async && (vma->vm_flags & VM_WRITE))
+ *async = true;
+ pfn = KVM_PFN_ERR_FAULT;
+ }
+exit:
+ up_read(&current->mm->mmap_sem);
return pfn;
}

--
1.7.7.6

2012-08-21 03:00:59

by Xiao Guangrong

Subject: [PATCH v6 07/12] KVM: use 'writable' as a hint to map writable pfn

In the current code, we always map a writable pfn for a read fault. In
order to support readonly memslots, map a writable pfn only if 'writable'
is not NULL
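
A caller-side sketch of the new contract (gfn_to_pfn_prot is the existing
API; the surrounding helper is hypothetical):

static int prefault_page(struct kvm *kvm, gfn_t gfn)
{
        bool writable;
        pfn_t pfn;

        /*
         * write_fault = false, but a non-NULL 'writable' hint is passed:
         * kvm may still map a writable pfn and reports whether it did so
         * through 'writable'.
         */
        pfn = gfn_to_pfn_prot(kvm, gfn, false, &writable);
        if (is_error_pfn(pfn))
                return -EFAULT;

        /* 'writable' tells us whether the spte may be made writable */
        kvm_release_pfn_clean(pfn);
        return writable ? 1 : 0;
}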

Signed-off-by: Xiao Guangrong <[email protected]>
---
virt/kvm/kvm_main.c | 24 +++++++++++++++++++++++-
1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index aa4a38a..c89d1b5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1054,6 +1054,14 @@ static bool hva_to_pfn_fast(unsigned long addr, bool atomic, bool *async,
if (!(async || atomic))
return false;

+ /*
+ * Fast pin a writable pfn only if it is a write fault request
+ * or the caller allows mapping a writable pfn for a read fault
+ * request.
+ */
+ if (!(write_fault || writable))
+ return false;
+
npages = __get_user_pages_fast(addr, 1, 1, page);
if (npages == 1) {
*pfn = page_to_pfn(page[0]);
@@ -1093,7 +1101,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
return npages;

/* map read fault as writable if possible */
- if (unlikely(!write_fault)) {
+ if (unlikely(!write_fault) && writable) {
struct page *wpage[1];

npages = __get_user_pages_fast(addr, 1, 1, wpage);
@@ -1109,6 +1117,20 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
return npages;
}

+/*
+ * Pin guest page in memory and return its pfn.
+ * @addr: host virtual address which maps memory to the guest
+ * @atomic: if true, the function must not sleep
+ * @async: whether this function needs to wait for IO to complete if the
+ * host page is not in memory
+ * @write_fault: whether we should get a writable host page
+ * @writable: whether mapping a writable host page is allowed for !@write_fault
+ *
+ * The function will map a writable host page in these two cases:
+ * 1): @write_fault = true
+ * 2): @write_fault = false && @writable != NULL; @writable will tell the
+ * caller whether the mapping is writable.
+ */
static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
bool write_fault, bool *writable)
{
--
1.7.7.6

2012-08-21 03:01:37

by Xiao Guangrong

Subject: [PATCH v6 08/12] KVM: introduce KVM_PFN_ERR_RO_FAULT

In a later patch, it will indicate failure when we try to get a writable
pfn from a readonly memslot

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/linux/kvm_host.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 52c86e4..4fd33e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -65,6 +65,7 @@
#define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK)
#define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1)
#define KVM_PFN_ERR_BAD (KVM_PFN_ERR_MASK + 2)
+#define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 3)

static inline bool is_error_pfn(pfn_t pfn)
{
--
1.7.7.6

2012-08-21 03:02:40

by Xiao Guangrong

Subject: [PATCH v6 10/12] KVM: introduce KVM_HVA_ERR_RO_BAD

In a later patch, it will indicate failure when we try to get a writable
hva from a readonly memslot
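
A quick sketch of why a single comparison covers both error values (the
helper is hypothetical; the constants are from this series):

static int copy_gfn(struct kvm *kvm, gfn_t gfn, void *data, int len)
{
        unsigned long hva = gfn_to_hva(kvm, gfn);

        /*
         * KVM_HVA_ERR_BAD == PAGE_OFFSET, KVM_HVA_ERR_RO_BAD ==
         * PAGE_OFFSET + PAGE_SIZE, and every valid userspace hva lies
         * below PAGE_OFFSET, so the single ">= PAGE_OFFSET" test in
         * kvm_is_error_hva() catches both.
         */
        if (kvm_is_error_hva(hva))
                return -EFAULT;

        return __copy_from_user(data, (void __user *)hva, len);
}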

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/linux/kvm_host.h | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b9bba60..a913ac7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -82,11 +82,12 @@ static inline bool is_invalid_pfn(pfn_t pfn)
return !is_noslot_pfn(pfn) && is_error_pfn(pfn);
}

-#define KVM_HVA_ERR_BAD (PAGE_OFFSET)
+#define KVM_HVA_ERR_BAD (PAGE_OFFSET)
+#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)

static inline bool kvm_is_error_hva(unsigned long addr)
{
- return addr == PAGE_OFFSET;
+ return addr >= PAGE_OFFSET;
}

#define KVM_ERR_PTR_BAD_PAGE (ERR_PTR(-ENOENT))
--
1.7.7.6

2012-08-21 03:03:15

by Xiao Guangrong

Subject: [PATCH v6 11/12] KVM: introduce readonly memslot

In the current code, if we map a readonly memory space from host to guest
and the page is not currently mapped in the host, we get a fault pfn; since
async page faults are not allowed there, the vm crashes.

We introduce readonly memory regions to map ROM/ROMD into the guest: read
access to a readonly memslot works as usual, while write access to a
readonly memslot causes a KVM_EXIT_MMIO exit
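
On the userspace side such writes surface through the ordinary KVM_RUN
exit loop; a minimal sketch (run is the mmap'ed vcpu region, the
handle_mmio_* helpers are hypothetical):

static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
{
        for (;;) {
                if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
                        break;

                switch (run->exit_reason) {
                case KVM_EXIT_MMIO:
                        if (run->mmio.is_write)
                                /*
                                 * Either real MMIO or a write to a
                                 * KVM_MEM_READONLY slot; a VMM will
                                 * typically just discard ROM writes.
                                 */
                                handle_mmio_write(run->mmio.phys_addr,
                                                  run->mmio.data,
                                                  run->mmio.len);
                        else
                                handle_mmio_read(run);
                        break;
                default:
                        return;
                }
        }
}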

Signed-off-by: Xiao Guangrong <[email protected]>
---
Documentation/virtual/kvm/api.txt | 10 +++-
arch/x86/include/asm/kvm.h | 1 +
arch/x86/kvm/mmu.c | 9 ++++
arch/x86/kvm/x86.c | 1 +
include/linux/kvm.h | 6 ++-
include/linux/kvm_host.h | 7 +--
virt/kvm/kvm_main.c | 96 ++++++++++++++++++++++++++++++-------
7 files changed, 102 insertions(+), 28 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index bf33aaa..b91bfd4 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -857,7 +857,8 @@ struct kvm_userspace_memory_region {
};

/* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES 1UL
+#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
+#define KVM_MEM_READONLY (1UL << 1)

This ioctl allows the user to create or modify a guest physical memory
slot. When changing an existing slot, it may be moved in the guest
@@ -873,9 +874,12 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
be identical. This allows large pages in the guest to be backed by large
pages in the host.

-The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
+The flags field supports two flags. The first, KVM_MEM_LOG_DIRTY_PAGES,
instructs kvm to keep track of writes to memory within the slot. See
-the KVM_GET_DIRTY_LOG ioctl.
+the KVM_GET_DIRTY_LOG ioctl. The second flag is KVM_MEM_READONLY, available
+when the KVM_CAP_READONLY_MEM capability is present; it indicates that the
+guest memory is read-only, that is, the guest is only allowed to read it.
+Writes will be posted to userspace as KVM_EXIT_MMIO exits.

When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory
region are automatically reflected into the guest. For example, an mmap()
diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 246617e..521bf25 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -25,6 +25,7 @@
#define __KVM_HAVE_DEBUGREGS
#define __KVM_HAVE_XSAVE
#define __KVM_HAVE_XCRS
+#define __KVM_HAVE_READONLY_MEM

/* Architectural interrupt line count. */
#define KVM_NR_INTERRUPTS 256
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5548971..8e312a2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2647,6 +2647,15 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *

static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
{
+ /*
+ * Do not cache the mmio info caused by writing the readonly gfn
+ * into the spte, otherwise read access on the readonly gfn can
+ * also cause an mmio page fault and be treated as mmio access.
+ * Return 1 to tell kvm to emulate it.
+ */
+ if (pfn == KVM_PFN_ERR_RO_FAULT)
+ return 1;
+
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 704680d..42bbf41 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_GET_TSC_KHZ:
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
+ case KVM_CAP_READONLY_MEM:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2de335d..d808694 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -106,7 +106,8 @@ struct kvm_userspace_memory_region {
* other bits are reserved for kvm internal use which are defined in
* include/linux/kvm_host.h.
*/
-#define KVM_MEM_LOG_DIRTY_PAGES 1UL
+#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
+#define KVM_MEM_READONLY (1UL << 1)

/* for KVM_IRQ_LINE */
struct kvm_irq_level {
@@ -621,6 +622,9 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_PPC_GET_SMMU_INFO 78
#define KVM_CAP_S390_COW 79
#define KVM_CAP_PPC_ALLOC_HTAB 80
+#ifdef __KVM_HAVE_READONLY_MEM
+#define KVM_CAP_READONLY_MEM 81
+#endif

#ifdef KVM_CAP_IRQ_ROUTING

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a913ac7..5972c98 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -465,6 +465,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,

struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
+unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
void kvm_release_page_clean(struct page *page);
void kvm_release_page_dirty(struct page *page);
void kvm_set_page_dirty(struct page *page);
@@ -792,12 +793,6 @@ hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
return slot->base_gfn + gfn_offset;
}

-static inline unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot,
- gfn_t gfn)
-{
- return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
-}
-
static inline gpa_t gfn_to_gpa(gfn_t gfn)
{
return (gpa_t)gfn << PAGE_SHIFT;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e3e1658..3416f8a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -680,7 +680,13 @@ void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)

static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
{
- if (mem->flags & ~KVM_MEM_LOG_DIRTY_PAGES)
+ u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
+
+#ifdef KVM_CAP_READONLY_MEM
+ valid_flags |= KVM_MEM_READONLY;
+#endif
+
+ if (mem->flags & ~valid_flags)
return -EINVAL;

return 0;
@@ -973,18 +979,45 @@ out:
return size;
}

-static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
- gfn_t *nr_pages)
+static bool memslot_is_readonly(struct kvm_memory_slot *slot)
+{
+ return slot->flags & KVM_MEM_READONLY;
+}
+
+static unsigned long __gfn_to_hva_memslot(struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
+}
+
+static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
+ gfn_t *nr_pages, bool write)
{
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
return KVM_HVA_ERR_BAD;

+ if (memslot_is_readonly(slot) && write)
+ return KVM_HVA_ERR_RO_BAD;
+
if (nr_pages)
*nr_pages = slot->npages - (gfn - slot->base_gfn);

- return gfn_to_hva_memslot(slot, gfn);
+ return __gfn_to_hva_memslot(slot, gfn);
}

+static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
+ gfn_t *nr_pages)
+{
+ return __gfn_to_hva_many(slot, gfn, nr_pages, true);
+}
+
+unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ return gfn_to_hva_many(slot, gfn, NULL);
+}
+EXPORT_SYMBOL_GPL(gfn_to_hva_memslot);
+
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
{
return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
@@ -997,7 +1030,7 @@ EXPORT_SYMBOL_GPL(gfn_to_hva);
*/
static unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn)
{
- return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
+ return __gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL, false);
}

static int kvm_read_hva(void *data, void __user *hva, int len)
@@ -1106,6 +1139,17 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
return npages;
}

+static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
+{
+ if (unlikely(!(vma->vm_flags & VM_READ)))
+ return false;
+
+ if (write_fault && (unlikely(!(vma->vm_flags & VM_WRITE))))
+ return false;
+
+ return true;
+}
+
/*
* Pin guest page in memory and return its pfn.
* @addr: host virtual address which maps memory to the guest
@@ -1130,8 +1174,6 @@ static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
/* we can do it either atomically or asynchronously, not both */
BUG_ON(atomic && async);

- BUG_ON(!write_fault && !writable);
-
if (hva_to_pfn_fast(addr, atomic, async, write_fault, writable, &pfn))
return pfn;

@@ -1158,7 +1200,7 @@ static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
vma->vm_pgoff;
BUG_ON(!kvm_is_mmio_pfn(pfn));
} else {
- if (async && (vma->vm_flags & VM_WRITE))
+ if (async && vma_is_valid(vma, write_fault))
*async = true;
pfn = KVM_PFN_ERR_FAULT;
}
@@ -1167,19 +1209,40 @@ exit:
return pfn;
}

+static pfn_t
+__gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn, bool atomic,
+ bool *async, bool write_fault, bool *writable)
+{
+ unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
+
+ if (addr == KVM_HVA_ERR_RO_BAD)
+ return KVM_PFN_ERR_RO_FAULT;
+
+ if (kvm_is_error_hva(addr))
+ return KVM_PFN_ERR_BAD;
+
+ /* Do not map a writable pfn in the readonly memslot. */
+ if (writable && memslot_is_readonly(slot)) {
+ *writable = false;
+ writable = NULL;
+ }
+
+ return hva_to_pfn(addr, atomic, async, write_fault,
+ writable);
+}
+
static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async,
bool write_fault, bool *writable)
{
- unsigned long addr;
+ struct kvm_memory_slot *slot;

if (async)
*async = false;

- addr = gfn_to_hva(kvm, gfn);
- if (kvm_is_error_hva(addr))
- return KVM_PFN_ERR_BAD;
+ slot = gfn_to_memslot(kvm, gfn);

- return hva_to_pfn(addr, atomic, async, write_fault, writable);
+ return __gfn_to_pfn_memslot(slot, gfn, atomic, async, write_fault,
+ writable);
}

pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn)
@@ -1210,15 +1273,12 @@ EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);

pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
{
- unsigned long addr = gfn_to_hva_memslot(slot, gfn);
- return hva_to_pfn(addr, false, NULL, true, NULL);
+ return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL);
}

pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn)
{
- unsigned long addr = gfn_to_hva_memslot(slot, gfn);
-
- return hva_to_pfn(addr, true, NULL, true, NULL);
+ return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);

--
1.7.7.6

2012-08-21 03:03:12

by Xiao Guangrong

Subject: [PATCH v6 09/12] KVM: introduce KVM_HVA_ERR_BAD

Remove bad_hva() and inline kvm_is_error_hva(), expressing the error
value as the new KVM_HVA_ERR_BAD constant

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/linux/kvm_host.h | 8 +++++++-
virt/kvm/kvm_main.c | 13 +------------
2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4fd33e0..b9bba60 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -82,6 +82,13 @@ static inline bool is_invalid_pfn(pfn_t pfn)
return !is_noslot_pfn(pfn) && is_error_pfn(pfn);
}

+#define KVM_HVA_ERR_BAD (PAGE_OFFSET)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+ return addr == PAGE_OFFSET;
+}
+
#define KVM_ERR_PTR_BAD_PAGE (ERR_PTR(-ENOENT))

static inline bool is_error_page(struct page *page)
@@ -430,7 +437,6 @@ id_to_memslot(struct kvm_memslots *slots, int id)
return slot;
}

-int kvm_is_error_hva(unsigned long addr);
int kvm_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
int user_alloc);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c89d1b5..e3e1658 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -931,17 +931,6 @@ void kvm_disable_largepages(void)
}
EXPORT_SYMBOL_GPL(kvm_disable_largepages);

-static inline unsigned long bad_hva(void)
-{
- return PAGE_OFFSET;
-}
-
-int kvm_is_error_hva(unsigned long addr)
-{
- return addr == bad_hva();
-}
-EXPORT_SYMBOL_GPL(kvm_is_error_hva);
-
struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
{
return __gfn_to_memslot(kvm_memslots(kvm), gfn);
@@ -988,7 +977,7 @@ static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
gfn_t *nr_pages)
{
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
- return bad_hva();
+ return KVM_HVA_ERR_BAD;

if (nr_pages)
*nr_pages = slot->npages - (gfn - slot->base_gfn);
--
1.7.7.6

2012-08-21 03:03:43

by Xiao Guangrong

Subject: [PATCH v6 12/12] KVM: indicate readonly access fault

Introduce write_readonly_mem in the mmio exit info to indicate that the
exit was caused by a write access to a readonly memslot
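
Extending the run-loop sketch from the cover of patch 11, a consumer would
check the new field like this (guarded the same way as the uapi
definition):

        case KVM_EXIT_MMIO:
#ifdef __KVM_HAVE_READONLY_MEM
                if (run->mmio.is_write && run->mmio.write_readonly_mem) {
                        /* the guest wrote a KVM_MEM_READONLY slot, e.g. ROM */
                        break;  /* a VMM may simply discard the write */
                }
#endif
                /* otherwise handle it as ordinary MMIO */
                break;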

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 23 ++++++++++++++---------
include/linux/kvm.h | 3 +++
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 3 +++
4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bbf41..6d7fe4a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3710,9 +3710,10 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,

ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
if (ret < 0)
- return 0;
+ return ret;
+
kvm_mmu_pte_write(vcpu, gpa, val, bytes);
- return 1;
+ return 0;
}

struct read_write_emulator_ops {
@@ -3742,7 +3743,7 @@ static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes)
static int read_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
void *val, int bytes)
{
- return !kvm_read_guest(vcpu->kvm, gpa, val, bytes);
+ return kvm_read_guest(vcpu->kvm, gpa, val, bytes);
}

static int write_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -3807,7 +3808,8 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
if (ret)
goto mmio;

- if (ops->read_write_emulate(vcpu, gpa, val, bytes))
+ ret = ops->read_write_emulate(vcpu, gpa, val, bytes);
+ if (!ret)
return X86EMUL_CONTINUE;

mmio:
@@ -3829,6 +3831,7 @@ mmio:
frag->gpa = gpa;
frag->data = val;
frag->len = now;
+ frag->write_readonly_mem = (ret == -EPERM);

gpa += now;
val += now;
@@ -3842,8 +3845,8 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
struct x86_exception *exception,
struct read_write_emulator_ops *ops)
{
+ struct kvm_mmio_fragment *frag;
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
- gpa_t gpa;
int rc;

if (ops->read_write_prepare &&
@@ -3875,17 +3878,18 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
if (!vcpu->mmio_nr_fragments)
return rc;

- gpa = vcpu->mmio_fragments[0].gpa;
+ frag = &vcpu->mmio_fragments[0];

vcpu->mmio_needed = 1;
vcpu->mmio_cur_fragment = 0;

- vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
+ vcpu->run->mmio.len = frag->len;
vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
vcpu->run->exit_reason = KVM_EXIT_MMIO;
- vcpu->run->mmio.phys_addr = gpa;
+ vcpu->run->mmio.phys_addr = frag->gpa;
+ vcpu->run->mmio.write_readonly_mem = frag->write_readonly_mem;

- return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
+ return ops->read_write_exit_mmio(vcpu, frag->gpa, val, bytes);
}

static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
@@ -5525,6 +5529,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
++frag;
run->exit_reason = KVM_EXIT_MMIO;
run->mmio.phys_addr = frag->gpa;
+ vcpu->run->mmio.write_readonly_mem = frag->write_readonly_mem;
if (vcpu->mmio_is_write)
memcpy(run->mmio.data, frag->data, frag->len);
run->mmio.len = frag->len;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d808694..9d8002f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -226,6 +226,9 @@ struct kvm_run {
__u8 data[8];
__u32 len;
__u8 is_write;
+#ifdef __KVM_HAVE_READONLY_MEM
+ __u8 write_readonly_mem;
+#endif
} mmio;
/* KVM_EXIT_HYPERCALL */
struct {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5972c98..c37b225 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -189,6 +189,7 @@ struct kvm_mmio_fragment {
gpa_t gpa;
void *data;
unsigned len;
+ bool write_readonly_mem;
};

struct kvm_vcpu {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3416f8a..0c7def7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1456,6 +1456,9 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
unsigned long addr;

addr = gfn_to_hva(kvm, gfn);
+ if (addr == KVM_HVA_ERR_RO_BAD)
+ return -EPERM;
+
if (kvm_is_error_hva(addr))
return -EFAULT;
r = __copy_to_user((void __user *)addr + offset, data, len);
--
1.7.7.6

2012-08-22 12:01:57

by Avi Kivity

Subject: Re: [PATCH v6 01/12] KVM: x86: fix possible infinite loop caused by reexecute_instruction

On 08/21/2012 05:57 AM, Xiao Guangrong wrote:
> Currently, we reexecute all unhandleable instructions as long as they do
> not access mmio; however, this does not work if the host maps readonly
> memory into the guest. If the instruction tries to write this kind of
> memory, it faults again when the guest retries it, and we enter an
> infinite loop: retry instruction -> write #PF -> emulation fail ->
> retry instruction -> ...
>
> Fix it by retrying the instruction only when it faults on writable
> memory.
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/x86.c | 12 +++++++++++-
> 1 files changed, 11 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fb0d937..704680d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4473,6 +4473,7 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
> static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
> {
> gpa_t gpa;
> + pfn_t pfn;
>
> if (tdp_enabled)
> return false;
> @@ -4490,8 +4491,17 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
> if (gpa == UNMAPPED_GVA)
> return true; /* let cpu generate fault */
>
> - if (!kvm_is_error_hva(gfn_to_hva(vcpu->kvm, gpa >> PAGE_SHIFT)))
> + /*
> + * Do not retry the unhandleable instruction if it faults on the
> + * readonly host memory, otherwise it will enter an infinite loop:
> + * retry instruction -> write #PF -> emulation fail -> retry
> + * instruction -> ...
> + */
> + pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
> + if (!is_error_pfn(pfn)) {
> + kvm_release_pfn_clean(pfn);
> return true;
> + }
>
> return false;
> }
>

Good catch. Did this actually happen or did you find it by code inspection?

--
error compiling committee.c: too many arguments to function

2012-08-22 12:07:10

by Avi Kivity

Subject: Re: [PATCH v6 12/12] KVM: indicate readonly access fault

On 08/21/2012 06:03 AM, Xiao Guangrong wrote:
> Introduce write_readonly_mem in the mmio exit info to indicate that the
> exit was caused by a write access to a readonly memslot

Please document this in chapter 5 of api.txt.

--
error compiling committee.c: too many arguments to function

2012-08-22 12:09:42

by Avi Kivity

Subject: Re: [PATCH v6 00/12] KVM: introduce readonly memslot

On 08/21/2012 05:57 AM, Xiao Guangrong wrote:
> Changelog:
> - fix endless retrying of unhandleable instructions which access readonly
> host memory
> - divide slot->flags 16:16: the lower part is visible to userspace, the
> rest is used internally by kvm; document this in the code
> - check slot->flags in gfn_to_hva_memslot
>
> The test case can be found at:
> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
>
> In the current code, if we map a readonly memory space from host to guest
> and the page is not currently mapped in the host, we get a fault pfn; since
> async page faults are not allowed there, the vm crashes.
>
> Following Avi's suggestion, we introduce readonly memory regions to map
> ROM/ROMD into the guest: read access to a readonly memslot works as usual,
> while write access to a readonly memslot causes a KVM_EXIT_MMIO exit.

Thanks, applied 1-11.


--
error compiling committee.c: too many arguments to function

2012-08-22 12:47:45

by Xiao Guangrong

Subject: Re: [PATCH v6 12/12] KVM: indicate readonly access fault

On 08/22/2012 08:06 PM, Avi Kivity wrote:
> On 08/21/2012 06:03 AM, Xiao Guangrong wrote:
>> Introduce write_readonly_mem in the mmio exit info to indicate that the
>> exit was caused by a write access to a readonly memslot
>
> Please document this in chapter 5 of api.txt.
>

Okay, please review this one.

Subject: [PATCH v6 12/12] KVM: indicate readonly access fault

Introduce write_readonly_mem in the mmio exit info to indicate that the
exit was caused by a write access to a readonly memslot

Signed-off-by: Xiao Guangrong <[email protected]>
---
Documentation/virtual/kvm/api.txt | 8 +++++++-
arch/x86/kvm/x86.c | 23 ++++++++++++++---------
include/linux/kvm.h | 3 +++
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 3 +++
5 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b91bfd4..baf477e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2089,12 +2089,18 @@ Unused.
__u8 data[8];
__u32 len;
__u8 is_write;
+#ifdef __KVM_HAVE_READONLY_MEM
+ __u8 write_readonly_mem;
+#endif
} mmio;

If exit_reason is KVM_EXIT_MMIO, then the vcpu has
executed a memory-mapped I/O instruction which could not be satisfied
by kvm. The 'data' member contains the written data if 'is_write' is
-true, and should be filled by application code otherwise.
+true, and should be filled by application code otherwise. When the
+KVM_CAP_READONLY_MEM capability is present, the 'write_readonly_mem' member
+indicates whether the exit was caused by a write access to a readonly memslot
+(see KVM_MEM_READONLY in 4.35 KVM_SET_USER_MEMORY_REGION).

NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding
operations are complete (and guest state is consistent) only after userspace
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bbf41..6d7fe4a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3710,9 +3710,10 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,

ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
if (ret < 0)
- return 0;
+ return ret;
+
kvm_mmu_pte_write(vcpu, gpa, val, bytes);
- return 1;
+ return 0;
}

struct read_write_emulator_ops {
@@ -3742,7 +3743,7 @@ static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes)
static int read_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
void *val, int bytes)
{
- return !kvm_read_guest(vcpu->kvm, gpa, val, bytes);
+ return kvm_read_guest(vcpu->kvm, gpa, val, bytes);
}

static int write_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -3807,7 +3808,8 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
if (ret)
goto mmio;

- if (ops->read_write_emulate(vcpu, gpa, val, bytes))
+ ret = ops->read_write_emulate(vcpu, gpa, val, bytes);
+ if (!ret)
return X86EMUL_CONTINUE;

mmio:
@@ -3829,6 +3831,7 @@ mmio:
frag->gpa = gpa;
frag->data = val;
frag->len = now;
+ frag->write_readonly_mem = (ret == -EPERM);

gpa += now;
val += now;
@@ -3842,8 +3845,8 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
struct x86_exception *exception,
struct read_write_emulator_ops *ops)
{
+ struct kvm_mmio_fragment *frag;
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
- gpa_t gpa;
int rc;

if (ops->read_write_prepare &&
@@ -3875,17 +3878,18 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
if (!vcpu->mmio_nr_fragments)
return rc;

- gpa = vcpu->mmio_fragments[0].gpa;
+ frag = &vcpu->mmio_fragments[0];

vcpu->mmio_needed = 1;
vcpu->mmio_cur_fragment = 0;

- vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
+ vcpu->run->mmio.len = frag->len;
vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
vcpu->run->exit_reason = KVM_EXIT_MMIO;
- vcpu->run->mmio.phys_addr = gpa;
+ vcpu->run->mmio.phys_addr = frag->gpa;
+ vcpu->run->mmio.write_readonly_mem = frag->write_readonly_mem;

- return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
+ return ops->read_write_exit_mmio(vcpu, frag->gpa, val, bytes);
}

static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
@@ -5525,6 +5529,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
++frag;
run->exit_reason = KVM_EXIT_MMIO;
run->mmio.phys_addr = frag->gpa;
+ vcpu->run->mmio.write_readonly_mem = frag->write_readonly_mem;
if (vcpu->mmio_is_write)
memcpy(run->mmio.data, frag->data, frag->len);
run->mmio.len = frag->len;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d808694..9d8002f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -226,6 +226,9 @@ struct kvm_run {
__u8 data[8];
__u32 len;
__u8 is_write;
+#ifdef __KVM_HAVE_READONLY_MEM
+ __u8 write_readonly_mem;
+#endif
} mmio;
/* KVM_EXIT_HYPERCALL */
struct {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5972c98..c37b225 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -189,6 +189,7 @@ struct kvm_mmio_fragment {
gpa_t gpa;
void *data;
unsigned len;
+ bool write_readonly_mem;
};

struct kvm_vcpu {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3416f8a..0c7def7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1456,6 +1456,9 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
unsigned long addr;

addr = gfn_to_hva(kvm, gfn);
+ if (addr == KVM_HVA_ERR_RO_BAD)
+ return -EPERM;
+
if (kvm_is_error_hva(addr))
return -EFAULT;
r = __copy_to_user((void __user *)addr + offset, data, len);
--
1.7.7.6

2012-08-22 12:49:38

by Xiao Guangrong

Subject: Re: [PATCH v6 01/12] KVM: x86: fix possible infinite loop caused by reexecute_instruction

On 08/22/2012 08:01 PM, Avi Kivity wrote:
> On 08/21/2012 05:57 AM, Xiao Guangrong wrote:
>> Currently, we reexecute all unhandleable instructions as long as they do
>> not access mmio; however, this does not work if the host maps readonly
>> memory into the guest. If the instruction tries to write this kind of
>> memory, it faults again when the guest retries it, and we enter an
>> infinite loop: retry instruction -> write #PF -> emulation fail ->
>> retry instruction -> ...
>>
>> Fix it by retrying the instruction only when it faults on writable
>> memory.
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/x86.c | 12 +++++++++++-
>> 1 files changed, 11 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index fb0d937..704680d 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4473,6 +4473,7 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
>> static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
>> {
>> gpa_t gpa;
>> + pfn_t pfn;
>>
>> if (tdp_enabled)
>> return false;
>> @@ -4490,8 +4491,17 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
>> if (gpa == UNMAPPED_GVA)
>> return true; /* let cpu generate fault */
>>
>> - if (!kvm_is_error_hva(gfn_to_hva(vcpu->kvm, gpa >> PAGE_SHIFT)))
>> + /*
>> + * Do not retry the unhandleable instruction if it faults on the
>> + * readonly host memory, otherwise it will enter an infinite loop:
>> + * retry instruction -> write #PF -> emulation fail -> retry
>> + * instruction -> ...
>> + */
>> + pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
>> + if (!is_error_pfn(pfn)) {
>> + kvm_release_pfn_clean(pfn);
>> return true;
>> + }
>>
>> return false;
>> }
>>
>
> Good catch. Did this actually happen or did you find it by code inspection?
>

It did not happen in my test. Actually, it was reported by Marcelo.