2023-09-13 16:25:49

by Isaku Yamahata

Subject: [RFC PATCH 0/6] KVM: gmem: Implement error_remove_page

From: Isaku Yamahata <[email protected]>

This patch series shares my progress on the KVM gmem error_remove_page task.
I'm still working on test cases, but I don't want to hold the patches locally
until they are finished.

- Update the error_remove_page method: unmap the gfns of poisoned pages and
pass the related arguments down. Unfortunately, the error_remove_page callback
is only given a struct page, so it can't know the actual poisoned address or
range, even though the poisoning itself may be at cache-line granularity.
- Add a new flag to KVM_EXIT_MEMORY_FAULT to indicate the page is poisoned.
- Add a check in faultin_pfn_private: when the page is poisoned, exit with
KVM_EXIT_MEMORY_FAULT(HWPOISON). A rough sketch follows this list.
- Only a test case for ioctl(FIBMAP) so far; more test cases are TODO.
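
As a rough illustration only (untested, and the flag/helper names here are
simplified placeholders; patch 3/6 has the actual change), the check in the
x86 fault path is along these lines:

static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
                                   struct kvm_page_fault *fault)
{
        int max_order, r;

        r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn,
                             &fault->pfn, &max_order);
        if (r == -EHWPOISON) {
                /* Report the poisoned private page to userspace. */
                vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
                vcpu->run->memory_fault.gpa = gfn_to_gpa(fault->gfn);
                vcpu->run->memory_fault.size = PAGE_SIZE;
                /* Placeholder name; the real flag is defined in patch 3/6. */
                vcpu->run->memory_fault.flags = KVM_MEMORY_EXIT_FLAG_HWPOISON;
                return -EFAULT;
        }
        if (r)
                return r;

        fault->max_level = min(kvm_max_level_for_order(max_order),
                               fault->max_level);
        fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
        return RET_PF_CONTINUE;
}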

TODOs
- Implement test cases that inject HWPOISON (via
/sys/kernel/debug/hwpoison/corrupt-pfn) or an MCE (via
/sys/kernel/debug/mce-inject).
- Update qemu to handle KVM_EXIT_MEMORY_FAULT(HWPOISON). A rough userspace
sketch follows this list.
- Update TDX KVM to handle it and add test cases for TDX.
- Try to inject HWPOISON as soon as the poison is detected.
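
For the qemu item, the userspace handling would look roughly like the sketch
below (hand-written and untested, not based on any qemu code; the HWPOISON
flag name and value are placeholders, and the response policy is up to the
VMM):

/* Hypothetical VMM-side handler; the HWPOISON flag is a placeholder. */
#include <stdio.h>
#include <stdlib.h>
#include <linux/kvm.h>

#ifndef KVM_MEMORY_EXIT_FLAG_HWPOISON
#define KVM_MEMORY_EXIT_FLAG_HWPOISON (1ULL << 1)   /* placeholder value */
#endif

static void handle_memory_fault_exit(struct kvm_run *run)
{
        if (run->exit_reason != KVM_EXIT_MEMORY_FAULT)
                return;

        if (run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_HWPOISON) {
                /*
                 * Private guest memory was poisoned.  A VMM could inject a
                 * virtual #MC for guests that can recover; this sketch just
                 * reports the range and terminates the VM.
                 */
                fprintf(stderr, "hwpoison: gpa 0x%llx size 0x%llx\n",
                        (unsigned long long)run->memory_fault.gpa,
                        (unsigned long long)run->memory_fault.size);
                exit(EXIT_FAILURE);
        }

        /* Otherwise: shared<->private conversion as usual. */
}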

Isaku Yamahata (6):
KVM: guest_memfd: Add config to show the capability to handle error
page
KVM: guestmem_fd: Make the error_remove_page callback unmap guest
memory
KVM: guest_memfd, x86: MEMORY_FAULT exit with hw poisoned page
KVM: guest_memfd: Implement bmap inode operation
KVM: selftests: Add selftest for guest_memfd() fibmap
KVM: X86: Allow KVM gmem hwpoison test cases

arch/x86/kvm/Kconfig | 2 +
arch/x86/kvm/mmu/mmu.c | 21 +++--
include/linux/kvm_host.h | 2 +
include/uapi/linux/kvm.h | 3 +-
.../testing/selftests/kvm/guest_memfd_test.c | 45 ++++++++++
virt/kvm/Kconfig | 7 ++
virt/kvm/guest_mem.c | 82 +++++++++++++++----
7 files changed, 139 insertions(+), 23 deletions(-)


base-commit: a5accd8596fa84b9fe00076444b5ef628d2351b9
--
2.25.1


2023-09-13 19:17:25

by Isaku Yamahata

Subject: [RFC PATCH 1/6] KVM: guest_memfd: Add config to show the capability to handle error page

From: Isaku Yamahata <[email protected]>

Add a config, HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR, to indicate that the
KVM arch code can handle gmem error pages.

Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/Kconfig | 3 +++
virt/kvm/guest_mem.c | 3 +++
2 files changed, 6 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 1a48cb530092..624df45baff0 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -112,3 +112,6 @@ config KVM_GENERIC_PRIVATE_MEM
select KVM_GENERIC_MEMORY_ATTRIBUTES
select KVM_PRIVATE_MEM
bool
+
+config HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR
+ bool
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 85903c32163f..35d8f03e7937 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -307,6 +307,9 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
pgoff_t start, end;
gfn_t gfn;

+ if (!IS_ENABLED(CONFIG_HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR))
+ return MF_IGNORED;
+
filemap_invalidate_lock_shared(mapping);

start = page->index;
--
2.25.1

2023-09-13 20:04:40

by Isaku Yamahata

Subject: [RFC PATCH 6/6] KVM: X86: Allow KVM gmem hwpoison test cases

From: Isaku Yamahata <[email protected]>

Select HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR and KVM_GENERIC_PRIVATE_MEM_BMAP
to enable the test cases for KVM gmem.

Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/Kconfig | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 029c76bcd1a5..46fdedde9c0f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -82,6 +82,8 @@ config KVM_SW_PROTECTED_VM
depends on EXPERT
depends on X86_64
select KVM_GENERIC_PRIVATE_MEM
+ select HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR
+ select KVM_GENERIC_PRIVATE_MEM_BMAP
help
Enable support for KVM software-protected VMs. Currently "protected"
means the VM can be backed with memory provided by
--
2.25.1

2023-09-13 20:23:30

by Isaku Yamahata

Subject: [RFC PATCH 2/6] KVM: guestmem_fd: Make the error_remove_page callback unmap guest memory

From: Isaku Yamahata <[email protected]>

Implement the error_remove_page callback for KVM gmem. Update struct
kvm_gfn_range to indicate that gfns are being unmapped because the page is
poisoned.

Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 2 ++
virt/kvm/guest_mem.c | 47 +++++++++++++++++++++++++++-------------
2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 091bc89ae805..e81a7123c84f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -266,8 +266,10 @@ struct kvm_gfn_range {
pte_t pte;
unsigned long attributes;
u64 raw;
+ struct page *page;
} arg;
bool may_block;
+ bool memory_error;
};
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 35d8f03e7937..746e683df589 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -305,7 +305,7 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
struct kvm_gmem *gmem;
unsigned long index;
pgoff_t start, end;
- gfn_t gfn;
+ bool flush;

if (!IS_ENABLED(CONFIG_HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR))
return MF_IGNORED;
@@ -316,26 +316,43 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
end = start + thp_nr_pages(page);

list_for_each_entry(gmem, gmem_list, entry) {
+ struct kvm *kvm = gmem->kvm;
+
+ KVM_MMU_LOCK(kvm);
+ kvm_mmu_invalidate_begin(kvm);
+ KVM_MMU_UNLOCK(kvm);
+
+ flush = false;
xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
- for (gfn = start; gfn < end; gfn++) {
- if (WARN_ON_ONCE(gfn < slot->base_gfn ||
- gfn >= slot->base_gfn + slot->npages))
- continue;
-
- /*
- * FIXME: Tell userspace that the *private*
- * memory encountered an error.
- */
- send_sig_mceerr(BUS_MCEERR_AR,
- (void __user *)gfn_to_hva_memslot(slot, gfn),
- PAGE_SHIFT, current);
- }
+ pgoff_t pgoff;
+
+ if (WARN_ON_ONCE(end < slot->base_gfn ||
+ start >= slot->base_gfn + slot->npages))
+ continue;
+
+ pgoff = slot->gmem.pgoff;
+ struct kvm_gfn_range gfn_range = {
+ .slot = slot,
+ .start = slot->base_gfn + max(pgoff, start) - pgoff,
+ .end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
+ .arg.page = page,
+ .may_block = true,
+ .memory_error = true,
+ };
+
+ flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
}
+ if (flush)
+ kvm_flush_remote_tlbs(kvm);
+
+ KVM_MMU_LOCK(kvm);
+ kvm_mmu_invalidate_end(kvm);
+ KVM_MMU_UNLOCK(kvm);
}

filemap_invalidate_unlock_shared(mapping);

- return 0;
+ return MF_DELAYED;
}

static const struct address_space_operations kvm_gmem_aops = {
--
2.25.1

2023-09-13 21:35:13

by Isaku Yamahata

Subject: [RFC PATCH 5/6] KVM: selftests: Add selftest for guest_memfd() fibmap

From: Isaku Yamahata <[email protected]>

Signed-off-by: Isaku Yamahata <[email protected]>
---
.../testing/selftests/kvm/guest_memfd_test.c | 45 +++++++++++++++++++
1 file changed, 45 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 4d2b110ab0d6..c20b4a14e9c7 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -10,6 +10,7 @@
#include "kvm_util_base.h"
#include <linux/bitmap.h>
#include <linux/falloc.h>
+#include <linux/fs.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
@@ -91,6 +92,49 @@ static void test_fallocate(int fd, size_t page_size, size_t total_size)
TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed");
}

+static void test_fibmap(int fd, size_t page_size, size_t total_size)
+{
+ int ret;
+ int b;
+ int i;
+
+ /* Make the whole file unallocated as a known initial state */
+ ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+ 0, total_size);
+ TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) at while file should succeed");
+
+ for (i = 0; i < total_size / page_size; i++) {
+ b = i;
+ ret = ioctl(fd, FIBMAP, &b);
+ if (ret == -1 && errno == EINVAL) {
+ print_skip("FIBMAP not supported; set CONFIG_KVM_GENERIC_PRIVATE_MEM_BMAP=y for the bmap test");
+ return;
+ }
+ TEST_ASSERT(!ret, "ioctl(FIBMAP) should succeed");
+ TEST_ASSERT(!b, "ioctl(FIBMAP) should return zero 0x%x", b);
+ }
+
+ ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, page_size, page_size * 2);
+ TEST_ASSERT(!ret, "fallocate beginning at page_size should succeed");
+
+ for (i = 0; i < total_size / page_size; i++) {
+ b = i;
+ ret = ioctl(fd, FIBMAP, &b);
+ if (ret == -1 && errno == EINVAL) {
+ print_skip("FIBMAP not supported; set CONFIG_KVM_GENERIC_PRIVATE_MEM_BMAP=y for the bmap test");
+ return;
+ }
+ TEST_ASSERT(!ret, "ioctl(FIBMAP) should succeed");
+
+ if (i == 1 || i == 2) {
+ TEST_ASSERT(b, "ioctl(FIBMAP) should return non-zero 0x%x", b);
+ } else {
+ TEST_ASSERT(!b, "ioctl(FIBMAP) should return zero, 0x%x", b);
+ }
+ }
+
+}
+
static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
{
uint64_t valid_flags = 0;
@@ -158,6 +202,7 @@ int main(int argc, char *argv[])
test_mmap(fd, page_size);
test_file_size(fd, page_size, total_size);
test_fallocate(fd, page_size, total_size);
+ test_fibmap(fd, page_size, total_size);

close(fd);
}
--
2.25.1

2023-09-14 15:22:54

by Sean Christopherson

Subject: Re: [RFC PATCH 1/6] KVM: guest_memfd: Add config to show the capability to handle error page

On Wed, Sep 13, 2023, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> Add a config, HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR, to indicate that the
> KVM arch code can handle gmem error pages.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> virt/kvm/Kconfig | 3 +++
> virt/kvm/guest_mem.c | 3 +++
> 2 files changed, 6 insertions(+)
>
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 1a48cb530092..624df45baff0 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -112,3 +112,6 @@ config KVM_GENERIC_PRIVATE_MEM
> select KVM_GENERIC_MEMORY_ATTRIBUTES
> select KVM_PRIVATE_MEM
> bool
> +
> +config HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR
> + bool
> diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> index 85903c32163f..35d8f03e7937 100644
> --- a/virt/kvm/guest_mem.c
> +++ b/virt/kvm/guest_mem.c
> @@ -307,6 +307,9 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
> pgoff_t start, end;
> gfn_t gfn;
>
> + if (!IS_ENABLED(CONFIG_HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR))
> + return MF_IGNORED;

I don't see the point; KVM can and should always zap SPTEs, i.e. force the
guest to re-fault on the affected memory. At that point kvm_gmem_get_pfn() will
return -EHWPOISON, and architectures that don't support graceful recovery can
simply terminate the VM.

2023-09-14 21:43:05

by Sean Christopherson

Subject: Re: [RFC PATCH 2/6] KVM: guestmem_fd: Make the error_remove_page callback unmap guest memory

On Wed, Sep 13, 2023, [email protected] wrote:
> @@ -316,26 +316,43 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
> end = start + thp_nr_pages(page);
>
> list_for_each_entry(gmem, gmem_list, entry) {
> + struct kvm *kvm = gmem->kvm;
> +
> + KVM_MMU_LOCK(kvm);
> + kvm_mmu_invalidate_begin(kvm);
> + KVM_MMU_UNLOCK(kvm);
> +
> + flush = false;
> xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
> - for (gfn = start; gfn < end; gfn++) {
> - if (WARN_ON_ONCE(gfn < slot->base_gfn ||
> - gfn >= slot->base_gfn + slot->npages))
> - continue;
> -
> - /*
> - * FIXME: Tell userspace that the *private*
> - * memory encountered an error.
> - */
> - send_sig_mceerr(BUS_MCEERR_AR,
> - (void __user *)gfn_to_hva_memslot(slot, gfn),
> - PAGE_SHIFT, current);
> - }
> + pgoff_t pgoff;
> +
> + if (WARN_ON_ONCE(end < slot->base_gfn ||
> + start >= slot->base_gfn + slot->npages))
> + continue;
> +
> + pgoff = slot->gmem.pgoff;
> + struct kvm_gfn_range gfn_range = {
> + .slot = slot,
> + .start = slot->base_gfn + max(pgoff, start) - pgoff,
> + .end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
> + .arg.page = page,
> + .may_block = true,
> + .memory_error = true,

Why pass arg.page and memory_error? There's no usage in this mini-series, and no
explanation of what arch code would do with the information. And I can't think of why
arch would need to do anything but zap the SPTEs. If the memory error is directly
related to the current instruction, the vCPU will fault on the zapped SPTE, see
-HWPOISON, and exit to userspace. If the memory is unrelated, then the delayed
notification is less than ideal, but not fundamentally broken, e.g. it's no worse
than TDX's behavior of not signaling #MC until a poisoned cache line is actually
accessed.

I don't get arg.page in particular, because having the gfn should be enough for
arch code to take action beyond zapping SPTEs.

And _if_ we want to communicate the error to arch code, it would be much better
to add a dedicated arch hook instead of piggybacking kvm_mmu_unmap_gfn_range()
with a "memory_error" flag.

If we just zap SPTEs, then can't this simply be?

static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
{
        struct list_head *gmem_list = &mapping->private_list;
        struct kvm_gmem *gmem;
        pgoff_t start, end;

        filemap_invalidate_lock_shared(mapping);

        start = page->index;
        end = start + thp_nr_pages(page);

        list_for_each_entry(gmem, gmem_list, entry)
                kvm_gmem_invalidate_begin(gmem, start, end);

        /*
         * Do not truncate the range, what action is taken in response to the
         * error is userspace's decision (assuming the architecture supports
         * gracefully handling memory errors).  If/when the guest attempts to
         * access a poisoned page, kvm_gmem_get_pfn() will return -EHWPOISON,
         * at which point KVM can either terminate the VM or propagate the
         * error to userspace.
         */

        list_for_each_entry(gmem, gmem_list, entry)
                kvm_gmem_invalidate_end(gmem, start, end);

        filemap_invalidate_unlock_shared(mapping);

        return MF_DELAYED;
}