2022-04-10 11:12:25

by Vishal Annapurve

Subject: [RFC V1 PATCH 0/5] selftests: KVM: selftests for fd-based approach of supporting private memory

This series implements selftests targeting the feature floated by Chao
via:
https://lore.kernel.org/linux-mm/[email protected]/

The changes below aim to test the fd-based approach for guest private memory
in the context of normal (non-confidential) VMs executing on non-confidential
platforms.

Confidential platforms, along with a confidentiality-aware software stack,
support a notion of private/shared accesses from confidential VMs.
Generally, a bit in the GPA conveys whether an access is shared or private.
Non-confidential platforms have no notion of private or shared accesses
from guest VMs. To emulate this notion, KVM_HC_MAP_GPA_RANGE is modified
to allow marking accesses from a VM within a GPA range as always shared or
always private. Any suggestions for implementing this hypercall change
differently/more cleanly are appreciated.

priv_memfd_test.c adds a suite of two basic selftests that access private
memory from the guest via private/shared accesses and check whether the
contents can be leaked to/accessed by the VMM via the shared memory view.
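
For illustration, a minimal sketch of the guest-side flow these tests follow
(TEST_MEM_GPA/TEST_MEM_SIZE and the GUEST_*() helpers are the selftest
framework's; the actual code in priv_memfd_test.c may differ in details):

#include <stdint.h>
#include "kvm_util.h"	/* GUEST_ASSERT()/GUEST_SYNC()/GUEST_DONE() */

#define TEST_MEM_GPA	0xb0000000UL	/* assumed test constants */
#define TEST_MEM_SIZE	0x2000UL
#define PATTERN		0xa5

/* Guest code: write a pattern through the private view of the test
 * memory, then hand control to the VMM so it can check whether the
 * shared view of the same GPA range leaks the pattern. */
static void guest_priv_mem_access(void)
{
	uint8_t *mem = (uint8_t *)TEST_MEM_GPA;
	uint64_t i;

	/* PMPAT: private writes must read back correctly as private. */
	for (i = 0; i < TEST_MEM_SIZE; i++)
		mem[i] = PATTERN;
	for (i = 0; i < TEST_MEM_SIZE; i++)
		GUEST_ASSERT(mem[i] == PATTERN);

	/* PMSAT: the VMM now reads the shared memory view and must not
	 * observe PATTERN there. */
	GUEST_SYNC(1);
	GUEST_DONE();
}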

Test results:
1) PMPAT - PrivateMemoryPrivateAccess test passes
2) PMSAT - PrivateMemorySharedAccess test currently fails and needs more
analysis to understand the reason for the failure.

Important: the patch below is needed to avoid a host kernel crash while
running these tests:
https://github.com/vishals4gh/linux/commit/b9adedf777ad84af39042e9c19899600a4add68a

Github link for the patches posted as part of this series:
https://github.com/vishals4gh/linux/commits/priv_memfd_selftests_v1
Note that this series depends on Chao's v5 patches mentioned above, applied
on top of 5.17.

Vishal Annapurve (5):
x86: kvm: HACK: Allow testing of priv memfd approach
selftests: kvm: Fix inline assembly for hypercall
selftests: kvm: Add a basic selftest to test priv memfd
selftests: kvm: priv_memfd_test: Add support for memory conversion
selftests: kvm: priv_memfd_test: Add shared access test

arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kvm/mmu/mmu.c | 9 +-
arch/x86/kvm/x86.c | 16 +-
include/linux/kvm_host.h | 3 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 2 +-
tools/testing/selftests/kvm/priv_memfd_test.c | 410 ++++++++++++++++++
virt/kvm/kvm_main.c | 2 +-
8 files changed, 436 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/kvm/priv_memfd_test.c

--
2.35.1.1178.g4f1659d476-goog


2022-04-12 06:58:12

by Vishal Annapurve

Subject: [RFC V1 PATCH 2/5] selftests: kvm: Fix inline assembly for hypercall

Fix the inline assembly for the hypercall to explicitly load
the hypercall number into eax, so that the implementation
works even in cases where the compiler inlines the
function.

Signed-off-by: Vishal Annapurve <[email protected]>
---
tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 9f000dfb5594..4d88e1a553bf 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -1461,7 +1461,7 @@ uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,

asm volatile("vmcall"
: "=a"(r)
- : "b"(a0), "c"(a1), "d"(a2), "S"(a3));
+ : "a"(nr), "b"(a0), "c"(a1), "d"(a2), "S"(a3));
return r;
}

--
2.35.1.1178.g4f1659d476-goog
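
For reference, the helper after this change reads roughly as below (a sketch
of kvm_hypercall() in processor.c; the point is that the hypercall number is
now a hard "a" register constraint instead of relying on whatever happens to
be in eax once the compiler inlines the function):

#include <stdint.h>

uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
		       uint64_t a3)
{
	uint64_t r;

	/* Without "a"(nr), nothing in the asm constraints guarantees that
	 * the hypercall number is in eax when vmcall executes; it only
	 * happened to work while the function was not inlined. */
	asm volatile("vmcall"
		     : "=a"(r)
		     : "a"(nr), "b"(a0), "c"(a1), "d"(a2), "S"(a3));
	return r;
}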

2022-04-12 08:30:23

by Vishal Annapurve

Subject: [RFC V1 PATCH 4/5] selftests: kvm: priv_memfd_test: Add support for memory conversion

Add handling of explicit private/shared memory conversion via
KVM_HC_MAP_GPA_RANGE and of implicit memory conversion via
KVM_EXIT_MEMORY_ERROR.

Signed-off-by: Vishal Annapurve <[email protected]>
---
tools/testing/selftests/kvm/priv_memfd_test.c | 87 +++++++++++++++++++
1 file changed, 87 insertions(+)

diff --git a/tools/testing/selftests/kvm/priv_memfd_test.c b/tools/testing/selftests/kvm/priv_memfd_test.c
index 11ccdb853a84..0e6c19501f27 100644
--- a/tools/testing/selftests/kvm/priv_memfd_test.c
+++ b/tools/testing/selftests/kvm/priv_memfd_test.c
@@ -129,6 +129,83 @@ static struct test_run_helper priv_memfd_testsuite[] = {
},
};

+static void handle_vm_exit_hypercall(struct kvm_run *run,
+ uint32_t test_id)
+{
+ uint64_t gpa, npages, attrs;
+ int priv_memfd =
+ priv_memfd_testsuite[test_id].priv_memfd;
+ int ret;
+ int fallocate_mode;
+
+ if (run->hypercall.nr != KVM_HC_MAP_GPA_RANGE) {
+ TEST_FAIL("Unhandled Hypercall %lld\n",
+ run->hypercall.nr);
+ }
+
+ gpa = run->hypercall.args[0];
+ npages = run->hypercall.args[1];
+ attrs = run->hypercall.args[2];
+
+ if ((gpa < TEST_MEM_GPA) || ((gpa +
+ (npages << MIN_PAGE_SHIFT)) > TEST_MEM_END)) {
+ TEST_FAIL("Unhandled gpa 0x%lx npages %ld\n",
+ gpa, npages);
+ }
+
+ if (attrs & KVM_MAP_GPA_RANGE_ENCRYPTED)
+ fallocate_mode = 0;
+ else {
+ fallocate_mode = (FALLOC_FL_PUNCH_HOLE |
+ FALLOC_FL_KEEP_SIZE);
+ }
+ pr_info("Converting off 0x%lx pages 0x%lx to %s\n",
+ (gpa - TEST_MEM_GPA), npages,
+ fallocate_mode ?
+ "shared" : "private");
+ ret = fallocate(priv_memfd, fallocate_mode,
+ (gpa - TEST_MEM_GPA),
+ npages << MIN_PAGE_SHIFT);
+ TEST_ASSERT(ret != -1,
+ "fallocate failed in hc handling");
+ run->hypercall.ret = 0;
+}
+
+static void handle_vm_exit_memory_error(struct kvm_run *run,
+ uint32_t test_id)
+{
+ uint64_t gpa, size, flags;
+ int ret;
+ int priv_memfd =
+ priv_memfd_testsuite[test_id].priv_memfd;
+ int fallocate_mode;
+
+ gpa = run->memory.gpa;
+ size = run->memory.size;
+ flags = run->memory.flags;
+
+ if ((gpa < TEST_MEM_GPA) || ((gpa + size)
+ > TEST_MEM_END)) {
+ TEST_FAIL("Unhandled gpa 0x%lx size 0x%lx\n",
+ gpa, size);
+ }
+
+ if (flags & KVM_MEMORY_EXIT_FLAG_PRIVATE)
+ fallocate_mode = 0;
+ else {
+ fallocate_mode = (FALLOC_FL_PUNCH_HOLE |
+ FALLOC_FL_KEEP_SIZE);
+ }
+ pr_info("Converting off 0x%lx size 0x%lx to %s\n",
+ (gpa - TEST_MEM_GPA), size,
+ fallocate_mode ?
+ "shared" : "private");
+ ret = fallocate(priv_memfd, fallocate_mode,
+ (gpa - TEST_MEM_GPA), size);
+ TEST_ASSERT(ret != -1,
+ "fallocate failed in memory error handling");
+}
+
static void vcpu_work(struct kvm_vm *vm, uint32_t test_id)
{
struct kvm_run *run;
@@ -155,6 +232,16 @@ static void vcpu_work(struct kvm_vm *vm, uint32_t test_id)
continue;
}

+ if (run->exit_reason == KVM_EXIT_HYPERCALL) {
+ handle_vm_exit_hypercall(run, test_id);
+ continue;
+ }
+
+ if (run->exit_reason == KVM_EXIT_MEMORY_ERROR) {
+ handle_vm_exit_memory_error(run, test_id);
+ continue;
+ }
+
TEST_FAIL("Unhandled VCPU exit reason %d\n", run->exit_reason);
break;
}
--
2.35.1.1178.g4f1659d476-goog
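
For context, a hedged sketch of the two halves of a conversion this patch
handles: the guest requests a shared mapping via KVM_HC_MAP_GPA_RANGE (which
only reaches userspace as KVM_EXIT_HYPERCALL if the VMM has enabled
KVM_CAP_EXIT_HYPERCALL for that hypercall), and handle_vm_exit_hypercall()
above then punches a hole in the private memfd so the range is backed by
shared memory again. Helper names below are illustrative; the enable-cap call
is the pre-rework selftest API:

/* Guest: explicitly convert a GPA range to shared before touching it
 * through the shared view (constants from asm/kvm_para.h). */
static void guest_convert_to_shared(uint64_t gpa, uint64_t npages)
{
	GUEST_ASSERT(!kvm_hypercall(KVM_HC_MAP_GPA_RANGE, gpa, npages,
				    KVM_MAP_GPA_RANGE_DECRYPTED |
				    KVM_MAP_GPA_RANGE_PAGE_SZ_4K, 0));
}

/* VMM: opt in to KVM_EXIT_HYPERCALL exits for KVM_HC_MAP_GPA_RANGE so
 * that handle_vm_exit_hypercall() gets a chance to run at all. */
static void vmm_enable_map_gpa_exits(struct kvm_vm *vm)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_EXIT_HYPERCALL,
		.args[0] = 1ULL << KVM_HC_MAP_GPA_RANGE,
	};

	vm_enable_cap(vm, &cap);
}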

2022-04-12 10:58:12

by Vishal Annapurve

Subject: [RFC V1 PATCH 1/5] x86: kvm: HACK: Allow testing of priv memfd approach

Add plumbing in KVM logic to allow the private memfd series:
https://lore.kernel.org/linux-mm/[email protected]/
to be tested with non-confidential VMs.

1) Existing hypercall KVM_HC_MAP_GPA_RANGE is modified to support
marking pages of the guest memory as privately accessed or
accessed in a shared fashion.

2) kvm_vcpu_is_private_gfn is defined to allow guest accesses to
be categorized as shared or private based on the values set by
the KVM_HC_MAP_GPA_RANGE hypercall.

3) KVM_MEM_PRIVATE flag for memslots is marked as always supported.

Signed-off-by: Vishal Annapurve <[email protected]>
---
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kvm/mmu/mmu.c | 9 +++++----
arch/x86/kvm/x86.c | 16 ++++++++++++++--
include/linux/kvm_host.h | 3 +++
virt/kvm/kvm_main.c | 2 +-
5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..3bc9add4095d 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -102,6 +102,7 @@ struct kvm_clock_pairing {
#define KVM_MAP_GPA_RANGE_PAGE_SZ_2M (1 << 0)
#define KVM_MAP_GPA_RANGE_PAGE_SZ_1G (1 << 1)
#define KVM_MAP_GPA_RANGE_ENC_STAT(n) (n << 4)
+#define KVM_MARK_GPA_RANGE_ENC_ACCESS (1 << 8)
#define KVM_MAP_GPA_RANGE_ENCRYPTED KVM_MAP_GPA_RANGE_ENC_STAT(1)
#define KVM_MAP_GPA_RANGE_DECRYPTED KVM_MAP_GPA_RANGE_ENC_STAT(0)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b1a30a751db0..ee9bc36011de 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3895,10 +3895,11 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,

static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
{
- /*
- * At this time private gfn has not been supported yet. Other patch
- * that enables it should change this.
- */
+ gpa_t priv_gfn_end = vcpu->priv_gfn + vcpu->priv_pages;
+
+ if ((gfn >= vcpu->priv_gfn) && (gfn < priv_gfn_end))
+ return true;
+
return false;
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 11a949928a85..3b17fa7f2192 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9186,8 +9186,20 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE)))
break;

- if (!PAGE_ALIGNED(gpa) || !npages ||
- gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
+ if (!PAGE_ALIGNED(gpa) ||
+ gpa_to_gfn(gpa) + npages < gpa_to_gfn(gpa)) {
+ ret = -KVM_EINVAL;
+ break;
+ }
+
+ if (attrs & KVM_MARK_GPA_RANGE_ENC_ACCESS) {
+ vcpu->priv_gfn = gpa_to_gfn(gpa);
+ vcpu->priv_pages = npages;
+ ret = 0;
+ break;
+ }
+
+ if (!npages) {
ret = -KVM_EINVAL;
break;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0150e952a131..7c12a0bdb495 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -311,6 +311,9 @@ struct kvm_vcpu {
u64 requests;
unsigned long guest_debug;

+ uint64_t priv_gfn;
+ uint64_t priv_pages;
+
struct mutex mutex;
struct kvm_run *run;

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index df5311755a40..a31a58aa1b79 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1487,7 +1487,7 @@ static void kvm_replace_memslot(struct kvm *kvm,

bool __weak kvm_arch_private_memory_supported(struct kvm *kvm)
{
- return false;
+ return true;
}

static int check_memory_region_flags(struct kvm *kvm,
--
2.35.1.1178.g4f1659d476-goog
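
The intended use of the new flag, as a sketch (assuming the fixed
kvm_hypercall() helper from patch 2/5): the guest marks a GPA range once, and
from then on kvm_vcpu_is_private_gfn() classifies this vCPU's faults in that
range as private, steering them to the private memfd backend:

/* Guest side: tell KVM to treat all of this vCPU's accesses to
 * [gpa, gpa + npages * 4K) as private accesses.
 * KVM_MARK_GPA_RANGE_ENC_ACCESS is the flag added by this patch and is
 * handled entirely inside kvm_emulate_hypercall(), i.e. it does not
 * exit to userspace. */
static void guest_set_private_access_range(uint64_t gpa, uint64_t npages)
{
	uint64_t ret;

	ret = kvm_hypercall(KVM_HC_MAP_GPA_RANGE, gpa, npages,
			    KVM_MARK_GPA_RANGE_ENC_ACCESS, 0);
	GUEST_ASSERT(ret == 0);
}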

2022-04-12 21:53:25

by Chao Peng

Subject: Re: [RFC V1 PATCH 0/5] selftests: KVM: selftests for fd-based approach of supporting private memory

On Mon, Apr 11, 2022 at 05:31:09PM +0530, Nikunj A. Dadhania wrote:
> On 4/9/2022 2:35 AM, Vishal Annapurve wrote:
> > This series implements selftests targeting the feature floated by Chao
> > via:
> > https://lore.kernel.org/linux-mm/[email protected]/
> >
>
> Thanks for working on this.
>
> > Below changes aim to test the fd based approach for guest private memory
> > in context of normal (non-confidential) VMs executing on non-confidential
> > platforms.
> >
> > Confidential platforms along with the confidentiality aware software
> > stack support a notion of private/shared accesses from the confidential
> > VMs.
> > Generally, a bit in the GPA conveys the shared/private-ness of the
> > access. Non-confidential platforms don't have a notion of private or
> > shared accesses from the guest VMs. To support this notion,
> > KVM_HC_MAP_GPA_RANGE
> > is modified to allow marking an access from a VM within a GPA range as
> > always shared or private. Any suggestions regarding implementing this ioctl
> > alternatively/cleanly are appreciated.
> >
> > priv_memfd_test.c file adds a suite of two basic selftests to access private
> > memory from the guest via private/shared access and checking if the contents
> > can be leaked to/accessed by vmm via shared memory view.
> >
> > Test results:
> > 1) PMPAT - PrivateMemoryPrivateAccess test passes
> > 2) PMSAT - PrivateMemorySharedAccess test fails currently and needs more
> > analysis to understand the reason of failure.
>
> That could be because of the return code (*r = -1) from the KVM_EXIT_MEMORY_ERROR.
> This gets interpreted as -EPERM in the VMM when the vcpu_run exits.
>
> + vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
> + vcpu->run->memory.flags = flags;
> + vcpu->run->memory.padding = 0;
> + vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
> + vcpu->run->memory.size = PAGE_SIZE;
> + fault->pfn = -1;
> + *r = -1;
> + return true;

That's true. The current private mem patch treats KVM_EXIT_MEMORY_ERROR as an error
for KVM_RUN. That behavior needs to be discussed, but right now (v5) it hits the
ASSERT in tools/testing/selftests/kvm/lib/kvm_util.c before you have a chance to
handle KVM_EXIT_MEMORY_ERROR in this patch series.

void vcpu_run(struct kvm_vm *vm, uint32_t vcpuid)
{
int ret = _vcpu_run(vm, vcpuid);
TEST_ASSERT(ret == 0, "KVM_RUN IOCTL failed, "
"rc: %i errno: %i", ret, errno);
}
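
So until that behavior is settled, a test that wants to handle
KVM_EXIT_MEMORY_ERROR itself has to bypass the wrapper, roughly along these
lines (a sketch against the current selftest API; the -1/EPERM pairing is
Nikunj's observation above):

static void vcpu_run_tolerant(struct kvm_vm *vm, uint32_t test_id)
{
	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
	int ret = _vcpu_run(vm, VCPU_ID);

	if (ret != 0 && run->exit_reason == KVM_EXIT_MEMORY_ERROR) {
		/* *r = -1 in the v5 KVM patches shows up here as EPERM. */
		handle_vm_exit_memory_error(run, test_id);
		return;
	}
	TEST_ASSERT(ret == 0, "KVM_RUN IOCTL failed, rc: %i errno: %i",
		    ret, errno);
}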

Thanks,
Chao

>
>
> Regards
> Nikunj
>
> [1] https://lore.kernel.org/all/[email protected]/#t

2022-04-12 22:59:11

by Nikunj A. Dadhania

Subject: Re: [RFC V1 PATCH 0/5] selftests: KVM: selftests for fd-based approach of supporting private memory

On 4/9/2022 2:35 AM, Vishal Annapurve wrote:
> This series implements selftests targeting the feature floated by Chao
> via:
> https://lore.kernel.org/linux-mm/[email protected]/
>

Thanks for working on this.

> Below changes aim to test the fd based approach for guest private memory
> in context of normal (non-confidential) VMs executing on non-confidential
> platforms.
>
> Confidential platforms along with the confidentiality aware software
> stack support a notion of private/shared accesses from the confidential
> VMs.
> Generally, a bit in the GPA conveys the shared/private-ness of the
> access. Non-confidential platforms don't have a notion of private or
> shared accesses from the guest VMs. To support this notion,
> KVM_HC_MAP_GPA_RANGE
> is modified to allow marking an access from a VM within a GPA range as
> always shared or private. Any suggestions regarding implementing this ioctl
> alternatively/cleanly are appreciated.
>
> priv_memfd_test.c file adds a suite of two basic selftests to access private
> memory from the guest via private/shared access and checking if the contents
> can be leaked to/accessed by vmm via shared memory view.
>
> Test results:
> 1) PMPAT - PrivateMemoryPrivateAccess test passes
> 2) PMSAT - PrivateMemorySharedAccess test fails currently and needs more
> analysis to understand the reason of failure.

That could be because of the return code (*r = -1) from the KVM_EXIT_MEMORY_ERROR.
This gets interpreted as -EPERM in the VMM when the vcpu_run exits.

+ vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
+ vcpu->run->memory.flags = flags;
+ vcpu->run->memory.padding = 0;
+ vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+ vcpu->run->memory.size = PAGE_SIZE;
+ fault->pfn = -1;
+ *r = -1;
+ return true;


Regards
Nikunj

[1] https://lore.kernel.org/all/[email protected]/#t

2022-04-13 03:47:21

by Andy Lutomirski

Subject: Re: [RFC V1 PATCH 0/5] selftests: KVM: selftests for fd-based approach of supporting private memory

On Fri, Apr 8, 2022, at 2:05 PM, Vishal Annapurve wrote:
> This series implements selftests targeting the feature floated by Chao
> via:
> https://lore.kernel.org/linux-mm/[email protected]/
>
> Below changes aim to test the fd based approach for guest private memory
> in context of normal (non-confidential) VMs executing on non-confidential
> platforms.
>
> Confidential platforms along with the confidentiality aware software
> stack support a notion of private/shared accesses from the confidential
> VMs.
> Generally, a bit in the GPA conveys the shared/private-ness of the
> access. Non-confidential platforms don't have a notion of private or
> shared accesses from the guest VMs. To support this notion,
> KVM_HC_MAP_GPA_RANGE
> is modified to allow marking an access from a VM within a GPA range as
> always shared or private. Any suggestions regarding implementing this ioctl
> alternatively/cleanly are appreciated.

This is fantastic. I do think we need to decide how this should work in general. We have a few platforms with somewhat different properties:

TDX: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. In principle, the same address could be *both* and be distinguished by only that bit, and the two addresses would refer to different pages.

SEV: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. At any given time, a physical address (with that bit masked off) can be private, shared, or invalid, but it can't be valid as private and shared at the same time.

pKVM (currently, as I understand it): the guest decides by hypercall, in advance of an access, which addresses are private and which are shared.

This series, if I understood it correctly, is like TDX except with no hardware security.

Sean or Chao, do you have a clear sense of whether the current fd-based private memory proposal can cleanly support SEV and pKVM? What, if anything, needs to be done on the API side to get that working well? I don't think we need to support SEV or pKVM right away to get this merged, but I do think we should understand how the API can map to them.

2022-04-14 14:31:12

by Michael Roth

Subject: Re: [RFC V1 PATCH 0/5] selftests: KVM: selftests for fd-based approach of supporting private memory

On Tue, Apr 12, 2022 at 05:16:22PM -0700, Andy Lutomirski wrote:
> On Fri, Apr 8, 2022, at 2:05 PM, Vishal Annapurve wrote:
> > This series implements selftests targeting the feature floated by Chao
> > via:
> > https://lore.kernel.org/linux-mm/[email protected]/
> >
> > Below changes aim to test the fd based approach for guest private memory
> > in context of normal (non-confidential) VMs executing on non-confidential
> > platforms.
> >
> > Confidential platforms along with the confidentiality aware software
> > stack support a notion of private/shared accesses from the confidential
> > VMs.
> > Generally, a bit in the GPA conveys the shared/private-ness of the
> > access. Non-confidential platforms don't have a notion of private or
> > shared accesses from the guest VMs. To support this notion,
> > KVM_HC_MAP_GPA_RANGE
> > is modified to allow marking an access from a VM within a GPA range as
> > always shared or private. Any suggestions regarding implementing this ioctl
> > alternatively/cleanly are appreciated.
>
> This is fantastic. I do think we need to decide how this should work in general. We have a few platforms with somewhat different properties:
>
> TDX: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. In principle, the same address could be *both* and be distinguished by only that bit, and the two addresses would refer to different pages.
>
> SEV: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. At any given time, a physical address (with that bit masked off) can be private, shared, or invalid, but it can't be valid as private and shared at the same time.
>
> pKVM (currently, as I understand it): the guest decides by hypercall, in advance of an access, which addresses are private and which are shared.
>
> This series, if I understood it correctly, is like TDX except with no hardware security.
>
> Sean or Chao, do you have a clear sense of whether the current fd-based private memory proposal can cleanly support SEV and pKVM? What, if anything, needs to be done on the API side to get that working well? I don't think we need to support SEV or pKVM right away to get this merged, but I do think we should understand how the API can map to them.

I've been looking at porting the SEV-SNP hypervisor patches over to
using memfd, and I hit an issue that I think is generally applicable
to SEV/SEV-ES as well. Namely at guest init time we have something
like the following flow:

VMM:
- allocate shared memory to back the guest and map it into guest
address space
- initialize shared memory with the initial memory contents (namely
the BIOS)
- ask KVM to encrypt these pages in-place and measure them to
generate the initial measured payload for attestation, via
KVM_SEV_LAUNCH_UPDATE with the GPA for each range of memory to
encrypt.
KVM:
- issue SEV_LAUNCH_UPDATE firmware command, which takes an HPA as
input and does an in-place encryption/measure of the page.

With the current v5 of the memfd/UPM series, I think the expected flow is that
we would fallocate() these ranges from the private fd backend in advance of
calling KVM_SEV_LAUNCH_UPDATE (if the VMM did it after, we'd destroy the initial
guest payload, since the pages would be replaced by newly-allocated ones). But if
the VMM does it before, the VMM has no way to initialize the guest memory contents,
since mmap()/pwrite() are disallowed due to MFD_INACCESSIBLE.
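
Concretely, the conflict is at the memcpy in a flow like the following (a
sketch of the plain-SEV VMM side using the existing KVM_SEV_LAUNCH_UPDATE_DATA
command; the SNP command differs, but the ordering problem is the same):

#include <string.h>
#include <err.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void sev_load_and_measure_bios(int vm_fd, int sev_fd, void *guest_mem,
				      const void *bios_image, size_t bios_size)
{
	struct kvm_sev_launch_update_data update = {
		.uaddr = (unsigned long)guest_mem,
		.len = bios_size,
	};
	struct kvm_sev_cmd cmd = {
		.id = KVM_SEV_LAUNCH_UPDATE_DATA,
		.data = (unsigned long)&update,
		.sev_fd = sev_fd,
	};

	/* Populate the pages that will become the measured payload. With
	 * MFD_INACCESSIBLE private memory there is no mapping to copy
	 * into, which is the problem described above. */
	memcpy(guest_mem, bios_image, bios_size);

	/* Encrypt + measure those same pages in place. */
	if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd))
		err(1, "KVM_SEV_LAUNCH_UPDATE_DATA");
}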

I think something similar to your proposal[1] here of making pread()/pwrite()
possible for private-fd-backed memory that's been flagged as "shareable"
would work for this case. Although here the "shareable" flag could be
removed immediately upon successful completion of the SEV_LAUNCH_UPDATE
firmware command.

I think with TDX this isn't an issue because the analogous TDH.MEM.PAGE.ADD
seamcall takes a pair of source/dest HPAs as input params, so the VMM
wouldn't need write access to the dest HPA at any point, just the source HPA.

[1] https://lwn.net/ml/linux-kernel/[email protected]/

2022-04-15 06:12:33

by Chao Peng

Subject: Re: [RFC V1 PATCH 0/5] selftests: KVM: selftests for fd-based approach of supporting private memory

On Wed, Apr 13, 2022 at 08:42:00AM -0500, Michael Roth wrote:
> On Tue, Apr 12, 2022 at 05:16:22PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 8, 2022, at 2:05 PM, Vishal Annapurve wrote:
> > > This series implements selftests targeting the feature floated by Chao
> > > via:
> > > https://lore.kernel.org/linux-mm/[email protected]/
> > >
> > > Below changes aim to test the fd based approach for guest private memory
> > > in context of normal (non-confidential) VMs executing on non-confidential
> > > platforms.
> > >
> > > Confidential platforms along with the confidentiality aware software
> > > stack support a notion of private/shared accesses from the confidential
> > > VMs.
> > > Generally, a bit in the GPA conveys the shared/private-ness of the
> > > access. Non-confidential platforms don't have a notion of private or
> > > shared accesses from the guest VMs. To support this notion,
> > > KVM_HC_MAP_GPA_RANGE
> > > is modified to allow marking an access from a VM within a GPA range as
> > > always shared or private. Any suggestions regarding implementing this ioctl
> > > alternatively/cleanly are appreciated.
> >
> > This is fantastic. I do think we need to decide how this should work in general. We have a few platforms with somewhat different properties:
> >
> > TDX: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. In principle, the same address could be *both* and be distinguished by only that bit, and the two addresses would refer to different pages.
> >
> > SEV: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. At any given time, a physical address (with that bit masked off) can be private, shared, or invalid, but it can't be valid as private and shared at the same time.
> >
> > pKVM (currently, as I understand it): the guest decides by hypercall, in advance of an access, which addresses are private and which are shared.
> >
> > This series, if I understood it correctly, is like TDX except with no hardware security.
> >
> > Sean or Chao, do you have a clear sense of whether the current fd-based private memory proposal can cleanly support SEV and pKVM? What, if anything, needs to be done on the API side to get that working well? I don't think we need to support SEV or pKVM right away to get this merged, but I do think we should understand how the API can map to them.
>
> I've been looking at porting the SEV-SNP hypervisor patches over to
> using memfd, and I hit an issue that I think is generally applicable
> to SEV/SEV-ES as well. Namely at guest init time we have something
> like the following flow:
>
> VMM:
> - allocate shared memory to back the guest and map it into guest
> address space
> - initialize shared memory with initialize memory contents (namely
> the BIOS)
> - ask KVM to encrypt these pages in-place and measure them to
> generate the initial measured payload for attestation, via
> KVM_SEV_LAUNCH_UPDATE with the GPA for each range of memory to
> encrypt.
> KVM:
> - issue SEV_LAUNCH_UPDATE firmware command, which takes an HPA as
> input and does an in-place encryption/measure of the page.
>
> With current v5 of the memfd/UPM series, I think the expected flow is that
> we would fallocate() these ranges from the private fd backend in advance of
> calling KVM_SEV_LAUNCH_UPDATE (if VMM does it after we'd destroy the initial
> guest payload, since they'd be replaced by newly-allocated pages). But if
> VMM does it before, VMM has no way to initialize the guest memory contents,
> since mmap()/pwrite() are disallowed due to MFD_INACCESSIBLE.

OK, so for SEV, basically the VMM puts the vBIOS directly into guest memory and then
does an in-place measurement.

TDX has no problem because TDX temporarily uses a VMM buffer (vs. guest memory)
to hold the vBIOS and then asks the SEAM module to measure it and copy it into
guest memory.

Maybe something like SHM_LOCK should be used instead of the aggressive
MFD_INACCESSIBLE. Before the VMM calls SHM_LOCK on the memfd, the content
can be changed, but after that it's not visible to the userspace VMM. This
gives userspace a chance to modify the data in the private pages.
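
The populate-then-lock lifecycle would look roughly like memfd write-sealing
does today (below uses F_SEAL_WRITE purely as an analogy for the ordering;
the real mechanism would have to make the pages fully inaccessible to
userspace, which a write seal does not):

#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

static int make_locked_guest_memfd(const void *bios_image, size_t bios_size,
				   off_t bios_off, size_t guest_size)
{
	int fd = memfd_create("guest-private", MFD_ALLOW_SEALING);

	ftruncate(fd, guest_size);                /* size the backing store    */
	pwrite(fd, bios_image, bios_size, bios_off); /* VMM writes the payload */
	/* ... in-place encrypt/measure (e.g. SEV_LAUNCH_UPDATE) happens here ... */
	fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);     /* no more userspace writes  */
	return fd;
}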

Chao
>
> I think something similar to your proposal[1] here of making pread()/pwrite()
> possible for private-fd-backed memory that's been flagged as "shareable"
> would work for this case. Although here the "shareable" flag could be
> removed immediately upon successful completion of the SEV_LAUNCH_UPDATE
> firmware command.
>
> I think with TDX this isn't an issue because their analagous TDH.MEM.PAGE.ADD
> seamcall takes a pair of source/dest HPA as input params, so the VMM
> wouldn't need write access to dest HPA at any point, just source HPA.
>
> [1] https://lwn.net/ml/linux-kernel/[email protected]/