This series implements selftests targeting the feature floated by Chao
via:
https://lore.kernel.org/linux-mm/[email protected]/
The changes below aim to test the fd-based approach for guest private memory
in the context of normal (non-confidential) VMs executing on non-confidential
platforms.
Confidential platforms, along with the confidentiality-aware software
stack, support a notion of private/shared accesses from confidential
VMs.
Generally, a bit in the GPA conveys the shared/private-ness of the
access. Non-confidential platforms don't have a notion of private or
shared accesses from guest VMs. To support this notion,
KVM_HC_MAP_GPA_RANGE is modified to allow marking an access from a VM
within a GPA range as always shared or private. Any suggestions for
implementing this hypercall differently or more cleanly are appreciated.
priv_memfd_test.c adds a suite of two basic selftests that access private
memory from the guest via private/shared accesses and check whether the
contents can be leaked to/accessed by the VMM via the shared memory view.
Test results:
1) PMPAT - PrivateMemoryPrivateAccess test passes
2) PMSAT - PrivateMemorySharedAccess test currently fails; more analysis is
needed to understand the reason for the failure.
Important - The patch below is needed to avoid a host kernel crash while
running these tests:
https://github.com/vishals4gh/linux/commit/b9adedf777ad84af39042e9c19899600a4add68a
Github link for the patches posted as part of this series:
https://github.com/vishals4gh/linux/commits/priv_memfd_selftests_v1
Note that this series is dependent on Chao's v5 patches mentioned above
applied on top of 5.17.
Vishal Annapurve (5):
x86: kvm: HACK: Allow testing of priv memfd approach
selftests: kvm: Fix inline assembly for hypercall
selftests: kvm: Add a basic selftest test priv memfd
selftests: kvm: priv_memfd_test: Add support for memory conversion
selftests: kvm: priv_memfd_test: Add shared access test
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kvm/mmu/mmu.c | 9 +-
arch/x86/kvm/x86.c | 16 +-
include/linux/kvm_host.h | 3 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 2 +-
tools/testing/selftests/kvm/priv_memfd_test.c | 410 ++++++++++++++++++
virt/kvm/kvm_main.c | 2 +-
8 files changed, 436 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/kvm/priv_memfd_test.c
--
2.35.1.1178.g4f1659d476-goog
Fix the inline assembly for the hypercall to explicitly load eax with
the hypercall number, so that the implementation works even in
cases where the compiler inlines the function and eax no longer
happens to hold the first argument.
Signed-off-by: Vishal Annapurve <[email protected]>
---
tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 9f000dfb5594..4d88e1a553bf 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -1461,7 +1461,7 @@ uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
asm volatile("vmcall"
: "=a"(r)
- : "b"(a0), "c"(a1), "d"(a2), "S"(a3));
+ : "a"(nr), "b"(a0), "c"(a1), "d"(a2), "S"(a3));
return r;
}
--
2.35.1.1178.g4f1659d476-goog
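For readers less familiar with GCC extended asm, here is a minimal user-space sketch of what the added input constraint does. It is not the selftest code: `vmcall` cannot be executed outside a guest, so a harmless `add` stands in for it, and `fake_hypercall` is a hypothetical name used only for illustration. The point is the operand list: `"a"(nr)` makes the compiler load `nr` into rax before the instruction runs, instead of relying on whatever value is left in that register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for kvm_hypercall(): "add" replaces "vmcall" so the
 * sketch can run in user space. Only the constraint list matters here.
 */
static inline uint64_t fake_hypercall(uint64_t nr, uint64_t a0)
{
	uint64_t r;

#if defined(__x86_64__)
	/*
	 * "a"(nr) asks the compiler to place nr in rax and "b"(a0) to place
	 * a0 in rbx before the instruction; "=a"(r) reads the result back
	 * from rax afterwards.
	 */
	__asm__ volatile("add %%rbx, %%rax"
			 : "=a"(r)
			 : "a"(nr), "b"(a0)
			 : "cc");
#else
	/* Portable fallback so the sketch still builds on other targets. */
	r = nr + a0;
#endif
	return r;
}
```

If the `"a"(nr)` operand is dropped, the compiler is free to leave any value in rax, which is the failure mode the commit message describes for the inlined case.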
Add handling of explicit private/shared memory conversion using
KVM_HC_MAP_GPA_RANGE and implicit memory conversion by handling
KVM_EXIT_MEMORY_ERROR.
Signed-off-by: Vishal Annapurve <[email protected]>
---
tools/testing/selftests/kvm/priv_memfd_test.c | 87 +++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/tools/testing/selftests/kvm/priv_memfd_test.c b/tools/testing/selftests/kvm/priv_memfd_test.c
index 11ccdb853a84..0e6c19501f27 100644
--- a/tools/testing/selftests/kvm/priv_memfd_test.c
+++ b/tools/testing/selftests/kvm/priv_memfd_test.c
@@ -129,6 +129,83 @@ static struct test_run_helper priv_memfd_testsuite[] = {
},
};
+static void handle_vm_exit_hypercall(struct kvm_run *run,
+ uint32_t test_id)
+{
+ uint64_t gpa, npages, attrs;
+ int priv_memfd =
+ priv_memfd_testsuite[test_id].priv_memfd;
+ int ret;
+ int fallocate_mode;
+
+ if (run->hypercall.nr != KVM_HC_MAP_GPA_RANGE) {
+ TEST_FAIL("Unhandled Hypercall %lld\n",
+ run->hypercall.nr);
+ }
+
+ gpa = run->hypercall.args[0];
+ npages = run->hypercall.args[1];
+ attrs = run->hypercall.args[2];
+
+ if ((gpa < TEST_MEM_GPA) || ((gpa +
+ (npages << MIN_PAGE_SHIFT)) > TEST_MEM_END)) {
+ TEST_FAIL("Unhandled gpa 0x%lx npages %ld\n",
+ gpa, npages);
+ }
+
+ if (attrs & KVM_MAP_GPA_RANGE_ENCRYPTED)
+ fallocate_mode = 0;
+ else {
+ fallocate_mode = (FALLOC_FL_PUNCH_HOLE |
+ FALLOC_FL_KEEP_SIZE);
+ }
+ pr_info("Converting off 0x%lx pages 0x%lx to %s\n",
+ (gpa - TEST_MEM_GPA), npages,
+ fallocate_mode ?
+ "shared" : "private");
+ ret = fallocate(priv_memfd, fallocate_mode,
+ (gpa - TEST_MEM_GPA),
+ npages << MIN_PAGE_SHIFT);
+ TEST_ASSERT(ret != -1,
+ "fallocate failed in hc handling");
+ run->hypercall.ret = 0;
+}
+
+static void handle_vm_exit_memory_error(struct kvm_run *run,
+ uint32_t test_id)
+{
+ uint64_t gpa, size, flags;
+ int ret;
+ int priv_memfd =
+ priv_memfd_testsuite[test_id].priv_memfd;
+ int fallocate_mode;
+
+ gpa = run->memory.gpa;
+ size = run->memory.size;
+ flags = run->memory.flags;
+
+ if ((gpa < TEST_MEM_GPA) || ((gpa + size)
+ > TEST_MEM_END)) {
+ TEST_FAIL("Unhandled gpa 0x%lx size 0x%lx\n",
+ gpa, size);
+ }
+
+ if (flags & KVM_MEMORY_EXIT_FLAG_PRIVATE)
+ fallocate_mode = 0;
+ else {
+ fallocate_mode = (FALLOC_FL_PUNCH_HOLE |
+ FALLOC_FL_KEEP_SIZE);
+ }
+ pr_info("Converting off 0x%lx size 0x%lx to %s\n",
+ (gpa - TEST_MEM_GPA), size,
+ fallocate_mode ?
+ "shared" : "private");
+ ret = fallocate(priv_memfd, fallocate_mode,
+ (gpa - TEST_MEM_GPA), size);
+ TEST_ASSERT(ret != -1,
+ "fallocate failed in memory error handling");
+}
+
static void vcpu_work(struct kvm_vm *vm, uint32_t test_id)
{
struct kvm_run *run;
@@ -155,6 +232,16 @@ static void vcpu_work(struct kvm_vm *vm, uint32_t test_id)
continue;
}
+ if (run->exit_reason == KVM_EXIT_HYPERCALL) {
+ handle_vm_exit_hypercall(run, test_id);
+ continue;
+ }
+
+ if (run->exit_reason == KVM_EXIT_MEMORY_ERROR) {
+ handle_vm_exit_memory_error(run, test_id);
+ continue;
+ }
+
TEST_FAIL("Unhandled VCPU exit reason %d\n", run->exit_reason);
break;
}
--
2.35.1.1178.g4f1659d476-goog
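The two handlers above express a conversion purely through the fallocate() mode: mode 0 allocates backing pages in the private fd ("private"), while FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE drops them ("shared"). A minimal user-space sketch of that mechanism follows. It assumes an ordinary memfd as a stand-in for the MFD_INACCESSIBLE fd from Chao's series, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a plain memfd sized to nr_pages pages. */
static int make_backing_fd(long nr_pages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	int fd = memfd_create("priv_memfd_sketch", MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, nr_pages * page_size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* "Private": make sure [off, off + len) is backed by pages in the fd. */
static int range_to_private(int fd, off_t off, off_t len)
{
	return fallocate(fd, 0, off, len);
}

/* "Shared": release the pages in [off, off + len) but keep the file size. */
static int range_to_shared(int fd, off_t off, off_t len)
{
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 off, len);
}

/* Number of 512-byte blocks currently allocated to the file, or -1. */
static long long allocated_blocks(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return (long long)st.st_blocks;
}
```

Calling range_to_private() on the whole file and then range_to_shared() on its first page should leave fewer allocated blocks than before, which is the observable effect the selftest relies on when it converts a range.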
Add plumbing in KVM logic to allow private memfd series:
https://lore.kernel.org/linux-mm/[email protected]/
to be tested with non-confidential VMs.
1) Existing hypercall KVM_HC_MAP_GPA_RANGE is modified to support
marking pages of the guest memory as privately accessed or
accessed in a shared fashion.
2) kvm_vcpu_is_private_gfn is defined to allow guest accesses to
be categorized as shared or private based on the values set by
KVM_HC_MAP_GPA_RANGE hypercall.
3) KVM_MEM_PRIVATE flag for memslots is marked as always supported.
Signed-off-by: Vishal Annapurve <[email protected]>
---
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kvm/mmu/mmu.c | 9 +++++----
arch/x86/kvm/x86.c | 16 ++++++++++++++--
include/linux/kvm_host.h | 3 +++
virt/kvm/kvm_main.c | 2 +-
5 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..3bc9add4095d 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -102,6 +102,7 @@ struct kvm_clock_pairing {
#define KVM_MAP_GPA_RANGE_PAGE_SZ_2M (1 << 0)
#define KVM_MAP_GPA_RANGE_PAGE_SZ_1G (1 << 1)
#define KVM_MAP_GPA_RANGE_ENC_STAT(n) (n << 4)
+#define KVM_MARK_GPA_RANGE_ENC_ACCESS (1 << 8)
#define KVM_MAP_GPA_RANGE_ENCRYPTED KVM_MAP_GPA_RANGE_ENC_STAT(1)
#define KVM_MAP_GPA_RANGE_DECRYPTED KVM_MAP_GPA_RANGE_ENC_STAT(0)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b1a30a751db0..ee9bc36011de 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3895,10 +3895,11 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
{
- /*
- * At this time private gfn has not been supported yet. Other patch
- * that enables it should change this.
- */
+ gfn_t priv_gfn_end = vcpu->priv_gfn + vcpu->priv_pages;
+
+ if ((gfn >= vcpu->priv_gfn) && (gfn < priv_gfn_end))
+ return true;
+
return false;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 11a949928a85..3b17fa7f2192 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9186,8 +9186,20 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE)))
break;
- if (!PAGE_ALIGNED(gpa) || !npages ||
- gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
+ if (!PAGE_ALIGNED(gpa) ||
+ gpa_to_gfn(gpa) + npages < gpa_to_gfn(gpa)) {
+ ret = -KVM_EINVAL;
+ break;
+ }
+
+ if (attrs & KVM_MARK_GPA_RANGE_ENC_ACCESS) {
+ vcpu->priv_gfn = gpa_to_gfn(gpa);
+ vcpu->priv_pages = npages;
+ ret = 0;
+ break;
+ }
+
+ if (!npages) {
ret = -KVM_EINVAL;
break;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0150e952a131..7c12a0bdb495 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -311,6 +311,9 @@ struct kvm_vcpu {
u64 requests;
unsigned long guest_debug;
+ uint64_t priv_gfn;
+ uint64_t priv_pages;
+
struct mutex mutex;
struct kvm_run *run;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index df5311755a40..a31a58aa1b79 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1487,7 +1487,7 @@ static void kvm_replace_memslot(struct kvm *kvm,
bool __weak kvm_arch_private_memory_supported(struct kvm *kvm)
{
- return false;
+ return true;
}
static int check_memory_region_flags(struct kvm *kvm,
--
2.35.1.1178.g4f1659d476-goog
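The range check that this hack adds to kvm_vcpu_is_private_gfn() can be exercised outside the kernel. The sketch below mirrors it in user space; `struct priv_range` and `is_private_gfn` are hypothetical names, and the two fields simply copy the ones added to struct kvm_vcpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space mirror of the fields added to struct kvm_vcpu. */
struct priv_range {
	uint64_t priv_gfn;	/* first gfn treated as privately accessed */
	uint64_t priv_pages;	/* number of pages in the range */
};

/* Half-open interval test: [priv_gfn, priv_gfn + priv_pages). */
static bool is_private_gfn(const struct priv_range *r, uint64_t gfn)
{
	uint64_t end = r->priv_gfn + r->priv_pages;

	return gfn >= r->priv_gfn && gfn < end;
}
```

With priv_pages set to 0 the interval is empty, so every access is classified as shared. The x86.c hunk accepts npages == 0 when KVM_MARK_GPA_RANGE_ENC_ACCESS is set, which appears to be how a guest would clear the range again.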
On Mon, Apr 11, 2022 at 05:31:09PM +0530, Nikunj A. Dadhania wrote:
> On 4/9/2022 2:35 AM, Vishal Annapurve wrote:
> > This series implements selftests targeting the feature floated by Chao
> > via:
> > https://lore.kernel.org/linux-mm/[email protected]/
> >
>
> Thanks for working on this.
>
> > Below changes aim to test the fd based approach for guest private memory
> > in context of normal (non-confidential) VMs executing on non-confidential
> > platforms.
> >
> > Confidential platforms along with the confidentiality aware software
> > stack support a notion of private/shared accesses from the confidential
> > VMs.
> > Generally, a bit in the GPA conveys the shared/private-ness of the
> > access. Non-confidential platforms don't have a notion of private or
> > shared accesses from the guest VMs. To support this notion,
> > KVM_HC_MAP_GPA_RANGE
> > is modified to allow marking an access from a VM within a GPA range as
> > always shared or private. Any suggestions regarding implementing this ioctl
> > alternatively/cleanly are appreciated.
> >
> > priv_memfd_test.c file adds a suite of two basic selftests to access private
> > memory from the guest via private/shared access and checking if the contents
> > can be leaked to/accessed by vmm via shared memory view.
> >
> > Test results:
> > 1) PMPAT - PrivateMemoryPrivateAccess test passes
> > 2) PMSAT - PrivateMemorySharedAccess test fails currently and needs more
> > analysis to understand the reason of failure.
>
> That could be because of the return code (*r = -1) from the KVM_EXIT_MEMORY_ERROR.
> This gets interpreted as -EPERM in the VMM when the vcpu_run exits.
>
> + vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
> + vcpu->run->memory.flags = flags;
> + vcpu->run->memory.padding = 0;
> + vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
> + vcpu->run->memory.size = PAGE_SIZE;
> + fault->pfn = -1;
> + *r = -1;
> + return true;
That's true. The current private mem patch treats KVM_EXIT_MEMORY_ERROR as an
error for KVM_RUN. That behavior needs to be discussed, but right now (v5) it
hits the ASSERT in tools/testing/selftests/kvm/lib/kvm_util.c before you have a
chance to handle KVM_EXIT_MEMORY_ERROR in this patch series.
void vcpu_run(struct kvm_vm *vm, uint32_t vcpuid)
{
int ret = _vcpu_run(vm, vcpuid);
TEST_ASSERT(ret == 0, "KVM_RUN IOCTL failed, "
"rc: %i errno: %i", ret, errno);
}
Thanks,
Chao
>
>
> Regards
> Nikunj
>
> [1] https://lore.kernel.org/all/[email protected]/#t
On 4/9/2022 2:35 AM, Vishal Annapurve wrote:
> This series implements selftests targeting the feature floated by Chao
> via:
> https://lore.kernel.org/linux-mm/[email protected]/
>
Thanks for working on this.
> Below changes aim to test the fd based approach for guest private memory
> in context of normal (non-confidential) VMs executing on non-confidential
> platforms.
>
> Confidential platforms along with the confidentiality aware software
> stack support a notion of private/shared accesses from the confidential
> VMs.
> Generally, a bit in the GPA conveys the shared/private-ness of the
> access. Non-confidential platforms don't have a notion of private or
> shared accesses from the guest VMs. To support this notion,
> KVM_HC_MAP_GPA_RANGE
> is modified to allow marking an access from a VM within a GPA range as
> always shared or private. Any suggestions regarding implementing this ioctl
> alternatively/cleanly are appreciated.
>
> priv_memfd_test.c file adds a suite of two basic selftests to access private
> memory from the guest via private/shared access and checking if the contents
> can be leaked to/accessed by vmm via shared memory view.
>
> Test results:
> 1) PMPAT - PrivateMemoryPrivateAccess test passes
> 2) PMSAT - PrivateMemorySharedAccess test fails currently and needs more
> analysis to understand the reason of failure.
That could be because of the return code (*r = -1) from the KVM_EXIT_MEMORY_ERROR.
This gets interpreted as -EPERM in the VMM when the vcpu_run exits.
+ vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
+ vcpu->run->memory.flags = flags;
+ vcpu->run->memory.padding = 0;
+ vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+ vcpu->run->memory.size = PAGE_SIZE;
+ fault->pfn = -1;
+ *r = -1;
+ return true;
Regards
Nikunj
[1] https://lore.kernel.org/all/[email protected]/#t
On Fri, Apr 8, 2022, at 2:05 PM, Vishal Annapurve wrote:
> This series implements selftests targeting the feature floated by Chao
> via:
> https://lore.kernel.org/linux-mm/[email protected]/
>
> Below changes aim to test the fd based approach for guest private memory
> in context of normal (non-confidential) VMs executing on non-confidential
> platforms.
>
> Confidential platforms along with the confidentiality aware software
> stack support a notion of private/shared accesses from the confidential
> VMs.
> Generally, a bit in the GPA conveys the shared/private-ness of the
> access. Non-confidential platforms don't have a notion of private or
> shared accesses from the guest VMs. To support this notion,
> KVM_HC_MAP_GPA_RANGE
> is modified to allow marking an access from a VM within a GPA range as
> always shared or private. Any suggestions regarding implementing this ioctl
> alternatively/cleanly are appreciated.
This is fantastic. I do think we need to decide how this should work in general. We have a few platforms with somewhat different properties:
TDX: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. In principle, the same address could be *both* and be distinguished by only that bit, and the two addresses would refer to different pages.
SEV: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. At any given time, a physical address (with that bit masked off) can be private, shared, or invalid, but it can't be valid as private and shared at the same time.
pKVM (currently, as I understand it): the guest decides by hypercall, in advance of an access, which addresses are private and which are shared.
This series, if I understood it correctly, is like TDX except with no hardware security.
Sean or Chao, do you have a clear sense of whether the current fd-based private memory proposal can cleanly support SEV and pKVM? What, if anything, needs to be done on the API side to get that working well? I don't think we need to support SEV or pKVM right away to get this merged, but I do think we should understand how the API can map to them.
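To make the contrast between the models above concrete, here is a small sketch of the two ways an access can be classified: by a bit carried in the GPA itself, or by state the guest declared in advance via a hypercall. Everything in it is an assumption made for illustration: the bit position (51) is arbitrary, and the names do not come from any of the platforms or patches discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration only: bit 51 of the GPA marks a shared access. */
#define SKETCH_SHARED_BIT	51
#define SKETCH_SHARED_MASK	(1ULL << SKETCH_SHARED_BIT)

/* Per-access model: the access type is read off the address itself. */
static bool gpa_access_is_shared(uint64_t gpa)
{
	return (gpa & SKETCH_SHARED_MASK) != 0;
}

/* The address with the selector bit masked off. */
static uint64_t gpa_strip_shared_bit(uint64_t gpa)
{
	return gpa & ~SKETCH_SHARED_MASK;
}

/* Declared-in-advance model: state recorded by an earlier hypercall. */
struct declared_range {
	uint64_t start;
	uint64_t len;
	bool shared;
};

static bool declared_access_is_shared(const struct declared_range *r,
				      uint64_t gpa)
{
	if (gpa >= r->start && gpa - r->start < r->len)
		return r->shared;
	return false;	/* sketch default: undeclared memory is private */
}
```

In the per-access model, two addresses that differ only in the selector bit strip to the same value, and whether they may both be valid at once is exactly where the TDX and SEV descriptions above differ. In the declared model, the answer depends only on the recorded range, not on the address bits used for the access.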
On Tue, Apr 12, 2022 at 05:16:22PM -0700, Andy Lutomirski wrote:
> On Fri, Apr 8, 2022, at 2:05 PM, Vishal Annapurve wrote:
> > This series implements selftests targeting the feature floated by Chao
> > via:
> > https://lore.kernel.org/linux-mm/[email protected]/
> >
> > Below changes aim to test the fd based approach for guest private memory
> > in context of normal (non-confidential) VMs executing on non-confidential
> > platforms.
> >
> > Confidential platforms along with the confidentiality aware software
> > stack support a notion of private/shared accesses from the confidential
> > VMs.
> > Generally, a bit in the GPA conveys the shared/private-ness of the
> > access. Non-confidential platforms don't have a notion of private or
> > shared accesses from the guest VMs. To support this notion,
> > KVM_HC_MAP_GPA_RANGE
> > is modified to allow marking an access from a VM within a GPA range as
> > always shared or private. Any suggestions regarding implementing this ioctl
> > alternatively/cleanly are appreciated.
>
> This is fantastic. I do think we need to decide how this should work in general. We have a few platforms with somewhat different properties:
>
> TDX: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. In principle, the same address could be *both* and be distinguished by only that bit, and the two addresses would refer to different pages.
>
> SEV: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. At any given time, a physical address (with that bit masked off) can be private, shared, or invalid, but it can't be valid as private and shared at the same time.
>
> pKVM (currently, as I understand it): the guest decides by hypercall, in advance of an access, which addresses are private and which are shared.
>
> This series, if I understood it correctly, is like TDX except with no hardware security.
>
> Sean or Chao, do you have a clear sense of whether the current fd-based private memory proposal can cleanly support SEV and pKVM? What, if anything, needs to be done on the API side to get that working well? I don't think we need to support SEV or pKVM right away to get this merged, but I do think we should understand how the API can map to them.
I've been looking at porting the SEV-SNP hypervisor patches over to
using memfd, and I hit an issue that I think is generally applicable
to SEV/SEV-ES as well. Namely at guest init time we have something
like the following flow:
VMM:
- allocate shared memory to back the guest and map it into guest
address space
- initialize shared memory with the initial memory contents (namely
the BIOS)
- ask KVM to encrypt these pages in-place and measure them to
generate the initial measured payload for attestation, via
KVM_SEV_LAUNCH_UPDATE with the GPA for each range of memory to
encrypt.
KVM:
- issue SEV_LAUNCH_UPDATE firmware command, which takes an HPA as
input and does an in-place encryption/measure of the page.
With the current v5 of the memfd/UPM series, I think the expected flow is that
we would fallocate() these ranges from the private fd backend in advance of
calling KVM_SEV_LAUNCH_UPDATE (if the VMM did it afterwards, we'd destroy the
initial guest payload, since its pages would be replaced by newly-allocated
ones). But if the VMM does it before, it has no way to initialize the guest
memory contents, since mmap()/pwrite() are disallowed due to MFD_INACCESSIBLE.
I think something similar to your proposal[1] here of making pread()/pwrite()
possible for private-fd-backed memory that's been flagged as "shareable"
would work for this case. Although here the "shareable" flag could be
removed immediately upon successful completion of the SEV_LAUNCH_UPDATE
firmware command.
I think with TDX this isn't an issue because their analogous TDH.MEM.PAGE.ADD
seamcall takes a pair of source/dest HPA as input params, so the VMM
wouldn't need write access to dest HPA at any point, just source HPA.
[1] https://lwn.net/ml/linux-kernel/[email protected]/
On Wed, Apr 13, 2022 at 08:42:00AM -0500, Michael Roth wrote:
> On Tue, Apr 12, 2022 at 05:16:22PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 8, 2022, at 2:05 PM, Vishal Annapurve wrote:
> > > This series implements selftests targeting the feature floated by Chao
> > > via:
> > > https://lore.kernel.org/linux-mm/[email protected]/
> > >
> > > Below changes aim to test the fd based approach for guest private memory
> > > in context of normal (non-confidential) VMs executing on non-confidential
> > > platforms.
> > >
> > > Confidential platforms along with the confidentiality aware software
> > > stack support a notion of private/shared accesses from the confidential
> > > VMs.
> > > Generally, a bit in the GPA conveys the shared/private-ness of the
> > > access. Non-confidential platforms don't have a notion of private or
> > > shared accesses from the guest VMs. To support this notion,
> > > KVM_HC_MAP_GPA_RANGE
> > > is modified to allow marking an access from a VM within a GPA range as
> > > always shared or private. Any suggestions regarding implementing this ioctl
> > > alternatively/cleanly are appreciated.
> >
> > This is fantastic. I do think we need to decide how this should work in general. We have a few platforms with somewhat different properties:
> >
> > TDX: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. In principle, the same address could be *both* and be distinguished by only that bit, and the two addresses would refer to different pages.
> >
> > SEV: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. At any given time, a physical address (with that bit masked off) can be private, shared, or invalid, but it can't be valid as private and shared at the same time.
> >
> > pKVM (currently, as I understand it): the guest decides by hypercall, in advance of an access, which addresses are private and which are shared.
> >
> > This series, if I understood it correctly, is like TDX except with no hardware security.
> >
> > Sean or Chao, do you have a clear sense of whether the current fd-based private memory proposal can cleanly support SEV and pKVM? What, if anything, needs to be done on the API side to get that working well? I don't think we need to support SEV or pKVM right away to get this merged, but I do think we should understand how the API can map to them.
>
> I've been looking at porting the SEV-SNP hypervisor patches over to
> using memfd, and I hit an issue that I think is generally applicable
> to SEV/SEV-ES as well. Namely at guest init time we have something
> like the following flow:
>
> VMM:
> - allocate shared memory to back the guest and map it into guest
> address space
> - initialize shared memory with the initial memory contents (namely
> the BIOS)
> - ask KVM to encrypt these pages in-place and measure them to
> generate the initial measured payload for attestation, via
> KVM_SEV_LAUNCH_UPDATE with the GPA for each range of memory to
> encrypt.
> KVM:
> - issue SEV_LAUNCH_UPDATE firmware command, which takes an HPA as
> input and does an in-place encryption/measure of the page.
>
> With current v5 of the memfd/UPM series, I think the expected flow is that
> we would fallocate() these ranges from the private fd backend in advance of
> calling KVM_SEV_LAUNCH_UPDATE (if VMM does it after we'd destroy the initial
> guest payload, since they'd be replaced by newly-allocated pages). But if
> VMM does it before, VMM has no way to initialize the guest memory contents,
> since mmap()/pwrite() are disallowed due to MFD_INACCESSIBLE.
OK, so for SEV, the VMM basically puts the vBIOS directly into guest memory and
then does an in-place measurement.
TDX has no problem because TDX temporarily uses a VMM buffer (vs. guest memory)
to hold the vBIOS and then asks the SEAM module to measure it and copy it to
guest memory.
Maybe something like SHM_LOCK should be used instead of the aggressive
MFD_INACCESSIBLE. Before the VMM calls SHM_LOCK on the memfd, the content
can be changed, but after that it's no longer visible to the userspace VMM.
This gives userspace a chance to modify the data in the private pages.
Chao
>
> I think something similar to your proposal[1] here of making pread()/pwrite()
> possible for private-fd-backed memory that's been flagged as "shareable"
> would work for this case. Although here the "shareable" flag could be
> removed immediately upon successful completion of the SEV_LAUNCH_UPDATE
> firmware command.
>
> I think with TDX this isn't an issue because their analogous TDH.MEM.PAGE.ADD
> seamcall takes a pair of source/dest HPA as input params, so the VMM
> wouldn't need write access to dest HPA at any point, just source HPA.
>
> [1] https://lwn.net/ml/linux-kernel/[email protected]/
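The "populate first, lock afterwards" idea discussed above has a rough analogue in the existing memfd sealing API, which can serve as a sketch of the ordering. This is only an analogy and not the proposed interface: F_SEAL_WRITE blocks later writes but leaves the contents readable, whereas the suggestion in this thread is to make them invisible to userspace. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: a memfd that still accepts writes and can be sealed. */
static int make_sealable_fd(void)
{
	return memfd_create("payload_sketch", MFD_CLOEXEC | MFD_ALLOW_SEALING);
}

/* Step 1: the VMM copies the initial payload in while access is allowed. */
static ssize_t populate(int fd, const void *buf, size_t len)
{
	return pwrite(fd, buf, len, 0);
}

/* Step 2: "lock" the contents; later writes are refused by the kernel. */
static int lock_contents(int fd)
{
	return fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
}
```

After lock_contents() succeeds, a further pwrite() on the fd fails with EPERM, which is the same one-way transition the SHM_LOCK-style suggestion relies on: the payload is written first, and the window for userspace modification closes before the pages are handed over for measurement.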