2022-12-14 19:48:53

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 00/64] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

This patchset is based on top of the following patchset:

"[PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM"
https://lore.kernel.org/lkml/[email protected]/T/#me1dd3a4c295758b4e4ac8ff600f2db055bc5f987

and is also available at:

https://github.com/amdese/linux/commits/upmv10-host-snp-v7-rfc

== OVERVIEW ==

This version is being posted as an RFC due to fairly extensive changes
relating to transitioning the SEV-SNP implementation to using
restricted/private memslots (aka Unmapped Private Memory) to manage
private guest pages instead of the legacy SEV memory registration ioctls.

Alongside that work we've also been investigating leveraging UPM to
implement lazy-pinning support for SEV guests, rather than the legacy
SEV memory registration ioctls, which rely on pinning everything in
advance.

For both of these SEV and SEV-SNP use-cases we've needed to add a
number of hooks in the restrictedmem implementation, so we thought it
would be useful for this version at least to include both the UPM-based
SEV and SNP implementations, so reviewers can see whether these hooks
might be needed for other archs/platforms and we can start consolidating
around whether/how they should be defined for general usage. There are
still some TODOs in this area, but we hope this implementation is
complete enough to at least outline the required additions needed for
using UPM for these use-cases.

Outside of UPM-related items, we've also included fairly extensive
changes based on review feedback from v6 and would appreciate any
feedback on those aspects as well.

== LAYOUT ==

PATCH 01-06: pre-patches that add the UPM hooks and KVM capability needed
to switch between UPM and legacy SEV memory registration.
PATCH 07-12: implement SEV lazy-pinning using UPM to manage private memory
PATCH 13-32: general SNP detection/enablement for host and CCP driver
PATCH 33-58: base KVM-specific support for running SNP guests
PATCH 59-64: misc./documentation/IOMMU changes

== TESTING ==

Tested with the following QEMU tree, which is based on Chao Peng's UPM v10 QEMU
tree:
https://github.com/mdroth/qemu/commits/upmv10-snpv3

SEV-SNP with UPM:

qemu-system-x86_64 -cpu EPYC-Milan-v2 \
-object memory-backend-memfd-private,id=ram1,size=1G,share=true \
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,upm-mode=on \
-machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
...

SEV with UPM (requires patched OVMF[1]):

qemu-system-x86_64 -cpu EPYC-Milan-v2 \
-object memory-backend-memfd-private,id=ram1,size=1G,share=true \
-object sev-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,upm-mode=on \
-machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
...

[1] https://github.com/mdroth/edk2/commits/upmv8-seves-v1

== BACKGROUND ==

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
the SEV-SNP guest support that is now part of mainline.

This series provides the basic building blocks to support booting
SEV-SNP VMs; it does not cover all of the security enhancements
introduced by SEV-SNP, such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE events that
are used by the SEV-SNP guest to communicate with the hypervisor. The
series provides support for handling the following new NAE events:
- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request

The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
access requires an RMP check. In particular, read accesses from the
hypervisor do not require RMP checks because data confidentiality is
already protected via memory encryption. When hardware encounters an RMP
check failure, it raises a page-fault exception. If the RMP check failure
is due to a page-size mismatch, the large page is split to resolve the
fault.

The series does not provide support for interrupt security or migration;
those features will be added after the base support.

Changes since v6:

* Added support for restrictedmem/UPM, and removed the SEV-specific
implementation of private memory management. As a result of this rework
the following patches were no longer needed, so they were dropped:
- KVM: SVM: Mark the private vma unmergable for SEV-SNP guests
- KVM: SVM: Disallow registering memory range from HugeTLB for SNP guest
- KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
- KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
* Moved the RMP table entry structure definition (struct rmpentry)
to sev.c so as not to expose this non-architectural definition to the
rest of the kernel, making the structure private to the SNP code.
Also made the RMP table entry accessors inline functions and
removed all accessors that are not called more than once.
Added a new function rmptable_entry() to index into the RMP table
and return the RMP table entry.
* Moved the RMPUPDATE and PSMASH helper function declarations from the
linux namespace to the x86 arch-specific include namespace. Added
comments for these helper functions.
* Introduced set_memory_p() to provide a way to change the attributes
of a memory range so it is marked as present and added back to the
kernel direct map; invalidating/restoring pages from the direct map
are now done using set_memory_np() and set_memory_p().
* Added detailed comments around user RMP #PF fault handling and
simplified computation of the faulting pfn for large-pages.
* Added support to return the pfn from dump_pagetable() to do
SEV-specific fault handling; this is added as a pre-patch. This support
is now used to dump the RMP entry in case of an RMP #PF in
show_fault_oops().
* Added a new generic SNP command params structure sev_data_snp_addr,
which is used for all SNP firmware API commands requiring a
single physical address parameter.
* Added support for new SNP_INIT_EX command with support for HV-Fixed
page range list.
* Added support for new SNP_SHUTDOWN_EX command which allows
disabling enforcement of SNP in the IOMMU. Also DF_FLUSH is done
at SNP shutdown if it indicates DF_FLUSH is required.
* Made sev_do_cmd() a generic API interface for the hypervisor to
issue commands to manage SEV and SNP guests. Also removed the API
wrappers used by the hypervisor to manage an SEV-SNP guest.
All these APIs now invoke sev_do_cmd() directly.
* Introduced an SNP leaked pages list. Pages that are unsafe to
release back to the page allocator, because they can't be reclaimed
or transitioned back to hypervisor/shared state, are now added to
this internal leaked pages list to prevent fatal page faults
when accessing these pages. The function snp_leak_pages() is
renamed to snp_mark_pages_offline() and is an external function
available to both the CCP driver and the SNP hypervisor code. Removed
the call to memory_failure() when leaking/marking pages offline.
* Remove snp_set_rmp_state() multiplexor code and add new separate
helpers such as rmp_mark_pages_firmware() & rmp_mark_pages_shared().
The callers now issue snp_reclaim_pages() directly when needed as
done by __snp_free_firmware_pages() and unmap_firmware_writeable().
All callers of snp_set_rmp_state() modified to call helpers
rmp_mark_pages_firmware() or rmp_mark_pages_shared() as required.
* Change snp_reclaim_pages() to take physical address as an argument
and clear C-bit from this physical address argument internally.
* The output parameter sev_user_data_ext_snp_config in
sev_ioctl_snp_get_config() is memset to zero to avoid leaking kernel
memory.
* Prevent race between sev_ioctl_snp_set_config() and
snp_guest_ext_guest_request() for sev->snp_certs_data by acquiring
sev->snp_certs_lock mutex.
* Zeroed out struct sev_user_data_snp_config in
sev_ioctl_snp_set_config() to prevent leaking uninitialized
kernel memory.
* Optimized snp_safe_alloc_page() by avoiding multiple calls to
pfn_to_page() and checking for a hugepage using pfn instead of
expanding to full physical address.
* Invoke host_rmp_make_shared() with leak parameter set to true
if VMSA page cannot be transitioned back to shared state.
* Fixed snp_launch_finish() to always send the ID_AUTH struct to
the firmware. Use the params.auth_key_en indicator to set
whether the ID_AUTH struct contains an author key or not.
* Cleanup snp_context_create() and allocate certs_data in this
function using kzalloc() to prevent giving the guest
uninitialized kernel memory.
* Remove the check for guest supplied buffer greater than the data
provided by the hypervisor in snp_handle_ext_guest_request().
* Added a check in sev_snp_ap_create(): a malicious guest can
RMPADJUST a large page into a VMSA, which hits the SNP erratum
where the CPU will incorrectly signal an RMP violation #PF if a
hugepage collides with the RMP entry of the VMSA page. Reject the
AP CREATE request if the VMSA address from the guest is 2M-aligned.
* Made VMSAVE target area memory allocation SNP-safe; implemented a
workaround for an SNP erratum where the CPU will incorrectly signal
an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of the VMSAVE target page.
* Fix handle_split_page_fault() to work with memfd backed pages.
* Add KVM commands for per-VM instance certificates.
* Added IOMMU_SNP_SHUTDOWN support, which enables host kexec support
with SNP.

----------------------------------------------------------------
Ashish Kalra (6):
x86/mm/pat: Introduce set_memory_p
x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling.
crypto: ccp: Introduce snp leaked pages list
KVM: SVM: Sync the GHCB scratch buffer using already mapped ghcb
KVM: SVM: Make VMSAVE target area memory allocation SNP safe
iommu/amd: Add IOMMU_SNP_SHUTDOWN support

Brijesh Singh (36):
x86/cpufeatures: Add SEV-SNP CPU feature
x86/sev: Add the host SEV-SNP initialization support
x86/sev: Add RMP entry lookup helpers
x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
x86/sev: Invalidate pages from the direct map when adding them to the RMP table
x86/traps: Define RMP violation #PF error code
x86/fault: Add support to handle the RMP fault for user address
x86/fault: Add support to dump RMP entry on fault
crypto:ccp: Define the SEV-SNP commands
crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
crypto:ccp: Provide API to issue SEV and SNP commands
crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
crypto: ccp: Handle the legacy SEV command when SNP is enabled
crypto: ccp: Add the SNP_PLATFORM_STATUS command
crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
crypto: ccp: Provide APIs to query extended attestation report
KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
KVM: SVM: Add initial SEV-SNP support
KVM: SVM: Add KVM_SNP_INIT command
KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
KVM: X86: Keep the NPT and RMP page level in sync
KVM: x86: Define RMP page fault error bits for #NPF
KVM: SVM: Do not use long-lived GHCB map while setting scratch area
KVM: SVM: Remove the long-lived GHCB host map
KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
KVM: SVM: Add support to handle Page State Change VMGEXIT
KVM: SVM: Introduce ops for the post gfn map and unmap
KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
KVM: SVM: Add support to handle the RMP nested page fault
KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
KVM: SVM: Add module parameter to enable the SEV-SNP
ccp: Add support to decrypt the page

Dionna Glaze (2):
x86/sev: Add KVM commands for instance certs
x86/sev: Document KVM_SEV_SNP_{G,S}ET_CERTS

Hugh Dickins (1):
x86/fault: fix handle_split_page_fault() to work with memfd backed pages

Michael Roth (9):
KVM: x86: Add KVM_CAP_UNMAPPED_PRIVATE_MEMORY
KVM: x86: Add 'fault_is_private' x86 op
KVM: x86: Add 'update_mem_attr' x86 op
KVM: x86: Add platform hooks for private memory invalidations
KVM: SEV: Implement .fault_is_private callback
KVM: SVM: Add KVM_EXIT_VMGEXIT
KVM: SVM: Add SNP-specific handling for memory attribute updates
KVM: x86/mmu: Generate KVM_EXIT_MEMORY_FAULT for implicit conversions for SNP
KVM: SEV: Handle restricted memory invalidations for SNP

Nikunj A Dadhania (5):
KVM: Fix memslot boundary condition for large page
KVM: SVM: Advertise private memory support to KVM
KVM: SEV: Handle KVM_HC_MAP_GPA_RANGE hypercall
KVM: Move kvm_for_each_memslot_in_hva_range() to be used in SVM
KVM: SEV: Support private pages in LAUNCH_UPDATE_DATA

Tom Lendacky (3):
KVM: SVM: Add support to handle AP reset MSR protocol
KVM: SVM: Use a VMSA physical address variable for populating VMCB
KVM: SVM: Support SEV-SNP AP Creation NAE event

Vishal Annapurve (2):
KVM: Add HVA range operator
KVM: SEV: Populate private memory fd during LAUNCH_UPDATE_DATA

Documentation/virt/coco/sev-guest.rst | 54 +
.../virt/kvm/x86/amd-memory-encryption.rst | 146 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/kvm-x86-ops.h | 6 +
arch/x86/include/asm/kvm_host.h | 23 +
arch/x86/include/asm/msr-index.h | 11 +-
arch/x86/include/asm/set_memory.h | 3 +-
arch/x86/include/asm/sev-common.h | 28 +
arch/x86/include/asm/sev.h | 28 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/include/asm/trap_pf.h | 18 +-
arch/x86/kernel/cpu/amd.c | 5 +-
arch/x86/kernel/sev.c | 437 ++++
arch/x86/kvm/lapic.c | 5 +-
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 34 +-
arch/x86/kvm/mmu/mmu_internal.h | 40 +-
arch/x86/kvm/svm/sev.c | 2217 +++++++++++++++++---
arch/x86/kvm/svm/svm.c | 84 +-
arch/x86/kvm/svm/svm.h | 75 +-
arch/x86/kvm/trace.h | 34 +
arch/x86/kvm/x86.c | 36 +
arch/x86/mm/fault.c | 118 +-
arch/x86/mm/pat/set_memory.c | 12 +-
drivers/crypto/ccp/sev-dev.c | 1055 +++++++++-
drivers/crypto/ccp/sev-dev.h | 18 +
drivers/iommu/amd/init.c | 53 +
include/linux/amd-iommu.h | 1 +
include/linux/kvm_host.h | 15 +
include/linux/mm.h | 3 +-
include/linux/mm_types.h | 3 +
include/linux/psp-sev.h | 352 +++-
include/uapi/linux/kvm.h | 75 +
include/uapi/linux/psp-sev.h | 60 +
mm/memory.c | 15 +
mm/restrictedmem.c | 16 +
tools/arch/x86/include/asm/cpufeatures.h | 1 +
virt/kvm/kvm_main.c | 87 +-
39 files changed, 4838 insertions(+), 347 deletions(-)



2022-12-14 19:49:28

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 11/64] KVM: SEV: Support private pages in LAUNCH_UPDATE_DATA

From: Nikunj A Dadhania <[email protected]>

The pre-boot guest payload needs to be encrypted, and the VMM has
copied it over to the private fd. Add support for getting the pfn from
the memfile fd so the payload can be encrypted in place.

Signed-off-by: Nikunj A Dadhania <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 79 ++++++++++++++++++++++++++++++++++--------
1 file changed, 64 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a7e4e3005786..ae4920aeb281 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -107,6 +107,11 @@ static inline bool is_mirroring_enc_context(struct kvm *kvm)
return !!to_kvm_svm(kvm)->sev_info.enc_context_owner;
}

+static bool kvm_is_upm_enabled(struct kvm *kvm)
+{
+ return kvm->arch.upm_mode;
+}
+
/* Must be called with the sev_bitmap_lock held */
static bool __sev_recycle_asids(int min_asid, int max_asid)
{
@@ -382,6 +387,38 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}

+static int sev_get_memfile_pfn_handler(struct kvm *kvm, struct kvm_gfn_range *range, void *data)
+{
+ struct kvm_memory_slot *memslot = range->slot;
+ struct page **pages = data;
+ int ret = 0, i = 0;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+
+ for (gfn = range->start; gfn < range->end; gfn++) {
+ int order;
+
+ ret = kvm_restricted_mem_get_pfn(memslot, gfn, &pfn, &order);
+ if (ret)
+ return ret;
+
+ if (is_error_noslot_pfn(pfn))
+ return -EFAULT;
+
+ pages[i++] = pfn_to_page(pfn);
+ }
+
+ return ret;
+}
+
+static int sev_get_memfile_pfn(struct kvm *kvm, unsigned long addr,
+ unsigned long size, unsigned long npages,
+ struct page **pages)
+{
+ return kvm_vm_do_hva_range_op(kvm, addr, size,
+ sev_get_memfile_pfn_handler, pages);
+}
+
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
unsigned long ulen, unsigned long *n,
int write)
@@ -424,16 +461,25 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
if (!pages)
return ERR_PTR(-ENOMEM);

- /* Pin the user virtual address. */
- npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
- if (npinned != npages) {
- pr_err("SEV: Failure locking %lu pages.\n", npages);
- ret = -ENOMEM;
- goto err;
+ if (kvm_is_upm_enabled(kvm)) {
+ /* Get the PFN from memfile */
+ if (sev_get_memfile_pfn(kvm, uaddr, ulen, npages, pages)) {
+ pr_err("%s: ERROR: unable to find slot for uaddr %lx", __func__, uaddr);
+ ret = -ENOMEM;
+ goto err;
+ }
+ } else {
+ /* Pin the user virtual address. */
+ npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
+ if (npinned != npages) {
+ pr_err("SEV: Failure locking %lu pages.\n", npages);
+ ret = -ENOMEM;
+ goto err;
+ }
+ sev->pages_locked = locked;
}

*n = npages;
- sev->pages_locked = locked;

return pages;

@@ -514,6 +560,7 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,

size = (range->end - range->start) << PAGE_SHIFT;
vaddr_end = vaddr + size;
+ WARN_ON(size < PAGE_SIZE);

/* Lock the user memory. */
inpages = sev_pin_memory(kvm, vaddr, size, &npages, 1);
@@ -554,13 +601,16 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
}

e_unpin:
- /* content of memory is updated, mark pages dirty */
- for (i = 0; i < npages; i++) {
- set_page_dirty_lock(inpages[i]);
- mark_page_accessed(inpages[i]);
+ if (!kvm_is_upm_enabled(kvm)) {
+ /* content of memory is updated, mark pages dirty */
+ for (i = 0; i < npages; i++) {
+ set_page_dirty_lock(inpages[i]);
+ mark_page_accessed(inpages[i]);
+ }
+ /* unlock the user pages */
+ sev_unpin_memory(kvm, inpages, npages);
}
- /* unlock the user pages */
- sev_unpin_memory(kvm, inpages, npages);
+
return ret;
}

@@ -609,9 +659,8 @@ static int sev_launch_update_priv_gfn_handler(struct kvm *kvm,
goto e_ret;
kvm_release_pfn_clean(pfn);
}
- kvm_vm_set_region_attr(kvm, range->start, range->end,
- true /* priv_attr */);

+ kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
e_ret:
return ret;
}
--
2.25.1

2022-12-14 19:49:59

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 12/64] KVM: SEV: Implement .fault_is_private callback

The KVM MMU will use this to determine whether an #NPF should be
serviced using restricted memory or not.
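
As a rough sketch of the intended call site (the static_call name and
surrounding logic are assumptions based on the 'fault_is_private' x86 op
added earlier in this series, not code from this patch):

  /* Assumed KVM MMU fault-path usage of the new callback. */
  bool private_fault = false;

  if (static_call(kvm_x86_fault_is_private)(kvm, gpa, error_code,
                                            &private_fault))
          /* The callback handled it; private_fault holds the verdict. */
          return private_fault;

  /* Otherwise fall back to the memory attributes. */
  return kvm_mem_is_private(kvm, gpa_to_gfn(gpa));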

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 23 +++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/svm/svm.h | 2 ++
3 files changed, 27 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ae4920aeb281..6579ed218f6a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3179,3 +3179,26 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)

ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
}
+
+int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault)
+{
+ gfn_t gfn = gpa_to_gfn(gpa);
+
+ if (!kvm_is_upm_enabled(kvm) || !sev_guest(kvm))
+ goto out_unhandled;
+
+ /*
+ * For SEV, the hypervisor is not aware of implicit conversions in the
+ * guest, so it relies purely on explicit conversions via
+ * KVM_EXIT_HYPERCALL, so the resulting handling by userspace should
+ * update the backing memory source accordingly. Therefore, the backing
+ * source is the only indicator of whether the fault should be treated
+ * as private or not.
+ */
+ *private_fault = kvm_mem_is_private(kvm, gfn);
+
+ return 1;
+
+out_unhandled:
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7f3e4d91c0c6..fc7885869f7e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4830,6 +4830,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+
+ .fault_is_private = sev_fault_is_private,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4826e6cc611b..c760ec51a910 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -683,6 +683,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);

+int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
+
/* vmenter.S */

void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
--
2.25.1

2022-12-14 19:50:46

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 13/64] x86/cpufeatures: Add SEV-SNP CPU feature

From: Brijesh Singh <[email protected]>

Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks such as
data replay, memory re-mapping, and more.
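
For reference, later patches in this series gate host-side SNP setup on
this bit; a minimal sketch:

  /* Minimal sketch: bail out early when SEV-SNP is not available. */
  if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
          return 0;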

Link: https://lore.kernel.org/all/YrGINaPc3cojG6%[email protected]/
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/amd.c | 5 +++--
tools/arch/x86/include/asm/cpufeatures.h | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1419c4e04d45..480b4eaef310 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -420,6 +420,7 @@
#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 860b60273df3..c7884198ad5b 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -558,8 +558,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
* SME feature (set in scattered.c).
* If the kernel has not enabled SME via any means then
* don't advertise the SME feature.
- * For SEV: If BIOS has not enabled SEV then don't advertise the
- * SEV and SEV_ES feature (set in scattered.c).
+ * For SEV: If BIOS has not enabled SEV then don't advertise SEV and
+ * any additional functionality based on it.
*
* In all cases, since support for SME and SEV requires long mode,
* don't advertise the feature under CONFIG_X86_32.
@@ -594,6 +594,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
clear_sev:
setup_clear_cpu_cap(X86_FEATURE_SEV);
setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
}
}

diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index b71f4f2ecdd5..e81606fcd2ab 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -417,6 +417,7 @@
#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */

--
2.25.1

2022-12-14 19:51:23

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 14/64] x86/sev: Add the host SEV-SNP initialization support

From: Brijesh Singh <[email protected]>

The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
track the owner of each page of memory. Pages of memory can be owned by
the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
section 15.36.3 for more detail on RMP.

The RMP table is used to enforce access control to memory. The table itself
is not directly writable by the software. New CPU instructions (RMPUPDATE,
PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.

Based on the platform configuration, the BIOS reserves the memory used
for the RMP table. The start and end address of the RMP table must be
queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and
RMP_END are not set then disable the SEV-SNP feature.

The SEV-SNP feature is enabled only after the RMP table is successfully
initialized.

Also set SYSCFG.MFDM when enabling SNP, as SEV-SNP firmware >= 1.51
requires that SYSCFG.MFDM be set.

The RMP table entry format is non-architectural; it can vary by
processor and is defined by the PPR. Restrict SNP support to the known
CPU models and families for which the RMP table entry format is
currently defined.
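
As a worked example of the sizing implied by the non-architectural
16-byte RMP entry (the '<< 4' in the check below): a host with 1TB of
DRAM has 2^28 4K pages, so the BIOS must reserve 2^28 * 16 bytes = 4GB
for the RMP table, plus the 16KB processor bookkeeping area and the
entries covering the RMP table itself.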

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 11 +-
arch/x86/kernel/sev.c | 180 +++++++++++++++++++++++
3 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 33d2cd04d254..9b5a2cc8064a 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -87,6 +87,12 @@
# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
#endif

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+# define DISABLE_SEV_SNP 0
+#else
+# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -110,7 +116,7 @@
DISABLE_ENQCMD)
#define DISABLED_MASK17 0
#define DISABLED_MASK18 0
-#define DISABLED_MASK19 0
+#define DISABLED_MASK19 (DISABLE_SEV_SNP)
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)

#endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 10ac52705892..35100c630617 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -565,6 +565,8 @@
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
#define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE 0xc0010132
+#define MSR_AMD64_RMP_END 0xc0010133

#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f

@@ -649,7 +651,14 @@
#define MSR_K8_TOP_MEM2 0xc001001d
#define MSR_AMD64_SYSCFG 0xc0010010
#define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
-#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
+#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
+#define MSR_AMD64_SYSCFG_MFDM_BIT 19
+#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
+
#define MSR_K8_INT_PENDING_MSG 0xc0010055
/* C1E active bits in int pending message */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a428c62330d3..687a91284506 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -22,6 +22,9 @@
#include <linux/efi.h>
#include <linux/platform_device.h>
#include <linux/io.h>
+#include <linux/cpumask.h>
+#include <linux/iommu.h>
+#include <linux/amd-iommu.h>

#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -38,6 +41,7 @@
#include <asm/apic.h>
#include <asm/cpuid.h>
#include <asm/cmdline.h>
+#include <asm/iommu.h>

#define DR7_RESET_VALUE 0x400

@@ -57,6 +61,12 @@
#define AP_INIT_CR0_DEFAULT 0x60000010
#define AP_INIT_MXCSR_DEFAULT 0x1f80

+/*
+ * The first 16KB from the RMP_BASE is used by the processor for the
+ * bookkeeping, the range needs to be added during the RMP entry lookup.
+ */
+#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+
/* For early boot hypervisor communication in SEV-ES enabled guests */
static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);

@@ -69,6 +79,9 @@ static struct ghcb *boot_ghcb __section(".data");
/* Bitmap of SEV features supported by the hypervisor */
static u64 sev_hv_features __ro_after_init;

+static unsigned long rmptable_start __ro_after_init;
+static unsigned long rmptable_end __ro_after_init;
+
/* #VC handler runtime per-CPU data */
struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -2260,3 +2273,170 @@ static int __init snp_init_platform_device(void)
return 0;
}
device_initcall(snp_init_platform_device);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "SEV-SNP: " fmt
+
+static int __mfd_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_MFDM;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void mfd_enable(void *arg)
+{
+ __mfd_enable(smp_processor_id());
+}
+
+static int __snp_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_SNP_EN;
+ val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void snp_enable(void *arg)
+{
+ __snp_enable(smp_processor_id());
+}
+
+static bool get_rmptable_info(u64 *start, u64 *len)
+{
+ u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
+
+ rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+ rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+ if (!rmp_base || !rmp_end) {
+ pr_err("Memory for the RMP table has not been reserved by BIOS\n");
+ return false;
+ }
+
+ rmp_sz = rmp_end - rmp_base + 1;
+
+ /*
+ * Calculate the amount of memory that must be reserved by the BIOS to
+ * address the whole RAM. The reserved memory should also cover the
+ * RMP table itself.
+ */
+ calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + totalram_pages()) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+ if (calc_rmp_sz > rmp_sz) {
+ pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
+ calc_rmp_sz, rmp_sz);
+ return false;
+ }
+
+ *start = rmp_base;
+ *len = rmp_sz;
+
+ pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n", rmp_base, rmp_end);
+
+ return true;
+}
+
+static __init int __snp_rmptable_init(void)
+{
+ u64 rmp_base, sz;
+ void *start;
+ u64 val;
+
+ if (!get_rmptable_info(&rmp_base, &sz))
+ return 1;
+
+ start = memremap(rmp_base, sz, MEMREMAP_WB);
+ if (!start) {
+ pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, sz);
+ return 1;
+ }
+
+ /*
+ * Check if SEV-SNP is already enabled, this can happen in case of
+ * kexec boot.
+ */
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+ if (val & MSR_AMD64_SYSCFG_SNP_EN)
+ goto skip_enable;
+
+ /* Initialize the RMP table to zero */
+ memset(start, 0, sz);
+
+ /* Flush the caches to ensure that data is written before SNP is enabled. */
+ wbinvd_on_all_cpus();
+
+ /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
+ on_each_cpu(mfd_enable, NULL, 1);
+
+ /* Enable SNP on all CPUs. */
+ on_each_cpu(snp_enable, NULL, 1);
+
+skip_enable:
+ rmptable_start = (unsigned long)start;
+ rmptable_end = rmptable_start + sz - 1;
+
+ return 0;
+}
+
+static int __init snp_rmptable_init(void)
+{
+ int family, model;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ family = boot_cpu_data.x86;
+ model = boot_cpu_data.x86_model;
+
+ /*
+ * RMP table entry format is not architectural and it can vary by processor and
+ * is defined by the per-processor PPR. Restrict SNP support on the known CPU
+ * model and family for which the RMP table entry format is currently defined for.
+ */
+ if (family != 0x19 || model > 0xaf)
+ goto nosnp;
+
+ if (amd_iommu_snp_enable())
+ goto nosnp;
+
+ if (__snp_rmptable_init())
+ goto nosnp;
+
+ cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+
+ return 0;
+
+nosnp:
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+ return -ENOSYS;
+}
+
+/*
+ * This must be called after the PCI subsystem, because amd_iommu_snp_enable()
+ * is called to ensure the IOMMU supports the SEV-SNP feature, and that check
+ * can only be done after subsys_initcall().
+ *
+ * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
+ * directly into guest private memory. In case of SNP, the IOMMU ensures that
+ * the page(s) used for DMA are hypervisor owned.
+ */
+fs_initcall(snp_rmptable_init);
--
2.25.1

2022-12-14 19:52:03

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 16/64] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

From: Brijesh Singh <[email protected]>

The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set
of contiguous 4KB-Page RMP entries. The hypervisor will use this
instruction to adjust the RMP entry without invalidating the previous
RMP entry.

Add the following external interface API functions:

int psmash(u64 pfn);
psmash is used to smash a 2MB aligned page into 4K
pages while preserving the Validated bit in the RMP.

int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
Used to assign a page to guest using the RMPUPDATE instruction.

int rmp_make_shared(u64 pfn, enum pg_level level);
Used to transition a page to hypervisor/shared state using the RMPUPDATE instruction.
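
A minimal usage sketch based on the semantics above (error handling
elided; this flow is illustrative, not code from this series):

  /* Assign a 4K page to the guest, then return it to the host. */
  ret = rmp_make_private(pfn, gpa, PG_LEVEL_4K, asid, false);
  if (ret)
          return ret;

  /* ... the page is now guest-owned ... */

  ret = rmp_make_shared(pfn, PG_LEVEL_4K);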

Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev.h | 24 ++++++++++
arch/x86/kernel/sev.c | 95 ++++++++++++++++++++++++++++++++++++++
2 files changed, 119 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 8d3ce2ad27da..4eeedcaca593 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -80,10 +80,15 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

/* Software defined (when rFlags.CF = 1) */
#define PVALIDATE_FAIL_NOUPDATE 255
+/* RMPUPDATE detected a 4K page and 2MB page overlap. */
+#define RMPUPDATE_FAIL_OVERLAP 7

/* RMP page size */
#define RMP_PG_SIZE_4K 0
+#define RMP_PG_SIZE_2M 1
#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+#define X86_TO_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+
#define RMPADJUST_VMSA_PAGE_BIT BIT(16)

/* SNP Guest message request */
@@ -133,6 +138,15 @@ struct snp_secrets_page_layout {
u8 rsvd3[3840];
} __packed;

+struct rmp_state {
+ u64 gpa;
+ u8 assigned;
+ u8 pagesize;
+ u8 immutable;
+ u8 rsvd;
+ u32 asid;
+} __packed;
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -198,6 +212,9 @@ bool snp_init(struct boot_params *bp);
void __init __noreturn snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
int snp_lookup_rmpentry(u64 pfn, int *level);
+int psmash(u64 pfn);
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
+int rmp_make_shared(u64 pfn, enum pg_level level);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -223,6 +240,13 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
return -ENOTTY;
}
static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
+static inline int psmash(u64 pfn) { return -ENXIO; }
+static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
+ bool immutable)
+{
+ return -ENODEV;
+}
+static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
#endif

#endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 706675561f49..67035d34adad 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2523,3 +2523,98 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
return !!rmpentry_assigned(e);
}
EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+
+/*
+ * psmash is used to smash a 2MB aligned page into 4K
+ * pages while preserving the Validated bit in the RMP.
+ */
+int psmash(u64 pfn)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ /* Binutils version 2.36 supports the PSMASH mnemonic. */
+ asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+ : "=a"(ret)
+ : "a"(paddr)
+ : "memory", "cc");
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+static int rmpupdate(u64 pfn, struct rmp_state *val)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int retries = 0;
+ int ret;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+retry:
+ /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+ : "=a"(ret)
+ : "a"(paddr), "c"((unsigned long)val)
+ : "memory", "cc");
+
+ if (ret) {
+ if (!retries) {
+ pr_err("RMPUPDATE failed, ret: %d, pfn: %llx, npages: %d, level: %d, retrying (max: %d)...\n",
+ ret, pfn, npages, level, 2 * num_present_cpus());
+ dump_stack();
+ }
+ retries++;
+ if (retries < 2 * num_present_cpus())
+ goto retry;
+ } else if (retries > 0) {
+ pr_err("RMPUPDATE for pfn %llx succeeded after %d retries\n", pfn, retries);
+ }
+
+ return ret;
+}
+
+/*
+ * Assign a page to guest using the RMPUPDATE instruction.
+ */
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
+{
+ struct rmp_state val;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ memset(&val, 0, sizeof(val));
+ val.assigned = 1;
+ val.asid = asid;
+ val.immutable = immutable;
+ val.gpa = gpa;
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_private);
+
+/*
+ * Transition a page to hypervisor/shared state using the RMPUPDATE instruction.
+ */
+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+ struct rmp_state val;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ memset(&val, 0, sizeof(val));
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_shared);
--
2.25.1

2022-12-14 19:52:34

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 17/64] x86/mm/pat: Introduce set_memory_p

From: Ashish Kalra <[email protected]>

set_memory_p() provides a way to change the attributes of a memory
range so that it is marked as present.
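
A minimal sketch of the intended pairing with set_memory_np(), as used
by the direct-map handling later in this series:

  unsigned long vaddr = (unsigned long)pfn_to_kaddr(pfn);

  /* Drop the page from the kernel direct map... */
  set_memory_np(vaddr, 1);
  /* ...and later mark it present again. */
  set_memory_p(vaddr, 1);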

Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/set_memory.h | 3 ++-
arch/x86/mm/pat/set_memory.c | 12 ++++++------
2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index b45c4d27fd46..56be492eb2d1 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -12,7 +12,7 @@
* Cacheability : UnCached, WriteCombining, WriteThrough, WriteBack
* Executability : eXecutable, NoteXecutable
* Read/Write : ReadOnly, ReadWrite
- * Presence : NotPresent
+ * Presence : NotPresent, Present
* Encryption : Encrypted, Decrypted
*
* Within a category, the attributes are mutually exclusive.
@@ -44,6 +44,7 @@ int set_memory_uc(unsigned long addr, int numpages);
int set_memory_wc(unsigned long addr, int numpages);
int set_memory_wb(unsigned long addr, int numpages);
int set_memory_np(unsigned long addr, int numpages);
+int set_memory_p(unsigned long addr, int numpages);
int set_memory_4k(unsigned long addr, int numpages);
int set_memory_encrypted(unsigned long addr, int numpages);
int set_memory_decrypted(unsigned long addr, int numpages);
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 2e5a045731de..b1f79062c4a5 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1993,17 +1993,12 @@ int set_mce_nospec(unsigned long pfn)
return rc;
}

-static int set_memory_p(unsigned long *addr, int numpages)
-{
- return change_page_attr_set(addr, numpages, __pgprot(_PAGE_PRESENT), 0);
-}
-
/* Restore full speculative operation to the pfn. */
int clear_mce_nospec(unsigned long pfn)
{
unsigned long addr = (unsigned long) pfn_to_kaddr(pfn);

- return set_memory_p(&addr, 1);
+ return set_memory_p(addr, 1);
}
EXPORT_SYMBOL_GPL(clear_mce_nospec);
#endif /* CONFIG_X86_64 */
@@ -2039,6 +2034,11 @@ int set_memory_np(unsigned long addr, int numpages)
return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
}

+int set_memory_p(unsigned long addr, int numpages)
+{
+ return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+}
+
int set_memory_np_noalias(unsigned long addr, int numpages)
{
int cpa_flags = CPA_NO_CHECK_ALIAS;
--
2.25.1

2022-12-14 19:53:02

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 15/64] x86/sev: Add RMP entry lookup helpers

From: Brijesh Singh <[email protected]>

snp_lookup_rmpentry() can be used by the host to read the RMP entry
for a given page. The RMP entry format is documented in the AMD PPR;
see https://bugzilla.kernel.org/attachment.cgi?id=296015.
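
A minimal caller sketch reflecting the return-value convention (1 =
assigned, 0 = not assigned, -errno = no valid RMP entry):

  int level, ret;

  ret = snp_lookup_rmpentry(pfn, &level);
  if (ret < 0)
          return ret;     /* no valid RMP entry for this pfn */
  if (ret)
          pr_info("pfn 0x%llx is guest-owned, level %d\n", pfn, level);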

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev.h | 4 +-
arch/x86/kernel/sev.c | 83 ++++++++++++++++++++++++++++++++++++++
2 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ebc271bb6d8e..8d3ce2ad27da 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -83,7 +83,7 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

/* RMP page size */
#define RMP_PG_SIZE_4K 0
-
+#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
#define RMPADJUST_VMSA_PAGE_BIT BIT(16)

/* SNP Guest message request */
@@ -197,6 +197,7 @@ void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void __init __noreturn snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
+int snp_lookup_rmpentry(u64 pfn, int *level);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -221,6 +222,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
{
return -ENOTTY;
}
+static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
#endif

#endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 687a91284506..706675561f49 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -61,11 +61,35 @@
#define AP_INIT_CR0_DEFAULT 0x60000010
#define AP_INIT_MXCSR_DEFAULT 0x1f80

+/*
+ * The RMP entry format is not architectural. The format is defined in PPR
+ * Family 19h Model 01h, Rev B1 processor.
+ */
+struct rmpentry {
+ union {
+ struct {
+ u64 assigned : 1,
+ pagesize : 1,
+ immutable : 1,
+ rsvd1 : 9,
+ gpa : 39,
+ asid : 10,
+ vmsa : 1,
+ validated : 1,
+ rsvd2 : 1;
+ } info;
+ u64 low;
+ };
+ u64 high;
+} __packed;
+
/*
* The first 16KB from the RMP_BASE is used by the processor for the
* bookkeeping, the range needs to be added during the RMP entry lookup.
*/
#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+#define RMPENTRY_SHIFT 8
+#define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))

/* For early boot hypervisor communication in SEV-ES enabled guests */
static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
@@ -2440,3 +2464,62 @@ static int __init snp_rmptable_init(void)
* the page(s) used for DMA are hypervisor owned.
*/
fs_initcall(snp_rmptable_init);
+
+static inline unsigned int rmpentry_assigned(struct rmpentry *e)
+{
+ return e->info.assigned;
+}
+
+static inline unsigned int rmpentry_pagesize(struct rmpentry *e)
+{
+ return e->info.pagesize;
+}
+
+static struct rmpentry *rmptable_entry(unsigned long paddr)
+{
+ unsigned long vaddr;
+
+ vaddr = rmptable_start + rmptable_page_offset(paddr);
+ if (unlikely(vaddr > rmptable_end))
+ return ERR_PTR(-EFAULT);
+
+ return (struct rmpentry *)vaddr;
+}
+
+static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ struct rmpentry *entry, *large_entry;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return ERR_PTR(-ENXIO);
+
+ if (!pfn_valid(pfn))
+ return ERR_PTR(-EINVAL);
+
+ entry = rmptable_entry(paddr);
+ if (IS_ERR(entry))
+ return entry;
+
+ /* Read a large RMP entry to get the correct page level used in RMP entry. */
+ large_entry = rmptable_entry(paddr & PMD_MASK);
+ *level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(large_entry));
+
+ return entry;
+}
+
+/*
+ * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
+ * and -errno if there is no corresponding RMP entry.
+ */
+int snp_lookup_rmpentry(u64 pfn, int *level)
+{
+ struct rmpentry *e;
+
+ e = __snp_lookup_rmpentry(pfn, level);
+ if (IS_ERR(e))
+ return PTR_ERR(e);
+
+ return !!rmpentry_assigned(e);
+}
+EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
--
2.25.1

2022-12-14 19:53:07

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 18/64] x86/sev: Invalidate pages from the direct map when adding them to the RMP table

From: Brijesh Singh <[email protected]>

The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used with standard x86 and IOMMU page tables to enforce
memory restrictions and page access rights. The RMP check is enforced as
soon as SEV-SNP is enabled globally in the system. When hardware
encounters an RMP-check failure, it raises a page-fault exception.

The rmp_make_private() and rmp_make_shared() helpers are used to add
pages to or remove pages from the RMP table. Improve rmp_make_private()
to invalidate state so that pages cannot be used in the direct map after
they are added to the RMP table, and restore them to their default valid
permissions after the pages are removed from the RMP table.
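
The resulting ordering, sketched below, is: unmap from the direct map
before RMPUPDATE assigns the page, and remap only after RMPUPDATE has
unassigned it (the do_rmpupdate() wrapper name is illustrative):

  /* Illustrative ordering enforced in rmpupdate() below. */
  if (val->assigned && invalidate_direct_map(pfn, npages))
          return -EFAULT;                 /* unmap before assigning */

  ret = do_rmpupdate(pfn, val);           /* hypothetical wrapper */

  if (!ret && !val->assigned && restore_direct_map(pfn, npages))
          return -EFAULT;                 /* remap after unassigning */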

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kernel/sev.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 67035d34adad..e2b38c3551be 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2549,15 +2549,40 @@ int psmash(u64 pfn)
}
EXPORT_SYMBOL_GPL(psmash);

+static int restore_direct_map(u64 pfn, int npages)
+{
+ return set_memory_p((unsigned long)pfn_to_kaddr(pfn), npages);
+}
+
+static int invalidate_direct_map(unsigned long pfn, int npages)
+{
+ return set_memory_np((unsigned long)pfn_to_kaddr(pfn), npages);
+}
+
static int rmpupdate(u64 pfn, struct rmp_state *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret, level, npages;
int retries = 0;
- int ret;

if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
return -ENXIO;

+ level = RMP_TO_X86_PG_LEVEL(val->pagesize);
+ npages = page_level_size(level) / PAGE_SIZE;
+
+ /*
+ * If page is getting assigned in the RMP table then unmap it from the
+ * direct map.
+ */
+ if (val->assigned) {
+ if (invalidate_direct_map(pfn, npages)) {
+ pr_err("Failed to unmap %d pages at pfn 0x%llx from the direct_map\n",
+ npages, pfn);
+ return -EFAULT;
+ }
+ }
+
retry:
/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
@@ -2578,6 +2603,17 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
pr_err("RMPUPDATE for pfn %llx succeeded after %d retries\n", pfn, retries);
}

+ /*
+ * Restore the direct map after the page is removed from the RMP table.
+ */
+ if (!ret && !val->assigned) {
+ if (restore_direct_map(pfn, npages)) {
+ pr_err("Failed to map %d pages at pfn 0x%llx into the direct_map\n",
+ npages, pfn);
+ return -EFAULT;
+ }
+ }
+
return ret;
}

--
2.25.1

2022-12-14 19:56:12

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 19/64] x86/traps: Define RMP violation #PF error code

From: Brijesh Singh <[email protected]>

Bit 31 in the page-fault error code will be set when the processor
encounters an RMP violation.

While at it, use the BIT() macro.
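
Fault-handling code can then test the new bit directly; a minimal
sketch (patch 20 in this series adds the real handler):

  if (error_code & X86_PF_RMP) {
          /* RMP violation: handled separately from other #PF causes */
  }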

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/trap_pf.h | 18 +++++++++++-------
arch/x86/mm/fault.c | 1 +
2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..295be06f8db7 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_TRAP_PF_H
#define _ASM_X86_TRAP_PF_H

+#include <linux/bits.h> /* BIT() macro */
+
/*
* Page fault error code bits:
*
@@ -12,15 +14,17 @@
* bit 4 == 1: fault was an instruction fetch
* bit 5 == 1: protection keys block access
* bit 15 == 1: SGX MMU page-fault
+ * bit 31 == 1: fault was due to RMP violation
*/
enum x86_pf_error_code {
- X86_PF_PROT = 1 << 0,
- X86_PF_WRITE = 1 << 1,
- X86_PF_USER = 1 << 2,
- X86_PF_RSVD = 1 << 3,
- X86_PF_INSTR = 1 << 4,
- X86_PF_PK = 1 << 5,
- X86_PF_SGX = 1 << 15,
+ X86_PF_PROT = BIT(0),
+ X86_PF_WRITE = BIT(1),
+ X86_PF_USER = BIT(2),
+ X86_PF_RSVD = BIT(3),
+ X86_PF_INSTR = BIT(4),
+ X86_PF_PK = BIT(5),
+ X86_PF_SGX = BIT(15),
+ X86_PF_RMP = BIT(31),
};

#endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7b0d4ab894c8..f8193b99e9c8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -567,6 +567,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
!(error_code & X86_PF_PROT) ? "not-present page" :
(error_code & X86_PF_RSVD) ? "reserved bit violation" :
(error_code & X86_PF_PK) ? "protection keys violation" :
+ (error_code & X86_PF_RMP) ? "RMP violation" :
"permissions violation");

if (!(error_code & X86_PF_USER) && user_mode(regs)) {
--
2.25.1

2022-12-14 19:58:13

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 01/64] KVM: Fix memslot boundary condition for large page

From: Nikunj A Dadhania <[email protected]>

An aligned end boundary causes a KVM crash; handle that case.

Signed-off-by: Nikunj A Dadhania <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b1953ebc012e..b3ffc61c668c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7159,6 +7159,9 @@ static void kvm_update_lpage_private_shared_mixed(struct kvm *kvm,
for (gfn = first + pages; gfn < last; gfn += pages)
linfo_set_mixed(gfn, slot, level, false);

+ if (gfn == last)
+ goto out;
+
gfn = last;
gfn_end = min(last + pages, slot->base_gfn + slot->npages);
mixed = mem_attrs_mixed(kvm, slot, level, attrs, gfn, gfn_end);
--
2.25.1

2022-12-14 19:59:39

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 20/64] x86/fault: Add support to handle the RMP fault for user address

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled globally, a write from the host goes through the
RMP check. When the host writes to pages, hardware checks the following
conditions at the end of the page walk:

1. The Assigned bit in the RMP table is zero (i.e. the page is shared).
2. If the page table entry that gives the sPA indicates that the target
page size is a large page, then all RMP entries for the 4KB
constituent pages of the target must have the Assigned bit 0.
3. The Immutable bit in the RMP table is not zero.

The hardware will raise a page fault if one of the above conditions is
not met. Try resolving the fault instead of taking the fault again and
again. If the host attempts to write to guest private memory, then send
a SIGBUS signal to kill the process. If the page level between the host
and the RMP entry does not match, then split the address to keep the RMP
and host page levels in sync.
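
For the page-level handling, the pfn of the exact 4K page that faulted
inside a large host mapping is derived by OR-ing in the 4K-page offset
of the faulting address, mirroring the computation in the handler below:

  pfn = pte_pfn(*pte);
  if (level > PG_LEVEL_4K)
          pfn |= PFN_DOWN(address & (page_level_size(level) - 1));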

Co-developed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/mm/fault.c | 97 ++++++++++++++++++++++++++++++++++++++++
include/linux/mm.h | 3 +-
include/linux/mm_types.h | 3 ++
mm/memory.c | 10 +++++
4 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f8193b99e9c8..d611051dcf1e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -33,6 +33,7 @@
#include <asm/kvm_para.h> /* kvm_handle_async_pf */
#include <asm/vdso.h> /* fixup_vdso_exception() */
#include <asm/irq_stack.h>
+#include <asm/sev.h> /* snp_lookup_rmpentry() */

#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -414,6 +415,7 @@ static void dump_pagetable(unsigned long address)
pr_cont("PTE %lx", pte_val(*pte));
out:
pr_cont("\n");
+
return;
bad:
pr_info("BAD\n");
@@ -1240,6 +1242,90 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
}
NOKPROBE_SYMBOL(do_kern_addr_fault);

+enum rmp_pf_ret {
+ RMP_PF_SPLIT = 0,
+ RMP_PF_RETRY = 1,
+ RMP_PF_UNMAP = 2,
+};
+
+/*
+ * The goal of the RMP faulting routine is really to check whether the
+ * page that faulted should be accessible. That can be determined
+ * simply by looking at the RMP entry for the 4k address being accessed.
+ * If that entry has Assigned=1 then it's a bad address. It could be
+ * because the 2MB region was assigned as a large page, or it could be
+ * because the region is all 4k pages and that 4k was assigned.
+ * In either case, it's a bad access.
+ * There are basically two main possibilities:
+ * 1. The 2M entry has Assigned=1 and Page_Size=1. Then all 511 middle
+ * entries also have Assigned=1. This entire 2M region is a guest page.
+ * 2. The 2M entry has Assigned=0 and Page_Size=0. Then the 511 middle
+ * entries can be anything, this region consists of individual 4k assignments.
+ */
+static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
+ unsigned long address)
+{
+ int rmp_level, level;
+ pgd_t *pgd;
+ pte_t *pte;
+ u64 pfn;
+
+ pgd = __va(read_cr3_pa());
+ pgd += pgd_index(address);
+
+ pte = lookup_address_in_pgd(pgd, address, &level);
+
+ /*
+ * It can happen if there was a race between an unmap event and
+ * the RMP fault delivery.
+ */
+ if (!pte || !pte_present(*pte))
+ return RMP_PF_UNMAP;
+
+ /*
+ * RMP page fault handler follows this algorithm:
+ * 1. Compute the pfn for the 4kb page being accessed
+ * 2. Read that RMP entry -- If it is assigned then kill the process
+ * 3. Otherwise, check the level from the host page table
+ * If level=PG_LEVEL_4K then the page is already smashed
+ * so just retry the instruction
+ * 4. If level=PG_LEVEL_2M/1G, then the host page needs to be split
+ */
+
+ pfn = pte_pfn(*pte);
+
+ /* If it's a large page then calculate the faulting pfn */
+ if (level > PG_LEVEL_4K)
+ pfn = pfn | PFN_DOWN(address & (page_level_size(level) - 1));
+
+ /*
+ * If it's a guest private page, then the fault cannot be resolved.
+ * Send a SIGBUS to terminate the process.
+ *
+ * As documented in APM vol3 pseudo-code for RMPUPDATE, when the 2M range
+ * is covered by a valid (Assigned=1) 2M entry, the middle 511 4k entries
+ * also have Assigned=1. This means that if there is an access to a page
+ * which happens to lie within an Assigned 2M entry, the 4k RMP entry
+ * will also have Assigned=1. Therefore, the kernel should see that
+ * the page is not a valid page and the fault cannot be resolved.
+ */
+ if (snp_lookup_rmpentry(pfn, &rmp_level)) {
+ pr_info("Fatal RMP page fault, terminating process, entry assigned for pfn 0x%llx\n",
+ pfn);
+ do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
+ return RMP_PF_RETRY;
+ }
+
+ /*
+ * The backing page level is higher than the RMP page level, request
+ * to split the page.
+ */
+ if (level > rmp_level)
+ return RMP_PF_SPLIT;
+
+ return RMP_PF_RETRY;
+}
+
/*
* Handle faults in the user portion of the address space. Nothing in here
* should check X86_PF_USER without a specific justification: for almost
@@ -1337,6 +1423,17 @@ void do_user_addr_fault(struct pt_regs *regs,
if (error_code & X86_PF_INSTR)
flags |= FAULT_FLAG_INSTRUCTION;

+ /*
+ * If it's an RMP violation, try resolving it.
+ */
+ if (error_code & X86_PF_RMP) {
+ if (handle_user_rmp_page_fault(regs, error_code, address))
+ return;
+
+ /* Ask to split the page */
+ flags |= FAULT_FLAG_PAGE_SPLIT;
+ }
+
#ifdef CONFIG_X86_64
/*
* Faults in the vsyscall page might need emulation. The
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3c84f4e48cd7..2fd8e16d149c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -466,7 +466,8 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags)
{ FAULT_FLAG_USER, "USER" }, \
{ FAULT_FLAG_REMOTE, "REMOTE" }, \
{ FAULT_FLAG_INSTRUCTION, "INSTRUCTION" }, \
- { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }
+ { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \
+ { FAULT_FLAG_PAGE_SPLIT, "PAGESPLIT" }

/*
* vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 500e536796ca..06ba34d51638 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -962,6 +962,8 @@ typedef struct {
* mapped R/O.
* @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
* We should only access orig_pte if this flag set.
+ * @FAULT_FLAG_PAGE_SPLIT: The fault was due to a page size mismatch; split
+ * the region to a smaller page size and retry.
*
* About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
* whether we would allow page faults to retry by specifying these two
@@ -999,6 +1001,7 @@ enum fault_flag {
FAULT_FLAG_INTERRUPTIBLE = 1 << 9,
FAULT_FLAG_UNSHARE = 1 << 10,
FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
+ FAULT_FLAG_PAGE_SPLIT = 1 << 12,
};

typedef unsigned int __bitwise zap_flags_t;
diff --git a/mm/memory.c b/mm/memory.c
index f88c351aecd4..e68da7e403c6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4996,6 +4996,12 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
return 0;
}

+static int handle_split_page_fault(struct vm_fault *vmf)
+{
+ __split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
+ return 0;
+}
+
/*
* By the time we get here, we already hold the mm semaphore
*
@@ -5078,6 +5084,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
pmd_migration_entry_wait(mm, vmf.pmd);
return 0;
}
+
+ if (flags & FAULT_FLAG_PAGE_SPLIT)
+ return handle_split_page_fault(&vmf);
+
if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
return do_huge_pmd_numa_page(&vmf);
--
2.25.1

2022-12-14 20:01:05

by Michael Roth

Subject: [PATCH RFC v7 21/64] x86/fault: fix handle_split_page_fault() to work with memfd backed pages

From: Hugh Dickins <[email protected]>

When the address is backed by a memfd, the code to split the page does
nothing more than remove the PMD from the page tables. So immediately
install a PTE to ensure that any other pages in that 2MB region are
brought back in as 4K pages.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
mm/memory.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index e68da7e403c6..33c9020ba1f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4999,6 +4999,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
static int handle_split_page_fault(struct vm_fault *vmf)
{
__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
+ /*
+ * Install a PTE immediately to ensure that any other pages in
+ * this 2MB region are brought back in as 4K pages.
+ */
+ __pte_alloc(vmf->vma->vm_mm, vmf->pmd);
return 0;
}

--
2.25.1

2022-12-14 20:01:07

by Michael Roth

Subject: [PATCH RFC v7 22/64] x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling.

From: Ashish Kalra <[email protected]>

Return the pfn from dump_pagetable() for use in SEV-specific fault
handling; it is used when handling an SNP RMP page fault.

Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/mm/fault.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d611051dcf1e..ded53879f98d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -311,7 +311,7 @@ static bool low_pfn(unsigned long pfn)
return pfn < max_low_pfn;
}

-static void dump_pagetable(unsigned long address)
+static unsigned long dump_pagetable(unsigned long address)
{
pgd_t *base = __va(read_cr3_pa());
pgd_t *pgd = &base[pgd_index(address)];
@@ -345,8 +345,10 @@ static void dump_pagetable(unsigned long address)

pte = pte_offset_kernel(pmd, address);
pr_cont("*pte = %0*Lx ", sizeof(*pte) * 2, (u64)pte_val(*pte));
+ return 0;
out:
pr_cont("\n");
+ return 0;
}

#else /* CONFIG_X86_64: */
@@ -367,10 +369,11 @@ static int bad_address(void *p)
return get_kernel_nofault(dummy, (unsigned long *)p);
}

-static void dump_pagetable(unsigned long address)
+static unsigned long dump_pagetable(unsigned long address)
{
pgd_t *base = __va(read_cr3_pa());
pgd_t *pgd = base + pgd_index(address);
+ unsigned long pfn;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
@@ -388,6 +391,7 @@ static void dump_pagetable(unsigned long address)
if (bad_address(p4d))
goto bad;

+ pfn = p4d_pfn(*p4d);
pr_cont("P4D %lx ", p4d_val(*p4d));
if (!p4d_present(*p4d) || p4d_large(*p4d))
goto out;
@@ -396,6 +400,7 @@ static void dump_pagetable(unsigned long address)
if (bad_address(pud))
goto bad;

+ pfn = pud_pfn(*pud);
pr_cont("PUD %lx ", pud_val(*pud));
if (!pud_present(*pud) || pud_large(*pud))
goto out;
@@ -404,6 +409,7 @@ static void dump_pagetable(unsigned long address)
if (bad_address(pmd))
goto bad;

+ pfn = pmd_pfn(*pmd);
pr_cont("PMD %lx ", pmd_val(*pmd));
if (!pmd_present(*pmd) || pmd_large(*pmd))
goto out;
@@ -412,13 +418,14 @@ static void dump_pagetable(unsigned long address)
if (bad_address(pte))
goto bad;

+ pfn = pte_pfn(*pte);
pr_cont("PTE %lx", pte_val(*pte));
out:
pr_cont("\n");
-
- return;
+ return pfn;
bad:
pr_info("BAD\n");
+ return -1;
}

#endif /* CONFIG_X86_64 */
--
2.25.1

2022-12-14 20:01:25

by Michael Roth

Subject: [PATCH RFC v7 23/64] x86/fault: Add support to dump RMP entry on fault

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled globally, a write from the host goes through the
RMP check. If the hardware encounters a check failure, it raises a #PF
(with the RMP bit set in the error code). Dump the RMP entry at the
faulting pfn to aid debugging.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev.h | 2 ++
arch/x86/kernel/sev.c | 43 ++++++++++++++++++++++++++++++++++++++
arch/x86/mm/fault.c | 7 ++++++-
3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 4eeedcaca593..2916f4150ac7 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -215,6 +215,7 @@ int snp_lookup_rmpentry(u64 pfn, int *level);
int psmash(u64 pfn);
int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
int rmp_make_shared(u64 pfn, enum pg_level level);
+void sev_dump_rmpentry(u64 pfn);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -247,6 +248,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
return -ENODEV;
}
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void sev_dump_rmpentry(u64 pfn) {}
#endif

#endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e2b38c3551be..1dd1b36bdfea 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2508,6 +2508,49 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
return entry;
}

+void sev_dump_rmpentry(u64 pfn)
+{
+ unsigned long pfn_end;
+ struct rmpentry *e;
+ int level;
+
+ e = __snp_lookup_rmpentry(pfn, &level);
+ if (!e) {
+ pr_info("failed to read RMP entry pfn 0x%llx\n", pfn);
+ return;
+ }
+
+ if (rmpentry_assigned(e)) {
+ pr_info("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
+ " asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,
+ rmpentry_assigned(e), e->info.immutable, rmpentry_pagesize(e),
+ (unsigned long)e->info.gpa, e->info.asid, e->info.vmsa,
+ e->info.validated);
+ return;
+ }
+
+ /*
+ * If the RMP entry at the faulting pfn was not assigned, then it is not
+ * clear what caused the RMP violation. To get some useful debug
+ * information, iterate through the entire 2MB region and dump any RMP
+ * entries that have bits set.
+ */
+ pfn = pfn & ~(PTRS_PER_PMD - 1);
+ pfn_end = pfn + PTRS_PER_PMD;
+
+ while (pfn < pfn_end) {
+ e = __snp_lookup_rmpentry(pfn, &level);
+ if (!e)
+ return;
+
+ if (e->low || e->high)
+ pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+ pfn << PAGE_SHIFT, e->high, e->low);
+ pfn++;
+ }
+}
+EXPORT_SYMBOL_GPL(sev_dump_rmpentry);
+
/*
* Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
* and -errno if there is no corresponding RMP entry.
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ded53879f98d..f2b16dcfbd9a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -536,6 +536,8 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
static void
show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
{
+ unsigned long pfn;
+
if (!oops_may_print())
return;

@@ -608,7 +610,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
show_ldttss(&gdt, "TR", tr);
}

- dump_pagetable(address);
+ pfn = dump_pagetable(address);
+
+ if (error_code & X86_PF_RMP)
+ sev_dump_rmpentry(pfn);
}

static noinline void
--
2.25.1

2022-12-14 20:01:44

by Michael Roth

Subject: [PATCH RFC v7 26/64] crypto:ccp: Provide API to issue SEV and SNP commands

From: Brijesh Singh <[email protected]>

Make sev_do_cmd() a generic API interface for the hypervisor
to issue commands to manage SEV and SNP guests. The commands
for SEV and SNP are defined in the SEV and SEV-SNP firmware
specifications.
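
As an illustration, a minimal sketch of a kernel-side caller is below
(the helper name is hypothetical; SNP_DF_FLUSH takes no data buffer, so
the data pointer is NULL):

    #include <linux/psp-sev.h>

    static int example_snp_df_flush(void)
    {
            int psp_ret, rc;

            rc = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &psp_ret);
            if (rc)
                    pr_err("SNP_DF_FLUSH failed: rc=%d fw_err=%#x\n",
                           rc, psp_ret);

            return rc;
    }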

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 3 ++-
include/linux/psp-sev.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index af20420bd6c2..35f605936f1b 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -415,7 +415,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
return ret;
}

-static int sev_do_cmd(int cmd, void *data, int *psp_ret)
+int sev_do_cmd(int cmd, void *data, int *psp_ret)
{
int rc;

@@ -425,6 +425,7 @@ static int sev_do_cmd(int cmd, void *data, int *psp_ret)

return rc;
}
+EXPORT_SYMBOL_GPL(sev_do_cmd);

static int __sev_init_locked(int *error)
{
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 8cfe92e82743..46f61e3ae33b 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -907,6 +907,20 @@ int sev_guest_df_flush(int *error);
*/
int sev_guest_decommission(struct sev_data_decommission *data, int *error);

+/**
+ * sev_do_cmd - issue an SEV or SEV-SNP command
+ *
+ * @cmd: SEV command to issue
+ * @data: command buffer for the SEV command
+ * @psp_ret: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV device successfully processed the command
+ * -%ENODEV if the SEV device is not available
+ * -%ENOTSUPP if the SEV device does not support the command
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO if the SEV device returned a non-zero return code
+ */
+int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
void *psp_copy_user_blob(u64 uaddr, u32 len);

#else /* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -924,6 +938,9 @@ sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENO
static inline int
sev_guest_decommission(struct sev_data_decommission *data, int *error) { return -ENODEV; }

+static inline int
+sev_do_cmd(int cmd, void *data, int *psp_ret) { return -ENODEV; }
+
static inline int
sev_guest_activate(struct sev_data_activate *data, int *error) { return -ENODEV; }

--
2.25.1

2022-12-14 20:02:00

by Michael Roth

Subject: [PATCH RFC v7 27/64] crypto: ccp: Introduce snp leaked pages list

From: Ashish Kalra <[email protected]>

Pages are unsafe to release back to the page allocator if they have
been transitioned to firmware/guest state and can't be reclaimed or
transitioned back to hypervisor/shared state. In this case, add them
to an internal leaked-pages list to ensure that they are never freed
or touched/accessed, which would cause fatal page faults.
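
As a usage sketch (hypothetical caller; snp_reclaim_pages() is introduced
elsewhere in this series), the expected pattern is to leak pages only when
reclaim fails:

    /* Try to transition the pages back to shared; leak them on failure. */
    if (snp_reclaim_pages(paddr, npages, true))
            snp_mark_pages_offline(__sme_clr(paddr) >> PAGE_SHIFT, npages);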

Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 28 ++++++++++++++++++++++++++++
include/linux/psp-sev.h | 8 ++++++++
2 files changed, 36 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 35f605936f1b..eca4e59b0f44 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -42,6 +42,12 @@
static DEFINE_MUTEX(sev_cmd_mutex);
static struct sev_misc_dev *misc_dev;

+/* list of pages which are leaked and cannot be reclaimed */
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
+
static int psp_cmd_timeout = 100;
module_param(psp_cmd_timeout, int, 0644);
MODULE_PARM_DESC(psp_cmd_timeout, " default timeout value, in seconds, for PSP commands");
@@ -188,6 +194,28 @@ static int sev_cmd_buffer_len(int cmd)
return 0;
}

+void snp_mark_pages_offline(unsigned long pfn, unsigned int npages)
+{
+ WARN(1, "psc failed, pfn 0x%lx pages %u (marked offline)\n", pfn, npages);
+
+ spin_lock(&snp_leaked_pages_list_lock);
+ atomic_long_add(npages, &snp_nr_leaked_pages);
+ while (npages--) {
+ /*
+ * Reuse the page's buddy list for chaining into the leaked
+ * pages list. These pages should not be on a free list
+ * currently and are also unsafe to be added to a free list.
+ * Look the page up per iteration since pfn advances.
+ */
+ list_add_tail(&pfn_to_page(pfn)->buddy_list, &snp_leaked_pages_list);
+ sev_dump_rmpentry(pfn);
+ pfn++;
+ }
+ spin_unlock(&snp_leaked_pages_list_lock);
+}
+EXPORT_SYMBOL_GPL(snp_mark_pages_offline);
+
static void *sev_fw_alloc(unsigned long len)
{
struct page *page;
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 46f61e3ae33b..8edf5c548fbf 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -923,6 +923,12 @@ int sev_do_cmd(int cmd, void *data, int *psp_ret);

void *psp_copy_user_blob(u64 uaddr, u32 len);

+/**
+ * snp_mark_pages_offline - insert non-reclaimed firmware/guest pages
+ * into the leaked pages list.
+ *
+ * @pfn: first pfn of the range to mark offline
+ * @npages: number of pages in the range
+ */
+void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
+
#else /* !CONFIG_CRYPTO_DEV_SP_PSP */

static inline int
@@ -951,6 +957,8 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int

static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }

+static inline void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}
+
#endif /* CONFIG_CRYPTO_DEV_SP_PSP */

#endif /* __PSP_SEV_H__ */
--
2.25.1

2022-12-14 20:02:37

by Michael Roth

Subject: [PATCH RFC v7 25/64] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

From: Brijesh Singh <[email protected]>

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH
command to prepare for the first SNP guest launch after INIT.

During the execution of the SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operating system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will error when attempting to write to pages in other
states after SNP_INIT enables their SNP enforcement.

Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.
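
For reference, a sketch of the range-list layout consumed by SNP_INIT_EX
(the authoritative definitions live in the SEV-SNP command definitions
patch of this series; field names here follow the firmware spec):

    struct sev_data_range {
            u64 base;               /* system physical address, page aligned */
            u32 page_count;         /* number of 4KB pages in this range */
            u32 reserved;
    } __packed;

    struct sev_data_range_list {
            u32 num_elements;       /* number of elements in ranges[] */
            u32 reserved;
            struct sev_data_range ranges[];
    } __packed;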

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 225 +++++++++++++++++++++++++++++++++++
drivers/crypto/ccp/sev-dev.h | 2 +
include/linux/psp-sev.h | 17 +++
3 files changed, 244 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 9d84720a41d7..af20420bd6c2 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -26,6 +26,7 @@
#include <linux/fs_struct.h>

#include <asm/smp.h>
+#include <asm/e820/types.h>

#include "psp-dev.h"
#include "sev-dev.h"
@@ -34,6 +35,10 @@
#define SEV_FW_FILE "amd/sev.fw"
#define SEV_FW_NAME_SIZE 64

+/* Minimum firmware version required for the SEV-SNP support */
+#define SNP_MIN_API_MAJOR 1
+#define SNP_MIN_API_MINOR 51
+
static DEFINE_MUTEX(sev_cmd_mutex);
static struct sev_misc_dev *misc_dev;

@@ -76,6 +81,13 @@ static void *sev_es_tmr;
#define NV_LENGTH (32 * 1024)
static void *sev_init_ex_buffer;

+/*
+ * SEV_DATA_RANGE_LIST:
+ * Array containing range of pages that firmware transitions to HV-fixed
+ * page state.
+ */
+struct sev_data_range_list *snp_range_list;
+
static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
{
struct sev_device *sev = psp_master->sev_data;
@@ -830,6 +842,186 @@ static int sev_update_firmware(struct device *dev)
return ret;
}

+static void snp_set_hsave_pa(void *arg)
+{
+ wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
+{
+ struct sev_data_range_list *range_list = arg;
+ struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
+ size_t size;
+
+ /* Ensure there is room in the page for one more range entry. */
+ if (((range_list->num_elements + 1) * sizeof(struct sev_data_range) +
+ sizeof(struct sev_data_range_list)) > PAGE_SIZE)
+ return -E2BIG;
+
+ switch (rs->desc) {
+ case E820_TYPE_RESERVED:
+ case E820_TYPE_PMEM:
+ case E820_TYPE_ACPI:
+ range->base = rs->start & PAGE_MASK;
+ size = (rs->end + 1) - rs->start;
+ range->page_count = size >> PAGE_SHIFT;
+ range_list->num_elements++;
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+ struct psp_device *psp = psp_master;
+ struct sev_data_snp_init_ex data;
+ struct sev_device *sev;
+ int rc = 0;
+
+ if (!psp || !psp->sev_data)
+ return -ENODEV;
+
+ sev = psp->sev_data;
+
+ if (sev->snp_initialized)
+ return 0;
+
+ /*
+ * SNP_INIT requires MSR_VM_HSAVE_PA to be set to 0 across all
+ * cores.
+ */
+ on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+ /*
+ * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
+ * system physical address ranges to convert into the HV-fixed page states
+ * during the RMP initialization. For instance, the memory that UEFI
+ * reserves should be included in the range list. This allows system
+ * components that occasionally write to memory (e.g. logging to UEFI
+ * reserved regions) to not fail due to RMP initialization and SNP enablement.
+ */
+ if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
+ /*
+ * Firmware checks that the pages containing the ranges enumerated
+ * in the RANGES structure are either in the Default page state or in the
+ * firmware page state.
+ */
+ snp_range_list = sev_fw_alloc(PAGE_SIZE);
+ if (!snp_range_list) {
+ dev_err(sev->dev,
+ "SEV: SNP_INIT_EX range list memory allocation failed\n");
+ return -ENOMEM;
+ }
+
+ memset(snp_range_list, 0, PAGE_SIZE);
+
+ /*
+ * Retrieve all reserved memory regions set up by UEFI from the e820 memory
+ * map, to be set up as HV-fixed pages.
+ */
+
+ rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
+ snp_range_list, snp_filter_reserved_mem_regions);
+ if (rc) {
+ dev_err(sev->dev,
+ "SEV: SNP_INIT_EX walk_iomem_res_desc failed rc = %d\n", rc);
+ return rc;
+ }
+
+ memset(&data, 0, sizeof(data));
+ data.init_rmp = 1;
+ data.list_paddr_en = 1;
+ data.list_paddr = __pa(snp_range_list);
+
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT_EX, &data, error);
+ if (rc)
+ return rc;
+ } else {
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+ if (rc)
+ return rc;
+ }
+
+ /* Prepare for first SNP guest launch after INIT */
+ wbinvd_on_all_cpus();
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+ if (rc)
+ return rc;
+
+ sev->snp_initialized = true;
+ dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+ return rc;
+}
+
+int sev_snp_init(int *error, bool init_on_probe)
+{
+ int rc;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENODEV;
+
+ if (init_on_probe && !psp_init_on_probe)
+ return 0;
+
+ mutex_lock(&sev_cmd_mutex);
+ rc = __sev_snp_init_locked(error);
+ mutex_unlock(&sev_cmd_mutex);
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(sev_snp_init);
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_data_snp_shutdown_ex data;
+ int ret;
+
+ if (!sev->snp_initialized)
+ return 0;
+
+ memset(&data, 0, sizeof(data));
+ data.length = sizeof(data);
+ data.iommu_snp_shutdown = 1;
+
+ wbinvd_on_all_cpus();
+
+retry:
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
+ /* SHUTDOWN may require DF_FLUSH */
+ if (*error == SEV_RET_DFFLUSH_REQUIRED) {
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
+ if (ret) {
+ dev_err(sev->dev, "SEV-SNP DF_FLUSH failed\n");
+ return ret;
+ }
+ goto retry;
+ }
+ if (ret) {
+ dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
+ return ret;
+ }
+
+ sev->snp_initialized = false;
+ dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+ return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+ int rc;
+
+ mutex_lock(&sev_cmd_mutex);
+ rc = __sev_snp_shutdown_locked(error);
+ mutex_unlock(&sev_cmd_mutex);
+
+ return rc;
+}
+
static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
{
struct sev_device *sev = psp_master->sev_data;
@@ -1270,6 +1462,8 @@ int sev_dev_init(struct psp_device *psp)

static void sev_firmware_shutdown(struct sev_device *sev)
{
+ int error;
+
sev_platform_shutdown(NULL);

if (sev_es_tmr) {
@@ -1286,6 +1480,14 @@ static void sev_firmware_shutdown(struct sev_device *sev)
get_order(NV_LENGTH));
sev_init_ex_buffer = NULL;
}
+
+ if (snp_range_list) {
+ free_pages((unsigned long)snp_range_list,
+ get_order(PAGE_SIZE));
+ snp_range_list = NULL;
+ }
+
+ sev_snp_shutdown(&error);
}

void sev_dev_destroy(struct psp_device *psp)
@@ -1341,6 +1543,26 @@ void sev_pci_init(void)
}
}

+ /*
+ * If boot CPU supports SNP, then first attempt to initialize
+ * the SNP firmware.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
+ if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
+ dev_err(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
+ SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
+ } else {
+ rc = sev_snp_init(&error, true);
+ if (rc) {
+ /*
+ * Don't abort the probe if SNP INIT failed,
+ * continue to initialize the legacy SEV firmware.
+ */
+ dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
+ }
+ }
+ }
+
/* Obtain the TMR memory area for SEV-ES use */
sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
if (!sev_es_tmr)
@@ -1356,6 +1578,9 @@ void sev_pci_init(void)
dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
error, rc);

+ dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
+ "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
+
return;

err:
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 666c21eb81ab..34767657beb5 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
u8 build;

void *cmd_buf;
+
+ bool snp_initialized;
};

int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 31b045e1926f..8cfe92e82743 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -794,6 +794,21 @@ struct sev_data_snp_shutdown_ex {
*/
int sev_platform_init(int *error);

+/**
+ * sev_snp_init - perform SEV SNP_INIT command
+ *
+ * @error: SEV command return code
+ * @init_on_probe: indicates if called during module probe/init
+ *
+ * Returns:
+ * 0 if the SEV device successfully processed the command
+ * -%ENODEV if the SEV device is not available
+ * -%ENOTSUPP if the SEV device does not support the command
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO if the SEV device returned a non-zero return code
+ */
+int sev_snp_init(int *error, bool init_on_probe);
+
/**
* sev_platform_status - perform SEV PLATFORM_STATUS command
*
@@ -901,6 +916,8 @@ sev_platform_status(struct sev_user_data_status *status, int *error) { return -E

static inline int sev_platform_init(int *error) { return -ENODEV; }

+static inline int sev_snp_init(int *error, bool init_on_probe) { return -ENODEV; }
+
static inline int
sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENODEV; }

--
2.25.1

2022-12-14 20:02:38

by Michael Roth

Subject: [PATCH RFC v7 02/64] KVM: x86: Add KVM_CAP_UNMAPPED_PRIVATE_MEMORY

This mainly indicates to KVM that it should expect all private guest
memory to be backed by private memslots. Ideally this would work
similarly for other archs, give or take a few additional flags, but
for now it's a simple boolean indicator for x86.
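
For illustration, a minimal userspace sketch of how a VMM might opt a VM
into UPM mode (the capability number and enablement flow are still RFC
and may change):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int enable_upm(int vm_fd)
    {
            struct kvm_enable_cap cap = {
                    .cap = KVM_CAP_UNMAPPED_PRIVATE_MEM,
            };

            /* Returns 0 on success, -1 with errno set on failure. */
            return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }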

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/x86.c | 10 ++++++++++
include/uapi/linux/kvm.h | 1 +
3 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 27ef31133352..2b6244525107 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1438,6 +1438,9 @@ struct kvm_arch {
*/
#define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
struct kvm_mmu_memory_cache split_desc_cache;
+
+ /* Use/enforce unmapped private memory. */
+ bool upm_mode;
};

struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c67e22f3e2ee..99ecf99bc4d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4421,6 +4421,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_EXIT_HYPERCALL:
r = KVM_EXIT_HYPERCALL_VALID_MASK;
break;
+#ifdef CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES
+ case KVM_CAP_UNMAPPED_PRIVATE_MEM:
+ r = 1;
+ break;
+#endif
case KVM_CAP_SET_GUEST_DEBUG2:
return KVM_GUESTDBG_VALID_MASK;
#ifdef CONFIG_KVM_XEN
@@ -6382,6 +6387,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_UNMAPPED_PRIVATE_MEM:
+ kvm->arch.upm_mode = true;
+ r = 0;
+ break;
default:
r = -EINVAL;
break;
@@ -12128,6 +12137,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
kvm->arch.guest_can_read_msr_platform_info = true;
kvm->arch.enable_pmu = enable_pmu;
+ kvm->arch.upm_mode = false;

#if IS_ENABLED(CONFIG_HYPERV)
spin_lock_init(&kvm->arch.hv_root_tdp_lock);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c7e9d375a902..cc9424ccf9b2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1219,6 +1219,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
#define KVM_CAP_MEMORY_ATTRIBUTES 225
+#define KVM_CAP_UNMAPPED_PRIVATE_MEM 240

#ifdef KVM_CAP_IRQ_ROUTING

--
2.25.1

2022-12-14 20:02:38

by Michael Roth

Subject: [PATCH RFC v7 29/64] crypto: ccp: Handle the legacy SEV command when SNP is enabled

From: Brijesh Singh <[email protected]>

The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. When SNP is in the INIT state, all memory that the
firmware writes to on behalf of SEV-legacy commands must be in the
firmware page state before the command is issued.

A command buffer may contain a system physical address that the firmware
may write to. There are two cases that need to be handled:

1) the system physical address points to guest memory
2) the system physical address points to host memory

To handle case #1, change the page state to firmware in the RMP table
before issuing the command, and restore the state to shared after the
command completes.

For case #2, use a bounce buffer to complete the request.
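
A condensed sketch of the dispatch rule described above (a simplification
of map_firmware_writeable() from the diff below; 'guest' mirrors its
boolean parameter):

    if (guest) {
            /* Case #1: guest memory -- flip its RMP state to firmware. */
            rc = rmp_mark_pages_firmware(*paddr, npages, true);
    } else {
            /* Case #2: host memory -- bounce through a firmware-state buffer. */
            rc = rmp_mark_pages_firmware(__pa(map->host), npages, true);
            if (!rc)
                    *paddr = __psp_pa(map->host);
    }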

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 370 ++++++++++++++++++++++++++++++++++-
drivers/crypto/ccp/sev-dev.h | 12 ++
2 files changed, 372 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 4c12e98a1219..5eb2e8f364d4 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -286,6 +286,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
return rc;
}

+static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
+{
+ /* The C-bit may be set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int rc, n = 0, i;
+
+ for (i = 0; i < npages; i++, pfn++, n++) {
+ rc = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (rc)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * If we failed to change the page state to shared, it's not safe
+ * to release the pages back to the system, so leak them.
+ */
+ snp_mark_pages_offline(pfn, npages - n);
+
+ return rc;
+}
+
static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
{
unsigned long npages = 1ul << order, paddr;
@@ -487,12 +511,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
return sev_write_init_ex_file();
}

+static int alloc_snp_host_map(struct sev_device *sev)
+{
+ struct page *page;
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ memset(map, 0, sizeof(*map));
+
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
+ if (!page)
+ return -ENOMEM;
+
+ map->host = page_address(page);
+ }
+
+ return 0;
+}
+
+static void free_snp_host_map(struct sev_device *sev)
+{
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ if (map->host) {
+ __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
+ memset(map, 0, sizeof(*map));
+ }
+ }
+}
+
+static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ map->active = false;
+
+ if (!paddr || !len)
+ return 0;
+
+ map->paddr = *paddr;
+ map->len = len;
+
+ /* If paddr points to guest memory, change the page state to firmware. */
+ if (guest) {
+ if (rmp_mark_pages_firmware(*paddr, npages, true))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ if (!map->host)
+ return -ENOMEM;
+
+ /* Check if the pre-allocated buffer can be used to fulfill the request. */
+ if (len > SEV_FW_BLOB_MAX_SIZE)
+ return -EINVAL;
+
+ /* Transition the pre-allocated buffer to the firmware state. */
+ if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
+ return -EFAULT;
+
+ /* Set the paddr to use pre-allocated firmware buffer */
+ *paddr = __psp_pa(map->host);
+
+done:
+ map->active = true;
+ return 0;
+}
+
+static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ if (!map->active)
+ return 0;
+
+ /* If paddr points to a guest memory then restore the page state to hypervisor. */
+ if (guest) {
+ if (snp_reclaim_pages(*paddr, npages, true))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ /*
+ * Transition the pre-allocated buffer to hypervisor state before the access.
+ *
+ * This is because while changing the page state to firmware, the kernel unmaps
+ * the pages from the direct map, and to restore the direct map the pages must
+ * be transitioned back to the shared state.
+ */
+ if (snp_reclaim_pages(__pa(map->host), npages, true))
+ return -EFAULT;
+
+ /* Copy the response data from the firmware buffer to the caller's buffer. */
+ memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
+ *paddr = map->paddr;
+
+done:
+ map->active = false;
+ return 0;
+}
+
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+ switch (cmd) {
+ case SEV_CMD_PLATFORM_STATUS:
+ case SEV_CMD_GUEST_STATUS:
+ case SEV_CMD_LAUNCH_START:
+ case SEV_CMD_RECEIVE_START:
+ case SEV_CMD_LAUNCH_MEASURE:
+ case SEV_CMD_SEND_START:
+ case SEV_CMD_SEND_UPDATE_DATA:
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ case SEV_CMD_PEK_CSR:
+ case SEV_CMD_PDH_CERT_EXPORT:
+ case SEV_CMD_GET_ID:
+ case SEV_CMD_ATTESTATION_REPORT:
+ return true;
+ default:
+ return false;
+ }
+}
+
+#define prep_buffer(name, addr, len, guest, map) \
+ func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
+
+static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
+{
+ int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
+ struct sev_device *sev = psp_master->sev_data;
+ bool from_fw = !to_fw;
+
+ /*
+ * After the command is completed, change the command buffer memory to
+ * hypervisor state.
+ *
+ * The immutable bit is automatically cleared by the firmware, so
+ * there is no need to reclaim the page.
+ */
+ if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
+ return -EFAULT;
+
+ /* No need to go further if firmware failed to execute command. */
+ if (fw_err)
+ return 0;
+ }
+
+ if (to_fw)
+ func = map_firmware_writeable;
+ else
+ func = unmap_firmware_writeable;
+
+ /*
+ * A command buffer may contain a system physical address. If the address
+ * points to host memory, use an intermediate firmware page; otherwise,
+ * change the page state in the RMP table.
+ */
+ switch (cmd) {
+ case SEV_CMD_PDH_CERT_EXPORT:
+ if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
+ pdh_cert_len, false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
+ cert_chain_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_GET_ID:
+ if (prep_buffer(struct sev_data_get_id, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_PEK_CSR:
+ if (prep_buffer(struct sev_data_pek_csr, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_launch_update_data, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_MEASURE:
+ if (prep_buffer(struct sev_data_launch_measure, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_SECRET:
+ if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_DECRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_ENCRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_ATTESTATION_REPORT:
+ if (prep_buffer(struct sev_data_attestation_report, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_START:
+ if (prep_buffer(struct sev_data_send_start, session_address,
+ session_len, false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_data, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_receive_update_data, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ default:
+ break;
+ }
+
+ /* The command buffer needs to be in the firmware state. */
+ if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
+ return -EFAULT;
+ }
+
+ return 0;
+
+err:
+ return -EINVAL;
+}
+
+static inline bool need_firmware_copy(int cmd)
+{
+ struct sev_device *sev = psp_master->sev_data;
+
+ /* After SNP is INIT'ed, the behavior of legacy SEV commands changes. */
+ return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
+}
+
+static int snp_aware_copy_to_firmware(int cmd, void *data)
+{
+ return __snp_cmd_buf_copy(cmd, data, true, 0);
+}
+
+static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
+{
+ return __snp_cmd_buf_copy(cmd, data, false, fw_err);
+}
+
static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
{
struct psp_device *psp = psp_master;
struct sev_device *sev;
unsigned int phys_lsb, phys_msb;
unsigned int reg, ret = 0;
+ void *cmd_buf;
int buf_len;

if (!psp || !psp->sev_data)
@@ -512,12 +819,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
* work for some memory, e.g. vmalloc'd addresses, and @data may not be
* physically contiguous.
*/
- if (data)
- memcpy(sev->cmd_buf, data, buf_len);
+ if (data) {
+ /* Only two command buffers exist, so reject deeper nesting. */
+ if (sev->cmd_buf_active >= 2)
+ return -EBUSY;
+
+ cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
+
+ memcpy(cmd_buf, data, buf_len);
+ sev->cmd_buf_active++;
+
+ /*
+ * The behavior of the SEV-legacy commands is altered when the
+ * SNP firmware is in the INIT state.
+ */
+ if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
+ return -EFAULT;
+ } else {
+ cmd_buf = sev->cmd_buf;
+ }

/* Get the physical address of the command buffer */
- phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
- phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+ phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+ phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;

dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
cmd, phys_msb, phys_lsb, psp_timeout);
@@ -560,15 +883,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
ret = sev_write_init_ex_file_if_required(cmd);
}

- print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
- buf_len, false);
-
/*
* Copy potential output from the PSP back to data. Do this even on
* failure in case the caller wants to glean something from the error.
*/
- if (data)
- memcpy(data, sev->cmd_buf, buf_len);
+ if (data) {
+ /*
+ * Restore the page state after the command completes.
+ */
+ if (need_firmware_copy(cmd) &&
+ snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
+ return -EFAULT;
+
+ memcpy(data, cmd_buf, buf_len);
+ sev->cmd_buf_active--;
+ }
+
+ print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+ buf_len, false);

return ret;
}
@@ -1579,10 +1911,12 @@ int sev_dev_init(struct psp_device *psp)
if (!sev)
goto e_err;

- sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+ sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
if (!sev->cmd_buf)
goto e_sev;

+ sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+
psp->sev_data = sev;

sev->dev = dev;
@@ -1648,6 +1982,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
snp_range_list = NULL;
}

+ /*
+ * The host map needs to clear the immutable bit, so it must be freed
+ * before the SNP firmware shutdown.
+ */
+ free_snp_host_map(sev);
+
sev_snp_shutdown(&error);
}

@@ -1722,6 +2062,14 @@ void sev_pci_init(void)
dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
}
}
+
+ /*
+ * Allocate the intermediate buffers used for the legacy command handling.
+ */
+ if (alloc_snp_host_map(sev)) {
+ dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
+ goto skip_legacy;
+ }
}

/* Obtain the TMR memory area for SEV-ES use */
@@ -1739,12 +2087,14 @@ void sev_pci_init(void)
dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
error, rc);

+skip_legacy:
dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
"-SNP" : "", sev->api_major, sev->api_minor, sev->build);

return;

err:
+ free_snp_host_map(sev);
psp_master->sev_data = NULL;
}

diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 34767657beb5..19d79f9d4212 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -29,11 +29,20 @@
#define SEV_CMDRESP_CMD_SHIFT 16
#define SEV_CMDRESP_IOC BIT(0)

+#define MAX_SNP_HOST_MAP_BUFS 2
+
struct sev_misc_dev {
struct kref refcount;
struct miscdevice misc;
};

+struct snp_host_map {
+ u64 paddr;
+ u32 len;
+ void *host;
+ bool active;
+};
+
struct sev_device {
struct device *dev;
struct psp_device *psp;
@@ -52,8 +61,11 @@ struct sev_device {
u8 build;

void *cmd_buf;
+ void *cmd_buf_backup;
+ int cmd_buf_active;

bool snp_initialized;
+ struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
};

int sev_dev_init(struct psp_device *psp);
--
2.25.1

2022-12-14 20:02:44

by Michael Roth

Subject: [PATCH RFC v7 30/64] crypto: ccp: Add the SNP_PLATFORM_STATUS command

From: Brijesh Singh <[email protected]>

The command can be used by userspace to query the SNP platform status
report. See the SEV-SNP spec for further details.
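
For illustration, a minimal userspace sketch (field names in struct
sev_user_data_snp_status follow the SEV-SNP spec and the uapi header
added later in this series):

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/psp-sev.h>

    int main(void)
    {
            struct sev_user_data_snp_status status = {};
            struct sev_issue_cmd cmd = {
                    .cmd = SNP_PLATFORM_STATUS,
                    .data = (uint64_t)(unsigned long)&status,
            };
            int fd = open("/dev/sev", O_RDWR);

            if (fd < 0 || ioctl(fd, SEV_ISSUE_CMD, &cmd))
                    return 1;

            /* status now holds the SNP API version, build id, etc. */
            return 0;
    }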

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 27 ++++++++++++++++
drivers/crypto/ccp/sev-dev.c | 45 +++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 1 +
3 files changed, 73 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index bf593e88cfd9..11ea67c944df 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -61,6 +61,22 @@ counter (e.g. counter overflow), then -EIO will be returned.
__u64 fw_err;
};

+The host ioctl should be issued on the /dev/sev device. The ioctl accepts a
+command ID and a command input structure.
+
+::
+ struct sev_issue_cmd {
+ /* Command ID */
+ __u32 cmd;
+
+ /* Command request structure */
+ __u64 data;
+
+ /* firmware error code on failure (see psp-sev.h) */
+ __u32 error;
+ };
+
+
2.1 SNP_GET_REPORT
------------------

@@ -118,6 +134,17 @@ be updated with the expected value.

See GHCB specification for further detail on how to parse the certificate blob.

+2.4 SNP_PLATFORM_STATUS
+-----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (out): struct sev_user_data_snp_status
+:Returns (out): 0 on success, -negative on error
+
+The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
+status includes API major, minor version and more. See the SEV-SNP
+specification for further details.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 5eb2e8f364d4..10b87ec339aa 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1750,6 +1750,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
return ret;
}

+static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_data_snp_addr buf;
+ struct page *status_page;
+ void *data;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ status_page = alloc_page(GFP_KERNEL_ACCOUNT);
+ if (!status_page)
+ return -ENOMEM;
+
+ data = page_address(status_page);
+ if (rmp_mark_pages_firmware(__pa(data), 1, true)) {
+ __free_pages(status_page, 0);
+ return -EFAULT;
+ }
+
+ buf.gctx_paddr = __psp_pa(data);
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
+
+ /* Change the page state before accessing it */
+ if (snp_reclaim_pages(__pa(data), 1, true)) {
+ snp_mark_pages_offline(__pa(data) >> PAGE_SHIFT, 1);
+ return -EFAULT;
+ }
+
+ if (ret)
+ goto cleanup;
+
+ if (copy_to_user((void __user *)argp->data, data,
+ sizeof(struct sev_user_data_snp_status)))
+ ret = -EFAULT;
+
+cleanup:
+ __free_pages(status_page, 0);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1801,6 +1843,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SEV_GET_ID2:
ret = sev_ioctl_do_get_id2(&input);
break;
+ case SNP_PLATFORM_STATUS:
+ ret = sev_ioctl_snp_platform_status(&input);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index bed65a891223..ffd60e8b0a31 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,7 @@ enum {
SEV_PEK_CERT_IMPORT,
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
+ SNP_PLATFORM_STATUS,

SEV_MAX,
};
--
2.25.1

2022-12-14 20:02:48

by Michael Roth

Subject: [PATCH RFC v7 32/64] crypto: ccp: Provide APIs to query extended attestation report

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification defines a VMGEXIT that is used to get
the extended attestation report. The extended attestation report includes
the certificate blobs provided through SNP_SET_EXT_CONFIG.

snp_guest_ext_guest_request() will be used by the hypervisor to get the
extended attestation report. See the GHCB specification for more details.
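
A hypothetical hypervisor-side sketch of the size-negotiation flow this
enables (grow_buffer() is an assumed helper, not part of this series):

    unsigned long npages = 0, fw_err = 0;
    int rc;

    rc = snp_guest_ext_guest_request(&req, vaddr, &npages, &fw_err);
    if (rc == -EINVAL && fw_err == SNP_GUEST_REQ_INVALID_LEN) {
            /* npages was updated to the required certificate blob size. */
            vaddr = grow_buffer(npages);    /* hypothetical helper */
            rc = snp_guest_ext_guest_request(&req, vaddr, &npages, &fw_err);
    }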

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 48 ++++++++++++++++++++++++++++++++++++
include/linux/psp-sev.h | 33 +++++++++++++++++++++++++
2 files changed, 81 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index d59727ac2bdd..d4f13e5a8dde 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -27,6 +27,7 @@

#include <asm/smp.h>
#include <asm/e820/types.h>
+#include <asm/sev.h>

#include "psp-dev.h"
#include "sev-dev.h"
@@ -2016,6 +2017,53 @@ int sev_guest_df_flush(int *error)
}
EXPORT_SYMBOL_GPL(sev_guest_df_flush);

+int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+ unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
+{
+ unsigned long expected_npages;
+ struct sev_device *sev;
+ int rc;
+
+ if (!psp_master || !psp_master->sev_data)
+ return -ENODEV;
+
+ sev = psp_master->sev_data;
+
+ if (!sev->snp_initialized)
+ return -EINVAL;
+
+ mutex_lock(&sev->snp_certs_lock);
+ /*
+ * Check if there is enough space to copy the certificate chain. Otherwise
+ * return ERROR code defined in the GHCB specification.
+ */
+ expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
+ if (*npages < expected_npages) {
+ *npages = expected_npages;
+ *fw_err = SNP_GUEST_REQ_INVALID_LEN;
+ mutex_unlock(&sev->snp_certs_lock);
+ return -EINVAL;
+ }
+
+ rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
+ if (rc) {
+ mutex_unlock(&sev->snp_certs_lock);
+ return rc;
+ }
+
+ /* Copy the certificate blob */
+ if (sev->snp_certs_data) {
+ *npages = expected_npages;
+ memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
+ } else {
+ *npages = 0;
+ }
+
+ mutex_unlock(&sev->snp_certs_lock);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
+
static void sev_exit(struct kref *ref)
{
misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index d19744807471..81bafc049eca 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -931,6 +931,32 @@ void snp_free_firmware_page(void *addr);
*/
void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);

+/**
+ * snp_guest_ext_guest_request - perform the SNP extended guest request command
+ * defined in the GHCB specification.
+ *
+ * @data: the input guest request structure
+ * @vaddr: address where the certificate blob needs to be copied.
+ * @npages: number of pages for the certificate blob.
+ * If the specified page count is less than the certificate blob size, then the
+ * required page count is returned with error code defined in the GHCB spec.
+ * If the specified page count is more than the certificate blob size, then
+ * page count is updated to reflect the amount of valid data copied in the
+ * vaddr.
+ *
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV device successfully processed the command
+ * -%ENODEV if the SEV device is not available
+ * -%ENOTSUPP if the SEV device does not support the command
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO if the SEV device returned a non-zero return code
+ */
+int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+ unsigned long vaddr, unsigned long *npages,
+ unsigned long *error);
+
#else /* !CONFIG_CRYPTO_DEV_SP_PSP */

static inline int
@@ -968,6 +994,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)

static inline void snp_free_firmware_page(void *addr) { }

+static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+ unsigned long vaddr, unsigned long *n,
+ unsigned long *error)
+{
+ return -ENODEV;
+}
+
#endif /* CONFIG_CRYPTO_DEV_SP_PSP */

#endif /* __PSP_SEV_H__ */
--
2.25.1

2022-12-14 20:03:12

by Michael Roth

Subject: [PATCH RFC v7 24/64] crypto:ccp: Define the SEV-SNP commands

From: Brijesh Singh <[email protected]>

AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 16 +++
include/linux/psp-sev.h | 247 +++++++++++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 42 ++++++
3 files changed, 305 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 06fc7156c04f..9d84720a41d7 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -126,6 +126,8 @@ static int sev_cmd_buffer_len(int cmd)
switch (cmd) {
case SEV_CMD_INIT: return sizeof(struct sev_data_init);
case SEV_CMD_INIT_EX: return sizeof(struct sev_data_init_ex);
+ case SEV_CMD_SNP_SHUTDOWN_EX: return sizeof(struct sev_data_snp_shutdown_ex);
+ case SEV_CMD_SNP_INIT_EX: return sizeof(struct sev_data_snp_init_ex);
case SEV_CMD_PLATFORM_STATUS: return sizeof(struct sev_user_data_status);
case SEV_CMD_PEK_CSR: return sizeof(struct sev_data_pek_csr);
case SEV_CMD_PEK_CERT_IMPORT: return sizeof(struct sev_data_pek_cert_import);
@@ -154,6 +156,20 @@ static int sev_cmd_buffer_len(int cmd)
case SEV_CMD_GET_ID: return sizeof(struct sev_data_get_id);
case SEV_CMD_ATTESTATION_REPORT: return sizeof(struct sev_data_attestation_report);
case SEV_CMD_SEND_CANCEL: return sizeof(struct sev_data_send_cancel);
+ case SEV_CMD_SNP_GCTX_CREATE: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_LAUNCH_START: return sizeof(struct sev_data_snp_launch_start);
+ case SEV_CMD_SNP_LAUNCH_UPDATE: return sizeof(struct sev_data_snp_launch_update);
+ case SEV_CMD_SNP_ACTIVATE: return sizeof(struct sev_data_snp_activate);
+ case SEV_CMD_SNP_DECOMMISSION: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_PAGE_RECLAIM: return sizeof(struct sev_data_snp_page_reclaim);
+ case SEV_CMD_SNP_GUEST_STATUS: return sizeof(struct sev_data_snp_guest_status);
+ case SEV_CMD_SNP_LAUNCH_FINISH: return sizeof(struct sev_data_snp_launch_finish);
+ case SEV_CMD_SNP_DBG_DECRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_DBG_ENCRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_PAGE_UNSMASH: return sizeof(struct sev_data_snp_page_unsmash);
+ case SEV_CMD_SNP_PLATFORM_STATUS: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_GUEST_REQUEST: return sizeof(struct sev_data_snp_guest_request);
+ case SEV_CMD_SNP_CONFIG: return sizeof(struct sev_user_data_snp_config);
default: return 0;
}

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 1595088c428b..31b045e1926f 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -86,6 +86,35 @@ enum sev_cmd {
SEV_CMD_DBG_DECRYPT = 0x060,
SEV_CMD_DBG_ENCRYPT = 0x061,

+ /* SNP specific commands */
+ SEV_CMD_SNP_INIT = 0x81,
+ SEV_CMD_SNP_SHUTDOWN = 0x82,
+ SEV_CMD_SNP_PLATFORM_STATUS = 0x83,
+ SEV_CMD_SNP_DF_FLUSH = 0x84,
+ SEV_CMD_SNP_INIT_EX = 0x85,
+ SEV_CMD_SNP_SHUTDOWN_EX = 0x86,
+ SEV_CMD_SNP_DECOMMISSION = 0x90,
+ SEV_CMD_SNP_ACTIVATE = 0x91,
+ SEV_CMD_SNP_GUEST_STATUS = 0x92,
+ SEV_CMD_SNP_GCTX_CREATE = 0x93,
+ SEV_CMD_SNP_GUEST_REQUEST = 0x94,
+ SEV_CMD_SNP_ACTIVATE_EX = 0x95,
+ SEV_CMD_SNP_LAUNCH_START = 0xA0,
+ SEV_CMD_SNP_LAUNCH_UPDATE = 0xA1,
+ SEV_CMD_SNP_LAUNCH_FINISH = 0xA2,
+ SEV_CMD_SNP_DBG_DECRYPT = 0xB0,
+ SEV_CMD_SNP_DBG_ENCRYPT = 0xB1,
+ SEV_CMD_SNP_PAGE_SWAP_OUT = 0xC0,
+ SEV_CMD_SNP_PAGE_SWAP_IN = 0xC1,
+ SEV_CMD_SNP_PAGE_MOVE = 0xC2,
+ SEV_CMD_SNP_PAGE_MD_INIT = 0xC3,
+ SEV_CMD_SNP_PAGE_MD_RECLAIM = 0xC4,
+ SEV_CMD_SNP_PAGE_RO_RECLAIM = 0xC5,
+ SEV_CMD_SNP_PAGE_RO_RESTORE = 0xC6,
+ SEV_CMD_SNP_PAGE_RECLAIM = 0xC7,
+ SEV_CMD_SNP_PAGE_UNSMASH = 0xC8,
+ SEV_CMD_SNP_CONFIG = 0xC9,
+
SEV_CMD_MAX,
};

@@ -531,6 +560,224 @@ struct sev_data_attestation_report {
u32 len; /* In/Out */
} __packed;

+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: len of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+ u64 address; /* In */
+ u32 len; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+ u64 gctx_paddr; /* In */
+ u32 asid; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_addr - generic SNP command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_addr {
+ u64 gctx_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of migration agent
+ * @imi_en: launch flow is launching an IMI for the purpose of
+ * guest-assisted migration.
+ * @ma_en: the guest is associated with a migration agent
+ */
+struct sev_data_snp_launch_start {
+ u64 gctx_paddr; /* In */
+ u64 policy; /* In */
+ u64 ma_gctx_paddr; /* In */
+ u32 ma_en:1; /* In */
+ u32 imi_en:1; /* In */
+ u32 rsvd:30;
+ u8 gosvw[16]; /* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+ SNP_PAGE_TYPE_NORMAL = 0x1,
+ SNP_PAGE_TYPE_VMSA = 0x2,
+ SNP_PAGE_TYPE_ZERO = 0x3,
+ SNP_PAGE_TYPE_UNMEASURED = 0x4,
+ SNP_PAGE_TYPE_SECRET = 0x5,
+ SNP_PAGE_TYPE_CPUID = 0x6,
+
+ SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @page_type: encoded page type
+ * @page_size: page size; 0 indicates 4K, 1 indicates a 2MB page
+ * @address: system physical address of destination page to encrypt
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ */
+struct sev_data_snp_launch_update {
+ u64 gctx_paddr; /* In */
+ u32 page_size:1; /* In */
+ u32 page_type:3; /* In */
+ u32 imi_page:1; /* In */
+ u32 rsvd:27;
+ u32 rsvd2;
+ u64 address; /* In */
+ u32 rsvd3:8;
+ u32 vmpl1_perms:8; /* In */
+ u32 vmpl2_perms:8; /* In */
+ u32 vmpl3_perms:8; /* In */
+ u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_launch_finish {
+ u64 gctx_paddr;
+ u64 id_block_paddr;
+ u64 id_auth_paddr;
+ u8 id_block_en:1;
+ u8 auth_key_en:1;
+ u64 rsvd:62;
+ u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+ u64 gctx_paddr;
+ u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of page to be claimed. The 0th bit
+ * in the address indicates the page size. 0h indicates 4 kB and
+ * 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_reclaim {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of page to be unsmashed. The 0th bit
+ * in the address indicates the page size. 0h indicates 4 kB and
+ * 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_unsmash {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - SNP_DBG_ENCRYPT/SNP_DBG_DECRYPT command parameters
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ * @len: length of data to operate on
+ */
+struct sev_data_snp_dbg {
+ u64 gctx_paddr; /* In */
+ u64 src_addr; /* In */
+ u64 dst_addr; /* In */
+ u32 len; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+ u64 gctx_paddr; /* In */
+ u64 req_paddr; /* In */
+ u64 res_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX structure
+ *
+ * @init_rmp: indicates that the RMP should be initialized.
+ * @list_paddr_en: indicates that list_paddr is valid
+ * @list_paddr: system physical address of range list
+ */
+struct sev_data_snp_init_ex {
+ u32 init_rmp:1;
+ u32 list_paddr_en:1;
+ u32 rsvd:30;
+ u32 rsvd1;
+ u64 list_paddr;
+ u8 rsvd2[48];
+} __packed;
+
+/**
+ * struct sev_data_range - RANGE structure
+ *
+ * @base: system physical address of first byte of range
+ * @page_count: number of 4KB pages in this range
+ */
+struct sev_data_range {
+ u64 base;
+ u32 page_count;
+ u32 rsvd;
+} __packed;
+
+/**
+ * struct sev_data_range_list - RANGE_LIST structure
+ *
+ * @num_elements: number of elements in RANGE_ARRAY
+ * @ranges: array of num_elements of type RANGE
+ */
+struct sev_data_range_list {
+ u32 num_elements;
+ u32 rsvd;
+ struct sev_data_range ranges[];
+} __packed;
+
+/**
+ * struct sev_data_snp_shutdown_ex - SNP_SHUTDOWN_EX structure
+ *
+ * @length: length of the command buffer read by the PSP
+ * @iommu_snp_shutdown: Disable enforcement of SNP in the IOMMU
+ */
+struct sev_data_snp_shutdown_ex {
+ u32 length;
+ u32 iommu_snp_shutdown:1;
+ u32 rsvd1:31;
+} __packed;
+
#ifdef CONFIG_CRYPTO_DEV_SP_PSP

/**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 91b4c63d5cbf..bed65a891223 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -61,6 +61,13 @@ typedef enum {
SEV_RET_INVALID_PARAM,
SEV_RET_RESOURCE_LIMIT,
SEV_RET_SECURE_DATA_INVALID,
+ SEV_RET_INVALID_PAGE_SIZE,
+ SEV_RET_INVALID_PAGE_STATE,
+ SEV_RET_INVALID_MDATA_ENTRY,
+ SEV_RET_INVALID_PAGE_OWNER,
+ SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+ SEV_RET_RMP_INIT_REQUIRED,
+
SEV_RET_MAX,
} sev_ret_code;

@@ -147,6 +154,41 @@ struct sev_user_data_get_id2 {
__u32 length; /* In/Out */
} __packed;

+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @build_id: firmware build id for the API version
+ * @guest_count: the number of guests currently managed by the firmware
+ * @tcb_version: current TCB version
+ */
+struct sev_user_data_snp_status {
+ __u8 api_major; /* Out */
+ __u8 api_minor; /* Out */
+ __u8 state; /* Out */
+ __u8 rsvd;
+ __u32 build_id; /* Out */
+ __u32 rsvd1;
+ __u32 guest_count; /* Out */
+ __u64 tcb_version; /* Out */
+ __u64 rsvd2;
+} __packed;
+
+/**
+ * struct sev_user_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: The TCB version to report in the guest attestation report.
+ * @mask_chip_id: Indicates that the CHIP_ID field in the attestation report
+ * will always be zero.
+ */
+struct sev_user_data_snp_config {
+ __u64 reported_tcb; /* In */
+ __u32 mask_chip_id; /* In */
+ __u8 rsvd[52];
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1

2022-12-14 20:03:21

by Michael Roth

Subject: [PATCH RFC v7 33/64] KVM: SVM: Add support to handle AP reset MSR protocol

From: Tom Lendacky <[email protected]>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b8357d6ecd47..e15548d88f2a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -56,6 +56,8 @@
/* AP Reset Hold */
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6579ed218f6a..244c58bd3de7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -57,6 +57,10 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
#define sev_es_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

+#define AP_RESET_HOLD_NONE 0
+#define AP_RESET_HOLD_NAE_EVENT 1
+#define AP_RESET_HOLD_MSR_PROTO 2
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2698,6 +2702,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)

void sev_es_unmap_ghcb(struct vcpu_svm *svm)
{
+ /* Clear any indication that the vCPU is in a type of AP Reset Hold */
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
if (!svm->sev_es.ghcb)
return;

@@ -2910,6 +2917,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_AP_RESET_HOLD_REQ:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+ ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+ /*
+ * Preset the result to a non-SIPI return and then only set
+ * the result to non-zero when delivering a SIPI.
+ */
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3009,6 +3032,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
ret = kvm_emulate_ap_reset_hold(vcpu);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3169,15 +3193,31 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
return;
}

- /*
- * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
- * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
- * non-zero value.
- */
- if (!svm->sev_es.ghcb)
- return;
+ /* Subsequent SIPI */
+ switch (svm->sev_es.ap_reset_hold_type) {
+ case AP_RESET_HOLD_NAE_EVENT:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ break;
+ case AP_RESET_HOLD_MSR_PROTO:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set GHCB data field to a non-zero value.
+ */
+ set_ghcb_msr_bits(svm, 1,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);

- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ default:
+ break;
+ }
}

int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c760ec51a910..cb9da04e745a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -198,6 +198,7 @@ struct vcpu_sev_es_state {
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
bool received_first_sipi;
+ unsigned int ap_reset_hold_type;

/* SEV-ES scratch area support */
void *ghcb_sa;
--
2.25.1

2022-12-14 20:04:24

by Michael Roth

Subject: [PATCH RFC v7 34/64] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 14 ++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++-
3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index e15548d88f2a..539de6b93420 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_POS 12
+#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 244c58bd3de7..82ff96b4f04a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2667,6 +2667,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ case SVM_VMGEXIT_HV_FEATURES:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -2933,6 +2934,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK,
GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_HV_FT_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+ GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+ GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3057,6 +3065,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_HV_FEATURES: {
+ ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
+
+ ret = 1;
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index cb9da04e745a..1f3098dff3d5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -654,9 +654,10 @@ void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);

/* sev.c */

-#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

+#define GHCB_HV_FT_SUPPORTED 0

extern unsigned int max_sev_asid;

--
2.25.1

2022-12-14 20:04:25

by Michael Roth

Subject: [PATCH RFC v7 31/64] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command

From: Brijesh Singh <[email protected]>

The SEV-SNP firmware provides the SNP_CONFIG command used to set the
system-wide configuration value for SNP guests. The information includes
the TCB version string to be reported in guest attestation reports.

Version 2 of the GHCB specification adds an NAE (SNP extended guest
request) that a guest can use to query the reports that include additional
certificates.

In both cases, userspace-provided data is included in the attestation
reports. Userspace will use the SNP_SET_EXT_CONFIG command to provide the
certificate blob and the reported TCB version string in a single call. Note
that the specification defines the certificate blob in a specific GUID
format; userspace is responsible for building the proper certificate blob.
The ioctl treats it as an opaque blob.

While it is not defined in the spec, let's also add an SNP_GET_EXT_CONFIG
command that can be used to obtain the data programmed through
SNP_SET_EXT_CONFIG.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 27 ++++++
drivers/crypto/ccp/sev-dev.c | 123 ++++++++++++++++++++++++++
drivers/crypto/ccp/sev-dev.h | 4 +
include/uapi/linux/psp-sev.h | 17 ++++
4 files changed, 171 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 11ea67c944df..fad1e5639dac 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
status includes API major, minor version and more. See the SEV-SNP
specification for further details.

+2.5 SNP_SET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_ext_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
+reported TCB version in the attestation report. The command is similar to
+SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
+command also accepts an additional certificate blob defined in the GHCB
+specification.
+
+If the certs_address is zero, then the previous certificate blob will be deleted.
+For more information on the certificate blob layout, see the GHCB spec
+(extended guest request message).
+
+2.6 SNP_GET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_ext_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
+through the SNP_SET_EXT_CONFIG.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 10b87ec339aa..d59727ac2bdd 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1480,6 +1480,10 @@ static int __sev_snp_shutdown_locked(int *error)
data.length = sizeof(data);
data.iommu_snp_shutdown = 1;

+ /* Free the memory used for caching the certificate data */
+ kfree(sev->snp_certs_data);
+ sev->snp_certs_data = NULL;
+
wbinvd_on_all_cpus();

retry:
@@ -1792,6 +1796,118 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
return ret;
}

+static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ memset(&input, 0, sizeof(input));
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the TCB version programmed through the SET_CONFIG to userspace */
+ if (input.config_address) {
+ if (copy_to_user((void __user *)input.config_address,
+ &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
+ return -EFAULT;
+ }
+
+ /* Copy the extended certs programmed through the SNP_SET_CONFIG */
+ if (input.certs_address && sev->snp_certs_data) {
+ if (input.certs_len < sev->snp_certs_len) {
+ /* Return the certs length to userspace */
+ input.certs_len = sev->snp_certs_len;
+
+ ret = -ENOSR;
+ goto e_done;
+ }
+
+ if (copy_to_user((void __user *)input.certs_address,
+ sev->snp_certs_data, sev->snp_certs_len))
+ return -EFAULT;
+ }
+
+ ret = 0;
+
+e_done:
+ if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
+ ret = -EFAULT;
+
+ return ret;
+}
+
+static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ struct sev_user_data_snp_config config;
+ void *certs = NULL;
+ int ret = 0;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ memset(&input, 0, sizeof(input));
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the certs from userspace */
+ if (input.certs_address) {
+ if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
+ return -EINVAL;
+
+ certs = psp_copy_user_blob(input.certs_address, input.certs_len);
+ if (IS_ERR(certs))
+ return PTR_ERR(certs);
+ }
+
+ /* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
+ if (input.config_address) {
+ memset(&config, 0, sizeof(config));
+ if (copy_from_user(&config,
+ (void __user *)input.config_address, sizeof(config))) {
+ ret = -EFAULT;
+ goto e_free;
+ }
+
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
+ if (ret)
+ goto e_free;
+
+ memcpy(&sev->snp_config, &config, sizeof(config));
+ }
+
+ /*
+ * If new certs are passed then cache them, else just free the old certs.
+ */
+ mutex_lock(&sev->snp_certs_lock);
+ if (certs) {
+ kfree(sev->snp_certs_data);
+ sev->snp_certs_data = certs;
+ sev->snp_certs_len = input.certs_len;
+ } else {
+ kfree(sev->snp_certs_data);
+ sev->snp_certs_data = NULL;
+ sev->snp_certs_len = 0;
+ }
+ mutex_unlock(&sev->snp_certs_lock);
+
+ return 0;
+
+e_free:
+ kfree(certs);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1846,6 +1962,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_PLATFORM_STATUS:
ret = sev_ioctl_snp_platform_status(&input);
break;
+ case SNP_SET_EXT_CONFIG:
+ ret = sev_ioctl_snp_set_config(&input, writable);
+ break;
+ case SNP_GET_EXT_CONFIG:
+ ret = sev_ioctl_snp_get_config(&input);
+ break;
default:
ret = -EINVAL;
goto out;
@@ -1961,6 +2083,7 @@ int sev_dev_init(struct psp_device *psp)
goto e_sev;

sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+ mutex_init(&sev->snp_certs_lock);

psp->sev_data = sev;

diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 19d79f9d4212..41d5353d5bab 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -66,6 +66,10 @@ struct sev_device {

bool snp_initialized;
struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
+ void *snp_certs_data;
+ u32 snp_certs_len;
+ struct mutex snp_certs_lock;
+ struct sev_user_data_snp_config snp_config;
};

int sev_dev_init(struct psp_device *psp);
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index ffd60e8b0a31..60e7a8d1a18e 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -29,6 +29,8 @@ enum {
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
SNP_PLATFORM_STATUS,
+ SNP_SET_EXT_CONFIG,
+ SNP_GET_EXT_CONFIG,

SEV_MAX,
};
@@ -190,6 +192,21 @@ struct sev_user_data_snp_config {
__u8 rsvd[52];
} __packed;

+/**
+ * struct sev_data_snp_ext_config - system wide configuration value for SNP.
+ *
+ * @config_address: address of the struct sev_user_data_snp_config or 0 when
+ * reported_tcb does not need to be updated.
+ * @certs_address: address of extended guest request certificate chain or
+ * 0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
+ * @certs_len: length of the certs
+ */
+struct sev_user_data_ext_snp_config {
+ __u64 config_address; /* In */
+ __u64 certs_address; /* In */
+ __u32 certs_len; /* In */
+};
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1

2022-12-14 20:04:37

by Michael Roth

Subject: [PATCH RFC v7 28/64] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

From: Brijesh Singh <[email protected]>

The behavior and requirements for the SEV-legacy commands are altered when
the SNP firmware is in the INIT state. See the SEV-SNP firmware
specification for more details.

Allocate the Trusted Memory Region (TMR) as a 2MB sized/aligned region
when SNP is enabled to satisfy the new requirements for SNP. Continue
allocating a 1MB region for !SNP configurations.

While at it, provide an API that can be used by others to allocate a page
that can be used by the firmware. The immediate user for this API will be
the KVM driver, which needs to allocate a firmware context page during
guest creation; the context page needs to be updated by the firmware. See
the SEV-SNP specification for further details.

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 148 +++++++++++++++++++++++++++++++++--
include/linux/psp-sev.h | 9 +++
2 files changed, 149 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index eca4e59b0f44..4c12e98a1219 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -94,6 +94,13 @@ static void *sev_init_ex_buffer;
*/
struct sev_data_range_list *snp_range_list;

+/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB in size. */
+#define SEV_SNP_ES_TMR_SIZE (2 * 1024 * 1024)
+
+static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
+
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+
static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
{
struct sev_device *sev = psp_master->sev_data;
@@ -216,11 +223,134 @@ void snp_mark_pages_offline(unsigned long pfn, unsigned int npages)
}
EXPORT_SYMBOL_GPL(snp_mark_pages_offline);

+static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
+{
+ /* The C-bit may be set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int ret, err, i, n = 0;
+
+ if (!pfn_valid(pfn)) {
+ pr_err("%s: Invalid PFN %lx\n", __func__, pfn);
+ return 0;
+ }
+
+ for (i = 0; i < npages; i++, pfn++, n++) {
+ paddr = pfn << PAGE_SHIFT;
+
+ if (locked)
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
+ else
+ ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
+
+ if (ret)
+ goto cleanup;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * If the page cannot be reclaimed, then it is no longer safe to be
+ * released back to the system; leak it instead.
+ */
+ snp_mark_pages_offline(pfn, npages - n);
+ return ret;
+}
+
+static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
+{
+ /* The C-bit may be set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int rc, n = 0, i;
+
+ for (i = 0; i < npages; i++, n++, pfn++) {
+ rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
+ if (rc)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * Try to unwind the firmware state changes by
+ * reclaiming the pages which were already transitioned to the
+ * firmware state.
+ */
+ snp_reclaim_pages(paddr, n, locked);
+
+ return rc;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
+{
+ unsigned long npages = 1ul << order, paddr;
+ struct sev_device *sev;
+ struct page *page;
+
+ if (!psp_master || !psp_master->sev_data)
+ return NULL;
+
+ page = alloc_pages(gfp_mask, order);
+ if (!page)
+ return NULL;
+
+ /* If SEV-SNP is initialized then add the page to the RMP table. */
+ sev = psp_master->sev_data;
+ if (!sev->snp_initialized)
+ return page;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (rmp_mark_pages_firmware(paddr, npages, locked))
+ return NULL;
+
+ return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+ struct page *page;
+
+ page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
+
+ return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ unsigned long paddr, npages = 1ul << order;
+
+ if (!page)
+ return;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (sev->snp_initialized &&
+ snp_reclaim_pages(paddr, npages, locked))
+ return;
+
+ __free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+ if (!addr)
+ return;
+
+ __snp_free_firmware_pages(virt_to_page(addr), 0, false);
+}
+EXPORT_SYMBOL_GPL(snp_free_firmware_page);
+
static void *sev_fw_alloc(unsigned long len)
{
struct page *page;

- page = alloc_pages(GFP_KERNEL, get_order(len));
+ page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
if (!page)
return NULL;

@@ -468,7 +598,7 @@ static int __sev_init_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);

data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}

return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -491,7 +621,7 @@ static int __sev_init_ex_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);

data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}

return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
@@ -982,6 +1112,8 @@ static int __sev_snp_init_locked(int *error)
sev->snp_initialized = true;
dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");

+ sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
+
return rc;
}

@@ -1499,8 +1631,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
/* The TMR area was encrypted, flush it from the cache */
wbinvd_on_all_cpus();

- free_pages((unsigned long)sev_es_tmr,
- get_order(SEV_ES_TMR_SIZE));
+ __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
+ get_order(sev_es_tmr_size),
+ false);
sev_es_tmr = NULL;
}

@@ -1511,8 +1644,7 @@ static void sev_firmware_shutdown(struct sev_device *sev)
}

if (snp_range_list) {
- free_pages((unsigned long)snp_range_list,
- get_order(PAGE_SIZE));
+ snp_free_firmware_page(snp_range_list);
snp_range_list = NULL;
}

@@ -1593,7 +1725,7 @@ void sev_pci_init(void)
}

/* Obtain the TMR memory area for SEV-ES use */
- sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
+ sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
if (!sev_es_tmr)
dev_warn(sev->dev,
"SEV: TMR allocation failed, SEV-ES support unavailable\n");
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 8edf5c548fbf..d19744807471 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -922,6 +922,8 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
int sev_do_cmd(int cmd, void *data, int *psp_ret);

void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);

/**
* sev_mark_pages_offline - insert non-reclaimed firmware/guest pages
@@ -959,6 +961,13 @@ static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_P

void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}

+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+ return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
#endif /* CONFIG_CRYPTO_DEV_SP_PSP */

#endif /* __PSP_SEV_H__ */
--
2.25.1

2022-12-14 20:04:55

by Michael Roth

Subject: [PATCH RFC v7 35/64] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe

From: Brijesh Singh <[email protected]>

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.

When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" in the RMP after a successful VMRUN. This
is done for _all_ VMs, not just SNP-Active VMs.

If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2MB-aligned and software accesses any
part of the associated 2MB region with a hugepage, the CPU will
incorrectly treat the entire 2MB region as in-use and signal a spurious
RMP violation #PF.

The recommended workaround is to not use a hugepage for the VMCB, VMSA or
AVIC backing page. Add a generic allocator that will ensure that the
page returned is not part of a hugepage (2MB or 1GB) and is safe to be
used when SEV-SNP is enabled.

Co-developed-by: Marc Orr <[email protected]>
Signed-off-by: Marc Orr <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/lapic.c | 5 ++++-
arch/x86/kvm/svm/sev.c | 33 ++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 15 ++++++++++++--
arch/x86/kvm/svm/svm.h | 1 +
6 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c71df44b0f02..e0015926cdf4 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9ef8d73455d9..e2529415f28b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1722,6 +1722,8 @@ struct kvm_x86_ops {
* Returns vCPU specific APICv inhibit reasons
*/
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+ void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1bb63746e991..8500d1d54664 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2581,7 +2581,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)

vcpu->arch.apic = apic;

- apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+ if (kvm_x86_ops.alloc_apic_backing_page)
+ apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
+ else
+ apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!apic->regs) {
printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 82ff96b4f04a..0e93b536dc34 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3234,6 +3234,39 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
}

+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+ unsigned long pfn;
+ struct page *p;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+ /*
+ * Allocate an SNP safe page to workaround the SNP erratum where
+ * the CPU will incorrectly signal an RMP violation #PF if a
+ * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
+ * or AVIC backing page. The recommended workaround is to not use a
+ * hugepage.
+ *
+ * Allocate one extra page, use a page which is not 2MB-aligned
+ * and free the other.
+ */
+ p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+ if (!p)
+ return NULL;
+
+ split_page(p, 1);
+
+ pfn = page_to_pfn(p);
+ if (IS_ALIGNED(pfn, PTRS_PER_PMD))
+ __free_page(p++);
+ else
+ __free_page(p + 1);
+
+ return p;
+}
+
int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault)
{
gfn_t gfn = gpa_to_gfn(gpa);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fc7885869f7e..013f811c733c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1360,7 +1360,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm = to_svm(vcpu);

err = -ENOMEM;
- vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmcb01_page = snp_safe_alloc_page(vcpu);
if (!vmcb01_page)
goto out;

@@ -1369,7 +1369,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
* SEV-ES guests require a separate VMSA page used to contain
* the encrypted register state of the guest.
*/
- vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmsa_page = snp_safe_alloc_page(vcpu);
if (!vmsa_page)
goto error_free_vmcb_page;

@@ -4694,6 +4694,16 @@ static int svm_vm_init(struct kvm *kvm)
return 0;
}

+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+ struct page *page = snp_safe_alloc_page(vcpu);
+
+ if (!page)
+ return NULL;
+
+ return page_address(page);
+}
+
static int svm_private_mem_enabled(struct kvm *kvm)
{
if (sev_guest(kvm))
@@ -4830,6 +4840,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+ .alloc_apic_backing_page = svm_alloc_apic_backing_page,

.fault_is_private = sev_fault_is_private,
};
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1f3098dff3d5..ea9844546e8a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -684,6 +684,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);

int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);

--
2.25.1

2022-12-14 20:04:56

by Michael Roth

Subject: [PATCH RFC v7 36/64] KVM: SVM: Add initial SEV-SNP support

From: Brijesh Singh <[email protected]>

The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
hardware-based security protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks such as data
replay, memory re-mapping, and more, in order to create an isolated
execution environment.

The SNP feature is added incrementally; later patches add a new module
parameter that can be used to enable SEV-SNP in KVM.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 10 +++++++++-
arch/x86/kvm/svm/svm.h | 8 ++++++++
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0e93b536dc34..f34da1203e09 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -57,6 +57,9 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
#define sev_es_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
#define AP_RESET_HOLD_NONE 0
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2
@@ -2298,6 +2301,7 @@ void __init sev_hardware_setup(void)
{
#ifdef CONFIG_KVM_AMD_SEV
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+ bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;

@@ -2377,12 +2381,16 @@ void __init sev_hardware_setup(void)
if (misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count))
goto out;

- pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count);
sev_es_supported = true;
+ sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
+
+ pr_info("SEV-ES %ssupported: %u ASIDs\n",
+ sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);

out:
sev_enabled = sev_supported;
sev_es_enabled = sev_es_supported;
+ sev_snp_enabled = sev_snp_supported;
#endif
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ea9844546e8a..a48fe5d2bea5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -83,6 +83,7 @@ enum {
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
+ bool snp_active; /* SEV-SNP enabled guest */
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
@@ -330,6 +331,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif
}

+static inline bool sev_snp_guest(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev_es_guest(kvm) && sev->snp_active;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1

2022-12-14 20:05:13

by Michael Roth

Subject: [PATCH RFC v7 37/64] KVM: SVM: Add KVM_SNP_INIT command

From: Brijesh Singh <[email protected]>

The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be the
first command issued. When creating an SEV-SNP guest, the VMM must use
this command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.

The flags value must be zero; it will be extended in future SNP support to
communicate optional features (such as restricted interrupt injection).

Co-developed-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 27 ++++++++++++
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 44 ++++++++++++++++++-
arch/x86/kvm/svm/svm.h | 4 ++
include/uapi/linux/kvm.h | 13 ++++++
5 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 935aaeb97fe6..2432213bd0ea 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -434,6 +434,33 @@ issued by the hypervisor to make the guest ready for execution.

Returns: 0 on success, -negative on error

+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_snp_init {
+ __u64 flags;
+ };
+
+The flags bitmap is defined as::
+
+ /* enable the restricted injection */
+ #define KVM_SEV_SNP_RESTRICTED_INJECT (1<<0)
+
+ /* enable the restricted injection timer */
+ #define KVM_SEV_SNP_RESTRICTED_TIMER_INJECT (1<<1)
+
+If the specified flags are not supported then -EOPNOTSUPP is returned, and
+the supported flags are written back.
+
References
==========

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index cb1ee53ad3b1..c18d78d5e505 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -278,6 +278,7 @@ enum avic_ipi_failure_cause {
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL

+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f34da1203e09..e3f857cde8c0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -247,6 +247,25 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_decommission(handle);
}

+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_snp_init params;
+ int ret = 0;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+ ret = -EOPNOTSUPP;
+
+ params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
+ ret = -EFAULT;
+
+ return ret;
+}
+
static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -260,13 +279,23 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;

sev->active = true;
- sev->es_active = argp->id == KVM_SEV_ES_INIT;
+ sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
+ sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
asid = sev_asid_new(sev);
if (asid < 0)
goto e_no_asid;
sev->asid = asid;

- ret = sev_platform_init(&argp->error);
+ if (sev->snp_active) {
+ ret = verify_snp_init_flags(kvm, argp);
+ if (ret)
+ goto e_free;
+
+ ret = sev_snp_init(&argp->error, false);
+ } else {
+ ret = sev_platform_init(&argp->error);
+ }
+
if (ret)
goto e_free;

@@ -281,6 +310,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
sev_asid_free(sev);
sev->asid = 0;
e_no_asid:
+ sev->snp_active = false;
sev->es_active = false;
sev->active = false;
return ret;
@@ -741,6 +771,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->xss = svm->vcpu.arch.ia32_xss;
save->dr6 = svm->vcpu.arch.dr6;

+ /* Enable the SEV-SNP feature */
+ if (sev_snp_guest(svm->vcpu.kvm))
+ save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);

@@ -1993,6 +2027,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
}

switch (sev_cmd.id) {
+ case KVM_SEV_SNP_INIT:
+ if (!sev_snp_enabled) {
+ r = -ENOTTY;
+ goto out;
+ }
+ fallthrough;
case KVM_SEV_ES_INIT:
if (!sev_es_enabled) {
r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a48fe5d2bea5..379b253d2464 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -80,6 +80,9 @@ enum {
/* TPR and CR2 are always written before VMRUN */
#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))

+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS 0x0
+
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
@@ -95,6 +98,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+ u64 snp_init_flags;
};

struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cc9424ccf9b2..a6c73297a62d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1938,6 +1938,9 @@ enum sev_cmd_id {
/* Guest Migration Extension */
KVM_SEV_SEND_CANCEL,

+ /* SNP specific commands */
+ KVM_SEV_SNP_INIT,
+
KVM_SEV_NR_MAX,
};

@@ -2034,6 +2037,16 @@ struct kvm_sev_receive_update_data {
__u32 trans_len;
};

+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJECT (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJECT (1 << 1)
+
+struct kvm_snp_init {
+ __u64 flags;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2022-12-14 20:05:36

by Michael Roth

Subject: [PATCH RFC v7 38/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command

From: Brijesh Singh <[email protected]>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.

For more information see the SEV-SNP specification.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 24 ++++
arch/x86/kvm/svm/sev.c | 121 +++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
include/uapi/linux/kvm.h | 10 ++
4 files changed, 153 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 2432213bd0ea..58971fc02a15 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -461,6 +461,30 @@ The flags bitmap is defined as::
If the specified flags is not supported then return -EOPNOTSUPP, and the supported
flags are returned.

+19. KVM_SNP_LAUNCH_START
+------------------------
+
+The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, the user must
+provide a guest policy, migration agent (if any) and guest OS visible
+workarounds value as defined in the SEV-SNP specification.
+
+Parameters (in): struct kvm_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u64 ma_uaddr; /* userspace address of migration agent */
+ __u8 ma_en; /* 1 if the migration agent is enabled */
+ __u8 imi_en; /* set IMI to 1. */
+ __u8 gosvw[16]; /* guest OS visible workarounds */
+ };
+
+See the SEV-SNP specification for further detail on the launch input.
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e3f857cde8c0..6d1d0e424f76 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -21,6 +21,7 @@
#include <asm/pkru.h>
#include <asm/trapnr.h>
#include <asm/fpu/xcr.h>
+#include <asm/sev.h>

#include "mmu.h"
#include "x86.h"
@@ -74,6 +75,8 @@ static unsigned int nr_asids;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;

+static int snp_decommission_context(struct kvm *kvm);
+
struct enc_region {
struct list_head list;
unsigned long npages;
@@ -99,12 +102,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
down_write(&sev_deactivate_lock);

wbinvd_on_all_cpus();
- ret = sev_guest_df_flush(&error);
+
+ if (sev_snp_enabled)
+ ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+ else
+ ret = sev_guest_df_flush(&error);

up_write(&sev_deactivate_lock);

if (ret)
- pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+ pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+ sev_snp_enabled ? "-SNP" : "", ret, error);

return ret;
}
@@ -2003,6 +2011,80 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page, via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct sev_data_snp_addr data = {};
+ void *context;
+ int rc;
+
+ /* Allocate memory for context page */
+ context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!context)
+ return NULL;
+
+ data.gctx_paddr = __psp_pa(context);
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+ if (rc) {
+ snp_free_firmware_page(context);
+ return NULL;
+ }
+
+ return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_activate data = {0};
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.asid = sev_get_asid(kvm);
+ return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_start start = {0};
+ struct kvm_sev_snp_launch_start params;
+ int rc;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ sev->snp_context = snp_context_create(kvm, argp);
+ if (!sev->snp_context)
+ return -ENOTTY;
+
+ start.gctx_paddr = __psp_pa(sev->snp_context);
+ start.policy = params.policy;
+ memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ sev->fd = argp->sev_fd;
+ rc = snp_bind_asid(kvm, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ return 0;
+
+e_free_context:
+ snp_decommission_context(kvm);
+
+ return rc;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2093,6 +2175,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_FINISH:
r = sev_receive_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_START:
+ r = snp_launch_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2284,6 +2369,28 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+static int snp_decommission_context(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_addr data = {};
+ int ret;
+
+ /* If context is not created then do nothing */
+ if (!sev->snp_context)
+ return 0;
+
+ data.gctx_paddr = __sme_pa(sev->snp_context);
+ ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+ if (WARN_ONCE(ret, "failed to release guest context"))
+ return ret;
+
+ /* free the context page now */
+ snp_free_firmware_page(sev->snp_context);
+ sev->snp_context = NULL;
+
+ return 0;
+}
+
void sev_vm_destroy(struct kvm *kvm)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2325,7 +2432,15 @@ void sev_vm_destroy(struct kvm *kvm)
}
}

- sev_unbind_asid(kvm, sev->handle);
+ if (sev_snp_guest(kvm)) {
+ if (snp_decommission_context(kvm)) {
+ WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+ return;
+ }
+ } else {
+ sev_unbind_asid(kvm, sev->handle);
+ }
+
sev_asid_free(sev);
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 379b253d2464..17200c1ad20e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -99,6 +99,7 @@ struct kvm_sev_info {
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
u64 snp_init_flags;
+ void *snp_context; /* SNP guest context page */
};

struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a6c73297a62d..b2311e0abeef 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1940,6 +1940,7 @@ enum sev_cmd_id {

/* SNP specific commands */
KVM_SEV_SNP_INIT,
+ KVM_SEV_SNP_LAUNCH_START,

KVM_SEV_NR_MAX,
};
@@ -2047,6 +2048,15 @@ struct kvm_snp_init {
__u64 flags;
};

+struct kvm_sev_snp_launch_start {
+ __u64 policy;
+ __u64 ma_uaddr;
+ __u8 ma_en;
+ __u8 imi_en;
+ __u8 gosvw[16];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2022-12-14 20:06:15

by Michael Roth

Subject: [PATCH RFC v7 39/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command

From: Brijesh Singh <[email protected]>

The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created with the KVM_SEV_SNP_LAUNCH_START.

In addition to inserting data, it can insert two special pages
into the guest's memory: the secrets page and the CPUID page.

When terminating the guest, reclaim the guest pages added to the RMP
table. If the reclaim fails, the pages are no longer safe to be
released back to the system, so leak them.

For more information see the SEV-SNP specification.

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 29 ++++
arch/x86/kvm/svm/sev.c | 161 ++++++++++++++++++
include/uapi/linux/kvm.h | 19 +++
3 files changed, 209 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 58971fc02a15..c94be8e6d657 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -485,6 +485,35 @@ Returns: 0 on success, -negative on error

See the SEV-SNP specification for further detail on the launch input.

+20. KVM_SNP_LAUNCH_UPDATE
+-------------------------
+
+The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
+calculates a measurement of the memory contents. The measurement is a signature
+of the memory contents that can be sent to the guest owner as an attestation
+that the memory was encrypted correctly by the firmware.
+
+Parameters (in): struct kvm_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_update {
+ __u64 start_gfn; /* Guest page number to start from. */
+ __u64 uaddr; /* userspace address of data to be encrypted */
+ __u32 len; /* length of memory region */
+ __u8 imi_page; /* 1 if memory is part of the IMI */
+ __u8 page_type; /* page type */
+ __u8 vmpl3_perms; /* VMPL3 permission mask */
+ __u8 vmpl2_perms; /* VMPL2 permission mask */
+ __u8 vmpl1_perms; /* VMPL1 permission mask */
+ };
+
+See the SEV-SNP spec for further details on how to build the VMPL permission
+mask and page type.
+
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6d1d0e424f76..379e61a9226a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -238,6 +238,37 @@ static void sev_decommission(unsigned int handle)
sev_guest_decommission(&decommission, NULL);
}

+static int snp_page_reclaim(u64 pfn)
+{
+ struct sev_data_snp_page_reclaim data = {0};
+ int err, rc;
+
+ data.paddr = __sme_set(pfn << PAGE_SHIFT);
+ rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ if (rc) {
+ /*
+ * If the reclaim failed, then page is no longer safe
+ * to use.
+ */
+ snp_mark_pages_offline(pfn,
+ page_level_size(PG_LEVEL_4K) >> PAGE_SHIFT);
+ }
+
+ return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+ int rc;
+
+ rc = rmp_make_shared(pfn, level);
+ if (rc && leak)
+ snp_mark_pages_offline(pfn,
+ page_level_size(level) >> PAGE_SHIFT);
+
+ return rc;
+}
+
static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
{
struct sev_data_deactivate deactivate;
@@ -2085,6 +2116,133 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return rc;
}

+static int snp_launch_update_gfn_handler(struct kvm *kvm,
+ struct kvm_gfn_range *range,
+ void *opaque)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_memory_slot *memslot = range->slot;
+ struct sev_data_snp_launch_update data = {0};
+ struct kvm_sev_snp_launch_update params;
+ struct kvm_sev_cmd *argp = opaque;
+ int *error = &argp->error;
+ int i, n = 0, ret = 0;
+ unsigned long npages;
+ kvm_pfn_t *pfns;
+ gfn_t gfn;
+
+ if (!kvm_slot_can_be_private(memslot)) {
+ pr_err("SEV-SNP requires restricted memory.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
+ pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
+ return -EFAULT;
+ }
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+
+ npages = range->end - range->start;
+ pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
+ if (!pfns)
+ return -ENOMEM;
+
+ pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
+ range->start, range->end, params.page_type);
+
+ for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
+ int order, level;
+ void *kvaddr;
+
+ ret = kvm_restricted_mem_get_pfn(memslot, gfn, &pfns[i], &order);
+ if (ret)
+ goto e_release;
+
+ n++;
+ ret = snp_lookup_rmpentry((u64)pfns[i], &level);
+ if (ret) {
+ pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d\n",
+ gfn, ret);
+ ret = -EFAULT;
+ goto e_release;
+ }
+
+ kvaddr = pfn_to_kaddr(pfns[i]);
+ if (!virt_addr_valid(kvaddr)) {
+ pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
+ ret = -EINVAL;
+ goto e_release;
+ }
+
+ ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+ if (ret) {
+ pr_err("Guest read failed, ret: 0x%x\n", ret);
+ goto e_release;
+ }
+
+ ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
+ sev_get_asid(kvm), true);
+ if (ret) {
+ ret = -EFAULT;
+ goto e_release;
+ }
+
+ data.address = __sme_set(pfns[i] << PAGE_SHIFT);
+ data.page_size = X86_TO_RMP_PG_LEVEL(PG_LEVEL_4K);
+ data.page_type = params.page_type;
+ data.vmpl3_perms = params.vmpl3_perms;
+ data.vmpl2_perms = params.vmpl2_perms;
+ data.vmpl1_perms = params.vmpl1_perms;
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, error);
+ if (ret) {
+ pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
+ ret, *error);
+ snp_page_reclaim(pfns[i]);
+ goto e_release;
+ }
+ }
+
+ kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+
+e_release:
+ /* Content of memory is updated, mark pages dirty */
+ for (i = 0; i < n; i++) {
+ set_page_dirty(pfn_to_page(pfns[i]));
+ mark_page_accessed(pfn_to_page(pfns[i]));
+
+ /*
+ * If there was an error, update the RMP entry to transfer page
+ * ownership back to the hypervisor.
+ */
+ if (ret)
+ host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+ put_page(pfn_to_page(pfns[i]));
+ }
+
+ kvfree(pfns);
+ return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_update params;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
+ snp_launch_update_gfn_handler, argp);
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2178,6 +2336,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_START:
r = snp_launch_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_UPDATE:
+ r = snp_launch_update(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b2311e0abeef..9b6c95cc62a8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1941,6 +1941,7 @@ enum sev_cmd_id {
/* SNP specific commands */
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
+ KVM_SEV_SNP_LAUNCH_UPDATE,

KVM_SEV_NR_MAX,
};
@@ -2057,6 +2058,24 @@ struct kvm_sev_snp_launch_start {
__u8 pad[6];
};

+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
+
+struct kvm_sev_snp_launch_update {
+ __u64 start_gfn;
+ __u64 uaddr;
+ __u32 len;
+ __u8 imi_page;
+ __u8 page_type;
+ __u8 vmpl3_perms;
+ __u8 vmpl2_perms;
+ __u8 vmpl1_perms;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2022-12-14 20:06:19

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 40/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

From: Brijesh Singh <[email protected]>

The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest
and stores it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.

For an SNP guest, the VMSA was added to the RMP table as a guest-owned
page and also removed from the kernel direct map, so flush it only
after it has been transitioned back to hypervisor state and restored
in the direct map.
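
For illustration only (vm_fd/sev_fd and the blobs are assumed to be
set up already), the userspace side is another KVM_MEMORY_ENCRYPT_OP
call, with the optional ID block and auth blobs sized to
KVM_SEV_SNP_ID_BLOCK_SIZE and KVM_SEV_SNP_ID_AUTH_SIZE bytes:

  struct kvm_sev_snp_launch_finish finish = {
          .id_block_uaddr = (__u64)(unsigned long)id_block, /* 96 bytes */
          .id_auth_uaddr = (__u64)(unsigned long)id_auth, /* 4096 bytes */
          .id_block_en = 1,
  };
  struct kvm_sev_cmd cmd = {
          .id = KVM_SEV_SNP_LAUNCH_FINISH,
          .data = (__u64)(unsigned long)&finish,
          .sev_fd = sev_fd,
  };

  ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);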

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Harald Hoyer <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 22 ++++
arch/x86/kvm/svm/sev.c | 119 ++++++++++++++++++
include/uapi/linux/kvm.h | 14 +++
3 files changed, 155 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index c94be8e6d657..e4b42aaab1de 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -513,6 +513,28 @@ Returns: 0 on success, -negative on error
See the SEV-SNP spec for further details on how to build the VMPL permission
mask and page type.

+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
+issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[32];
+ };
+
+
+See the SEV-SNP specification for further details on the launch finish input parameters.

References
==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 379e61a9226a..6f901545bed9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2243,6 +2243,106 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
snp_launch_update_gfn_handler, argp);
}

+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_update data = {};
+ int i, ret;
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.page_type = SNP_PAGE_TYPE_VMSA;
+
+ for (i = 0; i < kvm->created_vcpus; i++) {
+ struct vcpu_svm *svm = to_svm(xa_load(&kvm->vcpu_array, i));
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ /* Perform some pre-encryption checks against the VMSA */
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ return ret;
+
+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);
+ if (ret)
+ return ret;
+
+ /* Issue the SNP command to encrypt the VMSA */
+ data.address = __sme_pa(svm->sev_es.vmsa);
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, &argp->error);
+ if (ret) {
+ snp_page_reclaim(pfn);
+ return ret;
+ }
+
+ svm->vcpu.arch.guest_state_protected = true;
+ }
+
+ return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_finish params;
+ struct sev_data_snp_launch_finish *data;
+ void *id_block = NULL, *id_auth = NULL;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ /* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+ ret = snp_launch_update_vmsa(kvm, argp);
+ if (ret)
+ return ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+ if (!data)
+ return -ENOMEM;
+
+ if (params.id_block_en) {
+ id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+ if (IS_ERR(id_block)) {
+ ret = PTR_ERR(id_block);
+ goto e_free;
+ }
+
+ data->id_block_en = 1;
+ data->id_block_paddr = __sme_pa(id_block);
+
+ id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+ if (IS_ERR(id_auth)) {
+ ret = PTR_ERR(id_auth);
+ goto e_free_id_block;
+ }
+
+ data->id_auth_paddr = __sme_pa(id_auth);
+
+ if (params.auth_key_en)
+ data->auth_key_en = 1;
+ }
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+ kfree(id_auth);
+
+e_free_id_block:
+ kfree(id_block);
+
+e_free:
+ kfree(data);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2339,6 +2439,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_UPDATE:
r = snp_launch_update(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_FINISH:
+ r = snp_launch_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2794,11 +2897,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)

svm = to_svm(vcpu);

+ /*
+ * If it's an SNP guest, then the VMSA was added to the RMP table as
+ * a guest-owned page. Transition the page to hypervisor state
+ * before releasing it back to the system.
+ * Also the page is removed from the kernel direct map, so flush it
+ * later after it is transitioned back to hypervisor state and
+ * restored in the direct map.
+ */
+ if (sev_snp_guest(vcpu->kvm)) {
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
+ goto skip_vmsa_free;
+ }
+
if (vcpu->arch.guest_state_protected)
sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);

__free_page(virt_to_page(svm->sev_es.vmsa));

+skip_vmsa_free:
if (svm->sev_es.ghcb_sa_free)
kvfree(svm->sev_es.ghcb_sa);
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9b6c95cc62a8..c468adc1f147 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1942,6 +1942,7 @@ enum sev_cmd_id {
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
+ KVM_SEV_SNP_LAUNCH_FINISH,

KVM_SEV_NR_MAX,
};
@@ -2076,6 +2077,19 @@ struct kvm_sev_snp_launch_update {
__u8 vmpl1_perms;
};

+#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
+#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
+
+struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2022-12-14 20:06:35

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 03/64] KVM: SVM: Advertise private memory support to KVM

From: Nikunj A Dadhania <[email protected]>

KVM should use private memory for guests that have the upm_mode flag set.

Add a kvm_x86_ops hook for determining UPM support that accounts for
this situation by only enabling UPM test mode in the case of non-SEV
guests.

Signed-off-by: Nikunj A Dadhania <[email protected]>
[mdr: add x86 hook for determining restricted/private memory support]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/svm.c | 10 ++++++++++
arch/x86/kvm/x86.c | 8 ++++++++
4 files changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index abccd51dcfca..f530a550c092 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2b6244525107..9317abffbf68 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1635,6 +1635,7 @@ struct kvm_x86_ops {

void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
int root_level);
+ int (*private_mem_enabled)(struct kvm *kvm);

bool (*has_wbinvd_exit)(void);

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 91352d692845..7f3e4d91c0c6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4694,6 +4694,14 @@ static int svm_vm_init(struct kvm *kvm)
return 0;
}

+static int svm_private_mem_enabled(struct kvm *kvm)
+{
+ if (sev_guest(kvm))
+ return kvm->arch.upm_mode ? 1 : 0;
+
+ return IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING) ? 1 : 0;
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = "kvm_amd",

@@ -4774,6 +4782,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.vcpu_after_set_cpuid = svm_vcpu_after_set_cpuid,

+ .private_mem_enabled = svm_private_mem_enabled,
+
.has_wbinvd_exit = svm_has_wbinvd_exit,

.get_l2_tsc_offset = svm_get_l2_tsc_offset,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 99ecf99bc4d2..bb6adb216054 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12266,6 +12266,14 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
}
EXPORT_SYMBOL_GPL(__x86_set_memory_region);

+bool kvm_arch_has_private_mem(struct kvm *kvm)
+{
+ if (static_call(kvm_x86_private_mem_enabled)(kvm))
+ return true;
+
+ return false;
+}
+
void kvm_arch_pre_destroy_vm(struct kvm *kvm)
{
kvm_mmu_pre_destroy_vm(kvm);
--
2.25.1

2022-12-14 20:07:16

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 41/64] KVM: X86: Keep the NPT and RMP page level in sync

From: Brijesh Singh <[email protected]>

When running an SEV-SNP VM, the SPA used to index the RMP entry is
obtained through the NPT translation (gva->gpa->spa). The NPT page
level is checked against the page level programmed in the RMP entry.
If the page levels do not match, the access causes a nested page
fault with the RMP bit set to indicate the RMP violation.

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/mmu/mmu.c | 12 +++++-
arch/x86/kvm/svm/sev.c | 66 ++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/svm/svm.h | 1 +
6 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index e0015926cdf4..61e31b622fce 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -136,6 +136,7 @@ KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
KVM_X86_OP_OPTIONAL(invalidate_restricted_mem)
+KVM_X86_OP_OPTIONAL(rmp_page_level_adjust)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e2529415f28b..b126c6ac7ce4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1724,6 +1724,8 @@ struct kvm_x86_ops {
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);

void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+
+ void (*rmp_page_level_adjust)(struct kvm *kvm, gfn_t gfn, int *level);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2713632e5061..25db83021500 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3053,6 +3053,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,

out:
local_irq_restore(flags);
+
+ /* Adjust the page level based on the SEV-SNP RMP page level. */
+ if (kvm_x86_ops.rmp_page_level_adjust)
+ static_call(kvm_x86_rmp_page_level_adjust)(kvm, gfn, &level);
+
return level;
}

@@ -3070,8 +3075,13 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
break;
}

- if (is_private)
+ pr_debug("%s: gfn: %llx max_level: %d max_huge_page_level: %d\n",
+ __func__, gfn, max_level, max_huge_page_level);
+ if (kvm_slot_can_be_private(slot) && is_private) {
+ if (kvm_x86_ops.rmp_page_level_adjust)
+ static_call(kvm_x86_rmp_page_level_adjust)(kvm, gfn, &max_level);
return max_level;
+ }

if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6f901545bed9..443c5c8aaaf3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3710,6 +3710,72 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
return p;
}

+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+ int level;
+
+ while (end > start) {
+ if (snp_lookup_rmpentry(start, &level) != 0)
+ return false;
+ start++;
+ }
+
+ return true;
+}
+
+void sev_rmp_page_level_adjust(struct kvm *kvm, gfn_t gfn, int *level)
+{
+ struct kvm_memory_slot *slot;
+ int ret, order, assigned;
+ int rmp_level = 1;
+ kvm_pfn_t pfn;
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot))
+ return;
+
+ ret = kvm_restricted_mem_get_pfn(slot, gfn, &pfn, &order);
+ if (ret) {
+ pr_warn_ratelimited("Failed to adjust RMP page level, unable to obtain private PFN, rc: %d\n",
+ ret);
+ *level = PG_LEVEL_4K;
+ return;
+ }
+
+ /* If there's an error retrieving RMP entry, stick with 4K mappings */
+ assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+ if (unlikely(assigned < 0))
+ goto out_adjust;
+
+ if (!assigned) {
+ kvm_pfn_t huge_pfn;
+
+ /*
+ * If all the pages are shared then no need to keep the RMP
+ * and NPT in sync.
+ */
+ huge_pfn = pfn & ~(PTRS_PER_PMD - 1);
+ if (is_pfn_range_shared(huge_pfn, huge_pfn + PTRS_PER_PMD))
+ goto out;
+ }
+
+ /*
+ * The hardware installs 2MB TLB entries to access 1GB pages,
+ * therefore allow NPT to use 1GB pages when the pfn was added as 2MB
+ * in the RMP table.
+ */
+ if (rmp_level == PG_LEVEL_2M && (*level == PG_LEVEL_1G))
+ goto out;
+
+out_adjust:
+ /* Adjust the level to keep the NPT and RMP in sync */
+ *level = min_t(size_t, *level, rmp_level);
+out:
+ put_page(pfn_to_page(pfn));
+ pr_debug("%s: GFN: 0x%llx, level: %d, rmp_level: %d, ret: %d\n",
+ __func__, gfn, *level, rmp_level, ret);
+}
+
int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault)
{
gfn_t gfn = gpa_to_gfn(gpa);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 013f811c733c..2dfa150bcb09 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4843,6 +4843,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,

.fault_is_private = sev_fault_is_private,
+
+ .rmp_page_level_adjust = sev_rmp_page_level_adjust,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 17200c1ad20e..ae733188cf87 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -698,6 +698,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void sev_rmp_page_level_adjust(struct kvm *kvm, gfn_t gfn, int *level);

int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);

--
2.25.1

2022-12-14 20:07:50

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 42/64] KVM: x86: Define RMP page fault error bits for #NPF

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled globally, the hardware places restrictions on all
memory accesses based on the RMP entry, whether it is the hypervisor or a
VM that performs the access. When the hardware encounters an RMP access
violation during a guest access, it causes a #VMEXIT(NPF).

See APM2 section 16.36.10 for more details.
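
As an illustrative-only sketch of how a fault handler might consume
these bits (handle_rmp_fault() here is a hypothetical helper, not
something added by this series):

  static int handle_npf(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
  {
          if (error_code & PFERR_GUEST_RMP_MASK) {
                  /* SIZEM flags a page-size mismatch RMP violation */
                  bool size_mismatch = error_code & PFERR_GUEST_SIZEM_MASK;

                  return handle_rmp_fault(vcpu, gpa, size_mismatch);
          }

          return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
  }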

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b126c6ac7ce4..f4bb0821757e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -257,9 +257,13 @@ enum x86_intercept_stage;
#define PFERR_FETCH_BIT 4
#define PFERR_PK_BIT 5
#define PFERR_SGX_BIT 15
+#define PFERR_GUEST_RMP_BIT 31
#define PFERR_GUEST_FINAL_BIT 32
#define PFERR_GUEST_PAGE_BIT 33
#define PFERR_IMPLICIT_ACCESS_BIT 48
+#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36

#define PFERR_PRESENT_MASK BIT(PFERR_PRESENT_BIT)
#define PFERR_WRITE_MASK BIT(PFERR_WRITE_BIT)
@@ -271,6 +275,10 @@ enum x86_intercept_stage;
#define PFERR_GUEST_FINAL_MASK BIT_ULL(PFERR_GUEST_FINAL_BIT)
#define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+#define PFERR_GUEST_RMP_MASK BIT_ULL(PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_ENC_MASK BIT_ULL(PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_SIZEM_MASK BIT_ULL(PFERR_GUEST_SIZEM_BIT)
+#define PFERR_GUEST_VMPL_MASK BIT_ULL(PFERR_GUEST_VMPL_BIT)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
PFERR_WRITE_MASK | \
--
2.25.1

2022-12-14 20:07:51

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 43/64] KVM: SVM: Do not use long-lived GHCB map while setting scratch area

From: Brijesh Singh <[email protected]>

The setup_vmgexit_scratch() function may rely on a long-lived GHCB
mapping if the GHCB shared buffer area was used for the scratch area.
In preparation for eliminating the long-lived GHCB mapping, always
allocate a buffer for the scratch area so it can be accessed without
the GHCB mapping.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 74 +++++++++++++++++++-----------------------
arch/x86/kvm/svm/svm.h | 3 +-
2 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 443c5c8aaaf3..d5c6e48055fb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2918,8 +2918,7 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
__free_page(virt_to_page(svm->sev_es.vmsa));

skip_vmsa_free:
- if (svm->sev_es.ghcb_sa_free)
- kvfree(svm->sev_es.ghcb_sa);
+ kvfree(svm->sev_es.ghcb_sa);
}

static void dump_ghcb(struct vcpu_svm *svm)
@@ -3007,6 +3006,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);

+ /* Copy the GHCB scratch area GPA */
+ svm->sev_es.ghcb_sa_gpa = ghcb_get_sw_scratch(ghcb);
+
/* Clear the valid entries fields */
memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
}
@@ -3152,23 +3154,12 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
if (!svm->sev_es.ghcb)
return;

- if (svm->sev_es.ghcb_sa_free) {
- /*
- * The scratch area lives outside the GHCB, so there is a
- * buffer that, depending on the operation performed, may
- * need to be synced, then freed.
- */
- if (svm->sev_es.ghcb_sa_sync) {
- kvm_write_guest(svm->vcpu.kvm,
- ghcb_get_sw_scratch(svm->sev_es.ghcb),
- svm->sev_es.ghcb_sa,
- svm->sev_es.ghcb_sa_len);
- svm->sev_es.ghcb_sa_sync = false;
- }
-
- kvfree(svm->sev_es.ghcb_sa);
- svm->sev_es.ghcb_sa = NULL;
- svm->sev_es.ghcb_sa_free = false;
+ /* Sync the scratch buffer area. */
+ if (svm->sev_es.ghcb_sa_sync) {
+ kvm_write_guest(svm->vcpu.kvm,
+ ghcb_get_sw_scratch(svm->sev_es.ghcb),
+ svm->sev_es.ghcb_sa, svm->sev_es.ghcb_sa_len);
+ svm->sev_es.ghcb_sa_sync = false;
}

trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->sev_es.ghcb);
@@ -3209,9 +3200,8 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
struct ghcb *ghcb = svm->sev_es.ghcb;
u64 ghcb_scratch_beg, ghcb_scratch_end;
u64 scratch_gpa_beg, scratch_gpa_end;
- void *scratch_va;

- scratch_gpa_beg = ghcb_get_sw_scratch(ghcb);
+ scratch_gpa_beg = svm->sev_es.ghcb_sa_gpa;
if (!scratch_gpa_beg) {
pr_err("vmgexit: scratch gpa not provided\n");
goto e_scratch;
@@ -3241,9 +3231,6 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
scratch_gpa_beg, scratch_gpa_end);
goto e_scratch;
}
-
- scratch_va = (void *)svm->sev_es.ghcb;
- scratch_va += (scratch_gpa_beg - control->ghcb_gpa);
} else {
/*
* The guest memory must be read into a kernel buffer, so
@@ -3254,29 +3241,36 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
len, GHCB_SCRATCH_AREA_LIMIT);
goto e_scratch;
}
- scratch_va = kvzalloc(len, GFP_KERNEL_ACCOUNT);
- if (!scratch_va)
- return -ENOMEM;
+ }

- if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, len)) {
- /* Unable to copy scratch area from guest */
- pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
+ if (svm->sev_es.ghcb_sa_alloc_len < len) {
+ void *scratch_va = kvzalloc(len, GFP_KERNEL_ACCOUNT);

- kvfree(scratch_va);
- return -EFAULT;
- }
+ if (!scratch_va)
+ return -ENOMEM;

/*
- * The scratch area is outside the GHCB. The operation will
- * dictate whether the buffer needs to be synced before running
- * the vCPU next time (i.e. a read was requested so the data
- * must be written back to the guest memory).
+ * Free the old scratch area and switch to using newly
+ * allocated.
*/
- svm->sev_es.ghcb_sa_sync = sync;
- svm->sev_es.ghcb_sa_free = true;
+ kvfree(svm->sev_es.ghcb_sa);
+
+ svm->sev_es.ghcb_sa_alloc_len = len;
+ svm->sev_es.ghcb_sa = scratch_va;
}

- svm->sev_es.ghcb_sa = scratch_va;
+ if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, svm->sev_es.ghcb_sa, len)) {
+ /* Unable to copy scratch area from guest */
+ pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
+ return -EFAULT;
+ }
+
+ /*
+ * The operation will dictate whether the buffer needs to be synced
+ * before running the vCPU next time (i.e. a read was requested so
+ * the data must be written back to the guest memory).
+ */
+ svm->sev_es.ghcb_sa_sync = sync;
svm->sev_es.ghcb_sa_len = len;

return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ae733188cf87..f53a41e13033 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -209,8 +209,9 @@ struct vcpu_sev_es_state {
/* SEV-ES scratch area support */
void *ghcb_sa;
u32 ghcb_sa_len;
+ u64 ghcb_sa_gpa;
+ u32 ghcb_sa_alloc_len;
bool ghcb_sa_sync;
- bool ghcb_sa_free;
};

struct vcpu_svm {
--
2.25.1

2022-12-14 20:09:30

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 44/64] KVM: SVM: Remove the long-lived GHCB host map

From: Brijesh Singh <[email protected]>

On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB GPA,
and unmaps it just before VM-entry. This long-lived GHCB map is used by
the VMGEXIT handler through accessors such as ghcb_{set,get}_xxx().

A long-lived GHCB map can cause issues when SEV-SNP is enabled, since
the mapped GPA then needs to be protected against page state changes.

To eliminate the long-lived GHCB mapping, update the GHCB sync operations
to explicitly map the GHCB before access and unmap it after access is
complete. This requires that the setting of the GHCB's sw_exit_info_{1,2}
fields be done during sev_es_sync_to_ghcb(), so create two new fields in
the vcpu_svm struct to hold these values when required to be set outside
of the GHCB mapping.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: defer per_cpu() assignment and order it with barrier() to fix case
where kvm_vcpu_map() causes reschedule on different CPU]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 131 ++++++++++++++++++++++++++---------------
arch/x86/kvm/svm/svm.c | 18 +++---
arch/x86/kvm/svm/svm.h | 24 +++++++-
3 files changed, 116 insertions(+), 57 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d5c6e48055fb..6ac0cb6e3484 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2921,15 +2921,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
kvfree(svm->sev_es.ghcb_sa);
}

+static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
+{
+ struct vmcb_control_area *control = &svm->vmcb->control;
+ u64 gfn = gpa_to_gfn(control->ghcb_gpa);
+
+ if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
+ /* Unable to map GHCB from guest */
+ pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
+{
+ kvm_vcpu_unmap(&svm->vcpu, map, true);
+}
+
static void dump_ghcb(struct vcpu_svm *svm)
{
- struct ghcb *ghcb = svm->sev_es.ghcb;
+ struct kvm_host_map map;
unsigned int nbits;
+ struct ghcb *ghcb;
+
+ if (svm_map_ghcb(svm, &map))
+ return;
+
+ ghcb = map.hva;

/* Re-use the dump_invalid_vmcb module parameter */
if (!dump_invalid_vmcb) {
pr_warn_ratelimited("set kvm_amd.dump_invalid_vmcb=1 to dump internal KVM state.\n");
- return;
+ goto e_unmap;
}

nbits = sizeof(ghcb->save.valid_bitmap) * 8;
@@ -2944,12 +2969,21 @@ static void dump_ghcb(struct vcpu_svm *svm)
pr_err("%-20s%016llx is_valid: %u\n", "sw_scratch",
ghcb->save.sw_scratch, ghcb_sw_scratch_is_valid(ghcb));
pr_err("%-20s%*pb\n", "valid_bitmap", nbits, ghcb->save.valid_bitmap);
+
+e_unmap:
+ svm_unmap_ghcb(svm, &map);
}

-static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
+static bool sev_es_sync_to_ghcb(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
- struct ghcb *ghcb = svm->sev_es.ghcb;
+ struct kvm_host_map map;
+ struct ghcb *ghcb;
+
+ if (svm_map_ghcb(svm, &map))
+ return false;
+
+ ghcb = map.hva;

/*
* The GHCB protocol so far allows for the following data
@@ -2963,13 +2997,24 @@ static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
+
+ /*
+ * Copy the return values from the exit_info_{1,2}.
+ */
+ ghcb_set_sw_exit_info_1(ghcb, svm->sev_es.ghcb_sw_exit_info_1);
+ ghcb_set_sw_exit_info_2(ghcb, svm->sev_es.ghcb_sw_exit_info_2);
+
+ trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, ghcb);
+
+ svm_unmap_ghcb(svm, &map);
+
+ return true;
}

-static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
+static void sev_es_sync_from_ghcb(struct vcpu_svm *svm, struct ghcb *ghcb)
{
struct vmcb_control_area *control = &svm->vmcb->control;
struct kvm_vcpu *vcpu = &svm->vcpu;
- struct ghcb *ghcb = svm->sev_es.ghcb;
u64 exit_code;

/*
@@ -3013,20 +3058,25 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
}

-static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
+static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
{
- struct kvm_vcpu *vcpu;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_host_map map;
struct ghcb *ghcb;
- u64 exit_code;
u64 reason;

- ghcb = svm->sev_es.ghcb;
+ if (svm_map_ghcb(svm, &map))
+ return -EFAULT;
+
+ ghcb = map.hva;
+
+ trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);

/*
* Retrieve the exit code now even though it may not be marked valid
* as it could help with debugging.
*/
- exit_code = ghcb_get_sw_exit_code(ghcb);
+ *exit_code = ghcb_get_sw_exit_code(ghcb);

/* Only GHCB Usage code 0 is supported */
if (ghcb->ghcb_usage) {
@@ -3119,6 +3169,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
goto vmgexit_err;
}

+ sev_es_sync_from_ghcb(svm, ghcb);
+
+ svm_unmap_ghcb(svm, &map);
return 0;

vmgexit_err:
@@ -3129,10 +3182,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
ghcb->ghcb_usage);
} else if (reason == GHCB_ERR_INVALID_EVENT) {
vcpu_unimpl(vcpu, "vmgexit: exit code %#llx is not valid\n",
- exit_code);
+ *exit_code);
} else {
vcpu_unimpl(vcpu, "vmgexit: exit code %#llx input is not valid\n",
- exit_code);
+ *exit_code);
dump_ghcb(svm);
}

@@ -3142,6 +3195,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
ghcb_set_sw_exit_info_1(ghcb, 2);
ghcb_set_sw_exit_info_2(ghcb, reason);

+ svm_unmap_ghcb(svm, &map);
+
/* Resume the guest to "return" the error code. */
return 1;
}
@@ -3151,23 +3206,20 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
/* Clear any indication that the vCPU is in a type of AP Reset Hold */
svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;

- if (!svm->sev_es.ghcb)
+ if (!svm->sev_es.ghcb_in_use)
return;

/* Sync the scratch buffer area. */
if (svm->sev_es.ghcb_sa_sync) {
kvm_write_guest(svm->vcpu.kvm,
- ghcb_get_sw_scratch(svm->sev_es.ghcb),
+ svm->sev_es.ghcb_sa_gpa,
svm->sev_es.ghcb_sa, svm->sev_es.ghcb_sa_len);
svm->sev_es.ghcb_sa_sync = false;
}

- trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->sev_es.ghcb);
-
sev_es_sync_to_ghcb(svm);

- kvm_vcpu_unmap(&svm->vcpu, &svm->sev_es.ghcb_map, true);
- svm->sev_es.ghcb = NULL;
+ svm->sev_es.ghcb_in_use = false;
}

void pre_sev_run(struct vcpu_svm *svm, int cpu)
@@ -3197,7 +3249,6 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
{
struct vmcb_control_area *control = &svm->vmcb->control;
- struct ghcb *ghcb = svm->sev_es.ghcb;
u64 ghcb_scratch_beg, ghcb_scratch_end;
u64 scratch_gpa_beg, scratch_gpa_end;

@@ -3276,8 +3327,8 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
return 0;

e_scratch:
- ghcb_set_sw_exit_info_1(ghcb, 2);
- ghcb_set_sw_exit_info_2(ghcb, GHCB_ERR_INVALID_SCRATCH_AREA);
+ svm_set_ghcb_sw_exit_info_1(&svm->vcpu, 2);
+ svm_set_ghcb_sw_exit_info_2(&svm->vcpu, GHCB_ERR_INVALID_SCRATCH_AREA);

return 1;
}
@@ -3413,7 +3464,6 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb_control_area *control = &svm->vmcb->control;
u64 ghcb_gpa, exit_code;
- struct ghcb *ghcb;
int ret;

/* Validate the GHCB */
@@ -3428,29 +3478,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
return 1;
}

- if (kvm_vcpu_map(vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->sev_es.ghcb_map)) {
- /* Unable to map GHCB from guest */
- vcpu_unimpl(vcpu, "vmgexit: error mapping GHCB [%#llx] from guest\n",
- ghcb_gpa);
-
- /* Without a GHCB, just return right back to the guest */
- return 1;
- }
-
- svm->sev_es.ghcb = svm->sev_es.ghcb_map.hva;
- ghcb = svm->sev_es.ghcb_map.hva;
-
- trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);
-
- exit_code = ghcb_get_sw_exit_code(ghcb);
-
- ret = sev_es_validate_vmgexit(svm);
+ ret = sev_es_validate_vmgexit(svm, &exit_code);
if (ret)
return ret;

- sev_es_sync_from_ghcb(svm);
- ghcb_set_sw_exit_info_1(ghcb, 0);
- ghcb_set_sw_exit_info_2(ghcb, 0);
+ svm->sev_es.ghcb_in_use = true;
+
+ svm_set_ghcb_sw_exit_info_1(vcpu, 0);
+ svm_set_ghcb_sw_exit_info_2(vcpu, 0);

switch (exit_code) {
case SVM_VMGEXIT_MMIO_READ:
@@ -3490,20 +3525,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
break;
case 1:
/* Get AP jump table address */
- ghcb_set_sw_exit_info_2(ghcb, sev->ap_jump_table);
+ svm_set_ghcb_sw_exit_info_2(vcpu, sev->ap_jump_table);
break;
default:
pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
control->exit_info_1);
- ghcb_set_sw_exit_info_1(ghcb, 2);
- ghcb_set_sw_exit_info_2(ghcb, GHCB_ERR_INVALID_INPUT);
+ svm_set_ghcb_sw_exit_info_1(vcpu, 2);
+ svm_set_ghcb_sw_exit_info_2(vcpu, GHCB_ERR_INVALID_INPUT);
}

ret = 1;
break;
}
case SVM_VMGEXIT_HV_FEATURES: {
- ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
+ svm_set_ghcb_sw_exit_info_2(vcpu, GHCB_HV_FT_SUPPORTED);

ret = 1;
break;
@@ -3651,7 +3686,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
* Return from an AP Reset Hold VMGEXIT, where the guest will
* set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
*/
- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ svm_set_ghcb_sw_exit_info_2(vcpu, 1);
break;
case AP_RESET_HOLD_MSR_PROTO:
/*
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2dfa150bcb09..1826946a2f43 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1445,7 +1445,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
+ struct svm_cpu_data *sd;

if (sev_es_guest(vcpu->kvm))
sev_es_unmap_ghcb(svm);
@@ -1453,6 +1453,10 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
if (svm->guest_state_loaded)
return;

+ /* sev_es_unmap_ghcb() can resched, so grab per-cpu pointer afterward. */
+ barrier();
+ sd = per_cpu_ptr(&svm_data, vcpu->cpu);
+
/*
* Save additional host state that will be restored on VMEXIT (sev-es)
* or subsequent vmload of host save area.
@@ -2818,14 +2822,14 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->sev_es.ghcb))
+ if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->sev_es.ghcb_in_use))
return kvm_complete_insn_gp(vcpu, err);

- ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 1);
- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
- X86_TRAP_GP |
- SVM_EVTINJ_TYPE_EXEPT |
- SVM_EVTINJ_VALID);
+ svm_set_ghcb_sw_exit_info_1(vcpu, 1);
+ svm_set_ghcb_sw_exit_info_2(vcpu,
+ X86_TRAP_GP |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
return 1;
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f53a41e13033..c462dfac0a0d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -201,8 +201,7 @@ struct svm_nested_state {
struct vcpu_sev_es_state {
/* SEV-ES support */
struct sev_es_save_area *vmsa;
- struct ghcb *ghcb;
- struct kvm_host_map ghcb_map;
+ bool ghcb_in_use;
bool received_first_sipi;
unsigned int ap_reset_hold_type;

@@ -212,6 +211,13 @@ struct vcpu_sev_es_state {
u64 ghcb_sa_gpa;
u32 ghcb_sa_alloc_len;
bool ghcb_sa_sync;
+
+ /*
+ * SEV-ES support to hold the sw_exit_info return values to be
+ * sync'ed to the GHCB when mapped.
+ */
+ u64 ghcb_sw_exit_info_1;
+ u64 ghcb_sw_exit_info_2;
};

struct vcpu_svm {
@@ -640,6 +646,20 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);

+static inline void svm_set_ghcb_sw_exit_info_1(struct kvm_vcpu *vcpu, u64 val)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ svm->sev_es.ghcb_sw_exit_info_1 = val;
+}
+
+static inline void svm_set_ghcb_sw_exit_info_2(struct kvm_vcpu *vcpu, u64 val)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ svm->sev_es.ghcb_sw_exit_info_2 = val;
+}
+
extern struct kvm_x86_nested_ops svm_nested_ops;

/* avic.c */
--
2.25.1

2022-12-14 20:09:53

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 45/64] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP guests are required to perform a GHCB GPA registration. Before
using a GHCB GPA for a vCPU the first time, a guest must register the
vCPU GHCB GPA. If the hypervisor can work with the guest-requested GPA,
then it must respond back with the same GPA; otherwise, it returns -1.

On VMGEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, then abort the guest.
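
For reference, the guest side of this exchange is a simple GHCB MSR
protocol request; the following sketch is modeled on the Linux guest
support code, where sev_es_{wr,rd}_ghcb_msr() and VMGEXIT() are the
guest's existing GHCB MSR helpers:

  static void register_ghcb_gpa(unsigned long ghcb_pa)
  {
          u64 gfn = ghcb_pa >> PAGE_SHIFT;
          u64 val;

          sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ_VAL(gfn));
          VMGEXIT();

          val = sev_es_rd_ghcb_msr();

          /* The hypervisor must echo the GFN back, else terminate */
          if (GHCB_RESP_CODE(val) != GHCB_MSR_REG_GPA_RESP ||
              GHCB_MSR_REG_GPA_RESP_VAL(val) != gfn)
                  sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_GENERAL);
  }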

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 539de6b93420..0a9055cdfae2 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ 0x010
+#define GHCB_MSR_GPA_VALUE_POS 12
+#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP 0x011
+#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
+
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6ac0cb6e3484..d7b467b620aa 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3429,6 +3429,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PREF_GPA_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
+ case GHCB_MSR_REG_GPA_REQ: {
+ u64 gfn;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+
+ svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+ set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3478,6 +3499,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
return 1;
}

+ /* SEV-SNP guest requires that the GHCB GPA must be registered */
+ if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+ return -EINVAL;
+ }
+
ret = sev_es_validate_vmgexit(svm, &exit_code);
if (ret)
return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c462dfac0a0d..a4d48c3e0f89 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -218,6 +218,8 @@ struct vcpu_sev_es_state {
*/
u64 ghcb_sw_exit_info_1;
u64 ghcb_sw_exit_info_2;
+
+ u64 ghcb_registered_gpa;
};

struct vcpu_svm {
@@ -350,6 +352,11 @@ static inline bool sev_snp_guest(struct kvm *kvm)
return sev_es_guest(kvm) && sev->snp_active;
}

+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+ return svm->sev_es.ghcb_registered_gpa == val;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1

2022-12-14 20:10:16

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 46/64] KVM: SVM: Add KVM_EXIT_VMGEXIT

For private memslots, GHCB page state change requests will be forwarded
to userspace for processing. Define a new KVM_EXIT_VMGEXIT for exits of
this type.
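
A hypothetical VMM-side sketch of consuming this exit
(handle_ghcb_request() stands in for the VMM's actual processing):

  static void handle_exit(struct kvm_run *run)
  {
          switch (run->exit_reason) {
          case KVM_EXIT_VMGEXIT:
                  /* e.g. decode a PSC request and convert the pages */
                  run->vmgexit.error =
                          handle_ghcb_request(run->vmgexit.ghcb_msr);
                  break;
          default:
                  /* other exit reasons elided */
                  break;
          }
  }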

Signed-off-by: Michael Roth <[email protected]>
---
include/uapi/linux/kvm.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c468adc1f147..61b1e26ced01 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -301,6 +301,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_RISCV_CSR 36
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_MEMORY_FAULT 38
+#define KVM_EXIT_VMGEXIT 50

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -549,6 +550,11 @@ struct kvm_run {
__u64 gpa;
__u64 size;
} memory;
+ /* KVM_EXIT_VMGEXIT */
+ struct {
+ __u64 ghcb_msr; /* GHCB MSR contents */
+ __u8 error; /* user -> kernel */
+ } vmgexit;
/* Fix the size of the union. */
char padding[256];
};
--
2.25.1

2022-12-14 20:10:50

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 47/64] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

Forward these requests to userspace via KVM_EXIT_VMGEXIT so the VMM can
issue the KVM ioctls to update the page state accordingly.
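
A hypothetical sketch of the VMM-side decode, following the GHCBData
layout added below (operation in bits [55:52], GFN in bits [51:12]):

  static void decode_psc_msr(__u64 ghcb_msr, __u64 *gfn, __u8 *op)
  {
          *gfn = (ghcb_msr >> GHCB_MSR_PSC_GFN_POS) & GHCB_MSR_PSC_GFN_MASK;
          *op = (ghcb_msr >> GHCB_MSR_PSC_OP_POS) & GHCB_MSR_PSC_OP_MASK;
  }

Per the TODO in the completion handler below, the result that
userspace reports back via kvm_run->vmgexit.ghcb_msr is not yet
consumed by the kernel.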

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/sev-common.h | 9 ++++++++
arch/x86/kvm/svm/sev.c | 25 +++++++++++++++++++++++
arch/x86/kvm/trace.h | 34 +++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 1 +
4 files changed, 69 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 0a9055cdfae2..ee38f7408470 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -93,6 +93,10 @@ enum psc_op {
};

#define GHCB_MSR_PSC_REQ 0x014
+#define GHCB_MSR_PSC_GFN_POS 12
+#define GHCB_MSR_PSC_GFN_MASK GENMASK_ULL(39, 0)
+#define GHCB_MSR_PSC_OP_POS 52
+#define GHCB_MSR_PSC_OP_MASK 0xf
#define GHCB_MSR_PSC_REQ_GFN(gfn, op) \
/* GHCBData[55:52] */ \
(((u64)((op) & 0xf) << 52) | \
@@ -102,6 +106,11 @@ enum psc_op {
GHCB_MSR_PSC_REQ)

#define GHCB_MSR_PSC_RESP 0x015
+#define GHCB_MSR_PSC_ERROR_POS 32
+#define GHCB_MSR_PSC_ERROR_MASK GENMASK_ULL(31, 0)
+#define GHCB_MSR_PSC_ERROR GENMASK_ULL(31, 0)
+#define GHCB_MSR_PSC_RSVD_POS 12
+#define GHCB_MSR_PSC_RSVD_MASK GENMASK_ULL(19, 0)
#define GHCB_MSR_PSC_RESP_VAL(val) \
/* GHCBData[63:32] */ \
(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d7b467b620aa..d7988629073b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -29,6 +29,7 @@
#include "svm_ops.h"
#include "cpuid.h"
#include "trace.h"
+#include "mmu.h"

#ifndef CONFIG_KVM_AMD_SEV
/*
@@ -3350,6 +3351,23 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+/*
+ * TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
+ * and process that here accordingly.
+ */
+static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
+
+ set_ghcb_msr_bits(svm, 0, GHCB_MSR_PSC_RSVD_MASK, GHCB_MSR_PSC_RSVD_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PSC_RESP, GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+
+ return 1; /* resume */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3450,6 +3468,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PSC_REQ:
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.ghcb_msr = control->ghcb_gpa;
+ vcpu->arch.complete_userspace_io = snp_complete_psc_msr_protocol;
+
+ ret = -1;
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 83843379813e..65861d2d086c 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -7,6 +7,7 @@
#include <asm/svm.h>
#include <asm/clocksource.h>
#include <asm/pvclock-abi.h>
+#include <asm/sev-common.h>

#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvm
@@ -1831,6 +1832,39 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
__entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
);

+/*
+ * Tracepoint for the SEV-SNP page state change processing
+ */
+#define psc_operation \
+ {SNP_PAGE_STATE_PRIVATE, "private"}, \
+ {SNP_PAGE_STATE_SHARED, "shared"} \
+
+TRACE_EVENT(kvm_snp_psc,
+ TP_PROTO(unsigned int vcpu_id, u64 pfn, u64 gpa, u8 op, int level),
+ TP_ARGS(vcpu_id, pfn, gpa, op, level),
+
+ TP_STRUCT__entry(
+ __field(int, vcpu_id)
+ __field(u64, pfn)
+ __field(u64, gpa)
+ __field(u8, op)
+ __field(int, level)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu_id;
+ __entry->pfn = pfn;
+ __entry->gpa = gpa;
+ __entry->op = op;
+ __entry->level = level;
+ ),
+
+ TP_printk("vcpu %u, pfn %llx, gpa %llx, op %s, level %d",
+ __entry->vcpu_id, __entry->pfn, __entry->gpa,
+ __print_symbolic(__entry->op, psc_operation),
+ __entry->level)
+);
+
#endif /* _TRACE_KVM_H */

#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 732f9cbbadb5..08dd1ef7e136 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13481,6 +13481,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_snp_psc);

static int __init kvm_x86_init(void)
{
--
2.25.1

2022-12-14 20:10:56

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 48/64] KVM: SVM: Add support to handle Page State Change VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.

Forward these requests to userspace as KVM_EXIT_VMGEXITs, similar to how
it is done for requests that don't use a GHCB page.
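
For the GHCB-based protocol, the request is a descriptor in shared
memory rather than MSR bits. A hypothetical VMM-side sketch of walking
it, assuming struct snp_psc_desc/psc_entry as defined alongside
psc_hdr in sev-common.h, with apply_psc_entry() standing in for the
VMM's conversion logic:

  static int process_psc_desc(struct snp_psc_desc *desc)
  {
          struct psc_hdr *hdr = &desc->hdr;
          u16 i;

          if (hdr->end_entry >= VMGEXIT_PSC_MAX_ENTRY)
                  return PSC_INVALID_ENTRY;

          for (i = hdr->cur_entry; i <= hdr->end_entry; i++) {
                  struct psc_entry *e = &desc->entries[i];

                  if (apply_psc_entry(e->gfn, e->operation, e->pagesize))
                          return PSC_UNDEF_ERR;

                  /* advance cur_entry so the guest can observe progress */
                  hdr->cur_entry = i + 1;
          }

          return 0;
  }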

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/sev-common.h | 7 +++++++
arch/x86/kvm/svm/sev.c | 18 ++++++++++++++++++
2 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index ee38f7408470..1b111cde8c82 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -130,6 +130,13 @@ enum psc_op {
/* SNP Page State Change NAE event */
#define VMGEXIT_PSC_MAX_ENTRY 253

+/* The page state change hdr structure is not valid */
+#define PSC_INVALID_HDR 1
+/* The hdr.cur_entry or hdr.end_entry is not valid */
+#define PSC_INVALID_ENTRY 2
+/* Page state change encountered undefined error */
+#define PSC_UNDEF_ERR 3
+
struct psc_hdr {
u16 cur_entry;
u16 end_entry;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d7988629073b..abe6444bf5d4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3164,6 +3164,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
+ case SVM_VMGEXIT_PSC:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3368,6 +3369,17 @@ static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
return 1; /* resume */
}

+/*
+ * TODO: need to process the GHCB contents and report the proper error code
+ * instead of assuming success.
+ */
+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+ svm_set_ghcb_sw_exit_info_2(vcpu, 0);
+
+ return 1;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3595,6 +3607,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_PSC:
+ /* Let userspace handle allocating/deallocating backing pages. */
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
+ vcpu->arch.complete_userspace_io = snp_complete_psc;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1

2022-12-14 20:11:14

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 49/64] KVM: SVM: Introduce ops for the post gfn map and unmap

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled in the guest VM, the guest memory pages can
be either private or shared. A write from the hypervisor goes through
the RMP checks. If the CPU sees that the hypervisor is attempting to
write to a guest private page, then it triggers an RMP violation #PF.

To avoid RMP violations with GHCB pages, add new
post_{map,unmap}_gfn functions to verify whether it is safe to map GHCB
pages. Use kvm->mmu_lock to guard the GHCB against invalidations while
it is being accessed.

Generic post_{map,unmap}_gfn() ops need to be added that can be used to
verify that it's safe to map a given guest page in the hypervisor.

Link: https://lore.kernel.org/all/CABpDEukAEGwb9w12enO=fhSbHbchypsOdO2dkR4Jei3wDW6NWg@mail.gmail.com/
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: use kvm->mmu_lock instead of a new spinlock, this should guard
GHCB page against invalidations]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 40 ++++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/svm/svm.h | 3 +++
2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index abe6444bf5d4..90b509fe1826 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2926,19 +2926,28 @@ static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
{
struct vmcb_control_area *control = &svm->vmcb->control;
u64 gfn = gpa_to_gfn(control->ghcb_gpa);
+ struct kvm_vcpu *vcpu = &svm->vcpu;

- if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
+ if (kvm_vcpu_map(vcpu, gfn, map)) {
/* Unable to map GHCB from guest */
pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
return -EFAULT;
}

+ if (sev_post_map_gfn(vcpu->kvm, map->gfn, map->pfn)) {
+ kvm_vcpu_unmap(vcpu, map, false);
+ return -EBUSY;
+ }
+
return 0;
}

static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
{
- kvm_vcpu_unmap(&svm->vcpu, map, true);
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+
+ kvm_vcpu_unmap(vcpu, map, true);
+ sev_post_unmap_gfn(vcpu->kvm, map->gfn, map->pfn);
}

static void dump_ghcb(struct vcpu_svm *svm)
@@ -3875,6 +3884,33 @@ void sev_rmp_page_level_adjust(struct kvm *kvm, gfn_t gfn, int *level)
__func__, gfn, *level, rmp_level, ret);
}

+int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
+{
+ int level;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ read_lock(&kvm->mmu_lock);
+
+ /* If pfn is not added as private then fail */
+ if (snp_lookup_rmpentry(pfn, &level) == 1) {
+ read_unlock(&kvm->mmu_lock);
+ pr_err_ratelimited("failed to map private gfn 0x%llx pfn 0x%llx\n", gfn, pfn);
+ return -EBUSY;
+ }
+
+ return 0;
+}
+
+void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
+{
+ if (!sev_snp_guest(kvm))
+ return;
+
+ read_unlock(&kvm->mmu_lock);
+}
+
int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault)
{
gfn_t gfn = gpa_to_gfn(gpa);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a4d48c3e0f89..aef13c120f2d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -100,6 +100,7 @@ struct kvm_sev_info {
atomic_t migration_in_progress;
u64 snp_init_flags;
void *snp_context; /* SNP guest context page */
+ spinlock_t psc_lock;
};

struct kvm_svm {
@@ -727,6 +728,8 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void sev_rmp_page_level_adjust(struct kvm *kvm, gfn_t gfn, int *level);
+int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);
+void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);

int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);

--
2.25.1

2022-12-14 20:12:38

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 50/64] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use

From: Brijesh Singh <[email protected]>

While resolving the RMP page fault, there may be cases where the page
level between the RMP entry and the TDP entry does not match, and the
2M RMP entry must be split into 4K RMP entries, or a 2M TDP page must
be broken into multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap forces the TDP mappings
to be rebuilt with the new page level.
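
A condensed sketch of the intended sequence, using helpers added
elsewhere in this series (not a literal excerpt):

	/* 2M RMP entry no longer matches the 4K TDP mapping: split it... */
	ret = psmash(pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1));
	if (!ret)
		/* ...then zap the range so the TDP is rebuilt at the new level */
		kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);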

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu.h | 2 --
arch/x86/kvm/mmu/mmu.c | 1 +
3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f4bb0821757e..15b9c678b281 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1838,6 +1838,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+

int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6bdaacb6faa0..c94b620bf94b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -211,8 +211,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return -(u32)fault & errcode;
}

-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);

int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 25db83021500..02c7fb83a669 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6533,6 +6533,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,

return need_tlb_flush;
}
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);

static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *slot)
--
2.25.1

2022-12-14 20:12:40

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 04/64] KVM: x86: Add 'fault_is_private' x86 op

This callback is used by the KVM MMU to check whether a #NPF was for a
private GPA or not.
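
A sketch of the shape a vendor implementation is expected to take (the
SNP version arrives later in this series; the predicate name below is
hypothetical and only for illustration):

	static int vendor_fault_is_private(struct kvm *kvm, gpa_t gpa,
					   u64 error_code, bool *private_fault)
	{
		/* hypothetical: only make a determination for protected guests */
		if (!vendor_guest_has_private_mem(kvm))
			return 0;	/* no determination, use default handling */

		/* e.g. SNP encodes encrypted access in the #NPF error code */
		*private_fault = !!(error_code & PFERR_GUEST_ENC_MASK);
		return 1;	/* determination made, *private_fault is valid */
	}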

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 3 +--
arch/x86/kvm/mmu/mmu_internal.h | 40 +++++++++++++++++++++++++++---
4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index f530a550c092..efae987cdce0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -132,6 +132,7 @@ KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
+KVM_X86_OP_OPTIONAL_RET0(fault_is_private);

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9317abffbf68..92539708f062 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1636,6 +1636,7 @@ struct kvm_x86_ops {
void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
int root_level);
int (*private_mem_enabled)(struct kvm *kvm);
+ int (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);

bool (*has_wbinvd_exit)(void);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b3ffc61c668c..61a7c221b966 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5646,8 +5646,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}

if (r == RET_PF_INVALID) {
- r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
- lower_32_bits(error_code), false);
+ r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false);
if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
return -EIO;
}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index e2f508db0b6e..04ea8da86510 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -230,6 +230,38 @@ struct kvm_page_fault {

int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);

+static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
+{
+ struct kvm_memory_slot *slot;
+ bool private_fault = false;
+ gfn_t gfn = gpa_to_gfn(gpa);
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!slot) {
+ pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
+ goto out;
+ }
+
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
+ goto out;
+ }
+
+ if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault) == 1)
+ goto out;
+
+ /*
+ * Handling below is for UPM self-tests and guests that use
+ * slot->shared_bitmap for encrypted access tracking.
+ */
+ if (IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING))
+ private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
+
+out:
+ pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
+ return private_fault;
+}
+
/*
* Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
* and of course kvm_mmu_do_page_fault().
@@ -261,13 +293,13 @@ enum {
};

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u32 err, bool prefetch)
+ u64 err, bool prefetch)
{
bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);

struct kvm_page_fault fault = {
.addr = cr2_or_gpa,
- .error_code = err,
+ .error_code = lower_32_bits(err),
.exec = err & PFERR_FETCH_MASK,
.write = err & PFERR_WRITE_MASK,
.present = err & PFERR_PRESENT_MASK,
@@ -281,8 +313,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.max_level = KVM_MAX_HUGEPAGE_LEVEL,
.req_level = PG_LEVEL_4K,
.goal_level = PG_LEVEL_4K,
- .is_private = IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING) && is_tdp &&
- kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
+ .is_private = is_tdp && kvm_mmu_fault_is_private(vcpu->kvm,
+ cr2_or_gpa, err),
};
int r;

--
2.25.1

2022-12-14 20:12:46

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 51/64] KVM: SVM: Add support to handle the RMP nested page fault

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When
hardware encounters an RMP check failure caused by a guest memory
access, it raises a #NPF. The error code contains additional
information on the access type. See APM volume 2 for details.

Page state changes are handled by userspace, so if an RMP fault is
triggered as a result of an RMP NPT fault, exit to userspace just like
with explicit page-state change requests.

RMP NPT faults can also occur if the guest pvalidates a 2M page as 4K,
in which case the RMP entries need to be PSMASH'd. Handle this case
immediately in the kernel.
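
The resulting flow in the #NPF handler, roughly (condensed from the
svm.c hunk below; insn/insn_len come from the decode-assist data):

	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code, insn, insn_len);

	if (error_code & PFERR_GUEST_RMP_MASK) {
		if (rc == 0)	/* userspace exit: page-state change happens first */
			return rc;
		/* PSMASH the 2M RMP entry and zap the range to rebuild the TDP */
		handle_rmp_page_fault(vcpu, fault_address, error_code);
	}
	return rc;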

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kvm/svm/sev.c | 78 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 21 +++++++++---
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 96 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 90b509fe1826..5f2b2092cdae 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3361,6 +3361,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
+{
+ pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+ return psmash(pfn);
+}
+
/*
* TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
* and process that here accordingly.
@@ -3911,6 +3918,77 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
read_unlock(&(kvm)->mmu_lock);
}

+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ int order, rmp_level, assigned, ret;
+ struct kvm_memory_slot *slot;
+ struct kvm *kvm = vcpu->kvm;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+
+ /*
+ * Private memslots punt handling of implicit page state changes to
+ * userspace, so the only RMP faults expected here are for
+ * PFERR_GUEST_SIZEM_MASK. Anything else suggests that the RMP table has
+ * gotten out of sync with the private memslot.
+ *
+ * TODO: However, this case has also been noticed when an access occurs
+ * to an NPT mapping that has just been split/PSMASHED, in which case
+ * PFERR_GUEST_SIZEM_MASK might not be set. In those cases it should be
+ * safe to ignore and let the guest retry, but log these just in case
+ * for now.
+ */
+ if (!(error_code & PFERR_GUEST_SIZEM_MASK))
+ pr_warn("Unexpected RMP fault for GPA 0x%llx, error_code 0x%llx",
+ gpa, error_code);
+
+ gfn = gpa >> PAGE_SHIFT;
+
+ /*
+ * Only RMPADJUST/PVALIDATE should cause PFERR_GUEST_SIZEM.
+ *
+ * For PVALIDATE, this should only happen if a guest PVALIDATEs a 4K GFN
+ * that is backed by a huge page in the host whose RMP entry has the
+ * hugepage/assigned bits set. With UPM, that should only ever happen
+ * for private pages.
+ *
+ * For RMPADJUST, this assumption might not hold, in which case handling
+ * for obtaining the PFN from HVA-backed memory may be needed. For now,
+ * just print warnings.
+ */
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_warn("Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx", gpa);
+ return;
+ }
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_warn("Unexpected RMP fault, non-private slot for GPA 0x%llx", gpa);
+ return;
+ }
+
+ ret = kvm_restricted_mem_get_pfn(slot, gfn, &pfn, &order);
+ if (ret) {
+ pr_warn("Unexpected RMP fault, no private backing page for GPA 0x%llx", gpa);
+ return;
+ }
+
+ assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+ if (assigned != 1) {
+ pr_warn("Unexpected RMP fault, no assigned RMP entry for GPA 0x%llx", gpa);
+ goto out;
+ }
+
+ ret = snp_rmptable_psmash(kvm, pfn);
+ if (ret)
+ pr_err_ratelimited("Unable to split RMP entries for GPA 0x%llx PFN 0x%llx ret %d\n",
+ gpa, pfn, ret);
+
+out:
+ kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+ put_page(pfn_to_page(pfn));
+}
+
int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault)
{
gfn_t gfn = gpa_to_gfn(gpa);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1826946a2f43..43f04fc95a0a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1968,15 +1968,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ int rc;

u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;

trace_kvm_page_fault(vcpu, fault_address, error_code);
- return kvm_mmu_page_fault(vcpu, fault_address, error_code,
- static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
- svm->vmcb->control.insn_bytes : NULL,
- svm->vmcb->control.insn_len);
+ rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+ static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+ svm->vmcb->control.insn_bytes : NULL,
+ svm->vmcb->control.insn_len);
+
+ /*
+ * rc == 0 indicates a userspace exit is needed to handle page
+ * transitions, so do that first before updating the RMP table.
+ */
+ if (error_code & PFERR_GUEST_RMP_MASK) {
+ if (rc == 0)
+ return rc;
+ handle_rmp_page_fault(vcpu, fault_address, error_code);
+ }
+
+ return rc;
}

static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index aef13c120f2d..12b9f4d539fb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -730,6 +730,7 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void sev_rmp_page_level_adjust(struct kvm *kvm, gfn_t gfn, int *level);
int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);
void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);

int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);

--
2.25.1

2022-12-14 20:13:04

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification added support for two SNP Guest
Request Message NAE events. These events allow an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, but additionally
returns a certificate blob that can be set through the SNP_SET_CONFIG
ioctl defined in the CCP driver. The CCP driver provides
snp_guest_ext_guest_request(), which is used by KVM to get both the
report and the certificate data at once.
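
The request/response buffer lifecycle, sketched (condensed from the
handlers added below):

	/* guest passes request/response GPAs in exit_info_1/exit_info_2 */
	req_pfn  = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));

	/* response page becomes firmware-owned while the command runs */
	rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true);

	sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);

	/* reclaim the response page and return it to the shared state */
	snp_page_reclaim(resp_pfn);
	rmp_make_shared(resp_pfn, PG_LEVEL_4K);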

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 185 +++++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/svm/svm.h | 2 +
2 files changed, 181 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5f2b2092cdae..18efa70553c2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -331,6 +331,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (ret)
goto e_free;

+ mutex_init(&sev->guest_req_lock);
ret = sev_snp_init(&argp->error, false);
} else {
ret = sev_platform_init(&argp->error);
@@ -2051,23 +2052,34 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
*/
static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
struct sev_data_snp_addr data = {};
- void *context;
+ void *context, *certs_data;
int rc;

+ /* Allocate memory used for the certs data in SNP guest request */
+ certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
+ if (!certs_data)
+ return NULL;
+
/* Allocate memory for context page */
context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
if (!context)
- return NULL;
+ goto e_free;

data.gctx_paddr = __psp_pa(context);
rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
- if (rc) {
- snp_free_firmware_page(context);
- return NULL;
- }
+ if (rc)
+ goto e_free;
+
+ sev->snp_certs_data = certs_data;

return context;
+
+e_free:
+ snp_free_firmware_page(context);
+ kfree(certs_data);
+ return NULL;
}

static int snp_bind_asid(struct kvm *kvm, int *error)
@@ -2653,6 +2665,8 @@ static int snp_decommission_context(struct kvm *kvm)
snp_free_firmware_page(sev->snp_context);
sev->snp_context = NULL;

+ kfree(sev->snp_certs_data);
+
return 0;
}

@@ -3174,6 +3188,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3396,6 +3412,149 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
return 1;
}

+static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
+ struct sev_data_snp_guest_request *data,
+ gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ kvm_pfn_t req_pfn, resp_pfn;
+ struct kvm_sev_info *sev;
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+ return SEV_RET_INVALID_PARAM;
+
+ req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+ if (is_error_noslot_pfn(req_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+ if (is_error_noslot_pfn(resp_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ return SEV_RET_INVALID_ADDRESS;
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+ return 0;
+}
+
+static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
+{
+ u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+ int ret;
+
+ ret = snp_page_reclaim(pfn);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_sev_info *sev;
+ unsigned long rc;
+ int err;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+ if (rc)
+ /* use the firmware error code */
+ rc = err;
+
+ snp_cleanup_guest_buf(&data, &rc);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ svm_set_ghcb_sw_exit_info_2(vcpu, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request req = {0};
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ unsigned long data_npages;
+ struct kvm_sev_info *sev;
+ unsigned long rc, err;
+ u64 data_gpa;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+ data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+ if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+ rc = SEV_RET_INVALID_ADDRESS;
+ goto e_fail;
+ }
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
+ &data_npages, &err);
+ if (rc) {
+ /*
+ * If buffer length is small then return the expected
+ * length in rbx.
+ */
+ if (err == SNP_GUEST_REQ_INVALID_LEN)
+ vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
+
+ /* pass the firmware error code */
+ rc = err;
+ goto cleanup;
+ }
+
+ /* Copy the certificate blob in the guest memory */
+ if (data_npages &&
+ kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
+ rc = SEV_RET_INVALID_ADDRESS;
+
+cleanup:
+ snp_cleanup_guest_buf(&req, &rc);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ svm_set_ghcb_sw_exit_info_2(vcpu, rc);
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3629,6 +3788,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
vcpu->arch.complete_userspace_io = snp_complete_psc;
break;
+ case SVM_VMGEXIT_GUEST_REQUEST: {
+ snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
+
+ ret = 1;
+ break;
+ }
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
+ snp_handle_ext_guest_request(svm,
+ control->exit_info_1,
+ control->exit_info_2);
+
+ ret = 1;
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 12b9f4d539fb..7c0f9d00950f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -101,6 +101,8 @@ struct kvm_sev_info {
u64 snp_init_flags;
void *snp_context; /* SNP guest context page */
spinlock_t psc_lock;
+ void *snp_certs_data;
+ struct mutex guest_req_lock;
};

struct kvm_svm {
--
2.25.1

2022-12-14 20:14:11

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 53/64] KVM: SVM: Use a VMSA physical address variable for populating VMCB

From: Tom Lendacky <[email protected]>

In preparation for supporting SEV-SNP AP Creation, use a variable that
holds the VMSA physical address rather than converting the virtual
address each time. This will allow SEV-SNP AP Creation to set a new
physical address that will be used should the vCPU reset path be taken.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 5 ++---
arch/x86/kvm/svm/svm.c | 9 ++++++++-
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 18efa70553c2..36c312143d12 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3845,10 +3845,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)

/*
* An SEV-ES guest requires a VMSA area that is a separate from the
- * VMCB page. Do not include the encryption mask on the VMSA physical
- * address since hardware will access it using the guest key.
+ * VMCB page.
*/
- svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;

/* Can't intercept CR register access, HV can't modify CR registers */
svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 43f04fc95a0a..e9317d27a01d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1398,9 +1398,16 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
svm_switch_vmcb(svm, &svm->vmcb01);

- if (vmsa_page)
+ if (vmsa_page) {
svm->sev_es.vmsa = page_address(vmsa_page);

+ /*
+ * Do not include the encryption mask on the VMSA physical
+ * address since hardware will access it using the guest key.
+ */
+ svm->sev_es.vmsa_pa = __pa(svm->sev_es.vmsa);
+ }
+
svm->guest_state_loaded = false;

return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7c0f9d00950f..284902e22dce 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -204,6 +204,7 @@ struct svm_nested_state {
struct vcpu_sev_es_state {
/* SEV-ES support */
struct sev_es_save_area *vmsa;
+ hpa_t vmsa_pa;
bool ghcb_in_use;
bool received_first_sipi;
unsigned int ap_reset_hold_type;
--
2.25.1

2022-12-14 20:14:49

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 54/64] KVM: SVM: Support SEV-SNP AP Creation NAE event

From: Tom Lendacky <[email protected]>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, giving the
guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID. The GPA is saved in the svm struct of the
target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID the next time an INIT is performed. The GPA is
saved in the svm struct of the target vCPU.

For DESTROY:
The guest indicates it wishes to stop the vCPU. The GPA is cleared
from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
added to vCPU and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. The handler sets the
vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
leave the vCPU as not runnable. Any previous VMSA pages that were
installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
a new VMSA is to be installed, the VMSA guest page is pinned and set as
the VMSA in the vCPU VMCB and the vCPU state is set to
KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
cleared in the vCPU VMCB and the vCPU state is left as
KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
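
The request dispatch, roughly (a condensed sketch of
sev_snp_ap_creation() below):

	request = lower_32_bits(control->exit_info_1);
	apic_id = upper_32_bits(control->exit_info_1);
	target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);

	switch (request) {
	case SVM_VMGEXIT_AP_CREATE_ON_INIT:
		kick = false;		/* defer the update to the next INIT */
		fallthrough;
	case SVM_VMGEXIT_AP_CREATE:
		target_svm->sev_es.snp_vmsa_gpa = control->exit_info_2;
		break;
	case SVM_VMGEXIT_AP_DESTROY:	/* snp_vmsa_gpa stays INVALID_PAGE */
		break;
	}

	if (kick)
		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);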

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: add handling for restrictedmem]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/svm.h | 7 +-
arch/x86/kvm/svm/sev.c | 245 ++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 3 +
arch/x86/kvm/svm/svm.h | 7 +
arch/x86/kvm/x86.c | 9 ++
6 files changed, 271 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 15b9c678b281..5958cd93e5e6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -115,6 +115,7 @@
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_MEMORY_MCE KVM_ARCH_REQ(33)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)

#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index c18d78d5e505..e76ad26ba64f 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -278,7 +278,12 @@ enum avic_ipi_failure_cause {
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL

-#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
+#define SVM_SEV_FEAT_INT_INJ_MODES \
+ (SVM_SEV_FEAT_RESTRICTED_INJECTION | \
+ SVM_SEV_FEAT_ALTERNATE_INJECTION)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 36c312143d12..2f4c9f2bcf76 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -771,6 +771,7 @@ static int sev_launch_update_data(struct kvm *kvm,

static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
struct sev_es_save_area *save = svm->sev_es.vmsa;

/* Check some debug related fields before encrypting the VMSA */
@@ -816,6 +817,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
if (sev_snp_guest(svm->vcpu.kvm))
save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;

+ /*
+ * Save the VMSA synced SEV features. For now, they are the same for
+ * all vCPUs, so just save each time.
+ */
+ sev->sev_features = save->sev_features;
+
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);

@@ -3182,6 +3189,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
if (!ghcb_sw_scratch_is_valid(ghcb))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ if (!ghcb_rax_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
case SVM_VMGEXIT_NMI_COMPLETE:
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3555,6 +3566,226 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
svm_set_ghcb_sw_exit_info_2(vcpu, rc);
}

+static kvm_pfn_t gfn_to_pfn_restricted(struct kvm *kvm, gfn_t gfn)
+{
+ struct kvm_memory_slot *slot;
+ kvm_pfn_t pfn;
+ int order = 0;
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_err("SEV: Failure retrieving restricted memslot for GFN 0x%llx, flags 0x%x, userspace_addr: 0x%lx\n",
+ gfn, slot->flags, slot->userspace_addr);
+ return INVALID_PAGE;
+ }
+
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_err("SEV: Failure retrieving restricted PFN for GFN 0x%llx\n", gfn);
+ return INVALID_PAGE;
+ }
+
+ if (kvm_restricted_mem_get_pfn(slot, gfn, &pfn, &order)) {
+ pr_err("SEV: Failure retrieving restricted PFN for GFN 0x%llx\n", gfn);
+ return INVALID_PAGE;
+ }
+
+ put_page(pfn_to_page(pfn));
+
+ return pfn;
+}
+
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ kvm_pfn_t pfn;
+ hpa_t cur_pa;
+
+ WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+ /* Save off the current VMSA PA for later checks */
+ cur_pa = svm->sev_es.vmsa_pa;
+
+ /* Mark the vCPU as offline and not runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_STOPPED;
+
+ /* Clear use of the VMSA */
+ svm->sev_es.vmsa_pa = INVALID_PAGE;
+ svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+ if (cur_pa != __pa(svm->sev_es.vmsa) && VALID_PAGE(cur_pa)) {
+ /*
+ * The svm->sev_es.vmsa_pa field holds the hypervisor physical
+ * address of the about to be replaced VMSA which will no longer
+ * be used or referenced, so un-pin it. However, restricted
+ * pages (e.g. via AP creation) should be left to the
+ * restrictedmem backend to deal with, so don't release the
+ * page in that case.
+ */
+ if (!VALID_PAGE(gfn_to_pfn_restricted(vcpu->kvm,
+ gpa_to_gfn(svm->sev_es.snp_vmsa_gpa))))
+ kvm_release_pfn_dirty(__phys_to_pfn(cur_pa));
+ }
+
+ if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+ /*
+ * The VMSA is referenced by the hypervisor physical address,
+ * so retrieve the PFN and ensure it is restricted memory.
+ */
+ pfn = gfn_to_pfn_restricted(vcpu->kvm, gpa_to_gfn(svm->sev_es.snp_vmsa_gpa));
+ if (!VALID_PAGE(pfn))
+ return pfn;
+
+ /* Use the new VMSA */
+ svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
+
+ /* Mark the vCPU as runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ }
+
+ /*
+ * When replacing the VMSA during SEV-SNP AP creation,
+ * mark the VMCB dirty so that full state is always reloaded.
+ */
+ vmcb_mark_all_dirty(svm->vmcb);
+
+ return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ int ret;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+ if (!svm->sev_es.snp_ap_create)
+ goto unlock;
+
+ svm->sev_es.snp_ap_create = false;
+
+ ret = __sev_snp_update_protected_guest_state(vcpu);
+ if (ret)
+ vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+ mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_vcpu *target_vcpu;
+ struct vcpu_svm *target_svm;
+ unsigned int request;
+ unsigned int apic_id;
+ bool kick;
+ int ret;
+
+ request = lower_32_bits(svm->vmcb->control.exit_info_1);
+ apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+ /* Validate the APIC ID */
+ target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+ if (!target_vcpu) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+ apic_id);
+ return -EINVAL;
+ }
+
+ ret = 0;
+
+ target_svm = to_svm(target_vcpu);
+
+ /*
+ * The target vCPU is valid, so the vCPU will be kicked unless the
+ * request is for CREATE_ON_INIT. For any errors at this stage, the
+ * kick will place the vCPU in an non-runnable state.
+ */
+ kick = true;
+
+ mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.snp_ap_create = true;
+
+ /* Interrupt injection mode shouldn't change for AP creation */
+ if (request < SVM_VMGEXIT_AP_DESTROY) {
+ u64 sev_features;
+
+ sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+ sev_features ^= sev->sev_features;
+ if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+ vcpu->arch.regs[VCPU_REGS_RAX]);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ switch (request) {
+ case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+ kick = false;
+ fallthrough;
+ case SVM_VMGEXIT_AP_CREATE:
+ if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * A malicious guest can RMPADJUST a large page into a VMSA, which
+ * would hit the SNP erratum where the CPU incorrectly signals an
+ * RMP violation #PF if a hugepage collides with the RMP entry of
+ * the VMSA page. Reject the AP CREATE request if the VMSA address
+ * from the guest is 2M aligned.
+ */
+ if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+ vcpu_unimpl(vcpu,
+ "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+ break;
+ case SVM_VMGEXIT_AP_DESTROY:
+ break;
+ default:
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+ request);
+ ret = -EINVAL;
+ break;
+ }
+
+out:
+ if (kick) {
+ if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+ kvm_vcpu_kick(target_vcpu);
+ }
+
+ mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ return ret;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3802,6 +4033,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_AP_CREATION:
+ ret = sev_snp_ap_creation(svm);
+ if (ret) {
+ svm_set_ghcb_sw_exit_info_1(vcpu, 1);
+ svm_set_ghcb_sw_exit_info_2(vcpu,
+ X86_TRAP_GP |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
+ }
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3906,6 +4149,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
GHCB_VERSION_MIN,
sev_enc_bit));
+
+ mutex_init(&svm->sev_es.snp_vmsa_mutex);
}

void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e9317d27a01d..7f8c480dfa5e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1337,6 +1337,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
svm->spec_ctrl = 0;
svm->virt_spec_ctrl = 0;

+ if (init_event)
+ sev_snp_init_protected_guest_state(vcpu);
+
init_vmcb(vcpu);

if (!init_event)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 284902e22dce..5e7cb0260dc3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -103,6 +103,8 @@ struct kvm_sev_info {
spinlock_t psc_lock;
void *snp_certs_data;
struct mutex guest_req_lock;
+
+ u64 sev_features; /* Features set at VMSA creation */
};

struct kvm_svm {
@@ -224,6 +226,10 @@ struct vcpu_sev_es_state {
u64 ghcb_sw_exit_info_2;

u64 ghcb_registered_gpa;
+
+ struct mutex snp_vmsa_mutex;
+ gpa_t snp_vmsa_gpa;
+ bool snp_ap_create;
};

struct vcpu_svm {
@@ -734,6 +740,7 @@ void sev_rmp_page_level_adjust(struct kvm *kvm, gfn_t gfn, int *level);
int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);
void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);

int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 08dd1ef7e136..a08601277497 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10387,6 +10387,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
r = 0;
goto out;
}
+
+ if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+ kvm_vcpu_reset(vcpu, true);
+ if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
+ goto out;
+ }
}

if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -12667,6 +12673,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
return true;
#endif

+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+ return true;
+
if (kvm_arch_interrupt_allowed(vcpu) &&
(kvm_cpu_has_interrupt(vcpu) ||
kvm_guest_apic_has_interrupt(vcpu)))
--
2.25.1

2022-12-14 20:15:24

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 56/64] KVM: x86/mmu: Generate KVM_EXIT_MEMORY_FAULT for implicit conversions for SNP

SEV-SNP will set PFERR_GUEST_ENC_MASK for NPT faults for
encrypted/private memory. Generally such accesses will be preceded at
some point by a GHCB request to the hypervisor to put the page in the
expected private/shared state, so the KVM MMU wouldn't normally need to
generate KVM_EXIT_MEMORY_FAULTs to handle the updates at access time.

However, implicit conversions are also supported for SNP guests, and in
those cases a KVM_EXIT_MEMORY_FAULT will be needed to put the page in
the expected private/shared state.

Check for this PFERR_GUEST_ENC_MASK bit when determining whether a #NPF
should be handled with restrictedmem pages or not.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 885a3f1da910..0dd3d9debe48 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4446,7 +4446,10 @@ int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *priva
* source is the only indicator of whether the fault should be treated
* as private or not.
*/
- *private_fault = kvm_mem_is_private(kvm, gfn);
+ if (sev_snp_guest(kvm))
+ *private_fault = (error_code & PFERR_GUEST_ENC_MASK) ? true : false;
+ else
+ *private_fault = kvm_mem_is_private(kvm, gfn);

return 1;

--
2.25.1

2022-12-14 20:15:47

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 55/64] KVM: SVM: Add SNP-specific handling for memory attribute updates

This will handle RMP table updates and direct map changes needed for
page state conversions requested by userspace.
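
Per page, the conversion boils down to the following (a condensed
sketch of the sev_update_mem_attr() loop below):

	rc = kvm_restricted_mem_get_pfn(slot, gfn, &pfn, &order);
	if (rc)
		continue;	/* nothing was ever allocated for this gfn */

	switch (op) {
	case SNP_PAGE_STATE_SHARED:
		/* may first PSMASH a 2M RMP entry down to 4K entries */
		rc = snp_make_page_shared(slot->kvm, gpa, pfn, level);
		break;
	case SNP_PAGE_STATE_PRIVATE:
		rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
		break;
	}

	put_page(pfn_to_page(pfn));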

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 126 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 +
3 files changed, 129 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2f4c9f2bcf76..885a3f1da910 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3395,6 +3395,31 @@ static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
return psmash(pfn);
}

+static int snp_make_page_shared(struct kvm *kvm, gpa_t gpa, kvm_pfn_t pfn, int level)
+{
+ int rc, rmp_level;
+
+ rc = snp_lookup_rmpentry(pfn, &rmp_level);
+ if (rc < 0)
+ return -EINVAL;
+
+ /* If page is not assigned then do nothing */
+ if (!rc)
+ return 0;
+
+ /*
+ * Is the page part of an existing 2MB RMP entry ? Split the 2MB into
+ * multiple of 4K-page before making the memory shared.
+ */
+ if (level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M) {
+ rc = snp_rmptable_psmash(kvm, pfn);
+ if (rc)
+ return rc;
+ }
+
+ return rmp_make_shared(pfn, level);
+}
+
/*
* TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
* and process that here accordingly.
@@ -4428,3 +4453,104 @@ int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *priva
out_unhandled:
return 0;
}
+
+static inline u8 order_to_level(int order)
+{
+ BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+ return PG_LEVEL_1G;
+
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+ return PG_LEVEL_2M;
+
+ return PG_LEVEL_4K;
+}
+
+int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,
+ gfn_t start, gfn_t end)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(slot->kvm)->sev_info;
+ enum psc_op op = (attr & KVM_MEMORY_ATTRIBUTE_PRIVATE) ? SNP_PAGE_STATE_PRIVATE
+ : SNP_PAGE_STATE_SHARED;
+ gfn_t gfn = start;
+
+ pr_debug("%s: GFN 0x%llx - 0x%llx, op: %d\n", __func__, start, end, op);
+
+ if (!sev_snp_guest(slot->kvm))
+ return 0;
+
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_err_ratelimited("%s: memslot for gfn: 0x%llx is not private.\n",
+ __func__, gfn);
+ return -EPERM;
+ }
+
+ while (gfn < end) {
+ kvm_pfn_t pfn;
+ int level = PG_LEVEL_4K; /* TODO: take actual order into account */
+ gpa_t gpa = gfn_to_gpa(gfn);
+ int npages = 1;
+ int order;
+ int rc;
+
+ /*
+ * No work to do if there was never a page allocated from private
+ * memory. If there was a page that was deallocated previously,
+ * the invalidation notifier should have restored the page to
+ * shared.
+ */
+ rc = kvm_restricted_mem_get_pfn(slot, gfn, &pfn, &order);
+ if (rc) {
+ pr_warn_ratelimited("%s: failed to retrieve gfn 0x%llx from private FD\n",
+ __func__, gfn);
+ gfn++;
+ continue;
+ }
+
+ /*
+ * TODO: The RMP entry's hugepage bit is ignored for
+ * shared/unassigned pages. Either handle looping through each
+ * sub-page as part of snp_make_page_shared(), or remove the
+ * level argument.
+ */
+ if (op == SNP_PAGE_STATE_PRIVATE && order &&
+ IS_ALIGNED(gfn, 1 << order) && (gfn + (1 << order)) <= end) {
+ level = order_to_level(order);
+ npages = 1 << order;
+ }
+
+ /*
+ * Grab the PFN from private memslot and update the RMP entry.
+ * It may be worthwhile to go ahead and map it into the TDP at
+ * this point if the guest is doing lazy acceptance, but for
+ * up-front bulk shared->private conversions it's not likely
+ * the guest will try to access the PFN any time soon, so for
+ * now just let the KVM MMU handle faulting it in on the next
+ * access.
+ */
+ switch (op) {
+ case SNP_PAGE_STATE_SHARED:
+ rc = snp_make_page_shared(slot->kvm, gpa, pfn, level);
+ break;
+ case SNP_PAGE_STATE_PRIVATE:
+ rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
+ break;
+ default:
+ rc = PSC_INVALID_ENTRY;
+ break;
+ }
+
+ put_page(pfn_to_page(pfn));
+
+ if (rc) {
+ pr_err_ratelimited("%s: failed op %d gpa %llx pfn %llx level %d rc %d\n",
+ __func__, op, gpa, pfn, level, rc);
+ return -EINVAL;
+ }
+
+ gfn += npages;
+ }
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7f8c480dfa5e..6cf5b73f74c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4872,6 +4872,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.fault_is_private = sev_fault_is_private,

.rmp_page_level_adjust = sev_rmp_page_level_adjust,
+ .update_mem_attr = sev_update_mem_attr,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5e7cb0260dc3..5f315225ae4d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -741,6 +741,8 @@ int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);
void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,
+ gfn_t start, gfn_t end);

int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);

--
2.25.1

2022-12-14 20:15:54

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 57/64] KVM: SEV: Handle restricted memory invalidations for SNP

Implement a platform hook to do the work of restoring the direct map
entries and cleaning up RMP table entries for restricted memory that is
being freed back to the host.
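
For each private GFN in the invalidated range, the hook essentially
does the following (condensed from the diff below):

	rc = kvm_restricted_mem_get_pfn(slot, gfn, &pfn, &order);
	if (rc)
		continue;

	/* reset the RMP entry (PSMASHing a 2M entry first if needed) and
	 * restore the kernel direct map entries for the page */
	snp_make_page_shared(slot->kvm, gfn_to_gpa(gfn), pfn, level);

	put_page(pfn_to_page(pfn));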

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 64 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 66 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0dd3d9debe48..8783b64557e5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4557,3 +4557,67 @@ int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,

return 0;
}
+
+void sev_invalidate_private_range(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+ gfn_t gfn = start;
+
+ if (!sev_snp_guest(slot->kvm))
+ return;
+
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_warn_ratelimited("SEV: Memslot for GFN: 0x%llx is not private.\n",
+ gfn);
+ return;
+ }
+
+ while (gfn < end) {
+ gpa_t gpa = gfn_to_gpa(gfn);
+ int level = PG_LEVEL_4K;
+ int order, rc;
+ kvm_pfn_t pfn;
+
+ if (!kvm_mem_is_private(slot->kvm, gfn)) {
+ gfn++;
+ continue;
+ }
+
+ rc = kvm_restricted_mem_get_pfn(slot, gfn, &pfn, &order);
+ if (rc) {
+ pr_warn_ratelimited("SEV: Failed to retrieve restricted PFN for GFN 0x%llx, rc: %d\n",
+ gfn, rc);
+ gfn++;
+ continue;
+ }
+
+ if (order) {
+ int rmp_level;
+
+ if (IS_ALIGNED(gpa, page_level_size(PG_LEVEL_2M)) &&
+ gpa + page_level_size(PG_LEVEL_2M) <= gfn_to_gpa(end))
+ level = PG_LEVEL_2M;
+ else
+ pr_debug("%s: GPA 0x%llx is not aligned to 2M, skipping 2M directmap restoration\n",
+ __func__, gpa);
+
+ /* TODO: It may still be possible to restore 2M mapping here, but keep it simple for now. */
+ if (level == PG_LEVEL_2M &&
+ (!snp_lookup_rmpentry(pfn, &rmp_level) || rmp_level == PG_LEVEL_4K)) {
+ pr_debug("%s: PFN 0x%llx is not mapped as 2M private range, skipping 2M directmap restoration\n",
+ __func__, pfn);
+ level = PG_LEVEL_4K;
+ }
+ }
+
+ pr_debug("%s: GPA %llx PFN %llx order %d level %d\n",
+ __func__, gpa, pfn, order, level);
+ rc = snp_make_page_shared(slot->kvm, gpa, pfn, level);
+ if (rc)
+ pr_err("SEV: Failed to restore page to shared, GPA: 0x%llx PFN: 0x%llx order: %d rc: %d\n",
+ gpa, pfn, order, rc);
+
+ gfn += page_level_size(level) >> PAGE_SHIFT;
+ put_page(pfn_to_page(pfn));
+ cond_resched();
+ }
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6cf5b73f74c1..543261c87eb3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4873,6 +4873,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.rmp_page_level_adjust = sev_rmp_page_level_adjust,
.update_mem_attr = sev_update_mem_attr,
+ .invalidate_restricted_mem = sev_invalidate_private_range,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5f315225ae4d..277f53c903c2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -743,6 +743,7 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,
gfn_t start, gfn_t end);
+void sev_invalidate_private_range(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);

int sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);

--
2.25.1

2022-12-14 20:17:02

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 58/64] KVM: SVM: Add module parameter to enable the SEV-SNP

From: Brijesh Singh <[email protected]>

Add a module parameter that can be used to enable or disable the
SEV-SNP feature. Now that KVM contains support for SNP, set the GHCB
hypervisor feature flag to indicate that SNP is supported.
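
With this, SNP support can be toggled at module load time, e.g.
(assuming kvm_amd is built with CONFIG_KVM_AMD_SEV):

	modprobe kvm_amd sev_snp=1

or disabled with sev_snp=0; since the parameter is 0444, the current
value should be readable under /sys/module/kvm_amd/parameters/sev_snp.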

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 7 ++++---
arch/x86/kvm/svm/svm.h | 2 +-
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8783b64557e5..b0f25ced7bcf 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -54,14 +54,15 @@ module_param_named(sev, sev_enabled, bool, 0444);
/* enable/disable SEV-ES support */
static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
#else
#define sev_enabled false
#define sev_es_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

-/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
-
#define AP_RESET_HOLD_NONE 0
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 277f53c903c2..4692ada13f02 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -708,7 +708,7 @@ void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

-#define GHCB_HV_FT_SUPPORTED 0
+#define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)

extern unsigned int max_sev_asid;

--
2.25.1

2022-12-14 20:30:15

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 08/64] KVM: Move kvm_for_each_memslot_in_hva_range() to be used in SVM

From: Nikunj A Dadhania <[email protected]>

Move the macro to kvm_host.h and make it visible for SVM to use.

No functional change intended.
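
A usage sketch; the container_of() indexing mirrors the existing caller
in kvm_main.c:

	struct interval_tree_node *node;
	struct kvm_memory_slot *slot;

	kvm_for_each_memslot_in_hva_range(node, slots, hva_start, hva_end - 1) {
		slot = container_of(node, struct kvm_memory_slot,
				    hva_node[slots->node_idx]);
		/* ... operate on the overlapping memslot ... */
	}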

Suggested-by: Maciej S. Szmigiero <[email protected]>
Signed-off-by: Nikunj A Dadhania <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
include/linux/kvm_host.h | 6 ++++++
virt/kvm/kvm_main.c | 6 ------
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f72a2e0b8699..43b5c5aa8e80 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1116,6 +1116,12 @@ static inline bool kvm_memslot_iter_is_valid(struct kvm_memslot_iter *iter, gfn_
kvm_memslot_iter_is_valid(iter, end); \
kvm_memslot_iter_next(iter))

+/* Iterate over each memslot intersecting [start, last] (inclusive) range */
+#define kvm_for_each_memslot_in_hva_range(node, slots, start, last) \
+ for (node = interval_tree_iter_first(&slots->hva_tree, start, last); \
+ node; \
+ node = interval_tree_iter_next(node, start, last))
+
/*
* KVM_SET_USER_MEMORY_REGION ioctl allows the following operations:
* - create a new memory slot
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 73bf0bdedb59..a2306ccf9ab1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -623,12 +623,6 @@ static void kvm_null_fn(void)
}
#define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)

-/* Iterate over each memslot intersecting [start, last] (inclusive) range */
-#define kvm_for_each_memslot_in_hva_range(node, slots, start, last) \
- for (node = interval_tree_iter_first(&slots->hva_tree, start, last); \
- node; \
- node = interval_tree_iter_next(node, start, last)) \
-
static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
const struct kvm_hva_range *range)
{
--
2.25.1

2022-12-14 20:31:05

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

From: Dionna Glaze <[email protected]>

The /dev/sev device has the ability to store host-wide certificates for
the key used by the AMD-SP for SEV-SNP attestation report signing,
but for hosts that want to specify additional certificates that are
specific to the image launched in a VM, a different way is needed to
communicate those certificates.

Add two new KVM ioctl commands: KVM_SEV_SNP_{GET,SET}_CERTS.

The certificates that are set with these commands are expected to
follow the same format as the host certificates, but that format is
opaque to the kernel.

The new behavior for custom certificates is that the extended guest
request command will now return the overridden certificates if they
were installed for the instance. The error condition for a too-small
data buffer is changed to return the overridden certificate data size
if an overridden certificate set is installed.

Setting a 0 length certificate returns the system state to only return
the host certificates on an extended guest request.

We also increase SEV_FW_BLOB_MAX_SIZE by another 4K page to allow
space for an extra certificate.
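
From userspace, installing instance certs would then look roughly like
the sketch below, issued through the usual KVM_MEMORY_ENCRYPT_OP path
(buf, buf_len, sev_fd, and vm_fd are assumed to be set up by the VMM):

	struct kvm_sev_snp_set_certs certs = {
		.certs_uaddr = (__u64)(uintptr_t)buf,
		.certs_len   = buf_len,	/* 0 reverts to the host certs */
	};
	struct kvm_sev_cmd cmd = {
		.id     = KVM_SEV_SNP_SET_CERTS,
		.data   = (__u64)(uintptr_t)&certs,
		.sev_fd = sev_fd,
	};

	ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);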

Cc: Tom Lendacky <[email protected]>
Cc: Paolo Bonzini <[email protected]>

Signed-off-by: Dionna Glaze <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 111 ++++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
include/linux/psp-sev.h | 2 +-
include/uapi/linux/kvm.h | 12 +++++
4 files changed, 123 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4de952d1d446..d0e58cffd1ed 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2081,6 +2081,7 @@ static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
goto e_free;

sev->snp_certs_data = certs_data;
+ sev->snp_certs_len = 0;

return context;

@@ -2364,6 +2365,86 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}

+static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_get_certs params;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(params)))
+ return -EFAULT;
+
+ /* No instance certs set. */
+ if (!sev->snp_certs_len)
+ return -ENOENT;
+
+ if (params.certs_len < sev->snp_certs_len) {
+ /* Output buffer too small. Return the required size. */
+ params.certs_len = sev->snp_certs_len;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
+ sizeof(params)))
+ return -EFAULT;
+
+ return -EINVAL;
+ }
+
+ if (copy_to_user((void __user *)(uintptr_t)params.certs_uaddr,
+ sev->snp_certs_data, sev->snp_certs_len))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ unsigned long length = SEV_FW_BLOB_MAX_SIZE;
+ void *to_certs = sev->snp_certs_data;
+ struct kvm_sev_snp_set_certs params;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(params)))
+ return -EFAULT;
+
+ if (params.certs_len > SEV_FW_BLOB_MAX_SIZE)
+ return -EINVAL;
+
+ /*
+ * Setting a length of 0 is the same as "uninstalling" instance-
+ * specific certificates.
+ */
+ if (params.certs_len == 0) {
+ sev->snp_certs_len = 0;
+ return 0;
+ }
+
+ /* Page-align the length */
+ length = (params.certs_len + PAGE_SIZE - 1) & PAGE_MASK;
+
+ if (copy_from_user(to_certs,
+ (void __user *)(uintptr_t)params.certs_uaddr,
+ params.certs_len)) {
+ return -EFAULT;
+ }
+
+ sev->snp_certs_len = length;
+
+ return 0;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2463,6 +2544,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_FINISH:
r = snp_launch_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_GET_CERTS:
+ r = snp_get_instance_certs(kvm, &sev_cmd);
+ break;
+ case KVM_SEV_SNP_SET_CERTS:
+ r = snp_set_instance_certs(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -3575,8 +3662,28 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
if (rc)
goto unlock;

- rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
- &data_npages, &err);
+ /*
+ * If the VMM has overridden the certs, then change the error message
+ * if the size is inappropriate for the override. Otherwise, use a
+ * regular guest request and copy back the instance certs.
+ */
+ if (sev->snp_certs_len) {
+ if ((data_npages << PAGE_SHIFT) < sev->snp_certs_len) {
+ rc = -EINVAL;
+ err = SNP_GUEST_REQ_INVALID_LEN;
+ goto datalen;
+ }
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
+ (int *)&err);
+ } else {
+ rc = snp_guest_ext_guest_request(&req,
+ (unsigned long)sev->snp_certs_data,
+ &data_npages, &err);
+ }
+datalen:
+ if (sev->snp_certs_len)
+ data_npages = sev->snp_certs_len >> PAGE_SHIFT;
+
if (rc) {
/*
* If buffer length is small then return the expected
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 38aa579f6f70..8d1ba66860a4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -102,6 +102,7 @@ struct kvm_sev_info {
void *snp_context; /* SNP guest context page */
spinlock_t psc_lock;
void *snp_certs_data;
+ unsigned int snp_certs_len; /* Size of instance override for certs */
struct mutex guest_req_lock;

u64 sev_features; /* Features set at VMSA creation */
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index a1e6624540f3..970a9de0ed20 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -22,7 +22,7 @@
#define __psp_pa(x) __pa(x)
#endif

-#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
+#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */

/**
* SEV platform state
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 61b1e26ced01..48bcc59cf86b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1949,6 +1949,8 @@ enum sev_cmd_id {
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
KVM_SEV_SNP_LAUNCH_FINISH,
+ KVM_SEV_SNP_GET_CERTS,
+ KVM_SEV_SNP_SET_CERTS,

KVM_SEV_NR_MAX,
};
@@ -2096,6 +2098,16 @@ struct kvm_sev_snp_launch_finish {
__u8 pad[6];
};

+struct kvm_sev_snp_get_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+};
+
+struct kvm_sev_snp_set_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2022-12-14 20:34:48

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 06/64] KVM: x86: Add platform hooks for private memory invalidations

In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Add hooks to wire up handling of this sort to the invalidation notifiers
for restricted memory.

Also issue invalidations of all allocated pages during notifier
unregistration so that the pages are not left in an unusable state when
they eventually get freed back to the host upon FD release.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 5 +++++
include/linux/kvm_host.h | 2 ++
mm/restrictedmem.c | 16 ++++++++++++++++
virt/kvm/kvm_main.c | 5 +++++
6 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 52f94a0ba5e9..c71df44b0f02 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
+KVM_X86_OP_OPTIONAL(invalidate_restricted_mem)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 13802389f0f9..9ef8d73455d9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1639,6 +1639,7 @@ struct kvm_x86_ops {
int (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
gfn_t start, gfn_t end);
+ void (*invalidate_restricted_mem)(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);

bool (*has_wbinvd_exit)(void);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a0c41d391547..2713632e5061 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7183,3 +7183,8 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
kvm_update_lpage_private_shared_mixed(kvm, slot, attrs,
start, end);
}
+
+void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+ static_call_cond(kvm_x86_invalidate_restricted_mem)(slot, start, end);
+}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f032d878e034..f72a2e0b8699 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2327,6 +2327,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
struct kvm_memory_slot *slot,
unsigned long attrs,
gfn_t start, gfn_t end);
+
#else
static inline void kvm_arch_set_memory_attributes(struct kvm *kvm,
struct kvm_memory_slot *slot,
@@ -2366,6 +2367,7 @@ static inline int kvm_restricted_mem_get_pfn(struct kvm_memory_slot *slot,
}

void kvm_arch_memory_mce(struct kvm *kvm);
+void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
#endif /* CONFIG_HAVE_KVM_RESTRICTED_MEM */

#endif
diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
index 56953c204e5c..74fa2cfb8618 100644
--- a/mm/restrictedmem.c
+++ b/mm/restrictedmem.c
@@ -54,6 +54,11 @@ static int restrictedmem_release(struct inode *inode, struct file *file)
{
struct restrictedmem_data *data = inode->i_mapping->private_data;

+ pr_debug("%s: releasing memfd, invalidating page offsets 0x0-0x%llx\n",
+ __func__, inode->i_size >> PAGE_SHIFT);
+ restrictedmem_invalidate_start(data, 0, inode->i_size >> PAGE_SHIFT);
+ restrictedmem_invalidate_end(data, 0, inode->i_size >> PAGE_SHIFT);
+
fput(data->memfd);
kfree(data);
return 0;
@@ -258,6 +263,17 @@ void restrictedmem_unregister_notifier(struct file *file,
struct restrictedmem_notifier *notifier)
{
struct restrictedmem_data *data = file->f_mapping->private_data;
+ struct inode *inode = file_inode(data->memfd);
+
+ /* TODO: this will issue notifications to all registered notifiers,
+ * but it's only the one being unregistered that needs to process
+ * invalidations for any ranges still allocated at this point in
+ * time. For now this relies on KVM currently being the only notifier.
+ */
+ pr_debug("%s: unregistering notifier, invalidating page offsets 0x0-0x%llx\n",
+ __func__, inode->i_size >> PAGE_SHIFT);
+ restrictedmem_invalidate_start(data, 0, inode->i_size >> PAGE_SHIFT);
+ restrictedmem_invalidate_end(data, 0, inode->i_size >> PAGE_SHIFT);

mutex_lock(&data->lock);
list_del(&notifier->list);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d2d829d23442..d2daa049e94a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -974,6 +974,9 @@ static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no
&gfn_start, &gfn_end))
return;

+ pr_debug("%s: start: 0x%lx, end: 0x%lx, roffset: 0x%llx, gfn_start: 0x%llx, gfn_end: 0x%llx\n",
+ __func__, start, end, slot->restricted_offset, gfn_start, gfn_end);
+
gfn_range.start = gfn_start;
gfn_range.end = gfn_end;
gfn_range.slot = slot;
@@ -988,6 +991,8 @@ static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no
if (kvm_unmap_gfn_range(kvm, &gfn_range))
kvm_flush_remote_tlbs(kvm);

+ kvm_arch_invalidate_restricted_mem(slot, gfn_start, gfn_end);
+
KVM_MMU_UNLOCK(kvm);
srcu_read_unlock(&kvm->srcu, idx);
}
--
2.25.1

2022-12-14 20:35:23

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 64/64] iommu/amd: Add IOMMU_SNP_SHUTDOWN support

From: Ashish Kalra <[email protected]>

Add a new IOMMU API interface, amd_iommu_snp_disable(), to transition
IOMMU pages from the Reclaim state to the Hypervisor state after the
SNP_SHUTDOWN_EX command. Invoke this API from the CCP driver after the
SNP_SHUTDOWN_EX command completes.

Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 20 ++++++++++++++
drivers/iommu/amd/init.c | 53 ++++++++++++++++++++++++++++++++++++
include/linux/amd-iommu.h | 1 +
3 files changed, 74 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 052190bdb8a6..6c4fdcaed72b 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -24,6 +24,7 @@
#include <linux/cpufeature.h>
#include <linux/fs.h>
#include <linux/fs_struct.h>
+#include <linux/amd-iommu.h>

#include <asm/smp.h>
#include <asm/e820/types.h>
@@ -1503,6 +1504,25 @@ static int __sev_snp_shutdown_locked(int *error)
return ret;
}

+ /*
+ * SNP_SHUTDOWN_EX with IOMMU_SNP_SHUTDOWN set to 1 disables SNP
+ * enforcement by the IOMMU and also transitions all pages
+ * associated with the IOMMU to the Reclaim state.
+ * Firmware was transitioning the IOMMU pages to Hypervisor state
+ * before version 1.53. But, accounting for the number of assigned
+ * 4kB pages in a 2M page was done incorrectly by not transitioning
+ * to the Reclaim state. This resulted in RMP #PF when later accessing
+ * the 2M page containing those pages during kexec boot. Hence, the
+ * firmware now transitions these pages to Reclaim state and hypervisor
+ * needs to transition these pages to shared state. SNP Firmware
+ * version 1.53 and above are needed for kexec boot.
+ */
+ ret = amd_iommu_snp_disable();
+ if (ret) {
+ dev_err(sev->dev, "SNP IOMMU shutdown failed\n");
+ return ret;
+ }
+
sev->snp_initialized = false;
dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1a2d425bf568..d1270e3c5baf 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -30,6 +30,7 @@
#include <asm/io_apic.h>
#include <asm/irq_remapping.h>
#include <asm/set_memory.h>
+#include <asm/sev.h>

#include <linux/crash_dump.h>

@@ -3651,4 +3652,56 @@ int amd_iommu_snp_enable(void)

return 0;
}
+
+static int iommu_page_make_shared(void *page)
+{
+ unsigned long pfn;
+
+ pfn = iommu_virt_to_phys(page) >> PAGE_SHIFT;
+ return rmp_make_shared(pfn, PG_LEVEL_4K);
+}
+
+static int iommu_make_shared(void *va, size_t size)
+{
+ void *page;
+ int ret;
+
+ if (!va)
+ return 0;
+
+ for (page = va; page < (va + size); page += PAGE_SIZE) {
+ ret = iommu_page_make_shared(page);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+int amd_iommu_snp_disable(void)
+{
+ struct amd_iommu *iommu;
+ int ret;
+
+ if (!amd_iommu_snp_en)
+ return 0;
+
+ for_each_iommu(iommu) {
+ ret = iommu_make_shared(iommu->evt_buf, EVT_BUFFER_SIZE);
+ if (ret)
+ return ret;
+
+ ret = iommu_make_shared(iommu->ppr_log, PPR_LOG_SIZE);
+ if (ret)
+ return ret;
+
+ ret = iommu_make_shared((void *)iommu->cmd_sem, PAGE_SIZE);
+ if (ret)
+ return ret;
+ }
+
+ amd_iommu_snp_en = false;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(amd_iommu_snp_disable);
#endif
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 953e6f12fa1c..a1b33b838842 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -208,6 +208,7 @@ struct amd_iommu *get_amd_iommu(unsigned int idx);

#ifdef CONFIG_AMD_MEM_ENCRYPT
int amd_iommu_snp_enable(void);
+int amd_iommu_snp_disable(void);
#endif

#endif /* _ASM_X86_AMD_IOMMU_H */
--
2.25.1

2022-12-14 20:36:06

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v7 07/64] KVM: SEV: Handle KVM_HC_MAP_GPA_RANGE hypercall

From: Nikunj A Dadhania <[email protected]>

The KVM_HC_MAP_GPA_RANGE hypercall is used by the SEV guest to notify
the hypervisor of a change in the page encryption status.

The hypercall exits to userspace with the KVM_EXIT_HYPERCALL exit code;
currently this is used for explicit memory conversion between
shared/private for memfd-based private memory.

Signed-off-by: Nikunj A Dadhania <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/x86.c | 8 ++++++++
virt/kvm/kvm_main.c | 1 +
2 files changed, 9 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb6adb216054..732f9cbbadb5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9649,6 +9649,7 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
break;
case KVM_HC_MAP_GPA_RANGE: {
u64 gpa = a0, npages = a1, attrs = a2;
+ struct kvm_memory_slot *slot;

ret = -KVM_ENOSYS;
if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE)))
@@ -9660,6 +9661,13 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
break;
}

+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa));
+ if (!vcpu->kvm->arch.upm_mode ||
+ !kvm_slot_can_be_private(slot)) {
+ ret = 0;
+ break;
+ }
+
vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
vcpu->run->hypercall.args[0] = gpa;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d2daa049e94a..73bf0bdedb59 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2646,6 +2646,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn

return NULL;
}
+EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot);

bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
{
--
2.25.1

2022-12-15 01:12:11

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH RFC v7 21/64] x86/fault: fix handle_split_page_fault() to work with memfd backed pages

On Wed, 14 Dec 2022, Michael Roth wrote:
> From: Hugh Dickins <[email protected]>
>
> When the address is backed by a memfd, the code to split the page does
> nothing more than remove the PMD from the page tables. So immediately
> install a PTE to ensure that any other pages in that 2MB region are
> brought back as in 4K pages.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>

Hah, it's good to see this again, but it was "Suggested-by" me, not
"Signed-off-by" me. And it was a neat pragmatic one-liner workaround
for the immediate problem we had, but it came with caveats.

The problem is that we have one wind blowing in the split direction,
and another wind (khugepaged) blowing in the collapse direction, and
who wins for how long depends on factors I've not fully got to grips
with (and is liable to differ between kernel releases).

Good and bad timing to see it. I was just yesterday reviewing a patch
to the collapsing wind, which reminded me of an improvement yet to be
made there, thinking I'd like to try it sometime; but recalling that
someone somewhere relies on the splitting wind, and doesn't want the
collapsing wind to blow any harder - now you remind me who!

Bad timing in that I don't have any quick answer on the right thing
to do instead, and can't give it the thought it needs at the moment -
perhaps others can chime in more usefully.

Hugh

p.s. I don't know where "handle_split_page_fault" comes in, but
"x86/fault" in the subject looks wrong, since this appears to be
in generic code; and "memfd" seems inappropriate too, but perhaps you
have a situation where only memfds can reach handle_split_page_fault().

> ---
> mm/memory.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index e68da7e403c6..33c9020ba1f8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4999,6 +4999,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> static int handle_split_page_fault(struct vm_fault *vmf)
> {
> __split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> + /*
> + * Install a PTE immediately to ensure that any other pages in
> + * this 2MB region are brought back in as 4K pages.
> + */
> + __pte_alloc(vmf->vma->vm_mm, vmf->pmd);
> return 0;
> }
>
> --
> 2.25.1

2022-12-19 18:09:55

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v7 40/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

On 12/14/22 13:40, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and stores
> it as the measurement of the guest at launch.
>
> While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
> to encrypt the VMSA pages.
>
> If its an SNP guest, then VMSA was added in the RMP entry as
> a guest owned page and also removed from the kernel direct map
> so flush it later after it is transitioned back to hypervisor
> state and restored in the direct map.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Harald Hoyer <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> .../virt/kvm/x86/amd-memory-encryption.rst | 22 ++++
> arch/x86/kvm/svm/sev.c | 119 ++++++++++++++++++
> include/uapi/linux/kvm.h | 14 +++
> 3 files changed, 155 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index c94be8e6d657..e4b42aaab1de 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -513,6 +513,28 @@ Returns: 0 on success, -negative on error
> See the SEV-SNP spec for further details on how to build the VMPL permission
> mask and page type.
>
> +21. KVM_SNP_LAUNCH_FINISH
> +-------------------------
> +
> +After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
> +issued to make the guest ready for the execution.
> +
> +Parameters (in): struct kvm_sev_snp_launch_finish
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_snp_launch_finish {
> + __u64 id_block_uaddr;
> + __u64 id_auth_uaddr;
> + __u8 id_block_en;
> + __u8 auth_key_en;
> + __u8 host_data[32];

This is missing the 6 bytes of padding at the end of the struct.

> + };
> +
> +
> +See SEV-SNP specification for further details on launch finish input parameters.
>
> References
> ==========
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 379e61a9226a..6f901545bed9 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2243,6 +2243,106 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> snp_launch_update_gfn_handler, argp);
> }
>
> +static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_update data = {};
> + int i, ret;
> +
> + data.gctx_paddr = __psp_pa(sev->snp_context);
> + data.page_type = SNP_PAGE_TYPE_VMSA;
> +
> + for (i = 0; i < kvm->created_vcpus; i++) {
> + struct vcpu_svm *svm = to_svm(xa_load(&kvm->vcpu_array, i));
> + u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
> +
> + /* Perform some pre-encryption checks against the VMSA */
> + ret = sev_es_sync_vmsa(svm);
> + if (ret)
> + return ret;
> +
> + /* Transition the VMSA page to a firmware state. */
> + ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);
> + if (ret)
> + return ret;
> +
> + /* Issue the SNP command to encrypt the VMSA */
> + data.address = __sme_pa(svm->sev_es.vmsa);
> + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> + &data, &argp->error);
> + if (ret) {
> + snp_page_reclaim(pfn);
> + return ret;
> + }
> +
> + svm->vcpu.arch.guest_state_protected = true;
> + }
> +
> + return 0;
> +}
> +
> +static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct kvm_sev_snp_launch_finish params;
> + struct sev_data_snp_launch_finish *data;
> + void *id_block = NULL, *id_auth = NULL;
> + int ret;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (!sev->snp_context)
> + return -EINVAL;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> + return -EFAULT;
> +
> + /* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
> + ret = snp_launch_update_vmsa(kvm, argp);
> + if (ret)
> + return ret;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> + if (!data)
> + return -ENOMEM;
> +
> + if (params.id_block_en) {
> + id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
> + if (IS_ERR(id_block)) {
> + ret = PTR_ERR(id_block);
> + goto e_free;
> + }
> +
> + data->id_block_en = 1;
> + data->id_block_paddr = __sme_pa(id_block);
> +
> + id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
> + if (IS_ERR(id_auth)) {
> + ret = PTR_ERR(id_auth);
> + goto e_free_id_block;
> + }
> +
> + data->id_auth_paddr = __sme_pa(id_auth);
> +
> + if (params.auth_key_en)
> + data->auth_key_en = 1;
> + }
> +
> + data->gctx_paddr = __psp_pa(sev->snp_context);

This is missing the copying of the params.host_data field into the
data->host_data field. This is needed so that the host_data shows up in
the attestation report.
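
A minimal sketch of the missing copy (assuming the firmware's
sev_data_snp_launch_finish structure carries the same 32-byte host_data
field as the uapi struct):

        memcpy(data->host_data, params.host_data, sizeof(data->host_data));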

Thanks,
Tom

> + ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
> +
> + kfree(id_auth);
> +
> +e_free_id_block:
> + kfree(id_block);
> +
> +e_free:
> + kfree(data);
> +
> + return ret;
> +}
> +
> int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -2339,6 +2439,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SNP_LAUNCH_UPDATE:
> r = snp_launch_update(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SNP_LAUNCH_FINISH:
> + r = snp_launch_finish(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> @@ -2794,11 +2897,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
>
> svm = to_svm(vcpu);
>
> + /*
> + * If its an SNP guest, then VMSA was added in the RMP entry as
> + * a guest owned page. Transition the page to hypervisor state
> + * before releasing it back to the system.
> + * Also the page is removed from the kernel direct map, so flush it
> + * later after it is transitioned back to hypervisor state and
> + * restored in the direct map.
> + */
> + if (sev_snp_guest(vcpu->kvm)) {
> + u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
> +
> + if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
> + goto skip_vmsa_free;
> + }
> +
> if (vcpu->arch.guest_state_protected)
> sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
>
> __free_page(virt_to_page(svm->sev_es.vmsa));
>
> +skip_vmsa_free:
> if (svm->sev_es.ghcb_sa_free)
> kvfree(svm->sev_es.ghcb_sa);
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 9b6c95cc62a8..c468adc1f147 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1942,6 +1942,7 @@ enum sev_cmd_id {
> KVM_SEV_SNP_INIT,
> KVM_SEV_SNP_LAUNCH_START,
> KVM_SEV_SNP_LAUNCH_UPDATE,
> + KVM_SEV_SNP_LAUNCH_FINISH,
>
> KVM_SEV_NR_MAX,
> };
> @@ -2076,6 +2077,19 @@ struct kvm_sev_snp_launch_update {
> __u8 vmpl1_perms;
> };
>
> +#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
> +#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
> +#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
> +
> +struct kvm_sev_snp_launch_finish {
> + __u64 id_block_uaddr;
> + __u64 id_auth_uaddr;
> + __u8 id_block_en;
> + __u8 auth_key_en;
> + __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
> + __u8 pad[6];
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)

2022-12-19 23:47:36

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 40/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

Hello Tom,

On 12/19/2022 12:04 PM, Tom Lendacky wrote:
> On 12/14/22 13:40, Michael Roth wrote:
>> From: Brijesh Singh <[email protected]>
>>
>> The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and
>> stores
>> it as the measurement of the guest at launch.
>>
>> While finalizing the launch flow, it also issues the LAUNCH_UPDATE
>> command
>> to encrypt the VMSA pages.
>>
>> If its an SNP guest, then VMSA was added in the RMP entry as
>> a guest owned page and also removed from the kernel direct map
>> so flush it later after it is transitioned back to hypervisor
>> state and restored in the direct map.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> Signed-off-by: Harald Hoyer <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>>   .../virt/kvm/x86/amd-memory-encryption.rst    |  22 ++++
>>   arch/x86/kvm/svm/sev.c                        | 119 ++++++++++++++++++
>>   include/uapi/linux/kvm.h                      |  14 +++
>>   3 files changed, 155 insertions(+)
>>
>> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>> b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>> index c94be8e6d657..e4b42aaab1de 100644
>> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>> @@ -513,6 +513,28 @@ Returns: 0 on success, -negative on error
>>   See the SEV-SNP spec for further details on how to build the VMPL
>> permission
>>   mask and page type.
>> +21. KVM_SNP_LAUNCH_FINISH
>> +-------------------------
>> +
>> +After completion of the SNP guest launch flow, the
>> KVM_SNP_LAUNCH_FINISH command can be
>> +issued to make the guest ready for the execution.
>> +
>> +Parameters (in): struct kvm_sev_snp_launch_finish
>> +
>> +Returns: 0 on success, -negative on error
>> +
>> +::
>> +
>> +        struct kvm_sev_snp_launch_finish {
>> +                __u64 id_block_uaddr;
>> +                __u64 id_auth_uaddr;
>> +                __u8 id_block_en;
>> +                __u8 auth_key_en;
>> +                __u8 host_data[32];
>
> This is missing the 6 bytes of padding at the end of the struct.
>

Yes, will fix this; the documentation is missing that, though the
structure definition in include/uapi/linux/kvm.h includes it.

But why do we need this padding?

>> +        };
>> +
>> +
>> +See SEV-SNP specification for further details on launch finish input
>> parameters.
>>   References
>>   ==========
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 379e61a9226a..6f901545bed9 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2243,6 +2243,106 @@ static int snp_launch_update(struct kvm *kvm,
>> struct kvm_sev_cmd *argp)
>>                         snp_launch_update_gfn_handler, argp);
>>   }
>> +static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd
>> *argp)
>> +{
>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +    struct sev_data_snp_launch_update data = {};
>> +    int i, ret;
>> +
>> +    data.gctx_paddr = __psp_pa(sev->snp_context);
>> +    data.page_type = SNP_PAGE_TYPE_VMSA;
>> +
>> +    for (i = 0; i < kvm->created_vcpus; i++) {
>> +        struct vcpu_svm *svm = to_svm(xa_load(&kvm->vcpu_array, i));
>> +        u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
>> +
>> +        /* Perform some pre-encryption checks against the VMSA */
>> +        ret = sev_es_sync_vmsa(svm);
>> +        if (ret)
>> +            return ret;
>> +
>> +        /* Transition the VMSA page to a firmware state. */
>> +        ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);
>> +        if (ret)
>> +            return ret;
>> +
>> +        /* Issue the SNP command to encrypt the VMSA */
>> +        data.address = __sme_pa(svm->sev_es.vmsa);
>> +        ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
>> +                      &data, &argp->error);
>> +        if (ret) {
>> +            snp_page_reclaim(pfn);
>> +            return ret;
>> +        }
>> +
>> +        svm->vcpu.arch.guest_state_protected = true;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +    struct kvm_sev_snp_launch_finish params;
>> +    struct sev_data_snp_launch_finish *data;
>> +    void *id_block = NULL, *id_auth = NULL;
>> +    int ret;
>> +
>> +    if (!sev_snp_guest(kvm))
>> +        return -ENOTTY;
>> +
>> +    if (!sev->snp_context)
>> +        return -EINVAL;
>> +
>> +    if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
>> sizeof(params)))
>> +        return -EFAULT;
>> +
>> +    /* Measure all vCPUs using LAUNCH_UPDATE before finalizing the
>> launch flow. */
>> +    ret = snp_launch_update_vmsa(kvm, argp);
>> +    if (ret)
>> +        return ret;
>> +
>> +    data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
>> +    if (!data)
>> +        return -ENOMEM;
>> +
>> +    if (params.id_block_en) {
>> +        id_block = psp_copy_user_blob(params.id_block_uaddr,
>> KVM_SEV_SNP_ID_BLOCK_SIZE);
>> +        if (IS_ERR(id_block)) {
>> +            ret = PTR_ERR(id_block);
>> +            goto e_free;
>> +        }
>> +
>> +        data->id_block_en = 1;
>> +        data->id_block_paddr = __sme_pa(id_block);
>> +
>> +        id_auth = psp_copy_user_blob(params.id_auth_uaddr,
>> KVM_SEV_SNP_ID_AUTH_SIZE);
>> +        if (IS_ERR(id_auth)) {
>> +            ret = PTR_ERR(id_auth);
>> +            goto e_free_id_block;
>> +        }
>> +
>> +        data->id_auth_paddr = __sme_pa(id_auth);
>> +
>> +        if (params.auth_key_en)
>> +            data->auth_key_en = 1;
>> +    }
>> +
>> +    data->gctx_paddr = __psp_pa(sev->snp_context);
>
> This is missing the copying of the params.host_data field into the
> data->host_data field. This is needed so that the host_data shows up in
> the attestation report.
>

Yes, will fix this.

Thanks,
Ashish

> Thanks,
> Tom
>
>> +    ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data,
>> &argp->error);
>> +
>> +    kfree(id_auth);
>> +
>> +e_free_id_block:
>> +    kfree(id_block);
>> +
>> +e_free:
>> +    kfree(data);
>> +
>> +    return ret;
>> +}
>> +
>>   int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>>   {
>>       struct kvm_sev_cmd sev_cmd;
>> @@ -2339,6 +2439,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void
>> __user *argp)
>>       case KVM_SEV_SNP_LAUNCH_UPDATE:
>>           r = snp_launch_update(kvm, &sev_cmd);
>>           break;
>> +    case KVM_SEV_SNP_LAUNCH_FINISH:
>> +        r = snp_launch_finish(kvm, &sev_cmd);
>> +        break;
>>       default:
>>           r = -EINVAL;
>>           goto out;
>> @@ -2794,11 +2897,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
>>       svm = to_svm(vcpu);
>> +    /*
>> +     * If its an SNP guest, then VMSA was added in the RMP entry as
>> +     * a guest owned page. Transition the page to hypervisor state
>> +     * before releasing it back to the system.
>> +     * Also the page is removed from the kernel direct map, so flush it
>> +     * later after it is transitioned back to hypervisor state and
>> +     * restored in the direct map.
>> +     */
>> +    if (sev_snp_guest(vcpu->kvm)) {
>> +        u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
>> +
>> +        if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
>> +            goto skip_vmsa_free;
>> +    }
>> +
>>       if (vcpu->arch.guest_state_protected)
>>           sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
>>       __free_page(virt_to_page(svm->sev_es.vmsa));
>> +skip_vmsa_free:
>>       if (svm->sev_es.ghcb_sa_free)
>>           kvfree(svm->sev_es.ghcb_sa);
>>   }
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 9b6c95cc62a8..c468adc1f147 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1942,6 +1942,7 @@ enum sev_cmd_id {
>>       KVM_SEV_SNP_INIT,
>>       KVM_SEV_SNP_LAUNCH_START,
>>       KVM_SEV_SNP_LAUNCH_UPDATE,
>> +    KVM_SEV_SNP_LAUNCH_FINISH,
>>       KVM_SEV_NR_MAX,
>>   };
>> @@ -2076,6 +2077,19 @@ struct kvm_sev_snp_launch_update {
>>       __u8 vmpl1_perms;
>>   };
>> +#define KVM_SEV_SNP_ID_BLOCK_SIZE    96
>> +#define KVM_SEV_SNP_ID_AUTH_SIZE    4096
>> +#define KVM_SEV_SNP_FINISH_DATA_SIZE    32
>> +
>> +struct kvm_sev_snp_launch_finish {
>> +    __u64 id_block_uaddr;
>> +    __u64 id_auth_uaddr;
>> +    __u8 id_block_en;
>> +    __u8 auth_key_en;
>> +    __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
>> +    __u8 pad[6];
>> +};
>> +
>>   #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
>>   #define KVM_DEV_ASSIGN_PCI_2_3        (1 << 1)
>>   #define KVM_DEV_ASSIGN_MASK_INTX    (1 << 2)

2022-12-20 14:30:53

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v7 40/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

On 12/19/22 17:24, Kalra, Ashish wrote:
> Hello Tom,
>
> On 12/19/2022 12:04 PM, Tom Lendacky wrote:
>> On 12/14/22 13:40, Michael Roth wrote:
>>> From: Brijesh Singh <[email protected]>
>>>
>>> The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and stores
>>> it as the measurement of the guest at launch.
>>>
>>> While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
>>> to encrypt the VMSA pages.
>>>
>>> If its an SNP guest, then VMSA was added in the RMP entry as
>>> a guest owned page and also removed from the kernel direct map
>>> so flush it later after it is transitioned back to hypervisor
>>> state and restored in the direct map.
>>>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> Signed-off-by: Harald Hoyer <[email protected]>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> Signed-off-by: Michael Roth <[email protected]>
>>> ---
>>>   .../virt/kvm/x86/amd-memory-encryption.rst    |  22 ++++
>>>   arch/x86/kvm/svm/sev.c                        | 119 ++++++++++++++++++
>>>   include/uapi/linux/kvm.h                      |  14 +++
>>>   3 files changed, 155 insertions(+)
>>>
>>> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>>> b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>>> index c94be8e6d657..e4b42aaab1de 100644
>>> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>>> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>>> @@ -513,6 +513,28 @@ Returns: 0 on success, -negative on error
>>>   See the SEV-SNP spec for further details on how to build the VMPL
>>> permission
>>>   mask and page type.
>>> +21. KVM_SNP_LAUNCH_FINISH
>>> +-------------------------
>>> +
>>> +After completion of the SNP guest launch flow, the
>>> KVM_SNP_LAUNCH_FINISH command can be
>>> +issued to make the guest ready for the execution.
>>> +
>>> +Parameters (in): struct kvm_sev_snp_launch_finish
>>> +
>>> +Returns: 0 on success, -negative on error
>>> +
>>> +::
>>> +
>>> +        struct kvm_sev_snp_launch_finish {
>>> +                __u64 id_block_uaddr;
>>> +                __u64 id_auth_uaddr;
>>> +                __u8 id_block_en;
>>> +                __u8 auth_key_en;
>>> +                __u8 host_data[32];
>>
>> This is missing the 6 bytes of padding at the end of the struct.
>>
>
> Yes will fix this, the documentation is missing that, the structure
> defination in include/uapi/linux/kvm.h includes it.
>
> But why do we need this padding ?
>

I'm assuming it was added so that any new elements added would be aligned
on an 8 byte boundary (should the next element added be a __u64). I don't
think that it is truly needed right now, though.
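
For reference, the layout works out like this (a worked example assuming
the usual x86-64 ABI):

        id_block_uaddr   offset  0, 8 bytes
        id_auth_uaddr    offset  8, 8 bytes
        id_block_en      offset 16, 1 byte
        auth_key_en      offset 17, 1 byte
        host_data        offset 18, 32 bytes
        pad              offset 50, 6 bytes
        total                       56 bytes (a multiple of 8)

The explicit pad makes the trailing bytes, which the compiler would
otherwise insert as unnamed tail padding to keep sizeof() a multiple
of 8, part of the defined uapi layout.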

Thanks,
Tom

2022-12-22 12:25:08

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 01/64] KVM: Fix memslot boundary condition for large page

On Wed, Dec 14, 2022 at 01:39:53PM -0600, Michael Roth wrote:
> From: Nikunj A Dadhania <[email protected]>
>
> Aligned end boundary causes a kvm crash, handle the case.
>
> Signed-off-by: Nikunj A Dadhania <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b1953ebc012e..b3ffc61c668c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7159,6 +7159,9 @@ static void kvm_update_lpage_private_shared_mixed(struct kvm *kvm,
> for (gfn = first + pages; gfn < last; gfn += pages)
> linfo_set_mixed(gfn, slot, level, false);
>
> + if (gfn == last)
> + goto out;

I'm guessing this was supposed to be "return;" here:

arch/x86/kvm/mmu/mmu.c: In function ‘kvm_update_lpage_private_shared_mixed’:
arch/x86/kvm/mmu/mmu.c:7090:25: error: label ‘out’ used but not defined
7090 | goto out;
| ^~~~

/me goes and digs deeper.

Aha, it was a "return", but you reordered the patches: the one adding
the out label,

KVM: x86: Add 'update_mem_attr' x86 op

went further down, this patch became the first, and it no longer had
the label.

Yeah, each patch needs to build successfully for bisection reasons, ofc.
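
i.e., with this patch ordered first, the hunk presumably wants to be
simply (sketch):

        if (gfn == last)
                return;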

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-12-22 12:33:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 02/64] KVM: x86: Add KVM_CAP_UNMAPPED_PRIVATE_MEMORY

On Wed, Dec 14, 2022 at 01:39:54PM -0600, Michael Roth wrote:
> This mainly indicates to KVM that it should expect all private guest
> memory to be backed by private memslots. Ideally this would work
> similarly for others archs, give or take a few additional flags, but
> for now it's a simple boolean indicator for x86.

...

> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c7e9d375a902..cc9424ccf9b2 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1219,6 +1219,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
> #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
> #define KVM_CAP_MEMORY_ATTRIBUTES 225
> +#define KVM_CAP_UNMAPPED_PRIVATE_MEM 240

Isn't this new cap supposed to be documented somewhere in
Documentation/virt/kvm/api.rst ?
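
Something like the following stub, perhaps (a sketch; section number and
wording are illustrative, following the style of the existing capability
entries):

        8.41 KVM_CAP_UNMAPPED_PRIVATE_MEM
        ---------------------------------

        :Architectures: x86

        This capability indicates that KVM expects all private guest memory
        to be backed by private (restricted) memslots rather than the legacy
        SEV memory registration ioctls.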

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-12-22 15:07:09

by Dov Murik

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

Hi Dionna, Mike,

On 14/12/2022 21:40, Michael Roth wrote:
> From: Dionna Glaze <[email protected]>
>
> The /dev/sev device has the ability to store host-wide certificates for
> the key used by the AMD-SP for SEV-SNP attestation report signing,
> but for hosts that want to specify additional certificates that are
> specific to the image launched in a VM, a different way is needed to
> communicate those certificates.
>
> This patch adds two new KVM ioctl commands: KVM_SEV_SNP_{GET,SET}_CERTS
>
> The certificates that are set with this command are expected to follow
> the same format as the host certificates, but that format is opaque
> to the kernel.
>
> The new behavior for custom certificates is that the extended guest
> request command will now return the overridden certificates if they
> were installed for the instance. The error condition for a too small
> data buffer is changed to return the overridden certificate data size
> if there is an overridden certificate set installed.
>
> Setting a 0 length certificate returns the system state to only return
> the host certificates on an extended guest request.
>
> We also increase the SEV_FW_BLOB_MAX_SIZE another 4K page to allow
> space for an extra certificate.
>
> Cc: Tom Lendacky <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
>
> Signed-off-by: Dionna Glaze <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 111 ++++++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/svm/svm.h | 1 +
> include/linux/psp-sev.h | 2 +-
> include/uapi/linux/kvm.h | 12 +++++
> 4 files changed, 123 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 4de952d1d446..d0e58cffd1ed 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2081,6 +2081,7 @@ static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
> goto e_free;
>
> sev->snp_certs_data = certs_data;
> + sev->snp_certs_len = 0;
>
> return context;
>
> @@ -2364,6 +2365,86 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct kvm_sev_snp_get_certs params;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (!sev->snp_context)
> + return -EINVAL;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> + sizeof(params)))
> + return -EFAULT;
> +
> + /* No instance certs set. */
> + if (!sev->snp_certs_len)
> + return -ENOENT;
> +
> + if (params.certs_len < sev->snp_certs_len) {
> + /* Output buffer too small. Return the required size. */
> + params.certs_len = sev->snp_certs_len;
> +
> + if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> + sizeof(params)))
> + return -EFAULT;
> +
> + return -EINVAL;
> + }
> +
> + if (copy_to_user((void __user *)(uintptr_t)params.certs_uaddr,
> + sev->snp_certs_data, sev->snp_certs_len))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> +static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + unsigned long length = SEV_FW_BLOB_MAX_SIZE;
> + void *to_certs = sev->snp_certs_data;
> + struct kvm_sev_snp_set_certs params;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (!sev->snp_context)
> + return -EINVAL;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> + sizeof(params)))
> + return -EFAULT;
> +
> + if (params.certs_len > SEV_FW_BLOB_MAX_SIZE)
> + return -EINVAL;
> +
> + /*
> + * Setting a length of 0 is the same as "uninstalling" instance-
> + * specific certificates.
> + */
> + if (params.certs_len == 0) {
> + sev->snp_certs_len = 0;
> + return 0;
> + }
> +
> + /* Page-align the length */
> + length = (params.certs_len + PAGE_SIZE - 1) & PAGE_MASK;
> +
> + if (copy_from_user(to_certs,
> + (void __user *)(uintptr_t)params.certs_uaddr,
> + params.certs_len)) {
> + return -EFAULT;
> + }
> +
> + sev->snp_certs_len = length;

Here we set the length to the page-aligned value, but we copy only
params.certs_len bytes. If there are two subsequent
snp_set_instance_certs() calls where the second one has a shorter
length, we might "keep" some leftover bytes from the first call.

Consider:
1. snp_set_instance_certs(certs_addr point to "AAA...", certs_len=8192)
2. snp_set_instance_certs(certs_addr point to "BBB...", certs_len=4097)

If I understand correctly, on the second call we'll copy 4097 "BBB..."
bytes into the to_certs buffer, but length will be (4097 + PAGE_SIZE -
1) & PAGE_MASK, which is 8192.

Later when fetching the certs (for the extended report or in
snp_get_instance_certs()) the user will get a buffer of 8192 bytes
filled with 4097 BBBs and 4095 leftover AAAs.

Maybe zero sev->snp_certs_data entirely before writing to it?
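
E.g., a minimal sketch of that approach in snp_set_instance_certs(),
assuming snp_certs_data is an SEV_FW_BLOB_MAX_SIZE buffer as the bounds
check above implies:

        /* Zero the whole buffer so a shorter install can't leak stale bytes. */
        memset(to_certs, 0, SEV_FW_BLOB_MAX_SIZE);

        if (copy_from_user(to_certs,
                           (void __user *)(uintptr_t)params.certs_uaddr,
                           params.certs_len))
                return -EFAULT;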

Related question (not only for this patch) regarding snp_certs_data
(host or per-instance): why is its size page-aligned at all? Why is it
limited to 16KB or 20KB? If I understand correctly, for SNP, this buffer
is never sent to the PSP.

> +
> + return 0;
> +}
> +
> int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;

[...]

> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index a1e6624540f3..970a9de0ed20 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -22,7 +22,7 @@
> #define __psp_pa(x) __pa(x)
> #endif
>
> -#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
> +#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */
>

This has effects in drivers/crypto/ccp/sev-dev.c (for example in
alloc_snp_host_map). Is that OK?


-Dov

> /**
> * SEV platform state
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 61b1e26ced01..48bcc59cf86b 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1949,6 +1949,8 @@ enum sev_cmd_id {
> KVM_SEV_SNP_LAUNCH_START,
> KVM_SEV_SNP_LAUNCH_UPDATE,
> KVM_SEV_SNP_LAUNCH_FINISH,
> + KVM_SEV_SNP_GET_CERTS,
> + KVM_SEV_SNP_SET_CERTS,
>
> KVM_SEV_NR_MAX,
> };
> @@ -2096,6 +2098,16 @@ struct kvm_sev_snp_launch_finish {
> __u8 pad[6];
> };
>
> +struct kvm_sev_snp_get_certs {
> + __u64 certs_uaddr;
> + __u64 certs_len;
> +};
> +
> +struct kvm_sev_snp_set_certs {
> + __u64 certs_uaddr;
> + __u64 certs_len;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)

2022-12-22 18:32:45

by Tom Dohrmann

[permalink] [raw]
Subject: Re: [PATCH RFC v7 11/64] KVM: SEV: Support private pages in LAUNCH_UPDATE_DATA

On Wed, Dec 14, 2022 at 01:40:03PM -0600, Michael Roth wrote:
> From: Nikunj A Dadhania <[email protected]>
>
> Pre-boot guest payload needs to be encrypted and VMM has copied it
> over to the private-fd. Add support to get the pfn from the memfile fd
> for encrypting the payload in-place.
>
> Signed-off-by: Nikunj A Dadhania <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 79 ++++++++++++++++++++++++++++++++++--------
> 1 file changed, 64 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a7e4e3005786..ae4920aeb281 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -107,6 +107,11 @@ static inline bool is_mirroring_enc_context(struct kvm *kvm)
> return !!to_kvm_svm(kvm)->sev_info.enc_context_owner;
> }
>
> +static bool kvm_is_upm_enabled(struct kvm *kvm)
> +{
> + return kvm->arch.upm_mode;
> +}
> +
> /* Must be called with the sev_bitmap_lock held */
> static bool __sev_recycle_asids(int min_asid, int max_asid)
> {
> @@ -382,6 +387,38 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_get_memfile_pfn_handler(struct kvm *kvm, struct kvm_gfn_range *range, void *data)
> +{
> + struct kvm_memory_slot *memslot = range->slot;
> + struct page **pages = data;
> + int ret = 0, i = 0;
> + kvm_pfn_t pfn;
> + gfn_t gfn;
> +
> + for (gfn = range->start; gfn < range->end; gfn++) {
> + int order;
> +
> + ret = kvm_restricted_mem_get_pfn(memslot, gfn, &pfn, &order);
> + if (ret)
> + return ret;
> +
> + if (is_error_noslot_pfn(pfn))
> + return -EFAULT;
> +
> + pages[i++] = pfn_to_page(pfn);
> + }
> +
> + return ret;
> +}
> +
> +static int sev_get_memfile_pfn(struct kvm *kvm, unsigned long addr,
> + unsigned long size, unsigned long npages,
> + struct page **pages)
> +{
> + return kvm_vm_do_hva_range_op(kvm, addr, size,
> + sev_get_memfile_pfn_handler, pages);
> +}

The third argument for the kvm_vm_do_hva_range_op call should be addr + size;
the function expects the end of the range, not the size of the range.
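
i.e., a sketch:

        return kvm_vm_do_hva_range_op(kvm, addr, addr + size,
                                      sev_get_memfile_pfn_handler, pages);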

> +
> static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
> unsigned long ulen, unsigned long *n,
> int write)
> @@ -424,16 +461,25 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
> if (!pages)
> return ERR_PTR(-ENOMEM);
>
> - /* Pin the user virtual address. */
> - npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
> - if (npinned != npages) {
> - pr_err("SEV: Failure locking %lu pages.\n", npages);
> - ret = -ENOMEM;
> - goto err;
> + if (kvm_is_upm_enabled(kvm)) {
> + /* Get the PFN from memfile */
> + if (sev_get_memfile_pfn(kvm, uaddr, ulen, npages, pages)) {
> + pr_err("%s: ERROR: unable to find slot for uaddr %lx", __func__, uaddr);
> + ret = -ENOMEM;
> + goto err;
> + }

This branch doesn't initialize npinned. If sev_get_memfile_pfn fails, the code following the err
label passes the uninitialized value to unpin_user_pages.
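
A minimal sketch of one fix, assuming npinned keeps its existing
declaration in sev_pin_memory():

        npinned = 0;    /* so the err path never unpins an uninitialized count */

Alternatively, the unpin in the error path could be made conditional on
!kvm_is_upm_enabled(kvm), since nothing was pinned in the UPM case.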

> + } else {
> + /* Pin the user virtual address. */
> + npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
> + if (npinned != npages) {
> + pr_err("SEV: Failure locking %lu pages.\n", npages);
> + ret = -ENOMEM;
> + goto err;
> + }
> + sev->pages_locked = locked;
> }
>
> *n = npages;
> - sev->pages_locked = locked;
>
> return pages;
>
> @@ -514,6 +560,7 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
>
> size = (range->end - range->start) << PAGE_SHIFT;
> vaddr_end = vaddr + size;
> + WARN_ON(size < PAGE_SIZE);
>
> /* Lock the user memory. */
> inpages = sev_pin_memory(kvm, vaddr, size, &npages, 1);
> @@ -554,13 +601,16 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
> }
>
> e_unpin:
> - /* content of memory is updated, mark pages dirty */
> - for (i = 0; i < npages; i++) {
> - set_page_dirty_lock(inpages[i]);
> - mark_page_accessed(inpages[i]);
> + if (!kvm_is_upm_enabled(kvm)) {
> + /* content of memory is updated, mark pages dirty */
> + for (i = 0; i < npages; i++) {
> + set_page_dirty_lock(inpages[i]);
> + mark_page_accessed(inpages[i]);
> + }
> + /* unlock the user pages */
> + sev_unpin_memory(kvm, inpages, npages);
> }
> - /* unlock the user pages */
> - sev_unpin_memory(kvm, inpages, npages);
> +
> return ret;
> }
>
> @@ -609,9 +659,8 @@ static int sev_launch_update_priv_gfn_handler(struct kvm *kvm,
> goto e_ret;
> kvm_release_pfn_clean(pfn);
> }
> - kvm_vm_set_region_attr(kvm, range->start, range->end,
> - true /* priv_attr */);
>
> + kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> e_ret:
> return ret;
> }
> --
> 2.25.1
>

Regards, Tom

2022-12-23 11:58:47

by Nikunj A. Dadhania

[permalink] [raw]
Subject: Re: [PATCH RFC v7 11/64] KVM: SEV: Support private pages in LAUNCH_UPDATE_DATA

On 22/12/22 23:54, [email protected] wrote:
> On Wed, Dec 14, 2022 at 01:40:03PM -0600, Michael Roth wrote:
>> From: Nikunj A Dadhania <[email protected]>
>>
>> Pre-boot guest payload needs to be encrypted and VMM has copied it
>> over to the private-fd. Add support to get the pfn from the memfile fd
>> for encrypting the payload in-place.
>>
>> Signed-off-by: Nikunj A Dadhania <[email protected]>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> arch/x86/kvm/svm/sev.c | 79 ++++++++++++++++++++++++++++++++++--------
>> 1 file changed, 64 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index a7e4e3005786..ae4920aeb281 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -107,6 +107,11 @@ static inline bool is_mirroring_enc_context(struct kvm *kvm)
>> return !!to_kvm_svm(kvm)->sev_info.enc_context_owner;
>> }
>>
>> +static bool kvm_is_upm_enabled(struct kvm *kvm)
>> +{
>> + return kvm->arch.upm_mode;
>> +}
>> +
>> /* Must be called with the sev_bitmap_lock held */
>> static bool __sev_recycle_asids(int min_asid, int max_asid)
>> {
>> @@ -382,6 +387,38 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> return ret;
>> }
>>
>> +static int sev_get_memfile_pfn_handler(struct kvm *kvm, struct kvm_gfn_range *range, void *data)
>> +{
>> + struct kvm_memory_slot *memslot = range->slot;
>> + struct page **pages = data;
>> + int ret = 0, i = 0;
>> + kvm_pfn_t pfn;
>> + gfn_t gfn;
>> +
>> + for (gfn = range->start; gfn < range->end; gfn++) {
>> + int order;
>> +
>> + ret = kvm_restricted_mem_get_pfn(memslot, gfn, &pfn, &order);
>> + if (ret)
>> + return ret;
>> +
>> + if (is_error_noslot_pfn(pfn))
>> + return -EFAULT;
>> +
>> + pages[i++] = pfn_to_page(pfn);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static int sev_get_memfile_pfn(struct kvm *kvm, unsigned long addr,
>> + unsigned long size, unsigned long npages,
>> + struct page **pages)
>> +{
>> + return kvm_vm_do_hva_range_op(kvm, addr, size,
>> + sev_get_memfile_pfn_handler, pages);
>> +}
>
> The third argument for the kvm_vm_do_hva_range_op call should be addr+size; the
> function expects the end of the range not the size of the range.

Good catch, will fix.

>> +
>> static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
>> unsigned long ulen, unsigned long *n,
>> int write)
>> @@ -424,16 +461,25 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
>> if (!pages)
>> return ERR_PTR(-ENOMEM);
>>
>> - /* Pin the user virtual address. */
>> - npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
>> - if (npinned != npages) {
>> - pr_err("SEV: Failure locking %lu pages.\n", npages);
>> - ret = -ENOMEM;
>> - goto err;
>> + if (kvm_is_upm_enabled(kvm)) {
>> + /* Get the PFN from memfile */
>> + if (sev_get_memfile_pfn(kvm, uaddr, ulen, npages, pages)) {
>> + pr_err("%s: ERROR: unable to find slot for uaddr %lx", __func__, uaddr);
>> + ret = -ENOMEM;
>> + goto err;
>> + }
>
> This branch doesn't initialize npinned. If sev_get_memfile_pfn fails, the code following the err
> label passes the uninitialized value to unpin_user_pages.

Sure, will fix.

>
>> + } else {
>> + /* Pin the user virtual address. */
>> + npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
>> + if (npinned != npages) {
>> + pr_err("SEV: Failure locking %lu pages.\n", npages);
>> + ret = -ENOMEM;
>> + goto err;
>> + }
>> + sev->pages_locked = locked;
>> }
>>
>> *n = npages;
>> - sev->pages_locked = locked;
>>
>> return pages;
>>
>> @@ -514,6 +560,7 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
>>
>> size = (range->end - range->start) << PAGE_SHIFT;
>> vaddr_end = vaddr + size;
>> + WARN_ON(size < PAGE_SIZE);
>>
>> /* Lock the user memory. */
>> inpages = sev_pin_memory(kvm, vaddr, size, &npages, 1);
>> @@ -554,13 +601,16 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
>> }
>>
>> e_unpin:
>> - /* content of memory is updated, mark pages dirty */
>> - for (i = 0; i < npages; i++) {
>> - set_page_dirty_lock(inpages[i]);
>> - mark_page_accessed(inpages[i]);
>> + if (!kvm_is_upm_enabled(kvm)) {
>> + /* content of memory is updated, mark pages dirty */
>> + for (i = 0; i < npages; i++) {
>> + set_page_dirty_lock(inpages[i]);
>> + mark_page_accessed(inpages[i]);
>> + }
>> + /* unlock the user pages */
>> + sev_unpin_memory(kvm, inpages, npages);
>> }
>> - /* unlock the user pages */
>> - sev_unpin_memory(kvm, inpages, npages);
>> +
>> return ret;
>> }
>>
>> @@ -609,9 +659,8 @@ static int sev_launch_update_priv_gfn_handler(struct kvm *kvm,
>> goto e_ret;
>> kvm_release_pfn_clean(pfn);
>> }
>> - kvm_vm_set_region_attr(kvm, range->start, range->end,
>> - true /* priv_attr */);
>>
>> + kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
>> e_ret:
>> return ret;
>> }
>> --
>> 2.25.1
>>
>
> Regards, Tom

Thanks
Nikunj

2022-12-23 17:03:33

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 03/64] KVM: SVM: Advertise private memory support to KVM

On Wed, Dec 14, 2022 at 01:39:55PM -0600, Michael Roth wrote:
> + bool (*private_mem_enabled)(struct kvm *kvm);

This looks like a function returning boolean to me. IOW, you can
simplify this to:

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 82ba4a564e58..4449aeff0dff 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1da0474edb2d..1b4b89ddeb55 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1574,6 +1574,7 @@ struct kvm_x86_ops {

void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
int root_level);
+ bool (*private_mem_enabled)(struct kvm *kvm);

bool (*has_wbinvd_exit)(void);

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ce362e88a567..73b780fa4653 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4680,6 +4680,14 @@ static int svm_vm_init(struct kvm *kvm)
return 0;
}

+static bool svm_private_mem_enabled(struct kvm *kvm)
+{
+ if (sev_guest(kvm))
+ return kvm->arch.upm_mode;
+
+ return IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING);
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = "kvm_amd",

@@ -4760,6 +4768,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.vcpu_after_set_cpuid = svm_vcpu_after_set_cpuid,

+ .private_mem_enabled = svm_private_mem_enabled,
+
.has_wbinvd_exit = svm_has_wbinvd_exit,

.get_l2_tsc_offset = svm_get_l2_tsc_offset,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 823646d601db..9a1ca59d36a4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12556,6 +12556,11 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
}
EXPORT_SYMBOL_GPL(__x86_set_memory_region);

+bool kvm_arch_has_private_mem(struct kvm *kvm)
+{
+ return static_call(kvm_x86_private_mem_enabled)(kvm);
+}
+
void kvm_arch_pre_destroy_vm(struct kvm *kvm)
{
kvm_mmu_pre_destroy_vm(kvm);

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-12-23 20:37:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 00/64] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Wed, Dec 14, 2022 at 01:39:52PM -0600, Michael Roth wrote:
> This patchset is based on top of the following patchset:
>
> "[PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM"
> https://lore.kernel.org/lkml/[email protected]/T/#me1dd3a4c295758b4e4ac8ff600f2db055bc5f987

Well, not quite.

There's also this thing which is stuck in there:

https://lore.kernel.org/r/[email protected]

and I would appreciate reading that in the 0th message so that I don't
scratch my head over why those patches don't apply and what else is
missing...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-12-29 16:14:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 04/64] KVM: x86: Add 'fault_is_private' x86 op

On Wed, Dec 14, 2022 at 01:39:56PM -0600, Michael Roth wrote:
> This callback is used by the KVM MMU to check whether a #NPF was
> or a private GPA or not.

s/or //

>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/mmu/mmu.c | 3 +--
> arch/x86/kvm/mmu/mmu_internal.h | 40 +++++++++++++++++++++++++++---
> 4 files changed, 39 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index f530a550c092..efae987cdce0 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -132,6 +132,7 @@ KVM_X86_OP(complete_emulated_msr)
> KVM_X86_OP(vcpu_deliver_sipi_vector)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
> +KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
>
> #undef KVM_X86_OP
> #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9317abffbf68..92539708f062 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1636,6 +1636,7 @@ struct kvm_x86_ops {
> void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> int root_level);
> int (*private_mem_enabled)(struct kvm *kvm);
> + int (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);

bool

and then you don't need the silly "== 1" at the call site.
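
For illustration, a sketch of the call site with the bool variant (op name
taken from the quoted patch; illustrative only):

	bool private_fault = false;

	/* A bool return can be used directly, with no "== 1" check. */
	if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
		return private_fault;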

>
> bool (*has_wbinvd_exit)(void);

...

> @@ -261,13 +293,13 @@ enum {
> };
>
> static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> - u32 err, bool prefetch)
> + u64 err, bool prefetch)

The u32 -> u64 change of err could use a sentence or two of
clarification in the commit message...

> {
> bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);
>
> struct kvm_page_fault fault = {
> .addr = cr2_or_gpa,
> - .error_code = err,
> + .error_code = lower_32_bits(err),
> .exec = err & PFERR_FETCH_MASK,
> .write = err & PFERR_WRITE_MASK,
> .present = err & PFERR_PRESENT_MASK,
> @@ -281,8 +313,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> .max_level = KVM_MAX_HUGEPAGE_LEVEL,
> .req_level = PG_LEVEL_4K,
> .goal_level = PG_LEVEL_4K,
> - .is_private = IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING) && is_tdp &&
> - kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> + .is_private = is_tdp && kvm_mmu_fault_is_private(vcpu->kvm,
> + cr2_or_gpa, err),
> };
> int r;
>
> --
> 2.25.1
>

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-12-30 12:06:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 06/64] KVM: x86: Add platform hooks for private memory invalidations

On Wed, Dec 14, 2022 at 01:39:58PM -0600, Michael Roth wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a0c41d391547..2713632e5061 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7183,3 +7183,8 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> kvm_update_lpage_private_shared_mixed(kvm, slot, attrs,
> start, end);
> }
> +
> +void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> +{
> + static_call_cond(kvm_x86_invalidate_restricted_mem)(slot, start, end);

Why _cond?

> @@ -258,6 +263,17 @@ void restrictedmem_unregister_notifier(struct file *file,
> struct restrictedmem_notifier *notifier)
> {
> struct restrictedmem_data *data = file->f_mapping->private_data;
> + struct inode *inode = file_inode(data->memfd);
> +
> + /* TODO: this will issue notifications to all registered notifiers,

First of all:

verify_comment_style: WARNING: Multi-line comment needs to start text on the second line:
[+ /* TODO: this will issue notifications to all registered notifiers,]

Then, if you only want to run the callbacks for the one going away only,
why don't you simply do:

mutex_lock(&data->lock);
notifier->ops->invalidate_start(notifier, 0, inode->i_size >> PAGE_SHIFT);
notifier->ops->invalidate_end(notifier, 0, inode->i_size >> PAGE_SHIFT);
list_del(&notifier->list);
mutex_unlock(&data->lock);

here?
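
Pieced together, a minimal sketch of the resulting function (assuming the
restrictedmem_data fields from the quoted patch):

void restrictedmem_unregister_notifier(struct file *file,
				       struct restrictedmem_notifier *notifier)
{
	struct restrictedmem_data *data = file->f_mapping->private_data;
	struct inode *inode = file_inode(data->memfd);

	/* Invalidate the whole range, but only for the notifier that is
	 * going away, then drop it from the list. */
	mutex_lock(&data->lock);
	notifier->ops->invalidate_start(notifier, 0, inode->i_size >> PAGE_SHIFT);
	notifier->ops->invalidate_end(notifier, 0, inode->i_size >> PAGE_SHIFT);
	list_del(&notifier->list);
	mutex_unlock(&data->lock);
}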

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-12-31 14:28:35

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 37/64] KVM: SVM: Add KVM_SNP_INIT command

On Wed, Dec 14, 2022 at 01:40:29PM -0600, Michael Roth wrote:
> static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> {
> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> @@ -260,13 +279,23 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
>
> sev->active = true;
> - sev->es_active = argp->id == KVM_SEV_ES_INIT;
> + sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
> + sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
> asid = sev_asid_new(sev);
> if (asid < 0)
> goto e_no_asid;
> sev->asid = asid;
>
> - ret = sev_platform_init(&argp->error);
> + if (sev->snp_active) {
> + ret = verify_snp_init_flags(kvm, argp);
> + if (ret)
> + goto e_free;
> +
> + ret = sev_snp_init(&argp->error, false);
> + } else {
> + ret = sev_platform_init(&argp->error);
> + }

Couldn't sev_snp_init() and sev_platform_init() be called unconditionally
in order?

Since there is a hardware constraint that SNP init needs to always happen
before platform init, shouldn't SNP init happen as part of
__sev_platform_init_locked() instead?

I found these call sites for __sev_platform_init_locked(), none of which
follow the correct call order:

* sev_guest_init()
* sev_ioctl_do_pek_csr
* sev_ioctl_do_pdh_export()
* sev_ioctl_do_pek_import()
* sev_ioctl_do_pek_pdh_gen()
* sev_pci_init()

To me it looks like a somewhat flaky API design to have sev_snp_init() as an API
call.

I would suggest making SNP init internal to the ccp driver and taking care
of the correct orchestration over there.
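
As a rough sketch of that orchestration (function names from the quoted
driver code; the exact placement is an assumption):

static int __sev_platform_init_locked(int *error)
{
	int rc;

	/* Hardware constraint: SNP init must precede SEV platform init.
	 * -ENODEV (no SNP support) is not fatal for legacy SEV. */
	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
		rc = __sev_snp_init_locked(error);
		if (rc && rc != -ENODEV)
			return rc;
	}

	/* ... existing SEV INIT/INIT_EX flow would continue here ... */
	return 0;
}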

Also, as it currently works in this patch set, if the firmware did not
load correctly, SNP init halts the whole system. The version check needs
to be in all call paths.

BR, Jarkko


Attachments:
0001-crypto-ccp-Prevent-a-spurious-SEV_CMD_SNP_INIT-trigg.patch (2.14 kB)

2022-12-31 14:52:49

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 37/64] KVM: SVM: Add KVM_SNP_INIT command

A couple of fixups.

On Sat, Dec 31, 2022 at 02:27:57PM +0000, Jarkko Sakkinen wrote:
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 6c4fdcaed72b..462c9aaa2e7e 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1381,6 +1381,12 @@ static int __sev_snp_init_locked(int *error)
> if (sev->snp_initialized)
> return 0;
>
> + if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
> + dev_dbg(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
> + SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
> + return -ENODEV;

return 0;

It is not a failure case anyway.

> + }
> +
> /*
> * The SNP_INIT requires the MSR_VM_HSAVE_PA must be set to 0h
> * across all cores.
> @@ -2313,25 +2319,19 @@ void sev_pci_init(void)
> }
> }
>
> + rc = sev_snp_init(&error, true);
> + if (rc != -ENODEV)


if (rc)

Because otherwise there would need to be a nasty "if (rc && rc != -ENODEV)"
so that this does not happen:

[ 9.321588] ccp 0000:49:00.1: SEV-SNP: failed to INIT error 0x0

BR, Jarkko

2022-12-31 15:19:02

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 37/64] KVM: SVM: Add KVM_SNP_INIT command

On Sat, Dec 31, 2022 at 02:47:29PM +0000, Jarkko Sakkinen wrote:
> A couple of fixups.
>
> On Sat, Dec 31, 2022 at 02:27:57PM +0000, Jarkko Sakkinen wrote:
> > diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> > index 6c4fdcaed72b..462c9aaa2e7e 100644
> > --- a/drivers/crypto/ccp/sev-dev.c
> > +++ b/drivers/crypto/ccp/sev-dev.c
> > @@ -1381,6 +1381,12 @@ static int __sev_snp_init_locked(int *error)
> > if (sev->snp_initialized)
> > return 0;
> >
> > + if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
> > + dev_dbg(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
> > + SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
> > + return -ENODEV;
>
> return 0;
>
> It is not a failure case anyway.
>
> > + }
> > +
> > /*
> > * The SNP_INIT requires the MSR_VM_HSAVE_PA must be set to 0h
> > * across all cores.
> > @@ -2313,25 +2319,19 @@ void sev_pci_init(void)
> > }
> > }
> >
> > + rc = sev_snp_init(&error, true);
> > + if (rc != -ENODEV)
>
>
> if (rc)
>
> Because otherwise there would need to be a nasty "if (rc && rc != -ENODEV)"
> so that this does not happen:
>
> [ 9.321588] ccp 0000:49:00.1: SEV-SNP: failed to INIT error 0x0
>
> BR, Jarkko

This patch (not dependent on the series) is kind of related to my
feedback. Since platform init can be triggered from quite a few locations,
it would be useful to get errors reported from all of them:

https://www.lkml.org/lkml/2022/12/31/175

It would IMHO be good to have this in the baseline when testing SNP init
functionality.

BR, Jarkko

2022-12-31 15:34:30

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 25/64] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

On Wed, Dec 14, 2022 at 01:40:17PM -0600, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> Before SNP VMs can be launched, the platform must be appropriately
> configured and initialized. Platform initialization is accomplished via
> the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH
> command to prepare for the first SNP guest launch after INIT.
>
> During the execution of SNP_INIT command, the firmware configures
> and enables SNP security policy enforcement in many system components.
> Some system components write to regions of memory reserved by early
> x86 firmware (e.g. UEFI). Other system components write to regions
> provided by the operating system, hypervisor, or x86 firmware.
> Such system components can only write to HV-fixed pages or Default
> pages. They will error when attempting to write to other page states
> after SNP_INIT enables their SNP enforcement.
>
> Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
> system physical address ranges to convert into the HV-fixed page states
> during the RMP initialization. If INIT_RMP is 1, hypervisors should
> provide all system physical address ranges that the hypervisor will
> never assign to a guest until the next RMP re-initialization.
> For instance, the memory that UEFI reserves should be included in the
> range list. This allows system components that occasionally write to
> memory (e.g. logging to UEFI reserved regions) to not fail due to
> RMP initialization and SNP enablement.
>
> Co-developed-by: Ashish Kalra <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 225 +++++++++++++++++++++++++++++++++++
> drivers/crypto/ccp/sev-dev.h | 2 +
> include/linux/psp-sev.h | 17 +++
> 3 files changed, 244 insertions(+)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 9d84720a41d7..af20420bd6c2 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -26,6 +26,7 @@
> #include <linux/fs_struct.h>
>
> #include <asm/smp.h>
> +#include <asm/e820/types.h>
>
> #include "psp-dev.h"
> #include "sev-dev.h"
> @@ -34,6 +35,10 @@
> #define SEV_FW_FILE "amd/sev.fw"
> #define SEV_FW_NAME_SIZE 64
>
> +/* Minimum firmware version required for the SEV-SNP support */
> +#define SNP_MIN_API_MAJOR 1
> +#define SNP_MIN_API_MINOR 51
> +
> static DEFINE_MUTEX(sev_cmd_mutex);
> static struct sev_misc_dev *misc_dev;
>
> @@ -76,6 +81,13 @@ static void *sev_es_tmr;
> #define NV_LENGTH (32 * 1024)
> static void *sev_init_ex_buffer;
>
> +/*
> + * SEV_DATA_RANGE_LIST:
> + * Array containing range of pages that firmware transitions to HV-fixed
> + * page state.
> + */
> +struct sev_data_range_list *snp_range_list;
> +
> static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
> {
> struct sev_device *sev = psp_master->sev_data;
> @@ -830,6 +842,186 @@ static int sev_update_firmware(struct device *dev)
> return ret;
> }
>
> +static void snp_set_hsave_pa(void *arg)
> +{
> + wrmsrl(MSR_VM_HSAVE_PA, 0);
> +}
> +
> +static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
> +{
> + struct sev_data_range_list *range_list = arg;
> + struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
> + size_t size;
> +
> + if ((range_list->num_elements * sizeof(struct sev_data_range) +
> + sizeof(struct sev_data_range_list)) > PAGE_SIZE)
> + return -E2BIG;
> +
> + switch (rs->desc) {
> + case E820_TYPE_RESERVED:
> + case E820_TYPE_PMEM:
> + case E820_TYPE_ACPI:
> + range->base = rs->start & PAGE_MASK;
> + size = (rs->end + 1) - rs->start;
> + range->page_count = size >> PAGE_SHIFT;
> + range_list->num_elements++;
> + break;
> + default:
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static int __sev_snp_init_locked(int *error)
> +{
> + struct psp_device *psp = psp_master;
> + struct sev_data_snp_init_ex data;
> + struct sev_device *sev;
> + int rc = 0;
> +
> + if (!psp || !psp->sev_data)
> + return -ENODEV;
> +
> + sev = psp->sev_data;
> +
> + if (sev->snp_initialized)
> + return 0;

Shouldn't this follow this check:

if (sev->state == SEV_STATE_INIT) {
/* debug printk about possible incorrect call order */
return -ENODEV;
}

It is game over for SNP if SEV_CMD_INIT{_EX} gets issued first, which means that
this should not proceed.

BR, Jarkko

2023-01-04 12:01:43

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 01/64] KVM: Fix memslot boundary condition for large page

On Wed, Dec 14, 2022 at 01:39:53PM -0600, Michael Roth wrote:
> From: Nikunj A Dadhania <[email protected]>
>
> Aligned end boundary causes a kvm crash, handle the case.
>

Link: https://lore.kernel.org/kvm/[email protected]/

Chao, are you aware of this issue already?

> Signed-off-by: Nikunj A Dadhania <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b1953ebc012e..b3ffc61c668c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7159,6 +7159,9 @@ static void kvm_update_lpage_private_shared_mixed(struct kvm *kvm,
> for (gfn = first + pages; gfn < last; gfn += pages)
> linfo_set_mixed(gfn, slot, level, false);
>
> + if (gfn == last)
> + goto out;
> +
> gfn = last;
> gfn_end = min(last + pages, slot->base_gfn + slot->npages);
> mixed = mem_attrs_mixed(kvm, slot, level, attrs, gfn, gfn_end);
> --
> 2.25.1
>


BR, Jarkko

2023-01-04 12:07:14

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 02/64] KVM: x86: Add KVM_CAP_UNMAPPED_PRIVATE_MEMORY

On Wed, Dec 14, 2022 at 01:39:54PM -0600, Michael Roth wrote:
> This mainly indicates to KVM that it should expect all private guest
> memory to be backed by private memslots. Ideally this would work
> similarly for others archs, give or take a few additional flags, but
> for now it's a simple boolean indicator for x86.
>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 3 +++
> arch/x86/kvm/x86.c | 10 ++++++++++
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 14 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 27ef31133352..2b6244525107 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1438,6 +1438,9 @@ struct kvm_arch {
> */
> #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> struct kvm_mmu_memory_cache split_desc_cache;
> +
> + /* Use/enforce unmapped private memory. */
> + bool upm_mode;
> };
>
> struct kvm_vm_stat {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c67e22f3e2ee..99ecf99bc4d2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4421,6 +4421,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_EXIT_HYPERCALL:
> r = KVM_EXIT_HYPERCALL_VALID_MASK;
> break;
> +#ifdef CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES
> + case KVM_CAP_UNMAPPED_PRIVATE_MEM:
> + r = 1;
> + break;
> +#endif
> case KVM_CAP_SET_GUEST_DEBUG2:
> return KVM_GUESTDBG_VALID_MASK;
> #ifdef CONFIG_KVM_XEN
> @@ -6382,6 +6387,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> mutex_unlock(&kvm->lock);
> break;
> + case KVM_CAP_UNMAPPED_PRIVATE_MEM:
> + kvm->arch.upm_mode = true;
> + r = 0;
> + break;
> default:
> r = -EINVAL;
> break;
> @@ -12128,6 +12137,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
> kvm->arch.guest_can_read_msr_platform_info = true;
> kvm->arch.enable_pmu = enable_pmu;
> + kvm->arch.upm_mode = false;
>
> #if IS_ENABLED(CONFIG_HYPERV)
> spin_lock_init(&kvm->arch.hv_root_tdp_lock);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c7e9d375a902..cc9424ccf9b2 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1219,6 +1219,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
> #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
> #define KVM_CAP_MEMORY_ATTRIBUTES 225
> +#define KVM_CAP_UNMAPPED_PRIVATE_MEM 240
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.25.1
>

Why do we still want to carry non-UPM support?

BR, Jarkko

2023-01-04 12:17:37

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 25/64] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

On Wed, Dec 14, 2022 at 01:40:17PM -0600, Michael Roth wrote:
> + /*
> + * If boot CPU supports SNP, then first attempt to initialize
> + * the SNP firmware.
> + */
> + if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
> + if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
> + dev_err(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
> + SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
> + } else {
> + rc = sev_snp_init(&error, true);
> + if (rc) {
> + /*
> + * Don't abort the probe if SNP INIT failed,
> + * continue to initialize the legacy SEV firmware.
> + */
> + dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
> + }
> + }
> + }

I think this is not right as there is a dep between sev init and this,
and there are already about a dozen call sites of __sev_platform_init_locked().

Instead there should be __sev_snp_init_locked() that would be called as
part of __sev_platform_init_locked() flow.

Also TMR allocation should be moved inside __sev_platform_init_locked,
given that it needs to be marked into RMP after SNP init.

BR, Jarkko

2023-01-05 02:58:44

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 00/64] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Fri, Dec 23, 2022 at 09:33:16PM +0100, Borislav Petkov wrote:
> On Wed, Dec 14, 2022 at 01:39:52PM -0600, Michael Roth wrote:
> > This patchset is based on top of the following patchset:
> >
> > "[PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM"
> > https://lore.kernel.org/lkml/[email protected]/T/#me1dd3a4c295758b4e4ac8ff600f2db055bc5f987
>
> Well, not quite.
>
> There's also this thing which is stuck in there:
>
> https://lore.kernel.org/r/[email protected]
>
> and I would appreciate reading that in the 0th message so that I don't
> scratch my head over why those patches don't apply and what else is
> missing...

That's correct, sorry for the confusion. With UPM v9 those tests were included
on top, so when I ported to v10 I failed to recall that they were now a separate
patchset I'd added on top.

-Mike

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2023-01-05 02:58:54

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 02/64] KVM: x86: Add KVM_CAP_UNMAPPED_PRIVATE_MEMORY

On Thu, Dec 22, 2022 at 01:26:25PM +0100, Borislav Petkov wrote:
> On Wed, Dec 14, 2022 at 01:39:54PM -0600, Michael Roth wrote:
> > This mainly indicates to KVM that it should expect all private guest
> > memory to be backed by private memslots. Ideally this would work
> > similarly for others archs, give or take a few additional flags, but
> > for now it's a simple boolean indicator for x86.
>
> ...
>
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index c7e9d375a902..cc9424ccf9b2 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1219,6 +1219,7 @@ struct kvm_ppc_resize_hpt {
> > #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
> > #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
> > #define KVM_CAP_MEMORY_ATTRIBUTES 225
> > +#define KVM_CAP_UNMAPPED_PRIVATE_MEM 240
>
> Isn't this new cap supposed to be documented somewhere in
> Documentation/virt/kvm/api.rst ?

It should, but this is sort of a placeholder for now. Ideally we'd
re-use the capabilities introduced by the UPM patchset rather than introduce
a new one. Originally the UPM patchset had a KVM_CAP_PRIVATE_MEM which
we planned to use to switch between legacy SEV and UPM-based SEV (for
lazy-pinning support) by making it writeable, but that was removed in v10
in favor of KVM_CAP_MEMORY_ATTRIBUTES, which is tied to the new
KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES/KVM_SET_MEMORY_ATTRIBUTES ioctls:

https://lore.kernel.org/lkml/CA+EHjTxXOdzcP25F57Mtmnb1NWyG5DcyqeDPqzjEOzRUrqH8FQ@mail.gmail.com/

It wasn't clear at the time if that was the right interface to use for
this particular case, so we stuck with the more general
'use-upm/dont-use-upm' semantics originally provided by making
KVM_CAP_UNMAPPED_PRIVATE_MEM/KVM_CAP_PRIVATE_MEM writeable.

But maybe it's okay to just make KVM_CAP_MEMORY_ATTRIBUTES writeable and
require userspace to negotiate it rather than just tying it to
CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES. Or maybe introducing a new
KVM_SET_SUPPORTED_MEMORY_ATTRIBUTES ioctl to pair with
KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES. It sort of makes sense, since userspace
needs to be prepared to deal with KVM_EXIT_MEMORY_FAULTs relating to these
attributes.

-Mike

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2023-01-05 02:59:25

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 03/64] KVM: SVM: Advertise private memory support to KVM

On Fri, Dec 23, 2022 at 05:56:50PM +0100, Borislav Petkov wrote:
> On Wed, Dec 14, 2022 at 01:39:55PM -0600, Michael Roth wrote:
> > + bool (*private_mem_enabled)(struct kvm *kvm);
>
> This looks like a function returning boolean to me. IOW, you can
> simplify this to:

The semantics and existing uses of KVM_X86_OP_OPTIONAL_RET0() gave me the
impression it needed to return an integer value, since by default if a
platform doesn't implement the op it would "return 0", and so could
still be called unconditionally.

Maybe that's not actually enforced, but it seems awkward to try to use a
bool return instead. At least for KVM_X86_OP_OPTIONAL_RET0().

However, we could just use KVM_X86_OP() to declare it so we can cleanly
use a function that returns bool, and then we just need to do:

bool kvm_arch_has_private_mem(struct kvm *kvm)
{
	if (kvm_x86_ops.private_mem_enabled)
		return static_call(kvm_x86_private_mem_enabled)(kvm);

	return false;
}

instead of relying on the default return value. So I'll take that approach
and adopt your other suggested changes.

...

On a separate topic though, at a high level, this hook is basically a way
for platform-specific code to tell generic KVM code that private memslots
are supported by overriding the kvm_arch_has_private_mem() weak
reference. In this case the AMD platform is using kvm->arch.upm_mode
flag to convey that, which is in turn set by the
KVM_CAP_UNMAPPED_PRIVATE_MEMORY introduced in this series.

But if, as I suggested in response to your PATCH 2 comments, we drop
KVM_CAP_UNMAPPED_PRIVATE_MEMORY in favor of
KVM_SET_SUPPORTED_MEMORY_ATTRIBUTES ioctl to enable "UPM mode" in SEV/SNP
code, then we need to rethink things a bit, since KVM_SET_MEMORY_ATTRIBUTES
in-part relies on kvm_arch_has_private_mem() to determine what flags are
supported, whereas SEV/SNP code would be using what was set by
KVM_SET_MEMORY_ATTRIBUTES to determine the return value in
kvm_arch_has_private_mem().

So, for AMD, the return value of kvm_arch_has_private_mem() needs to rely
on something else. Maybe the logic can just be:

bool svm_private_mem_enabled(struct kvm *kvm)
{
	return sev_enabled(kvm) || sev_snp_enabled(kvm);
}

(at least in the context of this patchset where UPM support is added for
both SEV and SNP).

So I'll plan to make that change as well.

-Mike

>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 82ba4a564e58..4449aeff0dff 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
> KVM_X86_OP(complete_emulated_msr)
> KVM_X86_OP(vcpu_deliver_sipi_vector)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
>
> #undef KVM_X86_OP
> #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1da0474edb2d..1b4b89ddeb55 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1574,6 +1574,7 @@ struct kvm_x86_ops {
>
> void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> int root_level);
> + bool (*private_mem_enabled)(struct kvm *kvm);
>
> bool (*has_wbinvd_exit)(void);
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ce362e88a567..73b780fa4653 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4680,6 +4680,14 @@ static int svm_vm_init(struct kvm *kvm)
> return 0;
> }
>
> +static bool svm_private_mem_enabled(struct kvm *kvm)
> +{
> + if (sev_guest(kvm))
> + return kvm->arch.upm_mode;
> +
> + return IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING);
> +}
> +
> static struct kvm_x86_ops svm_x86_ops __initdata = {
> .name = "kvm_amd",
>
> @@ -4760,6 +4768,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>
> .vcpu_after_set_cpuid = svm_vcpu_after_set_cpuid,
>
> + .private_mem_enabled = svm_private_mem_enabled,
> +
> .has_wbinvd_exit = svm_has_wbinvd_exit,
>
> .get_l2_tsc_offset = svm_get_l2_tsc_offset,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 823646d601db..9a1ca59d36a4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12556,6 +12556,11 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
> }
> EXPORT_SYMBOL_GPL(__x86_set_memory_region);
>
> +bool kvm_arch_has_private_mem(struct kvm *kvm)
> +{
> + return static_call(kvm_x86_private_mem_enabled)(kvm);
> +}
> +
> void kvm_arch_pre_destroy_vm(struct kvm *kvm)
> {
> kvm_mmu_pre_destroy_vm(kvm);
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2023-01-05 02:59:27

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 02/64] KVM: x86: Add KVM_CAP_UNMAPPED_PRIVATE_MEMORY

On Wed, Jan 04, 2023 at 12:03:44PM +0000, Jarkko Sakkinen wrote:
> On Wed, Dec 14, 2022 at 01:39:54PM -0600, Michael Roth wrote:
> > This mainly indicates to KVM that it should expect all private guest
> > memory to be backed by private memslots. Ideally this would work
> > similarly for others archs, give or take a few additional flags, but
> > for now it's a simple boolean indicator for x86.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 3 +++
> > arch/x86/kvm/x86.c | 10 ++++++++++
> > include/uapi/linux/kvm.h | 1 +
> > 3 files changed, 14 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 27ef31133352..2b6244525107 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1438,6 +1438,9 @@ struct kvm_arch {
> > */
> > #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> > struct kvm_mmu_memory_cache split_desc_cache;
> > +
> > + /* Use/enforce unmapped private memory. */
> > + bool upm_mode;
> > };
> >
> > struct kvm_vm_stat {
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index c67e22f3e2ee..99ecf99bc4d2 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -4421,6 +4421,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_EXIT_HYPERCALL:
> > r = KVM_EXIT_HYPERCALL_VALID_MASK;
> > break;
> > +#ifdef CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES
> > + case KVM_CAP_UNMAPPED_PRIVATE_MEM:
> > + r = 1;
> > + break;
> > +#endif
> > case KVM_CAP_SET_GUEST_DEBUG2:
> > return KVM_GUESTDBG_VALID_MASK;
> > #ifdef CONFIG_KVM_XEN
> > @@ -6382,6 +6387,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > }
> > mutex_unlock(&kvm->lock);
> > break;
> > + case KVM_CAP_UNMAPPED_PRIVATE_MEM:
> > + kvm->arch.upm_mode = true;
> > + r = 0;
> > + break;
> > default:
> > r = -EINVAL;
> > break;
> > @@ -12128,6 +12137,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
> > kvm->arch.guest_can_read_msr_platform_info = true;
> > kvm->arch.enable_pmu = enable_pmu;
> > + kvm->arch.upm_mode = false;
> >
> > #if IS_ENABLED(CONFIG_HYPERV)
> > spin_lock_init(&kvm->arch.hv_root_tdp_lock);
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index c7e9d375a902..cc9424ccf9b2 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1219,6 +1219,7 @@ struct kvm_ppc_resize_hpt {
> > #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
> > #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
> > #define KVM_CAP_MEMORY_ATTRIBUTES 225
> > +#define KVM_CAP_UNMAPPED_PRIVATE_MEM 240
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
> >
> > --
> > 2.25.1
> >
>
> Why do we still want to carry non-UPM support?

For SNP, non-UPM support is no longer included in this patchset.

However, this patchset also adds support for UPM-based SEV (for lazy-pinning
support). So we still need a way to let userspace switch between those 2
modes.

-Mike

>
> BR, Jarkko
>

2023-01-05 03:00:30

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 06/64] KVM: x86: Add platform hooks for private memory invalidations

On Fri, Dec 30, 2022 at 12:53:31PM +0100, Borislav Petkov wrote:
> On Wed, Dec 14, 2022 at 01:39:58PM -0600, Michael Roth wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index a0c41d391547..2713632e5061 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -7183,3 +7183,8 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > kvm_update_lpage_private_shared_mixed(kvm, slot, attrs,
> > start, end);
> > }
> > +
> > +void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> > +{
> > + static_call_cond(kvm_x86_invalidate_restricted_mem)(slot, start, end);
>
> Why _cond?

Since this hook is declared via KVM_X86_OP_OPTIONAL() (instead of
KVM_X86_OP_OPTIONAL_RET0 like the previous hooks), the comment in kvm-x86-ops.h
suggests this should be called via static_call_cond():

/*
* KVM_X86_OP() and KVM_X86_OP_OPTIONAL() are used to help generate
* both DECLARE/DEFINE_STATIC_CALL() invocations and
* "static_call_update()" calls.
*
* KVM_X86_OP_OPTIONAL() can be used for those functions that can have
* a NULL definition, for example if "static_call_cond()" will be used
* at the call sites. KVM_X86_OP_OPTIONAL_RET0() can be used likewise
* to make a definition optional, but in this case the default will
* be __static_call_return0.
*/
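
Concretely, the two flavors differ at the call site (op names from this
series, shown for illustration):

	/* KVM_X86_OP_OPTIONAL(): the op may be NULL, so call sites use
	 * static_call_cond(), which becomes a no-op when unimplemented: */
	static_call_cond(kvm_x86_invalidate_restricted_mem)(slot, start, end);

	/* KVM_X86_OP_OPTIONAL_RET0(): an unimplemented op returns 0, so an
	 * unconditional static_call() is safe: */
	int r = static_call(kvm_x86_private_mem_enabled)(kvm);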


>
> > @@ -258,6 +263,17 @@ void restrictedmem_unregister_notifier(struct file *file,
> > struct restrictedmem_notifier *notifier)
> > {
> > struct restrictedmem_data *data = file->f_mapping->private_data;
> > + struct inode *inode = file_inode(data->memfd);
> > +
> > + /* TODO: this will issue notifications to all registered notifiers,
>
> First of all:
>
> verify_comment_style: WARNING: Multi-line comment needs to start text on the second line:
> [+ /* TODO: this will issue notifications to all registered notifiers,]
>
> Then, if you only want to run the callbacks for the one going away only,
> why don't you simply do:
>
> mutex_lock(&data->lock);
> notifier->ops->invalidate_start(notifier, 0, inode->i_size >> PAGE_SHIFT);
> notifier->ops->invalidate_end(notifier, 0, inode->i_size >> PAGE_SHIFT);
> list_del(&notifier->list);
> mutex_unlock(&data->lock);
>
> here?

That should do the trick. Thanks for the suggestion.

-Mike

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2023-01-05 03:00:45

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 04/64] KVM: x86: Add 'fault_is_private' x86 op

On Thu, Dec 29, 2022 at 05:14:03PM +0100, Borislav Petkov wrote:
> On Wed, Dec 14, 2022 at 01:39:56PM -0600, Michael Roth wrote:
> > This callback is used by the KVM MMU to check whether a #NPF was
> > or a private GPA or not.
>
> s/or //
>
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > ---
> > arch/x86/include/asm/kvm-x86-ops.h | 1 +
> > arch/x86/include/asm/kvm_host.h | 1 +
> > arch/x86/kvm/mmu/mmu.c | 3 +--
> > arch/x86/kvm/mmu/mmu_internal.h | 40 +++++++++++++++++++++++++++---
> > 4 files changed, 39 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > index f530a550c092..efae987cdce0 100644
> > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > @@ -132,6 +132,7 @@ KVM_X86_OP(complete_emulated_msr)
> > KVM_X86_OP(vcpu_deliver_sipi_vector)
> > KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> > KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
> > +KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
> >
> > #undef KVM_X86_OP
> > #undef KVM_X86_OP_OPTIONAL
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 9317abffbf68..92539708f062 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1636,6 +1636,7 @@ struct kvm_x86_ops {
> > void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > int root_level);
> > int (*private_mem_enabled)(struct kvm *kvm);
> > + int (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
>
> bool
>
> and then you don't need the silly "== 1" at the call site.

Obviously I need to add some proper documentation for this, but a 1
return basically means 'private_fault' pass-by-ref arg has been set
with the appropriate value, whereas 0 means "there's no platform-specific
handling for this, so if you have some generic way to determine this
then use that instead".

This is mainly to handle CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, which
just parrots whatever kvm_mem_is_private() returns to support running
KVM selftests without needed hardware/platform support. If we don't
take care to skip this check where the above fault_is_private() hook
returns 1, then it ends up breaking SNP in cases where the kernel has
been compiled with CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, since SNP
relies on the page fault flags to make this determination, not
kvm_mem_is_private(), which normally only tracks the memory attributes
set by userspace via KVM_SET_MEMORY_ATTRIBUTES ioctl.
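
Putting those semantics together, the generic helper might look roughly like
this (a sketch based on the description above, not the exact patch contents):

static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
{
	bool private_fault = false;

	/* 1: the platform hook (e.g. SNP, via the page fault flags)
	 * filled in private_fault for us. */
	if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault) == 1)
		return private_fault;

	/* 0: no platform-specific handling; fall back to the generic
	 * attribute tracking, used only for the testing config. */
	if (IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING))
		return kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);

	return false;
}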

>
> >
> > bool (*has_wbinvd_exit)(void);
>
> ...
>
> > @@ -261,13 +293,13 @@ enum {
> > };
> >
> > static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > - u32 err, bool prefetch)
> > + u64 err, bool prefetch)
>
> The u32 -> u64 change of err could use a sentence or two of
> clarification in the commit message...

Will do.

-Mike

>
> > {
> > bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);
> >
> > struct kvm_page_fault fault = {
> > .addr = cr2_or_gpa,
> > - .error_code = err,
> > + .error_code = lower_32_bits(err),
> > .exec = err & PFERR_FETCH_MASK,
> > .write = err & PFERR_WRITE_MASK,
> > .present = err & PFERR_PRESENT_MASK,
> > @@ -281,8 +313,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > .max_level = KVM_MAX_HUGEPAGE_LEVEL,
> > .req_level = PG_LEVEL_4K,
> > .goal_level = PG_LEVEL_4K,
> > - .is_private = IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING) && is_tdp &&
> > - kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> > + .is_private = is_tdp && kvm_mmu_fault_is_private(vcpu->kvm,
> > + cr2_or_gpa, err),
> > };
> > int r;
> >
> > --
> > 2.25.1
> >
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2023-01-05 03:45:30

by Chao Peng

[permalink] [raw]
Subject: Re: [PATCH RFC v7 01/64] KVM: Fix memslot boundary condition for large page

On Wed, Jan 04, 2023 at 12:01:05PM +0000, Jarkko Sakkinen wrote:
> On Wed, Dec 14, 2022 at 01:39:53PM -0600, Michael Roth wrote:
> > From: Nikunj A Dadhania <[email protected]>
> >
> > Aligned end boundary causes a kvm crash, handle the case.
> >
>
> Link: https://lore.kernel.org/kvm/[email protected]/
>
> Chao, are you aware of this issue already?

Thanks Jarkko adding me. I'm not aware of there is a fix.

>
> > Signed-off-by: Nikunj A Dadhania <[email protected]>
> > Signed-off-by: Michael Roth <[email protected]>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index b1953ebc012e..b3ffc61c668c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -7159,6 +7159,9 @@ static void kvm_update_lpage_private_shared_mixed(struct kvm *kvm,
> > for (gfn = first + pages; gfn < last; gfn += pages)
> > linfo_set_mixed(gfn, slot, level, false);
> >
> > + if (gfn == last)
> > + goto out;
> > +

Nikunj or Michael, could you help me understand in which case it causes
a KVM crash? To me, even when the end is aligned to the huge page boundary:
last = (end - 1) & mask;
so 'last' is the base address of the last effective huge page. Even
when gfn == last, it should still be a valid page and needs to be updated
for mem_attrs, correct?

Thanks,
Chao
> > gfn = last;
> > gfn_end = min(last + pages, slot->base_gfn + slot->npages);
> > mixed = mem_attrs_mixed(kvm, slot, level, attrs, gfn, gfn_end);
> > --
> > 2.25.1
> >
>
>
> BR, Jarkko

2023-01-05 03:45:30

by Chao Peng

[permalink] [raw]
Subject: Re: [PATCH RFC v7 01/64] KVM: Fix memslot boundary condition for large page

On Thu, Dec 22, 2022 at 01:16:04PM +0100, Borislav Petkov wrote:
> On Wed, Dec 14, 2022 at 01:39:53PM -0600, Michael Roth wrote:
> > From: Nikunj A Dadhania <[email protected]>
> >
> > Aligned end boundary causes a kvm crash, handle the case.
> >
> > Signed-off-by: Nikunj A Dadhania <[email protected]>
> > Signed-off-by: Michael Roth <[email protected]>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index b1953ebc012e..b3ffc61c668c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -7159,6 +7159,9 @@ static void kvm_update_lpage_private_shared_mixed(struct kvm *kvm,
> > for (gfn = first + pages; gfn < last; gfn += pages)
> > linfo_set_mixed(gfn, slot, level, false);
> >
> > + if (gfn == last)
> > + goto out;
>
> I'm guessing this was supposed to be "return;" here:

If we end up needing this, it should be "continue;"; we can't skip the
remaining huge page levels.

Thanks,
Chao
>
> arch/x86/kvm/mmu/mmu.c: In function ‘kvm_update_lpage_private_shared_mixed’:
> arch/x86/kvm/mmu/mmu.c:7090:25: error: label ‘out’ used but not defined
> 7090 | goto out;
> | ^~~~
>
> /me goes and digs deeper.
>
> Aha, it was a "return" but you reordered the patches and the one adding
> the out label:
>
> KVM: x86: Add 'update_mem_attr' x86 op
>
> went further down and this became the first but it didn't have the label
> anymore.
>
> Yeah, each patch needs to build successfully for bisection reasons, ofc.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2023-01-05 04:19:30

by Nikunj A. Dadhania

[permalink] [raw]
Subject: Re: [PATCH RFC v7 01/64] KVM: Fix memslot boundary condition for large page



On 05/01/23 09:04, Chao Peng wrote:
> On Wed, Jan 04, 2023 at 12:01:05PM +0000, Jarkko Sakkinen wrote:
>> On Wed, Dec 14, 2022 at 01:39:53PM -0600, Michael Roth wrote:
>>> From: Nikunj A Dadhania <[email protected]>
>>>
>>> Aligned end boundary causes a kvm crash, handle the case.
>>>
>>
>> Link: https://lore.kernel.org/kvm/[email protected]/
>>
>> Chao, are you aware of this issue already?
>
> Thanks Jarkko adding me. I'm not aware of there is a fix.

It was discussed here: https://lore.kernel.org/all/[email protected]/

I was hitting this with one of the selftest cases.

>
>>
>>> Signed-off-by: Nikunj A Dadhania <[email protected]>
>>> Signed-off-by: Michael Roth <[email protected]>
>>> ---
>>> arch/x86/kvm/mmu/mmu.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>> index b1953ebc012e..b3ffc61c668c 100644
>>> --- a/arch/x86/kvm/mmu/mmu.c
>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>> @@ -7159,6 +7159,9 @@ static void kvm_update_lpage_private_shared_mixed(struct kvm *kvm,
>>> for (gfn = first + pages; gfn < last; gfn += pages)
>>> linfo_set_mixed(gfn, slot, level, false);
>>>
>>> + if (gfn == last)
>>> + goto out;
>>> +
>
> Nikunj or Michael, could you help me understand in which case it causes
> a KVM crash? To me, even when the end is aligned to the huge page boundary:
> last = (end - 1) & mask;
> so 'last' is the base address of the last effective huge page. Even
> when gfn == last, it should still be a valid page and needs to be updated
> for mem_attrs, correct?

Yes, that is correct with: last = (end - 1) & mask;

We can drop this patch from the SNP series.
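
To make the boundary math concrete, a worked example with illustrative
values (2M huge pages, i.e. pages = 512 4K frames, and assuming
first = start & mask):

	/*
	 * start = 0, end = 1024 (end aligned to the huge page boundary):
	 *
	 *   mask  = ~(pages - 1)      = ~511
	 *   first = start & mask      = 0
	 *   last  = (end - 1) & mask  = 1023 & ~511 = 512
	 *
	 * 'last' (gfn 512) is the base of the final huge page in the
	 * range and is handled by the post-loop mixed check, so
	 * gfn == last is a valid case rather than an overrun.
	 */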

Regards
Nikunj

2023-01-05 08:37:47

by Chao Peng

[permalink] [raw]
Subject: Re: [PATCH RFC v7 01/64] KVM: Fix memslot boundary condition for large page

On Thu, Jan 05, 2023 at 09:38:59AM +0530, Nikunj A. Dadhania wrote:
>
>
> On 05/01/23 09:04, Chao Peng wrote:
> > On Wed, Jan 04, 2023 at 12:01:05PM +0000, Jarkko Sakkinen wrote:
> >> On Wed, Dec 14, 2022 at 01:39:53PM -0600, Michael Roth wrote:
> >>> From: Nikunj A Dadhania <[email protected]>
> >>>
> >>> Aligned end boundary causes a kvm crash, handle the case.
> >>>
> >>
> >> Link: https://lore.kernel.org/kvm/[email protected]/
> >>
> >> Chao, are you aware of this issue already?
> >
> > Thanks Jarkko adding me. I'm not aware of there is a fix.
>
> It was discussed here: https://lore.kernel.org/all/[email protected]/
>
> I was hitting this with one of the selftest cases.

Yeah, I remember that discussion. With the new UPM code, this bug
should be fixed. If you still hit the issue please let me know.

Thanks,
Chao
>
> >
> >>
> >>> Signed-off-by: Nikunj A Dadhania <[email protected]>
> >>> Signed-off-by: Michael Roth <[email protected]>
> >>> ---
> >>> arch/x86/kvm/mmu/mmu.c | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >>> index b1953ebc012e..b3ffc61c668c 100644
> >>> --- a/arch/x86/kvm/mmu/mmu.c
> >>> +++ b/arch/x86/kvm/mmu/mmu.c
> >>> @@ -7159,6 +7159,9 @@ static void kvm_update_lpage_private_shared_mixed(struct kvm *kvm,
> >>> for (gfn = first + pages; gfn < last; gfn += pages)
> >>> linfo_set_mixed(gfn, slot, level, false);
> >>>
> >>> + if (gfn == last)
> >>> + goto out;
> >>> +
> >
> > Nikunj or Michael, could you help me understand in which case it causes
> > a KVM crash? To me, even when the end is aligned to the huge page boundary:
> > last = (end - 1) & mask;
> > so 'last' is the base address of the last effective huge page. Even
> > when gfn == last, it should still be a valid page and needs to be updated
> > for mem_attrs, correct?
>
> Yes, that is correct with: last = (end - 1) & mask;
>
> We can drop this patch from SNP series.
>
> Regards
> Nikunj

2023-01-05 12:02:33

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 02/64] KVM: x86: Add KVM_CAP_UNMAPPED_PRIVATE_MEMORY

On Wed, Jan 04, 2023 at 11:47:21AM -0600, Michael Roth wrote:
> But maybe it's okay to just make KVM_CAP_MEMORY_ATTRIBUTES writeable and
> require userspace to negotiate it rather than just tying it to
> CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES. Or maybe introducing a new
> KVM_SET_SUPPORTED_MEMORY_ATTRIBUTES ioctl to pair with
> KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES. It sort of makes sense, since userspace
> needs to be prepared to deal with KVM_EXIT_MEMORY_FAULTs relating to these
> attributes.

Makes sense.

AFAICT, ofc.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-01-05 15:12:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 03/64] KVM: SVM: Advertise private memory support to KVM

On Wed, Jan 04, 2023 at 08:14:19PM -0600, Michael Roth wrote:
> Maybe that's not actually enforced, but it seems awkward to try to use a
> bool return instead. At least for KVM_X86_OP_OPTIONAL_RET0().

I don't see there being a problem/restriction for bool functions, see

5be2226f417d ("KVM: x86: allow defining return-0 static calls")

and __static_call_return0() returns a long which, if you wanna interpret as
bool, works too as "false".

I still need to disassemble and single-step through a static_call to see what
all that magic does in detail, to be sure.

> However, we could just use KVM_X86_OP() to declare it so we can cleanly
> use a function that returns bool, and then we just need to do:
>
> bool kvm_arch_has_private_mem(struct kvm *kvm)
> {
> if (kvm_x86_ops.private_mem_enabled)
> return static_call(kvm_x86_private_mem_enabled)(kvm);

That would be defeating the whole purpose of static calls, AFAICT, as you're
testing the pointer. Might as well leave it be a normal function pointer then.

> On a separate topic though, at a high level, this hook is basically a way
> for platform-specific code to tell generic KVM code that private memslots
> are supported by overriding the kvm_arch_has_private_mem() weak
> reference. In this case the AMD platform is using kvm->arch.upm_mode
> flag to convey that, which is in turn set by the
> KVM_CAP_UNMAPPED_PRIVATE_MEMORY introduced in this series.
>
> But if, as I suggested in response to your PATCH 2 comments, we drop
> KVM_CAP_UNMAPPED_PRIVATE_MEMORY in favor of
> KVM_SET_SUPPORTED_MEMORY_ATTRIBUTES ioctl to enable "UPM mode" in SEV/SNP
> code, then we need to rethink things a bit, since KVM_SET_MEMORY_ATTRIBUTES
> in-part relies on kvm_arch_has_private_mem() to determine what flags are
> supported, whereas SEV/SNP code would be using what was set by
> KVM_SET_MEMORY_ATTRIBUTES to determine the return value in
> kvm_arch_has_private_mem().
>
> So, for AMD, the return value of kvm_arch_has_private_mem() needs to rely
> on something else. Maybe the logic can just be:
>
> bool svm_private_mem_enabled(struct kvm *kvm)
> {
> return sev_enabled(kvm) || sev_snp_enabled(kvm);

I haven't followed the whole discussion in detail but this means that SEV/SNP
*means* UPM. I.e., no SEV/SNP without UPM, correct? I guess that's the final
thing you guys decided to do ...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-01-05 22:42:14

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 25/64] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

Hello Jarkko,

On 12/31/2022 9:32 AM, Jarkko Sakkinen wrote:
> On Wed, Dec 14, 2022 at 01:40:17PM -0600, Michael Roth wrote:
>> From: Brijesh Singh <[email protected]>
>>
>> Before SNP VMs can be launched, the platform must be appropriately
>> configured and initialized. Platform initialization is accomplished via
>> the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH
>> command to prepare for the first SNP guest launch after INIT.
>>
>> During the execution of SNP_INIT command, the firmware configures
>> and enables SNP security policy enforcement in many system components.
>> Some system components write to regions of memory reserved by early
>> x86 firmware (e.g. UEFI). Other system components write to regions
>> provided by the operating system, hypervisor, or x86 firmware.
>> Such system components can only write to HV-fixed pages or Default
>> pages. They will error when attempting to write to other page states
>> after SNP_INIT enables their SNP enforcement.
>>
>> Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
>> system physical address ranges to convert into the HV-fixed page states
>> during the RMP initialization. If INIT_RMP is 1, hypervisors should
>> provide all system physical address ranges that the hypervisor will
>> never assign to a guest until the next RMP re-initialization.
>> For instance, the memory that UEFI reserves should be included in the
>> range list. This allows system components that occasionally write to
>> memory (e.g. logging to UEFI reserved regions) to not fail due to
>> RMP initialization and SNP enablement.
>>
>> Co-developed-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> drivers/crypto/ccp/sev-dev.c | 225 +++++++++++++++++++++++++++++++++++
>> drivers/crypto/ccp/sev-dev.h | 2 +
>> include/linux/psp-sev.h | 17 +++
>> 3 files changed, 244 insertions(+)
>>
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 9d84720a41d7..af20420bd6c2 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -26,6 +26,7 @@
>> #include <linux/fs_struct.h>
>>
>> #include <asm/smp.h>
>> +#include <asm/e820/types.h>
>>
>> #include "psp-dev.h"
>> #include "sev-dev.h"
>> @@ -34,6 +35,10 @@
>> #define SEV_FW_FILE "amd/sev.fw"
>> #define SEV_FW_NAME_SIZE 64
>>
>> +/* Minimum firmware version required for the SEV-SNP support */
>> +#define SNP_MIN_API_MAJOR 1
>> +#define SNP_MIN_API_MINOR 51
>> +
>> static DEFINE_MUTEX(sev_cmd_mutex);
>> static struct sev_misc_dev *misc_dev;
>>
>> @@ -76,6 +81,13 @@ static void *sev_es_tmr;
>> #define NV_LENGTH (32 * 1024)
>> static void *sev_init_ex_buffer;
>>
>> +/*
>> + * SEV_DATA_RANGE_LIST:
>> + * Array containing range of pages that firmware transitions to HV-fixed
>> + * page state.
>> + */
>> +struct sev_data_range_list *snp_range_list;
>> +
>> static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
>> {
>> struct sev_device *sev = psp_master->sev_data;
>> @@ -830,6 +842,186 @@ static int sev_update_firmware(struct device *dev)
>> return ret;
>> }
>>
>> +static void snp_set_hsave_pa(void *arg)
>> +{
>> + wrmsrl(MSR_VM_HSAVE_PA, 0);
>> +}
>> +
>> +static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
>> +{
>> + struct sev_data_range_list *range_list = arg;
>> + struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
>> + size_t size;
>> +
>> + if ((range_list->num_elements * sizeof(struct sev_data_range) +
>> + sizeof(struct sev_data_range_list)) > PAGE_SIZE)
>> + return -E2BIG;
>> +
>> + switch (rs->desc) {
>> + case E820_TYPE_RESERVED:
>> + case E820_TYPE_PMEM:
>> + case E820_TYPE_ACPI:
>> + range->base = rs->start & PAGE_MASK;
>> + size = (rs->end + 1) - rs->start;
>> + range->page_count = size >> PAGE_SHIFT;
>> + range_list->num_elements++;
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int __sev_snp_init_locked(int *error)
>> +{
>> + struct psp_device *psp = psp_master;
>> + struct sev_data_snp_init_ex data;
>> + struct sev_device *sev;
>> + int rc = 0;
>> +
>> + if (!psp || !psp->sev_data)
>> + return -ENODEV;
>> +
>> + sev = psp->sev_data;
>> +
>> + if (sev->snp_initialized)
>> + return 0;
>
> Shouldn't this follow this check:
>
> if (sev->state == SEV_STATE_INIT) {
> /* debug printk about possible incorrect call order */
> return -ENODEV;
> }
>
> It is game over for SNP if SEV_CMD_INIT{_EX} gets issued first, which means that
> this should not proceed.


But how would SEV_CMD_INIT_EX happen first? sev_pci_init(), which is
invoked during CCP module load/initialization, will first try to do
sev_snp_init() if SNP is supported, before it invokes
sev_platform_init() to do the SEV firmware initialization.

Thanks,
Ashish

2023-01-05 23:03:58

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 25/64] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

Hello Jarkko,

On 1/4/2023 6:12 AM, Jarkko Sakkinen wrote:
> On Wed, Dec 14, 2022 at 01:40:17PM -0600, Michael Roth wrote:
>> + /*
>> + * If boot CPU supports SNP, then first attempt to initialize
>> + * the SNP firmware.
>> + */
>> + if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
>> + if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
>> + dev_err(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
>> + SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
>> + } else {
>> + rc = sev_snp_init(&error, true);
>> + if (rc) {
>> + /*
>> + * Don't abort the probe if SNP INIT failed,
>> + * continue to initialize the legacy SEV firmware.
>> + */
>> + dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
>> + }
>> + }
>> + }
>
> I think this is not right as there is a dep between sev init and this,
> and there are already about a dozen call sites of __sev_platform_init_locked().
>

sev_init ?

As this is invoked during CCP module load/initialization, shouldn't it
run before any of the other call sites that invoke
__sev_platform_init_locked()?

Thanks,
Ashish

> Instead there should be __sev_snp_init_locked() that would be called as
> part of __sev_platform_init_locked() flow.
>
> Also TMR allocation should be moved inside __sev_platform_init_locked,
> given that it needs to be marked into RMP after SNP init.
>
> BR, Jarkko
>

2023-01-05 23:38:14

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 37/64] KVM: SVM: Add KVM_SNP_INIT command

Hello Jarkko,

On 12/31/2022 8:27 AM, Jarkko Sakkinen wrote:
> On Wed, Dec 14, 2022 at 01:40:29PM -0600, Michael Roth wrote:
>> static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> {
>> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> @@ -260,13 +279,23 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> return ret;
>>
>> sev->active = true;
>> - sev->es_active = argp->id == KVM_SEV_ES_INIT;
>> + sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>> + sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
>> asid = sev_asid_new(sev);
>> if (asid < 0)
>> goto e_no_asid;
>> sev->asid = asid;
>>
>> - ret = sev_platform_init(&argp->error);
>> + if (sev->snp_active) {
>> + ret = verify_snp_init_flags(kvm, argp);
>> + if (ret)
>> + goto e_free;
>> +
>> + ret = sev_snp_init(&argp->error, false);
>> + } else {
>> + ret = sev_platform_init(&argp->error);
>> + }
>
> Couldn't sev_snp_init() and sev_platform_init() be called unconditionally
> in order?
>
> Since there is a hardware constraint that SNP init needs to always happen
> before platform init, shouldn't SNP init happen as part of
> __sev_platform_init_locked() instead?
>

On Genoa there is currently an issue where, if we do an SNP_INIT before an
SEV_INIT, a subsequent attempt to launch an SEV guest may fail, so we
need to keep SNP INIT and SEV INIT separate.

We need to provide a way to run (existing) SEV guests on a system that
supports SNP without doing an SNP_INIT at all.

This is done using the psp_init_on_probe parameter of the CCP module,
which avoids doing either SNP or SEV firmware initialization during module
load and defers the firmware initialization until someone launches a guest
of one flavor or the other.

And then sev_guest_init() does either SNP or SEV firmware init depending
on the type of the guest being launched.
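
(Roughly, the deferral looks like this — an illustrative sketch, not the
literal driver code; psp_init_on_probe is the module parameter named above:)

===
static bool psp_init_on_probe = true;	/* CCP module parameter */

static void sev_pci_init(void)
{
	int error;

	/* With psp_init_on_probe=0, skip all firmware init at module load. */
	if (!psp_init_on_probe)
		return;	/* deferred to the first guest launch */

	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
		sev_snp_init(&error, true);	/* SNP flavor */

	sev_platform_init(&error);		/* legacy SEV flavor */
}
===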

> I found these call sites for __sev_platform_init_locked(), none of which
> follow the correct call order:
>
> * sev_guest_init()

As explained above, this call site is important for deferring the
firmware initialization to an actual guest launch.

> * sev_ioctl_do_pek_csr
> * sev_ioctl_do_pdh_export()
> * sev_ioctl_do_pek_import()
> * sev_ioctl_do_pek_pdh_gen()
> * sev_pci_init()
>
> To me it looks like a bit of flaky API use to have sev_snp_init() as an API
> call.
>
> I would suggest to make SNP init internal to the ccp driver and take care
> of the correct orchestration over there.
>

Due to the Genoa issue, we may still need SNP init and SEV init to be
invoked separately outside the CCP driver.

> Also, as it currently works in this patch set, if the firmware did not
> load correctly, SNP init halts the whole system. The version check needs
> to be in all call paths.
>

Yes, I agree with that.
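
One way to cover all call paths (a sketch reusing the helpers this patch
already defines) is to do the check inside __sev_snp_init_locked() itself
rather than only in sev_pci_init():

===
static int __sev_snp_init_locked(int *error)
{
	struct sev_device *sev = psp_master->sev_data;

	/* Refuse SNP init on firmware that predates SNP support. */
	if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR,
					  SNP_MIN_API_MINOR))
		return -ENODEV;

	/* ... remainder of the init flow as in the patch ... */
	return 0;
}
===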

Thanks,
Ashish

2023-01-09 03:34:27

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On 15/12/22 06:40, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> Version 2 of the GHCB specification added support for two SNP Guest
> Request Message NAE events. The events allow an SEV-SNP guest to
> make requests to the SEV-SNP firmware through the hypervisor using the
> SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
>
> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
> difference of an additional certificate blob that can be passed through
> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
> provides snp_guest_ext_guest_request(), which is used by KVM to get
> both the report and certificate data at once.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 185 +++++++++++++++++++++++++++++++++++++++--
> arch/x86/kvm/svm/svm.h | 2 +
> 2 files changed, 181 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 5f2b2092cdae..18efa70553c2 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -331,6 +331,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> if (ret)
> goto e_free;
>
> + mutex_init(&sev->guest_req_lock);
> ret = sev_snp_init(&argp->error, false);
> } else {
> ret = sev_platform_init(&argp->error);
> @@ -2051,23 +2052,34 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
> */
> static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
> {
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> struct sev_data_snp_addr data = {};
> - void *context;
> + void *context, *certs_data;
> int rc;
>
> + /* Allocate memory used for the certs data in SNP guest request */
> + certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
> + if (!certs_data)
> + return NULL;
> +
> /* Allocate memory for context page */
> context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> if (!context)
> - return NULL;
> + goto e_free;
>
> data.gctx_paddr = __psp_pa(context);
> rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> - if (rc) {
> - snp_free_firmware_page(context);
> - return NULL;
> - }
> + if (rc)
> + goto e_free;
> +
> + sev->snp_certs_data = certs_data;
>
> return context;
> +
> +e_free:
> + snp_free_firmware_page(context);
> + kfree(certs_data);
> + return NULL;
> }
>
> static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -2653,6 +2665,8 @@ static int snp_decommission_context(struct kvm *kvm)
> snp_free_firmware_page(sev->snp_context);
> sev->snp_context = NULL;
>
> + kfree(sev->snp_certs_data);
> +
> return 0;
> }
>
> @@ -3174,6 +3188,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> case SVM_VMGEXIT_HV_FEATURES:
> case SVM_VMGEXIT_PSC:
> + case SVM_VMGEXIT_GUEST_REQUEST:
> + case SVM_VMGEXIT_EXT_GUEST_REQUEST:
> break;
> default:
> reason = GHCB_ERR_INVALID_EVENT;
> @@ -3396,6 +3412,149 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
> + struct sev_data_snp_guest_request *data,
> + gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm *kvm = vcpu->kvm;
> + kvm_pfn_t req_pfn, resp_pfn;
> + struct kvm_sev_info *sev;
> +
> + sev = &to_kvm_svm(kvm)->sev_info;
> +
> + if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
> + return SEV_RET_INVALID_PARAM;
> +
> + req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> + if (is_error_noslot_pfn(req_pfn))
> + return SEV_RET_INVALID_ADDRESS;
> +
> + resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> + if (is_error_noslot_pfn(resp_pfn))
> + return SEV_RET_INVALID_ADDRESS;
> +
> + if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
> + return SEV_RET_INVALID_ADDRESS;
> +
> + data->gctx_paddr = __psp_pa(sev->snp_context);
> + data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> + data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> +
> + return 0;
> +}
> +
> +static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
> +{
> + u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
> + int ret;
> +
> + ret = snp_page_reclaim(pfn);
> + if (ret)
> + *rc = SEV_RET_INVALID_ADDRESS;
> +
> + ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (ret)
> + *rc = SEV_RET_INVALID_ADDRESS;
> +}
> +
> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct sev_data_snp_guest_request data = {0};
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm *kvm = vcpu->kvm;
> + struct kvm_sev_info *sev;
> + unsigned long rc;
> + int err;
> +
> + if (!sev_snp_guest(vcpu->kvm)) {
> + rc = SEV_RET_INVALID_GUEST;
> + goto e_fail;
> + }
> +
> + sev = &to_kvm_svm(kvm)->sev_info;
> +
> + mutex_lock(&sev->guest_req_lock);
> +
> + rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> + if (rc)
> + goto unlock;
> +
> + rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);


This one goes via sev_issue_cmd_external_user() and uses sev-fd...

> + if (rc)
> + /* use the firmware error code */
> + rc = err;
> +
> + snp_cleanup_guest_buf(&data, &rc);
> +
> +unlock:
> + mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> + svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct sev_data_snp_guest_request req = {0};
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm *kvm = vcpu->kvm;
> + unsigned long data_npages;
> + struct kvm_sev_info *sev;
> + unsigned long rc, err;
> + u64 data_gpa;
> +
> + if (!sev_snp_guest(vcpu->kvm)) {
> + rc = SEV_RET_INVALID_GUEST;
> + goto e_fail;
> + }
> +
> + sev = &to_kvm_svm(kvm)->sev_info;
> +
> + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> +
> + if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> + rc = SEV_RET_INVALID_ADDRESS;
> + goto e_fail;
> + }
> +
> + mutex_lock(&sev->guest_req_lock);
> +
> + rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
> + if (rc)
> + goto unlock;
> +
> + rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
> + &data_npages, &err);

but this one does not, and jumps straight to drivers/crypto/ccp/sev-dev.c,
ignoring sev->fd. Why the difference? Can these two be unified?
sev_issue_cmd_external_user() only checks that the fd is /dev/sev, which is
hardly useful.

"[PATCH RFC v7 32/64] crypto: ccp: Provide APIs to query extended
attestation report" added this one.

Besides, is sev->fd really needed in the sev struct at all? Thanks,


> + if (rc) {
> + /*
> + * If buffer length is small then return the expected
> + * length in rbx.
> + */
> + if (err == SNP_GUEST_REQ_INVALID_LEN)
> + vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
> +
> + /* pass the firmware error code */
> + rc = err;
> + goto cleanup;
> + }
> +
> + /* Copy the certificate blob in the guest memory */
> + if (data_npages &&
> + kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
> + rc = SEV_RET_INVALID_ADDRESS;
> +
> +cleanup:
> + snp_cleanup_guest_buf(&req, &rc);
> +
> +unlock:
> + mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> + svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
> static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3629,6 +3788,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
> vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
> vcpu->arch.complete_userspace_io = snp_complete_psc;
> break;
> + case SVM_VMGEXIT_GUEST_REQUEST: {
> + snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
> +
> + ret = 1;
> + break;
> + }
> + case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
> + snp_handle_ext_guest_request(svm,
> + control->exit_info_1,
> + control->exit_info_2);
> +
> + ret = 1;
> + break;
> + }
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> vcpu_unimpl(vcpu,
> "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 12b9f4d539fb..7c0f9d00950f 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -101,6 +101,8 @@ struct kvm_sev_info {
> u64 snp_init_flags;
> void *snp_context; /* SNP guest context page */
> spinlock_t psc_lock;
> + void *snp_certs_data;
> + struct mutex guest_req_lock;
> };
>
> struct kvm_svm {

--
Alexey

2023-01-09 16:58:09

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

> > +
> > +static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > +{
> [...]
>
> Here we set the length to the page-aligned value, but we copy only
> params.cert_len bytes. If there are two subsequent
> snp_set_instance_certs() calls where the second one has a shorter
> length, we might "keep" some leftover bytes from the first call.
>
> Consider:
> 1. snp_set_instance_certs(certs_addr point to "AAA...", certs_len=8192)
> 2. snp_set_instance_certs(certs_addr point to "BBB...", certs_len=4097)
>
> If I understand correctly, on the second call we'll copy 4097 "BBB..."
> bytes into the to_certs buffer, but length will be (4096 + PAGE_SIZE -
> 1) & PAGE_MASK which will be 8192.
>
> Later when fetching the certs (for the extended report or in
> snp_get_instance_certs()) the user will get a buffer of 8192 bytes
> filled with 4097 BBBs and 4095 leftover AAAs.
>
> Maybe zero sev->snp_certs_data entirely before writing to it?
>

Yes, I agree it should be zeroed, at least if the previous length is
greater than the new length. Good catch.
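
A minimal version of that fix might look like the following (a sketch; the
certs_uaddr and snp_certs_len names are assumed for illustration, while
snp_certs_data, params.cert_len and SEV_FW_BLOB_MAX_SIZE come from the
patch):

===
	/* Clear the whole buffer so a shorter update can't leak stale bytes. */
	memset(sev->snp_certs_data, 0, SEV_FW_BLOB_MAX_SIZE);

	if (copy_from_user(sev->snp_certs_data,
			   (void __user *)params.certs_uaddr, params.cert_len))
		return -EFAULT;

	sev->snp_certs_len = PAGE_ALIGN(params.cert_len);
===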


> Related question (not only for this patch) regarding snp_certs_data
> (host or per-instance): why is its size page-aligned at all? why is it
> limited by 16KB or 20KB? If I understand correctly, for SNP, this buffer
> is never sent to the PSP.
>

The buffer is meant to be copied into the guest driver following the
GHCB extended guest request protocol. The data to copy back are
expected to be in 4K page granularity.

> [...]
> >
> > -#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
> > +#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */
> >
>
> This has effects in drivers/crypto/ccp/sev-dev.c (for
> example in alloc_snp_host_map). Is that OK?
>

No, this was a mistake of mine because I was using a bloated data
encoding that needed 5 pages for the GUID table plus 4 small
certificates. I've since fixed that in our user space code.
We shouldn't change this size and instead wait for a better size
negotiation protocol between the guest and host to avoid this awkward
hard-coding.


--
-Dionna Glaze, PhD (she/her)

2023-01-09 22:31:51

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

On 1/9/23 10:55, Dionna Amalie Glaze wrote:
>>> +
>>> +static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>> +{
>> [...]
>>
>> Here we set the length to the page-aligned value, but we copy only
>> params.cert_len bytes. If there are two subsequent
>> snp_set_instance_certs() calls where the second one has a shorter
>> length, we might "keep" some leftover bytes from the first call.
>>
>> Consider:
>> 1. snp_set_instance_certs(certs_addr point to "AAA...", certs_len=8192)
>> 2. snp_set_instance_certs(certs_addr point to "BBB...", certs_len=4097)
>>
>> If I understand correctly, on the second call we'll copy 4097 "BBB..."
>> bytes into the to_certs buffer, but length will be (4096 + PAGE_SIZE -
>> 1) & PAGE_MASK which will be 8192.
>>
>> Later when fetching the certs (for the extended report or in
>> snp_get_instance_certs()) the user will get a buffer of 8192 bytes
>> filled with 4097 BBBs and 4095 leftover AAAs.
>>
>> Maybe zero sev->snp_certs_data entirely before writing to it?
>>
>
> Yes, I agree it should be zeroed, at least if the previous length is
> greater than the new length. Good catch.
>
>
>> Related question (not only for this patch) regarding snp_certs_data
>> (host or per-instance): why is its size page-aligned at all? why is it
>> limited by 16KB or 20KB? If I understand correctly, for SNP, this buffer
>> is never sent to the PSP.
>>
>
> The buffer is meant to be copied into the guest driver following the
> GHCB extended guest request protocol. The data to copy back are
> expected to be in 4K page granularity.

I don't think the data has to be in 4K page granularity. Why do you think
it does?

Thanks,
Tom

>
>> [...]
>>>
>>> -#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
>>> +#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */
>>>
>>
>> This has effects in drivers/crypto/ccp/sev-dev.c (for
>> example in alloc_snp_host_map). Is that OK?
>>
>
> No, this was a mistake of mine because I was using a bloated data
> encoding that needed 5 pages for the GUID table plus 4 small
> certificates. I've since fixed that in our user space code.
> We shouldn't change this size and instead wait for a better size
> negotiation protocol between the guest and host to avoid this awkward
> hard-coding.
>
>

2023-01-09 23:43:06

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
> On 15/12/22 06:40, Michael Roth wrote:
>> From: Brijesh Singh <[email protected]>
>>
>> Version 2 of GHCB specification added the support for two SNP Guest
>> Request Message NAE events. The events allows for an SEV-SNP guest to
>> make request to the SEV-SNP firmware through hypervisor using the
>> SNP_GUEST_REQUEST API define in the SEV-SNP firmware specification.
>>
>> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
>> difference of an additional certificate blob that can be passed through
>> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
>> provides snp_guest_ext_guest_request() that is used by the KVM to get
>> both the report and certificate data at once.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>>   arch/x86/kvm/svm/sev.c | 185 +++++++++++++++++++++++++++++++++++++++--
>>   arch/x86/kvm/svm/svm.h |   2 +
>>   2 files changed, 181 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 5f2b2092cdae..18efa70553c2 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -331,6 +331,7 @@ static int sev_guest_init(struct kvm *kvm, struct
>> kvm_sev_cmd *argp)
>>           if (ret)
>>               goto e_free;
>> +        mutex_init(&sev->guest_req_lock);
>>           ret = sev_snp_init(&argp->error, false);
>>       } else {
>>           ret = sev_platform_init(&argp->error);
>> @@ -2051,23 +2052,34 @@ int sev_vm_move_enc_context_from(struct kvm
>> *kvm, unsigned int source_fd)
>>    */
>>   static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd
>> *argp)
>>   {
>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>       struct sev_data_snp_addr data = {};
>> -    void *context;
>> +    void *context, *certs_data;
>>       int rc;
>> +    /* Allocate memory used for the certs data in SNP guest request */
>> +    certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
>> +    if (!certs_data)
>> +        return NULL;
>> +
>>       /* Allocate memory for context page */
>>       context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>>       if (!context)
>> -        return NULL;
>> +        goto e_free;
>>       data.gctx_paddr = __psp_pa(context);
>>       rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE,
>> &data, &argp->error);
>> -    if (rc) {
>> -        snp_free_firmware_page(context);
>> -        return NULL;
>> -    }
>> +    if (rc)
>> +        goto e_free;
>> +
>> +    sev->snp_certs_data = certs_data;
>>       return context;
>> +
>> +e_free:
>> +    snp_free_firmware_page(context);
>> +    kfree(certs_data);
>> +    return NULL;
>>   }
>>   static int snp_bind_asid(struct kvm *kvm, int *error)
>> @@ -2653,6 +2665,8 @@ static int snp_decommission_context(struct kvm
>> *kvm)
>>       snp_free_firmware_page(sev->snp_context);
>>       sev->snp_context = NULL;
>> +    kfree(sev->snp_certs_data);
>> +
>>       return 0;
>>   }
>> @@ -3174,6 +3188,8 @@ static int sev_es_validate_vmgexit(struct
>> vcpu_svm *svm, u64 *exit_code)
>>       case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>       case SVM_VMGEXIT_HV_FEATURES:
>>       case SVM_VMGEXIT_PSC:
>> +    case SVM_VMGEXIT_GUEST_REQUEST:
>> +    case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>>           break;
>>       default:
>>           reason = GHCB_ERR_INVALID_EVENT;
>> @@ -3396,6 +3412,149 @@ static int snp_complete_psc(struct kvm_vcpu
>> *vcpu)
>>       return 1;
>>   }
>> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
>> +                     struct sev_data_snp_guest_request *data,
>> +                     gpa_t req_gpa, gpa_t resp_gpa)
>> +{
>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>> +    struct kvm *kvm = vcpu->kvm;
>> +    kvm_pfn_t req_pfn, resp_pfn;
>> +    struct kvm_sev_info *sev;
>> +
>> +    sev = &to_kvm_svm(kvm)->sev_info;
>> +
>> +    if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa,
>> PAGE_SIZE))
>> +        return SEV_RET_INVALID_PARAM;
>> +
>> +    req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
>> +    if (is_error_noslot_pfn(req_pfn))
>> +        return SEV_RET_INVALID_ADDRESS;
>> +
>> +    resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
>> +    if (is_error_noslot_pfn(resp_pfn))
>> +        return SEV_RET_INVALID_ADDRESS;
>> +
>> +    if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
>> +        return SEV_RET_INVALID_ADDRESS;
>> +
>> +    data->gctx_paddr = __psp_pa(sev->snp_context);
>> +    data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
>> +    data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
>> +
>> +    return 0;
>> +}
>> +
>> +static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request
>> *data, unsigned long *rc)
>> +{
>> +    u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
>> +    int ret;
>> +
>> +    ret = snp_page_reclaim(pfn);
>> +    if (ret)
>> +        *rc = SEV_RET_INVALID_ADDRESS;
>> +
>> +    ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>> +    if (ret)
>> +        *rc = SEV_RET_INVALID_ADDRESS;
>> +}
>> +
>> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t
>> req_gpa, gpa_t resp_gpa)
>> +{
>> +    struct sev_data_snp_guest_request data = {0};
>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>> +    struct kvm *kvm = vcpu->kvm;
>> +    struct kvm_sev_info *sev;
>> +    unsigned long rc;
>> +    int err;
>> +
>> +    if (!sev_snp_guest(vcpu->kvm)) {
>> +        rc = SEV_RET_INVALID_GUEST;
>> +        goto e_fail;
>> +    }
>> +
>> +    sev = &to_kvm_svm(kvm)->sev_info;
>> +
>> +    mutex_lock(&sev->guest_req_lock);
>> +
>> +    rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
>> +    if (rc)
>> +        goto unlock;
>> +
>> +    rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
>
>
> This one goes via sev_issue_cmd_external_user() and uses sev-fd...
>
>> +    if (rc)
>> +        /* use the firmware error code */
>> +        rc = err;
>> +
>> +    snp_cleanup_guest_buf(&data, &rc);
>> +
>> +unlock:
>> +    mutex_unlock(&sev->guest_req_lock);
>> +
>> +e_fail:
>> +    svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>> +}
>> +
>> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t
>> req_gpa, gpa_t resp_gpa)
>> +{
>> +    struct sev_data_snp_guest_request req = {0};
>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>> +    struct kvm *kvm = vcpu->kvm;
>> +    unsigned long data_npages;
>> +    struct kvm_sev_info *sev;
>> +    unsigned long rc, err;
>> +    u64 data_gpa;
>> +
>> +    if (!sev_snp_guest(vcpu->kvm)) {
>> +        rc = SEV_RET_INVALID_GUEST;
>> +        goto e_fail;
>> +    }
>> +
>> +    sev = &to_kvm_svm(kvm)->sev_info;
>> +
>> +    data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>> +    data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
>> +
>> +    if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>> +        rc = SEV_RET_INVALID_ADDRESS;
>> +        goto e_fail;
>> +    }
>> +
>> +    mutex_lock(&sev->guest_req_lock);
>> +
>> +    rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
>> +    if (rc)
>> +        goto unlock;
>> +
>> +    rc = snp_guest_ext_guest_request(&req, (unsigned
>> long)sev->snp_certs_data,
>> +                     &data_npages, &err);
>
> but this one does not and jump straight to drivers/crypto/ccp/sev-dev.c
> ignoring sev->fd. Why different? Can these two be unified?
> sev_issue_cmd_external_user() only checks if fd is /dev/sev which is
> hardly useful.
>
> "[PATCH RFC v7 32/64] crypto: ccp: Provide APIs to query extended
> attestation report" added this one.

SNP_EXT_GUEST_REQUEST additionally returns a certificate blob, and that's
why it goes through the CCP driver interface
snp_guest_ext_guest_request(), which is used to get both the report and
the certificate blob at the same time.

All the FW API calls on the KVM side go through the sev_issue_cmd() and
sev_issue_cmd_external_user() interfaces, and I believe those use sev->fd
more as a sanity check.

Thanks,
Ashish

>
> Besides, is sev->fd really needed in the sev struct at all? Thanks,
>
>
>> +    if (rc) {
>> +        /*
>> +         * If buffer length is small then return the expected
>> +         * length in rbx.
>> +         */
>> +        if (err == SNP_GUEST_REQ_INVALID_LEN)
>> +            vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
>> +
>> +        /* pass the firmware error code */
>> +        rc = err;
>> +        goto cleanup;
>> +    }
>> +
>> +    /* Copy the certificate blob in the guest memory */
>> +    if (data_npages &&
>> +        kvm_write_guest(kvm, data_gpa, sev->snp_certs_data,
>> data_npages << PAGE_SHIFT))
>> +        rc = SEV_RET_INVALID_ADDRESS;
>> +
>> +cleanup:
>> +    snp_cleanup_guest_buf(&req, &rc);
>> +
>> +unlock:
>> +    mutex_unlock(&sev->guest_req_lock);
>> +
>> +e_fail:
>> +    svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>> +}
>> +
>>   static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>>   {
>>       struct vmcb_control_area *control = &svm->vmcb->control;
>> @@ -3629,6 +3788,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>>           vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
>>           vcpu->arch.complete_userspace_io = snp_complete_psc;
>>           break;
>> +    case SVM_VMGEXIT_GUEST_REQUEST: {
>> +        snp_handle_guest_request(svm, control->exit_info_1,
>> control->exit_info_2);
>> +
>> +        ret = 1;
>> +        break;
>> +    }
>> +    case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
>> +        snp_handle_ext_guest_request(svm,
>> +                         control->exit_info_1,
>> +                         control->exit_info_2);
>> +
>> +        ret = 1;
>> +        break;
>> +    }
>>       case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>           vcpu_unimpl(vcpu,
>>                   "vmgexit: unsupported event - exit_info_1=%#llx,
>> exit_info_2=%#llx\n",
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 12b9f4d539fb..7c0f9d00950f 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -101,6 +101,8 @@ struct kvm_sev_info {
>>       u64 snp_init_flags;
>>       void *snp_context;      /* SNP guest context page */
>>       spinlock_t psc_lock;
>> +    void *snp_certs_data;
>> +    struct mutex guest_req_lock;
>>   };
>>   struct kvm_svm {
>

2023-01-10 02:30:16

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event



On 10/1/23 10:41, Kalra, Ashish wrote:
> On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
>> On 15/12/22 06:40, Michael Roth wrote:
>>> From: Brijesh Singh <[email protected]>
>>>
>>> Version 2 of GHCB specification added the support for two SNP Guest
>>> Request Message NAE events. The events allows for an SEV-SNP guest to
>>> make request to the SEV-SNP firmware through hypervisor using the
>>> SNP_GUEST_REQUEST API define in the SEV-SNP firmware specification.
>>>
>>> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
>>> difference of an additional certificate blob that can be passed through
>>> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
>>> provides snp_guest_ext_guest_request() that is used by the KVM to get
>>> both the report and certificate data at once.
>>>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> Signed-off-by: Michael Roth <[email protected]>
>>> ---
>>>   arch/x86/kvm/svm/sev.c | 185 +++++++++++++++++++++++++++++++++++++++--
>>>   arch/x86/kvm/svm/svm.h |   2 +
>>>   2 files changed, 181 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>> index 5f2b2092cdae..18efa70553c2 100644
>>> --- a/arch/x86/kvm/svm/sev.c
>>> +++ b/arch/x86/kvm/svm/sev.c
>>> @@ -331,6 +331,7 @@ static int sev_guest_init(struct kvm *kvm, struct
>>> kvm_sev_cmd *argp)
>>>           if (ret)
>>>               goto e_free;
>>> +        mutex_init(&sev->guest_req_lock);
>>>           ret = sev_snp_init(&argp->error, false);
>>>       } else {
>>>           ret = sev_platform_init(&argp->error);
>>> @@ -2051,23 +2052,34 @@ int sev_vm_move_enc_context_from(struct kvm
>>> *kvm, unsigned int source_fd)
>>>    */
>>>   static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd
>>> *argp)
>>>   {
>>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>       struct sev_data_snp_addr data = {};
>>> -    void *context;
>>> +    void *context, *certs_data;
>>>       int rc;
>>> +    /* Allocate memory used for the certs data in SNP guest request */
>>> +    certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
>>> +    if (!certs_data)
>>> +        return NULL;
>>> +
>>>       /* Allocate memory for context page */
>>>       context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>>>       if (!context)
>>> -        return NULL;
>>> +        goto e_free;
>>>       data.gctx_paddr = __psp_pa(context);
>>>       rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE,
>>> &data, &argp->error);
>>> -    if (rc) {
>>> -        snp_free_firmware_page(context);
>>> -        return NULL;
>>> -    }
>>> +    if (rc)
>>> +        goto e_free;
>>> +
>>> +    sev->snp_certs_data = certs_data;
>>>       return context;
>>> +
>>> +e_free:
>>> +    snp_free_firmware_page(context);
>>> +    kfree(certs_data);
>>> +    return NULL;
>>>   }
>>>   static int snp_bind_asid(struct kvm *kvm, int *error)
>>> @@ -2653,6 +2665,8 @@ static int snp_decommission_context(struct kvm
>>> *kvm)
>>>       snp_free_firmware_page(sev->snp_context);
>>>       sev->snp_context = NULL;
>>> +    kfree(sev->snp_certs_data);
>>> +
>>>       return 0;
>>>   }
>>> @@ -3174,6 +3188,8 @@ static int sev_es_validate_vmgexit(struct
>>> vcpu_svm *svm, u64 *exit_code)
>>>       case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>>       case SVM_VMGEXIT_HV_FEATURES:
>>>       case SVM_VMGEXIT_PSC:
>>> +    case SVM_VMGEXIT_GUEST_REQUEST:
>>> +    case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>>>           break;
>>>       default:
>>>           reason = GHCB_ERR_INVALID_EVENT;
>>> @@ -3396,6 +3412,149 @@ static int snp_complete_psc(struct kvm_vcpu
>>> *vcpu)
>>>       return 1;
>>>   }
>>> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
>>> +                     struct sev_data_snp_guest_request *data,
>>> +                     gpa_t req_gpa, gpa_t resp_gpa)
>>> +{
>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>> +    struct kvm *kvm = vcpu->kvm;
>>> +    kvm_pfn_t req_pfn, resp_pfn;
>>> +    struct kvm_sev_info *sev;
>>> +
>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>> +
>>> +    if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa,
>>> PAGE_SIZE))
>>> +        return SEV_RET_INVALID_PARAM;
>>> +
>>> +    req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
>>> +    if (is_error_noslot_pfn(req_pfn))
>>> +        return SEV_RET_INVALID_ADDRESS;
>>> +
>>> +    resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
>>> +    if (is_error_noslot_pfn(resp_pfn))
>>> +        return SEV_RET_INVALID_ADDRESS;
>>> +
>>> +    if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
>>> +        return SEV_RET_INVALID_ADDRESS;
>>> +
>>> +    data->gctx_paddr = __psp_pa(sev->snp_context);
>>> +    data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
>>> +    data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request
>>> *data, unsigned long *rc)
>>> +{
>>> +    u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
>>> +    int ret;
>>> +
>>> +    ret = snp_page_reclaim(pfn);
>>> +    if (ret)
>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>> +
>>> +    ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>>> +    if (ret)
>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>> +}
>>> +
>>> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t
>>> req_gpa, gpa_t resp_gpa)
>>> +{
>>> +    struct sev_data_snp_guest_request data = {0};
>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>> +    struct kvm *kvm = vcpu->kvm;
>>> +    struct kvm_sev_info *sev;
>>> +    unsigned long rc;
>>> +    int err;
>>> +
>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>> +        rc = SEV_RET_INVALID_GUEST;
>>> +        goto e_fail;
>>> +    }
>>> +
>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>> +
>>> +    mutex_lock(&sev->guest_req_lock);
>>> +
>>> +    rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
>>> +    if (rc)
>>> +        goto unlock;
>>> +
>>> +    rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
>>
>>
>> This one goes via sev_issue_cmd_external_user() and uses sev-fd...
>>
>>> +    if (rc)
>>> +        /* use the firmware error code */
>>> +        rc = err;
>>> +
>>> +    snp_cleanup_guest_buf(&data, &rc);
>>> +
>>> +unlock:
>>> +    mutex_unlock(&sev->guest_req_lock);
>>> +
>>> +e_fail:
>>> +    svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>>> +}
>>> +
>>> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t
>>> req_gpa, gpa_t resp_gpa)
>>> +{
>>> +    struct sev_data_snp_guest_request req = {0};
>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>> +    struct kvm *kvm = vcpu->kvm;
>>> +    unsigned long data_npages;
>>> +    struct kvm_sev_info *sev;
>>> +    unsigned long rc, err;
>>> +    u64 data_gpa;
>>> +
>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>> +        rc = SEV_RET_INVALID_GUEST;
>>> +        goto e_fail;
>>> +    }
>>> +
>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>> +
>>> +    data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>> +    data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
>>> +
>>> +    if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>> +        rc = SEV_RET_INVALID_ADDRESS;
>>> +        goto e_fail;
>>> +    }
>>> +
>>> +    mutex_lock(&sev->guest_req_lock);
>>> +
>>> +    rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
>>> +    if (rc)
>>> +        goto unlock;
>>> +
>>> +    rc = snp_guest_ext_guest_request(&req, (unsigned
>>> long)sev->snp_certs_data,
>>> +                     &data_npages, &err);
>>
>> but this one does not and jump straight to
>> drivers/crypto/ccp/sev-dev.c ignoring sev->fd. Why different? Can
>> these two be unified? sev_issue_cmd_external_user() only checks if fd
>> is /dev/sev which is hardly useful.
>>
>> "[PATCH RFC v7 32/64] crypto: ccp: Provide APIs to query extended
>> attestation report" added this one.
>
> SNP_EXT_GUEST_REQUEST additionally returns a certificate blob and that's
> why it goes through the CCP driver interface
> snp_guest_ext_guest_request() that is used to get both the report and
> certificate data/blob at the same time.

True. I thought, though, that this calls for extending sev_issue_cmd() to
take care of these extra parameters rather than just skipping sev->fd.
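
Something along these lines, perhaps (purely illustrative — a wrapper that
keeps the sev->fd check on the extended path too; the function name is made
up):

===
static int snp_issue_ext_guest_request(struct kvm *kvm,
				       struct sev_data_snp_guest_request *req,
				       unsigned long vaddr,
				       unsigned long *npages,
				       unsigned long *fw_err)
{
	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
	struct fd f;
	int ret;

	/* Same fd sanity check that sev_issue_cmd() performs. */
	f = fdget(sev->fd);
	if (!f.file)
		return -EBADF;

	ret = snp_guest_ext_guest_request(req, vaddr, npages, fw_err);
	fdput(f);
	return ret;
}
===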


> All the FW API calls on the KVM side go through sev_issue_cmd() and
> sev_issue_cmd_external_user() interfaces and that i believe uses sev->fd
> more of as a sanity check.

Does not look like it:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/crypto/ccp/sev-dev.c?h=v6.2-rc3#n1290

===
int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
				void *data, int *error)
{
	if (!filep || filep->f_op != &sev_fops)
		return -EBADF;

	return sev_do_cmd(cmd, data, error);
}
EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user);
===

The only "more" is that it requires sev->fd to be a valid open fd, what
is the value in that? I may easily miss the bigger picture here. Thanks,


> Thanks,
> Ashish
>
>>
>> Besides, is sev->fd really needed in the sev struct at all? Thanks,
>>
>>
>>> +    if (rc) {
>>> +        /*
>>> +         * If buffer length is small then return the expected
>>> +         * length in rbx.
>>> +         */
>>> +        if (err == SNP_GUEST_REQ_INVALID_LEN)
>>> +            vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
>>> +
>>> +        /* pass the firmware error code */
>>> +        rc = err;
>>> +        goto cleanup;
>>> +    }
>>> +
>>> +    /* Copy the certificate blob in the guest memory */
>>> +    if (data_npages &&
>>> +        kvm_write_guest(kvm, data_gpa, sev->snp_certs_data,
>>> data_npages << PAGE_SHIFT))
>>> +        rc = SEV_RET_INVALID_ADDRESS;
>>> +
>>> +cleanup:
>>> +    snp_cleanup_guest_buf(&req, &rc);
>>> +
>>> +unlock:
>>> +    mutex_unlock(&sev->guest_req_lock);
>>> +
>>> +e_fail:
>>> +    svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>>> +}
>>> +
>>>   static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>>>   {
>>>       struct vmcb_control_area *control = &svm->vmcb->control;
>>> @@ -3629,6 +3788,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>>>           vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
>>>           vcpu->arch.complete_userspace_io = snp_complete_psc;
>>>           break;
>>> +    case SVM_VMGEXIT_GUEST_REQUEST: {
>>> +        snp_handle_guest_request(svm, control->exit_info_1,
>>> control->exit_info_2);
>>> +
>>> +        ret = 1;
>>> +        break;
>>> +    }
>>> +    case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
>>> +        snp_handle_ext_guest_request(svm,
>>> +                         control->exit_info_1,
>>> +                         control->exit_info_2);
>>> +
>>> +        ret = 1;
>>> +        break;
>>> +    }
>>>       case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>>           vcpu_unimpl(vcpu,
>>>                   "vmgexit: unsupported event - exit_info_1=%#llx,
>>> exit_info_2=%#llx\n",
>>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>>> index 12b9f4d539fb..7c0f9d00950f 100644
>>> --- a/arch/x86/kvm/svm/svm.h
>>> +++ b/arch/x86/kvm/svm/svm.h
>>> @@ -101,6 +101,8 @@ struct kvm_sev_info {
>>>       u64 snp_init_flags;
>>>       void *snp_context;      /* SNP guest context page */
>>>       spinlock_t psc_lock;
>>> +    void *snp_certs_data;
>>> +    struct mutex guest_req_lock;
>>>   };
>>>   struct kvm_svm {
>>

--
Alexey

2023-01-10 07:25:40

by Dov Murik

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

Hi Tom,

On 10/01/2023 0:27, Tom Lendacky wrote:
> On 1/9/23 10:55, Dionna Amalie Glaze wrote:
>>>> +
>>>> +static int snp_set_instance_certs(struct kvm *kvm, struct
>>>> kvm_sev_cmd *argp)
>>>> +{
>>> [...]
>>>
>>> Here we set the length to the page-aligned value, but we copy only
>>> params.cert_len bytes.  If there are two subsequent
>>> snp_set_instance_certs() calls where the second one has a shorter
>>> length, we might "keep" some leftover bytes from the first call.
>>>
>>> Consider:
>>> 1. snp_set_instance_certs(certs_addr point to "AAA...", certs_len=8192)
>>> 2. snp_set_instance_certs(certs_addr point to "BBB...", certs_len=4097)
>>>
>>> If I understand correctly, on the second call we'll copy 4097 "BBB..."
>>> bytes into the to_certs buffer, but length will be (4096 + PAGE_SIZE -
>>> 1) & PAGE_MASK which will be 8192.
>>>
>>> Later when fetching the certs (for the extended report or in
>>> snp_get_instance_certs()) the user will get a buffer of 8192 bytes
>>> filled with 4097 BBBs and 4095 leftover AAAs.
>>>
>>> Maybe zero sev->snp_certs_data entirely before writing to it?
>>>
>>
>> Yes, I agree it should be zeroed, at least if the previous length is
>> greater than the new length. Good catch.
>>
>>
>>> Related question (not only for this patch) regarding snp_certs_data
>>> (host or per-instance): why is its size page-aligned at all? why is it
>>> limited by 16KB or 20KB? If I understand correctly, for SNP, this buffer
>>> is never sent to the PSP.
>>>
>>
>> The buffer is meant to be copied into the guest driver following the
>> GHCB extended guest request protocol. The data to copy back are
>> expected to be in 4K page granularity.
>
> I don't think the data has to be in 4K page granularity. Why do you
> think it does?
>

I looked at AMD publication 56421 SEV-ES Guest-Hypervisor Communication
Block Standardization (July 2022), page 37. The table says:

--------------

NAE Event: SNP Extended Guest Request

Notes:

RAX will have the guest physical address of the page(s) to hold returned
data

RBX
State to Hypervisor: will contain the number of guest contiguous
pages supplied to hold returned data
State from Hypervisor: on error will contain the number of guest
contiguous pages required to hold the data to be returned

...

The request page, response page and data page(s) must be assigned to the
hypervisor (shared).

--------------


According to this spec, it looks like the sizes are communicated as
number of pages in RBX. So the data should start at a 4KB alignment
(this is verified in snp_handle_ext_guest_request()) and its length
should be 4KB-aligned, as Dionna noted.

I see no reason (in the spec and in the kernel code) for the data length
to be limited to 16KB (SEV_FW_BLOB_MAX_SIZE) but I might be missing some
flow because Dionna ran into this limit.


-Dov



> Thanks,
> Tom
>
>>
>>> [...]
>>>>
>>>> -#define SEV_FW_BLOB_MAX_SIZE 0x4000  /* 16KB */
>>>> +#define SEV_FW_BLOB_MAX_SIZE 0x5000  /* 20KB */
>>>>
>>>
>>> This has effects in drivers/crypto/ccp/sev-dev.c (for
>>> example in alloc_snp_host_map). Is that OK?
>>>
>>
>> No, this was a mistake of mine because I was using a bloated data
>> encoding that needed 5 pages for the GUID table plus 4 small
>> certificates. I've since fixed that in our user space code.
>> We shouldn't change this size and instead wait for a better size
>> negotiation protocol between the guest and host to avoid this awkward
>> hard-coding.
>>
>>

2023-01-10 15:11:51

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

On 1/10/23 01:10, Dov Murik wrote:
> Hi Tom,
>
> On 10/01/2023 0:27, Tom Lendacky wrote:
>> On 1/9/23 10:55, Dionna Amalie Glaze wrote:
>>>>> +
>>>>> +static int snp_set_instance_certs(struct kvm *kvm, struct
>>>>> kvm_sev_cmd *argp)
>>>>> +{
>>>> [...]
>>>>
>>>> Here we set the length to the page-aligned value, but we copy only
>>>> params.cert_len bytes.  If there are two subsequent
>>>> snp_set_instance_certs() calls where the second one has a shorter
>>>> length, we might "keep" some leftover bytes from the first call.
>>>>
>>>> Consider:
>>>> 1. snp_set_instance_certs(certs_addr point to "AAA...", certs_len=8192)
>>>> 2. snp_set_instance_certs(certs_addr point to "BBB...", certs_len=4097)
>>>>
>>>> If I understand correctly, on the second call we'll copy 4097 "BBB..."
>>>> bytes into the to_certs buffer, but length will be (4096 + PAGE_SIZE -
>>>> 1) & PAGE_MASK which will be 8192.
>>>>
>>>> Later when fetching the certs (for the extended report or in
>>>> snp_get_instance_certs()) the user will get a buffer of 8192 bytes
>>>> filled with 4097 BBBs and 4095 leftover AAAs.
>>>>
>>>> Maybe zero sev->snp_certs_data entirely before writing to it?
>>>>
>>>
>>> Yes, I agree it should be zeroed, at least if the previous length is
>>> greater than the new length. Good catch.
>>>
>>>
>>>> Related question (not only for this patch) regarding snp_certs_data
>>>> (host or per-instance): why is its size page-aligned at all? why is it
>>>> limited by 16KB or 20KB? If I understand correctly, for SNP, this buffer
>>>> is never sent to the PSP.
>>>>
>>>
>>> The buffer is meant to be copied into the guest driver following the
>>> GHCB extended guest request protocol. The data to copy back are
>>> expected to be in 4K page granularity.
>>
>> I don't think the data has to be in 4K page granularity. Why do you
>> think it does?
>>
>
> I looked at AMD publication 56421 SEV-ES Guest-Hypervisor Communication
> Block Standardization (July 2022), page 37. The table says:
>
> --------------
>
> NAE Event: SNP Extended Guest Request
>
> Notes:
>
> RAX will have the guest physical address of the page(s) to hold returned
> data
>
> RBX
> State to Hypervisor: will contain the number of guest contiguous
> pages supplied to hold returned data
> State from Hypervisor: on error will contain the number of guest
> contiguous pages required to hold the data to be returned
>
> ...
>
> The request page, response page and data page(s) must be assigned to the
> hypervisor (shared).
>
> --------------
>
>
> According to this spec, it looks like the sizes are communicated as
> number of pages in RBX. So the data should start at a 4KB alignment
> (this is verified in snp_handle_ext_guest_request()) and its length
> should be 4KB-aligned, as Dionna noted.

That only indicates how many pages are required to hold the data, but the
hypervisor only has to copy however much data is present. If the data is
20 bytes, then you only have to copy 20 bytes. If the user supplied 0 for
the number of pages, then the code returns 1 in RBX to indicate that one
page is required to hold the 20 bytes.
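
In other words, RBX carries a page count while the actual copy uses the
data length; the hypervisor-side math is just (a sketch — the helper name
is made up):

===
/* Guest pages needed to hold 'len' bytes of returned data (RBX value). */
static unsigned long snp_data_npages(size_t len)
{
	return DIV_ROUND_UP(len, PAGE_SIZE);	/* e.g. 20 bytes -> 1 page */
}
===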

>
> I see no reason (in the spec and in the kernel code) for the data length
> to be limited to 16KB (SEV_FW_BLOB_MAX_SIZE) but I might be missing some
> flow because Dionna ran into this limit.

Correct, there is no limit. I believe that SEV_FW_BLOB_MAX_SIZE is a way
to keep the memory usage controlled because data is coming from userspace
and it isn't expected that the data would be larger than that.

I'm not sure if that was in from the start or was a result of a review
comment. Not sure what the best approach is.

Thanks,
Tom

>
>
> -Dov
>
>
>
>> Thanks,
>> Tom
>>
>>>
>>>> [...]
>>>>>
>>>>> -#define SEV_FW_BLOB_MAX_SIZE 0x4000  /* 16KB */
>>>>> +#define SEV_FW_BLOB_MAX_SIZE 0x5000  /* 20KB */
>>>>>
>>>>
>>>> This has effects in drivers/crypto/ccp/sev-dev.c (for
>>>> example in alloc_snp_host_map). Is that OK?
>>>>
>>>
>>> No, this was a mistake of mine because I was using a bloated data
>>> encoding that needed 5 pages for the GUID table plus 4 small
>>> certificates. I've since fixed that in our user space code.
>>> We shouldn't change this size and instead wait for a better size
>>> negotiation protocol between the guest and host to avoid this awkward
>>> hard-coding.
>>>
>>>

2023-01-10 15:25:15

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

On Tue, Jan 10, 2023 at 8:10 AM Tom Lendacky <[email protected]> wrote:
>
> On 1/10/23 01:10, Dov Murik wrote:
> > Hi Tom,
> >
> > On 10/01/2023 0:27, Tom Lendacky wrote:
> >> On 1/9/23 10:55, Dionna Amalie Glaze wrote:
> >>>>> +
> >>>>> +static int snp_set_instance_certs(struct kvm *kvm, struct
> >>>>> kvm_sev_cmd *argp)
> >>>>> +{
> >>>> [...]
> >>>>
> >>>> Here we set the length to the page-aligned value, but we copy only
> >>>> params.cert_len bytes. If there are two subsequent
> >>>> snp_set_instance_certs() calls where the second one has a shorter
> >>>> length, we might "keep" some leftover bytes from the first call.
> >>>>
> >>>> Consider:
> >>>> 1. snp_set_instance_certs(certs_addr point to "AAA...", certs_len=8192)
> >>>> 2. snp_set_instance_certs(certs_addr point to "BBB...", certs_len=4097)
> >>>>
> >>>> If I understand correctly, on the second call we'll copy 4097 "BBB..."
> >>>> bytes into the to_certs buffer, but length will be (4096 + PAGE_SIZE -
> >>>> 1) & PAGE_MASK which will be 8192.
> >>>>
> >>>> Later when fetching the certs (for the extended report or in
> >>>> snp_get_instance_certs()) the user will get a buffer of 8192 bytes
> >>>> filled with 4097 BBBs and 4095 leftover AAAs.
> >>>>
> >>>> Maybe zero sev->snp_certs_data entirely before writing to it?
> >>>>
> >>>
> >>> Yes, I agree it should be zeroed, at least if the previous length is
> >>> greater than the new length. Good catch.
> >>>
> >>>
> >>>> Related question (not only for this patch) regarding snp_certs_data
> >>>> (host or per-instance): why is its size page-aligned at all? why is it
> >>>> limited by 16KB or 20KB? If I understand correctly, for SNP, this buffer
> >>>> is never sent to the PSP.
> >>>>
> >>>
> >>> The buffer is meant to be copied into the guest driver following the
> >>> GHCB extended guest request protocol. The data to copy back are
> >>> expected to be in 4K page granularity.
> >>
> >> I don't think the data has to be in 4K page granularity. Why do you
> >> think it does?
> >>
> >
> > I looked at AMD publication 56421 SEV-ES Guest-Hypervisor Communication
> > Block Standardization (July 2022), page 37. The table says:
> >
> > --------------
> >
> > NAE Event: SNP Extended Guest Request
> >
> > Notes:
> >
> > RAX will have the guest physical address of the page(s) to hold returned
> > data
> >
> > RBX
> > State to Hypervisor: will contain the number of guest contiguous
> > pages supplied to hold returned data
> > State from Hypervisor: on error will contain the number of guest
> > contiguous pages required to hold the data to be returned
> >
> > ...
> >
> > The request page, response page and data page(s) must be assigned to the
> > hypervisor (shared).
> >
> > --------------
> >
> >
> > According to this spec, it looks like the sizes are communicated as
> > number of pages in RBX. So the data should start at a 4KB alignment
> > (this is verified in snp_handle_ext_guest_request()) and its length
> > should be 4KB-aligned, as Dionna noted.
>
> That only indicates how many pages are required to hold the data, but the
> hypervisor only has to copy however much data is present. If the data is
> 20 bytes, then you only have to copy 20 bytes. If the user supplied 0 for
> the number of pages, then the code returns 1 in RBX to indicate that one
> page is required to hold the 20 bytes.
>
> >
> > I see no reason (in the spec and in the kernel code) for the data length
> > to be limited to 16KB (SEV_FW_BLOB_MAX_SIZE) but I might be missing some
> > flow because Dionna ran into this limit.
>
> Correct, there is no limit. I believe that SEV_FW_BLOB_MAX_SIZE is a way
> to keep the memory usage controlled because data is coming from userspace
> and it isn't expected that the data would be larger than that.
>
> I'm not sure if that was in from the start or as a result of a review
> comment. Not sure what is the best approach is.

This was discussed a bit in the recent guest driver changes too:
SEV_FW_BLOB_MAX_SIZE is used in the guest driver code for the max cert
length. We discussed increasing the limit there after fixing the IV
reuse issue.

Maybe we could introduce SEV_CERT_BLOB_MAX_SIZE here to make it clearer
that there is no firmware-based limit? Then we could switch the guest
driver to use that too. Dionna confirmed 4 pages is enough for our
current use case; Dov, would you recommend something larger to start?
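
For concreteness, that would be just a dedicated define (name as floated
here; the value is a policy choice for memory usage, not a firmware limit —
4 pages shown, per the current use case):

===
/* Max certificate blob exchanged with the guest; not a firmware limit. */
#define SEV_CERT_BLOB_MAX_SIZE	0x4000	/* 16KB, 4 pages */
===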

>
> Thanks,
> Tom
>
> >
> >
> > -Dov
> >
> >
> >
> >> Thanks,
> >> Tom
> >>
> >>>
> >>>> [...]
> >>>>>
> >>>>> -#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
> >>>>> +#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */
> >>>>>
> >>>>
> >>>> This has effects in drivers/crypto/ccp/sev-dev.c (for
> >>>> example in alloc_snp_host_map). Is that OK?
> >>>>
> >>>
> >>> No, this was a mistake of mine because I was using a bloated data
> >>> encoding that needed 5 pages for the GUID table plus 4 small
> >>> certificates. I've since fixed that in our user space code.
> >>> We shouldn't change this size and instead wait for a better size
> >>> negotiation protocol between the guest and host to avoid this awkward
> >>> hard-coding.
> >>>
> >>>

2023-01-11 00:50:45

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On 10/1/23 19:33, Kalra, Ashish wrote:
>
> On 1/9/2023 8:28 PM, Alexey Kardashevskiy wrote:
>>
>>
>> On 10/1/23 10:41, Kalra, Ashish wrote:
>>> On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
>>>> On 15/12/22 06:40, Michael Roth wrote:
>>>>> From: Brijesh Singh <[email protected]>
>>>>>
>>>>> Version 2 of GHCB specification added the support for two SNP Guest
>>>>> Request Message NAE events. The events allows for an SEV-SNP guest to
>>>>> make request to the SEV-SNP firmware through hypervisor using the
>>>>> SNP_GUEST_REQUEST API define in the SEV-SNP firmware specification.
>>>>>
>>>>> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
>>>>> difference of an additional certificate blob that can be passed
>>>>> through
>>>>> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
>>>>> provides snp_guest_ext_guest_request() that is used by the KVM to get
>>>>> both the report and certificate data at once.
>>>>>
>>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>> Signed-off-by: Michael Roth <[email protected]>
>>>>> ---
>>>>>   arch/x86/kvm/svm/sev.c | 185
>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>   arch/x86/kvm/svm/svm.h |   2 +
>>>>>   2 files changed, 181 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>>>> index 5f2b2092cdae..18efa70553c2 100644
>>>>> --- a/arch/x86/kvm/svm/sev.c
>>>>> +++ b/arch/x86/kvm/svm/sev.c
>>>>> @@ -331,6 +331,7 @@ static int sev_guest_init(struct kvm *kvm,
>>>>> struct kvm_sev_cmd *argp)
>>>>>           if (ret)
>>>>>               goto e_free;
>>>>> +        mutex_init(&sev->guest_req_lock);
>>>>>           ret = sev_snp_init(&argp->error, false);
>>>>>       } else {
>>>>>           ret = sev_platform_init(&argp->error);
>>>>> @@ -2051,23 +2052,34 @@ int sev_vm_move_enc_context_from(struct kvm
>>>>> *kvm, unsigned int source_fd)
>>>>>    */
>>>>>   static void *snp_context_create(struct kvm *kvm, struct
>>>>> kvm_sev_cmd *argp)
>>>>>   {
>>>>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>>>       struct sev_data_snp_addr data = {};
>>>>> -    void *context;
>>>>> +    void *context, *certs_data;
>>>>>       int rc;
>>>>> +    /* Allocate memory used for the certs data in SNP guest
>>>>> request */
>>>>> +    certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
>>>>> +    if (!certs_data)
>>>>> +        return NULL;
>>>>> +
>>>>>       /* Allocate memory for context page */
>>>>>       context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>>>>>       if (!context)
>>>>> -        return NULL;
>>>>> +        goto e_free;
>>>>>       data.gctx_paddr = __psp_pa(context);
>>>>>       rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE,
>>>>> &data, &argp->error);
>>>>> -    if (rc) {
>>>>> -        snp_free_firmware_page(context);
>>>>> -        return NULL;
>>>>> -    }
>>>>> +    if (rc)
>>>>> +        goto e_free;
>>>>> +
>>>>> +    sev->snp_certs_data = certs_data;
>>>>>       return context;
>>>>> +
>>>>> +e_free:
>>>>> +    snp_free_firmware_page(context);
>>>>> +    kfree(certs_data);
>>>>> +    return NULL;
>>>>>   }
>>>>>   static int snp_bind_asid(struct kvm *kvm, int *error)
>>>>> @@ -2653,6 +2665,8 @@ static int snp_decommission_context(struct
>>>>> kvm *kvm)
>>>>>       snp_free_firmware_page(sev->snp_context);
>>>>>       sev->snp_context = NULL;
>>>>> +    kfree(sev->snp_certs_data);
>>>>> +
>>>>>       return 0;
>>>>>   }
>>>>> @@ -3174,6 +3188,8 @@ static int sev_es_validate_vmgexit(struct
>>>>> vcpu_svm *svm, u64 *exit_code)
>>>>>       case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>>>>       case SVM_VMGEXIT_HV_FEATURES:
>>>>>       case SVM_VMGEXIT_PSC:
>>>>> +    case SVM_VMGEXIT_GUEST_REQUEST:
>>>>> +    case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>>>>>           break;
>>>>>       default:
>>>>>           reason = GHCB_ERR_INVALID_EVENT;
>>>>> @@ -3396,6 +3412,149 @@ static int snp_complete_psc(struct kvm_vcpu
>>>>> *vcpu)
>>>>>       return 1;
>>>>>   }
>>>>> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
>>>>> +                     struct sev_data_snp_guest_request *data,
>>>>> +                     gpa_t req_gpa, gpa_t resp_gpa)
>>>>> +{
>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>> +    kvm_pfn_t req_pfn, resp_pfn;
>>>>> +    struct kvm_sev_info *sev;
>>>>> +
>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>> +
>>>>> +    if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa,
>>>>> PAGE_SIZE))
>>>>> +        return SEV_RET_INVALID_PARAM;
>>>>> +
>>>>> +    req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
>>>>> +    if (is_error_noslot_pfn(req_pfn))
>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>> +
>>>>> +    resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
>>>>> +    if (is_error_noslot_pfn(resp_pfn))
>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>> +
>>>>> +    if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>> +
>>>>> +    data->gctx_paddr = __psp_pa(sev->snp_context);
>>>>> +    data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
>>>>> +    data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static void snp_cleanup_guest_buf(struct
>>>>> sev_data_snp_guest_request *data, unsigned long *rc)
>>>>> +{
>>>>> +    u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
>>>>> +    int ret;
>>>>> +
>>>>> +    ret = snp_page_reclaim(pfn);
>>>>> +    if (ret)
>>>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>>>> +
>>>>> +    ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>>>>> +    if (ret)
>>>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>>>> +}
>>>>> +
>>>>> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t
>>>>> req_gpa, gpa_t resp_gpa)
>>>>> +{
>>>>> +    struct sev_data_snp_guest_request data = {0};
>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>> +    struct kvm_sev_info *sev;
>>>>> +    unsigned long rc;
>>>>> +    int err;
>>>>> +
>>>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>>>> +        rc = SEV_RET_INVALID_GUEST;
>>>>> +        goto e_fail;
>>>>> +    }
>>>>> +
>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>> +
>>>>> +    mutex_lock(&sev->guest_req_lock);
>>>>> +
>>>>> +    rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
>>>>> +    if (rc)
>>>>> +        goto unlock;
>>>>> +
>>>>> +    rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
>>>>
>>>>
>>>> This one goes via sev_issue_cmd_external_user() and uses sev-fd...
>>>>
>>>>> +    if (rc)
>>>>> +        /* use the firmware error code */
>>>>> +        rc = err;
>>>>> +
>>>>> +    snp_cleanup_guest_buf(&data, &rc);
>>>>> +
>>>>> +unlock:
>>>>> +    mutex_unlock(&sev->guest_req_lock);
>>>>> +
>>>>> +e_fail:
>>>>> +    svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>>>>> +}
>>>>> +
>>>>> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm,
>>>>> gpa_t req_gpa, gpa_t resp_gpa)
>>>>> +{
>>>>> +    struct sev_data_snp_guest_request req = {0};
>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>> +    unsigned long data_npages;
>>>>> +    struct kvm_sev_info *sev;
>>>>> +    unsigned long rc, err;
>>>>> +    u64 data_gpa;
>>>>> +
>>>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>>>> +        rc = SEV_RET_INVALID_GUEST;
>>>>> +        goto e_fail;
>>>>> +    }
>>>>> +
>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>> +
>>>>> +    data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>>>> +    data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
>>>>> +
>>>>> +    if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>>>> +        rc = SEV_RET_INVALID_ADDRESS;
>>>>> +        goto e_fail;
>>>>> +    }
>>>>> +
>>>>> +    mutex_lock(&sev->guest_req_lock);
>>>>> +
>>>>> +    rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
>>>>> +    if (rc)
>>>>> +        goto unlock;
>>>>> +
>>>>> +    rc = snp_guest_ext_guest_request(&req, (unsigned
>>>>> long)sev->snp_certs_data,
>>>>> +                     &data_npages, &err);
>>>>
>>>> but this one does not and jump straight to
>>>> drivers/crypto/ccp/sev-dev.c ignoring sev->fd. Why different? Can
>>>> these two be unified? sev_issue_cmd_external_user() only checks if
>>>> fd is /dev/sev which is hardly useful.
>>>>
>>>> "[PATCH RFC v7 32/64] crypto: ccp: Provide APIs to query extended
>>>> attestation report" added this one.
>>>
>>> SNP_EXT_GUEST_REQUEST additionally returns a certificate blob and
>>> that's why it goes through the CCP driver interface
>>> snp_guest_ext_guest_request() that is used to get both the report and
>>> certificate data/blob at the same time.
>>
>> True. I thought though that this calls for extending sev_issue_cmd()
>> to take care of these extra parameters rather than just skipping the
>> sev->fd.
>>
>>
>>> All the FW API calls on the KVM side go through sev_issue_cmd() and
>>> sev_issue_cmd_external_user() interfaces and that i believe uses
>>> sev->fd more of as a sanity check.
>>
>> Does not look like it:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/crypto/ccp/sev-dev.c?h=v6.2-rc3#n1290
>>
>> ===
>> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>>                  void *data, int *error)
>> {
>>      if (!filep || filep->f_op != &sev_fops)
>>          return -EBADF;
>>
>>      return sev_do_cmd(cmd, data, error);
>> }
>> EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user);
>> ===
>>
>> The only "more" is that it requires sev->fd to be a valid open fd,
>> what is the value in that? I may easily miss the bigger picture here.
>> Thanks,
>>
>>
>
> Have a look at following functions in drivers/crypto/ccp/sev-dev.c:
> sev_dev_init() and sev_misc_init().
>
> static int sev_misc_init(struct sev_device *sev)
> {
>         struct device *dev = sev->dev;
>         int ret;
>
>         /*
>          * SEV feature support can be detected on multiple devices but
>          * the SEV FW commands must be issued on the master. During
>          * probe, we do not know the master hence we create /dev/sev on
>          * the first device probe.
>          * sev_do_cmd() finds the right master device to which to issue
>          * the command to the firmware.
>      */


It is still a single /dev/sev node, and userspace cannot get it wrong;
it does not have to choose between (for instance) /dev/sev0 and
/dev/sev1 on a 2-socket system.

> ...
> ...
>
> Hence, sev_issue_cmd_external_user() needs to ensure that the correct
> device (master device) is being operated upon and that's why there is
> the check for file operations matching sev_fops as below :
>
> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>                                 void *data, int *error)
> {
>         if (!filep || filep->f_op != &sev_fops)
>                 return -EBADF;
> ..
> ..
>
> Essentially, sev->fd is the misc. device created for the master PSP
> device on which the SEV/SNP firmware commands are issued, hence,
> sev_issue_cmd() uses sev->fd.

There is always just one fd, which always uses psp_master; nothing from
that fd is used.

More to the point, if sev->fd is still important, why is it ok to skip
it for snp_handle_ext_guest_request()? Thanks,


--
Alexey

2023-01-11 02:04:55

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On 1/10/2023 6:48 PM, Alexey Kardashevskiy wrote:
> On 10/1/23 19:33, Kalra, Ashish wrote:
>>
>> On 1/9/2023 8:28 PM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 10/1/23 10:41, Kalra, Ashish wrote:
>>>> On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
>>>>> On 15/12/22 06:40, Michael Roth wrote:
>>>>>> [...]
>>>> All the FW API calls on the KVM side go through sev_issue_cmd() and
>>>> sev_issue_cmd_external_user() interfaces and that i believe uses
>>>> sev->fd more of as a sanity check.
>>>
>>> Does not look like it:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/crypto/ccp/sev-dev.c?h=v6.2-rc3#n1290
>>>
>>>
>>> ===
>>> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>>>                  void *data, int *error)
>>> {
>>>      if (!filep || filep->f_op != &sev_fops)
>>>          return -EBADF;
>>>
>>>      return sev_do_cmd(cmd, data, error);
>>> }
>>> EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user);
>>> ===
>>>
>>> The only "more" is that it requires sev->fd to be a valid open fd,
>>> what is the value in that? I may easily miss the bigger picture here.
>>> Thanks,
>>>
>>>
>>
>> Have a look at following functions in drivers/crypto/ccp/sev-dev.c:
>> sev_dev_init() and sev_misc_init().
>>
>> static int sev_misc_init(struct sev_device *sev)
>> {
>>          struct device *dev = sev->dev;
>>          int ret;
>>
>>          /*
>>           * SEV feature support can be detected on multiple devices but
>>           * the SEV FW commands must be issued on the master. During
>>           * probe, we do not know the master hence we create /dev/sev on
>>           * the first device probe.
>>           * sev_do_cmd() finds the right master device to which to issue
>>           * the command to the firmware.
>>       */
>
>
> It is still a single /dev/sev node, and userspace cannot get it
> wrong; it does not have to choose between (for instance) /dev/sev0 and
> /dev/sev1 on a 2-socket system.
>
>> ...
>> ...
>>
>> Hence, sev_issue_cmd_external_user() needs to ensure that the correct
>> device (master device) is being operated upon and that's why there is
>> the check for file operations matching sev_fops as below :
>>
>> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>>                                  void *data, int *error)
>> {
>>          if (!filep || filep->f_op != &sev_fops)
>>                  return -EBADF;
>> ..
>> ..
>>
>> Essentially, sev->fd is the misc. device created for the master PSP
>> device on which the SEV/SNP firmware commands are issued, hence,
>> sev_issue_cmd() uses sev->fd.
>
> There is always just one fd, which always uses psp_master; nothing from
> that fd is used.

It also ensures that we can only issue commands (sev_issue_cmd) after
the SEV/SNP guest has launched; we don't have a valid fd to use before
guest launch. The file descriptor is passed in as part of the guest
launch flow, for example in snp_launch_start().

>
> More to the point, if sev->fd is still important, why is it ok to skip
> it for snp_handle_ext_guest_request()? Thanks,
>
>
Then, we should do the same for snp_handle_ext_guest_request().

Thanks,
Ashish
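
For illustration, a rough sketch of what unifying the two paths behind
the same sev->fd check might look like (the wrapper name is hypothetical;
snp_guest_ext_guest_request() is the CCP interface discussed above):

int snp_guest_ext_guest_request_external_user(struct file *filep,
				struct sev_data_snp_guest_request *data,
				unsigned long vaddr, unsigned long *npages,
				unsigned long *error)
{
	/* Same /dev/sev sanity check as sev_issue_cmd_external_user() */
	if (!filep || filep->f_op != &sev_fops)
		return -EBADF;

	return snp_guest_ext_guest_request(data, vaddr, npages, error);
}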

2023-01-11 06:04:44

by Dov Murik

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs



On 10/01/2023 17:10, Tom Lendacky wrote:
> On 1/10/23 01:10, Dov Murik wrote:
>> Hi Tom,
>>
>> On 10/01/2023 0:27, Tom Lendacky wrote:
>>> On 1/9/23 10:55, Dionna Amalie Glaze wrote:
>>>>>> +
>>>>>> +static int snp_set_instance_certs(struct kvm *kvm, struct
>>>>>> kvm_sev_cmd *argp)
>>>>>> +{
>>>>> [...]
>>>>>
>>>>> Here we set the length to the page-aligned value, but we copy only
>>>>> params.cert_len bytes.  If there are two subsequent
>>>>> snp_set_instance_certs() calls where the second one has a shorter
>>>>> length, we might "keep" some leftover bytes from the first call.
>>>>>
>>>>> Consider:
>>>>> 1. snp_set_instance_certs(certs_addr point to "AAA...",
>>>>> certs_len=8192)
>>>>> 2. snp_set_instance_certs(certs_addr point to "BBB...",
>>>>> certs_len=4097)
>>>>>
>>>>> If I understand correctly, on the second call we'll copy 4097 "BBB..."
>>>>> bytes into the to_certs buffer, but length will be (4096 + PAGE_SIZE -
>>>>> 1) & PAGE_MASK which will be 8192.
>>>>>
>>>>> Later when fetching the certs (for the extended report or in
>>>>> snp_get_instance_certs()) the user will get a buffer of 8192 bytes
>>>>> filled with 4097 BBBs and 4095 leftover AAAs.
>>>>>
>>>>> Maybe zero sev->snp_certs_data entirely before writing to it?
>>>>>
>>>>
>>>> Yes, I agree it should be zeroed, at least if the previous length is
>>>> greater than the new length. Good catch.
>>>>
>>>>
>>>>> Related question (not only for this patch) regarding snp_certs_data
>>>>> (host or per-instance): why is its size page-aligned at all? why is it
>>>>> limited by 16KB or 20KB? If I understand correctly, for SNP, this
>>>>> buffer
>>>>> is never sent to the PSP.
>>>>>
>>>>
>>>> The buffer is meant to be copied into the guest driver following the
>>>> GHCB extended guest request protocol. The data to copy back are
>>>> expected to be in 4K page granularity.
>>>
>>> I don't think the data has to be in 4K page granularity. Why do you
>>> think it does?
>>>
>>
>> I looked at AMD publication 56421 SEV-ES Guest-Hypervisor Communication
>> Block Standardization (July 2022), page 37.  The table says:
>>
>> --------------
>>
>> NAE Event: SNP Extended Guest Request
>>
>> Notes:
>>
>> RAX will have the guest physical address of the page(s) to hold returned
>> data
>>
>> RBX
>> State to Hypervisor: will contain the number of guest contiguous
>> pages supplied to hold returned data
>> State from Hypervisor: on error will contain the number of guest
>> contiguous pages required to hold the data to be returned
>>
>> ...
>>
>> The request page, response page and data page(s) must be assigned to the
>> hypervisor (shared).
>>
>> --------------
>>
>>
>> According to this spec, it looks like the sizes are communicated as
>> number of pages in RBX.  So the data should start at a 4KB alignment
>> (this is verified in snp_handle_ext_guest_request()) and its length
>> should be 4KB-aligned, as Dionna noted.
>
> That only indicates how many pages are required to hold the data, but
> the hypervisor only has to copy however much data is present. If the
> data is 20 bytes, then you only have to copy 20 bytes. If the user
> supplied 0 for the number of pages, then the code returns 1 in RBX to
> indicate that one page is required to hold the 20 bytes.
>


Maybe it should only copy 20 bytes, but current implementation copies
whole 4KB pages:


	if (sev->snp_certs_len)
		data_npages = sev->snp_certs_len >> PAGE_SHIFT;
	...
	...
	/* Copy the certificate blob in the guest memory */
	if (data_npages &&
	    kvm_write_guest(kvm, data_gpa, sev->snp_certs_data,
			    data_npages << PAGE_SHIFT))
		rc = SEV_RET_INVALID_ADDRESS;


(Elsewhere we ensure that sev->snp_certs_len is page-aligned, so the
assignment to data_npages is in fact correct even though it looks
off-by-one; as an aside, maybe it's better to use a DIV_ROUND_UP macro
anywhere we calculate the number of pages needed.)
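
As a rough sketch of that suggestion (reusing the names above; not part
of the posted series), the hypervisor side could compute the page count
with DIV_ROUND_UP and copy only the actual blob length:

	/* Sketch: report whole pages, but copy only the real length */
	if (sev->snp_certs_len)
		data_npages = DIV_ROUND_UP(sev->snp_certs_len, PAGE_SIZE);

	if (data_npages &&
	    kvm_write_guest(kvm, data_gpa, sev->snp_certs_data,
			    sev->snp_certs_len))
		rc = SEV_RET_INVALID_ADDRESS;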

Also -- how does the guest know they got only 20 bytes and not 4096? Do they have
to read all the 'struct cert_table' entries at the beginning of the received data?

-Dov


>>
>> I see no reason (in the spec and in the kernel code) for the data length
>> to be limited to 16KB (SEV_FW_BLOB_MAX_SIZE) but I might be missing some
>> flow because Dionna ran into this limit.
>
> Correct, there is no limit. I believe that SEV_FW_BLOB_MAX_SIZE is a way
> to keep the memory usage controlled because data is coming from
> userspace and it isn't expected that the data would be larger than that.
>
> I'm not sure if that was in from the start or as a result of a review
> comment. Not sure what is the best approach is.
>
> Thanks,
> Tom
>
>>
>>
>> -Dov
>>
>>
>>
>>> Thanks,
>>> Tom
>>>
>>>>
>>>>> [...]
>>>>>>
>>>>>> -#define SEV_FW_BLOB_MAX_SIZE 0x4000  /* 16KB */
>>>>>> +#define SEV_FW_BLOB_MAX_SIZE 0x5000  /* 20KB */
>>>>>>
>>>>>
>>>>> This has effects in drivers/crypto/ccp/sev-dev.c
>>>>>                                                                  (for
>>>>> example in alloc_snp_host_map).  Is that OK?
>>>>>
>>>>
>>>> No, this was a mistake of mine because I was using a bloated data
>>>> encoding that needed 5 pages for the GUID table plus 4 small
>>>> certificates. I've since fixed that in our user space code.
>>>> We shouldn't change this size and instead wait for a better size
>>>> negotiation protocol between the guest and host to avoid this awkward
>>>> hard-coding.
>>>>
>>>>

2023-01-11 07:35:02

by Dov Murik

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

Hi Peter,

On 10/01/2023 17:23, Peter Gonda wrote:
> On Tue, Jan 10, 2023 at 8:10 AM Tom Lendacky <[email protected]> wrote:
>>
>> On 1/10/23 01:10, Dov Murik wrote:
>>> [...]
>>
>> That only indicates how many pages are required to hold the data, but the
>> hypervisor only has to copy however much data is present. If the data is
>> 20 bytes, then you only have to copy 20 bytes. If the user supplied 0 for
>> the number of pages, then the code returns 1 in RBX to indicate that one
>> page is required to hold the 20 bytes.
>>
>>>
>>> I see no reason (in the spec and in the kernel code) for the data length
>>> to be limited to 16KB (SEV_FW_BLOB_MAX_SIZE) but I might be missing some
>>> flow because Dionna ran into this limit.
>>
>> Correct, there is no limit. I believe that SEV_FW_BLOB_MAX_SIZE is a way
>> to keep the memory usage controlled because data is coming from userspace
>> and it isn't expected that the data would be larger than that.
>>
>> I'm not sure if that was in from the start or as a result of a review
>> comment. Not sure what is the best approach is.
>
> This was discussed a bit in the recent guest driver changes too:
> SEV_FW_BLOB_MAX_SIZE is used in the guest driver code for the max cert
> length. We discussed increasing the limit there after fixing the IV
> reuse issue.

I see it now.

(Joerg, maybe we should add F:drivers/virt/coco/ to the MAINTAINERS list
so that patches there are hopefully sent to linux-coco?)


>
> Maybe we could introduce SEV_CERT_BLOB_MAX_SIZE here to make it clear
> there is no firmware-based limit? Then we could switch the guest
> driver to use that too. Dionna confirmed 4 pages is enough for our
> current use case; Dov, would you recommend something larger to start?
>

Introducing a new constant sounds good to me (and using the same constant
in the guest driver).

I think 4 pages are OK; I also don't see real harm in increasing this
limit to 1 MB (if the host+guest agree to pass more stuff there, besides
certificates). But maybe that's just abusing this channel, and for
other data we should use other mechanisms (like vsock).

-Dov
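
A minimal sketch of the constant Peter proposes (the name, starting size,
and the cert_len field are illustrative, taken from the
snp_set_instance_certs() discussion above, not from the posted series):

/* Sketch: cert blob limit is policy, not a firmware constraint */
#define SEV_CERT_BLOB_MAX_SIZE	(4 * PAGE_SIZE)	/* 16KB to start */

	if (params.cert_len > SEV_CERT_BLOB_MAX_SIZE)
		return -EINVAL;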

2023-01-11 13:33:27

by Sabin Rapan

[permalink] [raw]
Subject: Re: [PATCH RFC v7 40/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command



On 14.12.2022 21:40, Michael Roth wrote:
> +static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_snp_launch_update data = {};
> +	int i, ret;
> +
> +	data.gctx_paddr = __psp_pa(sev->snp_context);
> +	data.page_type = SNP_PAGE_TYPE_VMSA;
> +
> +	for (i = 0; i < kvm->created_vcpus; i++) {

This should be replaced with kvm_for_each_vcpu(), as was done for
sev_launch_update_vmsa() in commit c36b16d29f3a ("KVM: SVM: Use online_vcpus,
not created_vcpus, to iterate over vCPUs").
That prevents accessing uninitialized data in struct vcpu_svm.
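
Roughly, as a sketch (the VMSA pointer access follows the mainline
sev_launch_update_vmsa() pattern; data/argp are the locals quoted above):

	struct kvm_vcpu *vcpu;
	unsigned long i;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		struct vcpu_svm *svm = to_svm(vcpu);

		/* Only online vCPUs, so the VMSA is fully initialized */
		data.address = __sme_pa(svm->sev_es.vmsa);
		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
				      &data, &argp->error);
		if (ret)
			return ret;
	}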

--
Sabin.




2023-01-11 13:53:24

by Sabin Rapan

[permalink] [raw]
Subject: Re: [PATCH RFC v7 49/64] KVM: SVM: Introduce ops for the post gfn map and unmap



On 14.12.2022 21:40, Michael Roth wrote:
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index a4d48c3e0f89..aef13c120f2d 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -100,6 +100,7 @@ struct kvm_sev_info {
> 	atomic_t migration_in_progress;
> 	u64 snp_init_flags;
> 	void *snp_context;      /* SNP guest context page */
> +	spinlock_t psc_lock;

Looks like a leftover from the v6 series.

--
Sabin.




2023-01-11 14:06:49

by Tom Dohrmann

[permalink] [raw]
Subject: Re: [PATCH RFC v7 39/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On Wed, Dec 14, 2022 at 01:40:31PM -0600, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
> guest's memory. The data is encrypted with the cryptographic context
> created with the KVM_SEV_SNP_LAUNCH_START.
>
> In addition to inserting data, it can insert two special pages into the
> guest's memory: the secrets page and the CPUID page.
>
> While terminating the guest, reclaim the guest pages added in the RMP
> table. If the reclaim fails, the page is no longer safe to be released
> back to the system, so leak it.
>
> For more information see the SEV-SNP specification.
>
> Co-developed-by: Michael Roth <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/x86/amd-memory-encryption.rst | 29 ++++
> arch/x86/kvm/svm/sev.c | 161 ++++++++++++++++++
> include/uapi/linux/kvm.h | 19 +++
> 3 files changed, 209 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index 58971fc02a15..c94be8e6d657 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -485,6 +485,35 @@ Returns: 0 on success, -negative on error
>
> See the SEV-SNP specification for further detail on the launch input.
>
> +20. KVM_SNP_LAUNCH_UPDATE
> +-------------------------
> +
> +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
> +calculates a measurement of the memory contents. The measurement is a signature
> +of the memory contents that can be sent to the guest owner as an attestation
> +that the memory was encrypted correctly by the firmware.
> +
> +Parameters (in): struct kvm_snp_launch_update
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_snp_launch_update {
> +                __u64 start_gfn;        /* Guest page number to start from. */
> +                __u64 uaddr;            /* userspace address need to be encrypted */
> +                __u32 len;              /* length of memory region */
> +                __u8 imi_page;          /* 1 if memory is part of the IMI */
> +                __u8 page_type;         /* page type */
> +                __u8 vmpl3_perms;       /* VMPL3 permission mask */
> +                __u8 vmpl2_perms;       /* VMPL2 permission mask */
> +                __u8 vmpl1_perms;       /* VMPL1 permission mask */
> +        };
> +
> +See the SEV-SNP spec for further details on how to build the VMPL permission
> +mask and page type.
> +
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 6d1d0e424f76..379e61a9226a 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -238,6 +238,37 @@ static void sev_decommission(unsigned int handle)
> sev_guest_decommission(&decommission, NULL);
> }
>
> +static int snp_page_reclaim(u64 pfn)
> +{
> +	struct sev_data_snp_page_reclaim data = {0};
> +	int err, rc;
> +
> +	data.paddr = __sme_set(pfn << PAGE_SHIFT);
> +	rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +	if (rc) {
> +		/*
> +		 * If the reclaim failed, then page is no longer safe
> +		 * to use.
> +		 */
> +		snp_mark_pages_offline(pfn,
> +				       page_level_size(PG_LEVEL_4K) >> PAGE_SHIFT);
> +	}
> +
> +	return rc;
> +}
> +
> +static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
> +{
> +	int rc;
> +
> +	rc = rmp_make_shared(pfn, level);
> +	if (rc && leak)
> +		snp_mark_pages_offline(pfn,
> +				       page_level_size(level) >> PAGE_SHIFT);
> +
> +	return rc;
> +}
> +
> static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
> {
> struct sev_data_deactivate deactivate;
> @@ -2085,6 +2116,133 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return rc;
> }
>
> +static int snp_launch_update_gfn_handler(struct kvm *kvm,
> +					 struct kvm_gfn_range *range,
> +					 void *opaque)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_memory_slot *memslot = range->slot;
> +	struct sev_data_snp_launch_update data = {0};
> +	struct kvm_sev_snp_launch_update params;
> +	struct kvm_sev_cmd *argp = opaque;
> +	int *error = &argp->error;
> +	int i, n = 0, ret = 0;
> +	unsigned long npages;
> +	kvm_pfn_t *pfns;
> +	gfn_t gfn;
> +
> +	if (!kvm_slot_can_be_private(memslot)) {
> +		pr_err("SEV-SNP requires restricted memory.\n");
> +		return -EINVAL;
> +	}
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
> +		pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
> +		return -EFAULT;
> +	}
> +
> +	data.gctx_paddr = __psp_pa(sev->snp_context);
> +
> +	npages = range->end - range->start;
> +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
> +	if (!pfns)
> +		return -ENOMEM;
> +
> +	pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
> +		 range->start, range->end, params.page_type);
> +
> +	for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
> +		int order, level;
> +		void *kvaddr;
> +
> +		ret = kvm_restricted_mem_get_pfn(memslot, gfn, &pfns[i], &order);
> +		if (ret)
> +			goto e_release;
> +
> +		n++;
> +		ret = snp_lookup_rmpentry((u64)pfns[i], &level);
> +		if (ret) {
> +			pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d\n",
> +			       gfn, ret);
> +			return -EFAULT;
> +		}
> +
> +		kvaddr = pfn_to_kaddr(pfns[i]);
> +		if (!virt_addr_valid(kvaddr)) {
> +			pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
> +			ret = -EINVAL;
> +			goto e_release;
> +		}
> +
> +		ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> +		if (ret) {
> +			pr_err("Guest read failed, ret: 0x%x\n", ret);
> +			goto e_release;
> +		}
> +
> +		ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
> +				       sev_get_asid(kvm), true);
> +		if (ret) {
> +			ret = -EFAULT;
> +			goto e_release;
> +		}
> +
> +		data.address = __sme_set(pfns[i] << PAGE_SHIFT);
> +		data.page_size = X86_TO_RMP_PG_LEVEL(PG_LEVEL_4K);
> +		data.page_type = params.page_type;
> +		data.vmpl3_perms = params.vmpl3_perms;
> +		data.vmpl2_perms = params.vmpl2_perms;
> +		data.vmpl1_perms = params.vmpl1_perms;
> +		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> +				      &data, error);
> +		if (ret) {
> +			pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
> +			       ret, *error);
> +			snp_page_reclaim(pfns[i]);
> +			goto e_release;

When a launch update fails for a CPUID page with the error `INVALID_PARAM`,
the firmware writes back corrected values. We should probably write these
values back to userspace. Before UPM was introduced this happened
automatically, because we didn't copy the page to private memory and did
the update completely in place.
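
One possible shape for that write-back (a sketch, untested; it assumes
the firmware leaves the corrected CPUID page readable once it is flipped
back to shared, and it reuses the locals from the loop quoted above):

		if (ret) {
			/*
			 * Sketch: on CPUID validation failure the firmware
			 * has written corrected values into the page, so
			 * flip it back to shared and copy it out to the
			 * userspace buffer before bailing out.
			 */
			if (params.page_type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
			    *error == SEV_RET_INVALID_PARAM &&
			    !host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true)) {
				void __user *p = (void __user *)(uintptr_t)params.uaddr;

				if (copy_to_user(p + (gfn - range->start) * PAGE_SIZE,
						 kvaddr, PAGE_SIZE))
					ret = -EFAULT;
			}
			goto e_release;
		}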

> +		}
> +	}
> +
> +	kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +
> +e_release:
> +	/* Content of memory is updated, mark pages dirty */
> +	for (i = 0; i < n; i++) {
> +		set_page_dirty(pfn_to_page(pfns[i]));
> +		mark_page_accessed(pfn_to_page(pfns[i]));
> +
> +		/*
> +		 * If its an error, then update RMP entry to change page ownership
> +		 * to the hypervisor.
> +		 */
> +		if (ret)
> +			host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
> +
> +		put_page(pfn_to_page(pfns[i]));
> +	}
> +
> +	kvfree(pfns);
> +	return ret;
> +}
> +
> +static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_sev_snp_launch_update params;
> +
> +	if (!sev_snp_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (!sev->snp_context)
> +		return -EINVAL;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> +		return -EFAULT;
> +
> +	return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
> +				      snp_launch_update_gfn_handler, argp);
> +}
> +
> int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> 	struct kvm_sev_cmd sev_cmd;
> @@ -2178,6 +2336,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> 	case KVM_SEV_SNP_LAUNCH_START:
> 		r = snp_launch_start(kvm, &sev_cmd);
> 		break;
> +	case KVM_SEV_SNP_LAUNCH_UPDATE:
> +		r = snp_launch_update(kvm, &sev_cmd);
> +		break;
> 	default:
> 		r = -EINVAL;
> 		goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b2311e0abeef..9b6c95cc62a8 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1941,6 +1941,7 @@ enum sev_cmd_id {
> 	/* SNP specific commands */
> 	KVM_SEV_SNP_INIT,
> 	KVM_SEV_SNP_LAUNCH_START,
> +	KVM_SEV_SNP_LAUNCH_UPDATE,
>
> 	KVM_SEV_NR_MAX,
> };
> @@ -2057,6 +2058,24 @@ struct kvm_sev_snp_launch_start {
> 	__u8 pad[6];
> };
>
> +#define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
> +#define KVM_SEV_SNP_PAGE_TYPE_VMSA		0x2
> +#define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
> +#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
> +#define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
> +#define KVM_SEV_SNP_PAGE_TYPE_CPUID		0x6
> +
> +struct kvm_sev_snp_launch_update {
> +	__u64 start_gfn;
> +	__u64 uaddr;
> +	__u32 len;
> +	__u8 imi_page;
> +	__u8 page_type;
> +	__u8 vmpl3_perms;
> +	__u8 vmpl2_perms;
> +	__u8 vmpl1_perms;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.25.1
>

Regards, Tom

2023-01-11 14:10:25

by Harald Hoyer

[permalink] [raw]
Subject: Re: [PATCH RFC v7 39/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On 11.01.23 at 14:56, Tom Dohrmann wrote:
> On Wed, Dec 14, 2022 at 01:40:31PM -0600, Michael Roth wrote:
>> [...]
>
> When a launch update fails for a CPUID page with the error `INVALID_PARAM`,
> the firmware writes back corrected values. We should probably write these
> values back to userspace. Before UPM was introduced this happened
> automatically, because we didn't copy the page to private memory and did
> the update completely in place.
>

Yes, pretty please!

2023-01-11 14:47:02

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

On 1/11/23 00:00, Dov Murik wrote:
>
>
> On 10/01/2023 17:10, Tom Lendacky wrote:
>> On 1/10/23 01:10, Dov Murik wrote:
>>> [...]
>>
>> That only indicates how many pages are required to hold the data, but
>> the hypervisor only has to copy however much data is present. If the
>> data is 20 bytes, then you only have to copy 20 bytes. If the user
>> supplied 0 for the number of pages, then the code returns 1 in RBX to
>> indicate that one page is required to hold the 20 bytes.
>>
>
>
> Maybe it should only copy 20 bytes, but current implementation copies
> whole 4KB pages:
>
>
> 	if (sev->snp_certs_len)
> 		data_npages = sev->snp_certs_len >> PAGE_SHIFT;
> 	...
> 	...
> 	/* Copy the certificate blob in the guest memory */
> 	if (data_npages &&
> 	    kvm_write_guest(kvm, data_gpa, sev->snp_certs_data,
> 			    data_npages << PAGE_SHIFT))
> 		rc = SEV_RET_INVALID_ADDRESS;
>
>
> (Elsewhere we ensure that sev->snp_certs_len is page-aligned, so the
> assignment to data_npages is in fact correct even though it looks
> off-by-one; as an aside, maybe it's better to use a DIV_ROUND_UP macro
> anywhere we calculate the number of pages needed.)

Hmmm... yeah, not sure why it was implemented that way; I guess it can
always be changed later if desired.

>
> Also -- how does the guest know they got only 20 bytes and not 4096? Do they have
> to read all the 'struct cert_table' entries at the beginning of the received data?

Yes, they should walk the cert table entries.
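
For reference, a sketch of that guest-side walk (entry layout per the
GHCB spec's certificate table: a GUID, offset and length per entry, with
an all-zero entry as terminator; the names here are illustrative):

struct cert_table_entry {
	unsigned char guid[16];		/* certificate type */
	u32 offset;			/* offset from start of data */
	u32 length;			/* length of this cert in bytes */
};

/* Returns the end offset of the last cert, 0 for an empty table */
static size_t certs_data_size(const u8 *certs)
{
	const struct cert_table_entry *e = (const void *)certs;
	static const u8 zero_guid[16];
	size_t end = 0;

	for (; memcmp(e->guid, zero_guid, sizeof(zero_guid)); e++) {
		if ((size_t)e->offset + e->length > end)
			end = (size_t)e->offset + e->length;
	}

	return end;
}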

Thanks,
Tom


2023-01-11 14:52:53

by Sabin Rapan

[permalink] [raw]
Subject: Re: [PATCH RFC v7 14/64] x86/sev: Add the host SEV-SNP initialization support



On 14.12.2022 21:40, Michael Roth wrote:
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +# define DISABLE_SEV_SNP 0
> +#else
> +# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
> +#endif
> +

Would it make sense to split the SEV-* feature family into their own
config flag(s)?
I'm thinking in the context of SEV-SNP running on systems with
Transparent SME enabled in the BIOS. In this case, enabling
CONFIG_AMD_MEM_ENCRYPT will also enable SME in the kernel, which is a
bit strange and not necessarily useful.
Commit 4e2c87949f2b ("crypto: ccp - When TSME and SME both detected
notify user") highlights this; a sketch of one possible split follows.

--
Sabin.




2023-01-11 14:52:54

by Tom Dohrmann

[permalink] [raw]
Subject: Re: [PATCH RFC v7 47/64] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT

On Wed, Dec 14, 2022 at 01:40:39PM -0600, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> table to be private or shared using the Page State Change MSR protocol
> as defined in the GHCB specification.
>
> Forward these requests to userspace via KVM_EXIT_VMGEXIT so the VMM can
> issue the KVM ioctls to update the page state accordingly.
>
> Co-developed-by: Michael Roth <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/include/asm/sev-common.h | 9 ++++++++
> arch/x86/kvm/svm/sev.c | 25 +++++++++++++++++++++++
> arch/x86/kvm/trace.h | 34 +++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 1 +
> 4 files changed, 69 insertions(+)
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 0a9055cdfae2..ee38f7408470 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -93,6 +93,10 @@ enum psc_op {
> };
>
> #define GHCB_MSR_PSC_REQ		0x014
> +#define GHCB_MSR_PSC_GFN_POS		12
> +#define GHCB_MSR_PSC_GFN_MASK		GENMASK_ULL(39, 0)
> +#define GHCB_MSR_PSC_OP_POS		52
> +#define GHCB_MSR_PSC_OP_MASK		0xf
> #define GHCB_MSR_PSC_REQ_GFN(gfn, op) \
> 	/* GHCBData[55:52] */ \
> 	(((u64)((op) & 0xf) << 52) | \
> @@ -102,6 +106,11 @@ enum psc_op {
> 	GHCB_MSR_PSC_REQ)
>
> #define GHCB_MSR_PSC_RESP		0x015
> +#define GHCB_MSR_PSC_ERROR_POS		32
> +#define GHCB_MSR_PSC_ERROR_MASK	GENMASK_ULL(31, 0)
> +#define GHCB_MSR_PSC_ERROR		GENMASK_ULL(31, 0)
> +#define GHCB_MSR_PSC_RSVD_POS		12
> +#define GHCB_MSR_PSC_RSVD_MASK		GENMASK_ULL(19, 0)
> #define GHCB_MSR_PSC_RESP_VAL(val) \
> 	/* GHCBData[63:32] */ \
> 	(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index d7b467b620aa..d7988629073b 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -29,6 +29,7 @@
> #include "svm_ops.h"
> #include "cpuid.h"
> #include "trace.h"
> +#include "mmu.h"
>
> #ifndef CONFIG_KVM_AMD_SEV
> /*
> @@ -3350,6 +3351,23 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
> svm->vmcb->control.ghcb_gpa = value;
> }
>
> +/*
> + * TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
> + * and process that here accordingly.
> + */
> +static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + set_ghcb_msr_bits(svm, 0,
> + GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
> +
> + set_ghcb_msr_bits(svm, 0, GHCB_MSR_PSC_RSVD_MASK, GHCB_MSR_PSC_RSVD_POS);
> + set_ghcb_msr_bits(svm, GHCB_MSR_PSC_RESP, GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
> +
> + return 1; /* resume */
> +}
> +
> static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3450,6 +3468,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> GHCB_MSR_INFO_POS);
> break;
> }
> + case GHCB_MSR_PSC_REQ:
> + vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
> + vcpu->run->vmgexit.ghcb_msr = control->ghcb_gpa;
> + vcpu->arch.complete_userspace_io = snp_complete_psc_msr_protocol;
> +
> + ret = -1;
> + break;

What's the reasoning behind returning an error (-1) here? This error bubbles all
the way up to the `KVM_RUN` ioctl. Would it be more appropriate to return 0?
Returning 0 would cause an exit to userspace without KVM_RUN itself
indicating an error.
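
Something along these lines (untested) is what I have in mind:

	case GHCB_MSR_PSC_REQ:
		vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
		vcpu->run->vmgexit.ghcb_msr = control->ghcb_gpa;
		vcpu->arch.complete_userspace_io = snp_complete_psc_msr_protocol;

		/* 0 exits to userspace without making KVM_RUN itself fail */
		ret = 0;
		break;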

> case GHCB_MSR_TERM_REQ: {
> u64 reason_set, reason_code;
>
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 83843379813e..65861d2d086c 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -7,6 +7,7 @@
> #include <asm/svm.h>
> #include <asm/clocksource.h>
> #include <asm/pvclock-abi.h>
> +#include <asm/sev-common.h>
>
> #undef TRACE_SYSTEM
> #define TRACE_SYSTEM kvm
> @@ -1831,6 +1832,39 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
> __entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
> );
>
> +/*
> + * Tracepoint for the SEV-SNP page state change processing
> + */
> +#define psc_operation \
> + {SNP_PAGE_STATE_PRIVATE, "private"}, \
> + {SNP_PAGE_STATE_SHARED, "shared"} \
> +
> +TRACE_EVENT(kvm_snp_psc,
> + TP_PROTO(unsigned int vcpu_id, u64 pfn, u64 gpa, u8 op, int level),
> + TP_ARGS(vcpu_id, pfn, gpa, op, level),
> +
> + TP_STRUCT__entry(
> + __field(int, vcpu_id)
> + __field(u64, pfn)
> + __field(u64, gpa)
> + __field(u8, op)
> + __field(int, level)
> + ),
> +
> + TP_fast_assign(
> + __entry->vcpu_id = vcpu_id;
> + __entry->pfn = pfn;
> + __entry->gpa = gpa;
> + __entry->op = op;
> + __entry->level = level;
> + ),
> +
> + TP_printk("vcpu %u, pfn %llx, gpa %llx, op %s, level %d",
> + __entry->vcpu_id, __entry->pfn, __entry->gpa,
> + __print_symbolic(__entry->op, psc_operation),
> + __entry->level)
> +);
> +
> #endif /* _TRACE_KVM_H */
>
> #undef TRACE_INCLUDE_PATH
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 732f9cbbadb5..08dd1ef7e136 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13481,6 +13481,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_snp_psc);
>
> static int __init kvm_x86_init(void)
> {
> --
> 2.25.1
>

Regards, Tom

2023-01-11 23:28:35

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 40/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

On 1/11/2023 7:27 AM, Sabin Rapan wrote:
>
>
> On 14.12.2022 21:40, Michael Roth wrote:
>> +static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> + struct sev_data_snp_launch_update data = {};
>> + int i, ret;
>> +
>> + data.gctx_paddr = __psp_pa(sev->snp_context);
>> + data.page_type = SNP_PAGE_TYPE_VMSA;
>> +
>> + for (i = 0; i < kvm->created_vcpus; i++) {
>
> Should be replaced with kvm_for_each_vcpu() as it was done for
> sev_launch_update_vmsa() in c36b16d29f3a ("KVM: SVM: Use online_vcpus,
> not created_vcpus, to iterate over vCPUs").
> Prevents accessing uninitialized data in struct vcpu_svm.

Yes, fixed this one.
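
i.e. switching the iteration to (roughly):

-	for (i = 0; i < kvm->created_vcpus; i++) {
+	kvm_for_each_vcpu(i, vcpu, kvm) {

so that only online vCPUs are walked.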

Thanks,
Ashish

2023-01-12 21:06:22

by Alper Gun

[permalink] [raw]
Subject: Re: [PATCH RFC v7 29/64] crypto: ccp: Handle the legacy SEV command when SNP is enabled

On Wed, Dec 14, 2022 at 11:54 AM Michael Roth <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The behavior of the SEV-legacy commands is altered when the SNP firmware
> is in the INIT state. When SNP is in the INIT state, all memory that the
> firmware may write to while executing an SEV-legacy command must be in
> the firmware state before the command is issued.
>
> A command buffer may contain a system physical address that the firmware
> may write to. There are two cases that need to be handled:
>
> 1) the system physical address points to guest memory
> 2) the system physical address points to host memory
>
> To handle case #1, change the page state to firmware in the RMP table
> before issuing the command and restore the state to shared after the
> command completes.
>
> For case #2, use a bounce buffer to complete the request.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 370 ++++++++++++++++++++++++++++++++++-
> drivers/crypto/ccp/sev-dev.h | 12 ++
> 2 files changed, 372 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 4c12e98a1219..5eb2e8f364d4 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -286,6 +286,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
> return rc;
> }
>
> +static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
> +{
> + /* The C-bit may be set in the paddr */
> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> + int rc, n = 0, i;
> +
> + for (i = 0; i < npages; i++, pfn++, n++) {
> + rc = rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (rc)
> + goto cleanup;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /*
> + * If we failed to change the page state to shared, then it's not safe
> + * to release the page back to the system; leak it.
> + */
> + snp_mark_pages_offline(pfn, npages - n);
> +
> + return rc;
> +}
> +
> static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
> {
> unsigned long npages = 1ul << order, paddr;
> @@ -487,12 +511,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
> return sev_write_init_ex_file();
> }
>
> +static int alloc_snp_host_map(struct sev_device *sev)
> +{
> + struct page *page;
> + int i;
> +
> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> + struct snp_host_map *map = &sev->snp_host_map[i];
> +
> + memset(map, 0, sizeof(*map));
> +
> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
> + if (!page)
> + return -ENOMEM;
> +
> + map->host = page_address(page);
> + }
> +
> + return 0;
> +}
> +
> +static void free_snp_host_map(struct sev_device *sev)
> +{
> + int i;
> +
> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> + struct snp_host_map *map = &sev->snp_host_map[i];
> +
> + if (map->host) {
> + __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
> + memset(map, 0, sizeof(*map));
> + }
> + }
> +}
> +
> +static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
> +{
> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> +
> + map->active = false;
> +
> + if (!paddr || !len)
> + return 0;
> +
> + map->paddr = *paddr;
> + map->len = len;
> +
> + /* If paddr points to guest memory then change the page state to firmware. */
> + if (guest) {
> + if (rmp_mark_pages_firmware(*paddr, npages, true))
> + return -EFAULT;
> +
> + goto done;
> + }
> +
> + if (!map->host)
> + return -ENOMEM;
> +
> + /* Check if the pre-allocated buffer can be used to fulfil the request. */
> + if (len > SEV_FW_BLOB_MAX_SIZE)
> + return -EINVAL;
> +
> + /* Transition the pre-allocated buffer to the firmware state. */
> + if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
> + return -EFAULT;
> +
> + /* Set the paddr to use pre-allocated firmware buffer */
> + *paddr = __psp_pa(map->host);
> +
> +done:
> + map->active = true;
> + return 0;
> +}
> +
> +static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
> +{
> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> +
> + if (!map->active)
> + return 0;
> +
> + /* If paddr points to a guest memory then restore the page state to hypervisor. */
> + if (guest) {
> + if (snp_reclaim_pages(*paddr, npages, true))
> + return -EFAULT;
> +
> + goto done;
> + }
> +
> + /*
> + * Transition the pre-allocated buffer to hypervisor state before the access.
> + *
> + * This is because while changing the page state to firmware, the kernel unmaps
> + * the pages from the direct map, and to restore the direct map the pages must
> + * be transitioned back to the shared state.
> + */
> + if (snp_reclaim_pages(__pa(map->host), npages, true))
> + return -EFAULT;
> +
> + /* Copy the response data from the firmware buffer to the caller's buffer. */
> + memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
> + *paddr = map->paddr;
> +
> +done:
> + map->active = false;
> + return 0;
> +}
> +
> +static bool sev_legacy_cmd_buf_writable(int cmd)
> +{
> + switch (cmd) {
> + case SEV_CMD_PLATFORM_STATUS:
> + case SEV_CMD_GUEST_STATUS:
> + case SEV_CMD_LAUNCH_START:
> + case SEV_CMD_RECEIVE_START:
> + case SEV_CMD_LAUNCH_MEASURE:
> + case SEV_CMD_SEND_START:
> + case SEV_CMD_SEND_UPDATE_DATA:
> + case SEV_CMD_SEND_UPDATE_VMSA:
> + case SEV_CMD_PEK_CSR:
> + case SEV_CMD_PDH_CERT_EXPORT:
> + case SEV_CMD_GET_ID:
> + case SEV_CMD_ATTESTATION_REPORT:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +#define prep_buffer(name, addr, len, guest, map) \
> + func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
> +
> +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
> +{
> + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
> + struct sev_device *sev = psp_master->sev_data;
> + bool from_fw = !to_fw;
> +
> + /*
> + * After the command is completed, change the command buffer memory to
> + * hypervisor state.
> + *
> + * The immutable bit is automatically cleared by the firmware, so
> + * there is no need to reclaim the page.
> + */
> + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
> + if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
> + return -EFAULT;

If we return here, we will skip calling unmap_firmware_writeable and
we will leak some pages in firmware state.
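
One option (untested, just to sketch the idea) might be to note the
error but still walk the unmap cases below so those pages get restored,
e.g. with an "int rc = 0;" at the top of the function:

	if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
		rc = rmp_mark_pages_shared(__pa(cmd_buf), 1) ? -EFAULT : 0;

		/* No need to go further if firmware failed to execute command. */
		if (!rc && fw_err)
			return 0;
	}

and returning rc instead of 0 at the end of the function.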

> +
> + /* No need to go further if firmware failed to execute command. */
> + if (fw_err)
> + return 0;
> + }
> +
> + if (to_fw)
> + func = map_firmware_writeable;
> + else
> + func = unmap_firmware_writeable;
> +
> + /*
> + * A command buffer may contain a system physical address. If the address
> + * points to host memory then use an intermediate firmware page, otherwise
> + * change the page state in the RMP table.
> + */
> + switch (cmd) {
> + case SEV_CMD_PDH_CERT_EXPORT:
> + if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
> + pdh_cert_len, false, &sev->snp_host_map[0]))
> + goto err;
> + if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
> + cert_chain_len, false, &sev->snp_host_map[1]))
> + goto err;
> + break;
> + case SEV_CMD_GET_ID:
> + if (prep_buffer(struct sev_data_get_id, address, len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_PEK_CSR:
> + if (prep_buffer(struct sev_data_pek_csr, address, len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_LAUNCH_UPDATE_DATA:
> + if (prep_buffer(struct sev_data_launch_update_data, address, len,
> + true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_LAUNCH_UPDATE_VMSA:
> + if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
> + true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_LAUNCH_MEASURE:
> + if (prep_buffer(struct sev_data_launch_measure, address, len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_LAUNCH_UPDATE_SECRET:
> + if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
> + true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_DBG_DECRYPT:
> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
> + &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_DBG_ENCRYPT:
> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
> + &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_ATTESTATION_REPORT:
> + if (prep_buffer(struct sev_data_attestation_report, address, len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_SEND_START:
> + if (prep_buffer(struct sev_data_send_start, session_address,
> + session_len, false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_SEND_UPDATE_DATA:
> + if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + if (prep_buffer(struct sev_data_send_update_data, trans_address,
> + trans_len, false, &sev->snp_host_map[1]))
> + goto err;
> + break;
> + case SEV_CMD_SEND_UPDATE_VMSA:
> + if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
> + trans_len, false, &sev->snp_host_map[1]))
> + goto err;
> + break;
> + case SEV_CMD_RECEIVE_UPDATE_DATA:
> + if (prep_buffer(struct sev_data_receive_update_data, guest_address,
> + guest_len, true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_RECEIVE_UPDATE_VMSA:
> + if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
> + guest_len, true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + default:
> + break;
> + }
> +
> + /* The command buffer needs to be in the firmware state. */
> + if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
> + if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
> + return -EFAULT;

This function moves two separate pages to the firmware state: first by
calling map_firmware_writeable, and second by calling
rmp_mark_pages_firmware for cmd_buf.
In case rmp_mark_pages_firmware fails for cmd_buf, the page which has
already been moved to the firmware state in map_firmware_writeable
should be reclaimed.
This is a problem especially if we leak a guest-owned page in the
firmware state. Since this path is used only by legacy SEV VMs, these
leaked pages will never be reclaimed when the VMs are destroyed.
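
e.g. (untested - the nested calls can of course fail too, so this would
need more care):

	if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
		if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true)) {
			/* run the unmap path to reclaim what was transitioned above */
			__snp_cmd_buf_copy(cmd, cmd_buf, false, 0);
			return -EFAULT;
		}
	}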

>
> + }
> +
> + return 0;
> +
> +err:
> + return -EINVAL;
> +}
> +
> +static inline bool need_firmware_copy(int cmd)
> +{
> + struct sev_device *sev = psp_master->sev_data;
> +
> + /* After SNP is INIT'ed, the behavior of the legacy SEV commands is changed. */
> + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
> +}
> +
> +static int snp_aware_copy_to_firmware(int cmd, void *data)
> +{
> + return __snp_cmd_buf_copy(cmd, data, true, 0);
> +}
> +
> +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
> +{
> + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
> +}
> +
> static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> {
> struct psp_device *psp = psp_master;
> struct sev_device *sev;
> unsigned int phys_lsb, phys_msb;
> unsigned int reg, ret = 0;
> + void *cmd_buf;
> int buf_len;
>
> if (!psp || !psp->sev_data)
> @@ -512,12 +819,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> * work for some memory, e.g. vmalloc'd addresses, and @data may not be
> * physically contiguous.
> */
> - if (data)
> - memcpy(sev->cmd_buf, data, buf_len);
> + if (data) {
> + if (sev->cmd_buf_active > 2)
> + return -EBUSY;
> +
> + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
> +
> + memcpy(cmd_buf, data, buf_len);
> + sev->cmd_buf_active++;
> +
> + /*
> + * The behavior of the SEV-legacy commands is altered when the
> + * SNP firmware is in the INIT state.
> + */
> + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, sev->cmd_buf))
> + return -EFAULT;
> + } else {
> + cmd_buf = sev->cmd_buf;
> + }
>
> /* Get the physical address of the command buffer */
> - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
> + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
>
> dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
> cmd, phys_msb, phys_lsb, psp_timeout);
> @@ -560,15 +883,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> ret = sev_write_init_ex_file_if_required(cmd);
> }
>
> - print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
> - buf_len, false);
> -
> /*
> * Copy potential output from the PSP back to data. Do this even on
> * failure in case the caller wants to glean something from the error.
> */
> - if (data)
> - memcpy(data, sev->cmd_buf, buf_len);
> + if (data) {
> + /*
> + * Restore the page state after the command completes.
> + */
> + if (need_firmware_copy(cmd) &&
> + snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
> + return -EFAULT;
> +
> + memcpy(data, cmd_buf, buf_len);
> + sev->cmd_buf_active--;
> + }
> +
> + print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
> + buf_len, false);
>
> return ret;
> }
> @@ -1579,10 +1911,12 @@ int sev_dev_init(struct psp_device *psp)
> if (!sev)
> goto e_err;
>
> - sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
> + sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
> if (!sev->cmd_buf)
> goto e_sev;
>
> + sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
> +
> psp->sev_data = sev;
>
> sev->dev = dev;
> @@ -1648,6 +1982,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
> snp_range_list = NULL;
> }
>
> + /*
> + * The host map needs to have the immutable bit cleared, so it must be freed
> + * before the SNP firmware shutdown.
> + */
> + free_snp_host_map(sev);
> +
> sev_snp_shutdown(&error);
> }
>
> @@ -1722,6 +2062,14 @@ void sev_pci_init(void)
> dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
> }
> }
> +
> + /*
> + * Allocate the intermediate buffers used for the legacy command handling.
> + */
> + if (alloc_snp_host_map(sev)) {
> + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
> + goto skip_legacy;
> + }
> }
>
> /* Obtain the TMR memory area for SEV-ES use */
> @@ -1739,12 +2087,14 @@ void sev_pci_init(void)
> dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
> error, rc);
>
> +skip_legacy:
> dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
> "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
>
> return;
>
> err:
> + free_snp_host_map(sev);
> psp_master->sev_data = NULL;
> }
>
> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
> index 34767657beb5..19d79f9d4212 100644
> --- a/drivers/crypto/ccp/sev-dev.h
> +++ b/drivers/crypto/ccp/sev-dev.h
> @@ -29,11 +29,20 @@
> #define SEV_CMDRESP_CMD_SHIFT 16
> #define SEV_CMDRESP_IOC BIT(0)
>
> +#define MAX_SNP_HOST_MAP_BUFS 2
> +
> struct sev_misc_dev {
> struct kref refcount;
> struct miscdevice misc;
> };
>
> +struct snp_host_map {
> + u64 paddr;
> + u32 len;
> + void *host;
> + bool active;
> +};
> +
> struct sev_device {
> struct device *dev;
> struct psp_device *psp;
> @@ -52,8 +61,11 @@ struct sev_device {
> u8 build;
>
> void *cmd_buf;
> + void *cmd_buf_backup;
> + int cmd_buf_active;
>
> bool snp_initialized;
> + struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
> };
>
> int sev_dev_init(struct psp_device *psp);
> --
> 2.25.1
>

2023-01-12 23:57:48

by Alper Gun

[permalink] [raw]
Subject: Re: [PATCH RFC v7 29/64] crypto: ccp: Handle the legacy SEV command when SNP is enabled

On Wed, Dec 14, 2022 at 11:54 AM Michael Roth <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The behavior of the SEV-legacy commands is altered when the SNP firmware
> is in the INIT state. When SNP is in the INIT state, all memory that the
> firmware may write to while executing an SEV-legacy command must be in
> the firmware state before the command is issued.
>
> A command buffer may contain a system physical address that the firmware
> may write to. There are two cases that need to be handled:
>
> 1) the system physical address points to guest memory
> 2) the system physical address points to host memory
>
> To handle case #1, change the page state to firmware in the RMP table
> before issuing the command and restore the state to shared after the
> command completes.
>
> For case #2, use a bounce buffer to complete the request.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 370 ++++++++++++++++++++++++++++++++++-
> drivers/crypto/ccp/sev-dev.h | 12 ++
> 2 files changed, 372 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 4c12e98a1219..5eb2e8f364d4 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -286,6 +286,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
> return rc;
> }
>
> +static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
> +{
> + /* The C-bit may be set in the paddr */
> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> + int rc, n = 0, i;
> +
> + for (i = 0; i < npages; i++, pfn++, n++) {
> + rc = rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (rc)
> + goto cleanup;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /*
> + * If we failed to change the page state to shared, then it's not safe
> + * to release the page back to the system; leak it.
> + */
> + snp_mark_pages_offline(pfn, npages - n);
> +
> + return rc;
> +}
> +
> static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
> {
> unsigned long npages = 1ul << order, paddr;
> @@ -487,12 +511,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
> return sev_write_init_ex_file();
> }
>
> +static int alloc_snp_host_map(struct sev_device *sev)
> +{
> + struct page *page;
> + int i;
> +
> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> + struct snp_host_map *map = &sev->snp_host_map[i];
> +
> + memset(map, 0, sizeof(*map));
> +
> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
> + if (!page)
> + return -ENOMEM;
> +
> + map->host = page_address(page);
> + }
> +
> + return 0;
> +}
> +
> +static void free_snp_host_map(struct sev_device *sev)
> +{
> + int i;
> +
> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> + struct snp_host_map *map = &sev->snp_host_map[i];
> +
> + if (map->host) {
> + __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
> + memset(map, 0, sizeof(*map));
> + }
> + }
> +}
> +
> +static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
> +{
> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> +
> + map->active = false;
> +
> + if (!paddr || !len)
> + return 0;
> +
> + map->paddr = *paddr;
> + map->len = len;
> +
> + /* If paddr points to guest memory then change the page state to firmware. */
> + if (guest) {
> + if (rmp_mark_pages_firmware(*paddr, npages, true))
> + return -EFAULT;
> +
> + goto done;
> + }
> +
> + if (!map->host)
> + return -ENOMEM;
> +
> + /* Check if the pre-allocated buffer can be used to fulfil the request. */
> + if (len > SEV_FW_BLOB_MAX_SIZE)
> + return -EINVAL;
> +
> + /* Transition the pre-allocated buffer to the firmware state. */
> + if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
> + return -EFAULT;
> +
> + /* Set the paddr to use pre-allocated firmware buffer */
> + *paddr = __psp_pa(map->host);
> +
> +done:
> + map->active = true;
> + return 0;
> +}
> +
> +static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
> +{
> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> +
> + if (!map->active)
> + return 0;
> +
> + /* If paddr points to a guest memory then restore the page state to hypervisor. */
> + if (guest) {
> + if (snp_reclaim_pages(*paddr, npages, true))
> + return -EFAULT;
> +
> + goto done;
> + }
> +
> + /*
> + * Transition the pre-allocated buffer to hypervisor state before the access.
> + *
> + * This is because while changing the page state to firmware, the kernel unmaps
> + * the pages from the direct map, and to restore the direct map the pages must
> + * be transitioned back to the shared state.
> + */
> + if (snp_reclaim_pages(__pa(map->host), npages, true))
> + return -EFAULT;
> +
> + /* Copy the response data from the firmware buffer to the caller's buffer. */
> + memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
> + *paddr = map->paddr;
> +
> +done:
> + map->active = false;
> + return 0;
> +}
> +
> +static bool sev_legacy_cmd_buf_writable(int cmd)
> +{
> + switch (cmd) {
> + case SEV_CMD_PLATFORM_STATUS:
> + case SEV_CMD_GUEST_STATUS:
> + case SEV_CMD_LAUNCH_START:
> + case SEV_CMD_RECEIVE_START:
> + case SEV_CMD_LAUNCH_MEASURE:
> + case SEV_CMD_SEND_START:
> + case SEV_CMD_SEND_UPDATE_DATA:
> + case SEV_CMD_SEND_UPDATE_VMSA:
> + case SEV_CMD_PEK_CSR:
> + case SEV_CMD_PDH_CERT_EXPORT:
> + case SEV_CMD_GET_ID:
> + case SEV_CMD_ATTESTATION_REPORT:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +#define prep_buffer(name, addr, len, guest, map) \
> + func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
> +
> +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
> +{
> + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
> + struct sev_device *sev = psp_master->sev_data;
> + bool from_fw = !to_fw;
> +
> + /*
> + * After the command is completed, change the command buffer memory to
> + * hypervisor state.
> + *
> + * The immutable bit is automatically cleared by the firmware, so
> + * there is no need to reclaim the page.
> + */
> + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
> + if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
> + return -EFAULT;
> +
> + /* No need to go further if firmware failed to execute command. */
> + if (fw_err)
> + return 0;
> + }
> +
> + if (to_fw)
> + func = map_firmware_writeable;
> + else
> + func = unmap_firmware_writeable;
> +
> + /*
> + * A command buffer may contain a system physical address. If the address
> + * points to host memory then use an intermediate firmware page, otherwise
> + * change the page state in the RMP table.
> + */
> + switch (cmd) {
> + case SEV_CMD_PDH_CERT_EXPORT:
> + if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
> + pdh_cert_len, false, &sev->snp_host_map[0]))
> + goto err;
> + if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
> + cert_chain_len, false, &sev->snp_host_map[1]))
> + goto err;
> + break;
> + case SEV_CMD_GET_ID:
> + if (prep_buffer(struct sev_data_get_id, address, len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_PEK_CSR:
> + if (prep_buffer(struct sev_data_pek_csr, address, len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_LAUNCH_UPDATE_DATA:
> + if (prep_buffer(struct sev_data_launch_update_data, address, len,
> + true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_LAUNCH_UPDATE_VMSA:
> + if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
> + true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_LAUNCH_MEASURE:
> + if (prep_buffer(struct sev_data_launch_measure, address, len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_LAUNCH_UPDATE_SECRET:
> + if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
> + true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_DBG_DECRYPT:
> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
> + &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_DBG_ENCRYPT:
> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
> + &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_ATTESTATION_REPORT:
> + if (prep_buffer(struct sev_data_attestation_report, address, len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_SEND_START:
> + if (prep_buffer(struct sev_data_send_start, session_address,
> + session_len, false, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_SEND_UPDATE_DATA:
> + if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + if (prep_buffer(struct sev_data_send_update_data, trans_address,
> + trans_len, false, &sev->snp_host_map[1]))
> + goto err;
> + break;
> + case SEV_CMD_SEND_UPDATE_VMSA:
> + if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
> + false, &sev->snp_host_map[0]))
> + goto err;
> + if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
> + trans_len, false, &sev->snp_host_map[1]))
> + goto err;
> + break;
> + case SEV_CMD_RECEIVE_UPDATE_DATA:
> + if (prep_buffer(struct sev_data_receive_update_data, guest_address,
> + guest_len, true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + case SEV_CMD_RECEIVE_UPDATE_VMSA:
> + if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
> + guest_len, true, &sev->snp_host_map[0]))
> + goto err;
> + break;
> + default:
> + break;
> + }
> +
> + /* The command buffer needs to be in the firmware state. */
> + if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
> + if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
> + return -EFAULT;
> + }
> +
> + return 0;
> +
> +err:
> + return -EINVAL;
> +}
> +
> +static inline bool need_firmware_copy(int cmd)
> +{
> + struct sev_device *sev = psp_master->sev_data;
> +
> + /* After SNP is INIT'ed, the behavior of the legacy SEV commands is changed. */
> + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
> +}
> +
> +static int snp_aware_copy_to_firmware(int cmd, void *data)
> +{
> + return __snp_cmd_buf_copy(cmd, data, true, 0);
> +}
> +
> +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
> +{
> + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
> +}
> +
> static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> {
> struct psp_device *psp = psp_master;
> struct sev_device *sev;
> unsigned int phys_lsb, phys_msb;
> unsigned int reg, ret = 0;
> + void *cmd_buf;
> int buf_len;
>
> if (!psp || !psp->sev_data)
> @@ -512,12 +819,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> * work for some memory, e.g. vmalloc'd addresses, and @data may not be
> * physically contiguous.
> */
> - if (data)
> - memcpy(sev->cmd_buf, data, buf_len);
> + if (data) {
> + if (sev->cmd_buf_active > 2)
> + return -EBUSY;
> +
> + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
> +
> + memcpy(cmd_buf, data, buf_len);
> + sev->cmd_buf_active++;
> +
> + /*
> + * The behavior of the SEV-legacy commands is altered when the
> + * SNP firmware is in the INIT state.
> + */
> + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, sev->cmd_buf))
I believe this should be cmd_buf instead of sev->cmd_buf.
snp_aware_copy_to_firmware(cmd, cmd_buf)
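
i.e.:

-	if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, sev->cmd_buf))
+	if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))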

> + return -EFAULT;
> + } else {
> + cmd_buf = sev->cmd_buf;
> + }
>
> /* Get the physical address of the command buffer */
> - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
> + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
>
> dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
> cmd, phys_msb, phys_lsb, psp_timeout);
> @@ -560,15 +883,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> ret = sev_write_init_ex_file_if_required(cmd);
> }
>
> - print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
> - buf_len, false);
> -
> /*
> * Copy potential output from the PSP back to data. Do this even on
> * failure in case the caller wants to glean something from the error.
> */
> - if (data)
> - memcpy(data, sev->cmd_buf, buf_len);
> + if (data) {
> + /*
> + * Restore the page state after the command completes.
> + */
> + if (need_firmware_copy(cmd) &&
> + snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
> + return -EFAULT;
> +
> + memcpy(data, cmd_buf, buf_len);
> + sev->cmd_buf_active--;
> + }
> +
> + print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
> + buf_len, false);
>
> return ret;
> }
> @@ -1579,10 +1911,12 @@ int sev_dev_init(struct psp_device *psp)
> if (!sev)
> goto e_err;
>
> - sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
> + sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
> if (!sev->cmd_buf)
> goto e_sev;
>
> + sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
> +
> psp->sev_data = sev;
>
> sev->dev = dev;
> @@ -1648,6 +1982,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
> snp_range_list = NULL;
> }
>
> + /*
> + * The host map needs to have the immutable bit cleared, so it must be freed
> + * before the SNP firmware shutdown.
> + */
> + free_snp_host_map(sev);
> +
> sev_snp_shutdown(&error);
> }
>
> @@ -1722,6 +2062,14 @@ void sev_pci_init(void)
> dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
> }
> }
> +
> + /*
> + * Allocate the intermediate buffers used for the legacy command handling.
> + */
> + if (alloc_snp_host_map(sev)) {
> + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
> + goto skip_legacy;
> + }
> }
>
> /* Obtain the TMR memory area for SEV-ES use */
> @@ -1739,12 +2087,14 @@ void sev_pci_init(void)
> dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
> error, rc);
>
> +skip_legacy:
> dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
> "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
>
> return;
>
> err:
> + free_snp_host_map(sev);
> psp_master->sev_data = NULL;
> }
>
> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
> index 34767657beb5..19d79f9d4212 100644
> --- a/drivers/crypto/ccp/sev-dev.h
> +++ b/drivers/crypto/ccp/sev-dev.h
> @@ -29,11 +29,20 @@
> #define SEV_CMDRESP_CMD_SHIFT 16
> #define SEV_CMDRESP_IOC BIT(0)
>
> +#define MAX_SNP_HOST_MAP_BUFS 2
> +
> struct sev_misc_dev {
> struct kref refcount;
> struct miscdevice misc;
> };
>
> +struct snp_host_map {
> + u64 paddr;
> + u32 len;
> + void *host;
> + bool active;
> +};
> +
> struct sev_device {
> struct device *dev;
> struct psp_device *psp;
> @@ -52,8 +61,11 @@ struct sev_device {
> u8 build;
>
> void *cmd_buf;
> + void *cmd_buf_backup;
> + int cmd_buf_active;
>
> bool snp_initialized;
> + struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
> };
>
> int sev_dev_init(struct psp_device *psp);
> --
> 2.25.1
>

2023-01-13 14:29:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 03/64] KVM: SVM: Advertise private memory support to KVM

On Thu, Jan 05, 2023 at 12:17:41PM -0600, Michael Roth wrote:
> In the case of SEV, it would still be up to userspace whether or not it
> actually wants to make use of UPM functionality like KVM_SET_MEMORY_ATTRIBUTES
> and private memslots. Otherwise, to maintain backward-compatibility,
> userspace can do things as it has always done and continue running SEV without
> relying on private memslots/KVM_SET_MEMORY_ATTRIBUTES or any of the new ioctls.
>
> For SNP however it is required that userspace uses/implements UPM
> functionality.

Makes sense to me.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-01-13 14:54:06

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 04/64] KVM: x86: Add 'fault_is_private' x86 op

On Wed, Jan 04, 2023 at 08:42:56PM -0600, Michael Roth wrote:
> Obviously I need to add some proper documentation for this, but a 1
> return basically means 'private_fault' pass-by-ref arg has been set
> with the appropriate value, whereas 0 means "there's no platform-specific
> handling for this, so if you have some generic way to determine this
> then use that instead".

Still binary, tho, and can be bool, right?

I.e., you can just as well do:

if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
goto out;

at the call site.
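
... with the op itself then being declared along the lines of:

	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code,
				 bool *private_fault);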

> This is mainly to handle CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, which
> just parrots whatever kvm_mem_is_private() returns to support running
> KVM selftests without needed hardware/platform support. If we don't
> take care to skip this check where the above fault_is_private() hook
> returns 1, then it ends up breaking SNP in cases where the kernel has
> been compiled with CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, since SNP
> relies on the page fault flags to make this determination, not
> kvm_mem_is_private(), which normally only tracks the memory attributes
> set by userspace via KVM_SET_MEMORY_ATTRIBUTES ioctl.

Some of that explanation belongs into the commit message, which is a bit
lacking...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-01-13 16:11:07

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH RFC v7 04/64] KVM: x86: Add 'fault_is_private' x86 op

On Fri, Jan 13, 2023, Borislav Petkov wrote:
> On Wed, Jan 04, 2023 at 08:42:56PM -0600, Michael Roth wrote:
> > Obviously I need to add some proper documentation for this, but a 1
> > return basically means 'private_fault' pass-by-ref arg has been set
> > with the appropriate value, whereas 0 means "there's no platform-specific
> > handling for this, so if you have some generic way to determine this
> > then use that instead".
>
> Still binary, tho, and can be bool, right?
>
> I.e., you can just as well do:
>
> if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
> goto out;
>
> at the call site.

Ya. Don't spend too much time trying to make this look super pretty though, there
are subtle bugs inherited from the base UPM series that need to be sorted out and
will impact this code. E.g. invoking kvm_mem_is_private() outside of the protection
of mmu_invalidate_seq means changes to the attributes may not be reflected in the
page tables.
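
Hand-wavy pseudocode, but the fault path needs to follow the usual
"snapshot the sequence, then check for invalidations" pattern:

	mmu_seq = kvm->mmu_invalidate_seq;
	smp_rmb();

	fault->is_private = kvm_mem_is_private(kvm, fault->gfn);
	/* ... resolve the pfn ... */

	write_lock(&kvm->mmu_lock);
	if (mmu_invalidate_retry(kvm, mmu_seq))
		goto retry;	/* attributes may have changed, redo the fault */

so that a racing attributes update forces the fault to be replayed.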

I'm also hoping we can avoid a callback entirely, though that may prove to be
more pain than gain. I'm poking at the UPM and testing series right now, will
circle back to this and TDX in a few weeks to see if there's a sane way to communicate
shared vs. private without having to resort to a callback, and without having
races between page faults, KVM_SET_MEMORY_ATTRIBUTES, and KVM_SET_USER_MEMORY_REGION2.

> > This is mainly to handle CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, which
> > just parrots whatever kvm_mem_is_private() returns to support running
> > KVM selftests without needed hardware/platform support. If we don't
> > take care to skip this check where the above fault_is_private() hook
> > returns 1, then it ends up breaking SNP in cases where the kernel has
> > been compiled with CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, since SNP
> > relies on the page fault flags to make this determination, not
> > kvm_mem_is_private(), which normally only tracks the memory attributes
> > set by userspace via KVM_SET_MEMORY_ATTRIBUTES ioctl.
>
> Some of that explanation belongs into the commit message, which is a bit
> lacking...

I'll circle back to this too when I give this series (and TDX) a proper look,
there's got to be a better way to handle this.

2023-01-13 16:19:58

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 07/64] KVM: SEV: Handle KVM_HC_MAP_GPA_RANGE hypercall

On Wed, Dec 14, 2022 at 01:39:59PM -0600, Michael Roth wrote:
> From: Nikunj A Dadhania <[email protected]>
>
> KVM_HC_MAP_GPA_RANGE hypercall is used by the SEV guest to notify a
> change in the page encryption status to the hypervisor.
>
> The hypercall exits to userspace with KVM_EXIT_HYPERCALL exit code,
> currently this is used for explicit memory conversion between
> shared/private for memfd based private memory.

So Tom and I spent a while to figure out what this is doing...

Please explain in more detail what that is. Like the hypercall gets ignored for
memslots which cannot be private...?

And what's the story with supporting UPM with SEV{,-ES} guests?

In general, this text needs more background and why this is being done.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-01-13 16:28:59

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH RFC v7 07/64] KVM: SEV: Handle KVM_HC_MAP_GPA_RANGE hypercall

On Fri, Jan 13, 2023, Borislav Petkov wrote:
> On Wed, Dec 14, 2022 at 01:39:59PM -0600, Michael Roth wrote:
> > From: Nikunj A Dadhania <[email protected]>
> >
> > KVM_HC_MAP_GPA_RANGE hypercall is used by the SEV guest to notify a
> > change in the page encryption status to the hypervisor.
> >
> > The hypercall exits to userspace with KVM_EXIT_HYPERCALL exit code,
> > currently this is used for explicit memory conversion between
> > shared/private for memfd based private memory.
>
> So Tom and I spent a while to figure out what this is doing...
>
> Please explain in more detail what that is. Like the hypercall gets ignored for
> memslots which cannot be private...?

Don't bother, just drop the patch. It's perfectly legal for userspace to create
the private memslot in response to a guest request.

2023-01-13 19:01:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 08/64] KVM: Move kvm_for_each_memslot_in_hva_range() to be used in SVM

On Wed, Dec 14, 2022 at 01:40:00PM -0600, Michael Roth wrote:
> From: Nikunj A Dadhania <[email protected]>
>
> Move the macro to kvm_host.h and make if visible for SVM to use.

s/if/it/

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-01-13 22:09:34

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 29/64] crypto: ccp: Handle the legacy SEV command when SNP is enabled

Hello Alper,

On 1/12/2023 2:47 PM, Alper Gun wrote:
> On Wed, Dec 14, 2022 at 11:54 AM Michael Roth <[email protected]> wrote:
>>
>> From: Brijesh Singh <[email protected]>
>>
>> The behavior of the SEV-legacy commands is altered when the SNP firmware
>> is in the INIT state. When SNP is in the INIT state, all memory that the
>> firmware may write to while executing an SEV-legacy command must be in
>> the firmware state before the command is issued.
>>
>> A command buffer may contain a system physical address that the firmware
>> may write to. There are two cases that need to be handled:
>>
>> 1) the system physical address points to guest memory
>> 2) the system physical address points to host memory
>>
>> To handle case #1, change the page state to firmware in the RMP table
>> before issuing the command and restore the state to shared after the
>> command completes.
>>
>> For case #2, use a bounce buffer to complete the request.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> drivers/crypto/ccp/sev-dev.c | 370 ++++++++++++++++++++++++++++++++++-
>> drivers/crypto/ccp/sev-dev.h | 12 ++
>> 2 files changed, 372 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 4c12e98a1219..5eb2e8f364d4 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -286,6 +286,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
>> return rc;
>> }
>>
>> +static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
>> +{
>> + /* The C-bit may be set in the paddr */
>> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
>> + int rc, n = 0, i;
>> +
>> + for (i = 0; i < npages; i++, pfn++, n++) {
>> + rc = rmp_make_shared(pfn, PG_LEVEL_4K);
>> + if (rc)
>> + goto cleanup;
>> + }
>> +
>> + return 0;
>> +
>> +cleanup:
>> + /*
>> + * If we failed to change the page state to shared, then it's not safe
>> + * to release the page back to the system; leak it.
>> + */
>> + snp_mark_pages_offline(pfn, npages - n);
>> +
>> + return rc;
>> +}
>> +
>> static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
>> {
>> unsigned long npages = 1ul << order, paddr;
>> @@ -487,12 +511,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
>> return sev_write_init_ex_file();
>> }
>>
>> +static int alloc_snp_host_map(struct sev_device *sev)
>> +{
>> + struct page *page;
>> + int i;
>> +
>> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
>> + struct snp_host_map *map = &sev->snp_host_map[i];
>> +
>> + memset(map, 0, sizeof(*map));
>> +
>> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
>> + if (!page)
>> + return -ENOMEM;
>> +
>> + map->host = page_address(page);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void free_snp_host_map(struct sev_device *sev)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
>> + struct snp_host_map *map = &sev->snp_host_map[i];
>> +
>> + if (map->host) {
>> + __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
>> + memset(map, 0, sizeof(*map));
>> + }
>> + }
>> +}
>> +
>> +static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
>> +{
>> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
>> +
>> + map->active = false;
>> +
>> + if (!paddr || !len)
>> + return 0;
>> +
>> + map->paddr = *paddr;
>> + map->len = len;
>> +
>> + /* If paddr points to guest memory then change the page state to firmware. */
>> + if (guest) {
>> + if (rmp_mark_pages_firmware(*paddr, npages, true))
>> + return -EFAULT;
>> +
>> + goto done;
>> + }
>> +
>> + if (!map->host)
>> + return -ENOMEM;
>> +
>> + /* Check if the pre-allocated buffer can be used to fulfil the request. */
>> + if (len > SEV_FW_BLOB_MAX_SIZE)
>> + return -EINVAL;
>> +
>> + /* Transition the pre-allocated buffer to the firmware state. */
>> + if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
>> + return -EFAULT;
>> +
>> + /* Set the paddr to use pre-allocated firmware buffer */
>> + *paddr = __psp_pa(map->host);
>> +
>> +done:
>> + map->active = true;
>> + return 0;
>> +}
>> +
>> +static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
>> +{
>> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
>> +
>> + if (!map->active)
>> + return 0;
>> +
>> + /* If paddr points to a guest memory then restore the page state to hypervisor. */
>> + if (guest) {
>> + if (snp_reclaim_pages(*paddr, npages, true))
>> + return -EFAULT;
>> +
>> + goto done;
>> + }
>> +
>> + /*
>> + * Transition the pre-allocated buffer to hypervisor state before the access.
>> + *
>> + * This is because while changing the page state to firmware, the kernel unmaps
>> + * the pages from the direct map, and to restore the direct map the pages must
>> + * be transitioned back to the shared state.
>> + */
>> + if (snp_reclaim_pages(__pa(map->host), npages, true))
>> + return -EFAULT;
>> +
>> + /* Copy the response data from the firmware buffer to the caller's buffer. */
>> + memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
>> + *paddr = map->paddr;
>> +
>> +done:
>> + map->active = false;
>> + return 0;
>> +}
>> +
>> +static bool sev_legacy_cmd_buf_writable(int cmd)
>> +{
>> + switch (cmd) {
>> + case SEV_CMD_PLATFORM_STATUS:
>> + case SEV_CMD_GUEST_STATUS:
>> + case SEV_CMD_LAUNCH_START:
>> + case SEV_CMD_RECEIVE_START:
>> + case SEV_CMD_LAUNCH_MEASURE:
>> + case SEV_CMD_SEND_START:
>> + case SEV_CMD_SEND_UPDATE_DATA:
>> + case SEV_CMD_SEND_UPDATE_VMSA:
>> + case SEV_CMD_PEK_CSR:
>> + case SEV_CMD_PDH_CERT_EXPORT:
>> + case SEV_CMD_GET_ID:
>> + case SEV_CMD_ATTESTATION_REPORT:
>> + return true;
>> + default:
>> + return false;
>> + }
>> +}
>> +
>> +#define prep_buffer(name, addr, len, guest, map) \
>> + func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
>> +
>> +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
>> +{
>> + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
>> + struct sev_device *sev = psp_master->sev_data;
>> + bool from_fw = !to_fw;
>> +
>> + /*
>> + * After the command is completed, change the command buffer memory to
>> + * hypervisor state.
>> + *
>> + * The immutable bit is automatically cleared by the firmware, so
>> + * there is no need to reclaim the page.
>> + */
>> + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
>> + if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
>> + return -EFAULT;
>
> If we return here, we will skip calling unmap_firmware_writeable and
> we will leak some pages in firmware state.

Do you mean those (guest) pages which were transitioned to firmware
state as part of
snp_aware_copy_to_firmware()->__snp_cmd_buf_copy()->map_firmware_writeable()?

>
>> +
>> + /* No need to go further if firmware failed to execute command. */
>> + if (fw_err)
>> + return 0;
>> + }
>> +
>> + if (to_fw)
>> + func = map_firmware_writeable;
>> + else
>> + func = unmap_firmware_writeable;
>> +
>> + /*
>> + * A command buffer may contain a system physical address. If the address
>> + * points to host memory then use an intermediate firmware page, otherwise
>> + * change the page state in the RMP table.
>> + */
>> + switch (cmd) {
>> + case SEV_CMD_PDH_CERT_EXPORT:
>> + if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
>> + pdh_cert_len, false, &sev->snp_host_map[0]))
>> + goto err;
>> + if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
>> + cert_chain_len, false, &sev->snp_host_map[1]))
>> + goto err;
>> + break;
>> + case SEV_CMD_GET_ID:
>> + if (prep_buffer(struct sev_data_get_id, address, len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_PEK_CSR:
>> + if (prep_buffer(struct sev_data_pek_csr, address, len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_LAUNCH_UPDATE_DATA:
>> + if (prep_buffer(struct sev_data_launch_update_data, address, len,
>> + true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_LAUNCH_UPDATE_VMSA:
>> + if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
>> + true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_LAUNCH_MEASURE:
>> + if (prep_buffer(struct sev_data_launch_measure, address, len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_LAUNCH_UPDATE_SECRET:
>> + if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
>> + true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_DBG_DECRYPT:
>> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
>> + &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_DBG_ENCRYPT:
>> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
>> + &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_ATTESTATION_REPORT:
>> + if (prep_buffer(struct sev_data_attestation_report, address, len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_SEND_START:
>> + if (prep_buffer(struct sev_data_send_start, session_address,
>> + session_len, false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_SEND_UPDATE_DATA:
>> + if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + if (prep_buffer(struct sev_data_send_update_data, trans_address,
>> + trans_len, false, &sev->snp_host_map[1]))
>> + goto err;
>> + break;
>> + case SEV_CMD_SEND_UPDATE_VMSA:
>> + if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
>> + trans_len, false, &sev->snp_host_map[1]))
>> + goto err;
>> + break;
>> + case SEV_CMD_RECEIVE_UPDATE_DATA:
>> + if (prep_buffer(struct sev_data_receive_update_data, guest_address,
>> + guest_len, true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_RECEIVE_UPDATE_VMSA:
>> + if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
>> + guest_len, true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + /* The command buffer needs to be in the firmware state. */
>> + if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
>> + if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
>> + return -EFAULT;
>
> This function moves two separate pages to the firmware state: first by
> calling map_firmware_writeable, and second by calling
> rmp_mark_pages_firmware for cmd_buf.
> In case rmp_mark_pages_firmware fails for cmd_buf, the page which has
> already been moved to the firmware state in map_firmware_writeable
> should be reclaimed.
> This is a problem especially if we leak a guest-owned page in the
> firmware state. Since this path is used only by legacy SEV VMs, these
> leaked pages will never be reclaimed when the VMs are destroyed.
>

Yes, this looks to be an inherent issue with the original patch. As you
mentioned, there are two pages - the guest-owned page and the HV
cmd_buf - and a failure to transition the cmd_buf back to the HV/shared
state has no corresponding recovery/reclaim for the already transitioned
guest page.
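
Will look into running the reclaim path for the already transitioned
page(s) before bailing out there, i.e. something like (untested):

	if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
		if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true)) {
			/* reclaim the guest/host pages transitioned on the way in */
			__snp_cmd_buf_copy(cmd, cmd_buf, false, 0);
			return -EFAULT;
		}
	}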

Thanks,
Ashish

>>
>> + }
>> +
>> + return 0;
>> +
>> +err:
>> + return -EINVAL;
>> +}
>> +
>> +static inline bool need_firmware_copy(int cmd)
>> +{
>> + struct sev_device *sev = psp_master->sev_data;
>> +
>> + /* After SNP is INIT'ed, the behavior of the legacy SEV commands is changed. */
>> + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
>> +}
>> +
>> +static int snp_aware_copy_to_firmware(int cmd, void *data)
>> +{
>> + return __snp_cmd_buf_copy(cmd, data, true, 0);
>> +}
>> +
>> +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
>> +{
>> + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
>> +}
>> +
>> static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>> {
>> struct psp_device *psp = psp_master;
>> struct sev_device *sev;
>> unsigned int phys_lsb, phys_msb;
>> unsigned int reg, ret = 0;
>> + void *cmd_buf;
>> int buf_len;
>>
>> if (!psp || !psp->sev_data)
>> @@ -512,12 +819,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>> * work for some memory, e.g. vmalloc'd addresses, and @data may not be
>> * physically contiguous.
>> */
>> - if (data)
>> - memcpy(sev->cmd_buf, data, buf_len);
>> + if (data) {
>> + if (sev->cmd_buf_active > 2)
>> + return -EBUSY;
>> +
>> + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
>> +
>> + memcpy(cmd_buf, data, buf_len);
>> + sev->cmd_buf_active++;
>> +
>> + /*
>> + * The behavior of the SEV-legacy commands is altered when the
>> + * SNP firmware is in the INIT state.
>> + */
>> + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, sev->cmd_buf))
>> + return -EFAULT;
>> + } else {
>> + cmd_buf = sev->cmd_buf;
>> + }
>>
>> /* Get the physical address of the command buffer */
>> - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
>> - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
>> + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
>> + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
>>
>> dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
>> cmd, phys_msb, phys_lsb, psp_timeout);
>> @@ -560,15 +883,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>> ret = sev_write_init_ex_file_if_required(cmd);
>> }
>>
>> - print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
>> - buf_len, false);
>> -
>> /*
>> * Copy potential output from the PSP back to data. Do this even on
>> * failure in case the caller wants to glean something from the error.
>> */
>> - if (data)
>> - memcpy(data, sev->cmd_buf, buf_len);
>> + if (data) {
>> + /*
>> + * Restore the page state after the command completes.
>> + */
>> + if (need_firmware_copy(cmd) &&
>> + snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
>> + return -EFAULT;
>> +
>> + memcpy(data, cmd_buf, buf_len);
>> + sev->cmd_buf_active--;
>> + }
>> +
>> + print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
>> + buf_len, false);
>>
>> return ret;
>> }
>> @@ -1579,10 +1911,12 @@ int sev_dev_init(struct psp_device *psp)
>> if (!sev)
>> goto e_err;
>>
>> - sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
>> + sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
>> if (!sev->cmd_buf)
>> goto e_sev;
>>
>> + sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
>> +
>> psp->sev_data = sev;
>>
>> sev->dev = dev;
>> @@ -1648,6 +1982,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
>> snp_range_list = NULL;
>> }
>>
>> + /*
>> + * The host map pages need the immutable bit cleared, so they must be freed
>> + * before the SNP firmware shutdown.
>> + */
>> + free_snp_host_map(sev);
>> +
>> sev_snp_shutdown(&error);
>> }
>>
>> @@ -1722,6 +2062,14 @@ void sev_pci_init(void)
>> dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
>> }
>> }
>> +
>> + /*
>> + * Allocate the intermediate buffers used for the legacy command handling.
>> + */
>> + if (alloc_snp_host_map(sev)) {
>> + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
>> + goto skip_legacy;
>> + }
>> }
>>
>> /* Obtain the TMR memory area for SEV-ES use */
>> @@ -1739,12 +2087,14 @@ void sev_pci_init(void)
>> dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
>> error, rc);
>>
>> +skip_legacy:
>> dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
>> "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
>>
>> return;
>>
>> err:
>> + free_snp_host_map(sev);
>> psp_master->sev_data = NULL;
>> }
>>
>> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
>> index 34767657beb5..19d79f9d4212 100644
>> --- a/drivers/crypto/ccp/sev-dev.h
>> +++ b/drivers/crypto/ccp/sev-dev.h
>> @@ -29,11 +29,20 @@
>> #define SEV_CMDRESP_CMD_SHIFT 16
>> #define SEV_CMDRESP_IOC BIT(0)
>>
>> +#define MAX_SNP_HOST_MAP_BUFS 2
>> +
>> struct sev_misc_dev {
>> struct kref refcount;
>> struct miscdevice misc;
>> };
>>
>> +struct snp_host_map {
>> + u64 paddr;
>> + u32 len;
>> + void *host;
>> + bool active;
>> +};
>> +
>> struct sev_device {
>> struct device *dev;
>> struct psp_device *psp;
>> @@ -52,8 +61,11 @@ struct sev_device {
>> u8 build;
>>
>> void *cmd_buf;
>> + void *cmd_buf_backup;
>> + int cmd_buf_active;
>>
>> bool snp_initialized;
>> + struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
>> };
>>
>> int sev_dev_init(struct psp_device *psp);
>> --
>> 2.25.1
>>

2023-01-13 22:35:50

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 29/64] crypto: ccp: Handle the legacy SEV command when SNP is enabled

Hello Alper,

On 1/12/2023 5:45 PM, Alper Gun wrote:
> On Wed, Dec 14, 2022 at 11:54 AM Michael Roth <[email protected]> wrote:
>>
>> From: Brijesh Singh <[email protected]>
>>
>> The behavior of the SEV-legacy commands is altered when the SNP firmware
>> is in the INIT state. When SNP is in the INIT state, all memory that the
>> firmware may write to while executing an SEV-legacy command must be in
>> the firmware state before the command is issued.
>>
>> A command buffer may contain a system physical address that the firmware
>> may write to. There are two cases that need to be handled:
>>
>> 1) the system physical address points to guest memory
>> 2) the system physical address points to host memory
>>
>> To handle case #1, change the page state to firmware in the RMP table
>> before issuing the command and restore the state to shared after the
>> command completes.
>>
>> For case #2, use a bounce buffer to complete the request.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> drivers/crypto/ccp/sev-dev.c | 370 ++++++++++++++++++++++++++++++++++-
>> drivers/crypto/ccp/sev-dev.h | 12 ++
>> 2 files changed, 372 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 4c12e98a1219..5eb2e8f364d4 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -286,6 +286,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
>> return rc;
>> }
>>
>> +static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
>> +{
>> + /* The C-bit may be set in the paddr */
>> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
>> + int rc, n = 0, i;
>> +
>> + for (i = 0; i < npages; i++, pfn++, n++) {
>> + rc = rmp_make_shared(pfn, PG_LEVEL_4K);
>> + if (rc)
>> + goto cleanup;
>> + }
>> +
>> + return 0;
>> +
>> +cleanup:
>> + /*
>> + * If we failed to change the page state to shared, then it's not safe
>> + * to release the page back to the system, so leak it.
>> + */
>> + snp_mark_pages_offline(pfn, npages - n);
>> +
>> + return rc;
>> +}
>> +
>> static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
>> {
>> unsigned long npages = 1ul << order, paddr;
>> @@ -487,12 +511,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
>> return sev_write_init_ex_file();
>> }
>>
>> +static int alloc_snp_host_map(struct sev_device *sev)
>> +{
>> + struct page *page;
>> + int i;
>> +
>> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
>> + struct snp_host_map *map = &sev->snp_host_map[i];
>> +
>> + memset(map, 0, sizeof(*map));
>> +
>> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
>> + if (!page)
>> + return -ENOMEM;
>> +
>> + map->host = page_address(page);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void free_snp_host_map(struct sev_device *sev)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
>> + struct snp_host_map *map = &sev->snp_host_map[i];
>> +
>> + if (map->host) {
>> + __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
>> + memset(map, 0, sizeof(*map));
>> + }
>> + }
>> +}
>> +
>> +static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
>> +{
>> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
>> +
>> + map->active = false;
>> +
>> + if (!paddr || !len)
>> + return 0;
>> +
>> + map->paddr = *paddr;
>> + map->len = len;
>> +
>> + /* If paddr points to guest memory then change the page state to firmware. */
>> + if (guest) {
>> + if (rmp_mark_pages_firmware(*paddr, npages, true))
>> + return -EFAULT;
>> +
>> + goto done;
>> + }
>> +
>> + if (!map->host)
>> + return -ENOMEM;
>> +
>> + /* Check if the pre-allocated buffer can be used to fulfill the request. */
>> + if (len > SEV_FW_BLOB_MAX_SIZE)
>> + return -EINVAL;
>> +
>> + /* Transition the pre-allocated buffer to the firmware state. */
>> + if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
>> + return -EFAULT;
>> +
>> + /* Set the paddr to use pre-allocated firmware buffer */
>> + *paddr = __psp_pa(map->host);
>> +
>> +done:
>> + map->active = true;
>> + return 0;
>> +}
>> +
>> +static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
>> +{
>> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
>> +
>> + if (!map->active)
>> + return 0;
>> +
>> + /* If paddr points to guest memory then restore the page state to hypervisor. */
>> + if (guest) {
>> + if (snp_reclaim_pages(*paddr, npages, true))
>> + return -EFAULT;
>> +
>> + goto done;
>> + }
>> +
>> + /*
>> + * Transition the pre-allocated buffer to hypervisor state before the access.
>> + *
>> + * This is because while changing the page state to firmware, the kernel unmaps
>> + * the pages from the direct map, and to restore the direct map the pages must
>> + * be transitioned back to the shared state.
>> + */
>> + if (snp_reclaim_pages(__pa(map->host), npages, true))
>> + return -EFAULT;
>> +
>> + /* Copy the response data from the firmware buffer to the caller's buffer. */
>> + memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
>> + *paddr = map->paddr;
>> +
>> +done:
>> + map->active = false;
>> + return 0;
>> +}
>> +
>> +static bool sev_legacy_cmd_buf_writable(int cmd)
>> +{
>> + switch (cmd) {
>> + case SEV_CMD_PLATFORM_STATUS:
>> + case SEV_CMD_GUEST_STATUS:
>> + case SEV_CMD_LAUNCH_START:
>> + case SEV_CMD_RECEIVE_START:
>> + case SEV_CMD_LAUNCH_MEASURE:
>> + case SEV_CMD_SEND_START:
>> + case SEV_CMD_SEND_UPDATE_DATA:
>> + case SEV_CMD_SEND_UPDATE_VMSA:
>> + case SEV_CMD_PEK_CSR:
>> + case SEV_CMD_PDH_CERT_EXPORT:
>> + case SEV_CMD_GET_ID:
>> + case SEV_CMD_ATTESTATION_REPORT:
>> + return true;
>> + default:
>> + return false;
>> + }
>> +}
>> +
>> +#define prep_buffer(name, addr, len, guest, map) \
>> + func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
>> +
>> +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
>> +{
>> + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
>> + struct sev_device *sev = psp_master->sev_data;
>> + bool from_fw = !to_fw;
>> +
>> + /*
>> + * After the command is completed, change the command buffer memory to
>> + * hypervisor state.
>> + *
>> + * The immutable bit is automatically cleared by the firmware, so
>> + * there is no need to reclaim the page.
>> + */
>> + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
>> + if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
>> + return -EFAULT;
>> +
>> + /* No need to go further if firmware failed to execute command. */
>> + if (fw_err)
>> + return 0;
>> + }
>> +
>> + if (to_fw)
>> + func = map_firmware_writeable;
>> + else
>> + func = unmap_firmware_writeable;
>> +
>> + /*
>> + * A command buffer may contain a system physical address. If the address
>> + * points to host memory then use an intermediate firmware page, otherwise
>> + * change the page state in the RMP table.
>> + */
>> + switch (cmd) {
>> + case SEV_CMD_PDH_CERT_EXPORT:
>> + if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
>> + pdh_cert_len, false, &sev->snp_host_map[0]))
>> + goto err;
>> + if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
>> + cert_chain_len, false, &sev->snp_host_map[1]))
>> + goto err;
>> + break;
>> + case SEV_CMD_GET_ID:
>> + if (prep_buffer(struct sev_data_get_id, address, len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_PEK_CSR:
>> + if (prep_buffer(struct sev_data_pek_csr, address, len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_LAUNCH_UPDATE_DATA:
>> + if (prep_buffer(struct sev_data_launch_update_data, address, len,
>> + true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_LAUNCH_UPDATE_VMSA:
>> + if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
>> + true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_LAUNCH_MEASURE:
>> + if (prep_buffer(struct sev_data_launch_measure, address, len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_LAUNCH_UPDATE_SECRET:
>> + if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
>> + true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_DBG_DECRYPT:
>> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
>> + &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_DBG_ENCRYPT:
>> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
>> + &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_ATTESTATION_REPORT:
>> + if (prep_buffer(struct sev_data_attestation_report, address, len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_SEND_START:
>> + if (prep_buffer(struct sev_data_send_start, session_address,
>> + session_len, false, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_SEND_UPDATE_DATA:
>> + if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + if (prep_buffer(struct sev_data_send_update_data, trans_address,
>> + trans_len, false, &sev->snp_host_map[1]))
>> + goto err;
>> + break;
>> + case SEV_CMD_SEND_UPDATE_VMSA:
>> + if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
>> + false, &sev->snp_host_map[0]))
>> + goto err;
>> + if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
>> + trans_len, false, &sev->snp_host_map[1]))
>> + goto err;
>> + break;
>> + case SEV_CMD_RECEIVE_UPDATE_DATA:
>> + if (prep_buffer(struct sev_data_receive_update_data, guest_address,
>> + guest_len, true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + case SEV_CMD_RECEIVE_UPDATE_VMSA:
>> + if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
>> + guest_len, true, &sev->snp_host_map[0]))
>> + goto err;
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + /* The command buffer needs to be in the firmware state. */
>> + if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
>> + if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
>> + return -EFAULT;
>> + }
>> +
>> + return 0;
>> +
>> +err:
>> + return -EINVAL;
>> +}
>> +
>> +static inline bool need_firmware_copy(int cmd)
>> +{
>> + struct sev_device *sev = psp_master->sev_data;
>> +
>> + /* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
>> + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
>> +}
>> +
>> +static int snp_aware_copy_to_firmware(int cmd, void *data)
>> +{
>> + return __snp_cmd_buf_copy(cmd, data, true, 0);
>> +}
>> +
>> +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
>> +{
>> + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
>> +}
>> +
>> static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>> {
>> struct psp_device *psp = psp_master;
>> struct sev_device *sev;
>> unsigned int phys_lsb, phys_msb;
>> unsigned int reg, ret = 0;
>> + void *cmd_buf;
>> int buf_len;
>>
>> if (!psp || !psp->sev_data)
>> @@ -512,12 +819,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>> * work for some memory, e.g. vmalloc'd addresses, and @data may not be
>> * physically contiguous.
>> */
>> - if (data)
>> - memcpy(sev->cmd_buf, data, buf_len);
>> + if (data) {
>> + if (sev->cmd_buf_active > 2)
>> + return -EBUSY;
>> +
>> + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
>> +
>> + memcpy(cmd_buf, data, buf_len);
>> + sev->cmd_buf_active++;
>> +
>> + /*
>> + * The behavior of the SEV-legacy commands is altered when the
>> + * SNP firmware is in the INIT state.
>> + */
>> + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, sev->cmd_buf))
> I believe this should be cmd_buf instead of sev->cmd_buf.
> snp_aware_copy_to_firmware(cmd, cmd_buf)

Yes, you are right; this should be cmd_buf instead of sev->cmd_buf. Will
fix this accordingly.
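
For clarity, the corrected hunk would then operate on the buffer actually
selected above (primary or backup):

	/*
	 * Use the locally selected cmd_buf rather than sev->cmd_buf, which
	 * is wrong whenever the backup buffer is in use.
	 */
	if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
		return -EFAULT;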

Thanks,
Ashish

>
>> + return -EFAULT;
>> + } else {
>> + cmd_buf = sev->cmd_buf;
>> + }
>>
>> /* Get the physical address of the command buffer */
>> - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
>> - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
>> + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
>> + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
>>
>> dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
>> cmd, phys_msb, phys_lsb, psp_timeout);
>> @@ -560,15 +883,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>> ret = sev_write_init_ex_file_if_required(cmd);
>> }
>>
>> - print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
>> - buf_len, false);
>> -
>> /*
>> * Copy potential output from the PSP back to data. Do this even on
>> * failure in case the caller wants to glean something from the error.
>> */
>> - if (data)
>> - memcpy(data, sev->cmd_buf, buf_len);
>> + if (data) {
>> + /*
>> + * Restore the page state after the command completes.
>> + */
>> + if (need_firmware_copy(cmd) &&
>> + snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
>> + return -EFAULT;
>> +
>> + memcpy(data, cmd_buf, buf_len);
>> + sev->cmd_buf_active--;
>> + }
>> +
>> + print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
>> + buf_len, false);
>>
>> return ret;
>> }
>> @@ -1579,10 +1911,12 @@ int sev_dev_init(struct psp_device *psp)
>> if (!sev)
>> goto e_err;
>>
>> - sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
>> + sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
>> if (!sev->cmd_buf)
>> goto e_sev;
>>
>> + sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
>> +
>> psp->sev_data = sev;
>>
>> sev->dev = dev;
>> @@ -1648,6 +1982,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
>> snp_range_list = NULL;
>> }
>>
>> + /*
>> + * The host map pages need the immutable bit cleared, so they must be freed
>> + * before the SNP firmware shutdown.
>> + */
>> + free_snp_host_map(sev);
>> +
>> sev_snp_shutdown(&error);
>> }
>>
>> @@ -1722,6 +2062,14 @@ void sev_pci_init(void)
>> dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
>> }
>> }
>> +
>> + /*
>> + * Allocate the intermediate buffers used for the legacy command handling.
>> + */
>> + if (alloc_snp_host_map(sev)) {
>> + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
>> + goto skip_legacy;
>> + }
>> }
>>
>> /* Obtain the TMR memory area for SEV-ES use */
>> @@ -1739,12 +2087,14 @@ void sev_pci_init(void)
>> dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
>> error, rc);
>>
>> +skip_legacy:
>> dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
>> "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
>>
>> return;
>>
>> err:
>> + free_snp_host_map(sev);
>> psp_master->sev_data = NULL;
>> }
>>
>> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
>> index 34767657beb5..19d79f9d4212 100644
>> --- a/drivers/crypto/ccp/sev-dev.h
>> +++ b/drivers/crypto/ccp/sev-dev.h
>> @@ -29,11 +29,20 @@
>> #define SEV_CMDRESP_CMD_SHIFT 16
>> #define SEV_CMDRESP_IOC BIT(0)
>>
>> +#define MAX_SNP_HOST_MAP_BUFS 2
>> +
>> struct sev_misc_dev {
>> struct kref refcount;
>> struct miscdevice misc;
>> };
>>
>> +struct snp_host_map {
>> + u64 paddr;
>> + u32 len;
>> + void *host;
>> + bool active;
>> +};
>> +
>> struct sev_device {
>> struct device *dev;
>> struct psp_device *psp;
>> @@ -52,8 +61,11 @@ struct sev_device {
>> u8 build;
>>
>> void *cmd_buf;
>> + void *cmd_buf_backup;
>> + int cmd_buf_active;
>>
>> bool snp_initialized;
>> + struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
>> };
>>
>> int sev_dev_init(struct psp_device *psp);
>> --
>> 2.25.1
>>

2023-01-13 22:44:20

by Alper Gun

[permalink] [raw]
Subject: Re: [PATCH RFC v7 29/64] crypto: ccp: Handle the legacy SEV command when SNP is enabled

On Fri, Jan 13, 2023 at 2:04 PM Kalra, Ashish <[email protected]> wrote:
>
> Hello Alper,
>
> On 1/12/2023 2:47 PM, Alper Gun wrote:
> > On Wed, Dec 14, 2022 at 11:54 AM Michael Roth <[email protected]> wrote:
> >>
> >> From: Brijesh Singh <[email protected]>
> >>
> >> The behavior of the SEV-legacy commands is altered when the SNP firmware
> >> is in the INIT state. When SNP is in the INIT state, all memory that the
> >> firmware may write to while executing an SEV-legacy command must be in
> >> the firmware state before the command is issued.
> >>
> >> A command buffer may contain a system physical address that the firmware
> >> may write to. There are two cases that need to be handled:
> >>
> >> 1) the system physical address points to guest memory
> >> 2) the system physical address points to host memory
> >>
> >> To handle case #1, change the page state to firmware in the RMP table
> >> before issuing the command and restore the state to shared after the
> >> command completes.
> >>
> >> For case #2, use a bounce buffer to complete the request.
> >>
> >> Signed-off-by: Brijesh Singh <[email protected]>
> >> Signed-off-by: Ashish Kalra <[email protected]>
> >> Signed-off-by: Michael Roth <[email protected]>
> >> ---
> >> drivers/crypto/ccp/sev-dev.c | 370 ++++++++++++++++++++++++++++++++++-
> >> drivers/crypto/ccp/sev-dev.h | 12 ++
> >> 2 files changed, 372 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> >> index 4c12e98a1219..5eb2e8f364d4 100644
> >> --- a/drivers/crypto/ccp/sev-dev.c
> >> +++ b/drivers/crypto/ccp/sev-dev.c
> >> @@ -286,6 +286,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
> >> return rc;
> >> }
> >>
> >> +static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
> >> +{
> >> + /* The C-bit may be set in the paddr */
> >> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> >> + int rc, n = 0, i;
> >> +
> >> + for (i = 0; i < npages; i++, pfn++, n++) {
> >> + rc = rmp_make_shared(pfn, PG_LEVEL_4K);
> >> + if (rc)
> >> + goto cleanup;
> >> + }
> >> +
> >> + return 0;
> >> +
> >> +cleanup:
> >> + /*
> >> + * If we failed to change the page state to shared, then it's not safe
> >> + * to release the page back to the system, so leak it.
> >> + */
> >> + snp_mark_pages_offline(pfn, npages - n);
> >> +
> >> + return rc;
> >> +}
> >> +
> >> static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
> >> {
> >> unsigned long npages = 1ul << order, paddr;
> >> @@ -487,12 +511,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
> >> return sev_write_init_ex_file();
> >> }
> >>
> >> +static int alloc_snp_host_map(struct sev_device *sev)
> >> +{
> >> + struct page *page;
> >> + int i;
> >> +
> >> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> >> + struct snp_host_map *map = &sev->snp_host_map[i];
> >> +
> >> + memset(map, 0, sizeof(*map));
> >> +
> >> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
> >> + if (!page)
> >> + return -ENOMEM;
> >> +
> >> + map->host = page_address(page);
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static void free_snp_host_map(struct sev_device *sev)
> >> +{
> >> + int i;
> >> +
> >> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> >> + struct snp_host_map *map = &sev->snp_host_map[i];
> >> +
> >> + if (map->host) {
> >> + __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
> >> + memset(map, 0, sizeof(*map));
> >> + }
> >> + }
> >> +}
> >> +
> >> +static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
> >> +{
> >> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> >> +
> >> + map->active = false;
> >> +
> >> + if (!paddr || !len)
> >> + return 0;
> >> +
> >> + map->paddr = *paddr;
> >> + map->len = len;
> >> +
> >> + /* If paddr points to guest memory then change the page state to firmware. */
> >> + if (guest) {
> >> + if (rmp_mark_pages_firmware(*paddr, npages, true))
> >> + return -EFAULT;
> >> +
> >> + goto done;
> >> + }
> >> +
> >> + if (!map->host)
> >> + return -ENOMEM;
> >> +
> >> + /* Check if the pre-allocated buffer can be used to fulfill the request. */
> >> + if (len > SEV_FW_BLOB_MAX_SIZE)
> >> + return -EINVAL;
> >> +
> >> + /* Transition the pre-allocated buffer to the firmware state. */
> >> + if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
> >> + return -EFAULT;
> >> +
> >> + /* Set the paddr to use pre-allocated firmware buffer */
> >> + *paddr = __psp_pa(map->host);
> >> +
> >> +done:
> >> + map->active = true;
> >> + return 0;
> >> +}
> >> +
> >> +static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
> >> +{
> >> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> >> +
> >> + if (!map->active)
> >> + return 0;
> >> +
> >> + /* If paddr points to guest memory then restore the page state to hypervisor. */
> >> + if (guest) {
> >> + if (snp_reclaim_pages(*paddr, npages, true))
> >> + return -EFAULT;
> >> +
> >> + goto done;
> >> + }
> >> +
> >> + /*
> >> + * Transition the pre-allocated buffer to hypervisor state before the access.
> >> + *
> >> + * This is because while changing the page state to firmware, the kernel unmaps
> >> + * the pages from the direct map, and to restore the direct map the pages must
> >> + * be transitioned back to the shared state.
> >> + */
> >> + if (snp_reclaim_pages(__pa(map->host), npages, true))
> >> + return -EFAULT;
> >> +
> >> + /* Copy the response data from the firmware buffer to the caller's buffer. */
> >> + memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
> >> + *paddr = map->paddr;
> >> +
> >> +done:
> >> + map->active = false;
> >> + return 0;
> >> +}
> >> +
> >> +static bool sev_legacy_cmd_buf_writable(int cmd)
> >> +{
> >> + switch (cmd) {
> >> + case SEV_CMD_PLATFORM_STATUS:
> >> + case SEV_CMD_GUEST_STATUS:
> >> + case SEV_CMD_LAUNCH_START:
> >> + case SEV_CMD_RECEIVE_START:
> >> + case SEV_CMD_LAUNCH_MEASURE:
> >> + case SEV_CMD_SEND_START:
> >> + case SEV_CMD_SEND_UPDATE_DATA:
> >> + case SEV_CMD_SEND_UPDATE_VMSA:
> >> + case SEV_CMD_PEK_CSR:
> >> + case SEV_CMD_PDH_CERT_EXPORT:
> >> + case SEV_CMD_GET_ID:
> >> + case SEV_CMD_ATTESTATION_REPORT:
> >> + return true;
> >> + default:
> >> + return false;
> >> + }
> >> +}
> >> +
> >> +#define prep_buffer(name, addr, len, guest, map) \
> >> + func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
> >> +
> >> +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
> >> +{
> >> + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
> >> + struct sev_device *sev = psp_master->sev_data;
> >> + bool from_fw = !to_fw;
> >> +
> >> + /*
> >> + * After the command is completed, change the command buffer memory to
> >> + * hypervisor state.
> >> + *
> >> + * The immutable bit is automatically cleared by the firmware, so
> >> + * there is no need to reclaim the page.
> >> + */
> >> + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
> >> + if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
> >> + return -EFAULT;
> >
> > If we return here, we will skip calling unmap_firmware_writeable and
> > we will leak some pages in firmware state.
>
> Do you mean those (guest) pages which were transitioned to firmware
> state as part of
> snp_aware_copy_to_firmware()->_snp_cmd_buf_copy()->map_firmware_writeable()?

Yes, if we return here, these guest pages will be left in the firmware state.
>
> >
> >> +
> >> + /* No need to go further if firmware failed to execute command. */
> >> + if (fw_err)
> >> + return 0;
Same thing here as well: we are possibly leaving guest pages in the firmware state.
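
Loosely sketched, one way to address both early returns (assuming a local
"int rc = 0;") is to record the failure and fall through, so the unmap pass
still runs for every active map:

	/* Sketch only: defer errors so the unmap pass below always runs
	 * and mapped guest pages are reclaimed even on fw_err.
	 */
	if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
		if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
			rc = -EFAULT;
		/* No early return here; fall through to the unmap pass. */
	}

with the switch statement below still calling unmap_firmware_writeable()
for each active map and the function returning rc at the end.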

> >> + }
> >> +
> >> + if (to_fw)
> >> + func = map_firmware_writeable;
> >> + else
> >> + func = unmap_firmware_writeable;
> >> +
> >> + /*
> >> + * A command buffer may contain a system physical address. If the address
> >> + * points to host memory then use an intermediate firmware page, otherwise
> >> + * change the page state in the RMP table.
> >> + */
> >> + switch (cmd) {
> >> + case SEV_CMD_PDH_CERT_EXPORT:
> >> + if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
> >> + pdh_cert_len, false, &sev->snp_host_map[0]))
> >> + goto err;
> >> + if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
> >> + cert_chain_len, false, &sev->snp_host_map[1]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_GET_ID:
> >> + if (prep_buffer(struct sev_data_get_id, address, len,
> >> + false, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_PEK_CSR:
> >> + if (prep_buffer(struct sev_data_pek_csr, address, len,
> >> + false, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_LAUNCH_UPDATE_DATA:
> >> + if (prep_buffer(struct sev_data_launch_update_data, address, len,
> >> + true, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_LAUNCH_UPDATE_VMSA:
> >> + if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
> >> + true, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_LAUNCH_MEASURE:
> >> + if (prep_buffer(struct sev_data_launch_measure, address, len,
> >> + false, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_LAUNCH_UPDATE_SECRET:
> >> + if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
> >> + true, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_DBG_DECRYPT:
> >> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
> >> + &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_DBG_ENCRYPT:
> >> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
> >> + &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_ATTESTATION_REPORT:
> >> + if (prep_buffer(struct sev_data_attestation_report, address, len,
> >> + false, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_SEND_START:
> >> + if (prep_buffer(struct sev_data_send_start, session_address,
> >> + session_len, false, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_SEND_UPDATE_DATA:
> >> + if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
> >> + false, &sev->snp_host_map[0]))
> >> + goto err;
> >> + if (prep_buffer(struct sev_data_send_update_data, trans_address,
> >> + trans_len, false, &sev->snp_host_map[1]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_SEND_UPDATE_VMSA:
> >> + if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
> >> + false, &sev->snp_host_map[0]))
> >> + goto err;
> >> + if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
> >> + trans_len, false, &sev->snp_host_map[1]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_RECEIVE_UPDATE_DATA:
> >> + if (prep_buffer(struct sev_data_receive_update_data, guest_address,
> >> + guest_len, true, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + case SEV_CMD_RECEIVE_UPDATE_VMSA:
> >> + if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
> >> + guest_len, true, &sev->snp_host_map[0]))
> >> + goto err;
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> +
> >> + /* The command buffer needs to be in the firmware state. */
> >> + if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
> >> + if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
> >> + return -EFAULT;
> >
> > This function moves two separate pages to firmware state. First
> > calling map_firmware_writeable and second calling
> > rmp_mark_pages_firmware for cmd_buf.
> > In case rmp_mark_pages_firmware fails for cmd_buf, the page which has
> > already moved to firmware state in map_firmware_writeable should be
> > reclaimed.
> > This is a problem especially if we leak a guest owned page in firmware
> > state. Since this is used only by legacy SEV VMs, these leaked pages
> > will never be reclaimed back when destroying these VMs.
> >
>
> Yes, this looks to be an inherent issue with the original patch. As you
> mentioned, there are two pages involved: the guest-owned page and the HV
> cmd_buf. A failure to transition the cmd_buf back to the HV/shared state
> has no corresponding recovery/reclaim for the already-transitioned guest
> page.
>
> Thanks,
> Ashish
>
> >>
> >> + }
> >> +
> >> + return 0;
> >> +
> >> +err:
> >> + return -EINVAL;
> >> +}
> >> +
> >> +static inline bool need_firmware_copy(int cmd)
> >> +{
> >> + struct sev_device *sev = psp_master->sev_data;
> >> +
> >> + /* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
> >> + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
> >> +}
> >> +
> >> +static int snp_aware_copy_to_firmware(int cmd, void *data)
> >> +{
> >> + return __snp_cmd_buf_copy(cmd, data, true, 0);
> >> +}
> >> +
> >> +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
> >> +{
> >> + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
> >> +}
> >> +
> >> static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> >> {
> >> struct psp_device *psp = psp_master;
> >> struct sev_device *sev;
> >> unsigned int phys_lsb, phys_msb;
> >> unsigned int reg, ret = 0;
> >> + void *cmd_buf;
> >> int buf_len;
> >>
> >> if (!psp || !psp->sev_data)
> >> @@ -512,12 +819,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> >> * work for some memory, e.g. vmalloc'd addresses, and @data may not be
> >> * physically contiguous.
> >> */
> >> - if (data)
> >> - memcpy(sev->cmd_buf, data, buf_len);
> >> + if (data) {
> >> + if (sev->cmd_buf_active > 2)
> >> + return -EBUSY;
> >> +
> >> + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
> >> +
> >> + memcpy(cmd_buf, data, buf_len);
> >> + sev->cmd_buf_active++;
> >> +
> >> + /*
> >> + * The behavior of the SEV-legacy commands is altered when the
> >> + * SNP firmware is in the INIT state.
> >> + */
> >> + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, sev->cmd_buf))
> >> + return -EFAULT;
> >> + } else {
> >> + cmd_buf = sev->cmd_buf;
> >> + }
> >>
> >> /* Get the physical address of the command buffer */
> >> - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> >> - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> >> + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
> >> + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
> >>
> >> dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
> >> cmd, phys_msb, phys_lsb, psp_timeout);
> >> @@ -560,15 +883,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> >> ret = sev_write_init_ex_file_if_required(cmd);
> >> }
> >>
> >> - print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
> >> - buf_len, false);
> >> -
> >> /*
> >> * Copy potential output from the PSP back to data. Do this even on
> >> * failure in case the caller wants to glean something from the error.
> >> */
> >> - if (data)
> >> - memcpy(data, sev->cmd_buf, buf_len);
> >> + if (data) {
> >> + /*
> >> + * Restore the page state after the command completes.
> >> + */
> >> + if (need_firmware_copy(cmd) &&
> >> + snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
> >> + return -EFAULT;
> >> +
> >> + memcpy(data, cmd_buf, buf_len);
> >> + sev->cmd_buf_active--;
> >> + }
> >> +
> >> + print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
> >> + buf_len, false);
> >>
> >> return ret;
> >> }
> >> @@ -1579,10 +1911,12 @@ int sev_dev_init(struct psp_device *psp)
> >> if (!sev)
> >> goto e_err;
> >>
> >> - sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
> >> + sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
> >> if (!sev->cmd_buf)
> >> goto e_sev;
> >>
> >> + sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
> >> +
> >> psp->sev_data = sev;
> >>
> >> sev->dev = dev;
> >> @@ -1648,6 +1982,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
> >> snp_range_list = NULL;
> >> }
> >>
> >> + /*
> >> + * The host map pages need the immutable bit cleared, so they must be freed
> >> + * before the SNP firmware shutdown.
> >> + */
> >> + free_snp_host_map(sev);
> >> +
> >> sev_snp_shutdown(&error);
> >> }
> >>
> >> @@ -1722,6 +2062,14 @@ void sev_pci_init(void)
> >> dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
> >> }
> >> }
> >> +
> >> + /*
> >> + * Allocate the intermediate buffers used for the legacy command handling.
> >> + */
> >> + if (alloc_snp_host_map(sev)) {
> >> + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
> >> + goto skip_legacy;
> >> + }
> >> }
> >>
> >> /* Obtain the TMR memory area for SEV-ES use */
> >> @@ -1739,12 +2087,14 @@ void sev_pci_init(void)
> >> dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
> >> error, rc);
> >>
> >> +skip_legacy:
> >> dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
> >> "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
> >>
> >> return;
> >>
> >> err:
> >> + free_snp_host_map(sev);
> >> psp_master->sev_data = NULL;
> >> }
> >>
> >> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
> >> index 34767657beb5..19d79f9d4212 100644
> >> --- a/drivers/crypto/ccp/sev-dev.h
> >> +++ b/drivers/crypto/ccp/sev-dev.h
> >> @@ -29,11 +29,20 @@
> >> #define SEV_CMDRESP_CMD_SHIFT 16
> >> #define SEV_CMDRESP_IOC BIT(0)
> >>
> >> +#define MAX_SNP_HOST_MAP_BUFS 2
> >> +
> >> struct sev_misc_dev {
> >> struct kref refcount;
> >> struct miscdevice misc;
> >> };
> >>
> >> +struct snp_host_map {
> >> + u64 paddr;
> >> + u32 len;
> >> + void *host;
> >> + bool active;
> >> +};
> >> +
> >> struct sev_device {
> >> struct device *dev;
> >> struct psp_device *psp;
> >> @@ -52,8 +61,11 @@ struct sev_device {
> >> u8 build;
> >>
> >> void *cmd_buf;
> >> + void *cmd_buf_backup;
> >> + int cmd_buf_active;
> >>
> >> bool snp_initialized;
> >> + struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
> >> };
> >>
> >> int sev_dev_init(struct psp_device *psp);
> >> --
> >> 2.25.1
> >>

2023-01-13 23:00:03

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 29/64] crypto: ccp: Handle the legacy SEV command when SNP is enabled

On 1/13/2023 4:42 PM, Alper Gun wrote:
> On Fri, Jan 13, 2023 at 2:04 PM Kalra, Ashish <[email protected]> wrote:
>>
>> Hello Alper,
>>
>> On 1/12/2023 2:47 PM, Alper Gun wrote:
>>> On Wed, Dec 14, 2022 at 11:54 AM Michael Roth <[email protected]> wrote:
>>>>
>>>> From: Brijesh Singh <[email protected]>
>>>>
>>>> The behavior of the SEV-legacy commands is altered when the SNP firmware
>>>> is in the INIT state. When SNP is in the INIT state, all memory that the
>>>> firmware may write to while executing an SEV-legacy command must be in
>>>> the firmware state before the command is issued.
>>>>
>>>> A command buffer may contain a system physical address that the firmware
>>>> may write to. There are two cases that need to be handled:
>>>>
>>>> 1) the system physical address points to guest memory
>>>> 2) the system physical address points to host memory
>>>>
>>>> To handle case #1, change the page state to firmware in the RMP table
>>>> before issuing the command and restore the state to shared after the
>>>> command completes.
>>>>
>>>> For case #2, use a bounce buffer to complete the request.
>>>>
>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>> Signed-off-by: Michael Roth <[email protected]>
>>>> ---
>>>> drivers/crypto/ccp/sev-dev.c | 370 ++++++++++++++++++++++++++++++++++-
>>>> drivers/crypto/ccp/sev-dev.h | 12 ++
>>>> 2 files changed, 372 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>>>> index 4c12e98a1219..5eb2e8f364d4 100644
>>>> --- a/drivers/crypto/ccp/sev-dev.c
>>>> +++ b/drivers/crypto/ccp/sev-dev.c
>>>> @@ -286,6 +286,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
>>>> return rc;
>>>> }
>>>>
>>>> +static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
>>>> +{
>>>> + /* The C-bit may be set in the paddr */
>>>> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
>>>> + int rc, n = 0, i;
>>>> +
>>>> + for (i = 0; i < npages; i++, pfn++, n++) {
>>>> + rc = rmp_make_shared(pfn, PG_LEVEL_4K);
>>>> + if (rc)
>>>> + goto cleanup;
>>>> + }
>>>> +
>>>> + return 0;
>>>> +
>>>> +cleanup:
>>>> + /*
>>>> + * If we failed to change the page state to shared, then it's not safe
>>>> + * to release the page back to the system, so leak it.
>>>> + */
>>>> + snp_mark_pages_offline(pfn, npages - n);
>>>> +
>>>> + return rc;
>>>> +}
>>>> +
>>>> static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
>>>> {
>>>> unsigned long npages = 1ul << order, paddr;
>>>> @@ -487,12 +511,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
>>>> return sev_write_init_ex_file();
>>>> }
>>>>
>>>> +static int alloc_snp_host_map(struct sev_device *sev)
>>>> +{
>>>> + struct page *page;
>>>> + int i;
>>>> +
>>>> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
>>>> + struct snp_host_map *map = &sev->snp_host_map[i];
>>>> +
>>>> + memset(map, 0, sizeof(*map));
>>>> +
>>>> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
>>>> + if (!page)
>>>> + return -ENOMEM;
>>>> +
>>>> + map->host = page_address(page);
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void free_snp_host_map(struct sev_device *sev)
>>>> +{
>>>> + int i;
>>>> +
>>>> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
>>>> + struct snp_host_map *map = &sev->snp_host_map[i];
>>>> +
>>>> + if (map->host) {
>>>> + __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
>>>> + memset(map, 0, sizeof(*map));
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> +static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
>>>> +{
>>>> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
>>>> +
>>>> + map->active = false;
>>>> +
>>>> + if (!paddr || !len)
>>>> + return 0;
>>>> +
>>>> + map->paddr = *paddr;
>>>> + map->len = len;
>>>> +
>>>> + /* If paddr points to guest memory then change the page state to firmware. */
>>>> + if (guest) {
>>>> + if (rmp_mark_pages_firmware(*paddr, npages, true))
>>>> + return -EFAULT;
>>>> +
>>>> + goto done;
>>>> + }
>>>> +
>>>> + if (!map->host)
>>>> + return -ENOMEM;
>>>> +
>>>> + /* Check if the pre-allocated buffer can be used to fulfill the request. */
>>>> + if (len > SEV_FW_BLOB_MAX_SIZE)
>>>> + return -EINVAL;
>>>> +
>>>> + /* Transition the pre-allocated buffer to the firmware state. */
>>>> + if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
>>>> + return -EFAULT;
>>>> +
>>>> + /* Set the paddr to use pre-allocated firmware buffer */
>>>> + *paddr = __psp_pa(map->host);
>>>> +
>>>> +done:
>>>> + map->active = true;
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
>>>> +{
>>>> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
>>>> +
>>>> + if (!map->active)
>>>> + return 0;
>>>> +
>>>> + /* If paddr points to guest memory then restore the page state to hypervisor. */
>>>> + if (guest) {
>>>> + if (snp_reclaim_pages(*paddr, npages, true))
>>>> + return -EFAULT;
>>>> +
>>>> + goto done;
>>>> + }
>>>> +
>>>> + /*
>>>> + * Transition the pre-allocated buffer to hypervisor state before the access.
>>>> + *
>>>> + * This is because while changing the page state to firmware, the kernel unmaps
>>>> + * the pages from the direct map, and to restore the direct map the pages must
>>>> + * be transitioned back to the shared state.
>>>> + */
>>>> + if (snp_reclaim_pages(__pa(map->host), npages, true))
>>>> + return -EFAULT;
>>>> +
>>>> + /* Copy the response data from the firmware buffer to the caller's buffer. */
>>>> + memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
>>>> + *paddr = map->paddr;
>>>> +
>>>> +done:
>>>> + map->active = false;
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static bool sev_legacy_cmd_buf_writable(int cmd)
>>>> +{
>>>> + switch (cmd) {
>>>> + case SEV_CMD_PLATFORM_STATUS:
>>>> + case SEV_CMD_GUEST_STATUS:
>>>> + case SEV_CMD_LAUNCH_START:
>>>> + case SEV_CMD_RECEIVE_START:
>>>> + case SEV_CMD_LAUNCH_MEASURE:
>>>> + case SEV_CMD_SEND_START:
>>>> + case SEV_CMD_SEND_UPDATE_DATA:
>>>> + case SEV_CMD_SEND_UPDATE_VMSA:
>>>> + case SEV_CMD_PEK_CSR:
>>>> + case SEV_CMD_PDH_CERT_EXPORT:
>>>> + case SEV_CMD_GET_ID:
>>>> + case SEV_CMD_ATTESTATION_REPORT:
>>>> + return true;
>>>> + default:
>>>> + return false;
>>>> + }
>>>> +}
>>>> +
>>>> +#define prep_buffer(name, addr, len, guest, map) \
>>>> + func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
>>>> +
>>>> +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
>>>> +{
>>>> + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
>>>> + struct sev_device *sev = psp_master->sev_data;
>>>> + bool from_fw = !to_fw;
>>>> +
>>>> + /*
>>>> + * After the command is completed, change the command buffer memory to
>>>> + * hypervisor state.
>>>> + *
>>>> + * The immutable bit is automatically cleared by the firmware, so
>>>> + * there is no need to reclaim the page.
>>>> + */
>>>> + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
>>>> + if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
>>>> + return -EFAULT;
>>>
>>> If we return here, we will skip calling unmap_firmware_writeable and
>>> we will leak some pages in firmware state.
>>
>> Do you mean those (guest) pages which were transitioned to firmware
>> state as part of
>> snp_aware_copy_to_firmware()->_snp_cmd_buf_copy()->map_firmware_writeable()?
>
> Yes, if we return here, these guest pages will be left in the firmware state.
>>
>>>
>>>> +
>>>> + /* No need to go further if firmware failed to execute command. */
>>>> + if (fw_err)
>>>> + return 0;
> Same thing here as well: we are possibly leaving guest pages in the firmware state.
>

Yes, I agree, both are inherent bugs in the original patch and need to
be fixed.

Thanks,
Ashish

>>>> + }
>>>> +
>>>> + if (to_fw)
>>>> + func = map_firmware_writeable;
>>>> + else
>>>> + func = unmap_firmware_writeable;
>>>> +
>>>> + /*
>>>> + * A command buffer may contain a system physical address. If the address
>>>> + * points to host memory then use an intermediate firmware page, otherwise
>>>> + * change the page state in the RMP table.
>>>> + */
>>>> + switch (cmd) {
>>>> + case SEV_CMD_PDH_CERT_EXPORT:
>>>> + if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
>>>> + pdh_cert_len, false, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
>>>> + cert_chain_len, false, &sev->snp_host_map[1]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_GET_ID:
>>>> + if (prep_buffer(struct sev_data_get_id, address, len,
>>>> + false, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_PEK_CSR:
>>>> + if (prep_buffer(struct sev_data_pek_csr, address, len,
>>>> + false, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_LAUNCH_UPDATE_DATA:
>>>> + if (prep_buffer(struct sev_data_launch_update_data, address, len,
>>>> + true, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_LAUNCH_UPDATE_VMSA:
>>>> + if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
>>>> + true, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_LAUNCH_MEASURE:
>>>> + if (prep_buffer(struct sev_data_launch_measure, address, len,
>>>> + false, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_LAUNCH_UPDATE_SECRET:
>>>> + if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
>>>> + true, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_DBG_DECRYPT:
>>>> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
>>>> + &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_DBG_ENCRYPT:
>>>> + if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
>>>> + &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_ATTESTATION_REPORT:
>>>> + if (prep_buffer(struct sev_data_attestation_report, address, len,
>>>> + false, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_SEND_START:
>>>> + if (prep_buffer(struct sev_data_send_start, session_address,
>>>> + session_len, false, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_SEND_UPDATE_DATA:
>>>> + if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
>>>> + false, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + if (prep_buffer(struct sev_data_send_update_data, trans_address,
>>>> + trans_len, false, &sev->snp_host_map[1]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_SEND_UPDATE_VMSA:
>>>> + if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
>>>> + false, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
>>>> + trans_len, false, &sev->snp_host_map[1]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_RECEIVE_UPDATE_DATA:
>>>> + if (prep_buffer(struct sev_data_receive_update_data, guest_address,
>>>> + guest_len, true, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + case SEV_CMD_RECEIVE_UPDATE_VMSA:
>>>> + if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
>>>> + guest_len, true, &sev->snp_host_map[0]))
>>>> + goto err;
>>>> + break;
>>>> + default:
>>>> + break;
>>>> + }
>>>> +
>>>> + /* The command buffer needs to be in the firmware state. */
>>>> + if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
>>>> + if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
>>>> + return -EFAULT;
>>>
>>> This function moves two separate pages to firmware state. First
>>> calling map_firmware_writeable and second calling
>>> rmp_mark_pages_firmware for cmd_buf.
>>> In case rmp_mark_pages_firmware fails for cmd_buf, the page which has
>>> already moved to firmware state in map_firmware_writeable should be
>>> reclaimed.
>>> This is a problem especially if we leak a guest owned page in firmware
>>> state. Since this is used only by legacy SEV VMs, these leaked pages
>>> will never be reclaimed back when destroying these VMs.
>>>
>>
>> Yes, this looks to be an inherent issue with the original patch. As you
>> mentioned, there are two pages involved: the guest-owned page and the HV
>> cmd_buf. A failure to transition the cmd_buf back to the HV/shared state
>> has no corresponding recovery/reclaim for the already-transitioned guest
>> page.
>>
>> Thanks,
>> Ashish
>>
>>>>
>>>> + }
>>>> +
>>>> + return 0;
>>>> +
>>>> +err:
>>>> + return -EINVAL;
>>>> +}
>>>> +
>>>> +static inline bool need_firmware_copy(int cmd)
>>>> +{
>>>> + struct sev_device *sev = psp_master->sev_data;
>>>> +
>>>> + /* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
>>>> + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
>>>> +}
>>>> +
>>>> +static int snp_aware_copy_to_firmware(int cmd, void *data)
>>>> +{
>>>> + return __snp_cmd_buf_copy(cmd, data, true, 0);
>>>> +}
>>>> +
>>>> +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
>>>> +{
>>>> + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
>>>> +}
>>>> +
>>>> static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>>>> {
>>>> struct psp_device *psp = psp_master;
>>>> struct sev_device *sev;
>>>> unsigned int phys_lsb, phys_msb;
>>>> unsigned int reg, ret = 0;
>>>> + void *cmd_buf;
>>>> int buf_len;
>>>>
>>>> if (!psp || !psp->sev_data)
>>>> @@ -512,12 +819,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>>>> * work for some memory, e.g. vmalloc'd addresses, and @data may not be
>>>> * physically contiguous.
>>>> */
>>>> - if (data)
>>>> - memcpy(sev->cmd_buf, data, buf_len);
>>>> + if (data) {
>>>> + if (sev->cmd_buf_active > 2)
>>>> + return -EBUSY;
>>>> +
>>>> + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
>>>> +
>>>> + memcpy(cmd_buf, data, buf_len);
>>>> + sev->cmd_buf_active++;
>>>> +
>>>> + /*
>>>> + * The behavior of the SEV-legacy commands is altered when the
>>>> + * SNP firmware is in the INIT state.
>>>> + */
>>>> + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, sev->cmd_buf))
>>>> + return -EFAULT;
>>>> + } else {
>>>> + cmd_buf = sev->cmd_buf;
>>>> + }
>>>>
>>>> /* Get the physical address of the command buffer */
>>>> - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
>>>> - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
>>>> + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
>>>> + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
>>>>
>>>> dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
>>>> cmd, phys_msb, phys_lsb, psp_timeout);
>>>> @@ -560,15 +883,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>>>> ret = sev_write_init_ex_file_if_required(cmd);
>>>> }
>>>>
>>>> - print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
>>>> - buf_len, false);
>>>> -
>>>> /*
>>>> * Copy potential output from the PSP back to data. Do this even on
>>>> * failure in case the caller wants to glean something from the error.
>>>> */
>>>> - if (data)
>>>> - memcpy(data, sev->cmd_buf, buf_len);
>>>> + if (data) {
>>>> + /*
>>>> + * Restore the page state after the command completes.
>>>> + */
>>>> + if (need_firmware_copy(cmd) &&
>>>> + snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
>>>> + return -EFAULT;
>>>> +
>>>> + memcpy(data, cmd_buf, buf_len);
>>>> + sev->cmd_buf_active--;
>>>> + }
>>>> +
>>>> + print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
>>>> + buf_len, false);
>>>>
>>>> return ret;
>>>> }
>>>> @@ -1579,10 +1911,12 @@ int sev_dev_init(struct psp_device *psp)
>>>> if (!sev)
>>>> goto e_err;
>>>>
>>>> - sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
>>>> + sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
>>>> if (!sev->cmd_buf)
>>>> goto e_sev;
>>>>
>>>> + sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
>>>> +
>>>> psp->sev_data = sev;
>>>>
>>>> sev->dev = dev;
>>>> @@ -1648,6 +1982,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
>>>> snp_range_list = NULL;
>>>> }
>>>>
>>>> + /*
>>>> + * The host map pages need the immutable bit cleared, so they must be freed
>>>> + * before the SNP firmware shutdown.
>>>> + */
>>>> + free_snp_host_map(sev);
>>>> +
>>>> sev_snp_shutdown(&error);
>>>> }
>>>>
>>>> @@ -1722,6 +2062,14 @@ void sev_pci_init(void)
>>>> dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
>>>> }
>>>> }
>>>> +
>>>> + /*
>>>> + * Allocate the intermediate buffers used for the legacy command handling.
>>>> + */
>>>> + if (alloc_snp_host_map(sev)) {
>>>> + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
>>>> + goto skip_legacy;
>>>> + }
>>>> }
>>>>
>>>> /* Obtain the TMR memory area for SEV-ES use */
>>>> @@ -1739,12 +2087,14 @@ void sev_pci_init(void)
>>>> dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
>>>> error, rc);
>>>>
>>>> +skip_legacy:
>>>> dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
>>>> "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
>>>>
>>>> return;
>>>>
>>>> err:
>>>> + free_snp_host_map(sev);
>>>> psp_master->sev_data = NULL;
>>>> }
>>>>
>>>> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
>>>> index 34767657beb5..19d79f9d4212 100644
>>>> --- a/drivers/crypto/ccp/sev-dev.h
>>>> +++ b/drivers/crypto/ccp/sev-dev.h
>>>> @@ -29,11 +29,20 @@
>>>> #define SEV_CMDRESP_CMD_SHIFT 16
>>>> #define SEV_CMDRESP_IOC BIT(0)
>>>>
>>>> +#define MAX_SNP_HOST_MAP_BUFS 2
>>>> +
>>>> struct sev_misc_dev {
>>>> struct kref refcount;
>>>> struct miscdevice misc;
>>>> };
>>>>
>>>> +struct snp_host_map {
>>>> + u64 paddr;
>>>> + u32 len;
>>>> + void *host;
>>>> + bool active;
>>>> +};
>>>> +
>>>> struct sev_device {
>>>> struct device *dev;
>>>> struct psp_device *psp;
>>>> @@ -52,8 +61,11 @@ struct sev_device {
>>>> u8 build;
>>>>
>>>> void *cmd_buf;
>>>> + void *cmd_buf_backup;
>>>> + int cmd_buf_active;
>>>>
>>>> bool snp_initialized;
>>>> + struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
>>>> };
>>>>
>>>> int sev_dev_init(struct psp_device *psp);
>>>> --
>>>> 2.25.1
>>>>

2023-01-13 23:05:14

by Alper Gun

[permalink] [raw]
Subject: Re: [PATCH RFC v7 23/64] x86/fault: Add support to dump RMP entry on fault

On Wed, Dec 14, 2022 at 11:52 AM Michael Roth <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> When SEV-SNP is enabled globally, a write from the host goes through the
> RMP check. If the hardware encounters a check failure, it raises a #PF
> (with the RMP bit set). Dump the RMP entry at the faulting pfn to help
> debugging.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 2 ++
> arch/x86/kernel/sev.c | 43 ++++++++++++++++++++++++++++++++++++++
> arch/x86/mm/fault.c | 7 ++++++-
> 3 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 4eeedcaca593..2916f4150ac7 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -215,6 +215,7 @@ int snp_lookup_rmpentry(u64 pfn, int *level);
> int psmash(u64 pfn);
> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> int rmp_make_shared(u64 pfn, enum pg_level level);
> +void sev_dump_rmpentry(u64 pfn);
> #else
> static inline void sev_es_ist_enter(struct pt_regs *regs) { }
> static inline void sev_es_ist_exit(void) { }
> @@ -247,6 +248,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
> return -ENODEV;
> }
> static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
> +static inline void sev_dump_rmpentry(u64 pfn) {}
> #endif
>
> #endif
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index e2b38c3551be..1dd1b36bdfea 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2508,6 +2508,49 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
> return entry;
> }
>
> +void sev_dump_rmpentry(u64 pfn)
> +{
> + unsigned long pfn_end;
> + struct rmpentry *e;
> + int level;
> +
> + e = __snp_lookup_rmpentry(pfn, &level);
> + if (!e) {
if (IS_ERR(e)) {

> + pr_info("failed to read RMP entry pfn 0x%llx\n", pfn);
> + return;
> + }
> +
> + if (rmpentry_assigned(e)) {
> + pr_info("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
> + " asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,
> + rmpentry_assigned(e), e->info.immutable, rmpentry_pagesize(e),
> + (unsigned long)e->info.gpa, e->info.asid, e->info.vmsa,
> + e->info.validated);
> + return;
> + }
> +
> + /*
> + * If the RMP entry at the faulting pfn was not assigned, then it is not
> + * clear what caused the RMP violation. To get some useful debug
> + * information, iterate through the entire 2MB region, and dump the RMP
> + * entries if any of the bits in an RMP entry are set.
> + */
> + pfn = pfn & ~(PTRS_PER_PMD - 1);
> + pfn_end = pfn + PTRS_PER_PMD;
> +
> + while (pfn < pfn_end) {
> + e = __snp_lookup_rmpentry(pfn, &level);
> + if (!e)
> + return;
if (IS_ERR(e))
continue;
> +
> + if (e->low || e->high)
> + pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> + pfn << PAGE_SHIFT, e->high, e->low);
> + pfn++;
> + }
> +}
> +EXPORT_SYMBOL_GPL(sev_dump_rmpentry);
> +
> /*
> * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
> * and -errno if there is no corresponding RMP entry.
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index ded53879f98d..f2b16dcfbd9a 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -536,6 +536,8 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
> static void
> show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
> {
> + unsigned long pfn;
> +
> if (!oops_may_print())
> return;
>
> @@ -608,7 +610,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
> show_ldttss(&gdt, "TR", tr);
> }
>
> - dump_pagetable(address);
> + pfn = dump_pagetable(address);
> +
> + if (error_code & X86_PF_RMP)
> + sev_dump_rmpentry(pfn);
> }
>
> static noinline void
> --
> 2.25.1
>

2023-01-13 23:50:21

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 23/64] x86/fault: Add support to dump RMP entry on fault

On 1/13/2023 4:56 PM, Alper Gun wrote:
> On Wed, Dec 14, 2022 at 11:52 AM Michael Roth <[email protected]> wrote:
>>
>> From: Brijesh Singh <[email protected]>
>>
>> When SEV-SNP is enabled globally, a write from the host goes through the
>> RMP check. If the hardware encounters the check failure, then it raises
>> the #PF (with RMP set). Dump the RMP entry at the faulting pfn to help
>> with debugging.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> arch/x86/include/asm/sev.h | 2 ++
>> arch/x86/kernel/sev.c | 43 ++++++++++++++++++++++++++++++++++++++
>> arch/x86/mm/fault.c | 7 ++++++-
>> 3 files changed, 51 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
>> index 4eeedcaca593..2916f4150ac7 100644
>> --- a/arch/x86/include/asm/sev.h
>> +++ b/arch/x86/include/asm/sev.h
>> @@ -215,6 +215,7 @@ int snp_lookup_rmpentry(u64 pfn, int *level);
>> int psmash(u64 pfn);
>> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
>> int rmp_make_shared(u64 pfn, enum pg_level level);
>> +void sev_dump_rmpentry(u64 pfn);
>> #else
>> static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>> static inline void sev_es_ist_exit(void) { }
>> @@ -247,6 +248,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
>> return -ENODEV;
>> }
>> static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
>> +static inline void sev_dump_rmpentry(u64 pfn) {}
>> #endif
>>
>> #endif
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index e2b38c3551be..1dd1b36bdfea 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -2508,6 +2508,49 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
>> return entry;
>> }
>>
>> +void sev_dump_rmpentry(u64 pfn)
>> +{
>> + unsigned long pfn_end;
>> + struct rmpentry *e;
>> + int level;
>> +
>> + e = __snp_lookup_rmpentry(pfn, &level);
>> + if (!e) {
> if (IS_ERR(e)) {
>

Yes, this needs to be fixed to IS_ERR(e)

>> + pr_info("failed to read RMP entry pfn 0x%llx\n", pfn);
>> + return;
>> + }
>> +
>> + if (rmpentry_assigned(e)) {
>> + pr_info("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
>> + " asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,
>> + rmpentry_assigned(e), e->info.immutable, rmpentry_pagesize(e),
>> + (unsigned long)e->info.gpa, e->info.asid, e->info.vmsa,
>> + e->info.validated);
>> + return;
>> + }
>> +
>> + /*
>> + * If the RMP entry at the faulting pfn was not assigned, then it is not
>> + * clear what caused the RMP violation. To get some useful debug
>> + * information, iterate through the entire 2MB region, and dump the RMP
>> + * entries if any of the bits in an RMP entry are set.
>> + */
>> + pfn = pfn & ~(PTRS_PER_PMD - 1);
>> + pfn_end = pfn + PTRS_PER_PMD;
>> +
>> + while (pfn < pfn_end) {
>> + e = __snp_lookup_rmpentry(pfn, &level);
>> + if (!e)
>> + return;
> if (IS_ERR(e))
> continue;

Again, this is correct, but then it should be :

if (IS_ERR(e)) {
pfn++;
continue;
}
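
Putting the two corrections together, the tail of sev_dump_rmpentry() would
look roughly like this (a sketch of the posted code plus the fixes above,
not a tested diff):

	pfn = pfn & ~(PTRS_PER_PMD - 1);
	pfn_end = pfn + PTRS_PER_PMD;

	while (pfn < pfn_end) {
		e = __snp_lookup_rmpentry(pfn, &level);
		if (IS_ERR(e)) {
			/* Skip unreadable entries instead of bailing out. */
			pfn++;
			continue;
		}

		if (e->low || e->high)
			pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
				pfn << PAGE_SHIFT, e->high, e->low);
		pfn++;
	}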

Thanks,
Ashish

>> +
>> + if (e->low || e->high)
>> + pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
>> + pfn << PAGE_SHIFT, e->high, e->low);
>> + pfn++;
>> + }
>> +}
>> +EXPORT_SYMBOL_GPL(sev_dump_rmpentry);
>> +
>> /*
>> * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
>> * and -errno if there is no corresponding RMP entry.
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index ded53879f98d..f2b16dcfbd9a 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -536,6 +536,8 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
>> static void
>> show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
>> {
>> + unsigned long pfn;
>> +
>> if (!oops_may_print())
>> return;
>>
>> @@ -608,7 +610,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
>> show_ldttss(&gdt, "TR", tr);
>> }
>>
>> - dump_pagetable(address);
>> + pfn = dump_pagetable(address);
>> +
>> + if (error_code & X86_PF_RMP)
>> + sev_dump_rmpentry(pfn);
>> }
>>
>> static noinline void
>> --
>> 2.25.1
>>

2023-01-16 08:00:29

by Nikunj A. Dadhania

[permalink] [raw]
Subject: Re: [PATCH RFC v7 07/64] KVM: SEV: Handle KVM_HC_MAP_GPA_RANGE hypercall

On 13/01/23 21:47, Sean Christopherson wrote:
> On Fri, Jan 13, 2023, Borislav Petkov wrote:
>> On Wed, Dec 14, 2022 at 01:39:59PM -0600, Michael Roth wrote:
>>> From: Nikunj A Dadhania <[email protected]>
>>>
>>> KVM_HC_MAP_GPA_RANGE hypercall is used by the SEV guest to notify a
>>> change in the page encryption status to the hypervisor.
>>>
>>> The hypercall exits to userspace with KVM_EXIT_HYPERCALL exit code,
>>> currently this is used for explicit memory conversion between
>>> shared/private for memfd based private memory.
>>
>> So Tom and I spent a while figuring out what this is doing...
>>
>> Please explain in more detail what that is. Like the hypercall gets ignored for
>> memslots which cannot be private...?

This was required when we were using a per-memslot bitmap for storing the
private information; mem_attr_array is not dependent on memslots anymore.

>
> Don't bother, just drop the patch.

Agree, we can drop this. I have tested SEV without this patch.

> It's perfectly legal for userspace to create the private memslot in response
> to a guest request.

Sean, I did not understand this part: how could a memslot be created on a guest request?

Regards
Nikunj
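
For illustration, the flow being described would presumably look like this
from the VMM side: handle the exit to userspace, register a private memslot
covering the requested range, then re-enter the guest. A rough sketch using
the UPM v10 names (the struct/flag names here are assumptions, not a settled
ABI):

	/* Sketch: create a private memslot on demand, e.g. in response to a
	 * KVM_EXIT_HYPERCALL/KVM_EXIT_MEMORY_FAULT reported to the VMM. */
	struct kvm_userspace_memory_region_ext region = {
		.region = {
			.slot = slot_id,
			.flags = KVM_MEM_PRIVATE,
			.guest_phys_addr = gpa,
			.memory_size = size,
			.userspace_addr = (__u64)shared_hva,
		},
		.restricted_fd = restricted_fd,	/* from memfd_restricted() */
		.restricted_offset = 0,
	};

	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0)
		err(1, "KVM_SET_USER_MEMORY_REGION");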

2023-01-17 10:54:09

by Zhi Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v7 20/64] x86/fault: Add support to handle the RMP fault for user address

On Wed, 14 Dec 2022 13:40:12 -0600
Michael Roth <[email protected]> wrote:

> From: Brijesh Singh <[email protected]>
>
> When SEV-SNP is enabled globally, a write from the host goes through the
> RMP check. When the host writes to pages, hardware checks the following
> conditions at the end of page walk:
>
> 1. Assigned bit in the RMP table is zero (i.e page is shared).
> 2. If the page table entry that gives the sPA indicates that the target
> page size is a large page, then all RMP entries for the 4KB
> constituting pages of the target must have the assigned bit 0.
> 3. Immutable bit in the RMP table is not zero.
>

Just being curious. AMD APM table 15-37 "RMP Page Assignment Settings" shows
Immutable bit is "don't care" when a page is owned by the hypervisor. The
table 15-39 "RMP Memory Access Checks" shows the hardware will do
"Hypervisor-owned" check for host data write and page table access. I suppose
"Hypervisor-owned" check means HW will check if the RMP entry is configured
according to the table 15-37 (Assign bit = 0, ASID = 0, Immutable = X)

None of them mentions that the Immutable bit in the related RMP entry should
be 1 for a hypervisor-owned page.

I can understand 1) and 2). Can you explain more about 3)?

> The hardware will raise page fault if one of the above conditions is not
> met. Try resolving the fault instead of taking fault again and again. If
> the host attempts to write to the guest private memory then send the
> SIGBUS signal to kill the process. If the page level between the host and
> RMP entry does not match, then split the address to keep the RMP and host
> page levels in sync.
>
> Co-developed-by: Jarkko Sakkinen <[email protected]>
> Signed-off-by: Jarkko Sakkinen <[email protected]>
> Co-developed-by: Ashish Kalra <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/mm/fault.c | 97 ++++++++++++++++++++++++++++++++++++++++
> include/linux/mm.h | 3 +-
> include/linux/mm_types.h | 3 ++
> mm/memory.c | 10 +++++
> 4 files changed, 112 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f8193b99e9c8..d611051dcf1e 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -33,6 +33,7 @@
> #include <asm/kvm_para.h> /* kvm_handle_async_pf */
> #include <asm/vdso.h> /* fixup_vdso_exception() */
> #include <asm/irq_stack.h>
> +#include <asm/sev.h> /* snp_lookup_rmpentry() */
>
> #define CREATE_TRACE_POINTS
> #include <asm/trace/exceptions.h>
> @@ -414,6 +415,7 @@ static void dump_pagetable(unsigned long address)
> pr_cont("PTE %lx", pte_val(*pte));
> out:
> pr_cont("\n");
> +
> return;
> bad:
> pr_info("BAD\n");
> @@ -1240,6 +1242,90 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> }
> NOKPROBE_SYMBOL(do_kern_addr_fault);
>
> +enum rmp_pf_ret {
> + RMP_PF_SPLIT = 0,
> + RMP_PF_RETRY = 1,
> + RMP_PF_UNMAP = 2,
> +};
> +
> +/*
> + * The goal of the RMP faulting routine is really to check whether the
> + * page that faulted should be accessible. That can be determined
> + * simply by looking at the RMP entry for the 4k address being accessed.
> + * If that entry has Assigned=1 then it's a bad address. It could be
> + * because the 2MB region was assigned as a large page, or it could be
> + * because the region is all 4k pages and that 4k was assigned.
> + * In either case, it's a bad access.
> + * There are basically two main possibilities:
> + * 1. The 2M entry has Assigned=1 and Page_Size=1. Then all 511 middle
> + * entries also have Assigned=1. This entire 2M region is a guest page.
> + * 2. The 2M entry has Assigned=0 and Page_Size=0. Then the 511 middle
> + * entries can be anything, this region consists of individual 4k assignments.
> + */
> +static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
> + unsigned long address)
> +{
> + int rmp_level, level;
> + pgd_t *pgd;
> + pte_t *pte;
> + u64 pfn;
> +
> + pgd = __va(read_cr3_pa());
> + pgd += pgd_index(address);
> +
> + pte = lookup_address_in_pgd(pgd, address, &level);
> +
> + /*
> + * This can happen if there was a race between an unmap event and
> + * the RMP fault delivery.
> + */
> + if (!pte || !pte_present(*pte))
> + return RMP_PF_UNMAP;
> +
> + /*
> + * RMP page fault handler follows this algorithm:
> + * 1. Compute the pfn for the 4kb page being accessed
> + * 2. Read that RMP entry -- If it is assigned then kill the process
> + * 3. Otherwise, check the level from the host page table
> + * If level=PG_LEVEL_4K then the page is already smashed
> + * so just retry the instruction
> + * 4. If level=PG_LEVEL_2M/1G, then the host page needs to be split
> + */
> +
> + pfn = pte_pfn(*pte);
> +
> + /* If it's a large page then calculate the fault pfn */
> + if (level > PG_LEVEL_4K)
> + pfn = pfn | PFN_DOWN(address & (page_level_size(level) - 1));
> +
> + /*
> + * If it's a guest private page, then the fault cannot be resolved.
> + * Send a SIGBUS to terminate the process.
> + *
> + * As documented in APM vol3 pseudo-code for RMPUPDATE, when the 2M range
> + * is covered by a valid (Assigned=1) 2M entry, the middle 511 4k entries
> + * also have Assigned=1. This means that if there is an access to a page
> + * which happens to lie within an Assigned 2M entry, the 4k RMP entry
> + * will also have Assigned=1. Therefore, the kernel should see that
> + * the page is not a valid page and the fault cannot be resolved.
> + */
> + if (snp_lookup_rmpentry(pfn, &rmp_level)) {
> + pr_info("Fatal RMP page fault, terminating process, entry assigned for pfn 0x%llx\n",
> + pfn);
> + do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
> + return RMP_PF_RETRY;
> + }
> +
> + /*
> + * The backing page level is higher than the RMP page level, request
> + * to split the page.
> + */
> + if (level > rmp_level)
> + return RMP_PF_SPLIT;
> +
> + return RMP_PF_RETRY;
> +}
> +
> /*
> * Handle faults in the user portion of the address space. Nothing in here
> * should check X86_PF_USER without a specific justification: for almost
> @@ -1337,6 +1423,17 @@ void do_user_addr_fault(struct pt_regs *regs,
> if (error_code & X86_PF_INSTR)
> flags |= FAULT_FLAG_INSTRUCTION;
>
> + /*
> + * If it's an RMP violation, try resolving it.
> + */
> + if (error_code & X86_PF_RMP) {
> + if (handle_user_rmp_page_fault(regs, error_code, address))
> + return;
> +
> + /* Ask to split the page */
> + flags |= FAULT_FLAG_PAGE_SPLIT;
> + }
> +
> #ifdef CONFIG_X86_64
> /*
> * Faults in the vsyscall page might need emulation. The
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3c84f4e48cd7..2fd8e16d149c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -466,7 +466,8 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags)
> { FAULT_FLAG_USER, "USER" }, \
> { FAULT_FLAG_REMOTE, "REMOTE" }, \
> { FAULT_FLAG_INSTRUCTION, "INSTRUCTION" }, \
> - { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }
> + { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \
> + { FAULT_FLAG_PAGE_SPLIT, "PAGESPLIT" }
>
> /*
> * vm_fault is filled by the pagefault handler and passed to the vma's
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 500e536796ca..06ba34d51638 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -962,6 +962,8 @@ typedef struct {
> * mapped R/O.
> * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
> * We should only access orig_pte if this flag set.
> + * @FAULT_FLAG_PAGE_SPLIT: The fault was due to a page size mismatch; split
> + * the region to a smaller page size and retry.
> *
> * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
> * whether we would allow page faults to retry by specifying these two
> @@ -999,6 +1001,7 @@ enum fault_flag {
> FAULT_FLAG_INTERRUPTIBLE = 1 << 9,
> FAULT_FLAG_UNSHARE = 1 << 10,
> FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
> + FAULT_FLAG_PAGE_SPLIT = 1 << 12,
> };
>
> typedef unsigned int __bitwise zap_flags_t;
> diff --git a/mm/memory.c b/mm/memory.c
> index f88c351aecd4..e68da7e403c6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4996,6 +4996,12 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> return 0;
> }
>
> +static int handle_split_page_fault(struct vm_fault *vmf)
> +{
> + __split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> + return 0;
> +}
> +
> /*
> * By the time we get here, we already hold the mm semaphore
> *
> @@ -5078,6 +5084,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> pmd_migration_entry_wait(mm, vmf.pmd);
> return 0;
> }
> +
> + if (flags & FAULT_FLAG_PAGE_SPLIT)
> + return handle_split_page_fault(&vmf);
> +
> if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
> if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
> return do_huge_pmd_numa_page(&vmf);

2023-01-18 00:48:41

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH RFC v7 03/64] KVM: SVM: Advertise private memory support to KVM

On Wed, 2022-12-14 at 13:39 -0600, Michael Roth wrote:
> From: Nikunj A Dadhania <[email protected]>
>
> KVM should use private memory for guests that have upm_mode flag set.
>
> Add a kvm_x86_ops hook for determining UPM support that accounts for
> this situation by only enabling UPM test mode in the case of non-SEV
> guests.
>
> Signed-off-by: Nikunj A Dadhania <[email protected]>
> [mdr: add x86 hook for determining restricted/private memory support]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm/svm.c | 10 ++++++++++
> arch/x86/kvm/x86.c | 8 ++++++++
> 4 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index abccd51dcfca..f530a550c092 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
> KVM_X86_OP(complete_emulated_msr)
> KVM_X86_OP(vcpu_deliver_sipi_vector)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
>
> #undef KVM_X86_OP
> #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2b6244525107..9317abffbf68 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1635,6 +1635,7 @@ struct kvm_x86_ops {
>
> void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> int root_level);
> + int (*private_mem_enabled)(struct kvm *kvm);
>
> bool (*has_wbinvd_exit)(void);
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 91352d692845..7f3e4d91c0c6 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4694,6 +4694,14 @@ static int svm_vm_init(struct kvm *kvm)
> return 0;
> }
>
> +static int svm_private_mem_enabled(struct kvm *kvm)
> +{
> + if (sev_guest(kvm))
> + return kvm->arch.upm_mode ? 1 : 0;
> +
> + return IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING) ? 1 : 0;
> +}
> +

Is this new callback really needed? Shouldn't kvm->arch.upm_mode be sufficient
to indicate whether the private memory will be used or not?

Probably the CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING is the concern here. But this
Kconfig option is not even x86-specific, so shouldn't the handling of it be done
in common code too?

For instance, can we explicitly set 'kvm->arch.upm_mode' to 'true' at some point
of creating the VM if we see CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING is true?
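
Something along these lines in common code is presumably what is meant (a
sketch only, assuming kvm->arch.upm_mode keeps its current meaning):

	/* Sketch: decide once at VM creation, so vendor code can simply test
	 * kvm->arch.upm_mode instead of going through a per-vendor callback. */
	static void kvm_init_private_mem_mode(struct kvm *kvm)
	{
		if (IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING))
			kvm->arch.upm_mode = true;
	}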

[snip]

2023-01-18 08:56:19

by Nikunj A. Dadhania

[permalink] [raw]
Subject: Re: [PATCH RFC v7 11/64] KVM: SEV: Support private pages in LAUNCH_UPDATE_DATA

On 18/01/23 05:00, Jarkko Sakkinen wrote:
> On Wed, Dec 14, 2022 at 01:40:03PM -0600, Michael Roth wrote:
>> From: Nikunj A Dadhania <[email protected]>

>> @@ -609,9 +659,8 @@ static int sev_launch_update_priv_gfn_handler(struct kvm *kvm,
>> goto e_ret;
>> kvm_release_pfn_clean(pfn);
>> }
>> - kvm_vm_set_region_attr(kvm, range->start, range->end,
>> - true /* priv_attr */);
>>
>> + kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);

As the memory attribute is no longer a boolean in the UPM series, I made this change.

>> e_ret:
>> return ret;
>> }
>> --
>> 2.25.1
>>
>
> kvm_vm_set_region_attr() should be fixed already in:
>> https://lore.kernel.org/all/[email protected]/

Will discuss with Mike and move this hunk to the above patch.

Regards
Nikunj

2023-01-18 15:34:17

by Jeremi Piotrowski

[permalink] [raw]
Subject: Re: [PATCH RFC v7 44/64] KVM: SVM: Remove the long-lived GHCB host map

On Wed, Dec 14, 2022 at 01:40:36PM -0600, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB GPA,
> and unmaps it just before VM-entry. This long-lived GHCB map is used by
> the VMGEXIT handler through accessors such as ghcb_{set,get}_xxx().
>
> A long-lived GHCB map can cause issues when SEV-SNP is enabled. When
> SEV-SNP is enabled the mapped GPA needs to be protected against a page
> state change.
>
> To eliminate the long-lived GHCB mapping, update the GHCB sync operations
> to explicitly map the GHCB before access and unmap it after access is
> complete. This requires that the setting of the GHCB's sw_exit_info_{1,2}
> fields be done during sev_es_sync_to_ghcb(), so create two new fields in
> the vcpu_svm struct to hold these values when required to be set outside
> of the GHCB mapping.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> [mdr: defer per_cpu() assignment and order it with barrier() to fix case
> where kvm_vcpu_map() causes reschedule on different CPU]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 131 ++++++++++++++++++++++++++---------------
> arch/x86/kvm/svm/svm.c | 18 +++---
> arch/x86/kvm/svm/svm.h | 24 +++++++-
> 3 files changed, 116 insertions(+), 57 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index d5c6e48055fb..6ac0cb6e3484 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2921,15 +2921,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
> kvfree(svm->sev_es.ghcb_sa);
> }
>
> +static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
> +{
> + struct vmcb_control_area *control = &svm->vmcb->control;
> + u64 gfn = gpa_to_gfn(control->ghcb_gpa);
> +
> + if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
> + /* Unable to map GHCB from guest */
> + pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +
> +static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
> +{
> + kvm_vcpu_unmap(&svm->vcpu, map, true);
> +}
> +
> static void dump_ghcb(struct vcpu_svm *svm)
> {
> - struct ghcb *ghcb = svm->sev_es.ghcb;
> + struct kvm_host_map map;
> unsigned int nbits;
> + struct ghcb *ghcb;
> +
> + if (svm_map_ghcb(svm, &map))
> + return;
> +
> + ghcb = map.hva;

dump_ghcb() is called from sev_es_validate_vmgexit() with the ghcb already
mapped. How about passing 'struct kvm_host_map *' (or struct ghcb *) as a
param to avoid double mapping?
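
i.e., something along these lines (a sketch of the suggested refactor,
untested):

	/* Sketch: the caller owns the mapping, so dump_ghcb() never maps. */
	static void dump_ghcb(struct vcpu_svm *svm, struct ghcb *ghcb)
	{
		unsigned int nbits;

		/* Re-use the dump_invalid_vmcb module parameter */
		if (!dump_invalid_vmcb) {
			pr_warn_ratelimited("set kvm_amd.dump_invalid_vmcb=1 to dump internal KVM state.\n");
			return;
		}

		nbits = sizeof(ghcb->save.valid_bitmap) * 8;
		/* ...dump the fields as before, from the already-mapped ghcb... */
	}

	/* In sev_es_validate_vmgexit(), where the GHCB is already mapped: */
	dump_ghcb(svm, ghcb);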

>
> /* Re-use the dump_invalid_vmcb module parameter */
> if (!dump_invalid_vmcb) {
> pr_warn_ratelimited("set kvm_amd.dump_invalid_vmcb=1 to dump internal KVM state.\n");
> - return;
> + goto e_unmap;
> }
>
> nbits = sizeof(ghcb->save.valid_bitmap) * 8;
> @@ -2944,12 +2969,21 @@ static void dump_ghcb(struct vcpu_svm *svm)
> pr_err("%-20s%016llx is_valid: %u\n", "sw_scratch",
> ghcb->save.sw_scratch, ghcb_sw_scratch_is_valid(ghcb));
> pr_err("%-20s%*pb\n", "valid_bitmap", nbits, ghcb->save.valid_bitmap);
> +
> +e_unmap:
> + svm_unmap_ghcb(svm, &map);
> }
>
> -static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
> +static bool sev_es_sync_to_ghcb(struct vcpu_svm *svm)
> {
> struct kvm_vcpu *vcpu = &svm->vcpu;
> - struct ghcb *ghcb = svm->sev_es.ghcb;
> + struct kvm_host_map map;
> + struct ghcb *ghcb;
> +
> + if (svm_map_ghcb(svm, &map))
> + return false;
> +
> + ghcb = map.hva;
>
> /*
> * The GHCB protocol so far allows for the following data
> @@ -2963,13 +2997,24 @@ static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
> ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
> ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
> ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
> +
> + /*
> + * Copy the return values from the exit_info_{1,2}.
> + */
> + ghcb_set_sw_exit_info_1(ghcb, svm->sev_es.ghcb_sw_exit_info_1);
> + ghcb_set_sw_exit_info_2(ghcb, svm->sev_es.ghcb_sw_exit_info_2);
> +
> + trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, ghcb);
> +
> + svm_unmap_ghcb(svm, &map);
> +
> + return true;
> }
>
> -static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> +static void sev_es_sync_from_ghcb(struct vcpu_svm *svm, struct ghcb *ghcb)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> struct kvm_vcpu *vcpu = &svm->vcpu;
> - struct ghcb *ghcb = svm->sev_es.ghcb;
> u64 exit_code;
>
> /*
> @@ -3013,20 +3058,25 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
> }
>
> -static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> +static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
> {
> - struct kvm_vcpu *vcpu;
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm_host_map map;
> struct ghcb *ghcb;
> - u64 exit_code;
> u64 reason;
>
> - ghcb = svm->sev_es.ghcb;
> + if (svm_map_ghcb(svm, &map))
> + return -EFAULT;
> +
> + ghcb = map.hva;
> +
> + trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);
>
> /*
> * Retrieve the exit code now even though it may not be marked valid
> * as it could help with debugging.
> */
> - exit_code = ghcb_get_sw_exit_code(ghcb);
> + *exit_code = ghcb_get_sw_exit_code(ghcb);
>
> /* Only GHCB Usage code 0 is supported */
> if (ghcb->ghcb_usage) {
> @@ -3119,6 +3169,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> goto vmgexit_err;
> }
>
> + sev_es_sync_from_ghcb(svm, ghcb);
> +
> + svm_unmap_ghcb(svm, &map);
> return 0;
>
> vmgexit_err:
> @@ -3129,10 +3182,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> ghcb->ghcb_usage);
> } else if (reason == GHCB_ERR_INVALID_EVENT) {
> vcpu_unimpl(vcpu, "vmgexit: exit code %#llx is not valid\n",
> - exit_code);
> + *exit_code);
> } else {
> vcpu_unimpl(vcpu, "vmgexit: exit code %#llx input is not valid\n",
> - exit_code);
> + *exit_code);
> dump_ghcb(svm);
> }
>
> @@ -3142,6 +3195,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> ghcb_set_sw_exit_info_1(ghcb, 2);
> ghcb_set_sw_exit_info_2(ghcb, reason);
>
> + svm_unmap_ghcb(svm, &map);
> +
> /* Resume the guest to "return" the error code. */
> return 1;
> }

2023-01-18 16:07:08

by Jeremi Piotrowski

[permalink] [raw]
Subject: Re: [PATCH RFC v7 14/64] x86/sev: Add the host SEV-SNP initialization support

On Wed, Dec 14, 2022 at 01:40:06PM -0600, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
> track the owner of each page of memory. Pages of memory can be owned by
> the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
> section 15.36.3 for more detail on RMP.
>
> The RMP table is used to enforce access control to memory. The table itself
> is not directly writable by the software. New CPU instructions (RMPUPDATE,
> PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.
>
> Based on the platform configuration, the BIOS reserves the memory used
> for the RMP table. The start and end address of the RMP table must be
> queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and
> RMP_END are not set then disable the SEV-SNP feature.
>
> The SEV-SNP feature is enabled only after the RMP table is successfully
> initialized.
>
> Also set SYSCFG.MFMD when enabling SNP as SEV-SNP FW >= 1.51 requires
> that SYSCFG.MFMD must be set.
>
> The RMP table entry format is non-architectural and can vary by processor;
> it is defined by the PPR. Restrict SNP support to the known CPU models
> and families for which the RMP table entry format is currently defined.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/disabled-features.h | 8 +-
> arch/x86/include/asm/msr-index.h | 11 +-
> arch/x86/kernel/sev.c | 180 +++++++++++++++++++++++
> 3 files changed, 197 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 33d2cd04d254..9b5a2cc8064a 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -87,6 +87,12 @@
> # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
> #endif
>
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +# define DISABLE_SEV_SNP 0
> +#else
> +# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
> +#endif
> +
> /*
> * Make sure to add features to the correct mask
> */
> @@ -110,7 +116,7 @@
> DISABLE_ENQCMD)
> #define DISABLED_MASK17 0
> #define DISABLED_MASK18 0
> -#define DISABLED_MASK19 0
> +#define DISABLED_MASK19 (DISABLE_SEV_SNP)
> #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
>
> #endif /* _ASM_X86_DISABLED_FEATURES_H */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 10ac52705892..35100c630617 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -565,6 +565,8 @@
> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE 0xc0010132
> +#define MSR_AMD64_RMP_END 0xc0010133
>
> #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
>
> @@ -649,7 +651,14 @@
> #define MSR_K8_TOP_MEM2 0xc001001d
> #define MSR_AMD64_SYSCFG 0xc0010010
> #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
> +#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> +#define MSR_AMD64_SYSCFG_MFDM_BIT 19
> +#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
> +
> #define MSR_K8_INT_PENDING_MSG 0xc0010055
> /* C1E active bits in int pending message */
> #define K8_INTP_C1E_ACTIVE_MASK 0x18000000
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index a428c62330d3..687a91284506 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -22,6 +22,9 @@
> #include <linux/efi.h>
> #include <linux/platform_device.h>
> #include <linux/io.h>
> +#include <linux/cpumask.h>
> +#include <linux/iommu.h>
> +#include <linux/amd-iommu.h>
>
> #include <asm/cpu_entry_area.h>
> #include <asm/stacktrace.h>
> @@ -38,6 +41,7 @@
> #include <asm/apic.h>
> #include <asm/cpuid.h>
> #include <asm/cmdline.h>
> +#include <asm/iommu.h>
>
> #define DR7_RESET_VALUE 0x400
>
> @@ -57,6 +61,12 @@
> #define AP_INIT_CR0_DEFAULT 0x60000010
> #define AP_INIT_MXCSR_DEFAULT 0x1f80
>
> +/*
> + * The first 16KB from the RMP_BASE is used by the processor for the
> + * bookkeeping; this range needs to be added during the RMP entry lookup.
> + */
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> +
> /* For early boot hypervisor communication in SEV-ES enabled guests */
> static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>
> @@ -69,6 +79,9 @@ static struct ghcb *boot_ghcb __section(".data");
> /* Bitmap of SEV features supported by the hypervisor */
> static u64 sev_hv_features __ro_after_init;
>
> +static unsigned long rmptable_start __ro_after_init;
> +static unsigned long rmptable_end __ro_after_init;
> +
> /* #VC handler runtime per-CPU data */
> struct sev_es_runtime_data {
> struct ghcb ghcb_page;
> @@ -2260,3 +2273,170 @@ static int __init snp_init_platform_device(void)
> return 0;
> }
> device_initcall(snp_init_platform_device);
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt) "SEV-SNP: " fmt
> +
> +static int __mfd_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_MFDM;
> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void mfd_enable(void *arg)
> +{
> + __mfd_enable(smp_processor_id());
> +}
> +
> +static int __snp_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_SNP_EN;
> + val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void snp_enable(void *arg)
> +{
> + __snp_enable(smp_processor_id());
> +}
> +
> +static bool get_rmptable_info(u64 *start, u64 *len)
> +{
> + u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
> +
> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> + if (!rmp_base || !rmp_end) {
> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
> + return false;
> + }
> +
> + rmp_sz = rmp_end - rmp_base + 1;
> +
> + /*
> + * Calculate the amount of memory that must be reserved by the BIOS to
> + * address the whole RAM. The reserved memory should also cover the
> + * RMP table itself.
> + */
> + calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + totalram_pages()) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;

Since the rmptable is indexed by page number, I believe this check should be
using max_pfn:

calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;

This accounts for holes/offsets in the memory map which lead to the top of
memory having pfn > totalram_pages().
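
Concretely (a sketch of the check in get_rmptable_info() with that change
applied):

	/* Sketch: size the expected RMP by the highest pfn rather than the
	 * total page count, so holes in the memory map are covered too. */
	calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;

	if (calc_rmp_sz > rmp_sz) {
		pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
		       calc_rmp_sz, rmp_sz);
		return false;
	}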

> +
> + if (calc_rmp_sz > rmp_sz) {
> + pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> + calc_rmp_sz, rmp_sz);
> + return false;
> + }
> +
> + *start = rmp_base;
> + *len = rmp_sz;
> +
> + pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n", rmp_base, rmp_end);
> +
> + return true;
> +}
> +
> +static __init int __snp_rmptable_init(void)
> +{
> + u64 rmp_base, sz;
> + void *start;
> + u64 val;
> +
> + if (!get_rmptable_info(&rmp_base, &sz))
> + return 1;
> +
> + start = memremap(rmp_base, sz, MEMREMAP_WB);
> + if (!start) {
> + pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, sz);
> + return 1;
> + }
> +
> + /*
> + * Check if SEV-SNP is already enabled, this can happen in case of
> + * kexec boot.
> + */
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
> + goto skip_enable;
> +
> + /* Initialize the RMP table to zero */
> + memset(start, 0, sz);
> +
> + /* Flush the caches to ensure that data is written before SNP is enabled. */
> + wbinvd_on_all_cpus();
> +
> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
> + on_each_cpu(mfd_enable, NULL, 1);
> +
> + /* Enable SNP on all CPUs. */
> + on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> + rmptable_start = (unsigned long)start;
> + rmptable_end = rmptable_start + sz - 1;
> +
> + return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> + int family, model;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + family = boot_cpu_data.x86;
> + model = boot_cpu_data.x86_model;
> +
> + /*
> + * The RMP table entry format is not architectural and can vary by
> + * processor; it is defined by the per-processor PPR. Restrict SNP support
> + * to the known CPU models and families for which the RMP table entry
> + * format is currently defined.
> + */
> + if (family != 0x19 || model > 0xaf)
> + goto nosnp;
> +
> + if (amd_iommu_snp_enable())
> + goto nosnp;
> +
> + if (__snp_rmptable_init())
> + goto nosnp;
> +
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> + return 0;
> +
> +nosnp:
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> + return -ENOSYS;
> +}
> +
> +/*
> + * This must be called after the PCI subsystem. This is because amd_iommu_snp_enable()
> + * is called to ensure the IOMMU supports the SEV-SNP feature, which can only be
> + * called after subsys_initcall().
> + *
> + * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
> + * directly into guest private memory. In case of SNP, the IOMMU ensures that
> + * the page(s) used for DMA are hypervisor owned.
> + */
> +fs_initcall(snp_rmptable_init);
> --
> 2.25.1
>
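
For readers following along: given the 16KB bookkeeping area and the
16-byte-per-pfn sizing implied by the '<< 4' above, an RMP entry lookup
presumably reduces to arithmetic like the following (a sketch only; the
actual entry layout is non-architectural and defined by the PPR):

	/* Sketch: index into the memremap()ed RMP table for a given pfn. */
	static struct rmpentry *rmp_entry_for_pfn(u64 pfn)
	{
		unsigned long vaddr;

		vaddr = rmptable_start + RMPTABLE_CPU_BOOKKEEPING_SZ + pfn * 16;
		if (vaddr + 16 - 1 > rmptable_end)
			return ERR_PTR(-EFAULT);

		return (struct rmpentry *)vaddr;
	}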

2023-01-18 18:26:48

by Alper Gun

[permalink] [raw]
Subject: Re: [PATCH RFC v7 44/64] KVM: SVM: Remove the long-lived GHCB host map

On Wed, Jan 18, 2023 at 7:27 AM Jeremi Piotrowski
<[email protected]> wrote:
>
> On Wed, Dec 14, 2022 at 01:40:36PM -0600, Michael Roth wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB GPA,
> > and unmaps it just before VM-entry. This long-lived GHCB map is used by
> > the VMGEXIT handler through accessors such as ghcb_{set,get}_xxx().
> >
> > A long-lived GHCB map can cause issues when SEV-SNP is enabled. When
> > SEV-SNP is enabled the mapped GPA needs to be protected against a page
> > state change.
> >
> > To eliminate the long-lived GHCB mapping, update the GHCB sync operations
> > to explicitly map the GHCB before access and unmap it after access is
> > complete. This requires that the setting of the GHCB's sw_exit_info_{1,2}
> > fields be done during sev_es_sync_to_ghcb(), so create two new fields in
> > the vcpu_svm struct to hold these values when required to be set outside
> > of the GHCB mapping.
> >
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > [mdr: defer per_cpu() assignment and order it with barrier() to fix case
> > where kvm_vcpu_map() causes reschedule on different CPU]
> > Signed-off-by: Michael Roth <[email protected]>
> > ---
> > arch/x86/kvm/svm/sev.c | 131 ++++++++++++++++++++++++++---------------
> > arch/x86/kvm/svm/svm.c | 18 +++---
> > arch/x86/kvm/svm/svm.h | 24 +++++++-
> > 3 files changed, 116 insertions(+), 57 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index d5c6e48055fb..6ac0cb6e3484 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -2921,15 +2921,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
> > kvfree(svm->sev_es.ghcb_sa);
> > }
> >
> > +static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
> > +{
> > + struct vmcb_control_area *control = &svm->vmcb->control;
> > + u64 gfn = gpa_to_gfn(control->ghcb_gpa);
> > +
> > + if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
> > + /* Unable to map GHCB from guest */
> > + pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
> > + return -EFAULT;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
> > +{
> > + kvm_vcpu_unmap(&svm->vcpu, map, true);
> > +}
> > +
> > static void dump_ghcb(struct vcpu_svm *svm)
> > {
> > - struct ghcb *ghcb = svm->sev_es.ghcb;
> > + struct kvm_host_map map;
> > unsigned int nbits;
> > + struct ghcb *ghcb;
> > +
> > + if (svm_map_ghcb(svm, &map))
> > + return;
> > +
> > + ghcb = map.hva;
>
> dump_ghcb() is called from sev_es_validate_vmgexit() with the ghcb already
> mapped. How about passing 'struct kvm_host_map *' (or struct ghcb *) as a
> param to avoid double mapping?

This also causes a soft lockup: the PSC spin lock is already acquired in
sev_es_validate_vmgexit, and dump_ghcb will try to acquire the same lock
again. So a guest can send an invalid GHCB page and cause a host soft
lockup.

>
> >
> > /* Re-use the dump_invalid_vmcb module parameter */
> > if (!dump_invalid_vmcb) {
> > pr_warn_ratelimited("set kvm_amd.dump_invalid_vmcb=1 to dump internal KVM state.\n");
> > - return;
> > + goto e_unmap;
> > }
> >
> > nbits = sizeof(ghcb->save.valid_bitmap) * 8;
> > @@ -2944,12 +2969,21 @@ static void dump_ghcb(struct vcpu_svm *svm)
> > pr_err("%-20s%016llx is_valid: %u\n", "sw_scratch",
> > ghcb->save.sw_scratch, ghcb_sw_scratch_is_valid(ghcb));
> > pr_err("%-20s%*pb\n", "valid_bitmap", nbits, ghcb->save.valid_bitmap);
> > +
> > +e_unmap:
> > + svm_unmap_ghcb(svm, &map);
> > }
> >
> > -static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
> > +static bool sev_es_sync_to_ghcb(struct vcpu_svm *svm)
> > {
> > struct kvm_vcpu *vcpu = &svm->vcpu;
> > - struct ghcb *ghcb = svm->sev_es.ghcb;
> > + struct kvm_host_map map;
> > + struct ghcb *ghcb;
> > +
> > + if (svm_map_ghcb(svm, &map))
> > + return false;
> > +
> > + ghcb = map.hva;
> >
> > /*
> > * The GHCB protocol so far allows for the following data
> > @@ -2963,13 +2997,24 @@ static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
> > ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
> > ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
> > ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
> > +
> > + /*
> > + * Copy the return values from the exit_info_{1,2}.
> > + */
> > + ghcb_set_sw_exit_info_1(ghcb, svm->sev_es.ghcb_sw_exit_info_1);
> > + ghcb_set_sw_exit_info_2(ghcb, svm->sev_es.ghcb_sw_exit_info_2);
> > +
> > + trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, ghcb);
> > +
> > + svm_unmap_ghcb(svm, &map);
> > +
> > + return true;
> > }
> >
> > -static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > +static void sev_es_sync_from_ghcb(struct vcpu_svm *svm, struct ghcb *ghcb)
> > {
> > struct vmcb_control_area *control = &svm->vmcb->control;
> > struct kvm_vcpu *vcpu = &svm->vcpu;
> > - struct ghcb *ghcb = svm->sev_es.ghcb;
> > u64 exit_code;
> >
> > /*
> > @@ -3013,20 +3058,25 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
> > }
> >
> > -static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> > +static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
> > {
> > - struct kvm_vcpu *vcpu;
> > + struct kvm_vcpu *vcpu = &svm->vcpu;
> > + struct kvm_host_map map;
> > struct ghcb *ghcb;
> > - u64 exit_code;
> > u64 reason;
> >
> > - ghcb = svm->sev_es.ghcb;
> > + if (svm_map_ghcb(svm, &map))
> > + return -EFAULT;
> > +
> > + ghcb = map.hva;
> > +
> > + trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);
> >
> > /*
> > * Retrieve the exit code now even though it may not be marked valid
> > * as it could help with debugging.
> > */
> > - exit_code = ghcb_get_sw_exit_code(ghcb);
> > + *exit_code = ghcb_get_sw_exit_code(ghcb);
> >
> > /* Only GHCB Usage code 0 is supported */
> > if (ghcb->ghcb_usage) {
> > @@ -3119,6 +3169,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> > goto vmgexit_err;
> > }
> >
> > + sev_es_sync_from_ghcb(svm, ghcb);
> > +
> > + svm_unmap_ghcb(svm, &map);
> > return 0;
> >
> > vmgexit_err:
> > @@ -3129,10 +3182,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> > ghcb->ghcb_usage);
> > } else if (reason == GHCB_ERR_INVALID_EVENT) {
> > vcpu_unimpl(vcpu, "vmgexit: exit code %#llx is not valid\n",
> > - exit_code);
> > + *exit_code);
> > } else {
> > vcpu_unimpl(vcpu, "vmgexit: exit code %#llx input is not valid\n",
> > - exit_code);
> > + *exit_code);
> > dump_ghcb(svm);
> > }
> >
> > @@ -3142,6 +3195,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> > ghcb_set_sw_exit_info_1(ghcb, 2);
> > ghcb_set_sw_exit_info_2(ghcb, reason);
> >
> > + svm_unmap_ghcb(svm, &map);
> > +
> > /* Resume the guest to "return" the error code. */
> > return 1;
> > }
>

2023-01-18 21:40:29

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH RFC v7 03/64] KVM: SVM: Advertise private memory support to KVM

On Wed, Jan 18, 2023, Huang, Kai wrote:
> On Wed, 2022-12-14 at 13:39 -0600, Michael Roth wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 91352d692845..7f3e4d91c0c6 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4694,6 +4694,14 @@ static int svm_vm_init(struct kvm *kvm)
> > return 0;
> > }
> >
> > +static int svm_private_mem_enabled(struct kvm *kvm)
> > +{
> > + if (sev_guest(kvm))
> > + return kvm->arch.upm_mode ? 1 : 0;
> > +
> > + return IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING) ? 1 : 0;
> > +}
> > +
>
> Is this new callback really needed?

Probably not. For anything in this series that gets within spitting distance of
CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, my recommendation is to make a mental note
but otherwise ignore things like this for now. I suspect it will be much, much
more efficient to sort all of this out when I smush UPM+SNP+TDX together in a few
weeks.

2023-01-19 07:26:29

by Dov Murik

[permalink] [raw]
Subject: Re: [PATCH RFC v7 31/64] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command

Hi Mike,

On 14/12/2022 21:40, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
> system-wide configuration value for SNP guests. The information includes
> the TCB version string to be reported in guest attestation reports.
>
> Version 2 of the GHCB specification adds an NAE (SNP extended guest
> request) that a guest can use to query the reports that include additional
> certificates.
>
> In both cases, userspace-provided additional data is included in the
> attestation reports. Userspace will use the SNP_SET_EXT_CONFIG
> command to give the certificate blob and the reported TCB version string
> at once. Note that the specification defines the certificate blob with a
> specific GUID format; userspace is responsible for building the
> proper certificate blob. The ioctl treats it as an opaque blob.
>
> While it is not defined in the spec, let's add an SNP_GET_EXT_CONFIG
> command that can be used to obtain the data programmed through the
> SNP_SET_EXT_CONFIG.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> Documentation/virt/coco/sev-guest.rst | 27 ++++++
> drivers/crypto/ccp/sev-dev.c | 123 ++++++++++++++++++++++++++
> drivers/crypto/ccp/sev-dev.h | 4 +
> include/uapi/linux/psp-sev.h | 17 ++++
> 4 files changed, 171 insertions(+)
>
> diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
> index 11ea67c944df..fad1e5639dac 100644
> --- a/Documentation/virt/coco/sev-guest.rst
> +++ b/Documentation/virt/coco/sev-guest.rst
> @@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
> status includes API major, minor version and more. See the SEV-SNP
> specification for further details.
>
> +2.5 SNP_SET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
> +reported TCB version in the attestation report. The command is similar to
> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
> +command also accepts an additional certificate blob defined in the GHCB
> +specification.
> +
> +If the certs_address is zero, then the previous certificate blob will be deleted.
> +For more information on the certificate blob layout, see the GHCB spec
> +(extended guest request message).
> +
> +2.6 SNP_GET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_SET_EXT_CONFIG is used to query the system-wide configuration set

^^^^^^^^^^^^^^^^^^

This should be SNP_GET_EXT_CONFIG.


-Dov

> +through the SNP_SET_EXT_CONFIG.
> +
> 3. SEV-SNP CPUID Enforcement
> ============================
>

2023-01-19 13:08:38

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 02/64] KVM: x86: Add KVM_CAP_UNMAPPED_PRIVATE_MEMORY

On Wed, Jan 04, 2023 at 11:47:21AM -0600, Michael Roth wrote:
> On Thu, Dec 22, 2022 at 01:26:25PM +0100, Borislav Petkov wrote:
> > On Wed, Dec 14, 2022 at 01:39:54PM -0600, Michael Roth wrote:
> > > This mainly indicates to KVM that it should expect all private guest
> > > memory to be backed by private memslots. Ideally this would work
> > > similarly for others archs, give or take a few additional flags, but
> > > for now it's a simple boolean indicator for x86.
> >
> > ...
> >
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index c7e9d375a902..cc9424ccf9b2 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -1219,6 +1219,7 @@ struct kvm_ppc_resize_hpt {
> > > #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
> > > #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
> > > #define KVM_CAP_MEMORY_ATTRIBUTES 225
> > > +#define KVM_CAP_UNMAPPED_PRIVATE_MEM 240
> >
> > Isn't this new cap supposed to be documented somewhere in
> > Documentation/virt/kvm/api.rst ?
>
> It should, but this is sort of a placeholder for now. Ideally we'd
> re-use the capabilities introduced by UPM patchset rather than introduce
> a new one. Originally the UPM patchset had a KVM_CAP_PRIVATE_MEM which
> we planned to use to switch between legacy SEV and UPM-based SEV (for
> lazy-pinning support) by making it writeable, but that was removed in v10
> in favor of KVM_CAP_MEMORY_ATTRIBUTES, which is tied to the new
> KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES/KVM_SET_MEMORY_ATTRIBUTES ioctls:
>
> https://lore.kernel.org/lkml/CA+EHjTxXOdzcP25F57Mtmnb1NWyG5DcyqeDPqzjEOzRUrqH8FQ@mail.gmail.com/
>
> It wasn't clear at the time if that was the right interface to use for
> this particular case, so we stuck with the more general
> 'use-upm/dont-use-upm' semantics originally provided by making
> KVM_CAP_UNMAPPED_PRIVATE_MEM/KVM_CAP_PRIVATE_MEM writeable.
>
> But maybe it's okay to just make KVM_CAP_MEMORY_ATTRIBUTES writeable and
> require userspace to negotiate it rather than just tying it to
> CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES. Or maybe introducing a new
> KVM_SET_SUPPORTED_MEMORY_ATTRIBUTES ioctl to pair with
> KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES. It sort of makes sense, since userspace
> needs to be prepared to deal with KVM_EXIT_MEMORY_FAULTs relating to these
> attributes.

Doesn't the UPM patch set imply that user space should negotiate the memory
attributes with the ioctl?

To me it looks like the problem is introduced by a conflicting usage
pattern in the SNP code [*].

Perhaps sev_launch_update_gfn_handler() should not set memory attributes
but instead expect user space to do it before the call?

[*] https://lore.kernel.org/all/[email protected]/
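
For reference, the userspace-first flow would look roughly like this with
the UPM series' proposed ioctl (a sketch; names per that series):

	/* Sketch: the VMM marks the launch range private up front, before
	 * invoking KVM_SEV_LAUNCH_UPDATE_DATA, instead of KVM setting the
	 * attribute internally. */
	struct kvm_memory_attributes attr = {
		.address = gpa,
		.size = size,
		.attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
	};

	if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr) < 0)
		err(1, "KVM_SET_MEMORY_ATTRIBUTES");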

BR, Jarkko

2023-01-19 16:32:41

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 14/64] x86/sev: Add the host SEV-SNP initialization support

On 1/11/2023 8:50 AM, Sabin Rapan wrote:
>
>
> On 14.12.2022 21:40, Michael Roth wrote:
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +# define DISABLE_SEV_SNP 0
>> +#else
>> +# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
>> +#endif
>> +
>
> Would it make sense to split the SEV-* feature family into their own
> config flag(s) ?
> I'm thinking in the context of SEV-SNP running on systems with
> Transparent SME enabled in the bios. In this case, enabling
> CONFIG_AMD_MEM_ENCRYPT will also enable SME in the kernel, which is a
> bit strange and not necessarily useful.
> Commit 4e2c87949f2b ("crypto: ccp - When TSME and SME both detected
> notify user") highlights it.
>

Yes, we plan to move the SNP host initialization stuff into a separate
source file and under a different config flag such as CONFIG_KVM_AMD_SEV
or something.

Thanks,
Ashish

2023-01-19 18:56:50

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

> +
> + /* Page-align the length */
> + length = (params.certs_len + PAGE_SIZE - 1) & PAGE_MASK;
> +

I believe Ashish wanted this to be PAGE_ALIGN(params.certs_len)
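
For the record, PAGE_ALIGN() is exactly that computation:

	/* PAGE_ALIGN(x) expands to ((x) + PAGE_SIZE - 1) & PAGE_MASK, so this
	 * is the same rounding, just spelled with the standard helper. */
	length = PAGE_ALIGN(params.certs_len);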

--
-Dionna Glaze, PhD (she/her)

2023-01-19 20:46:48

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

> +
> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{

Both regular,

> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{

and extended guest requests should be subject to rate limiting, since
they take a lock on the shared resource that is the AMD-SP (psp?). I
proposed a mechanism with empirically chosen defaults in

[PATCH v2 0/2] kvm: sev: Add SNP guest request throttling
[PATCH v2 1/2] kvm: sev: Add SEV-SNP guest request throttling
[PATCH v2 2/2] kvm: sev: If ccp is busy, report throttled to guest

http://129.79.113.48/hypermail/linux/kernel/2211.2/03107.html
http://129.79.113.48/hypermail/linux/kernel/2211.2/03110.html
http://129.79.113.48/hypermail/linux/kernel/2211.2/03111.html

But I don't see these on lore. Would you like me to repost these?

--
-Dionna Glaze, PhD (she/her)

2023-01-19 21:03:20

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event


On 1/19/2023 2:35 PM, Dionna Amalie Glaze wrote:
>> +
>> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
>> +{
>
> Both regular,
>
>> +
>> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
>> +{
>
> and extended guest requests should be subject to rate limiting, since
> they take a lock on the shared resource that is the AMD-SP (psp?). I
> proposed a mechanism with empirically chosen defaults in
>
> [PATCH v2 0/2] kvm: sev: Add SNP guest request throttling
> [PATCH v2 1/2] kvm: sev: Add SEV-SNP guest request throttling
> [PATCH v2 2/2] kvm: sev: If ccp is busy, report throttled to guest
>
> http://129.79.113.48/hypermail/linux/kernel/2211.2/03107.html
> http://129.79.113.48/hypermail/linux/kernel/2211.2/03110.html
> http://129.79.113.48/hypermail/linux/kernel/2211.2/03111.html
>
> But I don't see these on lore. Would you like me to repost these?
>

Yes, please.

Thanks,
Ashish

2023-01-19 21:14:39

by Dov Murik

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event



On 19/01/2023 22:54, Kalra, Ashish wrote:
>
> On 1/19/2023 2:35 PM, Dionna Amalie Glaze wrote:
>>> +
>>> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t
>>> req_gpa, gpa_t resp_gpa)
>>> +{
>>
>> Both regular,
>>
>>> +
>>> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t
>>> req_gpa, gpa_t resp_gpa)
>>> +{
>>
>> and extended guest requests should be subject to rate limiting, since
>> they take a lock on the shared resource that is the AMD-SP (psp?). I
>> proposed a mechanism with empirically chosen defaults in
>>
>> [PATCH v2 0/2] kvm: sev: Add SNP guest request throttling
>> [PATCH v2 1/2] kvm: sev: Add SEV-SNP guest request throttling
>> [PATCH v2 2/2] kvm: sev: If ccp is busy, report throttled to guest
>>
>> http://129.79.113.48/hypermail/linux/kernel/2211.2/03107.html
>> http://129.79.113.48/hypermail/linux/kernel/2211.2/03110.html
>> http://129.79.113.48/hypermail/linux/kernel/2211.2/03111.html
>>
>> But I don't see these on lore. Would you like me to repost these?
>>
>
> Yes, please.
>

I think it's this series:

https://lore.kernel.org/all/[email protected]/

-Dov

2023-01-19 22:43:23

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

Hello Dionna,

Do you also have other updates to this patch with regard to review
comments from Dov?

Thanks,
Ashish

On 1/19/2023 12:49 PM, Dionna Amalie Glaze wrote:
>> +
>> + /* Page-align the length */
>> + length = (params.certs_len + PAGE_SIZE - 1) & PAGE_MASK;
>> +
>
> I believe Ashish wanted this to be PAGE_ALIGN(params.certs_len)
>

2023-01-20 00:34:29

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 14/64] x86/sev: Add the host SEV-SNP initialization support

Hello Jeremi,

On 1/18/2023 9:55 AM, Jeremi Piotrowski wrote:
> On Wed, Dec 14, 2022 at 01:40:06PM -0600, Michael Roth wrote:
>> From: Brijesh Singh <[email protected]>
>>
>> The memory integrity guarantees of SEV-SNP are enforced through a new
>> structure called the Reverse Map Table (RMP). The RMP is a single data
>> structure shared across the system that contains one entry for every 4K
>> page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
>> track the owner of each page of memory. Pages of memory can be owned by
>> the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
>> section 15.36.3 for more detail on RMP.
>>
>> The RMP table is used to enforce access control to memory. The table itself
>> is not directly writable by the software. New CPU instructions (RMPUPDATE,
>> PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.
>>
>> Based on the platform configuration, the BIOS reserves the memory used
>> for the RMP table. The start and end address of the RMP table must be
>> queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and
>> RMP_END are not set then disable the SEV-SNP feature.
>>
>> The SEV-SNP feature is enabled only after the RMP table is successfully
>> initialized.
>>
>> Also set SYSCFG.MFDM when enabling SNP, as SEV-SNP FW >= 1.51 requires
>> that SYSCFG.MFDM be set.
>>
>> The RMP table entry format is non-architectural: it can vary by processor
>> and is defined by the PPR. Restrict SNP support to the known CPU models
>> and families for which the RMP table entry format is currently defined.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> arch/x86/include/asm/disabled-features.h | 8 +-
>> arch/x86/include/asm/msr-index.h | 11 +-
>> arch/x86/kernel/sev.c | 180 +++++++++++++++++++++++
>> 3 files changed, 197 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
>> index 33d2cd04d254..9b5a2cc8064a 100644
>> --- a/arch/x86/include/asm/disabled-features.h
>> +++ b/arch/x86/include/asm/disabled-features.h
>> @@ -87,6 +87,12 @@
>> # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
>> #endif
>>
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +# define DISABLE_SEV_SNP 0
>> +#else
>> +# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
>> +#endif
>> +
>> /*
>> * Make sure to add features to the correct mask
>> */
>> @@ -110,7 +116,7 @@
>> DISABLE_ENQCMD)
>> #define DISABLED_MASK17 0
>> #define DISABLED_MASK18 0
>> -#define DISABLED_MASK19 0
>> +#define DISABLED_MASK19 (DISABLE_SEV_SNP)
>> #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
>>
>> #endif /* _ASM_X86_DISABLED_FEATURES_H */
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index 10ac52705892..35100c630617 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -565,6 +565,8 @@
>> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
>> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
>> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
>> +#define MSR_AMD64_RMP_BASE 0xc0010132
>> +#define MSR_AMD64_RMP_END 0xc0010133
>>
>> #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
>>
>> @@ -649,7 +651,14 @@
>> #define MSR_K8_TOP_MEM2 0xc001001d
>> #define MSR_AMD64_SYSCFG 0xc0010010
>> #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
>> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
>> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
>> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
>> +#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
>> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
>> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
>> +#define MSR_AMD64_SYSCFG_MFDM_BIT 19
>> +#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
>> +
>> #define MSR_K8_INT_PENDING_MSG 0xc0010055
>> /* C1E active bits in int pending message */
>> #define K8_INTP_C1E_ACTIVE_MASK 0x18000000
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index a428c62330d3..687a91284506 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -22,6 +22,9 @@
>> #include <linux/efi.h>
>> #include <linux/platform_device.h>
>> #include <linux/io.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/iommu.h>
>> +#include <linux/amd-iommu.h>
>>
>> #include <asm/cpu_entry_area.h>
>> #include <asm/stacktrace.h>
>> @@ -38,6 +41,7 @@
>> #include <asm/apic.h>
>> #include <asm/cpuid.h>
>> #include <asm/cmdline.h>
>> +#include <asm/iommu.h>
>>
>> #define DR7_RESET_VALUE 0x400
>>
>> @@ -57,6 +61,12 @@
>> #define AP_INIT_CR0_DEFAULT 0x60000010
>> #define AP_INIT_MXCSR_DEFAULT 0x1f80
>>
>> +/*
>> + * The first 16KB from the RMP_BASE is used by the processor for the
>> + * bookkeeping, the range needs to be added during the RMP entry lookup.
>> + */
>> +#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
>> +
>> /* For early boot hypervisor communication in SEV-ES enabled guests */
>> static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>>
>> @@ -69,6 +79,9 @@ static struct ghcb *boot_ghcb __section(".data");
>> /* Bitmap of SEV features supported by the hypervisor */
>> static u64 sev_hv_features __ro_after_init;
>>
>> +static unsigned long rmptable_start __ro_after_init;
>> +static unsigned long rmptable_end __ro_after_init;
>> +
>> /* #VC handler runtime per-CPU data */
>> struct sev_es_runtime_data {
>> struct ghcb ghcb_page;
>> @@ -2260,3 +2273,170 @@ static int __init snp_init_platform_device(void)
>> return 0;
>> }
>> device_initcall(snp_init_platform_device);
>> +
>> +#undef pr_fmt
>> +#define pr_fmt(fmt) "SEV-SNP: " fmt
>> +
>> +static int __mfd_enable(unsigned int cpu)
>> +{
>> + u64 val;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>> + return 0;
>> +
>> + rdmsrl(MSR_AMD64_SYSCFG, val);
>> +
>> + val |= MSR_AMD64_SYSCFG_MFDM;
>> +
>> + wrmsrl(MSR_AMD64_SYSCFG, val);
>> +
>> + return 0;
>> +}
>> +
>> +static __init void mfd_enable(void *arg)
>> +{
>> + __mfd_enable(smp_processor_id());
>> +}
>> +
>> +static int __snp_enable(unsigned int cpu)
>> +{
>> + u64 val;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>> + return 0;
>> +
>> + rdmsrl(MSR_AMD64_SYSCFG, val);
>> +
>> + val |= MSR_AMD64_SYSCFG_SNP_EN;
>> + val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
>> +
>> + wrmsrl(MSR_AMD64_SYSCFG, val);
>> +
>> + return 0;
>> +}
>> +
>> +static __init void snp_enable(void *arg)
>> +{
>> + __snp_enable(smp_processor_id());
>> +}
>> +
>> +static bool get_rmptable_info(u64 *start, u64 *len)
>> +{
>> + u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
>> +
>> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
>> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
>> +
>> + if (!rmp_base || !rmp_end) {
>> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
>> + return false;
>> + }
>> +
>> + rmp_sz = rmp_end - rmp_base + 1;
>> +
>> + /*
>> + * Calculate the amount of memory that must be reserved by the BIOS to
>> + * address the whole RAM. The reserved memory should also cover the
>> + * RMP table itself.
>> + */
>> + calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + totalram_pages()) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
>
> Since the rmptable is indexed by page number, I believe this check should be
> using max_pfn:
>
> calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
>
> This accounts for holes/offsets in the memory map which lead to the top of
> memory having pfn > totalram_pages().
>

I agree that this check should use max. addressable pfn to account for
holes in the physical address map. The BIOS will probably also be
computing RMP table size to cover the entire physical memory, which
should be max. addressable PFN.

But then we primarily need to check that all available RAM pages are
covered by the RMP table, so the above check is sufficient for that, right?

Also, I assume that max_pfn will take into account any hotplugged memory,
as I do know that totalram_pages() handles hotplugged memory.

Thanks,
Ashish

>> +
>> + if (calc_rmp_sz > rmp_sz) {
>> + pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
>> + calc_rmp_sz, rmp_sz);
>> + return false;
>> + }
>> +
>> + *start = rmp_base;
>> + *len = rmp_sz;
>> +
>> + pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n", rmp_base, rmp_end);
>> +
>> + return true;
>> +}
>> +

2023-01-20 01:50:48

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCH RFC v7 62/64] x86/sev: Add KVM commands for instance certs

On Thu, Jan 19, 2023 at 2:18 PM Kalra, Ashish <[email protected]> wrote:
>
> Hello Dionna,
>
> Do you also have other updates to this patch with regard to review
> comments from Dov ?
>

Apart from the PAGE_ALIGN change, the result of the whole discussion
appears to only need the following immediately before the
copy_from_user of certs_uaddr in the snp_set_instance_certs function:

/* The size could shrink and leave garbage at the end. */
memset(sev->snp_certs_data, 0, SEV_FW_BLOB_MAX_SIZE);

I don't believe there is an off-by-one with the page shifting for the
number of pages because snp_certs_len is already rounded up to the
nearest page size. Any other change to the way the blob size is
decided between the guest and host should come later.
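
Concretely, combining the two points, the delta over the posted patch
would look roughly like this (a sketch using the RFC's field names, not
the final code):

	/* Page-align the length */
	length = PAGE_ALIGN(params.certs_len);

	/* The size could shrink and leave garbage at the end. */
	memset(sev->snp_certs_data, 0, SEV_FW_BLOB_MAX_SIZE);

	if (copy_from_user(sev->snp_certs_data,
			   (void __user *)(uintptr_t)params.certs_uaddr,
			   params.certs_len))
		return -EFAULT;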

--
-Dionna Glaze, PhD (she/her)

2023-01-20 17:00:29

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 14/64] x86/sev: Add the host SEV-SNP initialization support

On 1/19/2023 5:59 PM, Kalra, Ashish wrote:
> Hello Jeremi,
>
> On 1/18/2023 9:55 AM, Jeremi Piotrowski wrote:
>> On Wed, Dec 14, 2022 at 01:40:06PM -0600, Michael Roth wrote:
>>> From: Brijesh Singh <[email protected]>
>>>
>>> The memory integrity guarantees of SEV-SNP are enforced through a new
>>> structure called the Reverse Map Table (RMP). The RMP is a single data
>>> structure shared across the system that contains one entry for every 4K
>>> page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
>>> track the owner of each page of memory. Pages of memory can be owned by
>>> the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
>>> section 15.36.3 for more detail on RMP.
>>>
>>> The RMP table is used to enforce access control to memory. The table
>>> itself
>>> is not directly writable by the software. New CPU instructions
>>> (RMPUPDATE,
>>> PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.
>>>
>>> Based on the platform configuration, the BIOS reserves the memory used
>>> for the RMP table. The start and end address of the RMP table must be
>>> queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and
>>> RMP_END are not set then disable the SEV-SNP feature.
>>>
>>> The SEV-SNP feature is enabled only after the RMP table is successfully
>>> initialized.
>>>
>>> Also set SYSCFG.MFDM when enabling SNP, as SEV-SNP FW >= 1.51 requires
>>> that SYSCFG.MFDM be set.
>>>
>>> The RMP table entry format is non-architectural: it can vary by processor
>>> and is defined by the PPR. Restrict SNP support to the known CPU models
>>> and families for which the RMP table entry format is currently defined.
>>>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> Signed-off-by: Michael Roth <[email protected]>
>>> ---
>>>   arch/x86/include/asm/disabled-features.h |   8 +-
>>>   arch/x86/include/asm/msr-index.h         |  11 +-
>>>   arch/x86/kernel/sev.c                    | 180 +++++++++++++++++++++++
>>>   3 files changed, 197 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/disabled-features.h
>>> b/arch/x86/include/asm/disabled-features.h
>>> index 33d2cd04d254..9b5a2cc8064a 100644
>>> --- a/arch/x86/include/asm/disabled-features.h
>>> +++ b/arch/x86/include/asm/disabled-features.h
>>> @@ -87,6 +87,12 @@
>>>   # define DISABLE_TDX_GUEST    (1 << (X86_FEATURE_TDX_GUEST & 31))
>>>   #endif
>>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>>> +# define DISABLE_SEV_SNP    0
>>> +#else
>>> +# define DISABLE_SEV_SNP    (1 << (X86_FEATURE_SEV_SNP & 31))
>>> +#endif
>>> +
>>>   /*
>>>    * Make sure to add features to the correct mask
>>>    */
>>> @@ -110,7 +116,7 @@
>>>                DISABLE_ENQCMD)
>>>   #define DISABLED_MASK17    0
>>>   #define DISABLED_MASK18    0
>>> -#define DISABLED_MASK19    0
>>> +#define DISABLED_MASK19    (DISABLE_SEV_SNP)
>>>   #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
>>>   #endif /* _ASM_X86_DISABLED_FEATURES_H */
>>> diff --git a/arch/x86/include/asm/msr-index.h
>>> b/arch/x86/include/asm/msr-index.h
>>> index 10ac52705892..35100c630617 100644
>>> --- a/arch/x86/include/asm/msr-index.h
>>> +++ b/arch/x86/include/asm/msr-index.h
>>> @@ -565,6 +565,8 @@
>>>   #define MSR_AMD64_SEV_ENABLED
>>> BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
>>>   #define MSR_AMD64_SEV_ES_ENABLED
>>> BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
>>>   #define MSR_AMD64_SEV_SNP_ENABLED
>>> BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
>>> +#define MSR_AMD64_RMP_BASE        0xc0010132
>>> +#define MSR_AMD64_RMP_END        0xc0010133
>>>   #define MSR_AMD64_VIRT_SPEC_CTRL    0xc001011f
>>> @@ -649,7 +651,14 @@
>>>   #define MSR_K8_TOP_MEM2            0xc001001d
>>>   #define MSR_AMD64_SYSCFG        0xc0010010
>>>   #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT    23
>>> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT
>>> BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
>>> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT
>>> BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
>>> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT        24
>>> +#define MSR_AMD64_SYSCFG_SNP_EN
>>> BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
>>> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT    25
>>> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN
>>> BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
>>> +#define MSR_AMD64_SYSCFG_MFDM_BIT        19
>>> +#define MSR_AMD64_SYSCFG_MFDM
>>> BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
>>> +
>>>   #define MSR_K8_INT_PENDING_MSG        0xc0010055
>>>   /* C1E active bits in int pending message */
>>>   #define K8_INTP_C1E_ACTIVE_MASK        0x18000000
>>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>>> index a428c62330d3..687a91284506 100644
>>> --- a/arch/x86/kernel/sev.c
>>> +++ b/arch/x86/kernel/sev.c
>>> @@ -22,6 +22,9 @@
>>>   #include <linux/efi.h>
>>>   #include <linux/platform_device.h>
>>>   #include <linux/io.h>
>>> +#include <linux/cpumask.h>
>>> +#include <linux/iommu.h>
>>> +#include <linux/amd-iommu.h>
>>>   #include <asm/cpu_entry_area.h>
>>>   #include <asm/stacktrace.h>
>>> @@ -38,6 +41,7 @@
>>>   #include <asm/apic.h>
>>>   #include <asm/cpuid.h>
>>>   #include <asm/cmdline.h>
>>> +#include <asm/iommu.h>
>>>   #define DR7_RESET_VALUE        0x400
>>> @@ -57,6 +61,12 @@
>>>   #define AP_INIT_CR0_DEFAULT        0x60000010
>>>   #define AP_INIT_MXCSR_DEFAULT        0x1f80
>>> +/*
>>> + * The first 16KB from the RMP_BASE is used by the processor for the
>>> + * bookkeeping, the range needs to be added during the RMP entry
>>> lookup.
>>> + */
>>> +#define RMPTABLE_CPU_BOOKKEEPING_SZ    0x4000
>>> +
>>>   /* For early boot hypervisor communication in SEV-ES enabled guests */
>>>   static struct ghcb boot_ghcb_page __bss_decrypted
>>> __aligned(PAGE_SIZE);
>>> @@ -69,6 +79,9 @@ static struct ghcb *boot_ghcb __section(".data");
>>>   /* Bitmap of SEV features supported by the hypervisor */
>>>   static u64 sev_hv_features __ro_after_init;
>>> +static unsigned long rmptable_start __ro_after_init;
>>> +static unsigned long rmptable_end __ro_after_init;
>>> +
>>>   /* #VC handler runtime per-CPU data */
>>>   struct sev_es_runtime_data {
>>>       struct ghcb ghcb_page;
>>> @@ -2260,3 +2273,170 @@ static int __init snp_init_platform_device(void)
>>>       return 0;
>>>   }
>>>   device_initcall(snp_init_platform_device);
>>> +
>>> +#undef pr_fmt
>>> +#define pr_fmt(fmt)    "SEV-SNP: " fmt
>>> +
>>> +static int __mfd_enable(unsigned int cpu)
>>> +{
>>> +    u64 val;
>>> +
>>> +    if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>>> +        return 0;
>>> +
>>> +    rdmsrl(MSR_AMD64_SYSCFG, val);
>>> +
>>> +    val |= MSR_AMD64_SYSCFG_MFDM;
>>> +
>>> +    wrmsrl(MSR_AMD64_SYSCFG, val);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static __init void mfd_enable(void *arg)
>>> +{
>>> +    __mfd_enable(smp_processor_id());
>>> +}
>>> +
>>> +static int __snp_enable(unsigned int cpu)
>>> +{
>>> +    u64 val;
>>> +
>>> +    if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>>> +        return 0;
>>> +
>>> +    rdmsrl(MSR_AMD64_SYSCFG, val);
>>> +
>>> +    val |= MSR_AMD64_SYSCFG_SNP_EN;
>>> +    val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
>>> +
>>> +    wrmsrl(MSR_AMD64_SYSCFG, val);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static __init void snp_enable(void *arg)
>>> +{
>>> +    __snp_enable(smp_processor_id());
>>> +}
>>> +
>>> +static bool get_rmptable_info(u64 *start, u64 *len)
>>> +{
>>> +    u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
>>> +
>>> +    rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
>>> +    rdmsrl(MSR_AMD64_RMP_END, rmp_end);
>>> +
>>> +    if (!rmp_base || !rmp_end) {
>>> +        pr_err("Memory for the RMP table has not been reserved by
>>> BIOS\n");
>>> +        return false;
>>> +    }
>>> +
>>> +    rmp_sz = rmp_end - rmp_base + 1;
>>> +
>>> +    /*
>>> +     * Calculate the amount of memory that must be reserved by the
>>> BIOS to
>>> +     * address the whole RAM. The reserved memory should also cover the
>>> +     * RMP table itself.
>>> +     */
>>> +    calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + totalram_pages()) << 4)
>>> + RMPTABLE_CPU_BOOKKEEPING_SZ;
>>
>> Since the rmptable is indexed by page number, I believe this check
>> should be
>> using max_pfn:
>>
>>      calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
>>
>> This accounts for holes/offsets in the memory map which lead to the
>> top of
>> memory having pfn > totalram_pages().
>>
>
> I agree that this check should use max. addressable pfn to account for
> holes in the physical address map. The BIOS will probably also be
> computing RMP table size to cover the entire physical memory, which
> should be max. addressable PFN.
>
> But then we primarily need to check that all available RAM pages are
> covered by the RMP table, so the above check is sufficient for that,
> right?
>
> Also, I assume that max_pfn will take into account any hotplugged memory,
> as I do know that totalram_pages() handles hotplugged memory.
>

But essentially you are correct: as the RMP table is indexed by PFN, we
have to take these physical memory holes into account so that there are
entries up to the max DRAM SPA. So I will fix this to use the max
addressable PFN, i.e., max_pfn.

Thanks,
Ashish

>
>>> +
>>> +    if (calc_rmp_sz > rmp_sz) {
>>> +        pr_err("Memory reserved for the RMP table does not cover
>>> full system RAM (expected 0x%llx got 0x%llx)\n",
>>> +               calc_rmp_sz, rmp_sz);
>>> +        return false;
>>> +    }
>>> +
>>> +    *start = rmp_base;
>>> +    *len = rmp_sz;
>>> +
>>> +    pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
>>> rmp_base, rmp_end);
>>> +
>>> +    return true;
>>> +}
>>> +
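
For reference, the agreed-upon fix to get_rmptable_info() would look
roughly like this (a sketch, not the final patch):

	/*
	 * Calculate the amount of memory that must be reserved by the BIOS
	 * to cover the highest addressable PFN, including holes in the
	 * physical address map: 16 bytes (<< 4) of RMP entry per 4K page,
	 * plus the processor bookkeeping area at the start of the table.
	 */
	calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;

	if (calc_rmp_sz > rmp_sz) {
		pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
		       calc_rmp_sz, rmp_sz);
		return false;
	}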

2023-01-20 20:11:29

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 44/64] KVM: SVM: Remove the long-lived GHCB host map

On Wed, Jan 18, 2023 at 10:15:38AM -0800, Alper Gun wrote:
> On Wed, Jan 18, 2023 at 7:27 AM Jeremi Piotrowski
> <[email protected]> wrote:
> >
> > On Wed, Dec 14, 2022 at 01:40:36PM -0600, Michael Roth wrote:
> > > From: Brijesh Singh <[email protected]>
> > >
> > > On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB GPA,
> > > and unmaps it just before VM-entry. This long-lived GHCB map is used by
> > > the VMGEXIT handler through accessors such as ghcb_{set_get}_xxx().
> > >
> > > A long-lived GHCB map can cause issues when SEV-SNP is enabled. When
> > > SEV-SNP is enabled the mapped GPA needs to be protected against a page
> > > state change.
> > >
> > > To eliminate the long-lived GHCB mapping, update the GHCB sync operations
> > > to explicitly map the GHCB before access and unmap it after access is
> > > complete. This requires that the setting of the GHCB's sw_exit_info_{1,2}
> > > fields be done during sev_es_sync_to_ghcb(), so create two new fields in
> > > the vcpu_svm struct to hold these values when required to be set outside
> > > of the GHCB mapping.
> > >
> > > Signed-off-by: Brijesh Singh <[email protected]>
> > > Signed-off-by: Ashish Kalra <[email protected]>
> > > [mdr: defer per_cpu() assignment and order it with barrier() to fix case
> > > where kvm_vcpu_map() causes reschedule on different CPU]
> > > Signed-off-by: Michael Roth <[email protected]>
> > > ---
> > > arch/x86/kvm/svm/sev.c | 131 ++++++++++++++++++++++++++---------------
> > > arch/x86/kvm/svm/svm.c | 18 +++---
> > > arch/x86/kvm/svm/svm.h | 24 +++++++-
> > > 3 files changed, 116 insertions(+), 57 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index d5c6e48055fb..6ac0cb6e3484 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -2921,15 +2921,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
> > > kvfree(svm->sev_es.ghcb_sa);
> > > }
> > >
> > > +static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
> > > +{
> > > + struct vmcb_control_area *control = &svm->vmcb->control;
> > > + u64 gfn = gpa_to_gfn(control->ghcb_gpa);
> > > +
> > > + if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
> > > + /* Unable to map GHCB from guest */
> > > + pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
> > > + return -EFAULT;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
> > > +{
> > > + kvm_vcpu_unmap(&svm->vcpu, map, true);
> > > +}
> > > +
> > > static void dump_ghcb(struct vcpu_svm *svm)
> > > {
> > > - struct ghcb *ghcb = svm->sev_es.ghcb;
> > > + struct kvm_host_map map;
> > > unsigned int nbits;
> > > + struct ghcb *ghcb;
> > > +
> > > + if (svm_map_ghcb(svm, &map))
> > > + return;
> > > +
> > > + ghcb = map.hva;
> >
> > dump_ghcb() is called from sev_es_validate_vmgexit() with the ghcb already
> > mapped. How about passing 'struct kvm_host_map *' (or struct ghcb *) as a
> > param to avoid double mapping?
>
> This also causes a soft lockup, PSC spin lock is already acquired in
> sev_es_validate_vmgexit. dump_ghcb will try to acquire the same lock
> again. So a guest can send an invalid ghcb page and cause a host soft
> lockup.

We did notice that issue with v6, and had a fix similar to what Jeremi
suggested, but in this patchset the psc_lock spinlock has been replaced
in favor of taking a read_lock() on kvm->mmu_lock.

The logic there is that userspace drives the page state changes via
kvm_vm_ioctl_set_mem_attributes() now, and it does so while holding a
write_lock() on kvm->mmu_lock, so if we want to guard the GHCB page from
page state changes while the kernel is accessing it, it makes sense to
use the same lock rather than relying on some separate SNP-specific lock.

And because it's now a read_lock(), I don't think the deadlock issue is
present anymore in v7, but it probably does still make sense to avoid the
double-mapping as Jeremi suggested, so we'll plan to make that change for
v8.
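
To illustrate the locking relationship (heavily simplified; not the
actual map/unmap code):

	/* userspace-driven page state change */
	write_lock(&kvm->mmu_lock);
	/* ... update private/shared memory attributes ... */
	write_unlock(&kvm->mmu_lock);

	/* GHCB access in the VMGEXIT path */
	read_lock(&kvm->mmu_lock);
	/* ... map the GHCB, sync registers to/from it, unmap ... */
	read_unlock(&kvm->mmu_lock);

Since rwlock readers don't exclude one another, a nested read_lock()
taken via dump_ghcb() no longer self-deadlocks the way the old psc_lock
spinlock did.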

Thanks,

Mike

2023-01-20 21:55:03

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 03/64] KVM: SVM: Advertise private memory support to KVM

On Wed, Jan 04, 2023 at 08:14:19PM -0600, Michael Roth wrote:
> On Fri, Dec 23, 2022 at 05:56:50PM +0100, Borislav Petkov wrote:
> > On Wed, Dec 14, 2022 at 01:39:55PM -0600, Michael Roth wrote:
> > > + bool (*private_mem_enabled)(struct kvm *kvm);
> >
> > This looks like a function returning boolean to me. IOW, you can
> > simplify this to:
>
> The semantics and existing uses of KVM_X86_OP_OPTIONAL_RET0() gave me the
> impression it needed to return an integer value, since by default if a
> platform doesn't implement the op it would "return 0", and so could
> still be called unconditionally.
>
> Maybe that's not actually enforced, but it seems awkward to try to use a
> bool return instead. At least for KVM_X86_OP_OPTIONAL_RET0().
>
> However, we could just use KVM_X86_OP() to declare it so we can cleanly
> use a function that returns bool, and then we just need to do:
>
> bool kvm_arch_has_private_mem(struct kvm *kvm)
> {
> if (kvm_x86_ops.private_mem_enabled)
> return static_call(kvm_x86_private_mem_enabled)(kvm);

I guess this is missing:

return false;

> }
>
> instead of relying on default return value. So I'll take that approach
> and adopt your other suggested changes.
>
> ...
>
> On a separate topic though, at a high level, this hook is basically a way
> for platform-specific code to tell generic KVM code that private memslots
> are supported by overriding the kvm_arch_has_private_mem() weak
> reference. In this case the AMD platform is using using kvm->arch.upm_mode
> flag to convey that, which is in turn set by the
> KVM_CAP_UNMAPPED_PRIVATE_MEMORY introduced in this series.
>
> But if, as I suggested in response to your PATCH 2 comments, we drop
> KVM_CAP_UNMAPPED_PRIVATE_MEMORY in favor of
> KVM_SET_SUPPORTED_MEMORY_ATTRIBUTES ioctl to enable "UPM mode" in SEV/SNP
> code, then we need to rethink things a bit, since KVM_SET_MEMORY_ATTRIBUTES
> in-part relies on kvm_arch_has_private_mem() to determine what flags are
> supported, whereas SEV/SNP code would be using what was set by
> KVM_SET_MEMORY_ATTRIBUTES to determine the return value in
> kvm_arch_has_private_mem().

Does this mean that internal calls to kvm_vm_set_region_attr() will
cease to exist, and it will rely for user space to use the ioctl
properly instead?

> So, for AMD, the return value of kvm_arch_has_private_mem() needs to rely
> on something else. Maybe the logic can just be:
>
> bool svm_private_mem_enabled(struct kvm *kvm)
> {
> return sev_enabled(kvm) || sev_snp_enabled(kvm)
> }
>
> (at least in the context of this patchset where UPM support is added for
> both SEV and SNP).
>
> So I'll plan to make that change as well.
>
> -Mike
>
> >
> > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > index 82ba4a564e58..4449aeff0dff 100644
> > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > @@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
> > KVM_X86_OP(complete_emulated_msr)
> > KVM_X86_OP(vcpu_deliver_sipi_vector)
> > KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> > +KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
> >
> > #undef KVM_X86_OP
> > #undef KVM_X86_OP_OPTIONAL
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 1da0474edb2d..1b4b89ddeb55 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1574,6 +1574,7 @@ struct kvm_x86_ops {
> >
> > void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > int root_level);
> > + bool (*private_mem_enabled)(struct kvm *kvm);
> >
> > bool (*has_wbinvd_exit)(void);
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index ce362e88a567..73b780fa4653 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4680,6 +4680,14 @@ static int svm_vm_init(struct kvm *kvm)
> > return 0;
> > }
> >
> > +static bool svm_private_mem_enabled(struct kvm *kvm)
> > +{
> > + if (sev_guest(kvm))
> > + return kvm->arch.upm_mode;
> > +
> > + return IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING);
> > +}
> > +
> > static struct kvm_x86_ops svm_x86_ops __initdata = {
> > .name = "kvm_amd",
> >
> > @@ -4760,6 +4768,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >
> > .vcpu_after_set_cpuid = svm_vcpu_after_set_cpuid,
> >
> > + .private_mem_enabled = svm_private_mem_enabled,
> > +
> > .has_wbinvd_exit = svm_has_wbinvd_exit,
> >
> > .get_l2_tsc_offset = svm_get_l2_tsc_offset,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 823646d601db..9a1ca59d36a4 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12556,6 +12556,11 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
> > }
> > EXPORT_SYMBOL_GPL(__x86_set_memory_region);
> >
> > +bool kvm_arch_has_private_mem(struct kvm *kvm)
> > +{
> > + return static_call(kvm_x86_private_mem_enabled)(kvm);
> > +}
> > +
> > void kvm_arch_pre_destroy_vm(struct kvm *kvm)
> > {
> > kvm_mmu_pre_destroy_vm(kvm);
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette

BR, Jarkko

2023-01-20 22:27:07

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 25/64] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

On Thu, Jan 05, 2023 at 04:40:29PM -0600, Kalra, Ashish wrote:
> Hello Jarkko,
>
> On 12/31/2022 9:32 AM, Jarkko Sakkinen wrote:
> > On Wed, Dec 14, 2022 at 01:40:17PM -0600, Michael Roth wrote:
> > > From: Brijesh Singh <[email protected]>
> > >
> > > Before SNP VMs can be launched, the platform must be appropriately
> > > configured and initialized. Platform initialization is accomplished via
> > > the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH
> > > command to prepare for the first SNP guest launch after INIT.
> > >
> > > During the execution of SNP_INIT command, the firmware configures
> > > and enables SNP security policy enforcement in many system components.
> > > Some system components write to regions of memory reserved by early
> > > x86 firmware (e.g. UEFI). Other system components write to regions
> > > provided by the operation system, hypervisor, or x86 firmware.
> > > Such system components can only write to HV-fixed pages or Default
> > > pages. They will error when attempting to write to other page states
> > > after SNP_INIT enables their SNP enforcement.
> > >
> > > Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
> > > system physical address ranges to convert into the HV-fixed page states
> > > during the RMP initialization. If INIT_RMP is 1, hypervisors should
> > > provide all system physical address ranges that the hypervisor will
> > > never assign to a guest until the next RMP re-initialization.
> > > For instance, the memory that UEFI reserves should be included in the
> > > range list. This allows system components that occasionally write to
> > > memory (e.g. logging to UEFI reserved regions) to not fail due to
> > > RMP initialization and SNP enablement.
> > >
> > > Co-developed-by: Ashish Kalra <[email protected]>
> > > Signed-off-by: Ashish Kalra <[email protected]>
> > > Signed-off-by: Brijesh Singh <[email protected]>
> > > Signed-off-by: Michael Roth <[email protected]>
> > > ---
> > > drivers/crypto/ccp/sev-dev.c | 225 +++++++++++++++++++++++++++++++++++
> > > drivers/crypto/ccp/sev-dev.h | 2 +
> > > include/linux/psp-sev.h | 17 +++
> > > 3 files changed, 244 insertions(+)
> > >
> > > diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> > > index 9d84720a41d7..af20420bd6c2 100644
> > > --- a/drivers/crypto/ccp/sev-dev.c
> > > +++ b/drivers/crypto/ccp/sev-dev.c
> > > @@ -26,6 +26,7 @@
> > > #include <linux/fs_struct.h>
> > > #include <asm/smp.h>
> > > +#include <asm/e820/types.h>
> > > #include "psp-dev.h"
> > > #include "sev-dev.h"
> > > @@ -34,6 +35,10 @@
> > > #define SEV_FW_FILE "amd/sev.fw"
> > > #define SEV_FW_NAME_SIZE 64
> > > +/* Minimum firmware version required for the SEV-SNP support */
> > > +#define SNP_MIN_API_MAJOR 1
> > > +#define SNP_MIN_API_MINOR 51
> > > +
> > > static DEFINE_MUTEX(sev_cmd_mutex);
> > > static struct sev_misc_dev *misc_dev;
> > > @@ -76,6 +81,13 @@ static void *sev_es_tmr;
> > > #define NV_LENGTH (32 * 1024)
> > > static void *sev_init_ex_buffer;
> > > +/*
> > > + * SEV_DATA_RANGE_LIST:
> > > + * Array containing range of pages that firmware transitions to HV-fixed
> > > + * page state.
> > > + */
> > > +struct sev_data_range_list *snp_range_list;
> > > +
> > > static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
> > > {
> > > struct sev_device *sev = psp_master->sev_data;
> > > @@ -830,6 +842,186 @@ static int sev_update_firmware(struct device *dev)
> > > return ret;
> > > }
> > > +static void snp_set_hsave_pa(void *arg)
> > > +{
> > > + wrmsrl(MSR_VM_HSAVE_PA, 0);
> > > +}
> > > +
> > > +static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
> > > +{
> > > + struct sev_data_range_list *range_list = arg;
> > > + struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
> > > + size_t size;
> > > +
> > > + if ((range_list->num_elements * sizeof(struct sev_data_range) +
> > > + sizeof(struct sev_data_range_list)) > PAGE_SIZE)
> > > + return -E2BIG;
> > > +
> > > + switch (rs->desc) {
> > > + case E820_TYPE_RESERVED:
> > > + case E820_TYPE_PMEM:
> > > + case E820_TYPE_ACPI:
> > > + range->base = rs->start & PAGE_MASK;
> > > + size = (rs->end + 1) - rs->start;
> > > + range->page_count = size >> PAGE_SHIFT;
> > > + range_list->num_elements++;
> > > + break;
> > > + default:
> > > + break;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int __sev_snp_init_locked(int *error)
> > > +{
> > > + struct psp_device *psp = psp_master;
> > > + struct sev_data_snp_init_ex data;
> > > + struct sev_device *sev;
> > > + int rc = 0;
> > > +
> > > + if (!psp || !psp->sev_data)
> > > + return -ENODEV;
> > > +
> > > + sev = psp->sev_data;
> > > +
> > > + if (sev->snp_initialized)
> > > + return 0;
> >
> > Shouldn't this follow this check:
> >
> > if (sev->state == SEV_STATE_INIT) {
> > /* debug printk about possible incorrect call order */
> > return -ENODEV;
> > }
> >
> > It is game over for SNP, if SEV_CMD_INIT{_EX} got first, which means that
> > this should not proceed.
>
>
> But how would SEV_CMD_INIT_EX happen first, as sev_pci_init(), which is
> invoked during CCP module load/initialization, will first try to do
> sev_snp_init() if SNP is supported, before it invokes sev_platform_init()
> to do SEV firmware initialization?

Because the symbol is exported outside the driver to be called by other
subsystems, you need to have a sanity check for the call order, as it
is a hardware constraint. Otherwise, any unconsidered change on either
side could unknowingly break the kernel.

Alternatively, you could choose not to export sev_snp_init(). This is
supported by the fact that the call in sev_guest_init() does nothing
useful (for the reasons you already wrote).

BR, Jarkko

2023-01-20 23:06:37

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 25/64] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

On Thu, Jan 05, 2023 at 04:54:23PM -0600, Kalra, Ashish wrote:
> Hello Jarkko,
>
> On 1/4/2023 6:12 AM, Jarkko Sakkinen wrote:
> > On Wed, Dec 14, 2022 at 01:40:17PM -0600, Michael Roth wrote:
> > > + /*
> > > + * If boot CPU supports SNP, then first attempt to initialize
> > > + * the SNP firmware.
> > > + */
> > > + if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
> > > + if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
> > > + dev_err(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
> > > + SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
> > > + } else {
> > > + rc = sev_snp_init(&error, true);
> > > + if (rc) {
> > > + /*
> > > + * Don't abort the probe if SNP INIT failed,
> > > + * continue to initialize the legacy SEV firmware.
> > > + */
> > > + dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
> > > + }
> > > + }
> > > + }
> >
> > I think this is not right as there is a dep between sev init and this,
> > and there is about a dozen of call sites already __sev_platform_init_locked().
> >
>
> sev_init ?
>
> As this is invoked during CCP module load/initialization, shouldn't this get
> invoked before any other call sites invoking __sev_platform_init_locked() ?

Then it should not be exported, because this is the only working call site.

However, the benefit of having __sev_platform_init_locked() handle SNP
init is that psp_init_on_probe can also postpone SNP init without the
possibility of side effects in call sites other than sev_guest_init().

BR, Jarkko

2023-01-20 23:19:10

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 37/64] KVM: SVM: Add KVM_SNP_INIT command

On Thu, Jan 05, 2023 at 05:37:20PM -0600, Kalra, Ashish wrote:
> Hello Jarkko,
>
> On 12/31/2022 8:27 AM, Jarkko Sakkinen wrote:
> > On Wed, Dec 14, 2022 at 01:40:29PM -0600, Michael Roth wrote:
> > > static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > > {
> > > struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > @@ -260,13 +279,23 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > > return ret;
> > > sev->active = true;
> > > - sev->es_active = argp->id == KVM_SEV_ES_INIT;
> > > + sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
> > > + sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
> > > asid = sev_asid_new(sev);
> > > if (asid < 0)
> > > goto e_no_asid;
> > > sev->asid = asid;
> > > - ret = sev_platform_init(&argp->error);
> > > + if (sev->snp_active) {
> > > + ret = verify_snp_init_flags(kvm, argp);
> > > + if (ret)
> > > + goto e_free;
> > > +
> > > + ret = sev_snp_init(&argp->error, false);
> > > + } else {
> > > + ret = sev_platform_init(&argp->error);
> > > + }
> >
> > Couldn't sev_snp_init() and sev_platform_init() be called unconditionally
> > in order?
> >
> > Since there is a hardware constraint that SNP init needs to always happen
> > before platform init, shouldn't SNP init happen as part of
> > __sev_platform_init_locked() instead?
> >
>
> On Genoa there is currently an issue where, if we do an SNP_INIT before an
> SEV_INIT and then attempt to launch an SEV guest, that launch may fail, so
> we need to keep SNP_INIT and SEV_INIT separate.
>
> We need to provide a way to run (existing) SEV guests on a system that
> supports SNP without doing an SNP_INIT at all.
>
> This is done using psp_init_on_probe parameter of the CCP module to avoid
> doing either SNP/SEV firmware initialization during module load and then
> defer the firmware initialization till someone launches a guest of one
> flavor or the other.
>
> And then sev_guest_init() does either SNP or SEV firmware init depending on
> the type of the guest being launched.

OK, got it, thank you. I had not noticed the init_on_probe parameter for
sev_snp_init() before. Was it in an earlier version of the patch set?

The benefit of having everything in __sev_platform_init_locked() would be,
first, less risk of shooting yourself in the foot, and second, no need to
pass init_on_probe to sev_snp_init(), as it would be internal to sev-dev.c,
with no need for special cases for callers. In my opinion, guaranteeing the
order is an internal responsibility of the SEV driver.

E.g. the changes to svm/sev.c would then be quite trivial.
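
As a sketch of that shape (not actual driver code), using the
__sev_snp_init_locked() helper the RFC already adds:

static int __sev_platform_init_locked(int *error)
{
	int rc;

	/* Hardware constraint: SNP firmware init must precede SEV init. */
	rc = __sev_snp_init_locked(error);
	if (rc && rc != -ENODEV)
		return rc;

	/* ... existing SEV_CMD_INIT{_EX} handling ... */
	return 0;
}

Callers would then never need to know about the ordering at all.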

> > I found these call sites for __sev_platform_init_locked(), none of which
> > follow the correct call order:
> >
> > * sev_guest_init()
>
> As explained above, this call site is important for deferring the firmware
> initialization to an actual guest launch.
>
> > * sev_ioctl_do_pek_csr
> > * sev_ioctl_do_pdh_export()
> > * sev_ioctl_do_pek_import()
> > * sev_ioctl_do_pek_pdh_gen()

What happens if any of these are called before sev_guest_init()? They only
call __sev_platform_init_locked().

> > * sev_pci_init()
> >
> > For me it looks like a bit flakky API use to have sev_snp_init() as an API
> > call.
> >
> > I would suggest to make SNP init internal to the ccp driver and take care
> > of the correct orchestration over there.
> >
>
> Due to the Genoa issue, we may still need SNP init and SEV init to be
> invoked separately outside the CCP driver.
>
> > Also, how it currently works in this patch set, if the firmware did not
> > load correctly, SNP init halts the whole system. The version check needs
> > to be in all call paths.
> >
>
> Yes, i agree with that.

Attached the fix I sent in private earlier.

> Thanks,
> Ashish

BR, Jarkko


Attachments:
(No filename) (3.78 kB)
0001-crypto-ccp-Prevent-a-spurious-SEV_CMD_SNP_INIT-trigg.patch (2.12 kB)
Download all attachments

2023-01-17 17:20:19

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH RFC v7 07/64] KVM: SEV: Handle KVM_HC_MAP_GPA_RANGE hypercall

On Mon, Jan 16, 2023, Nikunj A. Dadhania wrote:
> On 13/01/23 21:47, Sean Christopherson wrote:
> > It's perfectly legal for userspace to create the private memslot in response
> > to a guest request.
>
> Sean, did not understand this part, how could a memslot be created on a guest request?

KVM_HC_MAP_GPA_RANGE gets routed to host userspace, at that point userspace can
take any action it wants to satisfy the guest request. E.g. a userspace+guest
setup could define memory as shared by default, and only create KVM_MEM_PRIVATE
memslots for memory that the guest explicitly requests to be mapped private.

I don't anticipate any real world use cases actually doing something like that,
but I also don't see any value in going out of our way to disallow it. Normally
I like to be conservative when it comes to KVM's uAPI, e.g. allow the minimum
needed to support known use cases, but restricting KVM_HC_MAP_GPA_RANGE doesn't
actually achieve anything and just makes things more complex for KVM. E.g. the
behavior is non-deterministic from KVM's perspective if a userspace memslot
update is in progress.
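
For illustration, a VMM could satisfy such a request along these lines
(a sketch; create_private_memslot() is a placeholder, and the arg layout
follows the KVM_HC_MAP_GPA_RANGE documentation):

	case KVM_EXIT_HYPERCALL:
		if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) {
			__u64 gpa    = run->hypercall.args[0];
			__u64 npages = run->hypercall.args[1];
			__u64 attrs  = run->hypercall.args[2];

			if (attrs & KVM_MAP_GPA_RANGE_ENCRYPTED)
				/* back the range with a KVM_MEM_PRIVATE memslot */
				create_private_memslot(vm, gpa, npages);

			run->hypercall.ret = 0;
		}
		break;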

2023-01-18 00:14:20

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 11/64] KVM: SEV: Support private pages in LAUNCH_UPDATE_DATA

On Wed, Dec 14, 2022 at 01:40:03PM -0600, Michael Roth wrote:
> From: Nikunj A Dadhania <[email protected]>
>
> Pre-boot guest payload needs to be encrypted and VMM has copied it
> over to the private-fd. Add support to get the pfn from the memfile fd
> for encrypting the payload in-place.
>
> Signed-off-by: Nikunj A Dadhania <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 79 ++++++++++++++++++++++++++++++++++--------
> 1 file changed, 64 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a7e4e3005786..ae4920aeb281 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -107,6 +107,11 @@ static inline bool is_mirroring_enc_context(struct kvm *kvm)
> return !!to_kvm_svm(kvm)->sev_info.enc_context_owner;
> }
>
> +static bool kvm_is_upm_enabled(struct kvm *kvm)
> +{
> + return kvm->arch.upm_mode;
> +}
> +
> /* Must be called with the sev_bitmap_lock held */
> static bool __sev_recycle_asids(int min_asid, int max_asid)
> {
> @@ -382,6 +387,38 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_get_memfile_pfn_handler(struct kvm *kvm, struct kvm_gfn_range *range, void *data)
> +{
> + struct kvm_memory_slot *memslot = range->slot;
> + struct page **pages = data;
> + int ret = 0, i = 0;
> + kvm_pfn_t pfn;
> + gfn_t gfn;
> +
> + for (gfn = range->start; gfn < range->end; gfn++) {
> + int order;
> +
> + ret = kvm_restricted_mem_get_pfn(memslot, gfn, &pfn, &order);
> + if (ret)
> + return ret;
> +
> + if (is_error_noslot_pfn(pfn))
> + return -EFAULT;
> +
> + pages[i++] = pfn_to_page(pfn);
> + }
> +
> + return ret;
> +}
> +
> +static int sev_get_memfile_pfn(struct kvm *kvm, unsigned long addr,
> + unsigned long size, unsigned long npages,
> + struct page **pages)
> +{
> + return kvm_vm_do_hva_range_op(kvm, addr, size,
> + sev_get_memfile_pfn_handler, pages);
> +}
> +
> static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
> unsigned long ulen, unsigned long *n,
> int write)
> @@ -424,16 +461,25 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
> if (!pages)
> return ERR_PTR(-ENOMEM);
>
> - /* Pin the user virtual address. */
> - npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
> - if (npinned != npages) {
> - pr_err("SEV: Failure locking %lu pages.\n", npages);
> - ret = -ENOMEM;
> - goto err;
> + if (kvm_is_upm_enabled(kvm)) {
> + /* Get the PFN from memfile */
> + if (sev_get_memfile_pfn(kvm, uaddr, ulen, npages, pages)) {
> + pr_err("%s: ERROR: unable to find slot for uaddr %lx", __func__, uaddr);
> + ret = -ENOMEM;
> + goto err;
> + }
> + } else {
> + /* Pin the user virtual address. */
> + npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
> + if (npinned != npages) {
> + pr_err("SEV: Failure locking %lu pages.\n", npages);
> + ret = -ENOMEM;
> + goto err;
> + }
> + sev->pages_locked = locked;
> }
>
> *n = npages;
> - sev->pages_locked = locked;
>
> return pages;
>
> @@ -514,6 +560,7 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
>
> size = (range->end - range->start) << PAGE_SHIFT;
> vaddr_end = vaddr + size;
> + WARN_ON(size < PAGE_SIZE);
>
> /* Lock the user memory. */
> inpages = sev_pin_memory(kvm, vaddr, size, &npages, 1);
> @@ -554,13 +601,16 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
> }
>
> e_unpin:
> - /* content of memory is updated, mark pages dirty */
> - for (i = 0; i < npages; i++) {
> - set_page_dirty_lock(inpages[i]);
> - mark_page_accessed(inpages[i]);
> + if (!kvm_is_upm_enabled(kvm)) {
> + /* content of memory is updated, mark pages dirty */
> + for (i = 0; i < npages; i++) {
> + set_page_dirty_lock(inpages[i]);
> + mark_page_accessed(inpages[i]);
> + }
> + /* unlock the user pages */
> + sev_unpin_memory(kvm, inpages, npages);
> }
> - /* unlock the user pages */
> - sev_unpin_memory(kvm, inpages, npages);
> +
> return ret;
> }
>
> @@ -609,9 +659,8 @@ static int sev_launch_update_priv_gfn_handler(struct kvm *kvm,
> goto e_ret;
> kvm_release_pfn_clean(pfn);
> }
> - kvm_vm_set_region_attr(kvm, range->start, range->end,
> - true /* priv_attr */);
>
> + kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> e_ret:
> return ret;
> }
> --
> 2.25.1
>

kvm_vm_set_region_attr() should already be fixed in:

https://lore.kernel.org/all/[email protected]/

BR, Jarkko


2023-01-22 12:45:49

by Tom Dohrmann

[permalink] [raw]
Subject: Re: [PATCH RFC v7 06/64] KVM: x86: Add platform hooks for private memory invalidations

On Wed, Dec 14, 2022 at 01:39:58PM -0600, Michael Roth wrote:
> In some cases, like with SEV-SNP, guest memory needs to be updated in a
> platform-specific manner before it can be safely freed back to the host.
> Add hooks to wire up handling of this sort to the invalidation notifiers
> for restricted memory.
>
> Also issue invalidations of all allocated pages during notifier
> unregistration so that the pages are not left in an unusable state when
> they eventually get freed back to the host upon FD release.
>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/mmu/mmu.c | 5 +++++
> include/linux/kvm_host.h | 2 ++
> mm/restrictedmem.c | 16 ++++++++++++++++
> virt/kvm/kvm_main.c | 5 +++++
> 6 files changed, 30 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 52f94a0ba5e9..c71df44b0f02 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -134,6 +134,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
> KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
> KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
> +KVM_X86_OP_OPTIONAL(invalidate_restricted_mem)
>
> #undef KVM_X86_OP
> #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 13802389f0f9..9ef8d73455d9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1639,6 +1639,7 @@ struct kvm_x86_ops {
> int (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
> int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
> gfn_t start, gfn_t end);
> + void (*invalidate_restricted_mem)(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
>
> bool (*has_wbinvd_exit)(void);
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a0c41d391547..2713632e5061 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7183,3 +7183,8 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> kvm_update_lpage_private_shared_mixed(kvm, slot, attrs,
> start, end);
> }
> +
> +void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> +{
> + static_call_cond(kvm_x86_invalidate_restricted_mem)(slot, start, end);
> +}
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f032d878e034..f72a2e0b8699 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2327,6 +2327,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> unsigned long attrs,
> gfn_t start, gfn_t end);
> +
> #else
> static inline void kvm_arch_set_memory_attributes(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> @@ -2366,6 +2367,7 @@ static inline int kvm_restricted_mem_get_pfn(struct kvm_memory_slot *slot,
> }
>
> void kvm_arch_memory_mce(struct kvm *kvm);
> +void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
> #endif /* CONFIG_HAVE_KVM_RESTRICTED_MEM */
>
> #endif
> diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
> index 56953c204e5c..74fa2cfb8618 100644
> --- a/mm/restrictedmem.c
> +++ b/mm/restrictedmem.c
> @@ -54,6 +54,11 @@ static int restrictedmem_release(struct inode *inode, struct file *file)
> {
> struct restrictedmem_data *data = inode->i_mapping->private_data;
>
> + pr_debug("%s: releasing memfd, invalidating page offsets 0x0-0x%llx\n",
> + __func__, inode->i_size >> PAGE_SHIFT);
> + restrictedmem_invalidate_start(data, 0, inode->i_size >> PAGE_SHIFT);
> + restrictedmem_invalidate_end(data, 0, inode->i_size >> PAGE_SHIFT);
> +
> fput(data->memfd);
> kfree(data);
> return 0;
> @@ -258,6 +263,17 @@ void restrictedmem_unregister_notifier(struct file *file,
> struct restrictedmem_notifier *notifier)
> {
> struct restrictedmem_data *data = file->f_mapping->private_data;
> + struct inode *inode = file_inode(data->memfd);
> +
> + /* TODO: this will issue notifications to all registered notifiers,
> + * but it's only the one being unregistered that needs to process
> + * invalidations for any ranges still allocated at this point in
> + * time. For now this relies on KVM currently being the only notifier.
> + */
> + pr_debug("%s: unregistering notifier, invalidating page offsets 0x0-0x%llx\n",
> + __func__, inode->i_size >> PAGE_SHIFT);
> + restrictedmem_invalidate_start(data, 0, inode->i_size >> PAGE_SHIFT);
> + restrictedmem_invalidate_end(data, 0, inode->i_size >> PAGE_SHIFT);
>
> mutex_lock(&data->lock);
> list_del(&notifier->list);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d2d829d23442..d2daa049e94a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -974,6 +974,9 @@ static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no
> &gfn_start, &gfn_end))
> return;
>
> + pr_debug("%s: start: 0x%lx, end: 0x%lx, roffset: 0x%llx, gfn_start: 0x%llx, gfn_end: 0x%llx\n",
> + __func__, start, end, slot->restricted_offset, gfn_start, gfn_end);
> +
> gfn_range.start = gfn_start;
> gfn_range.end = gfn_end;
> gfn_range.slot = slot;
> @@ -988,6 +991,8 @@ static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no
> if (kvm_unmap_gfn_range(kvm, &gfn_range))
> kvm_flush_remote_tlbs(kvm);
>
> + kvm_arch_invalidate_restricted_mem(slot, gfn_start, gfn_end);

Calling kvm_arch_invalidate_restricted_mem while the KVM MMU lock is taken
causes problems, because taking said lock disables preemption. Within
kvm_arch_invalidate_restricted_mem a few calls down, eventually
vm_unmap_aliases is called which tries to lock a mutex, which shouldn't happen
with preemption disabled. This causes a "scheduling while atomic" bug:

[ 152.846596] BUG: scheduling while atomic: enarx/8302/0x00000002
[ 152.846599] Modules linked in: nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) br_netfilter(E) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) nft_compat(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) libcrc32c(E) nfnetlink(E) bridge(E) stp(E) llc(E) bonding(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) tun(E) ipmi_ssif(E) rfkill(E) overlay(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha512_generic(E) aesni_intel(E) libaes(E) crypto_simd(E) cryptd(E) rapl(E) wmi_bmof(E) binfmt_misc(E) kvm(E) irqbypass(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) snd_usb_audio(E) snd_usbmidi_lib(E) snd_hwdep(E) mc(E) snd_pcm(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E) snd_seq(E) ast(E) snd_seq_device(E) drm_vram_helper(E) drm_ttm_helper(E) snd_timer(E) ttm(E) joydev(E) snd(E) ccp(E) drm_kms_helper(E) soundcore(E) sg(E) i2c_algo_bit(E) rng_core(E)
[ 152.846629] k10temp(E) evdev(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_cpufreq(E) button(E) squashfs(E) loop(E) sch_fq_codel(E) msr(E) parport_pc(E) ppdev(E) lp(E) ramoops(E) parport(E) reed_solomon(E) fuse(E) drm(E) efi_pstore(E) configfs(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) rndis_host(E) cdc_ether(E) usbnet(E) mii(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_pclmul(E) crct10dif_common(E) crc32_pclmul(E) crc32c_intel(E) ahci(E) libahci(E) xhci_pci(E) libata(E) bnxt_en(E) xhci_hcd(E) scsi_mod(E) ptp(E) scsi_common(E) pps_core(E) usbcore(E) i2c_piix4(E) usb_common(E) wmi(E)
[ 152.846657] Preemption disabled at:
[ 152.846657] [<ffffffffc146a09a>] kvm_restrictedmem_invalidate_begin+0xba/0x1c0 [kvm]
[ 152.846688] CPU: 108 PID: 8302 Comm: enarx Tainted: G W E 6.1.0-rc4+ #30
[ 152.846690] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.4 04/14/2022
[ 152.846691] Call Trace:
[ 152.846692] <TASK>
[ 152.846694] dump_stack_lvl+0x49/0x63
[ 152.846695] ? kvm_restrictedmem_invalidate_begin+0xba/0x1c0 [kvm]
[ 152.846723] dump_stack+0x10/0x16
[ 152.846725] __schedule_bug.cold+0x81/0x92
[ 152.846727] __schedule+0x809/0xa00
[ 152.846729] ? asm_sysvec_call_function+0x1b/0x20
[ 152.846731] schedule+0x6b/0xf0
[ 152.846733] schedule_preempt_disabled+0x18/0x30
[ 152.846735] __mutex_lock.constprop.0+0x723/0x750
[ 152.846738] ? smp_call_function_many_cond+0xc1/0x2e0
[ 152.846740] __mutex_lock_slowpath+0x13/0x20
[ 152.846742] mutex_lock+0x49/0x60
[ 152.846744] _vm_unmap_aliases+0x10e/0x160
[ 152.846746] vm_unmap_aliases+0x19/0x20
[ 152.846748] change_page_attr_set_clr+0xb7/0x1c0
[ 152.846751] set_memory_p+0x29/0x30
[ 152.846753] rmpupdate+0xd5/0x110
[ 152.846756] rmp_make_shared+0xb7/0xc0
[ 152.846758] snp_make_page_shared.constprop.0+0x4c/0x90 [kvm_amd]
[ 152.846765] sev_invalidate_private_range+0x156/0x330 [kvm_amd]
[ 152.846770] ? kvm_unmap_gfn_range+0xef/0x100 [kvm]
[ 152.846801] kvm_arch_invalidate_restricted_mem+0xe/0x20 [kvm]
[ 152.846829] kvm_restrictedmem_invalidate_begin+0x106/0x1c0 [kvm]
[ 152.846856] restrictedmem_unregister_notifier+0x74/0x150
[ 152.846859] kvm_free_memslot+0x6b/0x80 [kvm]
[ 152.846885] kvm_free_memslots.part.0+0x47/0x70 [kvm]
[ 152.846911] kvm_destroy_vm+0x222/0x320 [kvm]
[ 152.846937] kvm_put_kvm+0x2a/0x50 [kvm]
[ 152.846964] kvm_vm_release+0x22/0x30 [kvm]
[ 152.846990] __fput+0xa8/0x280
[ 152.846992] ____fput+0xe/0x20
[ 152.846994] task_work_run+0x61/0xb0
[ 152.846996] do_exit+0x362/0xb30
[ 152.846998] ? tomoyo_path_number_perm+0x6f/0x200
[ 152.847001] do_group_exit+0x38/0xa0
[ 152.847003] get_signal+0x999/0x9c0
[ 152.847005] arch_do_signal_or_restart+0x37/0x7e0
[ 152.847008] ? __might_fault+0x26/0x30
[ 152.847010] ? __rseq_handle_notify_resume+0xd5/0x4f0
[ 152.847013] exit_to_user_mode_prepare+0xd3/0x170
[ 152.847016] syscall_exit_to_user_mode+0x26/0x50
[ 152.847019] do_syscall_64+0x48/0x90
[ 152.847020] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 152.847022] RIP: 0033:0x7fa345f1aaff
[ 152.847023] Code: Unable to access opcode bytes at 0x7fa345f1aad5.
[ 152.847024] RSP: 002b:00007fff99d6c050 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 152.847026] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00007fa345f1aaff
[ 152.847027] RDX: 00007fff99d6c188 RSI: 00000000c008aeba RDI: 0000000000000006
[ 152.847028] RBP: 00007fff99576000 R08: 0000000000000000 R09: 0000000000000000
[ 152.847029] R10: 0000000001680000 R11: 0000000000000246 R12: 00007fff99d752c0
[ 152.847030] R13: 00007fff99d75270 R14: 0000000000000000 R15: 00007fff99577000
[ 152.847032] </TASK>

This bug can be triggered by destroying multiple SNP VMs at the same time.
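
One possible way out, sketched below, is to issue the platform hook only
after KVM_MMU_UNLOCK(), where preemption is enabled again. This is an
untested sketch against the function quoted in this patch, not something
from the series, and whether running the hook outside the MMU lock is
safe with respect to concurrent faults still needs checking:

	if (kvm_unmap_gfn_range(kvm, &gfn_range))
		kvm_flush_remote_tlbs(kvm);

	KVM_MMU_UNLOCK(kvm);

	/*
	 * Preemption is re-enabled once the MMU lock is dropped, so the
	 * hook may now sleep, e.g. when rmpupdate() ends up in
	 * vm_unmap_aliases() taking a mutex.
	 */
	kvm_arch_invalidate_restricted_mem(slot, gfn_start, gfn_end);

	srcu_read_unlock(&kvm->srcu, idx);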

> +
> KVM_MMU_UNLOCK(kvm);
> srcu_read_unlock(&kvm->srcu, idx);
> }
> --
> 2.25.1
>
>

Regards, Tom

2023-01-22 16:10:14

by Sabin Rapan

[permalink] [raw]
Subject: Re: [PATCH RFC v7 24/64] crypto:ccp: Define the SEV-SNP commands



On 14.12.2022 21:40, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> +/*
> + * struct sev_user_data_snp_config - system wide configuration value for SNP.
> + *
> + * @reported_tcb: The TCB version to report in the guest attestation report.
> + * @mask_chip_id: Indicates that the CHIP_ID field in the attestation report
> + * will always be zero.
> + */
> +struct sev_user_data_snp_config {
> +	__u64 reported_tcb;	/* In */
> +	__u32 mask_chip_id;	/* In */
> +	__u8 rsvd[52];
> +} __packed;
> +

Based on table 45, section 8.6.1 in
https://www.amd.com/system/files/TechDocs/56860.pdf, I think this should be:

struct sev_user_data_snp_config {
	__u64 reported_tcb;	/* In */
	__u32 mask_chip_id:1;	/* In */
	__u32 mask_chip_key:1;	/* In */
	__u32 rsvd:30;
	__u8 rsvd1[52];
} __packed;
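
For extra safety (illustrative only, not part of the patch), a
compile-time check of the 64-byte SNP_CONFIG command buffer size from the
spec would catch layout mistakes in either variant:

static_assert(sizeof(struct sev_user_data_snp_config) == 64,
	      "SNP_CONFIG buffer layout must match the firmware ABI");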


--
Sabin.




2023-01-23 22:49:32

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 37/64] KVM: SVM: Add KVM_SNP_INIT command

There was an early firmware issue on Genoa where only SNP_INIT or
SEV_INIT (but not both) was supported; this issue is resolved now.

Now, the main constraint is that SNP_INIT is always required before
SEV_INIT if we want to launch SNP guests. In other words, if only
SEV_INIT is done on a platform which supports SNP, we won't be able to
launch SNP guests after that.

So once we have the RMP table set up (in the BIOS) we will always do an
SNP_INIT, and SEV_INIT will ideally be done only (on demand) when an SEV
guest is launched.
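
To illustrate the intended flow, here is a rough sketch. The wrapper
itself is hypothetical; only sev_snp_init() and sev_platform_init() are
names from the series, the cpu_feature_enabled() check stands in for
whatever RMP-presence test the driver actually uses, and
psp_init_on_probe=0 is assumed so that neither init runs at CCP module
load:

static int firmware_init_for_guest(bool snp_guest, int *error)
{
	int ret = 0;

	/* SNP_INIT must always precede SEV_INIT on SNP-capable parts. */
	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
		ret = sev_snp_init(error, false);
		if (ret)
			return ret;
	}

	/* SEV_INIT only on demand, when a legacy SEV guest is launched. */
	if (!snp_guest)
		ret = sev_platform_init(error);

	return ret;
}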

Thanks,
Ashish

On 1/5/2023 5:37 PM, Kalra, Ashish wrote:
> Hello Jarkko,
>
> On 12/31/2022 8:27 AM, Jarkko Sakkinen wrote:
>> On Wed, Dec 14, 2022 at 01:40:29PM -0600, Michael Roth wrote:
>>>   static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>   {
>>>       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>> @@ -260,13 +279,23 @@ static int sev_guest_init(struct kvm *kvm,
>>> struct kvm_sev_cmd *argp)
>>>           return ret;
>>>       sev->active = true;
>>> -    sev->es_active = argp->id == KVM_SEV_ES_INIT;
>>> +    sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id ==
>>> KVM_SEV_SNP_INIT);
>>> +    sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
>>>       asid = sev_asid_new(sev);
>>>       if (asid < 0)
>>>           goto e_no_asid;
>>>       sev->asid = asid;
>>> -    ret = sev_platform_init(&argp->error);
>>> +    if (sev->snp_active) {
>>> +        ret = verify_snp_init_flags(kvm, argp);
>>> +        if (ret)
>>> +            goto e_free;
>>> +
>>> +        ret = sev_snp_init(&argp->error, false);
>>> +    } else {
>>> +        ret = sev_platform_init(&argp->error);
>>> +    }
>>
>> Couldn't sev_snp_init() and sev_platform_init() be called unconditionally
>> in order?
>>
>> Since there is a hardware constraint that SNP init needs to always happen
>> before platform init, shouldn't SNP init happen as part of
>> __sev_platform_init_locked() instead?
>>
>
> On Genoa there is currently an issue where, if we do an SNP_INIT before
> an SEV_INIT and then attempt to launch an SEV guest, that launch may
> fail, so we need to keep SNP_INIT and SEV_INIT separate.
>
> We need to provide a way to run (existing) SEV guests on a system that
> supports SNP without doing an SNP_INIT at all.
>
> This is done using the psp_init_on_probe parameter of the CCP module to
> avoid doing either SNP or SEV firmware initialization during module
> load, and to defer the firmware initialization until someone launches a
> guest of one flavor or the other.
>
> And then sev_guest_init() does either SNP or SEV firmware init depending
> on the type of the guest being launched.
>
>> I found these call sites for __sev_platform_init_locked(), none of which
>> follow the correct call order:
>>
>> * sev_guest_init()
>
> As explained above, this call site is important for deferring the
> firmware initialization to an actual guest launch.
>
>> * sev_ioctl_do_pek_csr
>> * sev_ioctl_do_pdh_export()
>> * sev_ioctl_do_pek_import()
>> * sev_ioctl_do_pek_pdh_gen()
>> * sev_pci_init()
>>
>> For me it looks like a bit flaky API use to have sev_snp_init() as an
>> API call.
>>
>> I would suggest making SNP init internal to the ccp driver and taking
>> care of the correct orchestration over there.
>>
>
> Due to the Genoa issue, we may still need SNP init and SEV init to be
> invoked separately outside the CCP driver.
>
>> Also, how it currently works in this patch set, if the firmware did not
>> load correctly, SNP init halts the whole system. The version check needs
>> to be in all call paths.
>>
>
> Yes, I agree with that.
>
> Thanks,
> Ashish

2023-01-26 15:52:18

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 06/64] KVM: x86: Add platform hooks for private memory invalidations

On Sun, Jan 22, 2023 at 01:43:48PM +0100, Tom Dohrmann wrote:
> On Wed, Dec 14, 2022 at 01:39:58PM -0600, Michael Roth wrote:
> > In some cases, like with SEV-SNP, guest memory needs to be updated in a
> > platform-specific manner before it can be safely freed back to the host.
> > Add hooks to wire up handling of this sort to the invalidation notifiers
> > for restricted memory.
> >
> > Also issue invalidations of all allocated pages during notifier
> > unregistration so that the pages are not left in an unusable state when
> > they eventually get freed back to the host upon FD release.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > ---
> > arch/x86/include/asm/kvm-x86-ops.h | 1 +
> > arch/x86/include/asm/kvm_host.h | 1 +
> > arch/x86/kvm/mmu/mmu.c | 5 +++++
> > include/linux/kvm_host.h | 2 ++
> > mm/restrictedmem.c | 16 ++++++++++++++++
> > virt/kvm/kvm_main.c | 5 +++++
> > 6 files changed, 30 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > index 52f94a0ba5e9..c71df44b0f02 100644
> > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > @@ -134,6 +134,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> > KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
> > KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
> > KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
> > +KVM_X86_OP_OPTIONAL(invalidate_restricted_mem)
> >
> > #undef KVM_X86_OP
> > #undef KVM_X86_OP_OPTIONAL
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 13802389f0f9..9ef8d73455d9 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1639,6 +1639,7 @@ struct kvm_x86_ops {
> > int (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
> > int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
> > gfn_t start, gfn_t end);
> > + void (*invalidate_restricted_mem)(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
> >
> > bool (*has_wbinvd_exit)(void);
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index a0c41d391547..2713632e5061 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -7183,3 +7183,8 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > kvm_update_lpage_private_shared_mixed(kvm, slot, attrs,
> > start, end);
> > }
> > +
> > +void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> > +{
> > + static_call_cond(kvm_x86_invalidate_restricted_mem)(slot, start, end);
> > +}
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index f032d878e034..f72a2e0b8699 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2327,6 +2327,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > struct kvm_memory_slot *slot,
> > unsigned long attrs,
> > gfn_t start, gfn_t end);
> > +
> > #else
> > static inline void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > struct kvm_memory_slot *slot,
> > @@ -2366,6 +2367,7 @@ static inline int kvm_restricted_mem_get_pfn(struct kvm_memory_slot *slot,
> > }
> >
> > void kvm_arch_memory_mce(struct kvm *kvm);
> > +void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
> > #endif /* CONFIG_HAVE_KVM_RESTRICTED_MEM */
> >
> > #endif
> > diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
> > index 56953c204e5c..74fa2cfb8618 100644
> > --- a/mm/restrictedmem.c
> > +++ b/mm/restrictedmem.c
> > @@ -54,6 +54,11 @@ static int restrictedmem_release(struct inode *inode, struct file *file)
> > {
> > struct restrictedmem_data *data = inode->i_mapping->private_data;
> >
> > + pr_debug("%s: releasing memfd, invalidating page offsets 0x0-0x%llx\n",
> > + __func__, inode->i_size >> PAGE_SHIFT);
> > + restrictedmem_invalidate_start(data, 0, inode->i_size >> PAGE_SHIFT);
> > + restrictedmem_invalidate_end(data, 0, inode->i_size >> PAGE_SHIFT);
> > +
> > fput(data->memfd);
> > kfree(data);
> > return 0;
> > @@ -258,6 +263,17 @@ void restrictedmem_unregister_notifier(struct file *file,
> > struct restrictedmem_notifier *notifier)
> > {
> > struct restrictedmem_data *data = file->f_mapping->private_data;
> > + struct inode *inode = file_inode(data->memfd);
> > +
> > + /* TODO: this will issue notifications to all registered notifiers,
> > + * but it's only the one being unregistered that needs to process
> > + * invalidations for any ranges still allocated at this point in
> > + * time. For now this relies on KVM currently being the only notifier.
> > + */
> > + pr_debug("%s: unregistering notifier, invalidating page offsets 0x0-0x%llx\n",
> > + __func__, inode->i_size >> PAGE_SHIFT);
> > + restrictedmem_invalidate_start(data, 0, inode->i_size >> PAGE_SHIFT);
> > + restrictedmem_invalidate_end(data, 0, inode->i_size >> PAGE_SHIFT);
> >
> > mutex_lock(&data->lock);
> > list_del(&notifier->list);
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index d2d829d23442..d2daa049e94a 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -974,6 +974,9 @@ static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no
> > &gfn_start, &gfn_end))
> > return;
> >
> > + pr_debug("%s: start: 0x%lx, end: 0x%lx, roffset: 0x%llx, gfn_start: 0x%llx, gfn_end: 0x%llx\n",
> > + __func__, start, end, slot->restricted_offset, gfn_start, gfn_end);
> > +
> > gfn_range.start = gfn_start;
> > gfn_range.end = gfn_end;
> > gfn_range.slot = slot;
> > @@ -988,6 +991,8 @@ static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no
> > if (kvm_unmap_gfn_range(kvm, &gfn_range))
> > kvm_flush_remote_tlbs(kvm);
> >
> > + kvm_arch_invalidate_restricted_mem(slot, gfn_start, gfn_end);
>
> Calling kvm_arch_invalidate_restricted_mem() while the KVM MMU lock is
> taken causes problems, because taking said lock disables preemption. A
> few calls down within kvm_arch_invalidate_restricted_mem(),
> vm_unmap_aliases() is eventually called, which tries to lock a mutex;
> that shouldn't happen with preemption disabled. This causes a
> "scheduling while atomic" bug:
>
> [ 152.846596] BUG: scheduling while atomic: enarx/8302/0x00000002
> [... full trace elided; identical to the one earlier in this thread ...]
>
> This bug can be triggered by destroying multiple SNP VMs at the same time.

I can also reproduce this one.

If I do "cargo test -- --test-threads=1", then this does not happen.

Even then I get this:

[ 232.054359] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 232.061466] ------------[ cut here ]------------
[ 232.061467] WARNING: CPU: 18 PID: 2436 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3665 mark_page_dirty_in_slot+0x99/0xd0
[ 232.061472] Modules linked in: af_packet irdma intel_rapl_msr i40e ib_uverbs ib_core dell_smbios wmi_bmof dell_wmi_descriptor evdev mac_hid dcdbas amd64_edac edac_mce_amd edac_core intel_rapl_common crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha512_generic aesni_intel libaes crypto_simd cryptd rapl deflate efi_pstore bonding tls cfg80211 rfkill ip6_tables xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp ip6t_rpfilter ipt_rpfilter xt_pkttype nft_compat nf_tables libcrc32c sch_fq_codel nfnetlink atkbd libps2 serio vivaldi_fmap loop tun tap macvlan bridge stp llc ipmi_ssif ipmi_watchdog dm_round_robin dm_multipath kvm_amd mgag200 drm_shmem_helper drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ice ptp pps_core sp5100_tco watchdog k10temp i2c_piix4 ptdma virt_dma hed tpm_crb wmi acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler tpm_tis tpm_tis_core tpm acpi_power_meter nls_iso8859_1 nls_cp437 vfat fat
[ 232.061525] tiny_power_button button ccp rng_core drm pstore fuse backlight i2c_core configfs efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 sd_mod ahci libahci xhci_pci xhci_pci_renesas libata xhci_hcd nvme nvme_core scsi_mod usbcore t10_pi crc32c_intel crc64_rocksoft crc64 crc_t10dif scsi_common crct10dif_generic crct10dif_pclmul crct10dif_common usb_common rtc_cmos dm_mod dax
[ 232.061547] CPU: 18 PID: 2436 Comm: enarx Not tainted 6.1.0-rc4 #1-NixOS
[ 232.061549] Hardware name: Dell Inc. PowerEdge R6515/068NXX, BIOS 2.6.6 01/13/2022
[ 232.061550] RIP: 0010:mark_page_dirty_in_slot+0x99/0xd0
[ 232.061552] Code: 83 04 01 00 00 4c 2b a3 b0 00 00 00 85 d2 74 27 c1 e6 10 5b 49 8d bd f8 19 00 00 5d 4c 89 e2 09 c6 41 5c 41 5d e9 07 aa 00 00 <0f> 0b 5b 5d 41 5c 41 5d c3 cc cc cc cc 48 8b 83 c0 00 00 00 49 63
[ 232.061553] RSP: 0018:ffffc169265978f0 EFLAGS: 00010246
[ 232.061555] RAX: 0000000080000000 RBX: ffffa0c045c49600 RCX: 0000000000000000
[ 232.061556] RDX: 0000000000000001 RSI: ffffffff92d076bd RDI: 00000000ffffffff
[ 232.061557] RBP: ffffc1692177d000 R08: 0000000000000001 R09: 0000000000001000
[ 232.061557] R10: 0000000000000001 R11: 0000000000000001 R12: 00000000000ffe02
[ 232.061558] R13: 0000000000000000 R14: ffffc1692177d000 R15: 00007f6cacf89000
[ 232.061559] FS: 00007f6cacfc3540(0000) GS:ffffa0cfffa80000(0000) knlGS:0000000000000000
[ 232.061560] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 232.061561] CR2: 00007f6cacf89020 CR3: 000000208c9c6005 CR4: 0000000000770ee0
[ 232.061561] PKRU: 55555554
[ 232.061562] Call Trace:
[ 232.061563] <TASK>
[ 232.061566] __kvm_write_guest_page+0xac/0xf0
[ 232.061569] snp_launch_update_gfn_handler.cold+0x5e/0xfe [kvm_amd]
[ 232.061578] kvm_vm_do_hva_range_op+0x142/0x1c0
[ 232.061579] ? sev_launch_update_gfn_handler+0x470/0x470 [kvm_amd]
[ 232.061585] sev_mem_enc_ioctl+0x4dd/0x1270 [kvm_amd]
[ 232.061591] ? sev_pin_memory+0x159/0x1a0 [kvm_amd]
[ 232.061595] ? sev_mem_enc_register_region+0xe3/0x130 [kvm_amd]
[ 232.061602] kvm_arch_vm_ioctl+0x6a6/0xc20
[ 232.061604] ? __blk_flush_plug+0x102/0x160
[ 232.061606] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 232.061609] ? get_page_from_freelist+0x1451/0x15a0
[ 232.061612] ? __mod_node_page_state+0x7c/0xb0
[ 232.061615] ? try_charge_memcg+0x466/0x800
[ 232.061618] ? __mod_node_page_state+0x7c/0xb0
[ 232.061619] ? __mod_memcg_lruvec_state+0x6e/0xd0
[ 232.061620] kvm_vm_ioctl+0x7ba/0x1290
[ 232.061622] ? folio_add_lru+0x6e/0xa0
[ 232.061624] ? _raw_spin_unlock+0x15/0x30
[ 232.061626] ? __handle_mm_fault+0xace/0xc70
[ 232.061629] ? handle_mm_fault+0xb2/0x2a0
[ 232.061630] __x64_sys_ioctl+0x8a/0xc0
[ 232.061634] do_syscall_64+0x3b/0x90
[ 232.061636] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 232.061638] RIP: 0033:0x7f6cad0c1e37
[ 232.061659] Code: ff ff 48 89 d8 5b 5d 41 5c c3 66 0f 1f 84 00 00 00 00 00 48 89 e8 48 f7 d8 48 39 c3 0f 92 c0 eb c9 66 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b1 0f 0f 00 f7 d8 64 89 01 48
[ 232.061660] RSP: 002b:00007ffee9081668 EFLAGS: 00000216 ORIG_RAX: 0000000000000010
[ 232.061661] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f6cad0c1e37
[ 232.061662] RDX: 00007ffee9081748 RSI: 00000000c008aeba RDI: 0000000000000004
[ 232.061663] RBP: 00007ffee888e000 R08: 0000000000000006 R09: 0000000000000000
[ 232.061663] R10: 0000000000001000 R11: 0000000000000216 R12: 00007ffee908a260
[ 232.061664] R13: 00007ffee908a210 R14: 0000000000000000 R15: 00007ffee888f000
[ 232.061665] </TASK>
[ 232.061666] ---[ end trace 0000000000000000 ]---
[ 232.759176] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 233.449477] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 235.284941] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 237.129469] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 237.978020] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 239.034002] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 239.598148] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 240.152122] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 240.718185] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 241.271235] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 241.836286] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 242.390345] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 242.962499] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 243.501430] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16
[ 244.093547] SEV-SNP launch update failed, ret: 0xfffffffb, fw_error: 0x16

There is no cumulative klog output, i.e. only the first run of cargo test
emits this. Otherwise the software runs without issues, including
attestation and everything else.

BR, Jarkko

2023-01-26 21:25:50

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH RFC v7 37/64] KVM: SVM: Add KVM_SNP_INIT command

On Mon, Jan 23, 2023 at 04:49:14PM -0600, Kalra, Ashish wrote:
> There was an early firmware issue on Genoa where only SNP_INIT or
> SEV_INIT (but not both) was supported; this issue is resolved now.
>
> Now, the main constraint is that SNP_INIT is always required before
> SEV_INIT if we want to launch SNP guests. In other words, if only
> SEV_INIT is done on a platform which supports SNP, we won't be able to
> launch SNP guests after that.
>
> So once we have the RMP table set up (in the BIOS) we will always do an
> SNP_INIT, and SEV_INIT will ideally be done only (on demand) when an SEV
> guest is launched.

OK, thanks for the clarification!

BR, Jarkko

2023-01-27 16:36:02

by Jeremi Piotrowski

[permalink] [raw]
Subject: Re: [PATCH RFC v7 07/64] KVM: SEV: Handle KVM_HC_MAP_GPA_RANGE hypercall

On Wed, Dec 14, 2022 at 01:39:59PM -0600, Michael Roth wrote:
> From: Nikunj A Dadhania <[email protected]>
>
> KVM_HC_MAP_GPA_RANGE hypercall is used by the SEV guest to notify a
> change in the page encryption status to the hypervisor.
>
> The hypercall exits to userspace with KVM_EXIT_HYPERCALL exit code,
> currently this is used for explicit memory conversion between
> shared/private for memfd based private memory.
>
> Signed-off-by: Nikunj A Dadhania <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/x86.c | 8 ++++++++
> virt/kvm/kvm_main.c | 1 +
> 2 files changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bb6adb216054..732f9cbbadb5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9649,6 +9649,7 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)

Couldn't find a better commit to comment on:
when the guest has the ptp-kvm module, it will issue a KVM_HC_CLOCK_PAIRING
hypercall. This passes sev_es_validate_vmgexit() validation and ends up in
this function, where kvm_pv_clock_pairing() is called, which in turn calls
kvm_write_guest(). This results in a CPU soft-lockup, at least in my testing.

Are there any emulated hypercalls that make sense for SNP guests? We should
block at least the ones that definitely don't work.
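
As a strawman (untested, reusing the upm_mode flag introduced earlier in
this series), the clock-pairing case in kvm_emulate_hypercall() could
simply be refused for such guests:

	case KVM_HC_CLOCK_PAIRING:
		/*
		 * kvm_pv_clock_pairing() ends up in kvm_write_guest(),
		 * which cannot work on private guest pages.
		 */
		if (vcpu->kvm->arch.upm_mode) {
			ret = -KVM_ENOSYS;
			break;
		}
		ret = kvm_pv_clock_pairing(vcpu, a0, a1);
		break;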

Jeremi

> break;
> case KVM_HC_MAP_GPA_RANGE: {
> u64 gpa = a0, npages = a1, attrs = a2;
> + struct kvm_memory_slot *slot;
>
> ret = -KVM_ENOSYS;
> if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE)))
> @@ -9660,6 +9661,13 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> break;
> }
>
> + slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa));
> + if (!vcpu->kvm->arch.upm_mode ||
> + !kvm_slot_can_be_private(slot)) {
> + ret = 0;
> + break;
> + }
> +
> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> vcpu->run->hypercall.args[0] = gpa;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d2daa049e94a..73bf0bdedb59 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2646,6 +2646,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
>
> return NULL;
> }
> +EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot);
>
> bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
> {
> --
> 2.25.1
>

2023-01-31 01:55:04

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event



On 11/1/23 13:01, Kalra, Ashish wrote:
> On 1/10/2023 6:48 PM, Alexey Kardashevskiy wrote:
>> On 10/1/23 19:33, Kalra, Ashish wrote:
>>>
>>> On 1/9/2023 8:28 PM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 10/1/23 10:41, Kalra, Ashish wrote:
>>>>> On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
>>>>>> On 15/12/22 06:40, Michael Roth wrote:
>>>>>>> From: Brijesh Singh <[email protected]>
>>>>>>>
>>>>>>> Version 2 of GHCB specification added the support for two SNP Guest
>>>>>>> Request Message NAE events. The events allows for an SEV-SNP
>>>>>>> guest to
>>>>>>> make request to the SEV-SNP firmware through hypervisor using the
>>>>>>> SNP_GUEST_REQUEST API define in the SEV-SNP firmware specification.
>>>>>>>
>>>>>>> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
>>>>>>> difference of an additional certificate blob that can be passed
>>>>>>> through
>>>>>>> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
>>>>>>> provides snp_guest_ext_guest_request() that is used by the KVM to
>>>>>>> get
>>>>>>> both the report and certificate data at once.
>>>>>>>
>>>>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>>>> Signed-off-by: Michael Roth <[email protected]>
>>>>>>> ---
>>>>>>>   arch/x86/kvm/svm/sev.c | 185
>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>>   arch/x86/kvm/svm/svm.h |   2 +
>>>>>>>   2 files changed, 181 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>>>>>> index 5f2b2092cdae..18efa70553c2 100644
>>>>>>> --- a/arch/x86/kvm/svm/sev.c
>>>>>>> +++ b/arch/x86/kvm/svm/sev.c
>>>>>>> @@ -331,6 +331,7 @@ static int sev_guest_init(struct kvm *kvm,
>>>>>>> struct kvm_sev_cmd *argp)
>>>>>>>           if (ret)
>>>>>>>               goto e_free;
>>>>>>> +        mutex_init(&sev->guest_req_lock);
>>>>>>>           ret = sev_snp_init(&argp->error, false);
>>>>>>>       } else {
>>>>>>>           ret = sev_platform_init(&argp->error);
>>>>>>> @@ -2051,23 +2052,34 @@ int sev_vm_move_enc_context_from(struct
>>>>>>> kvm *kvm, unsigned int source_fd)
>>>>>>>    */
>>>>>>>   static void *snp_context_create(struct kvm *kvm, struct
>>>>>>> kvm_sev_cmd *argp)
>>>>>>>   {
>>>>>>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>       struct sev_data_snp_addr data = {};
>>>>>>> -    void *context;
>>>>>>> +    void *context, *certs_data;
>>>>>>>       int rc;
>>>>>>> +    /* Allocate memory used for the certs data in SNP guest
>>>>>>> request */
>>>>>>> +    certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
>>>>>>> +    if (!certs_data)
>>>>>>> +        return NULL;
>>>>>>> +
>>>>>>>       /* Allocate memory for context page */
>>>>>>>       context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>>>>>>>       if (!context)
>>>>>>> -        return NULL;
>>>>>>> +        goto e_free;
>>>>>>>       data.gctx_paddr = __psp_pa(context);
>>>>>>>       rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE,
>>>>>>> &data, &argp->error);
>>>>>>> -    if (rc) {
>>>>>>> -        snp_free_firmware_page(context);
>>>>>>> -        return NULL;
>>>>>>> -    }
>>>>>>> +    if (rc)
>>>>>>> +        goto e_free;
>>>>>>> +
>>>>>>> +    sev->snp_certs_data = certs_data;
>>>>>>>       return context;
>>>>>>> +
>>>>>>> +e_free:
>>>>>>> +    snp_free_firmware_page(context);
>>>>>>> +    kfree(certs_data);
>>>>>>> +    return NULL;
>>>>>>>   }
>>>>>>>   static int snp_bind_asid(struct kvm *kvm, int *error)
>>>>>>> @@ -2653,6 +2665,8 @@ static int snp_decommission_context(struct
>>>>>>> kvm *kvm)
>>>>>>>       snp_free_firmware_page(sev->snp_context);
>>>>>>>       sev->snp_context = NULL;
>>>>>>> +    kfree(sev->snp_certs_data);
>>>>>>> +
>>>>>>>       return 0;
>>>>>>>   }
>>>>>>> @@ -3174,6 +3188,8 @@ static int sev_es_validate_vmgexit(struct
>>>>>>> vcpu_svm *svm, u64 *exit_code)
>>>>>>>       case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>>>>>>       case SVM_VMGEXIT_HV_FEATURES:
>>>>>>>       case SVM_VMGEXIT_PSC:
>>>>>>> +    case SVM_VMGEXIT_GUEST_REQUEST:
>>>>>>> +    case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>>>>>>>           break;
>>>>>>>       default:
>>>>>>>           reason = GHCB_ERR_INVALID_EVENT;
>>>>>>> @@ -3396,6 +3412,149 @@ static int snp_complete_psc(struct
>>>>>>> kvm_vcpu *vcpu)
>>>>>>>       return 1;
>>>>>>>   }
>>>>>>> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
>>>>>>> +                     struct sev_data_snp_guest_request *data,
>>>>>>> +                     gpa_t req_gpa, gpa_t resp_gpa)
>>>>>>> +{
>>>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>>>> +    kvm_pfn_t req_pfn, resp_pfn;
>>>>>>> +    struct kvm_sev_info *sev;
>>>>>>> +
>>>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>> +
>>>>>>> +    if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa,
>>>>>>> PAGE_SIZE))
>>>>>>> +        return SEV_RET_INVALID_PARAM;
>>>>>>> +
>>>>>>> +    req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
>>>>>>> +    if (is_error_noslot_pfn(req_pfn))
>>>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>>>> +
>>>>>>> +    resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
>>>>>>> +    if (is_error_noslot_pfn(resp_pfn))
>>>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>>>> +
>>>>>>> +    if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
>>>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>>>> +
>>>>>>> +    data->gctx_paddr = __psp_pa(sev->snp_context);
>>>>>>> +    data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
>>>>>>> +    data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void snp_cleanup_guest_buf(struct
>>>>>>> sev_data_snp_guest_request *data, unsigned long *rc)
>>>>>>> +{
>>>>>>> +    u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
>>>>>>> +    int ret;
>>>>>>> +
>>>>>>> +    ret = snp_page_reclaim(pfn);
>>>>>>> +    if (ret)
>>>>>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>>>>>> +
>>>>>>> +    ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>>>>>>> +    if (ret)
>>>>>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t
>>>>>>> req_gpa, gpa_t resp_gpa)
>>>>>>> +{
>>>>>>> +    struct sev_data_snp_guest_request data = {0};
>>>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>>>> +    struct kvm_sev_info *sev;
>>>>>>> +    unsigned long rc;
>>>>>>> +    int err;
>>>>>>> +
>>>>>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>>>>>> +        rc = SEV_RET_INVALID_GUEST;
>>>>>>> +        goto e_fail;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>> +
>>>>>>> +    mutex_lock(&sev->guest_req_lock);
>>>>>>> +
>>>>>>> +    rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
>>>>>>> +    if (rc)
>>>>>>> +        goto unlock;
>>>>>>> +
>>>>>>> +    rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data,
>>>>>>> &err);
>>>>>>
>>>>>>
>>>>>> This one goes via sev_issue_cmd_external_user() and uses sev-fd...
>>>>>>
>>>>>>> +    if (rc)
>>>>>>> +        /* use the firmware error code */
>>>>>>> +        rc = err;
>>>>>>> +
>>>>>>> +    snp_cleanup_guest_buf(&data, &rc);
>>>>>>> +
>>>>>>> +unlock:
>>>>>>> +    mutex_unlock(&sev->guest_req_lock);
>>>>>>> +
>>>>>>> +e_fail:
>>>>>>> +    svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm,
>>>>>>> gpa_t req_gpa, gpa_t resp_gpa)
>>>>>>> +{
>>>>>>> +    struct sev_data_snp_guest_request req = {0};
>>>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>>>> +    unsigned long data_npages;
>>>>>>> +    struct kvm_sev_info *sev;
>>>>>>> +    unsigned long rc, err;
>>>>>>> +    u64 data_gpa;
>>>>>>> +
>>>>>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>>>>>> +        rc = SEV_RET_INVALID_GUEST;
>>>>>>> +        goto e_fail;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>> +
>>>>>>> +    data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>>>>>> +    data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
>>>>>>> +
>>>>>>> +    if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>>>>>> +        rc = SEV_RET_INVALID_ADDRESS;
>>>>>>> +        goto e_fail;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    mutex_lock(&sev->guest_req_lock);
>>>>>>> +
>>>>>>> +    rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
>>>>>>> +    if (rc)
>>>>>>> +        goto unlock;
>>>>>>> +
>>>>>>> +    rc = snp_guest_ext_guest_request(&req, (unsigned
>>>>>>> long)sev->snp_certs_data,
>>>>>>> +                     &data_npages, &err);
>>>>>>
>>>>>> but this one does not and jump straight to
>>>>>> drivers/crypto/ccp/sev-dev.c ignoring sev->fd. Why different? Can
>>>>>> these two be unified? sev_issue_cmd_external_user() only checks if
>>>>>> fd is /dev/sev which is hardly useful.
>>>>>>
>>>>>> "[PATCH RFC v7 32/64] crypto: ccp: Provide APIs to query extended
>>>>>> attestation report" added this one.
>>>>>
>>>>> SNP_EXT_GUEST_REQUEST additionally returns a certificate blob and
>>>>> that's why it goes through the CCP driver interface
>>>>> snp_guest_ext_guest_request() that is used to get both the report
>>>>> and certificate data/blob at the same time.
>>>>
>>>> True. I thought though that this calls for extending sev_issue_cmd()
>>>> to take care of these extra parameters rather than just skipping the
>>>> sev->fd.
>>>>
>>>>
>>>>> All the FW API calls on the KVM side go through the sev_issue_cmd()
>>>>> and sev_issue_cmd_external_user() interfaces, and I believe that uses
>>>>> sev->fd more as a sanity check.
>>>>
>>>> Does not look like it:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/crypto/ccp/sev-dev.c?h=v6.2-rc3#n1290
>>>>
>>>> ===
>>>> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>>>>                  void *data, int *error)
>>>> {
>>>>      if (!filep || filep->f_op != &sev_fops)
>>>>          return -EBADF;
>>>>
>>>>      return sev_do_cmd(cmd, data, error);
>>>> }
>>>> EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user);
>>>> ===
>>>>
>>>> The only "more" is that it requires sev->fd to be a valid open fd;
>>>> what is the value in that? I may easily miss the bigger picture
>>>> here. Thanks,
>>>>
>>>>
>>>
>>> Have a look at following functions in drivers/crypto/ccp/sev-dev.c:
>>> sev_dev_init() and sev_misc_init().
>>>
>>> static int sev_misc_init(struct sev_device *sev)
>>> {
>>>          struct device *dev = sev->dev;
>>>          int ret;
>>>
>>>          /*
>>>           * SEV feature support can be detected on multiple devices but
>>>           * the SEV FW commands must be issued on the master. During
>>>           * probe, we do not know the master hence we create /dev/sev on
>>>           * the first device probe.
>>>           * sev_do_cmd() finds the right master device to which to issue
>>>           * the command to the firmware.
>>>       */
>>
>>
>> It is still a single /dev/sev node and the userspace cannot get it
>> wrong, it does not have to choose between (for instance) /dev/sev0 and
>> /dev/sev1 on a 2 SOC system.
>>
>>> ...
>>> ...
>>>
>>> Hence, sev_issue_cmd_external_user() needs to ensure that the correct
>>> device (master device) is being operated upon and that's why there is
>>> the check for file operations matching sev_fops, as below:
>>>
>>> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>>>                                  void *data, int *error)
>>> {
>>>          if (!filep || filep->f_op != &sev_fops)
>>>                  return -EBADF;
>>> ..
>>> ..
>>>
>>> Essentially, sev->fd is the misc. device created for the master PSP
>>> device on which the SEV/SNP firmware commands are issued, hence,
>>> sev_issue_cmd() uses sev->fd.
>>
>> There is always just one fd, which always uses psp_master; nothing from
>> that fd is used.
>
> It also ensures that we can only issue commands (sev_issue_cmd) after an
> SEV/SNP guest has launched.

I can open /dev/sev and start sending commands to the firmware with no
KVM running at all. Oh well, we discussed this offline :)

> We don't have a valid fd to use before the
> guest launch. The file descriptor is passed as part of the guest launch
> flow, for example, in snp_launch_start().
>>
>> More to the point, if sev->fd is still important, why is it ok to skip
>> it for snp_handle_ext_guest_request()? Thanks,
>>
>>
> Then, we should do the same for snp_handle_ext_guest_request().

Okay.

This snp_handle_ext_guest_request() helper is for returning "Table 21.
ATTESTATION_REPORT Structure" along with the certificate(s) used to sign
the report: "This usage allows the attestation report and the
certificates required to verify the report to be returned at the same time".

I can see:
1) KVM_SEV_SNP_{G,S}ET_CERTS ioctls on KVM VM and
2) SNP_{SET,GET}_EXT_CONFIG ioctls on /dev/sev
Both store the passed blob and neither communicates it to the firmware.
This makes me wonder - how does the attestation report (cooked by the
firmware) get signed with those certificates passed on by the HV userspace?

Also, the cached blob in /dev/sev seems redundant - the attestation
report is returned for a specific guest, so having a blob in the KVM VM
makes sense, and KVM unconditionally reserves memory for it anyway. And
for the HV itself the blob is useless (?), so why bother with caching it
in /dev/sev.

And the GET ioctls return what SET passed in (not something the firmware
returned, for example), so what is ever going to call SET? The userspace
can just as well cache what it passed and save a bit of code/memory in
the kernel.

btw SNP_{SET,GET}_EXT_CONFIG are documented in
Documentation/virt/coco/sev-guest.rst but implemented in
drivers/crypto/ccp/sev-dev.c (not sev-guest.c).

What am I missing in the big picture here? :) Thanks,


--
Alexey

2023-01-31 14:15:10

by Jeremi Piotrowski

[permalink] [raw]
Subject: Re: [PATCH RFC v7 07/64] KVM: SEV: Handle KVM_HC_MAP_GPA_RANGE hypercall

On Fri, Jan 27, 2023 at 08:35:58AM -0800, Jeremi Piotrowski wrote:
> On Wed, Dec 14, 2022 at 01:39:59PM -0600, Michael Roth wrote:
> > [...]
>
> Couldn't find a better commit to comment on:
> when the guest has the ptp-kvm module, it will issue a KVM_HC_CLOCK_PAIRING
> hypercall. This will pass sev_es_validate_vmgexit validation and end up in this
> function where kvm_pv_clock_pairing() is called, and that calls
> kvm_write_guest(). This results in a CPU soft-lockup, at least in my testing.
>
> Are there any emulated hypercalls that make sense for snp guests? We should
> block at least the ones that definitely don't work.
>
> Jeremi

So it turns out the soft-lockup is a nested-virtualization issue (details
here for those interested: [^1]), but the question still stands: should we
block kvm_write_guest() (and similar) explicitly, or rely on the RMP fault?

[^1]: https://github.com/jepio/linux/commit/6c3bdf552e93664ae172660e24ceceed60fd4df5
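
If we do block explicitly, a minimal helper along these lines could reject
host-initiated writes to private GPAs up front (a sketch only;
kvm_mem_is_private() is assumed here and may not match the helpers in this
series):

static bool kvm_gfn_is_host_writable(struct kvm *kvm,
				     struct kvm_memory_slot *slot, gfn_t gfn)
{
	/* Private (restricted) pages must not be written via the HVA. */
	if (kvm_slot_can_be_private(slot) && kvm_mem_is_private(kvm, gfn))
		return false;

	return true;
}

__kvm_write_guest_page() could then fail with -EPERM instead of running
into the RMP violation.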

2023-01-31 16:24:12

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On 1/30/23 19:54, Alexey Kardashevskiy wrote:
> [...]
>
> This snp_handle_ext_guest_request() helper is for returning "Table 21.
> ATTESTATION_REPORT Structure" along with the certificate(s) used to sign
> the report: "This usage allows the attestation report and the certificates
> required to verify the report to be returned at the same time".
>
> I can see:
> 1) KVM_SEV_SNP_{G,S}ET_CERTS ioctls on KVM VM and

This allows the VMM to (optionally) supply per-VM certificates that the
guest can use to validate the attestation report, instead of the guest
requesting them separately.

> 2) SNP_{SET,GET}_EXT_CONFIG ioctls on /dev/sev

This allows the VMM to (optionally) supply certificates used for all VMs,
i.e., there is no need for per-VM certificates.

> Both store the passed blob and neither communicates it to the firmware.
> This makes me wonder - how does the attestation report (cooked by the
> firmware) get signed with those certificates passed on by the HV userspace?

These are for use by the guest to validate the attestation report. It
allows the guest to obtain the certificate information without having to
use another method to request the certificates.

By having this certificate store, the hypervisor can request the
certificates from the KDS once, rather than every time a guest requests an
attestation report.
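
For reference, my rough sketch of how such a certs blob is laid out, going
by the GHCB spec's certificate table for the extended guest request (treat
this as an illustration, not the authoritative definition): a table of
GUID/offset/length entries, terminated by a zero-GUID entry, followed by
the certificate data itself.

struct cert_table_entry {
	unsigned char guid[16];	/* identifies the cert, e.g. VCEK/ASK/ARK */
	unsigned int offset;	/* byte offset of the cert within the blob */
	unsigned int length;	/* length of the cert in bytes */
};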

>
> Also, the cached blob in /dev/sev seems redundant - the attestation report
> is returned for a specific guest so having a blob in the KVM VM makes sense
> and KVM unconditionally reserves memory for it anyway. And for the HV
> itself the blob is useless (?) so why bother with caching it in /dev/sev.

In general, the certificates are for the machine (VCEK, ASK, ARK), so they
can be for all VMs on the machine. The per-VM blob allows a VMM to supply
additional per-VM certificates, if it desires, but is not required.

>
> And GET ioctls() return what SET passed on (not something the firmware
> returned, for example), what is ever going to call SET? The userspace can

As stated above, the firmware already has the information needed to sign
the attestation report. The SET IOCTL is used to supply the certificates to
the guest for validation of the attestation report. This reduces the
traffic and complexity of the guest requesting the certificates from the KDS.

> as well cache what it passed and save a bit of the code/memory in the kernel.
>
> btw SNP_{SET,GET}_EXT_CONFIG are documented in
> Documentation/virt/coco/sev-guest.rst but implemented in
> drivers/crypto/ccp/sev-dev.c (not sev-guest.c).
>
> What am I missing in the big picture here? :) Thanks,

The reason for the extended request is to make the attestation request
appear atomic to the guest. If the guest had to make two calls to request
the information, then in the future, when live migration is possible, there
would be no guarantee that the guest wasn't migrated between the call to
obtain the certificates and the call to obtain the attestation report, and
thus validation of the attestation report could fail.
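
A compile-only sketch of that race (all helper names here are invented for
illustration, they are not part of this series):

struct report { unsigned char data[4000]; };
struct certs  { unsigned char data[16384]; };

/* stand-ins for the guest's request paths */
static void get_certs(struct certs *c) { (void)c; }
static void get_report(struct report *r) { (void)r; }
static void get_report_and_certs(struct report *r, struct certs *c)
{
	(void)r; (void)c;
}

static void two_call_flow(struct certs *c, struct report *r)
{
	get_certs(c);		/* certs fetched on host A */
	/* <-- live migration to host B can land here --> */
	get_report(r);		/* report signed by host B's VCEK, so the
				 * certs from host A no longer verify it */
}

static void ext_request_flow(struct certs *c, struct report *r)
{
	/* one VMGEXIT returns both, so the certs always match the report */
	get_report_and_certs(r, c);
}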

Thanks,
Tom

>
>

2023-01-31 17:52:56

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On 1/30/2023 7:54 PM, Alexey Kardashevskiy wrote:
>
>
> On 11/1/23 13:01, Kalra, Ashish wrote:
>> On 1/10/2023 6:48 PM, Alexey Kardashevskiy wrote:
>>> On 10/1/23 19:33, Kalra, Ashish wrote:
>>>>
>>>> On 1/9/2023 8:28 PM, Alexey Kardashevskiy wrote:
>>>>>
>>>>>
>>>>> On 10/1/23 10:41, Kalra, Ashish wrote:
>>>>>> On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
>>>>>>> On 15/12/22 06:40, Michael Roth wrote:
>>>>>>>> From: Brijesh Singh <[email protected]>
>>>>>>>>
>>>>>>>> Version 2 of the GHCB specification added support for two SNP Guest
>>>>>>>> Request Message NAE events. These events allow an SEV-SNP guest to
>>>>>>>> make requests to the SEV-SNP firmware through the hypervisor using
>>>>>>>> the SNP_GUEST_REQUEST API defined in the SEV-SNP firmware
>>>>>>>> specification.
>>>>>>>>
>>>>>>>> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
>>>>>>>> difference of an additional certificate blob that can be passed
>>>>>>>> through
>>>>>>>> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
>>>>>>>> provides snp_guest_ext_guest_request() that is used by the KVM
>>>>>>>> to get
>>>>>>>> both the report and certificate data at once.
>>>>>>>>
>>>>>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>>>>> Signed-off-by: Michael Roth <[email protected]>
>>>>>>>> ---
>>>>>>>>   arch/x86/kvm/svm/sev.c | 185
>>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>>>   arch/x86/kvm/svm/svm.h |   2 +
>>>>>>>>   2 files changed, 181 insertions(+), 6 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>>>>>>> index 5f2b2092cdae..18efa70553c2 100644
>>>>>>>> --- a/arch/x86/kvm/svm/sev.c
>>>>>>>> +++ b/arch/x86/kvm/svm/sev.c
>>>>>>>> @@ -331,6 +331,7 @@ static int sev_guest_init(struct kvm *kvm,
>>>>>>>> struct kvm_sev_cmd *argp)
>>>>>>>>           if (ret)
>>>>>>>>               goto e_free;
>>>>>>>> +        mutex_init(&sev->guest_req_lock);
>>>>>>>>           ret = sev_snp_init(&argp->error, false);
>>>>>>>>       } else {
>>>>>>>>           ret = sev_platform_init(&argp->error);
>>>>>>>> @@ -2051,23 +2052,34 @@ int sev_vm_move_enc_context_from(struct
>>>>>>>> kvm *kvm, unsigned int source_fd)
>>>>>>>>    */
>>>>>>>>   static void *snp_context_create(struct kvm *kvm, struct
>>>>>>>> kvm_sev_cmd *argp)
>>>>>>>>   {
>>>>>>>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>>       struct sev_data_snp_addr data = {};
>>>>>>>> -    void *context;
>>>>>>>> +    void *context, *certs_data;
>>>>>>>>       int rc;
>>>>>>>> +    /* Allocate memory used for the certs data in SNP guest
>>>>>>>> request */
>>>>>>>> +    certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE,
>>>>>>>> GFP_KERNEL_ACCOUNT);
>>>>>>>> +    if (!certs_data)
>>>>>>>> +        return NULL;
>>>>>>>> +
>>>>>>>>       /* Allocate memory for context page */
>>>>>>>>       context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>>>>>>>>       if (!context)
>>>>>>>> -        return NULL;
>>>>>>>> +        goto e_free;
>>>>>>>>       data.gctx_paddr = __psp_pa(context);
>>>>>>>>       rc = __sev_issue_cmd(argp->sev_fd,
>>>>>>>> SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
>>>>>>>> -    if (rc) {
>>>>>>>> -        snp_free_firmware_page(context);
>>>>>>>> -        return NULL;
>>>>>>>> -    }
>>>>>>>> +    if (rc)
>>>>>>>> +        goto e_free;
>>>>>>>> +
>>>>>>>> +    sev->snp_certs_data = certs_data;
>>>>>>>>       return context;
>>>>>>>> +
>>>>>>>> +e_free:
>>>>>>>> +    snp_free_firmware_page(context);
>>>>>>>> +    kfree(certs_data);
>>>>>>>> +    return NULL;
>>>>>>>>   }
>>>>>>>>   static int snp_bind_asid(struct kvm *kvm, int *error)
>>>>>>>> @@ -2653,6 +2665,8 @@ static int snp_decommission_context(struct
>>>>>>>> kvm *kvm)
>>>>>>>>       snp_free_firmware_page(sev->snp_context);
>>>>>>>>       sev->snp_context = NULL;
>>>>>>>> +    kfree(sev->snp_certs_data);
>>>>>>>> +
>>>>>>>>       return 0;
>>>>>>>>   }
>>>>>>>> @@ -3174,6 +3188,8 @@ static int sev_es_validate_vmgexit(struct
>>>>>>>> vcpu_svm *svm, u64 *exit_code)
>>>>>>>>       case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>>>>>>>       case SVM_VMGEXIT_HV_FEATURES:
>>>>>>>>       case SVM_VMGEXIT_PSC:
>>>>>>>> +    case SVM_VMGEXIT_GUEST_REQUEST:
>>>>>>>> +    case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>>>>>>>>           break;
>>>>>>>>       default:
>>>>>>>>           reason = GHCB_ERR_INVALID_EVENT;
>>>>>>>> @@ -3396,6 +3412,149 @@ static int snp_complete_psc(struct
>>>>>>>> kvm_vcpu *vcpu)
>>>>>>>>       return 1;
>>>>>>>>   }
>>>>>>>> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
>>>>>>>> +                     struct sev_data_snp_guest_request *data,
>>>>>>>> +                     gpa_t req_gpa, gpa_t resp_gpa)
>>>>>>>> +{
>>>>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>>>>> +    kvm_pfn_t req_pfn, resp_pfn;
>>>>>>>> +    struct kvm_sev_info *sev;
>>>>>>>> +
>>>>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>> +
>>>>>>>> +    if (!IS_ALIGNED(req_gpa, PAGE_SIZE) ||
>>>>>>>> !IS_ALIGNED(resp_gpa, PAGE_SIZE))
>>>>>>>> +        return SEV_RET_INVALID_PARAM;
>>>>>>>> +
>>>>>>>> +    req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
>>>>>>>> +    if (is_error_noslot_pfn(req_pfn))
>>>>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>>>>> +
>>>>>>>> +    resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
>>>>>>>> +    if (is_error_noslot_pfn(resp_pfn))
>>>>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>>>>> +
>>>>>>>> +    if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
>>>>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>>>>> +
>>>>>>>> +    data->gctx_paddr = __psp_pa(sev->snp_context);
>>>>>>>> +    data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
>>>>>>>> +    data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
>>>>>>>> +
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void snp_cleanup_guest_buf(struct
>>>>>>>> sev_data_snp_guest_request *data, unsigned long *rc)
>>>>>>>> +{
>>>>>>>> +    u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
>>>>>>>> +    int ret;
>>>>>>>> +
>>>>>>>> +    ret = snp_page_reclaim(pfn);
>>>>>>>> +    if (ret)
>>>>>>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>>>>>>> +
>>>>>>>> +    ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>>>>>>>> +    if (ret)
>>>>>>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void snp_handle_guest_request(struct vcpu_svm *svm,
>>>>>>>> gpa_t req_gpa, gpa_t resp_gpa)
>>>>>>>> +{
>>>>>>>> +    struct sev_data_snp_guest_request data = {0};
>>>>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>>>>> +    struct kvm_sev_info *sev;
>>>>>>>> +    unsigned long rc;
>>>>>>>> +    int err;
>>>>>>>> +
>>>>>>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>>>>>>> +        rc = SEV_RET_INVALID_GUEST;
>>>>>>>> +        goto e_fail;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>> +
>>>>>>>> +    mutex_lock(&sev->guest_req_lock);
>>>>>>>> +
>>>>>>>> +    rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
>>>>>>>> +    if (rc)
>>>>>>>> +        goto unlock;
>>>>>>>> +
>>>>>>>> +    rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data,
>>>>>>>> &err);
>>>>>>>
>>>>>>>
>>>>>>> This one goes via sev_issue_cmd_external_user() and uses sev-fd...
>>>>>>>
>>>>>>>> +    if (rc)
>>>>>>>> +        /* use the firmware error code */
>>>>>>>> +        rc = err;
>>>>>>>> +
>>>>>>>> +    snp_cleanup_guest_buf(&data, &rc);
>>>>>>>> +
>>>>>>>> +unlock:
>>>>>>>> +    mutex_unlock(&sev->guest_req_lock);
>>>>>>>> +
>>>>>>>> +e_fail:
>>>>>>>> +    svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm,
>>>>>>>> gpa_t req_gpa, gpa_t resp_gpa)
>>>>>>>> +{
>>>>>>>> +    struct sev_data_snp_guest_request req = {0};
>>>>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>>>>> +    unsigned long data_npages;
>>>>>>>> +    struct kvm_sev_info *sev;
>>>>>>>> +    unsigned long rc, err;
>>>>>>>> +    u64 data_gpa;
>>>>>>>> +
>>>>>>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>>>>>>> +        rc = SEV_RET_INVALID_GUEST;
>>>>>>>> +        goto e_fail;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>> +
>>>>>>>> +    data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>>>>>>> +    data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
>>>>>>>> +
>>>>>>>> +    if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>>>>>>> +        rc = SEV_RET_INVALID_ADDRESS;
>>>>>>>> +        goto e_fail;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    mutex_lock(&sev->guest_req_lock);
>>>>>>>> +
>>>>>>>> +    rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
>>>>>>>> +    if (rc)
>>>>>>>> +        goto unlock;
>>>>>>>> +
>>>>>>>> +    rc = snp_guest_ext_guest_request(&req, (unsigned
>>>>>>>> long)sev->snp_certs_data,
>>>>>>>> +                     &data_npages, &err);
>>>>>>>
>>>>>>> but this one does not and jump straight to
>>>>>>> drivers/crypto/ccp/sev-dev.c ignoring sev->fd. Why different? Can
>>>>>>> these two be unified? sev_issue_cmd_external_user() only checks
>>>>>>> if fd is /dev/sev which is hardly useful.
>>>>>>>
>>>>>>> "[PATCH RFC v7 32/64] crypto: ccp: Provide APIs to query extended
>>>>>>> attestation report" added this one.
>>>>>>
>>>>>> SNP_EXT_GUEST_REQUEST additionally returns a certificate blob and
>>>>>> that's why it goes through the CCP driver interface
>>>>>> snp_guest_ext_guest_request() that is used to get both the report
>>>>>> and certificate data/blob at the same time.
>>>>>
>>>>> True. I thought though that this calls for extending
>>>>> sev_issue_cmd() to take care of these extra parameters rather than
>>>>> just skipping the sev->fd.
>>>>>
>>>>>
>>>>>> All the FW API calls on the KVM side go through sev_issue_cmd()
>>>>>> and sev_issue_cmd_external_user() interfaces and that i believe
>>>>>> uses sev->fd more of as a sanity check.
>>>>>
>>>>> Does not look like it:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/crypto/ccp/sev-dev.c?h=v6.2-rc3#n1290
>>>>>
>>>>>
>>>>> ===
>>>>> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>>>>>                  void *data, int *error)
>>>>> {
>>>>>      if (!filep || filep->f_op != &sev_fops)
>>>>>          return -EBADF;
>>>>>
>>>>>      return sev_do_cmd(cmd, data, error);
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user);
>>>>> ===
>>>>>
>>>>> The only "more" is that it requires sev->fd to be a valid open fd,
>>>>> what is the value in that? I may easily miss the bigger picture
>>>>> here. Thanks,
>>>>>
>>>>>
>>>>
>>>> Have a look at the following functions in drivers/crypto/ccp/sev-dev.c:
>>>> sev_dev_init() and sev_misc_init().
>>>>
>>>> static int sev_misc_init(struct sev_device *sev)
>>>> {
>>>>          struct device *dev = sev->dev;
>>>>          int ret;
>>>>
>>>>          /*
>>>>           * SEV feature support can be detected on multiple devices but
>>>>           * the SEV FW commands must be issued on the master. During
>>>>           * probe, we do not know the master hence we create
>>>> /dev/sev on
>>>>           * the first device probe.
>>>>           * sev_do_cmd() finds the right master device to which to
>>>> issue
>>>>           * the command to the firmware.
>>>>       */
>>>
>>>
>>> It is still a single /dev/sev node and the userspace cannot get it
>>> wrong; it does not have to choose between (for instance) /dev/sev0
>>> and /dev/sev1 on a 2-socket system.
>>>
>>>> ...
>>>> ...
>>>>
>>>> Hence, sev_issue_cmd_external_user() needs to ensure that the
>>>> correct device (master device) is being operated upon and that's why
>>>> there is the check for file operations matching sev_fops as below:
>>>>
>>>> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>>>>                                  void *data, int *error)
>>>> {
>>>>          if (!filep || filep->f_op != &sev_fops)
>>>>                  return -EBADF;
>>>> ..
>>>> ..
>>>>
>>>> Essentially, sev->fd is the misc. device created for the master PSP
>>>> device on which the SEV/SNP firmware commands are issued, hence,
>>>> sev_issue_cmd() uses sev->fd.
>>>
>>> There is always just one fd which always uses psp_master, nothing
>>> from that fd is used.
>>
>> It also ensures that we can only issue commands (sev_issue_cmd) after
>> the SEV/SNP guest has launched.
>
> I can open /dev/sev and start sending commands to the firmware with no
> KVM running at all. Oh well, we discussed this offline :)
>

Yes, and as we already discussed, we need to support that to get the
SEV/SNP platform status (SNP_PLATFORM_STATUS) and also for legacy SEV
commands like certificate generation and import/export, which can be
issued before a VM is launched.
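
For example, something like the sketch below against the existing /dev/sev
uAPI (the legacy SEV_PLATFORM_STATUS shown as a stand-in; error handling
trimmed) works with no VM running at all:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/psp-sev.h>

int main(void)
{
	struct sev_user_data_status status = {};
	struct sev_issue_cmd arg = {
		.cmd  = SEV_PLATFORM_STATUS,
		.data = (unsigned long)&status,
	};
	int fd = open("/dev/sev", O_RDWR);

	if (fd < 0 || ioctl(fd, SEV_ISSUE_CMD, &arg) < 0) {
		perror("SEV_PLATFORM_STATUS");
		return 1;
	}

	printf("API %u.%u, build %u, %u guests\n", status.api_major,
	       status.api_minor, status.build, status.guest_count);
	return 0;
}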

Thanks,
Ashish

2023-01-31 20:21:32

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event



On 01/02/2023 03:23, Tom Lendacky wrote:
> On 1/30/23 19:54, Alexey Kardashevskiy wrote:
>>
>>
>> On 11/1/23 13:01, Kalra, Ashish wrote:
>>> On 1/10/2023 6:48 PM, Alexey Kardashevskiy wrote:
>>>> On 10/1/23 19:33, Kalra, Ashish wrote:
>>>>>
>>>>> On 1/9/2023 8:28 PM, Alexey Kardashevskiy wrote:
>>>>>>
>>>>>>
>>>>>> On 10/1/23 10:41, Kalra, Ashish wrote:
>>>>>>> On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
>>>>>>>> On 15/12/22 06:40, Michael Roth wrote:
>>>>>>>>> From: Brijesh Singh <[email protected]>
>>>>>>>>>
>>>>>>>>> Version 2 of the GHCB specification added support for two SNP
>>>>>>>>> Guest Request Message NAE events. These events allow an SEV-SNP
>>>>>>>>> guest to make requests to the SEV-SNP firmware through the
>>>>>>>>> hypervisor using the SNP_GUEST_REQUEST API defined in the SEV-SNP
>>>>>>>>> firmware specification.
>>>>>>>>>
>>>>>>>>> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
>>>>>>>>> difference of an additional certificate blob that can be passed
>>>>>>>>> through
>>>>>>>>> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
>>>>>>>>> provides snp_guest_ext_guest_request() that is used by the KVM
>>>>>>>>> to get
>>>>>>>>> both the report and certificate data at once.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>>>>>> Signed-off-by: Michael Roth <[email protected]>
>>>>>>>>> ---
>>>>>>>>>   arch/x86/kvm/svm/sev.c | 185
>>>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>>>>   arch/x86/kvm/svm/svm.h |   2 +
>>>>>>>>>   2 files changed, 181 insertions(+), 6 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>>>>>>>> index 5f2b2092cdae..18efa70553c2 100644
>>>>>>>>> --- a/arch/x86/kvm/svm/sev.c
>>>>>>>>> +++ b/arch/x86/kvm/svm/sev.c
>>>>>>>>> @@ -331,6 +331,7 @@ static int sev_guest_init(struct kvm *kvm,
>>>>>>>>> struct kvm_sev_cmd *argp)
>>>>>>>>>           if (ret)
>>>>>>>>>               goto e_free;
>>>>>>>>> +        mutex_init(&sev->guest_req_lock);
>>>>>>>>>           ret = sev_snp_init(&argp->error, false);
>>>>>>>>>       } else {
>>>>>>>>>           ret = sev_platform_init(&argp->error);
>>>>>>>>> @@ -2051,23 +2052,34 @@ int sev_vm_move_enc_context_from(struct
>>>>>>>>> kvm *kvm, unsigned int source_fd)
>>>>>>>>>    */
>>>>>>>>>   static void *snp_context_create(struct kvm *kvm, struct
>>>>>>>>> kvm_sev_cmd *argp)
>>>>>>>>>   {
>>>>>>>>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>>>       struct sev_data_snp_addr data = {};
>>>>>>>>> -    void *context;
>>>>>>>>> +    void *context, *certs_data;
>>>>>>>>>       int rc;
>>>>>>>>> +    /* Allocate memory used for the certs data in SNP guest
>>>>>>>>> request */
>>>>>>>>> +    certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE,
>>>>>>>>> GFP_KERNEL_ACCOUNT);
>>>>>>>>> +    if (!certs_data)
>>>>>>>>> +        return NULL;
>>>>>>>>> +
>>>>>>>>>       /* Allocate memory for context page */
>>>>>>>>>       context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>>>>>>>>>       if (!context)
>>>>>>>>> -        return NULL;
>>>>>>>>> +        goto e_free;
>>>>>>>>>       data.gctx_paddr = __psp_pa(context);
>>>>>>>>>       rc = __sev_issue_cmd(argp->sev_fd,
>>>>>>>>> SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
>>>>>>>>> -    if (rc) {
>>>>>>>>> -        snp_free_firmware_page(context);
>>>>>>>>> -        return NULL;
>>>>>>>>> -    }
>>>>>>>>> +    if (rc)
>>>>>>>>> +        goto e_free;
>>>>>>>>> +
>>>>>>>>> +    sev->snp_certs_data = certs_data;
>>>>>>>>>       return context;
>>>>>>>>> +
>>>>>>>>> +e_free:
>>>>>>>>> +    snp_free_firmware_page(context);
>>>>>>>>> +    kfree(certs_data);
>>>>>>>>> +    return NULL;
>>>>>>>>>   }
>>>>>>>>>   static int snp_bind_asid(struct kvm *kvm, int *error)
>>>>>>>>> @@ -2653,6 +2665,8 @@ static int
>>>>>>>>> snp_decommission_context(struct kvm *kvm)
>>>>>>>>>       snp_free_firmware_page(sev->snp_context);
>>>>>>>>>       sev->snp_context = NULL;
>>>>>>>>> +    kfree(sev->snp_certs_data);
>>>>>>>>> +
>>>>>>>>>       return 0;
>>>>>>>>>   }
>>>>>>>>> @@ -3174,6 +3188,8 @@ static int sev_es_validate_vmgexit(struct
>>>>>>>>> vcpu_svm *svm, u64 *exit_code)
>>>>>>>>>       case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>>>>>>>>       case SVM_VMGEXIT_HV_FEATURES:
>>>>>>>>>       case SVM_VMGEXIT_PSC:
>>>>>>>>> +    case SVM_VMGEXIT_GUEST_REQUEST:
>>>>>>>>> +    case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>>>>>>>>>           break;
>>>>>>>>>       default:
>>>>>>>>>           reason = GHCB_ERR_INVALID_EVENT;
>>>>>>>>> @@ -3396,6 +3412,149 @@ static int snp_complete_psc(struct
>>>>>>>>> kvm_vcpu *vcpu)
>>>>>>>>>       return 1;
>>>>>>>>>   }
>>>>>>>>> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
>>>>>>>>> +                     struct sev_data_snp_guest_request *data,
>>>>>>>>> +                     gpa_t req_gpa, gpa_t resp_gpa)
>>>>>>>>> +{
>>>>>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>>>>>> +    kvm_pfn_t req_pfn, resp_pfn;
>>>>>>>>> +    struct kvm_sev_info *sev;
>>>>>>>>> +
>>>>>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>>> +
>>>>>>>>> +    if (!IS_ALIGNED(req_gpa, PAGE_SIZE) ||
>>>>>>>>> !IS_ALIGNED(resp_gpa, PAGE_SIZE))
>>>>>>>>> +        return SEV_RET_INVALID_PARAM;
>>>>>>>>> +
>>>>>>>>> +    req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
>>>>>>>>> +    if (is_error_noslot_pfn(req_pfn))
>>>>>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>>>>>> +
>>>>>>>>> +    resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
>>>>>>>>> +    if (is_error_noslot_pfn(resp_pfn))
>>>>>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>>>>>> +
>>>>>>>>> +    if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
>>>>>>>>> +        return SEV_RET_INVALID_ADDRESS;
>>>>>>>>> +
>>>>>>>>> +    data->gctx_paddr = __psp_pa(sev->snp_context);
>>>>>>>>> +    data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
>>>>>>>>> +    data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
>>>>>>>>> +
>>>>>>>>> +    return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void snp_cleanup_guest_buf(struct
>>>>>>>>> sev_data_snp_guest_request *data, unsigned long *rc)
>>>>>>>>> +{
>>>>>>>>> +    u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
>>>>>>>>> +    int ret;
>>>>>>>>> +
>>>>>>>>> +    ret = snp_page_reclaim(pfn);
>>>>>>>>> +    if (ret)
>>>>>>>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>>>>>>>> +
>>>>>>>>> +    ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>>>>>>>>> +    if (ret)
>>>>>>>>> +        *rc = SEV_RET_INVALID_ADDRESS;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void snp_handle_guest_request(struct vcpu_svm *svm,
>>>>>>>>> gpa_t req_gpa, gpa_t resp_gpa)
>>>>>>>>> +{
>>>>>>>>> +    struct sev_data_snp_guest_request data = {0};
>>>>>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>>>>>> +    struct kvm_sev_info *sev;
>>>>>>>>> +    unsigned long rc;
>>>>>>>>> +    int err;
>>>>>>>>> +
>>>>>>>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>>>>>>>> +        rc = SEV_RET_INVALID_GUEST;
>>>>>>>>> +        goto e_fail;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>>> +
>>>>>>>>> +    mutex_lock(&sev->guest_req_lock);
>>>>>>>>> +
>>>>>>>>> +    rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
>>>>>>>>> +    if (rc)
>>>>>>>>> +        goto unlock;
>>>>>>>>> +
>>>>>>>>> +    rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data,
>>>>>>>>> &err);
>>>>>>>>
>>>>>>>>
>>>>>>>> This one goes via sev_issue_cmd_external_user() and uses sev-fd...
>>>>>>>>
>>>>>>>>> +    if (rc)
>>>>>>>>> +        /* use the firmware error code */
>>>>>>>>> +        rc = err;
>>>>>>>>> +
>>>>>>>>> +    snp_cleanup_guest_buf(&data, &rc);
>>>>>>>>> +
>>>>>>>>> +unlock:
>>>>>>>>> +    mutex_unlock(&sev->guest_req_lock);
>>>>>>>>> +
>>>>>>>>> +e_fail:
>>>>>>>>> +    svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm,
>>>>>>>>> gpa_t req_gpa, gpa_t resp_gpa)
>>>>>>>>> +{
>>>>>>>>> +    struct sev_data_snp_guest_request req = {0};
>>>>>>>>> +    struct kvm_vcpu *vcpu = &svm->vcpu;
>>>>>>>>> +    struct kvm *kvm = vcpu->kvm;
>>>>>>>>> +    unsigned long data_npages;
>>>>>>>>> +    struct kvm_sev_info *sev;
>>>>>>>>> +    unsigned long rc, err;
>>>>>>>>> +    u64 data_gpa;
>>>>>>>>> +
>>>>>>>>> +    if (!sev_snp_guest(vcpu->kvm)) {
>>>>>>>>> +        rc = SEV_RET_INVALID_GUEST;
>>>>>>>>> +        goto e_fail;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>>> +
>>>>>>>>> +    data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>>>>>>>> +    data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
>>>>>>>>> +
>>>>>>>>> +    if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>>>>>>>> +        rc = SEV_RET_INVALID_ADDRESS;
>>>>>>>>> +        goto e_fail;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    mutex_lock(&sev->guest_req_lock);
>>>>>>>>> +
>>>>>>>>> +    rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
>>>>>>>>> +    if (rc)
>>>>>>>>> +        goto unlock;
>>>>>>>>> +
>>>>>>>>> +    rc = snp_guest_ext_guest_request(&req, (unsigned
>>>>>>>>> long)sev->snp_certs_data,
>>>>>>>>> +                     &data_npages, &err);
>>>>>>>>
>>>>>>>> but this one does not and jump straight to
>>>>>>>> drivers/crypto/ccp/sev-dev.c ignoring sev->fd. Why different?
>>>>>>>> Can these two be unified? sev_issue_cmd_external_user() only
>>>>>>>> checks if fd is /dev/sev which is hardly useful.
>>>>>>>>
>>>>>>>> "[PATCH RFC v7 32/64] crypto: ccp: Provide APIs to query
>>>>>>>> extended attestation report" added this one.
>>>>>>>
>>>>>>> SNP_EXT_GUEST_REQUEST additionally returns a certificate blob and
>>>>>>> that's why it goes through the CCP driver interface
>>>>>>> snp_guest_ext_guest_request() that is used to get both the report
>>>>>>> and certificate data/blob at the same time.
>>>>>>
>>>>>> True. I thought though that this calls for extending
>>>>>> sev_issue_cmd() to take care of these extra parameters rather than
>>>>>> just skipping the sev->fd.
>>>>>>
>>>>>>
>>>>>>> All the FW API calls on the KVM side go through sev_issue_cmd()
>>>>>>> and sev_issue_cmd_external_user() interfaces and that i believe
>>>>>>> uses sev->fd more of as a sanity check.
>>>>>>
>>>>>> Does not look like it:
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/crypto/ccp/sev-dev.c?h=v6.2-rc3#n1290
>>>>>>
>>>>>> ===
>>>>>> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>>>>>>                  void *data, int *error)
>>>>>> {
>>>>>>      if (!filep || filep->f_op != &sev_fops)
>>>>>>          return -EBADF;
>>>>>>
>>>>>>      return sev_do_cmd(cmd, data, error);
>>>>>> }
>>>>>> EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user);
>>>>>> ===
>>>>>>
>>>>>> The only "more" is that it requires sev->fd to be a valid open fd,
>>>>>> what is the value in that? I may easily miss the bigger picture
>>>>>> here. Thanks,
>>>>>>
>>>>>>
>>>>>
>>>>> Have a look at the following functions in drivers/crypto/ccp/sev-dev.c:
>>>>> sev_dev_init() and sev_misc_init().
>>>>>
>>>>> static int sev_misc_init(struct sev_device *sev)
>>>>> {
>>>>>          struct device *dev = sev->dev;
>>>>>          int ret;
>>>>>
>>>>>          /*
>>>>>           * SEV feature support can be detected on multiple devices
>>>>> but
>>>>>           * the SEV FW commands must be issued on the master. During
>>>>>           * probe, we do not know the master hence we create
>>>>> /dev/sev on
>>>>>           * the first device probe.
>>>>>           * sev_do_cmd() finds the right master device to which to
>>>>> issue
>>>>>           * the command to the firmware.
>>>>>       */
>>>>
>>>>
>>>> It is still a single /dev/sev node and the userspace cannot get it
>>>> wrong; it does not have to choose between (for instance) /dev/sev0
>>>> and /dev/sev1 on a 2-socket system.
>>>>
>>>>> ...
>>>>> ...
>>>>>
>>>>> Hence, sev_issue_cmd_external_user() needs to ensure that the
>>>>> correct device (master device) is being operated upon and that's
>>>>> why there is the check for file operations matching sev_fops as
>>>>> below:
>>>>>
>>>>> int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
>>>>>                                  void *data, int *error)
>>>>> {
>>>>>          if (!filep || filep->f_op != &sev_fops)
>>>>>                  return -EBADF;
>>>>> ..
>>>>> ..
>>>>>
>>>>> Essentially, sev->fd is the misc. device created for the master PSP
>>>>> device on which the SEV/SNP firmware commands are issued, hence,
>>>>> sev_issue_cmd() uses sev->fd.
>>>>
>>>> There is always just one fd which always uses psp_master, nothing
>>>> from that fd is used.
>>>
>>> It also ensures that we can only issue commands (sev_issue_cmd) after
>>> the SEV/SNP guest has launched.
>>
>> I can open /dev/sev and start sending commands to the firmware with no
>> KVM running at all. Oh well, we discussed this offline :)
>>
>>> We don't have a valid fd to use before the guest launch. The file
>>> descriptor is passed as part of the guest launch flow, for example,
>>> in snp_launch_start().
>>>>
>>>> More to the point, if sev->fd is still important, why is it ok to
>>>> skip it for snp_handle_ext_guest_request()? Thanks,
>>>>
>>>>
>>> Then, we should do the same for snp_handle_ext_guest_request().
>>
>> Okay.
>>
>> This snp_handle_ext_guest_request() helper is for returning "Table 21.
>> ATTESTATION_REPORT Structure" along with the certificate(s) used to
>> sign the report: "This usage allows the attestation report and the
>> certificates required to verify the report to be returned at the same
>> time".
>>
>> I can see:
>> 1) KVM_SEV_SNP_{G,S}ET_CERTS ioctls on KVM VM and
>
> This allows the VMM to (optionally) supply per-VM certificates that the
> guest can use to validate the attestation report, instead of the guest
> requesting them separately.
>
>> 2) SNP_{SET,GET}_EXT_CONFIG ioctls on /dev/sev
>
> This allows the VMM to (optionally) supply certificates used for all
> VMs, i.e., there is no need for per-VM certificates.
>
>> Both store the passed blob and neither communicates it to the firmware.
>> This makes me wonder - how does the attestation report (cooked by the
>> firmware) get signed with those certificates passed on by the HV
>> userspace?
>
> These are for use by the guest to validate the attestation report. It
> allows the guest to obtain the certificate information without having to
> use another method to request the certificates.
>
> By having this certificate store, the hypervisor can request the
> certificates from the KDS once, rather than every time a guest requests
> an attestation report.
>
>>
>> Also, the cached blob in /dev/sev seems redundant - the attestation
>> report is returned for a specific guest so having a blob in the KVM VM
>> makes sense and KVM unconditionally reserves memory for it anyway. And
>> for the HV itself the blob is useless (?) so why bother with caching
>> it in /dev/sev.
>
> In general, the certificates are for the machine (VCEK, ASK, ARK), so
> they can be for all VMs on the machine. The per-VM blob allows a VMM to
> supply additional per-VM certificates, if it desires, but is not required.
>
>>
>> And GET ioctls() return what SET passed on (not something the firmware
>> returned, for example), what is ever going to call SET? The userspace can
>
> As stated above, the firmware already has the information needed to sign
> the attestation report. The SET IOCTL is used to supply the certificates
> to the guest for validation of the attestation report.


Does the firmware have to have all certificates beforehand? How does the
firmware choose which certificate to use for a specific VM, or does it
just sign all reports with all certificates it knows?


> This reduces the
> traffic and complexity of the guest requesting the certificates from the
> KDS.

Guest <-> HV interaction is clear, I am only wondering about HV <-> FW.


>> as well cache what it passed and save a bit of the code/memory in the
>> kernel.
>>
>> btw SNP_{SET,GET}_EXT_CONFIG are documented in
>> Documentation/virt/coco/sev-guest.rst but implemented in
>> drivers/crypto/ccp/sev-dev.c (not sev-guest.c).
>>
>> What am I missing in the big picture here? :) Thanks,
>
> The reason for the extended request is to make the attestation request
> appear atomic to the guest. If the guest had to make two calls to request
> the information, then in the future, when live migration is possible,
> there would be no guarantee that the guest wasn't migrated between the
> call to obtain the certificates and the call to obtain the attestation
> report, and thus validation of the attestation report could fail.




--
Alexey

2023-01-31 21:21:35

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On 1/31/23 14:21, Alexey Kardashevskiy wrote:
> On 01/02/2023 03:23, Tom Lendacky wrote:
>> On 1/30/23 19:54, Alexey Kardashevskiy wrote:
>>> On 11/1/23 13:01, Kalra, Ashish wrote:
>>>> On 1/10/2023 6:48 PM, Alexey Kardashevskiy wrote:
>>>>> On 10/1/23 19:33, Kalra, Ashish wrote:
>>>>>> On 1/9/2023 8:28 PM, Alexey Kardashevskiy wrote:
>>>>>>> On 10/1/23 10:41, Kalra, Ashish wrote:
>>>>>>>> On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
>>>>>>>>> On 15/12/22 06:40, Michael Roth wrote:
>>>>>>>>>> From: Brijesh Singh <[email protected]>
>>>>>>>>>>
>>>>>>>>>> Version 2 of the GHCB specification added support for two SNP
>>>>>>>>>> Guest Request Message NAE events. These events allow an SEV-SNP
>>>>>>>>>> guest to make requests to the SEV-SNP firmware through the
>>>>>>>>>> hypervisor using the SNP_GUEST_REQUEST API defined in the
>>>>>>>>>> SEV-SNP firmware specification.
>>>>>>>>>>
>>>>>>>>>> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
>>>>>>>>>> difference of an additional certificate blob that can be passed
>>>>>>>>>> through
>>>>>>>>>> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
>>>>>>>>>> provides snp_guest_ext_guest_request() that is used by the KVM
>>>>>>>>>> to get
>>>>>>>>>> both the report and certificate data at once.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>>>>>>> Signed-off-by: Michael Roth <[email protected]>
>>>>>>>>>> ---

>>>
>>> And GET ioctls() return what SET passed on (not something the firmware
>>> returned, for example), what is ever going to call SET? The userspace can
>>
>> As stated above, the firmware already has the information needed to sign
>> the attestation report. The SET IOCTL is used to supply the certificates
>> to the guest for validation of the attestation report.
>
>
> Does the firmware have to have all certificates beforehand? How does the
> firmware choose which certificate to use for a specific VM, or does it
> just sign all reports with all certificates it knows?

From the SNP API spec, the firmware uses the VCEK, which is derived from
chip-unique secrets, to sign the attestation report.

The guest can then use the returned VCEK certificate, the ASK certificate
and ARK certificate from the extended guest request to validate the
attestation report.
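
To make that concrete, a minimal guest-side verification sketch using
OpenSSL (the PEM file names are assumptions for the example; note the KDS
may serve the certs in DER form, so convert as needed):

#include <stdio.h>
#include <openssl/pem.h>
#include <openssl/x509.h>

static X509 *load_cert(const char *path)
{
	FILE *f = fopen(path, "r");
	X509 *cert = f ? PEM_read_X509(f, NULL, NULL, NULL) : NULL;

	if (f)
		fclose(f);
	return cert;
}

int main(void)
{
	X509 *ark = load_cert("ark.pem");	/* AMD Root Key        */
	X509 *ask = load_cert("ask.pem");	/* AMD SEV signing key */
	X509 *vcek = load_cert("vcek.pem");	/* chip-unique VCEK    */
	int ok = ark && ask && vcek;

	/* ARK is self-signed, the ARK signs the ASK, the ASK signs the VCEK */
	ok = ok && X509_verify(ark, X509_get0_pubkey(ark)) == 1;
	ok = ok && X509_verify(ask, X509_get0_pubkey(ark)) == 1;
	ok = ok && X509_verify(vcek, X509_get0_pubkey(ask)) == 1;

	printf("VCEK chain %s\n", ok ? "verified" : "NOT verified");
	return ok ? 0 : 1;
}

Once the chain checks out, the VCEK public key is what verifies the
signature on the attestation report itself.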

>
>
>> This reduces the traffic and complexity of the guest requesting the
>> certificates from the KDS.
>
> Guest <-> HV interaction is clear, I am only wondering about HV <-> FW.

I'm not sure what you mean here. The HV doesn't put the signing key in the
firmware; it is derived.

Thanks,
Tom

2023-01-31 21:27:13

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH RFC v7 16/64] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction


On 14.12.22 20:40, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
> hypervisor will use the instruction to add pages to the RMP table. See
> APM3 for details on the instruction operations.
>
> The PSMASH instruction expands a 2MB RMP entry into a corresponding set
> of contiguous 4KB-Page RMP entries. The hypervisor will use this
> instruction to adjust the RMP entry without invalidating the previous
> RMP entry.
>
> Add the following external interface API functions:
>
> int psmash(u64 pfn);
> psmash is used to smash a 2MB aligned page into 4K
> pages while preserving the Validated bit in the RMP.
>
> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> Used to assign a page to guest using the RMPUPDATE instruction.
>
> int rmp_make_shared(u64 pfn, enum pg_level level);
> Used to transition a page to hypervisor/shared state using the RMPUPDATE instruction.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> [mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 24 ++++++++++
> arch/x86/kernel/sev.c | 95 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 119 insertions(+)
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 8d3ce2ad27da..4eeedcaca593 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -80,10 +80,15 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> /* Software defined (when rFlags.CF = 1) */
> #define PVALIDATE_FAIL_NOUPDATE 255
> +/* RMPUPDATE detected 4K page and 2MB page overlap. */
> +#define RMPUPDATE_FAIL_OVERLAP 7
>
> /* RMP page size */
> #define RMP_PG_SIZE_4K 0
> +#define RMP_PG_SIZE_2M 1
> #define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> +#define X86_TO_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
> +
> #define RMPADJUST_VMSA_PAGE_BIT BIT(16)
>
> /* SNP Guest message request */
> @@ -133,6 +138,15 @@ struct snp_secrets_page_layout {
> u8 rsvd3[3840];
> } __packed;
>
> +struct rmp_state {
> + u64 gpa;
> + u8 assigned;
> + u8 pagesize;
> + u8 immutable;
> + u8 rsvd;
> + u32 asid;
> +} __packed;
> +
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> extern struct static_key_false sev_es_enable_key;
> extern void __sev_es_ist_enter(struct pt_regs *regs);
> @@ -198,6 +212,9 @@ bool snp_init(struct boot_params *bp);
> void __init __noreturn snp_abort(void);
> int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
> int snp_lookup_rmpentry(u64 pfn, int *level);
> +int psmash(u64 pfn);
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> +int rmp_make_shared(u64 pfn, enum pg_level level);
> #else
> static inline void sev_es_ist_enter(struct pt_regs *regs) { }
> static inline void sev_es_ist_exit(void) { }
> @@ -223,6 +240,13 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
> return -ENOTTY;
> }
> static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
> +static inline int psmash(u64 pfn) { return -ENXIO; }
> +static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
> + bool immutable)
> +{
> + return -ENODEV;
> +}
> +static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
> #endif
>
> #endif
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 706675561f49..67035d34adad 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2523,3 +2523,98 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
> return !!rmpentry_assigned(e);
> }
> EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> +
> +/*
> + * psmash is used to smash a 2MB aligned page into 4K
> + * pages while preserving the Validated bit in the RMP.
> + */
> +int psmash(u64 pfn)
> +{
> + unsigned long paddr = pfn << PAGE_SHIFT;
> + int ret;
> +
> + if (!pfn_valid(pfn))
> + return -EINVAL;


We (and many other clouds) use a neat trick to reduce the number of
struct pages Linux allocates for guest memory: In its simplest form, add
mem= to the kernel cmdline and mmap() /dev/mem to access the reserved
memory instead.

This means that the system covers more RAM than Linux contains, which
means pfn_valid() is no longer a good indication whether a page is
indeed valid. KVM handles this case fine, but this code does not.
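
For anyone unfamiliar with the trick, a minimal userspace sketch of the VMM
side (the base/size constants are made up for the example; in practice they
come from whatever range was carved out above the mem= limit):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>

int main(void)
{
	/* boot the host with e.g. mem=4G; RAM above that has no struct page */
	const off_t ram_base = 4UL << 30;	/* assumption: 4G boundary */
	const size_t ram_size = 1UL << 30;	/* assumption: 1G of RAM   */
	void *guest_ram;
	int fd = open("/dev/mem", O_RDWR | O_SYNC);

	if (fd < 0)
		return 1;

	/* pfn_valid() is false for these pfns, yet the memory is real RAM */
	guest_ram = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, ram_base);
	if (guest_ram == MAP_FAILED)
		return 1;

	printf("guest RAM mapped at %p\n", guest_ram);
	return 0;
}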

Is there any particular reason why we need this check (and similar ones
below and in other RMP related patches) in the first place? I would
expect that PSMASH and friends return failure codes for invalid pfns.


Alex







2023-01-31 22:00:58

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event



On 01/02/2023 08:21, Tom Lendacky wrote:
> On 1/31/23 14:21, Alexey Kardashevskiy wrote:
>> On 01/02/2023 03:23, Tom Lendacky wrote:
>>> On 1/30/23 19:54, Alexey Kardashevskiy wrote:
>>>> On 11/1/23 13:01, Kalra, Ashish wrote:
>>>>> On 1/10/2023 6:48 PM, Alexey Kardashevskiy wrote:
>>>>>> On 10/1/23 19:33, Kalra, Ashish wrote:
>>>>>>> On 1/9/2023 8:28 PM, Alexey Kardashevskiy wrote:
>>>>>>>> On 10/1/23 10:41, Kalra, Ashish wrote:
>>>>>>>>> On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 15/12/22 06:40, Michael Roth wrote:
>>>>>>>>>>> From: Brijesh Singh <[email protected]>
>>>>>>>>>>>
>>>>>>>>>>> Version 2 of the GHCB specification added support for two SNP
>>>>>>>>>>> Guest Request Message NAE events. These events allow an SEV-SNP
>>>>>>>>>>> guest to make requests to the SEV-SNP firmware through the
>>>>>>>>>>> hypervisor using the SNP_GUEST_REQUEST API defined in the
>>>>>>>>>>> SEV-SNP firmware specification.
>>>>>>>>>>>
>>>>>>>>>>> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST
>>>>>>>>>>> with the
>>>>>>>>>>> difference of an additional certificate blob that can be
>>>>>>>>>>> passed through
>>>>>>>>>>> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP
>>>>>>>>>>> driver
>>>>>>>>>>> provides snp_guest_ext_guest_request() that is used by the
>>>>>>>>>>> KVM to get
>>>>>>>>>>> both the report and certificate data at once.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>>>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>>>>>>>> Signed-off-by: Michael Roth <[email protected]>
>>>>>>>>>>> ---
>
>>>>
>>>> And GET ioctls() return what SET passed on (not something the
>>>> firmware returned, for example), what is ever going to call SET? The
>>>> userspace can
>>>
>>> As stated above, the firmware already has the information needed to
>>> sign the attestation report. The SET IOCTL is used to supply the
>>> certificates to the guest for validation of the attestation report.
>>
>>
>> Does the firmware have to have all certificates beforehand? How does
>> the firmware choose which certificate to use for a specific VM, or
>> does it just sign all reports with all certificates it knows?
>
> From the SNP API spec, the firmware uses the VCEK, which is derived
> from chip-unique secrets, to sign the attestation report.

Does the firmware derive it? How does the guest get to know it?
(forgive me my ignorance)


> The guest can then use the returned VCEK certificate, the ASK
> certificate and ARK certificate from the extended guest request to
> validate the attestation report.

>>
>>
>>> This reduces the traffic and complexity of the guest requesting the
>>> certificates from the KDS.
>>
>> Guest <-> HV interaction is clear, I am only wondering about HV <-> FW.
>
> I'm not sure what you mean here. The HV doesn't put the signing key in
> the firmware, it is derived.


Those ioctls() are in the HV and they take certificates which then get
sent to the guest but not to the firmware. The firmware signs a report
with a key and the guest needs the other half of it to verify the report.
Sadly I do not know enough cryptography.



--
Alexey

2023-01-31 22:42:32

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On 1/31/23 16:00, Alexey Kardashevskiy wrote:
> On 01/02/2023 08:21, Tom Lendacky wrote:
>> On 1/31/23 14:21, Alexey Kardashevskiy wrote:
>>> On 01/02/2023 03:23, Tom Lendacky wrote:
>>>> On 1/30/23 19:54, Alexey Kardashevskiy wrote:
>>>>> On 11/1/23 13:01, Kalra, Ashish wrote:
>>>>>> On 1/10/2023 6:48 PM, Alexey Kardashevskiy wrote:
>>>>>>> On 10/1/23 19:33, Kalra, Ashish wrote:
>>>>>>>> On 1/9/2023 8:28 PM, Alexey Kardashevskiy wrote:
>>>>>>>>> On 10/1/23 10:41, Kalra, Ashish wrote:
>>>>>>>>>> On 1/8/2023 9:33 PM, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 15/12/22 06:40, Michael Roth wrote:
>>>>>>>>>>>> From: Brijesh Singh <[email protected]>
>>>>>>>>>>>>
>>>>>>>>>>>> Version 2 of the GHCB specification added support for two SNP
>>>>>>>>>>>> Guest Request Message NAE events. These events allow an SEV-SNP
>>>>>>>>>>>> guest to make requests to the SEV-SNP firmware through the
>>>>>>>>>>>> hypervisor using the SNP_GUEST_REQUEST API defined in the
>>>>>>>>>>>> SEV-SNP firmware specification.
>>>>>>>>>>>>
>>>>>>>>>>>> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with
>>>>>>>>>>>> the
>>>>>>>>>>>> difference of an additional certificate blob that can be
>>>>>>>>>>>> passed through
>>>>>>>>>>>> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP
>>>>>>>>>>>> driver
>>>>>>>>>>>> provides snp_guest_ext_guest_request() that is used by the KVM
>>>>>>>>>>>> to get
>>>>>>>>>>>> both the report and certificate data at once.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>>>>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>>>>>>>>> Signed-off-by: Michael Roth <[email protected]>
>>>>>>>>>>>> ---
>>
>>>>>
>>>>> And GET ioctls() return what SET passed on (not something the firmware
>>>>> returned, for example), what is ever going to call SET? The userspace
>>>>> can
>>>>
>>>> As stated above, the firmware already has the information needed to
>>>> sign the attestation report. The SET IOCTL is used to supply the
>>>> certificates to the guest for validation of the attestation report.
>>>
>>>
>>> Does the firmware have to have all certificates beforehand? How does
>>> the firmware choose which certificate to use for a specific VM, or does
>>> it just sign all reports with all certificates it knows?
>>
>>  From the SNP API spec, the firmware uses the VCEK, which is derived
>> from chip-unique secrets, to sign the attestation report.
>
> Does the firmware derive it? How does the guest get to know it?
> (forgive me my ignorance)

Yes, the firmware derives the private key. The guest doesn't know the
private key; it gets the VCEK certificate, which has the public key, and
can then validate the attestation report.

>
>
>> The guest can then use the returned VCEK certificate, the ASK
>> certificate and ARK certificate from the extended guest request to
>> validate the attestation report.
>
>>>
>>>
>>>> This reduces the traffic and complexity of the guest requesting the
>>>> certificates from the KDS.
>>>
>>> Guest <-> HV interaction is clear, I am only wondering about HV <-> FW.
>>
>> I'm not sure what you mean here. The HV doesn't put the signing key in
>> the firmware; it is derived.
>
>
> Those ioctls() are in the HV and they take certificates which then get
> sent to the guest but not to the firmware. The firmware signs a report
> with a key and the guest needs the other half of it to verify the report.
> Sadly I do not know enough cryptography.

Correct, no need to send the certificates to the firmware. The certs have
the public key which can be used to verify the report signed with the
private key.

Thanks,
Tom

>
>
>

2023-02-01 17:14:47

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 16/64] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction


On 1/31/2023 3:26 PM, Alexander Graf wrote:
>
> On 14.12.22 20:40, Michael Roth wrote:
>> From: Brijesh Singh <[email protected]>
>>
>> The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
>> hypervisor will use the instruction to add pages to the RMP table. See
>> APM3 for details on the instruction operations.
>>
>> The PSMASH instruction expands a 2MB RMP entry into a corresponding set
>> of contiguous 4KB-Page RMP entries. The hypervisor will use this
>> instruction to adjust the RMP entry without invalidating the previous
>> RMP entry.
>>
>> Add the following external interface API functions:
>>
>> int psmash(u64 pfn);
>> psmash is used to smash a 2MB aligned page into 4K
>> pages while preserving the Validated bit in the RMP.
>>
>> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
>> bool immutable);
>> Used to assign a page to guest using the RMPUPDATE instruction.
>>
>> int rmp_make_shared(u64 pfn, enum pg_level level);
>> Used to transition a page to hypervisor/shared state using the
>> RMPUPDATE instruction.
>>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> [mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>>   arch/x86/include/asm/sev.h | 24 ++++++++++
>>   arch/x86/kernel/sev.c      | 95 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 119 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
>> index 8d3ce2ad27da..4eeedcaca593 100644
>> --- a/arch/x86/include/asm/sev.h
>> +++ b/arch/x86/include/asm/sev.h
>> @@ -80,10 +80,15 @@ extern bool handle_vc_boot_ghcb(struct pt_regs
>> *regs);
>>
>>   /* Software defined (when rFlags.CF = 1) */
>>   #define PVALIDATE_FAIL_NOUPDATE                255
>> +/* RMPUPDATE detected 4K page and 2MB page overlap. */
>> +#define RMPUPDATE_FAIL_OVERLAP         7
>>
>>   /* RMP page size */
>>   #define RMP_PG_SIZE_4K                 0
>> +#define RMP_PG_SIZE_2M                 1
>>   #define RMP_TO_X86_PG_LEVEL(level)     (((level) == RMP_PG_SIZE_4K)
>> ? PG_LEVEL_4K : PG_LEVEL_2M)
>> +#define X86_TO_RMP_PG_LEVEL(level)     (((level) == PG_LEVEL_4K) ?
>> RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
>> +
>>   #define RMPADJUST_VMSA_PAGE_BIT                BIT(16)
>>
>>   /* SNP Guest message request */
>> @@ -133,6 +138,15 @@ struct snp_secrets_page_layout {
>>          u8 rsvd3[3840];
>>   } __packed;
>>
>> +struct rmp_state {
>> +       u64 gpa;
>> +       u8 assigned;
>> +       u8 pagesize;
>> +       u8 immutable;
>> +       u8 rsvd;
>> +       u32 asid;
>> +} __packed;
>> +
>>   #ifdef CONFIG_AMD_MEM_ENCRYPT
>>   extern struct static_key_false sev_es_enable_key;
>>   extern void __sev_es_ist_enter(struct pt_regs *regs);
>> @@ -198,6 +212,9 @@ bool snp_init(struct boot_params *bp);
>>   void __init __noreturn snp_abort(void);
>>   int snp_issue_guest_request(u64 exit_code, struct snp_req_data
>> *input, unsigned long *fw_err);
>>   int snp_lookup_rmpentry(u64 pfn, int *level);
>> +int psmash(u64 pfn);
>> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
>> bool immutable);
>> +int rmp_make_shared(u64 pfn, enum pg_level level);
>>   #else
>>   static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>>   static inline void sev_es_ist_exit(void) { }
>> @@ -223,6 +240,13 @@ static inline int snp_issue_guest_request(u64
>> exit_code, struct snp_req_data *in
>>          return -ENOTTY;
>>   }
>>   static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return
>> 0; }
>> +static inline int psmash(u64 pfn) { return -ENXIO; }
>> +static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level
>> level, int asid,
>> +                                  bool immutable)
>> +{
>> +       return -ENODEV;
>> +}
>> +static inline int rmp_make_shared(u64 pfn, enum pg_level level) {
>> return -ENODEV; }
>>   #endif
>>
>>   #endif
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 706675561f49..67035d34adad 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -2523,3 +2523,98 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
>>          return !!rmpentry_assigned(e);
>>   }
>>   EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
>> +
>> +/*
>> + * psmash is used to smash a 2MB aligned page into 4K
>> + * pages while preserving the Validated bit in the RMP.
>> + */
>> +int psmash(u64 pfn)
>> +{
>> +       unsigned long paddr = pfn << PAGE_SHIFT;
>> +       int ret;
>> +
>> +       if (!pfn_valid(pfn))
>> +               return -EINVAL;
>
>
> We (and many other clouds) use a neat trick to reduce the number of
> struct pages Linux allocates for guest memory: In its simplest form, add
> mem= to the kernel cmdline and mmap() /dev/mem to access the reserved
> memory instead.
>
> This means that the system covers more RAM than Linux contains, which
> means pfn_valid() is no longer a good indication whether a page is
> indeed valid. KVM handles this case fine, but this code does not.

Hmm... but then is using max_pfn reliable either?

>
> Is there any particular reason why we need this check (and similar ones
> below and in other RMP related patches) in the first place? I would
> expect that PSMASH and friends return failure codes for invalid pfns.
>

Yes, PSMASH does an out-of-bounds check on the input SPA and additionally
checks that the SPA is 2M aligned, so I guess we can rely on PSMASH
failing on invalid pfns.
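
Right, so dropping it would leave roughly the below (same body as the
quoted patch, minus the pfn_valid() guard, with the instruction's own
return code reporting bad SPAs):

/*
 * psmash is used to smash a 2MB aligned page into 4K
 * pages while preserving the Validated bit in the RMP.
 */
int psmash(u64 pfn)
{
	unsigned long paddr = pfn << PAGE_SHIFT;
	int ret;

	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
		return -ENXIO;

	/* Binutils version 2.36 supports the PSMASH mnemonic. */
	asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
		     : "=a" (ret)
		     : "a" (paddr)
		     : "memory", "cc");

	return ret;	/* 0 on success, nonzero RMP failure code otherwise */
}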

Thanks,
Ashish

2023-02-01 17:21:02

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH RFC v7 16/64] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction


On 01.02.23 18:14, Kalra, Ashish wrote:
>
> On 1/31/2023 3:26 PM, Alexander Graf wrote:
>>
>> On 14.12.22 20:40, Michael Roth wrote:
>>> From: Brijesh Singh <[email protected]>
>>>
>>> The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
>>> hypervisor will use the instruction to add pages to the RMP table. See
>>> APM3 for details on the instruction operations.
>>>
>>> The PSMASH instruction expands a 2MB RMP entry into a corresponding set
>>> of contiguous 4KB-Page RMP entries. The hypervisor will use this
>>> instruction to adjust the RMP entry without invalidating the previous
>>> RMP entry.
>>>
>>> Add the following external interface API functions:
>>>
>>> int psmash(u64 pfn);
>>> psmash is used to smash a 2MB aligned page into 4K
>>> pages while preserving the Validated bit in the RMP.
>>>
>>> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
>>> bool immutable);
>>> Used to assign a page to guest using the RMPUPDATE instruction.
>>>
>>> int rmp_make_shared(u64 pfn, enum pg_level level);
>>> Used to transition a page to hypervisor/shared state using the
>>> RMPUPDATE instruction.
>>>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> [mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
>>> Signed-off-by: Michael Roth <[email protected]>
>>> ---
>>>   arch/x86/include/asm/sev.h | 24 ++++++++++
>>>   arch/x86/kernel/sev.c      | 95
>>> ++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 119 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
>>> index 8d3ce2ad27da..4eeedcaca593 100644
>>> --- a/arch/x86/include/asm/sev.h
>>> +++ b/arch/x86/include/asm/sev.h
>>> @@ -80,10 +80,15 @@ extern bool handle_vc_boot_ghcb(struct pt_regs
>>> *regs);
>>>
>>>   /* Software defined (when rFlags.CF = 1) */
>>>   #define PVALIDATE_FAIL_NOUPDATE                255
>>> +/* RMPUPDATE detected 4K page and 2MB page overlap. */
>>> +#define RMPUPDATE_FAIL_OVERLAP         7
>>>
>>>   /* RMP page size */
>>>   #define RMP_PG_SIZE_4K                 0
>>> +#define RMP_PG_SIZE_2M                 1
>>>   #define RMP_TO_X86_PG_LEVEL(level)     (((level) == RMP_PG_SIZE_4K)
>>> ? PG_LEVEL_4K : PG_LEVEL_2M)
>>> +#define X86_TO_RMP_PG_LEVEL(level)     (((level) == PG_LEVEL_4K) ?
>>> RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
>>> +
>>>   #define RMPADJUST_VMSA_PAGE_BIT                BIT(16)
>>>
>>>   /* SNP Guest message request */
>>> @@ -133,6 +138,15 @@ struct snp_secrets_page_layout {
>>>          u8 rsvd3[3840];
>>>   } __packed;
>>>
>>> +struct rmp_state {
>>> +       u64 gpa;
>>> +       u8 assigned;
>>> +       u8 pagesize;
>>> +       u8 immutable;
>>> +       u8 rsvd;
>>> +       u32 asid;
>>> +} __packed;
>>> +
>>>   #ifdef CONFIG_AMD_MEM_ENCRYPT
>>>   extern struct static_key_false sev_es_enable_key;
>>>   extern void __sev_es_ist_enter(struct pt_regs *regs);
>>> @@ -198,6 +212,9 @@ bool snp_init(struct boot_params *bp);
>>>   void __init __noreturn snp_abort(void);
>>>   int snp_issue_guest_request(u64 exit_code, struct snp_req_data
>>> *input, unsigned long *fw_err);
>>>   int snp_lookup_rmpentry(u64 pfn, int *level);
>>> +int psmash(u64 pfn);
>>> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
>>> bool immutable);
>>> +int rmp_make_shared(u64 pfn, enum pg_level level);
>>>   #else
>>>   static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>>>   static inline void sev_es_ist_exit(void) { }
>>> @@ -223,6 +240,13 @@ static inline int snp_issue_guest_request(u64
>>> exit_code, struct snp_req_data *in
>>>          return -ENOTTY;
>>>   }
>>>   static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return
>>> 0; }
>>> +static inline int psmash(u64 pfn) { return -ENXIO; }
>>> +static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level
>>> level, int asid,
>>> +                                  bool immutable)
>>> +{
>>> +       return -ENODEV;
>>> +}
>>> +static inline int rmp_make_shared(u64 pfn, enum pg_level level) {
>>> return -ENODEV; }
>>>   #endif
>>>
>>>   #endif
>>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>>> index 706675561f49..67035d34adad 100644
>>> --- a/arch/x86/kernel/sev.c
>>> +++ b/arch/x86/kernel/sev.c
>>> @@ -2523,3 +2523,98 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
>>>          return !!rmpentry_assigned(e);
>>>   }
>>>   EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
>>> +
>>> +/*
>>> + * psmash is used to smash a 2MB aligned page into 4K
>>> + * pages while preserving the Validated bit in the RMP.
>>> + */
>>> +int psmash(u64 pfn)
>>> +{
>>> +       unsigned long paddr = pfn << PAGE_SHIFT;
>>> +       int ret;
>>> +
>>> +       if (!pfn_valid(pfn))
>>> +               return -EINVAL;
>>
>>
>> We (and many other clouds) use a neat trick to reduce the number of
>> struct pages Linux allocates for guest memory: In its simplest form, add
>> mem= to the kernel cmdline and mmap() /dev/mem to access the reserved
>> memory instead.
>>
>> This means that the system covers more RAM than Linux contains, which
>> means pfn_valid() is no longer a good indication whether a page is
>> indeed valid. KVM handles this case fine, but this code does not.
>
> Hmm... but then, is using max_pfn also reliable?


I would expect it to not be reliable as it only looks at E820_TYPE_RAM,
yes. Do you rely on max_pfn anywhere?


>
>>
>> Is there any particular reason why we need this check (and similar ones
>> below and in other RMP-related patches) in the first place? I would
>> expect that PSMASH and friends return failure codes for invalid pfns.
>>
>
> Yes, PSMASH does an out-of-bounds check on the input SPA and additionally
> checks if the SPA is 2M aligned, so I guess we can rely on PSMASH
> failing on invalid pfns.


Perfect, please remove all the superfluous checks then. If you want to
make our lives easier, I'd recommend you try the patch set with mem=
passed on the host and tell QEMU to mmap() /dev/mem for guest RAM. That
way you should be able to find any other pitfalls :)
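
(For the unfamiliar, the simplest form of the trick looks roughly like the
user-space sketch below. The mem=4G split and the 1G window are made-up
values, and it assumes the host allows /dev/mem access to that range, e.g.
CONFIG_STRICT_DEVMEM disabled.)

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	/* Host booted with mem=4G: Linux has no struct page for RAM above 4G. */
	off_t hidden_base = 4ULL << 30;		/* first byte past mem= */
	size_t len = 1ULL << 30;		/* 1G of "invisible" guest RAM */
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	void *p;

	if (fd < 0)
		return 1;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, hidden_base);
	if (p == MAP_FAILED)
		return 1;

	/*
	 * Pages backing [hidden_base, hidden_base + len) have no struct page,
	 * so pfn_valid() is false for them on the host even though the RAM
	 * (and any RMP entry covering it) is perfectly real.
	 */
	munmap(p, len);
	close(fd);
	return 0;
}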


Alex







2023-02-01 18:22:36

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 11/64] KVM: SEV: Support private pages in LAUNCH_UPDATE_DATA

On Wed, Dec 14, 2022 at 01:40:03PM -0600, Michael Roth wrote:
> From: Nikunj A Dadhania <[email protected]>
>
> Pre-boot guest payload needs to be encrypted and VMM has copied it

"has to have copied it over" I presume?

> over to the private-fd. Add support to get the pfn from the memfile fd
> for encrypting the payload in-place.

Why is that a good thing?

I guess with UPM you're supposed to get the PFN of that encrypted guest
payload from that memslot.

IOW, such commit messages are too laconic for my taste and you could try
to explain more why this is happening instead of me having to
"reverse-deduce" what you're doing from the code...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-01 18:40:14

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 13/64] x86/cpufeatures: Add SEV-SNP CPU feature

On Wed, Dec 14, 2022 at 01:40:05PM -0600, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> Add CPU feature detection for Secure Encrypted Virtualization with
> Secure Nested Paging. This feature adds a strong memory integrity
> protection to help prevent malicious hypervisor-based attacks like
> data replay, memory re-mapping, and more.
>
> Link: https://lore.kernel.org/all/YrGINaPc3cojG6%[email protected]/

That points to some review feedback I've given - dunno if it is
relevant.

> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Jarkko Sakkinen <[email protected]>

I read this as Jarkko has handled this patch too. Is that the case?

> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>

Those last two are ok - you took over from Ashish.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-02 08:09:48

by Nikunj A. Dadhania

[permalink] [raw]
Subject: Re: [PATCH RFC v7 11/64] KVM: SEV: Support private pages in LAUNCH_UPDATE_DATA



On 01/02/23 23:52, Borislav Petkov wrote:
> On Wed, Dec 14, 2022 at 01:40:03PM -0600, Michael Roth wrote:
>> From: Nikunj A Dadhania <[email protected]>
>>
>> Pre-boot guest payload needs to be encrypted and VMM has copied it
>
> "has to have copied it over" I presume?

True, the payload is being copied in patch 10/64 now.

>> over to the private-fd. Add support to get the pfn from the memfile fd
>> for encrypting the payload in-place.
>
> Why is that a good thing?
>
> I guess with UPM you're supposed to get the PFN of that encrypted guest
> payload from that memslot.
>
> IOW, such commit messages are too laconic for my taste and you could try
> to explain more why this is happening instead of me having to
> "reverse-deduce" what you're doing from the code...
>

I am updating the SEV-related patches and will add more details to the commit messages before sending.

Regards
Nikunj

2023-02-02 11:17:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 14/64] x86/sev: Add the host SEV-SNP initialization support

On Wed, Dec 14, 2022 at 01:40:06PM -0600, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
> track the owner of each page of memory. Pages of memory can be owned by
> the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
> section 15.36.3 for more detail on RMP.
>
> The RMP table is used to enforce access control to memory. The table itself
> is not directly writable by the software. New CPU instructions (RMPUPDATE,
> PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.
>
> Based on the platform configuration, the BIOS reserves the memory used
> for the RMP table. The start and end address of the RMP table must be
> queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and
> RMP_END are not set then disable the SEV-SNP feature.
>
> The SEV-SNP feature is enabled only after the RMP table is successfully
> initialized.
>
> Also set SYSCFG.MFMD when enabling SNP as SEV-SNP FW >= 1.51 requires
> that SYSCFG.MFMD must be se

set.
>
> The RMP table entry format is non-architectural and can vary by processor;
> it is defined by the PPR. Restrict SNP support to the known CPU model
> and family for which the RMP table entry format is currently defined.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-b: Ashish Kalra <[email protected]>
^^

Somebody ate a 'y' here. :)

> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/disabled-features.h | 8 +-
> arch/x86/include/asm/msr-index.h | 11 +-
> arch/x86/kernel/sev.c | 180 +++++++++++++++++++++++
> 3 files changed, 197 insertions(+), 2 deletions(-)

...

> +static __init int __snp_rmptable_init(void)

Why is this one carved out of snp_rmptable_init() ?

> +{
> + u64 rmp_base, sz;
> + void *start;
> + u64 val;
> +
> + if (!get_rmptable_info(&rmp_base, &sz))
> + return 1;
> +
> + start = memremap(rmp_base, sz, MEMREMAP_WB);
> + if (!start) {
> + pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, sz);
> + return 1;
> + }
> +
> + /*
> + * Check if SEV-SNP is already enabled, this can happen in case of
> + * kexec boot.
> + */
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
> + goto skip_enable;
> +
> + /* Initialize the RMP table to zero */

Useless comment.

> + memset(start, 0, sz);
> +
> + /* Flush the caches to ensure that data is written before SNP is enabled. */
> + wbinvd_on_all_cpus();
> +
> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
> + on_each_cpu(mfd_enable, NULL, 1);
> +
> + /* Enable SNP on all CPUs. */
> + on_each_cpu(snp_enable, NULL, 1);

What happens if someone boots the machine with maxcpus=N, where N is
less than all CPUs on the machine? The hotplug notifier should handle it
but have you checked that it works fine?

> +skip_enable:
> + rmptable_start = (unsigned long)start;
> + rmptable_end = rmptable_start + sz - 1;
> +
> + return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> + int family, model;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + family = boot_cpu_data.x86;
> + model = boot_cpu_data.x86_model;

Looks useless - just use boot_cpu_data directly below.

> +
> + /*
> + * The RMP table entry format is not architectural; it can vary by processor
> + * and is defined by the PPR. Restrict SNP support to the known CPU
> + * model and family for which the RMP table entry format is currently defined.
> + */
> + if (family != 0x19 || model > 0xaf)
> + goto nosnp;
> +
> + if (amd_iommu_snp_enable())
> + goto nosnp;
> +
> + if (__snp_rmptable_init())
> + goto nosnp;
> +
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> + return 0;
> +
> +nosnp:
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> + return -ENOSYS;
> +}
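
(For reference on the maxcpus= concern: the late-online path would
presumably be a cpuhp callback along the lines of the sketch below,
reconstructed from the quoted hunk. MSR_AMD64_SYSCFG_MFDM is assumed
to be the name of the MFDM bit.)

/* Sketch: CPUs onlined after boot (e.g. with maxcpus=N) must also get
 * MFDM and SNP_EN set, mirroring what on_each_cpu() did at init time. */
static int __snp_enable(unsigned int cpu)
{
	u64 val;

	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
		return 0;

	rdmsrl(MSR_AMD64_SYSCFG, val);
	val |= MSR_AMD64_SYSCFG_MFDM;
	val |= MSR_AMD64_SYSCFG_SNP_EN;
	wrmsrl(MSR_AMD64_SYSCFG, val);

	return 0;
}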

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-02 19:04:38

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH RFC v7 16/64] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On 2/1/2023 11:20 AM, Alexander Graf wrote:
>
> On 01.02.23 18:14, Kalra, Ashish wrote:
>>
>> On 1/31/2023 3:26 PM, Alexander Graf wrote:
>>>
>>> On 14.12.22 20:40, Michael Roth wrote:
>>>> From: Brijesh Singh <[email protected]>
>>>>
>>>> The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
>>>> hypervisor will use the instruction to add pages to the RMP table. See
>>>> APM3 for details on the instruction operations.
>>>>
>>>> The PSMASH instruction expands a 2MB RMP entry into a corresponding set
>>>> of contiguous 4KB-Page RMP entries. The hypervisor will use this
>>>> instruction to adjust the RMP entry without invalidating the previous
>>>> RMP entry.
>>>>
>>>> Add the following external interface API functions:
>>>>
>>>> int psmash(u64 pfn);
>>>> psmash is used to smash a 2MB aligned page into 4K
>>>> pages while preserving the Validated bit in the RMP.
>>>>
>>>> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
>>>> bool immutable);
>>>> Used to assign a page to a guest using the RMPUPDATE instruction.
>>>>
>>>> int rmp_make_shared(u64 pfn, enum pg_level level);
>>>> Used to transition a page to hypervisor/shared state using the
>>>> RMPUPDATE instruction.
>>>>
>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>> [mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
>>>> Signed-off-by: Michael Roth <[email protected]>
>>>> ---
>>>>   arch/x86/include/asm/sev.h | 24 ++++++++++
>>>>   arch/x86/kernel/sev.c      | 95
>>>> ++++++++++++++++++++++++++++++++++++++
>>>>   2 files changed, 119 insertions(+)
>>>>
>>>> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
>>>> index 8d3ce2ad27da..4eeedcaca593 100644
>>>> --- a/arch/x86/include/asm/sev.h
>>>> +++ b/arch/x86/include/asm/sev.h
>>>> @@ -80,10 +80,15 @@ extern bool handle_vc_boot_ghcb(struct pt_regs
>>>> *regs);
>>>>
>>>>   /* Software defined (when rFlags.CF = 1) */
>>>>   #define PVALIDATE_FAIL_NOUPDATE                255
>>>> +/* RMPUPDATE detected 4K page and 2MB page overlap. */
>>>> +#define RMPUPDATE_FAIL_OVERLAP         7
>>>>
>>>>   /* RMP page size */
>>>>   #define RMP_PG_SIZE_4K                 0
>>>> +#define RMP_PG_SIZE_2M                 1
>>>>   #define RMP_TO_X86_PG_LEVEL(level)     (((level) == RMP_PG_SIZE_4K)
>>>> ? PG_LEVEL_4K : PG_LEVEL_2M)
>>>> +#define X86_TO_RMP_PG_LEVEL(level)     (((level) == PG_LEVEL_4K) ?
>>>> RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
>>>> +
>>>>   #define RMPADJUST_VMSA_PAGE_BIT                BIT(16)
>>>>
>>>>   /* SNP Guest message request */
>>>> @@ -133,6 +138,15 @@ struct snp_secrets_page_layout {
>>>>          u8 rsvd3[3840];
>>>>   } __packed;
>>>>
>>>> +struct rmp_state {
>>>> +       u64 gpa;
>>>> +       u8 assigned;
>>>> +       u8 pagesize;
>>>> +       u8 immutable;
>>>> +       u8 rsvd;
>>>> +       u32 asid;
>>>> +} __packed;
>>>> +
>>>>   #ifdef CONFIG_AMD_MEM_ENCRYPT
>>>>   extern struct static_key_false sev_es_enable_key;
>>>>   extern void __sev_es_ist_enter(struct pt_regs *regs);
>>>> @@ -198,6 +212,9 @@ bool snp_init(struct boot_params *bp);
>>>>   void __init __noreturn snp_abort(void);
>>>>   int snp_issue_guest_request(u64 exit_code, struct snp_req_data
>>>> *input, unsigned long *fw_err);
>>>>   int snp_lookup_rmpentry(u64 pfn, int *level);
>>>> +int psmash(u64 pfn);
>>>> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
>>>> bool immutable);
>>>> +int rmp_make_shared(u64 pfn, enum pg_level level);
>>>>   #else
>>>>   static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>>>>   static inline void sev_es_ist_exit(void) { }
>>>> @@ -223,6 +240,13 @@ static inline int snp_issue_guest_request(u64
>>>> exit_code, struct snp_req_data *in
>>>>          return -ENOTTY;
>>>>   }
>>>>   static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return
>>>> 0; }
>>>> +static inline int psmash(u64 pfn) { return -ENXIO; }
>>>> +static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level
>>>> level, int asid,
>>>> +                                  bool immutable)
>>>> +{
>>>> +       return -ENODEV;
>>>> +}
>>>> +static inline int rmp_make_shared(u64 pfn, enum pg_level level) {
>>>> return -ENODEV; }
>>>>   #endif
>>>>
>>>>   #endif
>>>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>>>> index 706675561f49..67035d34adad 100644
>>>> --- a/arch/x86/kernel/sev.c
>>>> +++ b/arch/x86/kernel/sev.c
>>>> @@ -2523,3 +2523,98 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
>>>>          return !!rmpentry_assigned(e);
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
>>>> +
>>>> +/*
>>>> + * psmash is used to smash a 2MB aligned page into 4K
>>>> + * pages while preserving the Validated bit in the RMP.
>>>> + */
>>>> +int psmash(u64 pfn)
>>>> +{
>>>> +       unsigned long paddr = pfn << PAGE_SHIFT;
>>>> +       int ret;
>>>> +
>>>> +       if (!pfn_valid(pfn))
>>>> +               return -EINVAL;
>>>
>>>
>>> We (and many other clouds) use a neat trick to reduce the number of
>>> struct pages Linux allocates for guest memory: In its simplest form, add
>>> mem= to the kernel cmdline and mmap() /dev/mem to access the reserved
>>> memory instead.
>>>
>>> This means that the system covers more RAM than Linux contains, which
>>> means pfn_valid() is no longer a good indication whether a page is
>>> indeed valid. KVM handles this case fine, but this code does not.
>>
>> Hmm... but then, is using max_pfn also reliable?
>
>
> I would expect it to not be reliable as it only looks at E820_TYPE_RAM,
> yes. Do you rely on max_pfn anywhere?
>

We use it to check whether the RMP table covers the whole of system RAM,
i.e. to get the maximum addressable PFN, which should be fine.
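
(For illustration: assuming the usual 16-byte RMP entry per 4K page, that
coverage check would be along the lines of the sketch below; the function
name and the bookkeeping constant are hypothetical.)

/* Sketch: the BIOS-reserved region [rmp_base, rmp_end] must hold one
 * 16-byte entry per 4K page up to the highest addressable PFN, plus a
 * processor bookkeeping area at the start of the table. */
static bool rmp_covers_all_ram(u64 rmp_base, u64 rmp_end, u64 bookkeeping_sz)
{
	u64 needed = (max_pfn << 4) + bookkeeping_sz;	/* 16 bytes per PFN */

	return rmp_end - rmp_base + 1 >= needed;
}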

Thanks,
Ashish

2023-02-06 03:14:11

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel] KVM: SVM: Fix SVM_VMGEXIT_EXT_GUEST_REQUEST to follow the rest of API

When an SVM VM is up, KVM uses sev_issue_cmd_external_user() with an open
/dev/sev fd, which ensures that the SVM initialization was done correctly.
The only helper not following this scheme is snp_guest_ext_guest_request(),
which bypasses the fd check.
Change the SEV API to require passing a file.

Handle errors with care in the SNP Extended Guest Request handler
(snp_handle_ext_guest_request()) as there are actually 3 types of errors:
- @rc: the return code from the SEV device's sev_issue_cmd(), which is int==int32;
- @err: a PSP return code from sev_issue_cmd(), also int==int32 (probably
a mistake, but kvm_sev_cmd::error has used __u32 for some time now);
- (added by this patch) @exitcode: the GHCB exit code sw_exit_info_2, a uint64.

Use the right types, remove the cast to int*, and return -ENOSPC from the
SEV device so that it can be converted into the GHCB exit code
SNP_GUEST_REQ_INVALID_LEN==BIT(32).

Fixes: 17f1d0c995ac ("KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event")
While at it, preserve the original error in snp_cleanup_guest_buf().

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---

This can easily be squashed into what it fixes.

The patch is made for
https://github.com/AMDESE/linux/commits/upmv10-host-snp-v7-rfc
---
include/linux/psp-sev.h | 62 +++++++++++---------
arch/x86/kvm/svm/sev.c | 50 +++++++++++-----
drivers/crypto/ccp/sev-dev.c | 11 ++--
3 files changed, 73 insertions(+), 50 deletions(-)

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 970a9de0ed20..466b1a6e7d7b 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -848,6 +848,36 @@ int sev_platform_status(struct sev_user_data_status *status, int *error);
int sev_issue_cmd_external_user(struct file *filep, unsigned int id,
void *data, int *error);

+/**
+ * sev_issue_cmd_external_user_cert - issue an SEV command from another driver with
+ * a file handle and return the certificates set onto the SEV device via SNP_SET_EXT_CONFIG;
+ * intended for use by the SNP extended guest request command defined
+ * in the GHCB specification.
+ *
+ * @filep - SEV device file pointer
+ * @cmd - command to issue
+ * @data - command buffer
+ * @vaddr: address where the certificate blob needs to be copied.
+ * @npages: number of pages for the certificate blob.
+ * If the specified page count is less than the certificate blob size, then the
+ * required page count is returned with ENOSPC error code.
+ * If the specified page count is more than the certificate blob size, then
+ * page count is updated to reflect the amount of valid data copied in the
+ * vaddr.
+ *
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV if the sev device is not available
+ * -%ENOTSUPP if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO if the sev returned a non-zero return code
+ * -%ENOSPC if the specified page count is too small
+ */
+int sev_issue_cmd_external_user_cert(struct file *filep, unsigned int cmd, void *data,
+ unsigned long vaddr, unsigned long *npages, int *error);
+
/**
* sev_guest_deactivate - perform SEV DEACTIVATE command
*
@@ -945,32 +975,6 @@ void snp_free_firmware_page(void *addr);
*/
void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);

-/**
- * snp_guest_ext_guest_request - perform the SNP extended guest request command
- * defined in the GHCB specification.
- *
- * @data: the input guest request structure
- * @vaddr: address where the certificate blob need to be copied.
- * @npages: number of pages for the certificate blob.
- * If the specified page count is less than the certificate blob size, then the
- * required page count is returned with error code defined in the GHCB spec.
- * If the specified page count is more than the certificate blob size, then
- * page count is updated to reflect the amount of valid data copied in the
- * vaddr.
- *
- * @sev_ret: sev command return code
- *
- * Returns:
- * 0 if the sev successfully processed the command
- * -%ENODEV if the sev device is not available
- * -%ENOTSUPP if the sev does not support SEV
- * -%ETIMEDOUT if the sev command timed out
- * -%EIO if the sev returned a non-zero return code
- */
-int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
- unsigned long vaddr, unsigned long *npages,
- unsigned long *error);
-
#else /* !CONFIG_CRYPTO_DEV_SP_PSP */

static inline int
@@ -1013,9 +1017,9 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)

static inline void snp_free_firmware_page(void *addr) { }

-static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
- unsigned long vaddr, unsigned long *n,
- unsigned long *error)
+static inline int sev_issue_cmd_external_user_cert(struct file *filep, unsigned int cmd,
+ void *data, unsigned long vaddr,
+ unsigned long *npages, int *error)
{
return -ENODEV;
}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d0e58cffd1ed..b268c35efab4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -394,6 +394,23 @@ static int sev_issue_cmd(struct kvm *kvm, int id, void *data, int *error)
return __sev_issue_cmd(sev->fd, id, data, error);
}

+static int sev_issue_cmd_cert(struct kvm *kvm, int id, void *data,
+ unsigned long vaddr, unsigned long *npages, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct fd f;
+ int ret;
+
+ f = fdget(sev->fd);
+ if (!f.file)
+ return -EBADF;
+
+ ret = sev_issue_cmd_external_user_cert(f.file, id, data, vaddr, npages, error);
+
+ fdput(f);
+ return ret;
+}
+
static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -3587,11 +3604,11 @@ static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsig
int ret;

ret = snp_page_reclaim(pfn);
- if (ret)
+ if (ret && (*rc == SEV_RET_SUCCESS))
*rc = SEV_RET_INVALID_ADDRESS;

ret = rmp_make_shared(pfn, PG_LEVEL_4K);
- if (ret)
+ if (ret && (*rc == SEV_RET_SUCCESS))
*rc = SEV_RET_INVALID_ADDRESS;
}

@@ -3638,8 +3655,9 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
struct kvm *kvm = vcpu->kvm;
unsigned long data_npages;
struct kvm_sev_info *sev;
- unsigned long rc, err;
+ unsigned long exitcode;
u64 data_gpa;
+ int err, rc;

if (!sev_snp_guest(vcpu->kvm)) {
rc = SEV_RET_INVALID_GUEST;
@@ -3669,17 +3687,16 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
*/
if (sev->snp_certs_len) {
if ((data_npages << PAGE_SHIFT) < sev->snp_certs_len) {
- rc = -EINVAL;
- err = SNP_GUEST_REQ_INVALID_LEN;
+ rc = -ENOSPC;
goto datalen;
}
- rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
- (int *)&err);
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req, &err);
} else {
- rc = snp_guest_ext_guest_request(&req,
- (unsigned long)sev->snp_certs_data,
- &data_npages, &err);
+ rc = sev_issue_cmd_cert(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
+ (unsigned long)sev->snp_certs_data,
+ &data_npages, &err);
}
+
datalen:
if (sev->snp_certs_len)
data_npages = sev->snp_certs_len >> PAGE_SHIFT;
@@ -3689,27 +3706,30 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
* If buffer length is small then return the expected
* length in rbx.
*/
- if (err == SNP_GUEST_REQ_INVALID_LEN)
+ if (rc == -ENOSPC) {
vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
+ exitcode = SNP_GUEST_REQ_INVALID_LEN;
+ goto cleanup;
+ }

/* pass the firmware error code */
- rc = err;
+ exitcode = err;
goto cleanup;
}

/* Copy the certificate blob in the guest memory */
if (data_npages &&
kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
- rc = SEV_RET_INVALID_ADDRESS;
+ exitcode = SEV_RET_INVALID_ADDRESS;

cleanup:
- snp_cleanup_guest_buf(&req, &rc);
+ snp_cleanup_guest_buf(&req, &exitcode);

unlock:
mutex_unlock(&sev->guest_req_lock);

e_fail:
- svm_set_ghcb_sw_exit_info_2(vcpu, rc);
+ svm_set_ghcb_sw_exit_info_2(vcpu, exitcode);
}

static kvm_pfn_t gfn_to_pfn_restricted(struct kvm *kvm, gfn_t gfn)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 6c4fdcaed72b..73f56c20255c 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2070,8 +2070,8 @@ int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *erro
}
EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);

-int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
- unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
+int sev_issue_cmd_external_user_cert(struct file *filep, unsigned int cmd, void *data,
+ unsigned long vaddr, unsigned long *npages, int *error)
{
unsigned long expected_npages;
struct sev_device *sev;
@@ -2093,12 +2093,11 @@ int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
if (*npages < expected_npages) {
*npages = expected_npages;
- *fw_err = SNP_GUEST_REQ_INVALID_LEN;
mutex_unlock(&sev->snp_certs_lock);
- return -EINVAL;
+ return -ENOSPC;
}

- rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
+ rc = sev_issue_cmd_external_user(filep, cmd, data, error);
if (rc) {
mutex_unlock(&sev->snp_certs_lock);
return rc;
@@ -2115,7 +2114,7 @@ int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
mutex_unlock(&sev->snp_certs_lock);
return rc;
}
-EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
+EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user_cert);

static void sev_exit(struct kref *ref)
{
--
2.39.1


2023-02-06 21:57:28

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH kernel] KVM: SVM: Fix SVM_VMGEXIT_EXT_GUEST_REQUEST to follow the rest of API

On 2/5/2023 9:13 PM, Alexey Kardashevskiy wrote:
> When SVM VM is up, KVM uses sev_issue_cmd_external_user() with an open
> /dev/sev fd which ensures that the SVM initialization was done correctly.
> The only helper not following the scheme is snp_guest_ext_guest_request()
> which bypasses the fd check.
>
> Change the SEV API to require passing a file.
>
> Handle errors with care in the SNP Extended Guest Request handler
> (snp_handle_ext_guest_request()) as there are actually 3 types of errors:
> - @rc: return code SEV device's sev_issue_cmd() which is int==int32;
> - @err: a psp return code in sev_issue_cmd(), also int==int32 (probably
> a mistake but kvm_sev_cmd::error uses __u32 for some time now);
> - (added by this) @exitcode: GHCB's exit code sw_exit_info_2, uint64.
>
> Use the right types, remove cast to int* and return ENOSPC from SEV
> device for converting it to the GHCB's exit code
> SNP_GUEST_REQ_INVALID_LEN==BIT(32).
>
> Fixes: 17f1d0c995ac ("KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event")
> While at this, preserve the original error in snp_cleanup_guest_buf().
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
>
> This can easily be squashed into what it fixes.
>
> The patch is made for
> https://github.com/AMDESE/linux/commits/upmv10-host-snp-v7-rfc
> ---
> include/linux/psp-sev.h | 62 +++++++++++---------
> arch/x86/kvm/svm/sev.c | 50 +++++++++++-----
> drivers/crypto/ccp/sev-dev.c | 11 ++--
> 3 files changed, 73 insertions(+), 50 deletions(-)
>
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 970a9de0ed20..466b1a6e7d7b 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -848,6 +848,36 @@ int sev_platform_status(struct sev_user_data_status *status, int *error);
> int sev_issue_cmd_external_user(struct file *filep, unsigned int id,
> void *data, int *error);
>
> +/**
> + * sev_issue_cmd_external_user_cert - issue SEV command by other driver with a file
> + * handle and return certificates set onto SEV device via SNP_SET_EXT_CONFIG;
> + * intended for use by the SNP extended guest request command defined
> + * in the GHCB specification.
> + *
> + * @filep - SEV device file pointer
> + * @cmd - command to issue
> + * @data - command buffer
> + * @vaddr: address where the certificate blob need to be copied.
> + * @npages: number of pages for the certificate blob.
> + * If the specified page count is less than the certificate blob size, then the
> + * required page count is returned with ENOSPC error code.
> + * If the specified page count is more than the certificate blob size, then
> + * page count is updated to reflect the amount of valid data copied in the
> + * vaddr.
> + *
> + * @error: SEV command return code
> + *
> + * Returns:
> + * 0 if the sev successfully processed the command
> + * -%ENODEV if the sev device is not available
> + * -%ENOTSUPP if the sev does not support SEV
> + * -%ETIMEDOUT if the sev command timed out
> + * -%EIO if the sev returned a non-zero return code
> + * -%ENOSPC if the specified page count is too small
> + */
> +int sev_issue_cmd_external_user_cert(struct file *filep, unsigned int cmd, void *data,
> + unsigned long vaddr, unsigned long *npages, int *error);
> +
> /**
> * sev_guest_deactivate - perform SEV DEACTIVATE command
> *
> @@ -945,32 +975,6 @@ void snp_free_firmware_page(void *addr);
> */
> void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
>
> -/**
> - * snp_guest_ext_guest_request - perform the SNP extended guest request command
> - * defined in the GHCB specification.
> - *
> - * @data: the input guest request structure
> - * @vaddr: address where the certificate blob need to be copied.
> - * @npages: number of pages for the certificate blob.
> - * If the specified page count is less than the certificate blob size, then the
> - * required page count is returned with error code defined in the GHCB spec.
> - * If the specified page count is more than the certificate blob size, then
> - * page count is updated to reflect the amount of valid data copied in the
> - * vaddr.
> - *
> - * @sev_ret: sev command return code
> - *
> - * Returns:
> - * 0 if the sev successfully processed the command
> - * -%ENODEV if the sev device is not available
> - * -%ENOTSUPP if the sev does not support SEV
> - * -%ETIMEDOUT if the sev command timed out
> - * -%EIO if the sev returned a non-zero return code
> - */
> -int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> - unsigned long vaddr, unsigned long *npages,
> - unsigned long *error);
> -
> #else /* !CONFIG_CRYPTO_DEV_SP_PSP */
>
> static inline int
> @@ -1013,9 +1017,9 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
>
> static inline void snp_free_firmware_page(void *addr) { }
>
> -static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> - unsigned long vaddr, unsigned long *n,
> - unsigned long *error)
> +static inline int sev_issue_cmd_external_user_cert(struct file *filep, unsigned int cmd,
> + void *data, unsigned long vaddr,
> + unsigned long *npages, int *error)
> {
> return -ENODEV;
> }
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index d0e58cffd1ed..b268c35efab4 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -394,6 +394,23 @@ static int sev_issue_cmd(struct kvm *kvm, int id, void *data, int *error)
> return __sev_issue_cmd(sev->fd, id, data, error);
> }
>
> +static int sev_issue_cmd_cert(struct kvm *kvm, int id, void *data,
> + unsigned long vaddr, unsigned long *npages, int *error)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct fd f;
> + int ret;
> +
> + f = fdget(sev->fd);
> + if (!f.file)
> + return -EBADF;
> +
> + ret = sev_issue_cmd_external_user_cert(f.file, id, data, vaddr, npages, error);
> +
> + fdput(f);
> + return ret;
> +}
> +
> static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> {
> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> @@ -3587,11 +3604,11 @@ static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsig
> int ret;
>
> ret = snp_page_reclaim(pfn);
> - if (ret)
> + if (ret && (*rc == SEV_RET_SUCCESS))
> *rc = SEV_RET_INVALID_ADDRESS;
>
> ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> - if (ret)
> + if (ret && (*rc == SEV_RET_SUCCESS))
> *rc = SEV_RET_INVALID_ADDRESS;
> }

I believe we need to fix this as per the GHCB specifications.

As per GHCB 2.0 specifications:

SW_EXITINFO2
...
State from Hypervisor: Upper 32 bits (63:32) will contain the return code
from the hypervisor. Lower 32 bits (31:0) will contain the return code
from the firmware call (0 = success).

So I believe the FW error code (the FW error code from SNP_GUEST_REQUEST,
i.e. *rc here) should be contained in the lower 32 bits, while the error
code returned due to a response buffer page reclaim failure and/or a
failure to transition those pages back to the shared state is essentially
a hypervisor (error) return code, and that should be returned in the
upper 32 bits of the exitinfo.
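
(In code form, the packing described above would be something like the
sketch below; the helper name is made up.)

/* SW_EXITINFO2 per GHCB 2.0: hypervisor return code in bits 63:32,
 * firmware return code in bits 31:0 (0 = success). */
static inline u64 snp_pack_exitinfo2(u32 hv_err, u32 fw_err)
{
	return ((u64)hv_err << 32) | fw_err;
}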

There is work in progress to check conformance of the SNP v7 patches to
the GHCB 2.0 specification, so this fix can probably be included as part
of those patches.

>
> @@ -3638,8 +3655,9 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
> struct kvm *kvm = vcpu->kvm;
> unsigned long data_npages;
> struct kvm_sev_info *sev;
> - unsigned long rc, err;

This needs to be looked at more carefully. The SEV firmware status code
is defined as 32-bit, but is being handled as unsigned long in the
KVM/SNP code and as int in the CCP driver. So this needs to be fixed
consistently across the board; the snp_setup_guest_buf() return value
will need to be fixed accordingly.

> + unsigned long exitcode;
> u64 data_gpa;
> + int err, rc;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST;
> @@ -3669,17 +3687,16 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
> */
> if (sev->snp_certs_len) {
> if ((data_npages << PAGE_SHIFT) < sev->snp_certs_len) {
> - rc = -EINVAL;
> - err = SNP_GUEST_REQ_INVALID_LEN;
> + rc = -ENOSPC;

Why do we need to introduce ENOSPC error code?

If we continue to use SNP_GUEST_REQ_INVALID_LEN we don't need to map
ENOSPC to SNP_GUEST_REQ_INVALID_LEN below.

And the CCP driver can return SNP_GUEST_REQ_INVALID_LEN as earlier via
the fw_err parameter.

Thanks,
Ashish

> goto datalen;
> }
> - rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
> - (int *)&err);
> + rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req, &err);
> } else {
> - rc = snp_guest_ext_guest_request(&req,
> - (unsigned long)sev->snp_certs_data,
> - &data_npages, &err);
> + rc = sev_issue_cmd_cert(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
> + (unsigned long)sev->snp_certs_data,
> + &data_npages, &err);
> }
> +
> datalen:
> if (sev->snp_certs_len)
> data_npages = sev->snp_certs_len >> PAGE_SHIFT;
> @@ -3689,27 +3706,30 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
> * If buffer length is small then return the expected
> * length in rbx.
> */
> - if (err == SNP_GUEST_REQ_INVALID_LEN)
> + if (rc == -ENOSPC) {
> vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
> + exitcode = SNP_GUEST_REQ_INVALID_LEN;
> + goto cleanup;
> + }
>
> /* pass the firmware error code */
> - rc = err;
> + exitcode = err;
> goto cleanup;
> }
>
> /* Copy the certificate blob in the guest memory */
> if (data_npages &&
> kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
> - rc = SEV_RET_INVALID_ADDRESS;
> + exitcode = SEV_RET_INVALID_ADDRESS;
>
> cleanup:
> - snp_cleanup_guest_buf(&req, &rc);
> + snp_cleanup_guest_buf(&req, &exitcode);
>
> unlock:
> mutex_unlock(&sev->guest_req_lock);
>
> e_fail:
> - svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> + svm_set_ghcb_sw_exit_info_2(vcpu, exitcode);
> }
>
> static kvm_pfn_t gfn_to_pfn_restricted(struct kvm *kvm, gfn_t gfn)
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 6c4fdcaed72b..73f56c20255c 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -2070,8 +2070,8 @@ int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *erro
> }
> EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
>
> -int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> - unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
> +int sev_issue_cmd_external_user_cert(struct file *filep, unsigned int cmd, void *data,
> + unsigned long vaddr, unsigned long *npages, int *error)
> {
> unsigned long expected_npages;
> struct sev_device *sev;
> @@ -2093,12 +2093,11 @@ int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
> if (*npages < expected_npages) {
> *npages = expected_npages;
> - *fw_err = SNP_GUEST_REQ_INVALID_LEN;
> mutex_unlock(&sev->snp_certs_lock);
> - return -EINVAL;
> + return -ENOSPC;
> }
>
> - rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
> + rc = sev_issue_cmd_external_user(filep, cmd, data, error);
> if (rc) {
> mutex_unlock(&sev->snp_certs_lock);
> return rc;
> @@ -2115,7 +2114,7 @@ int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> mutex_unlock(&sev->snp_certs_lock);
> return rc;
> }
> -EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
> +EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user_cert);
>
> static void sev_exit(struct kref *ref)
> {
>

2023-02-07 01:25:29

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel] KVM: SVM: Fix SVM_VMGEXIT_EXT_GUEST_REQUEST to follow the rest of API



On 07/02/2023 08:57, Kalra, Ashish wrote:
> On 2/5/2023 9:13 PM, Alexey Kardashevskiy wrote:
>> When SVM VM is up, KVM uses sev_issue_cmd_external_user() with an open
>> /dev/sev fd which ensures that the SVM initialization was done correctly.
>> The only helper not following the scheme is snp_guest_ext_guest_request()
>> which bypasses the fd check.
>>
>> Change the SEV API to require passing a file.
>>
>> Handle errors with care in the SNP Extended Guest Request handler
>> (snp_handle_ext_guest_request()) as there are actually 3 types of errors:
>> - @rc: return code SEV device's sev_issue_cmd() which is int==int32;
>> - @err: a psp return code in sev_issue_cmd(), also int==int32 (probably
>> a mistake but kvm_sev_cmd::error uses __u32 for some time now);
>> - (added by this) @exitcode: GHCB's exit code sw_exit_info_2, uint64.
>>
>> Use the right types, remove cast to int* and return ENOSPC from SEV
>> device for converting it to the GHCB's exit code
>> SNP_GUEST_REQ_INVALID_LEN==BIT(32).
>>
>> Fixes: 17f1d0c995ac ("KVM: SVM: Provide support for SNP_GUEST_REQUEST
>> NAE event")
>> While at this, preserve the original error in snp_cleanup_guest_buf().
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>>
>> This can easily be squashed into what it fixes.
>>
>> The patch is made for
>> https://github.com/AMDESE/linux/commits/upmv10-host-snp-v7-rfc
>> ---
>>   include/linux/psp-sev.h      | 62 +++++++++++---------
>>   arch/x86/kvm/svm/sev.c       | 50 +++++++++++-----
>>   drivers/crypto/ccp/sev-dev.c | 11 ++--
>>   3 files changed, 73 insertions(+), 50 deletions(-)
>>
>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
>> index 970a9de0ed20..466b1a6e7d7b 100644
>> --- a/include/linux/psp-sev.h
>> +++ b/include/linux/psp-sev.h
>> @@ -848,6 +848,36 @@ int sev_platform_status(struct
>> sev_user_data_status *status, int *error);
>>   int sev_issue_cmd_external_user(struct file *filep, unsigned int id,
>>                   void *data, int *error);
>> +/**
>> + * sev_issue_cmd_external_user_cert - issue SEV command by other
>> driver with a file
>> + * handle and return certificates set onto SEV device via
>> SNP_SET_EXT_CONFIG;
>> + * intended for use by the SNP extended guest request command defined
>> + * in the GHCB specification.
>> + *
>> + * @filep - SEV device file pointer
>> + * @cmd - command to issue
>> + * @data - command buffer
>> + * @vaddr: address where the certificate blob need to be copied.
>> + * @npages: number of pages for the certificate blob.
>> + *    If the specified page count is less than the certificate blob
>> size, then the
>> + *    required page count is returned with ENOSPC error code.
>> + *    If the specified page count is more than the certificate blob
>> size, then
>> + *    page count is updated to reflect the amount of valid data
>> copied in the
>> + *    vaddr.
>> + *
>> + * @error: SEV command return code
>> + *
>> + * Returns:
>> + * 0 if the sev successfully processed the command
>> + * -%ENODEV    if the sev device is not available
>> + * -%ENOTSUPP  if the sev does not support SEV
>> + * -%ETIMEDOUT if the sev command timed out
>> + * -%EIO       if the sev returned a non-zero return code
>> + * -%ENOSPC    if the specified page count is too small
>> + */
>> +int sev_issue_cmd_external_user_cert(struct file *filep, unsigned int
>> cmd, void *data,
>> +                     unsigned long vaddr, unsigned long *npages, int
>> *error);
>> +
>>   /**
>>    * sev_guest_deactivate - perform SEV DEACTIVATE command
>>    *
>> @@ -945,32 +975,6 @@ void snp_free_firmware_page(void *addr);
>>    */
>>   void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
>> -/**
>> - * snp_guest_ext_guest_request - perform the SNP extended guest
>> request command
>> - *  defined in the GHCB specification.
>> - *
>> - * @data: the input guest request structure
>> - * @vaddr: address where the certificate blob need to be copied.
>> - * @npages: number of pages for the certificate blob.
>> - *    If the specified page count is less than the certificate blob
>> size, then the
>> - *    required page count is returned with error code defined in the
>> GHCB spec.
>> - *    If the specified page count is more than the certificate blob
>> size, then
>> - *    page count is updated to reflect the amount of valid data
>> copied in the
>> - *    vaddr.
>> - *
>> - * @sev_ret: sev command return code
>> - *
>> - * Returns:
>> - * 0 if the sev successfully processed the command
>> - * -%ENODEV    if the sev device is not available
>> - * -%ENOTSUPP  if the sev does not support SEV
>> - * -%ETIMEDOUT if the sev command timed out
>> - * -%EIO       if the sev returned a non-zero return code
>> - */
>> -int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>> -                unsigned long vaddr, unsigned long *npages,
>> -                unsigned long *error);
>> -
>>   #else    /* !CONFIG_CRYPTO_DEV_SP_PSP */
>>   static inline int
>> @@ -1013,9 +1017,9 @@ static inline void
>> *snp_alloc_firmware_page(gfp_t mask)
>>   static inline void snp_free_firmware_page(void *addr) { }
>> -static inline int snp_guest_ext_guest_request(struct
>> sev_data_snp_guest_request *data,
>> -                          unsigned long vaddr, unsigned long *n,
>> -                          unsigned long *error)
>> +static inline int sev_issue_cmd_external_user_cert(struct file
>> *filep, unsigned int cmd,
>> +                           void *data, unsigned long vaddr,
>> +                           unsigned long *npages, int *error)
>>   {
>>       return -ENODEV;
>>   }
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index d0e58cffd1ed..b268c35efab4 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -394,6 +394,23 @@ static int sev_issue_cmd(struct kvm *kvm, int id,
>> void *data, int *error)
>>       return __sev_issue_cmd(sev->fd, id, data, error);
>>   }
>> +static int sev_issue_cmd_cert(struct kvm *kvm, int id, void *data,
>> +                  unsigned long vaddr, unsigned long *npages, int
>> *error)
>> +{
>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +    struct fd f;
>> +    int ret;
>> +
>> +    f = fdget(sev->fd);
>> +    if (!f.file)
>> +        return -EBADF;
>> +
>> +    ret = sev_issue_cmd_external_user_cert(f.file, id, data, vaddr,
>> npages, error);
>> +
>> +    fdput(f);
>> +    return ret;
>> +}
>> +
>>   static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>   {
>>       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> @@ -3587,11 +3604,11 @@ static void snp_cleanup_guest_buf(struct
>> sev_data_snp_guest_request *data, unsig
>>       int ret;
>>       ret = snp_page_reclaim(pfn);
>> -    if (ret)
>> +    if (ret && (*rc == SEV_RET_SUCCESS))
>>           *rc = SEV_RET_INVALID_ADDRESS;
>>       ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>> -    if (ret)
>> +    if (ret && (*rc == SEV_RET_SUCCESS))
>>           *rc = SEV_RET_INVALID_ADDRESS;
>>   }
>
> I believe we need to fix this as per the GHCB specifications.
>
> As per GHCB 2.0 specifications:
>
> SW_EXITINFO2
> ...
> State from Hypervisor: Upper 32 bits (63:32) will contain the return code
> from the hypervisor. Lower 32 bits (31:0) will contain the return code
> from the firmware call (0 = success).
>
> So I believe the FW error code (the FW error code from SNP_GUEST_REQUEST,
> i.e. *rc here) should be contained in the lower 32 bits, while the error
> code returned due to a response buffer page reclaim failure and/or a
> failure to transition those pages back to the shared state is essentially
> a hypervisor (error) return code, and that should be returned in the
> upper 32 bits of the exitinfo.
>
> There is work in progress to check conformance of SNP v7 patches to GHCB
> 2.0 specifications, so probably this fix can be included as part of
> those patches.

Yes, please :)


>
>> @@ -3638,8 +3655,9 @@ static void snp_handle_ext_guest_request(struct
>> vcpu_svm *svm, gpa_t req_gpa, gp
>>       struct kvm *kvm = vcpu->kvm;
>>       unsigned long data_npages;
>>       struct kvm_sev_info *sev;
>> -    unsigned long rc, err;
>
> This needs to be looked at more carefully. The SEV firmware status code
> is defined as 32-bit, but is being handled as unsigned long in the
> KVM/SNP code and as int in the CCP driver. So this needs to be fixed
> consistently across,

Ultimately it should be an explicit u32 in SEV and a u64 in GHCB, because
PSP and GHCB are binary interfaces and the sizes should be explicit. Error
codes between KVM and CCP can be anything (unsigned long, u64), as it is
all the same binary.
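
(A sketch of the explicit-width split being argued for; the helper is
illustrative, not from the patch.)

/* The PSP status is a fixed 32-bit value, while the GHCB's sw_exit_info_2
 * is 64-bit, so SNP_GUEST_REQ_INVALID_LEN == BIT_ULL(32) can only exist
 * at the GHCB layer and can never be confused with a firmware error. */
static inline u64 ghcb_exitcode_from_psp(u32 psp_err, bool buf_too_small)
{
	if (buf_too_small)
		return BIT_ULL(32);	/* SNP_GUEST_REQ_INVALID_LEN */

	return psp_err;			/* fits in the low 32 bits */
}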

> snp_setup_guest_buf() return value will need to be
> fixed accordingly.
>
>> +    unsigned long exitcode;
>>       u64 data_gpa;
>> +    int err, rc;
>>       if (!sev_snp_guest(vcpu->kvm)) {
>>           rc = SEV_RET_INVALID_GUEST;
>> @@ -3669,17 +3687,16 @@ static void
>> snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
>>        */
>>       if (sev->snp_certs_len) {
>>           if ((data_npages << PAGE_SHIFT) < sev->snp_certs_len) {
>> -            rc = -EINVAL;
>> -            err = SNP_GUEST_REQ_INVALID_LEN;
>> +            rc = -ENOSPC;
>
> Why do we need to introduce ENOSPC error code?

To distinguish it from other errors and return SNP_GUEST_REQ_INVALID_LEN
when needed (the commit log mentions this).


> If we continue to use SNP_GUEST_REQ_INVALID_LEN we don't need to map
> ENOSPC to SNP_GUEST_REQ_INVALID_LEN below.
> And the CCP driver can return SNP_GUEST_REQ_INVALID_LEN as earlier via
> the fw_err parameter.

IMHO this is a bad idea.

SNP_GUEST_REQ_INVALID_LEN is defined in the GHCB spec, and the GHCB sits
between KVM and the VM; /dev/sev is neither GHCB nor KVM. err here is for
the firmware errors, but SNP_GUEST_REQ_INVALID_LEN is not from the
firmware, and for not-from-the-firmware errors we already have "return
rc", so let's just use that. Also, err is 32-bit all over the place, in
things like sev_issue_cmd(), and then there is this ugly cast to int*.
Thanks,


>
> Thanks,
> Ashish
>
>>               goto datalen;
>>           }
>> -        rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
>> -                   (int *)&err);
>> +        rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req, &err);
>>       } else {
>> -        rc = snp_guest_ext_guest_request(&req,
>> -                         (unsigned long)sev->snp_certs_data,
>> -                         &data_npages, &err);
>> +        rc = sev_issue_cmd_cert(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
>> +                    (unsigned long)sev->snp_certs_data,
>> +                    &data_npages, &err);
>>       }
>> +
>>   datalen:
>>       if (sev->snp_certs_len)
>>           data_npages = sev->snp_certs_len >> PAGE_SHIFT;
>> @@ -3689,27 +3706,30 @@ static void
>> snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
>>            * If buffer length is small then return the expected
>>            * length in rbx.
>>            */
>> -        if (err == SNP_GUEST_REQ_INVALID_LEN)
>> +        if (rc == -ENOSPC) {
>>               vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
>> +            exitcode = SNP_GUEST_REQ_INVALID_LEN;
>> +            goto cleanup;
>> +        }
>>           /* pass the firmware error code */
>> -        rc = err;
>> +        exitcode = err;
>>           goto cleanup;
>>       }
>>       /* Copy the certificate blob in the guest memory */
>>       if (data_npages &&
>>           kvm_write_guest(kvm, data_gpa, sev->snp_certs_data,
>> data_npages << PAGE_SHIFT))
>> -        rc = SEV_RET_INVALID_ADDRESS;
>> +        exitcode = SEV_RET_INVALID_ADDRESS;
>>   cleanup:
>> -    snp_cleanup_guest_buf(&req, &rc);
>> +    snp_cleanup_guest_buf(&req, &exitcode);
>>   unlock:
>>       mutex_unlock(&sev->guest_req_lock);
>>   e_fail:
>> -    svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>> +    svm_set_ghcb_sw_exit_info_2(vcpu, exitcode);
>>   }
>>   static kvm_pfn_t gfn_to_pfn_restricted(struct kvm *kvm, gfn_t gfn)
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 6c4fdcaed72b..73f56c20255c 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -2070,8 +2070,8 @@ int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64
>> src_pfn, u64 dst_pfn, int *erro
>>   }
>>   EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
>> -int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>> -                unsigned long vaddr, unsigned long *npages, unsigned
>> long *fw_err)
>> +int sev_issue_cmd_external_user_cert(struct file *filep, unsigned int
>> cmd, void *data,
>> +                     unsigned long vaddr, unsigned long *npages, int
>> *error)
>>   {
>>       unsigned long expected_npages;
>>       struct sev_device *sev;
>> @@ -2093,12 +2093,11 @@ int snp_guest_ext_guest_request(struct
>> sev_data_snp_guest_request *data,
>>       expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
>>       if (*npages < expected_npages) {
>>           *npages = expected_npages;
>> -        *fw_err = SNP_GUEST_REQ_INVALID_LEN;
>>           mutex_unlock(&sev->snp_certs_lock);
>> -        return -EINVAL;
>> +        return -ENOSPC;
>>       }
>> -    rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
>> +    rc = sev_issue_cmd_external_user(filep, cmd, data, error);
>>       if (rc) {
>>           mutex_unlock(&sev->snp_certs_lock);
>>           return rc;
>> @@ -2115,7 +2114,7 @@ int snp_guest_ext_guest_request(struct
>> sev_data_snp_guest_request *data,
>>       mutex_unlock(&sev->snp_certs_lock);
>>       return rc;
>>   }
>> -EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
>> +EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user_cert);
>>   static void sev_exit(struct kref *ref)
>>   {
>>

--
Alexey

2023-02-08 16:32:45

by Liam Merwick

[permalink] [raw]
Subject: Re: [PATCH RFC v7 16/64] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On 14/12/2022 19:40, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
> hypervisor will use the instruction to add pages to the RMP table. See
> APM3 for details on the instruction operations.
>
> The PSMASH instruction expands a 2MB RMP entry into a corresponding set
> of contiguous 4KB-Page RMP entries. The hypervisor will use this
> instruction to adjust the RMP entry without invalidating the previous
> RMP entry.
>
> Add the following external interface API functions:
>
> int psmash(u64 pfn);
> psmash is used to smash a 2MB aligned page into 4K
> pages while preserving the Validated bit in the RMP.
>
> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> Used to assign a page to a guest using the RMPUPDATE instruction.
>
> int rmp_make_shared(u64 pfn, enum pg_level level);
> Used to transition a page to hypervisor/shared state using the RMPUPDATE instruction.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> [mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 24 ++++++++++
> arch/x86/kernel/sev.c | 95 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 119 insertions(+)
>

...

> +
> +static int rmpupdate(u64 pfn, struct rmp_state *val)
> +{
> + unsigned long paddr = pfn << PAGE_SHIFT;
> + int retries = 0;
> + int ret;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return -ENXIO;
> +
> +retry:
> + /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> + : "=a"(ret)
> + : "a"(paddr), "c"((unsigned long)val)
> + : "memory", "cc");
> +
> + if (ret) {
> + if (!retries) {
> + pr_err("RMPUPDATE failed, ret: %d, pfn: %llx, npages: %d, level: %d, retrying (max: %d)...\n",
> + ret, pfn, npages, level, 2 * num_present_cpus());

This patch isn't bisectable - 'npages' isn't defined in this patch -
it's defined later, in patch 18.
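
(A bisectable variant of the quoted message would restrict itself to
variables this patch defines, e.g. the sketch below.)

pr_err("RMPUPDATE failed, ret: %d, pfn: %llx, retrying (max: %d)...\n",
       ret, pfn, 2 * num_present_cpus());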

otherwise LGTM

Regards,
Liam


> + dump_stack();
> + }
> + retries++;
> + if (retries < 2 * num_present_cpus())
> + goto retry;
> + } else if (retries > 0) {
> + pr_err("RMPUPDATE for pfn %llx succeeded after %d retries\n", pfn, retries);
> + }
> +
> + return ret;
> +}


2023-02-08 21:51:03

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH kernel] KVM: SVM: Fix SVM_VMGEXIT_EXT_GUEST_REQUEST to follow the rest of API

Hello Alexey,

On 2/6/2023 7:24 PM, Alexey Kardashevskiy wrote:
>
>
> On 07/02/2023 08:57, Kalra, Ashish wrote:
>> On 2/5/2023 9:13 PM, Alexey Kardashevskiy wrote:
>>> When SVM VM is up, KVM uses sev_issue_cmd_external_user() with an open
>>> /dev/sev fd which ensures that the SVM initialization was done
>>> correctly.
>>> The only helper not following the scheme is
>>> snp_guest_ext_guest_request()
>>> which bypasses the fd check.
>>>
>>> Change the SEV API to require passing a file.
>>>
>>> Handle errors with care in the SNP Extended Guest Request handler
>>> (snp_handle_ext_guest_request()) as there are actually 3 types of
>>> errors:
>>> - @rc: return code SEV device's sev_issue_cmd() which is int==int32;
>>> - @err: a psp return code in sev_issue_cmd(), also int==int32 (probably
>>> a mistake but kvm_sev_cmd::error uses __u32 for some time now);
>>> - (added by this) @exitcode: GHCB's exit code sw_exit_info_2, uint64.
>>>
>>> Use the right types, remove cast to int* and return ENOSPC from SEV
>>> device for converting it to the GHCB's exit code
>>> SNP_GUEST_REQ_INVALID_LEN==BIT(32).
>>>
>>> Fixes: 17f1d0c995ac ("KVM: SVM: Provide support for SNP_GUEST_REQUEST
>>> NAE event")
>>> While at this, preserve the original error in snp_cleanup_guest_buf().
>>>
>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>> ---
>>>
>>> This can easily be squashed into what it fixes.
>>>
>>> The patch is made for
>>> https://github.com/AMDESE/linux/commits/upmv10-host-snp-v7-rfc
>>> ---
>>>   include/linux/psp-sev.h      | 62 +++++++++++---------
>>>   arch/x86/kvm/svm/sev.c       | 50 +++++++++++-----
>>>   drivers/crypto/ccp/sev-dev.c | 11 ++--
>>>   3 files changed, 73 insertions(+), 50 deletions(-)
>>>
>>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
>>> index 970a9de0ed20..466b1a6e7d7b 100644
>>> --- a/include/linux/psp-sev.h
>>> +++ b/include/linux/psp-sev.h
>>> @@ -848,6 +848,36 @@ int sev_platform_status(struct
>>> sev_user_data_status *status, int *error);
>>>   int sev_issue_cmd_external_user(struct file *filep, unsigned int id,
>>>                   void *data, int *error);
>>> +/**
>>> + * sev_issue_cmd_external_user_cert - issue SEV command by other
>>> driver with a file
>>> + * handle and return certificates set onto SEV device via
>>> SNP_SET_EXT_CONFIG;
>>> + * intended for use by the SNP extended guest request command defined
>>> + * in the GHCB specification.
>>> + *
>>> + * @filep - SEV device file pointer
>>> + * @cmd - command to issue
>>> + * @data - command buffer
>>> + * @vaddr: address where the certificate blob need to be copied.
>>> + * @npages: number of pages for the certificate blob.
>>> + *    If the specified page count is less than the certificate blob
>>> size, then the
>>> + *    required page count is returned with ENOSPC error code.
>>> + *    If the specified page count is more than the certificate blob
>>> size, then
>>> + *    page count is updated to reflect the amount of valid data
>>> copied in the
>>> + *    vaddr.
>>> + *
>>> + * @error: SEV command return code
>>> + *
>>> + * Returns:
>>> + * 0 if the sev successfully processed the command
>>> + * -%ENODEV    if the sev device is not available
>>> + * -%ENOTSUPP  if the sev does not support SEV
>>> + * -%ETIMEDOUT if the sev command timed out
>>> + * -%EIO       if the sev returned a non-zero return code
>>> + * -%ENOSPC    if the specified page count is too small
>>> + */
>>> +int sev_issue_cmd_external_user_cert(struct file *filep, unsigned
>>> int cmd, void *data,
>>> +                     unsigned long vaddr, unsigned long *npages, int
>>> *error);
>>> +
>>>   /**
>>>    * sev_guest_deactivate - perform SEV DEACTIVATE command
>>>    *
>>> @@ -945,32 +975,6 @@ void snp_free_firmware_page(void *addr);
>>>    */
>>>   void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
>>> -/**
>>> - * snp_guest_ext_guest_request - perform the SNP extended guest
>>> request command
>>> - *  defined in the GHCB specification.
>>> - *
>>> - * @data: the input guest request structure
>>> - * @vaddr: address where the certificate blob need to be copied.
>>> - * @npages: number of pages for the certificate blob.
>>> - *    If the specified page count is less than the certificate blob
>>> size, then the
>>> - *    required page count is returned with error code defined in the
>>> GHCB spec.
>>> - *    If the specified page count is more than the certificate blob
>>> size, then
>>> - *    page count is updated to reflect the amount of valid data
>>> copied in the
>>> - *    vaddr.
>>> - *
>>> - * @sev_ret: sev command return code
>>> - *
>>> - * Returns:
>>> - * 0 if the sev successfully processed the command
>>> - * -%ENODEV    if the sev device is not available
>>> - * -%ENOTSUPP  if the sev does not support SEV
>>> - * -%ETIMEDOUT if the sev command timed out
>>> - * -%EIO       if the sev returned a non-zero return code
>>> - */
>>> -int snp_guest_ext_guest_request(struct sev_data_snp_guest_request
>>> *data,
>>> -                unsigned long vaddr, unsigned long *npages,
>>> -                unsigned long *error);
>>> -
>>>   #else    /* !CONFIG_CRYPTO_DEV_SP_PSP */
>>>   static inline int
>>> @@ -1013,9 +1017,9 @@ static inline void
>>> *snp_alloc_firmware_page(gfp_t mask)
>>>   static inline void snp_free_firmware_page(void *addr) { }
>>> -static inline int snp_guest_ext_guest_request(struct
>>> sev_data_snp_guest_request *data,
>>> -                          unsigned long vaddr, unsigned long *n,
>>> -                          unsigned long *error)
>>> +static inline int sev_issue_cmd_external_user_cert(struct file
>>> *filep, unsigned int cmd,
>>> +                           void *data, unsigned long vaddr,
>>> +                           unsigned long *npages, int *error)
>>>   {
>>>       return -ENODEV;
>>>   }
>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>> index d0e58cffd1ed..b268c35efab4 100644
>>> --- a/arch/x86/kvm/svm/sev.c
>>> +++ b/arch/x86/kvm/svm/sev.c
>>> @@ -394,6 +394,23 @@ static int sev_issue_cmd(struct kvm *kvm, int
>>> id, void *data, int *error)
>>>       return __sev_issue_cmd(sev->fd, id, data, error);
>>>   }
>>> +static int sev_issue_cmd_cert(struct kvm *kvm, int id, void *data,
>>> +                  unsigned long vaddr, unsigned long *npages, int
>>> *error)
>>> +{
>>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>> +    struct fd f;
>>> +    int ret;
>>> +
>>> +    f = fdget(sev->fd);
>>> +    if (!f.file)
>>> +        return -EBADF;
>>> +
>>> +    ret = sev_issue_cmd_external_user_cert(f.file, id, data, vaddr,
>>> npages, error);
>>> +
>>> +    fdput(f);
>>> +    return ret;
>>> +}
>>> +
>>>   static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>   {
>>>       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>> @@ -3587,11 +3604,11 @@ static void snp_cleanup_guest_buf(struct
>>> sev_data_snp_guest_request *data, unsig
>>>       int ret;
>>>       ret = snp_page_reclaim(pfn);
>>> -    if (ret)
>>> +    if (ret && (*rc == SEV_RET_SUCCESS))
>>>           *rc = SEV_RET_INVALID_ADDRESS;
>>>       ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>>> -    if (ret)
>>> +    if (ret && (*rc == SEV_RET_SUCCESS))
>>>           *rc = SEV_RET_INVALID_ADDRESS;
>>>   }
>>
>> I believe we need to fix this as per the GHCB specifications.
>>
>> As per the GHCB 2.0 specification:
>>
>> SW_EXITINFO2
>> ...
>> State from Hypervisor: Upper 32-bits (63:32) will contain the return
>> code from the hypervisor. Lower 32-bits (31:0) will contain the
>> return code from the firmware call (0 = success)
>>
>> So I believe the FW error code (which is the FW error code from
>> SNP_GUEST_REQUEST, or *rc here) should be contained in the lower
>> 32 bits, while the error code returned due to a response buffer page
>> reclaim failure and/or a failure to transition these pages back to
>> shared state is essentially a hypervisor (error) return code, and that
>> should be returned in the upper 32 bits of the exitinfo.
>>
>> There is work in progress to check conformance of the SNP v7 patches
>> to the GHCB 2.0 specification, so this fix can probably be included as
>> part of those patches.
>
> Yes, please :)
>

Yes, will address this in the GHCB spec conformance patch-set for SNP,
as per the following revision of the GHCB specs:

The SNP Guest Request and SNP Extended Guest Request have been updated
to expand on the use of the SW_EXITINFO2 return value to better allow
for the hypervisor to return error codes.
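
As a rough sketch of that SW_EXITINFO2 split (hypervisor return code in
bits 63:32, firmware return code in bits 31:0), the packing could look
like the following; the helper name is hypothetical, not something from
the patchset:

	/*
	 * Pack SW_EXITINFO2 per the GHCB 2.0 wording quoted above:
	 * hypervisor return code in bits 63:32, firmware return code
	 * in bits 31:0 (0 = success).
	 */
	static inline u64 snp_make_sw_exitinfo2(u32 hv_rc, u32 fw_rc)
	{
		return ((u64)hv_rc << 32) | fw_rc;
	}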

>
>>
>>> @@ -3638,8 +3655,9 @@ static void snp_handle_ext_guest_request(struct
>>> vcpu_svm *svm, gpa_t req_gpa, gp
>>>       struct kvm *kvm = vcpu->kvm;
>>>       unsigned long data_npages;
>>>       struct kvm_sev_info *sev;
>>> -    unsigned long rc, err;
>>
>> This needs to be looked at more carefully. The SEV firmware status
>> code is defined as 32-bit, but is being handled as unsigned long in
>> the KVM/SNP code and as int in the CCP driver. So this needs to be
>> fixed consistently across the board.
>
> Ultimately it should be explicit u32 in SEV and u64 in GHCB because PSP
> and GHCB are binary interfaces and the sizes should be explicit. Error
> codes between KVM and CCP can be anything (unsigned long, u64) as it is
> the same binary.
>

Again, as the lower 32 bits (31:0) of SW_EXITINFO2 are supposed to be set
to the return code from the firmware, this should also be u32 in GHCB
and the same in the KVM/SNP code.

>> The snp_setup_guest_buf() return value will need to be
>> fixed accordingly.
>>
>>> +    unsigned long exitcode;
>>>       u64 data_gpa;
>>> +    int err, rc;
>>>       if (!sev_snp_guest(vcpu->kvm)) {
>>>           rc = SEV_RET_INVALID_GUEST;
>>> @@ -3669,17 +3687,16 @@ static void
>>> snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
>>>        */
>>>       if (sev->snp_certs_len) {
>>>           if ((data_npages << PAGE_SHIFT) < sev->snp_certs_len) {
>>> -            rc = -EINVAL;
>>> -            err = SNP_GUEST_REQ_INVALID_LEN;
>>> +            rc = -ENOSPC;
>>
>> Why do we need to introduce ENOSPC error code?
>
> To distinguish it from other errors and return SNP_GUEST_REQ_INVALID_LEN
> when needed (the commit log mentions this).
>
>
>> If we continue to use SNP_GUEST_REQ_INVALID_LEN we don't need to map
>> ENOSPC to SNP_GUEST_REQ_INVALID_LEN below.
>> And the CCP driver can return SNP_GUEST_REQ_INVALID_LEN as earlier via
>> the fw_err parameter.
>
> imho this is a bad idea.
>
> SNP_GUEST_REQ_INVALID_LEN is defined in the GHCB spec, and GHCB is
> between KVM and the VM; /dev/sev is neither GHCB nor KVM. err here is
> for firmware errors, but SNP_GUEST_REQ_INVALID_LEN is not from the
> firmware, and for not-from-the-firmware errors we already have "return
> rc", so let's just use that. Also, err is 32-bit all over the place, in
> things like sev_issue_cmd(), and then there is this ugly cast to int*. Thanks,
>

Ok, that does make sense.

Thanks,
Ashish
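
For reference, a minimal sketch of the mapping agreed on above, reusing
the names from Alexey's patch in this thread (illustrative only, not the
final code; the surrounding setup is elided):

	rc = sev_issue_cmd_cert(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data,
				vaddr, &data_npages, &err);
	if (rc == -ENOSPC) {
		/*
		 * Certificate buffer too small: not a firmware error, so
		 * report the GHCB-defined exit code and let the guest
		 * retry with the required data_npages.
		 */
		exitcode = SNP_GUEST_REQ_INVALID_LEN;
	}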

2023-02-20 16:41:41

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 03/64] KVM: SVM: Advertise private memory support to KVM

On Fri, Jan 20, 2023 at 09:20:30PM +0000, Jarkko Sakkinen wrote:
> On Wed, Jan 04, 2023 at 08:14:19PM -0600, Michael Roth wrote:
> > On Fri, Dec 23, 2022 at 05:56:50PM +0100, Borislav Petkov wrote:
> > > On Wed, Dec 14, 2022 at 01:39:55PM -0600, Michael Roth wrote:
> > > > + bool (*private_mem_enabled)(struct kvm *kvm);
> > >
> > > This looks like a function returning boolean to me. IOW, you can
> > > simplify this to:
> >
> > The semantics and existing uses of KVM_X86_OP_OPTIONAL_RET0() gave me the
> > impression it needed to return an integer value, since by default if a
> > platform doesn't implement the op it would "return 0", and so could
> > still be called unconditionally.
> >
> > Maybe that's not actually enforced, but it seems awkward to try to use a
> > bool return instead. At least for KVM_X86_OP_OPTIONAL_RET0().
> >
> > However, we could just use KVM_X86_OP() to declare it so we can cleanly
> > use a function that returns bool, and then we just need to do:
> >
> > bool kvm_arch_has_private_mem(struct kvm *kvm)
> > {
> > if (kvm_x86_ops.private_mem_enabled)
> > return static_call(kvm_x86_private_mem_enabled)(kvm);
>
> I guess this is missing:
>
> return false;
>
> > }
> >
> > instead of relying on default return value. So I'll take that approach
> > and adopt your other suggested changes.
> >
> > ...
> >
> > On a separate topic though, at a high level, this hook is basically a way
> > for platform-specific code to tell generic KVM code that private memslots
> > are supported by overriding the kvm_arch_has_private_mem() weak
> > reference. In this case the AMD platform is using the kvm->arch.upm_mode
> > flag to convey that, which is in turn set by the
> > KVM_CAP_UNMAPPED_PRIVATE_MEMORY introduced in this series.
> >
> > But if, as I suggested in response to your PATCH 2 comments, we drop
> > KVM_CAP_UNMAPPED_PRIVATE_MEMORY in favor of the
> > KVM_SET_SUPPORTED_MEMORY_ATTRIBUTES ioctl to enable "UPM mode" in SEV/SNP
> > code, then we need to rethink things a bit, since KVM_SET_MEMORY_ATTRIBUTES
> > in-part relies on kvm_arch_has_private_mem() to determine what flags are
> > supported, whereas SEV/SNP code would be using what was set by
> > KVM_SET_MEMORY_ATTRIBUTES to determine the return value in
> > kvm_arch_has_private_mem().
>
> Does this mean that internal calls to kvm_vm_set_region_attr() will
> cease to exist, and it will rely on user space to use the ioctl
> properly instead?

Patches 1-3 are no longer needed and have been dropped for v8, instead
"UPM mode" is set via KVM_VM_CREATE vm_type arg, and SEV/SNP can simply
call kvm_arch_has_private_mem() to query whether userspace has enabled
UPM mode or not.

But even so, we call kvm_vm_set_region_attr() in
sev_launch_update_data() and snp_launch_update() after copying the
initial payload into private memory.

I don't think there's much worth in making userspace do it via
KVM_SET_MEMORY_ATTRIBUTES afterward. It could be done that way I suppose,
but generally the RMP update from shared->private happens as part of
KVM_SET_MEMORY_ATTRIBUTES, whereas in this case it would necessarily
happen *after* the RMP updates, since SNP_LAUNCH_UPDATE expects the pages
to be marked private beforehand.

Just seems like more corner cases to deal with and more boilerplate code
for userspace, which already needed to operate under the assumption that
pages will be private after SNP_LAUNCH_UPDATE, so it seems to make sense
to just have the memory attributes also updated accordingly.

-Mike
>
> > So, for AMD, the return value of kvm_arch_has_private_mem() needs to rely
> > on something else. Maybe the logic can just be:
> >
> > bool svm_private_mem_enabled(struct kvm *kvm)
> > {
> > return sev_enabled(kvm) || sev_snp_enabled(kvm)
> > }
> >
> > (at least in the context of this patchset where UPM support is added for
> > both SEV and SNP).
> >
> > So I'll plan to make that change as well.
> >
> > -Mike
> >
> > >
> > > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > > index 82ba4a564e58..4449aeff0dff 100644
> > > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > > @@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
> > > KVM_X86_OP(complete_emulated_msr)
> > > KVM_X86_OP(vcpu_deliver_sipi_vector)
> > > KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> > > +KVM_X86_OP_OPTIONAL_RET0(private_mem_enabled);
> > >
> > > #undef KVM_X86_OP
> > > #undef KVM_X86_OP_OPTIONAL
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 1da0474edb2d..1b4b89ddeb55 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1574,6 +1574,7 @@ struct kvm_x86_ops {
> > >
> > > void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > > int root_level);
> > > + bool (*private_mem_enabled)(struct kvm *kvm);
> > >
> > > bool (*has_wbinvd_exit)(void);
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index ce362e88a567..73b780fa4653 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -4680,6 +4680,14 @@ static int svm_vm_init(struct kvm *kvm)
> > > return 0;
> > > }
> > >
> > > +static bool svm_private_mem_enabled(struct kvm *kvm)
> > > +{
> > > + if (sev_guest(kvm))
> > > + return kvm->arch.upm_mode;
> > > +
> > > + return IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING);
> > > +}
> > > +
> > > static struct kvm_x86_ops svm_x86_ops __initdata = {
> > > .name = "kvm_amd",
> > >
> > > @@ -4760,6 +4768,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> > >
> > > .vcpu_after_set_cpuid = svm_vcpu_after_set_cpuid,
> > >
> > > + .private_mem_enabled = svm_private_mem_enabled,
> > > +
> > > .has_wbinvd_exit = svm_has_wbinvd_exit,
> > >
> > > .get_l2_tsc_offset = svm_get_l2_tsc_offset,
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 823646d601db..9a1ca59d36a4 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -12556,6 +12556,11 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
> > > }
> > > EXPORT_SYMBOL_GPL(__x86_set_memory_region);
> > >
> > > +bool kvm_arch_has_private_mem(struct kvm *kvm)
> > > +{
> > > + return static_call(kvm_x86_private_mem_enabled)(kvm);
> > > +}
> > > +
> > > void kvm_arch_pre_destroy_vm(struct kvm *kvm)
> > > {
> > > kvm_mmu_pre_destroy_vm(kvm);
> > >
> > > --
> > > Regards/Gruss,
> > > Boris.
> > >
> > > https://people.kernel.org/tglx/notes-about-netiquette
>
> BR, Jarkko
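
For completeness, Mike's helper with the missing return Jarkko points out
filled in would look roughly like this (illustrative; as noted above, v8
ultimately dropped these patches in favor of the KVM_VM_CREATE vm_type
approach):

	bool kvm_arch_has_private_mem(struct kvm *kvm)
	{
		if (kvm_x86_ops.private_mem_enabled)
			return static_call(kvm_x86_private_mem_enabled)(kvm);

		return false;
	}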

2023-02-20 16:42:23

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 04/64] KVM: x86: Add 'fault_is_private' x86 op

On Fri, Jan 13, 2023 at 03:48:59PM +0000, Sean Christopherson wrote:
> On Fri, Jan 13, 2023, Borislav Petkov wrote:
> > On Wed, Jan 04, 2023 at 08:42:56PM -0600, Michael Roth wrote:
> > > Obviously I need to add some proper documentation for this, but a 1
> > > return basically means 'private_fault' pass-by-ref arg has been set
> > > with the appropriate value, whereas 0 means "there's no platform-specific
> > > handling for this, so if you have some generic way to determine this
> > > then use that instead".
> >
> > Still binary, tho, and can be bool, right?
> >
> > I.e., you can just as well do:
> >
> > if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
> > goto out;
> >
> > at the call site.
>
> Ya. Don't spend too much time trying to make this look super pretty though, there
> are subtle bugs inherited from the base UPM series that need to be sorted out and
> will impact this code. E.g. invoking kvm_mem_is_private() outside of the protection
> of mmu_invalidate_seq means changes to the attributes may not be reflected in the
> page tables.
>
> I'm also hoping we can avoid a callback entirely, though that may prove to be
> more pain than gain. I'm poking at the UPM and testing series right now, will
> circle back to this and TDX in a few weeks to see if there's a sane way to communicate
> shared vs. private without having to resort to a callback, and without having
> races between page faults, KVM_SET_MEMORY_ATTRIBUTES, and KVM_SET_USER_MEMORY_REGION2.

Can circle back on this, but for v8 at least I've kept the callback, though
I've simplified the SVM implementation of it so that it's only needed for
SNP. For protected-SEV it will fall through to the same generic handling
used by UPM
self-tests.

It seems like it's safe to have a callback of that sort here for TDX/SNP (or
whatever we end up replacing the callback with), since the #NPF flags
themselves won't change based on attribute updates, and the subsequent
comparison to kvm_mem_is_private() will happen after mmu_invalidate_seq
is logged.

For protected-SEV and UPM selftests, though, the initial kvm_mem_is_private()
can become stale vs. the one in __kvm_faultin_pfn(), but it seems like ATM
it would only lead to a spurious KVM_EXIT_MEMORY_FAULT, which SEV at least
should treat as an implicit page-state change and be able to recover from.
But yeah, not ideal, and maybe for self-tests that makes it difficult to
tell if things are working as expected or not.

Maybe we should just skip setting fault->is_private here in the
non-TDX/non-SNP cases, and just have some other indicator so it's
initialized/ignored in kvm_mem_is_private() later. I think some iterations
of UPM did it this way prior to 'is_private' becoming const.

>
> > > This is mainly to handle CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, which
> > > just parrots whatever kvm_mem_is_private() returns to support running
> > KVM selftests without the needed hardware/platform support. If we don't
> > > take care to skip this check where the above fault_is_private() hook
> > > returns 1, then it ends up breaking SNP in cases where the kernel has
> > > been compiled with CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, since SNP
> > > relies on the page fault flags to make this determination, not
> > > kvm_mem_is_private(), which normally only tracks the memory attributes
> > > set by userspace via KVM_SET_MEMORY_ATTRIBUTES ioctl.
> >
> > Some of that explanation belongs into the commit message, which is a bit
> > lacking...
>
> I'll circle back to this too when I give this series (and TDX) a proper look,
> there's got to be a better way to handle this.
>

It seems like for SNP/TDX we just need to register the shared/encrypted
bits with the KVM MMU and let it handle checking the #NPF flags, but we
can iterate on that for the next spin when we have a better idea what it
should look like.

-Mike
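
Putting Boris's suggested bool return together with the fall-through
behavior Mike describes, the call site would be shaped roughly like this
(a sketch with hypothetical surrounding variables, not the merged code):

	bool private_fault = false;

	/*
	 * The platform hook (SNP/TDX) derives the answer from the #NPF
	 * error code and returns true if it set private_fault; otherwise
	 * fall back to the generic memory-attributes check.
	 */
	if (!static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
		private_fault = kvm_mem_is_private(kvm, gpa_to_gfn(gpa));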

2023-02-20 16:43:30

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 13/64] x86/cpufeatures: Add SEV-SNP CPU feature

On Wed, Feb 01, 2023 at 07:39:32PM +0100, Borislav Petkov wrote:
> On Wed, Dec 14, 2022 at 01:40:05PM -0600, Michael Roth wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > Add CPU feature detection for Secure Encrypted Virtualization with
> > Secure Nested Paging. This feature adds a strong memory integrity
> > protection to help prevent malicious hypervisor-based attacks like
> > data replay, memory re-mapping, and more.
> >
> > Link: https://lore.kernel.org/all/YrGINaPc3cojG6%[email protected]/
>
> That points to some review feedback I've given - dunno if it is
> relevant.
>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Jarkko Sakkinen <[email protected]>
>
> I read this as Jarkko has handled this patch too. Is that the case?

Yes we shared some patches via an internal tree at some stages.

-Mike

>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > Signed-off-by: Michael Roth <[email protected]>
>
> Those last two are ok - you took over from Ashish.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2023-02-20 17:50:52

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 13/64] x86/cpufeatures: Add SEV-SNP CPU feature

On Mon, Feb 20, 2023 at 10:26:47AM -0600, Michael Roth wrote:
> On Wed, Feb 01, 2023 at 07:39:32PM +0100, Borislav Petkov wrote:
> > On Wed, Dec 14, 2022 at 01:40:05PM -0600, Michael Roth wrote:
> > > From: Brijesh Singh <[email protected]>
> > >
> > > Add CPU feature detection for Secure Encrypted Virtualization with
> > > Secure Nested Paging. This feature adds a strong memory integrity
> > > protection to help prevent malicious hypervisor-based attacks like
> > > data replay, memory re-mapping, and more.
> > >
> > > Link: https://lore.kernel.org/all/YrGINaPc3cojG6%[email protected]/
> >
> > That points to some review feedback I've given - dunno if it is
> > relevant.
> >
> > > Signed-off-by: Brijesh Singh <[email protected]>
> > > Signed-off-by: Jarkko Sakkinen <[email protected]>
> >
> > I read this as Jarkko has handled this patch too. Is that the case?
>
> Yes we shared some patches via an internal tree at some stages.

In the sense that, he took Brijesh's patch, then he did something with
it(?) and then Ashish took it from him and then you took it from Ashish?

This is how I'm reading this SOB chain at least...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-20 18:01:47

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v7 13/64] x86/cpufeatures: Add SEV-SNP CPU feature

On Mon, Feb 20, 2023 at 06:50:09PM +0100, Borislav Petkov wrote:
> On Mon, Feb 20, 2023 at 10:26:47AM -0600, Michael Roth wrote:
> > On Wed, Feb 01, 2023 at 07:39:32PM +0100, Borislav Petkov wrote:
> > > On Wed, Dec 14, 2022 at 01:40:05PM -0600, Michael Roth wrote:
> > > > From: Brijesh Singh <[email protected]>
> > > >
> > > > Add CPU feature detection for Secure Encrypted Virtualization with
> > > > Secure Nested Paging. This feature adds a strong memory integrity
> > > > protection to help prevent malicious hypervisor-based attacks like
> > > > data replay, memory re-mapping, and more.
> > > >
> > > > Link: https://lore.kernel.org/all/YrGINaPc3cojG6%[email protected]/
> > >
> > > That points to some review feedback I've given - dunno if it is
> > > relevant.
> > >
> > > > Signed-off-by: Brijesh Singh <[email protected]>
> > > > Signed-off-by: Jarkko Sakkinen <[email protected]>
> > >
> > > I read this as Jarkko has handled this patch too. Is that the case?
> >
> > Yes we shared some patches via an internal tree at some stages.
>
> In the sense that, he took Brijesh's patch, then he did something with
> it(?) and then Ashish took it from him and then you took it from Ashish?

Yes, I think he rebased Ashish's tree on a newer tree and added his SoB to
patches that required conflict resolution or changes on his end, and we've
kept those intact since then.

-Mike

>
> This is how I'm reading this SOB chain at least...
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2023-02-20 18:36:45

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v7 13/64] x86/cpufeatures: Add SEV-SNP CPU feature

On Mon, Feb 20, 2023 at 12:00:38PM -0600, Michael Roth wrote:
> Yes, I think he rebased Ashish's tree on a newer tree and added his SoB on
> patches that required any conflict resolutions or changes on his end, so
> we kept those intact since then.

Ok, that makes sense.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-15 13:51:10

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH RFC v7 38/64] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command

>
> +/*
> + * The guest context contains all the information, keys and metadata
> + * associated with the guest that the firmware tracks to implement SEV
> + * and SNP features. The firmware stores the guest context in a
> + * hypervisor-provided page via the SNP_GCTX_CREATE command.
> + */
> +static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct sev_data_snp_addr data = {};
> + void *context;
> + int rc;
> +
> + /* Allocate memory for context page */
> + context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> + if (!context)
> + return NULL;
> +
> + data.gctx_paddr = __psp_pa(context);
> + rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> + if (rc) {
> + snp_free_firmware_page(context);
> + return NULL;
> + }
> +
> + return context;
> +}
> +
> +static int snp_bind_asid(struct kvm *kvm, int *error)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_activate data = {0};
> +
> + data.gctx_paddr = __psp_pa(sev->snp_context);
> + data.asid = sev_get_asid(kvm);
> + return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
> +}
> +
> +static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_start start = {0};
> + struct kvm_sev_snp_launch_start params;
> + int rc;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> + return -EFAULT;
> +
> + sev->snp_context = snp_context_create(kvm, argp);
> + if (!sev->snp_context)
> + return -ENOTTY;

This was reported by josheads@. It's possible for userspace to
repeatedly call snp_launch_start(), leaking memory from repeated
snp_context_create() calls, leaking SNP contexts in the ASP, and
leaking ASIDs.

A possible solution could be to just error out if an snp_context
already exists?


+ if (sev->snp_context)
+ return -EINVAL;
+



> +
> + start.gctx_paddr = __psp_pa(sev->snp_context);
> + start.policy = params.policy;
> + memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
> + rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
> + if (rc)
> + goto e_free_context;
> +
> + sev->fd = argp->sev_fd;
> + rc = snp_bind_asid(kvm, &argp->error);
> + if (rc)
> + goto e_free_context;
> +
> + return 0;
> +
> +e_free_context:
> + snp_decommission_context(kvm);
> +
> + return rc;
> +}
> +
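
Folding Peter's suggested guard into the handler quoted above, the top of
the function would look roughly like this (an illustrative sketch, not the
final code; unrelated parts elided):

	static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
	{
		struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
		...

		if (!sev_snp_guest(kvm))
			return -ENOTTY;

		/* Reject repeated calls so SNP contexts and ASIDs can't leak. */
		if (sev->snp_context)
			return -EINVAL;

		sev->snp_context = snp_context_create(kvm, argp);
		if (!sev->snp_context)
			return -ENOTTY;
		...
	}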

2023-05-11 23:04:16

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

Would it be okay to request that we add a KVM stat for how often there
are GUEST_REQUEST_NAE exits? I think it'd be good for service
operators to get a better idea how valued the feature is.

--
-Dionna Glaze, PhD (she/her)

2023-05-11 23:34:26

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On Thu, May 11, 2023, Dionna Amalie Glaze wrote:
> Would it be okay to request that we add a KVM stat for how often there
> are GUEST_REQUEST_NAE exits? I think it'd be good for service
> operators to get a better idea how valued the feature is.

Heh, it's always ok to request something, but sometimes the answer will be no.

And in this case, the answer is likely "no stat for you". A year or so ago, in the
context of us (Google) trying to upstream a pile of stats, we (KVM folks) came to
a rough consensus that KVM should only add upstream stats if they are relatively
generic and (almost) universally useful[*]. IMO, a one-off stat for a specific exit
reason is too narrowly focused, e.g. collecting information on all exit reasons is
superior. And no, that won't be accepted upstream either, because for some environments
gathering detailed information on all exits is too much overhead (also covered in
the link).

FWIW, we (GCE) plan on carrying stats like this in out-of-tree patches, i.e. your
request for stats is likely something that would get accepted internally (if it
isn't already captured through our generic stats collection).

[*] https://lore.kernel.org/all/[email protected]

2023-05-15 16:45:45

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCH RFC v7 52/64] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On Thu, May 11, 2023 at 4:33 PM Sean Christopherson <[email protected]> wrote:
>
> On Thu, May 11, 2023, Dionna Amalie Glaze wrote:
> > Would it be okay to request that we add a KVM stat for how often there
> > are GUEST_REQUEST_NAE exits? I think it'd be good for service
> > operators to get a better idea how valued the feature is.
>
> Heh, it's always ok to request something, but sometimes the answer will be no.
>
> And in this case, the answer is likely "no stat for you". A year or so ago, in the
> context of us (Google) trying to upstream a pile of stats, we (KVM folks) came to
> a rough consensus that KVM should only add upstream stats if they are relatively
> generic and (almost) universally useful[*]. IMO, a one-off stat for a specific exit
> reason is too narrowly focused, e.g. collecting information on all exit reasons is
> superior. And no, that won't be accepted upstream either, because for some environments
> gathering detailed information on all exits is too much overhead (also covered in
> the link).
>
> FWIW, we (GCE) plan on carrying stats like this in out-of-tree patches, i.e. your
> request for stats is likely something that would get accepted internally (if it
> isn't already captured through our generic stats collection).
>
> [*] https://lore.kernel.org/all/[email protected]

Thanks Sean, noted :)

--
-Dionna Glaze, PhD (she/her)