This patchset is also available at:
https://github.com/amdese/linux/commits/snp-host-v11
and is based on top of the following series:
"[PATCH v1] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support"
https://lore.kernel.org/kvm/[email protected]/
which in turn is based on linux-next tag next-20231222
The host initialization patches have been split off to a separate series
as noted above to more easily shepherd them into tip/x86, while this series
now focuses on KVM support. Additionally, the gmem RFC[1] that this series
was previously based on is now included here for better visibility. Please
see the RFC link for more background on why the gmem changes/hooks are
implemented in their current form.
[1] https://lore.kernel.org/kvm/[email protected]/
== OVERVIEW ==
This patchset implements SEV-SNP hypervisor support for Linux. It
relies on the SNP host initialization patches noted above, which were
included in this series up until v10 but are now posted separately for
inclusion into the tip/x86 tree, while this series is targeted at the KVM
x86 tree. Both patchsets are based on linux-next to hopefully make it
easier to coordinate and test against tip and kvm-next.
== TESTING ==
For testing this via QEMU, use the following tree:
https://github.com/amdese/qemu/commits/snp-v3-wip
SEV-SNP with gmem enabled:
qemu-system-x86_64 -cpu EPYC-Milan-v2 \
-object memory-backend-memfd,id=ram1,size=2G,share=true,prealloc=false,reserve=false \
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
-machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
...
KVM selftests for guest_memfd / KVM_GENERIC_PRIVATE_MEM:
cd $kernel_src_dir
make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I<path to kernel headers>"
sudo tools/testing/selftests/kvm/x86_64/private_mem_conversions_test
== BACKGROUND (SEV-SNP) ==
This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required to add KVM support for SEV-SNP. The series builds upon
SEV-SNP Guest Support, now part of mainline, and a separate series that
implements basic host initialization requirements for SNP-enabled systems.
This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.
The CCP driver is enhanced to provide new APIs that use the SEV-SNP-specific
commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage the SEV-SNP guests.
The GHCB specification version 2 introduces a new set of NAE events that
are used by the SEV-SNP guest to communicate with the hypervisor. The series
provides support to handle the following new NAE events:
- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request
When pages are marked as guest-owned in the RMP table, they are assigned
to a specific guest/ASID, as well as a specific GFN within the guest. Any
attempt to map the page in the RMP table to a different guest/ASID, or a
different GFN within a guest/ASID, will result in an RMP nested page fault.
Prior to accessing a guest-owned page, the guest must validate it with a
special PVALIDATE instruction which will set a special bit in the RMP table
for the guest. This is the only way to set the validated bit outside of the
initial pre-encrypted guest payload/image; any attempt outside the guest to
modify the RMP entry from that point forward will result in the validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page, so it can be made aware of possible tampering.
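For reference, the guest-side validation step looks roughly like this (not
part of this series; pvalidate() and the related definitions come from the
existing mainline SEV-SNP guest support, and the error handling here is
simplified):

  /* Sketch: validate a single 4K guest page after it has been made private. */
  static int snp_validate_4k_page(unsigned long vaddr)
  {
          int rc;

          rc = pvalidate(vaddr, RMP_PG_SIZE_4K, true);
          if (rc == PVALIDATE_FAIL_SIZEMISMATCH)
                  return -EINVAL; /* RMP entry is 2M; re-validate at 2M instead */

          return rc ? -EIO : 0;
  }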
One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.
After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.
In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits; userspace
then issues the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
private/shared state in the KVM MMU.
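For reference, the userspace side of a private conversion then looks roughly
like the following (illustrative only; struct kvm_memory_attributes and
KVM_MEMORY_ATTRIBUTE_PRIVATE are part of the upstream guest_memfd UAPI, and
the exact KVM_EXIT_VMGEXIT layout is documented in the patches below):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Mark [gpa, gpa + size) as private so the KVM MMU backs it with gmem. */
  static int set_range_private(int vm_fd, __u64 gpa, __u64 size)
  {
          struct kvm_memory_attributes attrs = {
                  .address    = gpa,
                  .size       = size,
                  .attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
          };

          return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
  }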
The gmem / KVM MMU hooks implemented in this series then update the RMP
table entries for the backing PFNs, setting them to guest-owned/private when
mapping private pages into the guest via the KVM MMU. Shared pages use the
normal KVM MMU handling, and their corresponding RMP table entries are left
in the default shared/hypervisor-owned state.
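At its core, the private-page hook boils down to something like the
following (heavily simplified sketch; the actual hook in this series also
deals with hugepage levels and failure/reclaim paths, and rmp_make_private()
is provided by the SNP host initialization series above):

  /*
   * Sketch only: assign the backing PFN to this guest's ASID at the given
   * GFN so that only this guest can map/access it.
   */
  static int snp_make_pfn_private(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn)
  {
          struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

          return rmp_make_private(pfn, gfn_to_gpa(gfn), PG_LEVEL_4K,
                                  sev->asid, false);
  }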
Feedback/review is very much appreciated!
-Mike
Changes since v10:
* Split off host initialization patches to separate series
* Drop SNP_{SET,GET}_EXT_CONFIG SEV ioctls, and drop
KVM_SEV_SNP_{SET,GET}_CERTS KVM ioctls. Instead, all certificate data is
now fetched from userspace as part of a new KVM_EXIT_VMGEXIT event type.
(Sean, Dionna)
* SNP_SET_EXT_CONFIG is now replaced with a more basic SNP_SET_CONFIG,
which is just a light wrapper around the SNP_CONFIG firmware command,
and SNP_GET_EXT_CONFIG is now redundant with the existing SNP_PLATFORM_STATUS,
so just stick with that interface.
* Introduce SNP_SET_CONFIG_{START,END}, which can be used to pause extended
guest requests while reported TCB / certificates are being updated so
the updates are done atomically relative to running guests.
* Improve documentation for KVM_EXIT_VMGEXIT event types and tighten down
the expected input/output for union types rather than exposing GHCB
page/MSR
* Various re-factorings, commit/comments fixups (Boris, Liam, Vlastimil)
* Make CONFIG_KVM_AMD_SEV depend on KVM_GENERIC_PRIVATE_MEM instead of
CONFIG_KVM_SW_PROTECTED_VM (Paolo)
* Include Sean's patch to add hugepage support to gmem, but modify it based
on discussions to be best-effort and not rely on an explicit flag
Changes since v9:
* Split off gmem changes to separate RFC series, drop RFC tag from this series
* Use 2M RMPUPDATE instructions whenever possible when invalidating/releasing
gmem pages
* Tighten up RMP #NPF handling to better differentiate spurious cases from
unexpected behavior
* Simplify/optimize logic for determining when 2M NPT private mappings are
possible
* Be more consistent with PFN data types and stub return values (Dave)
* Reduce potential flooding from frequently-printed pr_debug()'s (Dave)
* Use existing #PF handling paths to catch illegal userspace-generated RMP
faults (Dave)
* Improve host kexec/kdump support (Ashish)
* Reduce overhead from unnecessary WBINVD via MMU notifiers (Ashish)
* Avoid host crashes during CCP module probe if SNP_INIT* is issued while
guests are running (Tom L.)
* Simplify AutoIBRS disablement (Kim, Dave)
* Avoid unnecessary zeroing in extended guest requests (Alexey)
* Fix padding in struct sev_user_data_ext_snp_config (Alexey)
* Report AP creation failures via GHCB error codes rather than inducing #GP in
guest (Peter)
* Disallow multiple allocations of snp_context via userspace (Peter)
* Error out on unsupported SNP policy bits (Tom)
* Fix snp_leak_pages() stub (Jeremi)
* Use C99 flexible arrays where appropriate
* Use helper to handle HVA->PFN conversions prior to dumping RMP entries (Dave)
* Don't potentially print out all 512 entries when dumping 2MB RMP range (Dave)
* Don't use a union to dump raw RMP entries, just cast at dump-site (Dave)
* Don't use helpers to access RMP entry bitfields, use them directly (Dave)
* Simplify logic and improve comments for AutoIBRS disablement (Dave)
# Changes that were split off to separate gmem series
* Use KVM_X86_SNP_VM to implement SNP-specific checks on whether a fault was
shared/private and drop the duplicate memslot lookup (Isaku, Sean)
* Use Isaku's version of patch to plumb 64-bit #NPF error code (Isaku)
* Fix up stub for kvm_arch_gmem_invalidate() (Boris)
----------------------------------------------------------------
Ashish Kalra (1):
KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
Brijesh Singh (14):
KVM: x86: Define RMP page fault error bits for #NPF
KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
KVM: SEV: Add initial SEV-SNP support
KVM: SEV: Add KVM_SNP_INIT command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
KVM: SEV: Add support to handle Page State Change VMGEXIT
KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
KVM: SEV: Add support to handle RMP nested page faults
KVM: SVM: Add module parameter to enable the SEV-SNP
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Michael Roth (15):
mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory
KVM: Use AS_INACCESSIBLE when creating guest_memfd inode
KVM: x86: Add gmem hook for initializing memory
KVM: x86: Add gmem hook for invalidating memory
KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults
KVM: x86: Add KVM_X86_SNP_VM vm_type
KVM: x86: Determine shared/private faults based on vm_type
KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
KVM: SEV: Add support for GHCB-based termination requests
KVM: SEV: Implement gmem hook for initializing private pages
KVM: SEV: Implement gmem hook for invalidating private pages
KVM: x86: Add gmem hook for determining max NPT mapping level
crypto: ccp: Add the SNP_SET_CONFIG_{START,END} commands
KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
Sean Christopherson (1):
KVM: Add hugepage support for dedicated guest memory
Tom Lendacky (3):
KVM: SEV: Add support to handle AP reset MSR protocol
KVM: SEV: Use a VMSA physical address variable for populating VMCB
KVM: SEV: Support SEV-SNP AP Creation NAE event
Vishal Annapurve (1):
KVM: Add HVA range operator
Documentation/virt/coco/sev-guest.rst | 33 +-
Documentation/virt/kvm/api.rst | 73 ++
.../virt/kvm/x86/amd-memory-encryption.rst | 103 ++
arch/x86/include/asm/kvm-x86-ops.h | 3 +
arch/x86/include/asm/kvm_host.h | 15 +
arch/x86/include/asm/sev-common.h | 22 +-
arch/x86/include/asm/sev.h | 11 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/Kconfig | 3 +
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 28 +-
arch/x86/kvm/mmu/mmu_internal.h | 24 +-
arch/x86/kvm/mmu/mmutrace.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/svm/sev.c | 1362 +++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 38 +-
arch/x86/kvm/svm/svm.h | 40 +-
arch/x86/kvm/x86.c | 44 +-
arch/x86/virt/svm/sev.c | 51 +
drivers/crypto/ccp/sev-dev.c | 44 +
include/linux/kvm_host.h | 24 +
include/linux/pagemap.h | 1 +
include/uapi/linux/kvm.h | 84 ++
include/uapi/linux/psp-sev.h | 12 +
include/uapi/linux/sev-guest.h | 9 +
mm/truncate.c | 3 +-
virt/kvm/Kconfig | 8 +
virt/kvm/guest_memfd.c | 132 +-
virt/kvm/kvm_main.c | 49 +
30 files changed, 2175 insertions(+), 54 deletions(-)
SEV-SNP relies on restricted/protected memory support to run guests, so
make sure to enable that support via the CONFIG_KVM_GENERIC_PRIVATE_MEM
config option.
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index e807eb56dc08..4ec53d6d5773 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -124,6 +124,7 @@ config KVM_AMD_SEV
bool "AMD Secure Encrypted Virtualization (SEV) support"
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
+ select KVM_GENERIC_PRIVATE_MEM
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
--
2.25.1
From: Tom Lendacky <[email protected]>
Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 6 ++--
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 53 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b463fcbd4b90..01261f7054ad 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -54,8 +54,10 @@
(((unsigned long)fn) << 32))
/* AP Reset Hold */
-#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
-#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
+#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f99435b6648f..b09bdaed586e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -65,6 +65,10 @@ module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
#define sev_es_debug_swap_enabled false
#endif /* CONFIG_KVM_AMD_SEV */
+#define AP_RESET_HOLD_NONE 0
+#define AP_RESET_HOLD_NAE_EVENT 1
+#define AP_RESET_HOLD_MSR_PROTO 2
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2600,6 +2604,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
void sev_es_unmap_ghcb(struct vcpu_svm *svm)
{
+ /* Clear any indication that the vCPU is in a type of AP Reset Hold */
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
if (!svm->sev_es.ghcb)
return;
@@ -2811,6 +2818,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_AP_RESET_HOLD_REQ:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+ ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+ /*
+ * Preset the result to a non-SIPI return and then only set
+ * the result to non-zero when delivering a SIPI.
+ */
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -2910,6 +2933,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
ret = kvm_emulate_ap_reset_hold(vcpu);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3153,15 +3177,31 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
return;
}
- /*
- * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
- * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
- * non-zero value.
- */
- if (!svm->sev_es.ghcb)
- return;
+ /* Subsequent SIPI */
+ switch (svm->sev_es.ap_reset_hold_type) {
+ case AP_RESET_HOLD_NAE_EVENT:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ break;
+ case AP_RESET_HOLD_MSR_PROTO:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set GHCB data field to a non-zero value.
+ */
+ set_ghcb_msr_bits(svm, 1,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ default:
+ break;
+ }
}
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7f1fbd874c45..eecb2b744d79 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -197,6 +197,7 @@ struct vcpu_sev_es_state {
u8 valid_bitmap[16];
struct kvm_host_map ghcb_map;
bool received_first_sipi;
+ unsigned int ap_reset_hold_type;
/* SEV-ES scratch area support */
u64 sw_scratch;
--
2.25.1
From: Brijesh Singh <[email protected]>
Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.
Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 12 ++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++-
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 01261f7054ad..5a8246dd532f 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_POS 12
+#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b09bdaed586e..d6e206d21750 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2574,6 +2574,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ case SVM_VMGEXIT_HV_FEATURES:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -2834,6 +2835,12 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK,
GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_HV_FT_REQ:
+ set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+ GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+ GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -2958,6 +2965,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_HV_FEATURES:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_HV_FT_SUPPORTED);
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index eecb2b744d79..d0f8167ada7c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -665,9 +665,10 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
/* sev.c */
-#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL
+#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
extern unsigned int max_sev_asid;
--
2.25.1
From: Brijesh Singh <[email protected]>
The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based security protection. SEV-SNP adds strong memory
encryption and integrity protection to help prevent malicious
hypervisor-based attacks such as data replay, memory re-mapping, and
more, to create an isolated execution environment.
Implement some initial infrastructure in KVM to check/report when SNP is
enabled on the system.
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: commit fixups, use similar ASID reporting as with SEV/SEV-ES]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 10 ++++++++++
arch/x86/kvm/svm/svm.h | 8 ++++++++
2 files changed, 18 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d6e206d21750..18c09863377b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -59,10 +59,13 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
/* enable/disable SEV-ES DebugSwap support */
static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
+
+static bool sev_snp_enabled;
#else
#define sev_enabled false
#define sev_es_enabled false
#define sev_es_debug_swap_enabled false
+#define sev_snp_enabled false
#endif /* CONFIG_KVM_AMD_SEV */
#define AP_RESET_HOLD_NONE 0
@@ -2189,6 +2192,7 @@ void __init sev_hardware_setup(void)
{
#ifdef CONFIG_KVM_AMD_SEV
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+ bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;
@@ -2267,6 +2271,7 @@ void __init sev_hardware_setup(void)
sev_es_asid_count = min_sev_asid - 1;
WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
sev_es_supported = true;
+ sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
out:
if (boot_cpu_has(X86_FEATURE_SEV))
@@ -2277,12 +2282,17 @@ void __init sev_hardware_setup(void)
pr_info("SEV-ES %s (ASIDs %u - %u)\n",
sev_es_supported ? "enabled" : "disabled",
min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
+ if (boot_cpu_has(X86_FEATURE_SEV_SNP))
+ pr_info("SEV-SNP %s (ASIDs %u - %u)\n",
+ sev_snp_supported ? "enabled" : "disabled",
+ min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
sev_enabled = sev_supported;
sev_es_enabled = sev_es_supported;
if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
!cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
sev_es_debug_swap_enabled = false;
+ sev_snp_enabled = sev_snp_supported;
#endif
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d0f8167ada7c..a3e27c82866b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -79,6 +79,7 @@ enum {
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
+ bool snp_active; /* SEV-SNP enabled guest */
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
@@ -341,6 +342,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif
}
+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev_es_guest(kvm) && sev->snp_active;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1
From: Brijesh Singh <[email protected]>
The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be
the first command issued. When creating an SEV-SNP guest, the VMM must use
this command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.
The flags value must be zero; it will be extended in future SNP support to
communicate optional features (such as restricted interrupt injection,
etc.).
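For reference, issuing this from userspace looks roughly like the following
(illustrative only; it goes through the existing KVM_MEMORY_ENCRYPT_OP ioctl
and struct kvm_sev_cmd, with struct kvm_snp_init added by this patch):

  #include <sys/ioctl.h>
  #include <stdint.h>
  #include <linux/kvm.h>

  /* vm_fd is the VM fd, sev_fd an open fd for /dev/sev. */
  static int snp_vm_init(int vm_fd, int sev_fd)
  {
          struct kvm_snp_init init = { .flags = 0 };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_INIT,
                  .data   = (uint64_t)(uintptr_t)&init,
                  .sev_fd = sev_fd,
          };

          /* on -EOPNOTSUPP, init.flags is updated with the supported flags */
          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }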
Co-developed-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 27 +++++++++++++
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 39 ++++++++++++++++++-
arch/x86/kvm/svm/svm.h | 4 ++
include/uapi/linux/kvm.h | 13 +++++++
5 files changed, 83 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 995780088eb2..b1a19c9a577a 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -434,6 +434,33 @@ issued by the hypervisor to make the guest ready for execution.
Returns: 0 on success, -negative on error
+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_snp_init {
+ __u64 flags;
+ };
+
+The flags bitmap is defined as::
+
+ /* enable the restricted injection */
+ #define KVM_SEV_SNP_RESTRICTED_INJET (1<<0)
+
+ /* enable the restricted injection timer */
+ #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1<<1)
+
+If the specified flags is not supported then return -EOPNOTSUPP, and the supported
+flags are returned.
+
References
==========
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 87a7b917d30e..ba8ce15b27d7 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -286,6 +286,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 18c09863377b..43b8ae7b74f8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -250,6 +250,25 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_decommission(handle);
}
+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_snp_init params;
+ int ret = 0;
+
+ if (copy_from_user(¶ms, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+ ret = -EOPNOTSUPP;
+
+ params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, ¶ms, sizeof(params)))
+ ret = -EFAULT;
+
+ return ret;
+}
+
static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -264,12 +283,19 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
sev->active = true;
- sev->es_active = argp->id == KVM_SEV_ES_INIT;
+ sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
+ sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
asid = sev_asid_new(sev);
if (asid < 0)
goto e_no_asid;
sev->asid = asid;
+ if (sev->snp_active) {
+ ret = verify_snp_init_flags(kvm, argp);
+ if (ret)
+ goto e_free;
+ }
+
init_args.probe = false;
ret = sev_platform_init(&init_args);
if (ret)
@@ -287,6 +313,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
sev_asid_free(sev);
sev->asid = 0;
e_no_asid:
+ sev->snp_active = false;
sev->es_active = false;
sev->active = false;
return ret;
@@ -625,6 +652,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
if (sev_es_debug_swap_enabled)
save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+ /* Enable the SEV-SNP feature */
+ if (sev_snp_guest(svm->vcpu.kvm))
+ save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
@@ -1883,6 +1914,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
}
switch (sev_cmd.id) {
+ case KVM_SEV_SNP_INIT:
+ if (!sev_snp_enabled) {
+ r = -ENOTTY;
+ goto out;
+ }
+ fallthrough;
case KVM_SEV_ES_INIT:
if (!sev_es_enabled) {
r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a3e27c82866b..07a9eb5b6ce5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -76,6 +76,9 @@ enum {
/* TPR and CR2 are always written before VMRUN */
#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS 0x0
+
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
@@ -91,6 +94,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+ u64 snp_init_flags;
};
struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c3308536482b..73702e9b9d76 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1869,6 +1869,9 @@ enum sev_cmd_id {
/* Guest Migration Extension */
KVM_SEV_SEND_CANCEL,
+ /* SNP specific commands */
+ KVM_SEV_SNP_INIT,
+
KVM_SEV_NR_MAX,
};
@@ -1965,6 +1968,16 @@ struct kvm_sev_receive_update_data {
__u32 trans_len;
};
+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJET (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1 << 1)
+
+struct kvm_snp_init {
+ __u64 flags;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1
From: Brijesh Singh <[email protected]>
KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.
For more information see the SEV-SNP specification.
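For reference, a userspace invocation looks roughly like the following
(illustrative only; 0x30000 is just an example policy with bit 16 (SMT
allowed) and bit 17 (reserved, must be one) set, per the SEV-SNP firmware
ABI):

  /* vm_fd is the VM fd, sev_fd an open fd for /dev/sev. */
  static int snp_vm_launch_start(int vm_fd, int sev_fd)
  {
          struct kvm_sev_snp_launch_start start = {
                  .policy = 0x30000,
          };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_LAUNCH_START,
                  .data   = (uint64_t)(uintptr_t)&start,
                  .sev_fd = sev_fd,
          };

          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }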
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: hold sev_deactivate_lock when calling SEV_CMD_SNP_DECOMMISSION]
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 24 +++
arch/x86/kvm/svm/sev.c | 144 +++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
include/uapi/linux/kvm.h | 10 ++
4 files changed, 176 insertions(+), 3 deletions(-)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index b1a19c9a577a..b1beb2fe8766 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -461,6 +461,30 @@ The flags bitmap is defined as::
If the specified flags is not supported then return -EOPNOTSUPP, and the supported
flags are returned.
+19. KVM_SNP_LAUNCH_START
+------------------------
+
+The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, user must
+provide a guest policy, migration agent (if any) and guest OS visible
+workarounds value as defined SEV-SNP specification.
+
+Parameters (in): struct kvm_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u64 ma_uaddr; /* userspace address of migration agent */
+ __u8 ma_en; /* 1 if the migration agent is enabled */
+ __u8 imi_en; /* set IMI to 1. */
+ __u8 gosvw[16]; /* guest OS visible workarounds */
+ };
+
+See the SEV-SNP specification for further detail on the launch input.
+
References
==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 43b8ae7b74f8..e2f4d4bc125c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -24,6 +24,7 @@
#include <asm/trapnr.h>
#include <asm/fpu/xcr.h>
#include <asm/debugreg.h>
+#include <asm/sev.h>
#include "mmu.h"
#include "x86.h"
@@ -72,6 +73,10 @@ static bool sev_snp_enabled;
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2
+/* As defined by SEV-SNP Firmware ABI, under "Guest Policy". */
+#define SNP_POLICY_MASK_SMT BIT_ULL(16)
+#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -82,6 +87,8 @@ static unsigned int nr_asids;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;
+static int snp_decommission_context(struct kvm *kvm);
+
struct enc_region {
struct list_head list;
unsigned long npages;
@@ -107,12 +114,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
down_write(&sev_deactivate_lock);
wbinvd_on_all_cpus();
- ret = sev_guest_df_flush(&error);
+
+ if (sev_snp_enabled)
+ ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+ else
+ ret = sev_guest_df_flush(&error);
up_write(&sev_deactivate_lock);
if (ret)
- pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+ pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+ sev_snp_enabled ? "-SNP" : "", ret, error);
return ret;
}
@@ -1890,6 +1902,94 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}
+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in hypervisor
+ * provide page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct sev_data_snp_addr data = {};
+ void *context;
+ int rc;
+
+ /* Allocate memory for context page */
+ context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!context)
+ return NULL;
+
+ data.gctx_paddr = __psp_pa(context);
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+ if (rc) {
+ snp_free_firmware_page(context);
+ return NULL;
+ }
+
+ return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_activate data = {0};
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.asid = sev_get_asid(kvm);
+ return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_start start = {0};
+ struct kvm_sev_snp_launch_start params;
+ int rc;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(¶ms, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ /* Don't allow userspace to allocate memory for more than 1 SNP context. */
+ if (sev->snp_context)
+ return -EINVAL;
+
+ sev->snp_context = snp_context_create(kvm, argp);
+ if (!sev->snp_context)
+ return -ENOTTY;
+
+ if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET) {
+ pr_warn("SEV-SNP hypervisor does not support limiting guests to a single socket.");
+ return -EINVAL;
+ }
+
+ if (!(params.policy & SNP_POLICY_MASK_SMT)) {
+ pr_warn("SEV-SNP hypervisor does not support limiting guests to a single SMT thread.");
+ return -EINVAL;
+ }
+
+ start.gctx_paddr = __psp_pa(sev->snp_context);
+ start.policy = params.policy;
+ memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ sev->fd = argp->sev_fd;
+ rc = snp_bind_asid(kvm, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ return 0;
+
+e_free_context:
+ snp_decommission_context(kvm);
+
+ return rc;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -1980,6 +2080,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_FINISH:
r = sev_receive_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_START:
+ r = snp_launch_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2172,6 +2275,33 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}
+static int snp_decommission_context(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_addr data = {};
+ int ret;
+
+ /* If context is not created then do nothing */
+ if (!sev->snp_context)
+ return 0;
+
+ data.gctx_paddr = __sme_pa(sev->snp_context);
+ down_write(&sev_deactivate_lock);
+ ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+ if (WARN_ONCE(ret, "failed to release guest context")) {
+ up_write(&sev_deactivate_lock);
+ return ret;
+ }
+
+ up_write(&sev_deactivate_lock);
+
+ /* free the context page now */
+ snp_free_firmware_page(sev->snp_context);
+ sev->snp_context = NULL;
+
+ return 0;
+}
+
void sev_vm_destroy(struct kvm *kvm)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2213,7 +2343,15 @@ void sev_vm_destroy(struct kvm *kvm)
}
}
- sev_unbind_asid(kvm, sev->handle);
+ if (sev_snp_guest(kvm)) {
+ if (snp_decommission_context(kvm)) {
+ WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+ return;
+ }
+ } else {
+ sev_unbind_asid(kvm, sev->handle);
+ }
+
sev_asid_free(sev);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 07a9eb5b6ce5..9c633173b779 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -95,6 +95,7 @@ struct kvm_sev_info {
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
u64 snp_init_flags;
+ void *snp_context; /* SNP guest context page */
};
struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 73702e9b9d76..9fe36408d55b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1871,6 +1871,7 @@ enum sev_cmd_id {
/* SNP specific commands */
KVM_SEV_SNP_INIT,
+ KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_NR_MAX,
};
@@ -1978,6 +1979,15 @@ struct kvm_snp_init {
__u64 flags;
};
+struct kvm_sev_snp_launch_start {
+ __u64 policy;
+ __u64 ma_uaddr;
+ __u8 ma_en;
+ __u8 imi_en;
+ __u8 gosvw[16];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1
From: Vishal Annapurve <[email protected]>
Introduce an HVA range operator so that other KVM subsystems can operate on
HVA ranges.
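A caller then looks roughly like this (illustrative sketch based on the
interface added below; the handler is invoked once per memslot/GFN range
that intersects [hva_start, hva_end)):

  /* Handler matching kvm_hva_range_op_t. */
  static int count_gfns(struct kvm *kvm, struct kvm_gfn_range *range, void *data)
  {
          unsigned long *count = data;

          *count += range->end - range->start;
          return 0;       /* a non-zero return stops the walk and is propagated */
  }

  static unsigned long count_gfns_in_hva_range(struct kvm *kvm,
                                               unsigned long hva_start,
                                               unsigned long hva_end)
  {
          unsigned long count = 0;

          kvm_vm_do_hva_range_op(kvm, hva_start, hva_end, count_gfns, &count);
          return count;
  }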
Signed-off-by: Vishal Annapurve <[email protected]>
[mdr: minor checkpatch alignment fixups]
Signed-off-by: Michael Roth <[email protected]>
---
include/linux/kvm_host.h | 6 +++++
virt/kvm/kvm_main.c | 49 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a2a8331fbb94..bc3a468e97e3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1429,6 +1429,12 @@ void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
void kvm_mmu_invalidate_end(struct kvm *kvm);
bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
+typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
+ struct kvm_gfn_range *range, void *data);
+
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+ unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
+
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4fd0fb0044f5..03243a7ece08 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -667,6 +667,55 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
return r;
}
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+ unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
+{
+ int ret = 0;
+ struct kvm_gfn_range gfn_range;
+ struct kvm_memory_slot *slot;
+ struct kvm_memslots *slots;
+ int i, idx;
+
+ if (WARN_ON_ONCE(hva_end <= hva_start))
+ return -EINVAL;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+ struct interval_tree_node *node;
+
+ slots = __kvm_memslots(kvm, i);
+ kvm_for_each_memslot_in_hva_range(node, slots,
+ hva_start, hva_end - 1) {
+ unsigned long start, end;
+
+ slot = container_of(node, struct kvm_memory_slot,
+ hva_node[slots->node_idx]);
+ start = max(hva_start, slot->userspace_addr);
+ end = min(hva_end, slot->userspace_addr +
+ (slot->npages << PAGE_SHIFT));
+
+ /*
+ * {gfn(page) | page intersects with [hva_start, hva_end)} =
+ * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+ */
+ gfn_range.start = hva_to_gfn_memslot(start, slot);
+ gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
+ gfn_range.slot = slot;
+
+ ret = handler(kvm, &gfn_range, data);
+ if (ret)
+ goto e_ret;
+ }
+ }
+
+e_ret:
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_do_hva_range_op);
+
static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
unsigned long start,
unsigned long end,
--
2.25.1
From: Brijesh Singh <[email protected]>
The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest and
stores it as the measurement of the guest at launch.
While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.
For an SNP guest, the VMSA page is added to the RMP table as a guest-owned
page and is also removed from the kernel direct map, so it must be flushed
later, after it has been transitioned back to hypervisor state and restored
in the direct map.
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Harald Hoyer <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: always measure BSP first to get consistent launch measurements]
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 24 +++
arch/x86/kvm/svm/sev.c | 146 ++++++++++++++++++
include/uapi/linux/kvm.h | 14 ++
3 files changed, 184 insertions(+)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index d4325b26724c..b89634cfcc06 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -513,6 +513,30 @@ Returns: 0 on success, -negative on error
See the SEV-SNP spec for further details on how to build the VMPL permission
mask and page type.
+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
+issued to make the guest ready for the execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[32];
+ __u8 pad[6];
+ };
+
+
+See SEV-SNP specification for further details on launch finish input parameters.
+
References
==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d60209e6e68b..ada40a79b2f7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -77,6 +77,8 @@ static bool sev_snp_enabled;
#define SNP_POLICY_MASK_SMT BIT_ULL(16)
#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
+#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -749,7 +751,29 @@ static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (!sev_es_guest(kvm))
return -ENOTTY;
+ /* Handle boot vCPU first to ensure consistent measurement of initial state. */
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (vcpu->vcpu_id != 0)
+ continue;
+
+ ret = mutex_lock_killable(&vcpu->mutex);
+ if (ret)
+ return ret;
+
+ ret = __sev_launch_update_vmsa(kvm, vcpu, &argp->error);
+
+ mutex_unlock(&vcpu->mutex);
+ if (ret)
+ return ret;
+
+ break;
+ }
+
+ /* Handle remaining vCPUs. */
kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (vcpu->vcpu_id == 0)
+ continue;
+
ret = mutex_lock_killable(&vcpu->mutex);
if (ret)
return ret;
@@ -2168,6 +2192,109 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
snp_launch_update_gfn_handler, argp);
}
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_update data = {};
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+ int ret;
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.page_type = SNP_PAGE_TYPE_VMSA;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ /* Perform some pre-encryption checks against the VMSA */
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ return ret;
+
+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+ if (ret)
+ return ret;
+
+ /* Issue the SNP command to encrypt the VMSA */
+ data.address = __sme_pa(svm->sev_es.vmsa);
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, &argp->error);
+ if (ret) {
+ snp_page_reclaim(pfn);
+ return ret;
+ }
+
+ svm->vcpu.arch.guest_state_protected = true;
+ }
+
+ return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_finish params;
+ struct sev_data_snp_launch_finish *data;
+ void *id_block = NULL, *id_auth = NULL;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(¶ms, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ /* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+ ret = snp_launch_update_vmsa(kvm, argp);
+ if (ret)
+ return ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+ if (!data)
+ return -ENOMEM;
+
+ if (params.id_block_en) {
+ id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+ if (IS_ERR(id_block)) {
+ ret = PTR_ERR(id_block);
+ goto e_free;
+ }
+
+ data->id_block_en = 1;
+ data->id_block_paddr = __sme_pa(id_block);
+
+ id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+ if (IS_ERR(id_auth)) {
+ ret = PTR_ERR(id_auth);
+ goto e_free_id_block;
+ }
+
+ data->id_auth_paddr = __sme_pa(id_auth);
+
+ if (params.auth_key_en)
+ data->auth_key_en = 1;
+ }
+
+ memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+ kfree(id_auth);
+
+e_free_id_block:
+ kfree(id_block);
+
+e_free:
+ kfree(data);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2264,6 +2391,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_UPDATE:
r = snp_launch_update(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_FINISH:
+ r = snp_launch_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2736,11 +2866,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
svm = to_svm(vcpu);
+ /*
+ * If its an SNP guest, then VMSA was added in the RMP entry as
+ * a guest owned page. Transition the page to hypervisor state
+ * before releasing it back to the system.
+ * Also the page is removed from the kernel direct map, so flush it
+ * later after it is transitioned back to hypervisor state and
+ * restored in the direct map.
+ */
+ if (sev_snp_guest(vcpu->kvm)) {
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
+ goto skip_vmsa_free;
+ }
+
if (vcpu->arch.guest_state_protected)
sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
__free_page(virt_to_page(svm->sev_es.vmsa));
+skip_vmsa_free:
if (svm->sev_es.ghcb_sa_free)
kvfree(svm->sev_es.ghcb_sa);
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6e6e3a478022..5218075fe1f4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1873,6 +1873,7 @@ enum sev_cmd_id {
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
+ KVM_SEV_SNP_LAUNCH_FINISH,
KVM_SEV_NR_MAX,
};
@@ -2007,6 +2008,19 @@ struct kvm_sev_snp_launch_update {
__u8 vmpl1_perms;
};
+#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
+#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
+
+struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1
From: Brijesh Singh <[email protected]>
SEV-SNP guests are required to perform a GHCB GPA registration. Before
using a GHCB GPA for a vCPU the first time, a guest must register the
vCPU GHCB GPA. If the hypervisor can work with the guest-requested GPA, then
it must respond back with the same GPA; otherwise it returns -1.
On VMEXIT, verify that the GHCB GPA matches the registered value.
If a mismatch is detected, abort the guest.
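Expressed without the get_ghcb_msr_bits()/set_ghcb_msr_bits() helpers, the
MSR protocol exchange handled by this patch is roughly the following (sketch
only, using the constants added below; svm->vmcb->control.ghcb_gpa holds the
guest-written GHCB MSR value):

  u64 msr_val = svm->vmcb->control.ghcb_gpa;
  u64 gfn = (msr_val >> GHCB_MSR_GPA_VALUE_POS) & GHCB_MSR_GPA_VALUE_MASK;

  /* Remember the registered GPA so later VMGEXITs can be checked against it. */
  svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);

  /* Respond with the same GFN in bits 63:12 and REG_GPA_RESP in bits 11:0. */
  svm->vmcb->control.ghcb_gpa = (gfn << GHCB_MSR_GPA_VALUE_POS) |
                                GHCB_MSR_REG_GPA_RESP;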
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 42 insertions(+)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 5a8246dd532f..1006bfffe07a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)
+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ 0x010
+#define GHCB_MSR_GPA_VALUE_POS 12
+#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP 0x011
+#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
+
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ada40a79b2f7..0b8837e21705 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3353,6 +3353,26 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_PREF_GPA_REQ:
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ case GHCB_MSR_REG_GPA_REQ: {
+ u64 gfn;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+
+ svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+ set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -3416,6 +3436,13 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
trace_kvm_vmgexit_enter(vcpu->vcpu_id, svm->sev_es.ghcb);
sev_es_sync_from_ghcb(svm);
+
+ /* SEV-SNP guest requires that the GHCB GPA must be registered */
+ if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+ return -EINVAL;
+ }
+
ret = sev_es_validate_vmgexit(svm);
if (ret)
return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9c633173b779..2bee24017bae 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -211,6 +211,8 @@ struct vcpu_sev_es_state {
u32 ghcb_sa_len;
bool ghcb_sa_sync;
bool ghcb_sa_free;
+
+ u64 ghcb_registered_gpa;
};
struct vcpu_svm {
@@ -354,6 +356,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
return sev_es_guest(kvm) && sev->snp_active;
}
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+ return svm->sev_es.ghcb_registered_gpa == val;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1
From: Sean Christopherson <[email protected]>
Extend guest_memfd to allow backing guest memory with hugepages. This
is done as a best-effort by default until a better-defined mechanism is
put in place that can provide better control/assurances to userspace
about hugepage allocations.
When reporting the max order when KVM gets a pfn from guest_memfd, force
order-0 pages if the hugepage is not fully contained by the memslot
binding, e.g. if userspace requested hugepages but punches a hole in the
memslot bindings in order to emulate x86's VGA hole.
Link: https://lore.kernel.org/kvm/[email protected]/T/#mccbd3e8bf9897f0ddbf864e6318d6f2f208b269c
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
[Allow even with CONFIG_TRANSPARENT_HUGEPAGE; dropped momentarily due to
uneasiness about the API. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
[mdr: based on discussion in the Link regarding original patch, make the
following set of changes:
- For now, don't introduce an opt-in flag to enable hugepage
support. By default, just make a best-effort for PMD_ORDER
allocations so that there are no false assurances to userspace
that they'll get hugepages. It's better at least than the
current guarantee that they will get 4K pages every time. A more
proper opt-in interface can then improve on things later.
- Pass GFP_NOWARN to alloc_pages() so failures are not disruptive
to normal operations
- Drop size checks during creation time. Instead just avoid huge
allocations if they extend beyond end of the memfd.
- Drop hugepage-related unit tests since everything is now handled
transparently to userspace anyway.
- Update commit message accordingly.]
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
virt/kvm/guest_memfd.c | 63 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 56 insertions(+), 7 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4aa23b01aa98..784690a664ac 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -13,14 +13,46 @@ struct kvm_gmem {
struct list_head entry;
};
-static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+static struct folio *kvm_gmem_get_huge_folio(struct inode *inode, pgoff_t index,
+ unsigned int order)
{
+ pgoff_t npages = 1UL << order;
+ pgoff_t huge_index = round_down(index, npages);
+ struct address_space *mapping = inode->i_mapping;
+ gfp_t gfp = mapping_gfp_mask(mapping) | __GFP_NOWARN;
+ loff_t size = i_size_read(inode);
struct folio *folio;
- /* TODO: Support huge pages. */
- folio = filemap_grab_folio(inode->i_mapping, index);
- if (IS_ERR_OR_NULL(folio))
+ /* Make sure hugepages would be fully-contained by inode */
+ if ((huge_index + npages) * PAGE_SIZE > size)
+ return NULL;
+
+ if (filemap_range_has_page(mapping, (loff_t)huge_index << PAGE_SHIFT,
+ (loff_t)(huge_index + npages - 1) << PAGE_SHIFT))
+ return NULL;
+
+ folio = filemap_alloc_folio(gfp, order);
+ if (!folio)
+ return NULL;
+
+ if (filemap_add_folio(mapping, folio, huge_index, gfp)) {
+ folio_put(folio);
return NULL;
+ }
+
+ return folio;
+}
+
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+{
+ struct folio *folio;
+
+ folio = kvm_gmem_get_huge_folio(inode, index, PMD_ORDER);
+ if (!folio) {
+ folio = filemap_grab_folio(inode->i_mapping, index);
+ if (IS_ERR_OR_NULL(folio))
+ return NULL;
+ }
/*
* Use the up-to-date flag to track whether or not the memory has been
@@ -361,6 +393,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
inode->i_mode |= S_IFREG;
inode->i_size = size;
mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+ mapping_set_large_folios(inode->i_mapping);
mapping_set_unmovable(inode->i_mapping);
/* Unmovable mappings are supposed to be marked unevictable as well. */
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
@@ -486,7 +519,7 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
{
- pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
+ pgoff_t index, huge_index;
struct kvm_gmem *gmem;
struct folio *folio;
struct page *page;
@@ -499,6 +532,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gmem = file->private_data;
+ index = gfn - slot->base_gfn + slot->gmem.pgoff;
if (WARN_ON_ONCE(xa_load(&gmem->bindings, index) != slot)) {
r = -EIO;
goto out_fput;
@@ -518,9 +552,24 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
page = folio_file_page(folio, index);
*pfn = page_to_pfn(page);
- if (max_order)
- *max_order = 0;
+ if (!max_order)
+ goto success;
+
+ *max_order = compound_order(compound_head(page));
+ if (!*max_order)
+ goto success;
+ /*
+ * The folio can be mapped with a hugepage if and only if the folio is
+ * fully contained by the range the memslot is bound to. Note, the
+ * caller is responsible for handling gfn alignment, this only deals
+ * with the file binding.
+ */
+ huge_index = ALIGN(index, 1ull << *max_order);
+ if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
+ huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages)
+ *max_order = 0;
+success:
r = 0;
out_unlock:
--
2.25.1
From: Brijesh Singh <[email protected]>
While resolving the RMP page fault, there may be cases where the page
level between the RMP entry and the TDP entry does not match and the 2M
RMP entry must be split into 4K RMP entries. Or a 2M TDP page may need
to be broken into multiple 4K pages.
To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap forces the TDP mappings to
be rebuilt with the new page level.
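For illustration, the intended call sequence when a 2M RMP entry needs to
be split might look roughly like the following (a hedged sketch; the actual
caller is added by the later RMP-fault handling patch in this series, and
error handling/reclaim are omitted):
  static void rmp_split_and_resync(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
  {
          gfn_t gfn_2m = gfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);

          /* Split the 2M RMP entry covering this PFN into 512 4K entries. */
          if (psmash(pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1)))
                  return;

          /* Zap the 2M-aligned GFN range so TDP mappings get rebuilt at 4K. */
          kvm_zap_gfn_range(kvm, gfn_2m, gfn_2m + KVM_PAGES_PER_HPAGE(PG_LEVEL_2M));
  }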
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.h | 2 --
arch/x86/kvm/mmu/mmu.c | 1 +
3 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 843695217b4b..3fdcbb1da856 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1923,6 +1923,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 60f21bb4c27b..df4d2c137a67 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -252,8 +252,6 @@ static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm));
}
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 61213f6648a1..1882096fba3e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6702,6 +6702,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
return need_tlb_flush;
}
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *slot)
--
2.25.1
This will handle RMP table updates and direct map changes needed to put
a page into a private state before mapping it into an SEV-SNP guest.
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 98 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/svm/svm.h | 1 +
virt/kvm/guest_memfd.c | 4 +-
5 files changed, 104 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 4ec53d6d5773..79c002e1bb5c 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -125,6 +125,7 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select KVM_GENERIC_PRIVATE_MEM
+ select HAVE_KVM_GMEM_PREPARE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b2ac696c436a..91f53f4a6059 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4154,3 +4154,101 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
out:
put_page(pfn_to_page(pfn));
}
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn = start;
+
+ while (pfn < end) {
+ int ret, rmp_level;
+ bool assigned;
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
+ pfn, start, end, rmp_level, ret);
+ return false;
+ }
+
+ if (assigned) {
+ pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+ __func__, pfn, start, end, rmp_level);
+ return false;
+ }
+
+ pfn++;
+ }
+
+ return true;
+}
+
+static u8 max_level_for_order(int order)
+{
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+ return PG_LEVEL_2M;
+
+ return PG_LEVEL_4K;
+}
+
+static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
+{
+ kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+
+ /*
+ * If this is a large folio, and the entire 2M range containing the
+ * PFN is currently shared, then the entire 2M-aligned range can be
+ * set to private via a single 2M RMP entry.
+ */
+ if (max_level_for_order(order) > PG_LEVEL_4K &&
+ is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
+ return true;
+
+ return false;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t pfn_aligned;
+ gfn_t gfn_aligned;
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
+ gfn, pfn, rc);
+ return -ENOENT;
+ }
+
+ if (assigned) {
+ pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
+ __func__, gfn, pfn, max_order, level);
+ return 0;
+ }
+
+ if (is_large_rmp_possible(kvm, pfn, max_order)) {
+ level = PG_LEVEL_2M;
+ pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+ gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
+ } else {
+ level = PG_LEVEL_4K;
+ pfn_aligned = pfn;
+ gfn_aligned = gfn;
+ }
+
+ rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -EINVAL;
+ }
+
+ pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
+ __func__, gfn, pfn, pfn_aligned, max_order, level);
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 240518f8d6c7..32cef8626b57 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5065,6 +5065,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+ .gmem_prepare = sev_gmem_prepare,
};
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d953ae41c619..9ece9612dbb9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -725,6 +725,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
/* vmenter.S */
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index feec0da93d98..ddea45279fef 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -66,8 +66,8 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
gfn = slot->base_gfn + index - slot->gmem.pgoff;
rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
if (rc) {
- pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
- index, rc);
+ pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
+ index, gfn, pfn, rc);
return rc;
}
}
--
2.25.1
filemap users like guest_memfd may use page cache pages to
allocate/manage memory that is only intended to be accessed by guests
via hardware protections like encryption. Writes to memory of this sort
in common paths like truncation may cause unexpected behavior, such as
writing garbage instead of zeros when attempting to zero pages, or
worse, triggering hardware protections that are considered fatal as far
as the kernel is concerned.
Introduce a new address_space flag, AS_INACCESSIBLE, and use this
initially to prevent zeroing of pages during truncation, with the
understanding that it is up to the owner of the mapping to handle this
specially if needed.
Link: https://lore.kernel.org/lkml/[email protected]/
Cc: Matthew Wilcox <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
include/linux/pagemap.h | 1 +
mm/truncate.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c2d90588c0bf..b56081a3512e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -207,6 +207,7 @@ enum mapping_flags {
AS_STABLE_WRITES, /* must wait for writeback before modifying
folio contents */
AS_UNMOVABLE, /* The mapping cannot be moved, ever */
+ AS_INACCESSIBLE, /* Do not attempt direct R/W access to the mapping */
};
/**
diff --git a/mm/truncate.c b/mm/truncate.c
index 725b150e47ac..c501338c7ebd 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -233,7 +233,8 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
* doing a complex calculation here, and then doing the zeroing
* anyway if the page split fails.
*/
- folio_zero_range(folio, offset, length);
+ if (!(folio->mapping->flags & AS_INACCESSIBLE))
+ folio_zero_range(folio, offset, length);
if (folio_has_private(folio))
folio_invalidate(folio, offset, length);
--
2.25.1
From: Ashish Kalra <[email protected]>
With SNP/guest_memfd, private/encrypted memory should not be mappable,
and MMU notifications for HVA-mapped memory will only be relevant to
unencrypted guest memory. Therefore, the rationale behind issuing a
wbinvd_on_all_cpus() in sev_guest_memory_reclaimed() does not apply to
SNP guests, so the flush can be skipped.
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: Add some clarifications in commit]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5eb836b73131..2cc909cc18c1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2858,7 +2858,14 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
void sev_guest_memory_reclaimed(struct kvm *kvm)
{
- if (!sev_guest(kvm))
+ /*
+ * With SNP+gmem, private/encrypted memory should be
+ * unreachable via the hva-based mmu notifiers. Additionally,
+ * for shared->private translations, H/W coherency will ensure
+ * first guest access to the page would clear out any existing
+ * dirty copies of that cacheline.
+ */
+ if (!sev_guest(kvm) || sev_snp_guest(kvm))
return;
wbinvd_on_all_cpus();
--
2.25.1
From: Brijesh Singh <[email protected]>
Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
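With this in place, SNP support can be toggled at module load time, e.g.
(assuming the usual kvm_amd module name):
  modprobe kvm_amd sev_snp=0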
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2cc909cc18c1..30a2e75fd94f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -61,7 +61,8 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
-static bool sev_snp_enabled;
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
#else
#define sev_enabled false
#define sev_es_enabled false
--
2.25.1
These commands can be used to create a transaction such that commands
that update the reported TCB, such as SNP_SET_CONFIG/SNP_COMMIT, and
updates to userspace-supplied certificates, can be handled atomically
relative to any extended guest requests issued by any SNP guests while
the updates are taking place.
Without this interface, there is a risk that a guest will be given
certificate information that does not correspond to the VCEK/VLEK used
to sign a particular attestation report unless all the running guests
are paused in advance, which would cause disruption to all guests in the
system even if no attestation requests are being made. Even then, care
is needed to ensure that KVM does not pass along certificate information
that was fetched from userspace in advance of the guest being paused.
This interface also provides some versatility with how similar firmware
maintenance activity can be handled in the future without passing
unnecessary management complexity on to userspace.
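As a rough illustration, a host management tool might wrap a TCB/certificate
update as follows (a hedged sketch using the /dev/sev SEV_ISSUE_CMD
interface; the certificate-update step itself is a placeholder and is
userspace-defined):
  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/psp-sev.h>

  static int update_reported_tcb_and_certs(int sev_fd)
  {
          struct sev_user_data_snp_config_transaction start_tx = { 0 }, end_tx = { 0 };
          struct sev_issue_cmd cmd = { 0 };

          cmd.cmd = SNP_SET_CONFIG_START;
          cmd.data = (__u64)(unsigned long)&start_tx;
          if (ioctl(sev_fd, SEV_ISSUE_CMD, &cmd))
                  return -1;

          /* ... issue SNP_SET_CONFIG/SNP_COMMIT and update the certificate
           * blob that will be supplied for extended guest requests ... */

          cmd.cmd = SNP_SET_CONFIG_END;
          cmd.data = (__u64)(unsigned long)&end_tx;
          if (ioctl(sev_fd, SEV_ISSUE_CMD, &cmd))
                  return -1;

          /* Another transaction intervened: retry the whole update sequence. */
          if (start_tx.id != end_tx.id)
                  return -EAGAIN;

          return 0;
  }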
Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 33 ++++++++++++++++++--
arch/x86/include/asm/sev.h | 4 +++
arch/x86/virt/svm/sev.c | 31 +++++++++++++++++++
drivers/crypto/ccp/sev-dev.c | 44 +++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 12 ++++++++
5 files changed, 122 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 4f696aacc866..0426ebad7671 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -127,8 +127,6 @@ the SEV-SNP specification for further details.
The SNP_GET_EXT_REPORT ioctl is similar to the SNP_GET_REPORT. The difference is
related to the additional certificate data that is returned with the report.
-The certificate data returned is being provided by the hypervisor through the
-SNP_SET_EXT_CONFIG.
The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command provided by the SEV-SNP
firmware to get the attestation report.
@@ -175,6 +173,37 @@ SNP_CONFIG command defined in the SEV-SNP spec. The current values of the
firmware parameters affected by this command can be queried via
SNP_PLATFORM_STATUS.
+2.7 SNP_SET_CONFIG_START / SNP_SET_CONFIG_END
+---------------------------------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (out): struct sev_user_data_snp_config_transaction
+:Returns (out): 0 on success, -negative on error
+
+When requesting attestation reports, SNP guests have the option of issuing
+an extended guest request which allows host userspace to supply additional
+certificate data that can be used to validate the signature used to sign
+the attestation report. This signature is generated using a key that is
+derived from the reported TCB that can be set via the SNP_SET_CONFIG and
+SNP_COMMIT ioctls, so the accompanying certificate data needs to be kept in
+sync with the changes made to the reported TCB via these ioctls.
+
+To allow for this, SNP_SET_CONFIG_START can be issued prior to performing
+any updates to the reported TCB or certificate data that will be fetched
+from userspace. Any attestation report requests via extended guest requests
+that are in-progress, or received after SNP_SET_CONFIG_START is issued, will
+result in the guest receiving a GHCB-defined error message instructing it to
+retry the request. Once the updates are completed on the host,
+SNP_SET_CONFIG_END must be issued to resume normal servicing of extended
+guest requests.
+
+In general, hosts should avoid having more than 1 outstanding
+SNP_SET_CONFIG_{START,END} transaction in flight at any point in time, but
+each ioctl will return a transaction ID in the response so the caller can
+monitor whether the start/end ID both match. If they don't, the caller
+should assume the transaction has been invalidated and retry the full update
+sequence.
+
3. SEV-SNP CPUID Enforcement
============================
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index e84dd1d2d8ab..925578ad34e6 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -268,6 +268,8 @@ int psmash(u64 pfn);
int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
int rmp_make_shared(u64 pfn, enum pg_level level);
void snp_leak_pages(u64 pfn, unsigned int npages);
+u64 snp_config_transaction_start(void);
+u64 snp_config_transaction_end(void);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -280,6 +282,8 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
}
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
+static inline u64 snp_config_transaction_start(void) { return 0; }
+static inline u64 snp_config_transaction_end(void) { return 0; }
#endif
#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 0f2e1ce241b5..fc9e1b7fc187 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -71,6 +71,11 @@ static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
static unsigned long snp_nr_leaked_pages;
+/* For synchronizing TCB updates with extended guest requests */
+static DEFINE_MUTEX(snp_transaction_lock);
+static u64 snp_transaction_id;
+static bool snp_transaction_pending;
+
#undef pr_fmt
#define pr_fmt(fmt) "SEV-SNP: " fmt
@@ -511,3 +516,29 @@ void snp_leak_pages(u64 pfn, unsigned int npages)
spin_unlock(&snp_leaked_pages_list_lock);
}
EXPORT_SYMBOL_GPL(snp_leak_pages);
+
+u64 snp_config_transaction_start(void)
+{
+ u64 id;
+
+ mutex_lock(&snp_transaction_lock);
+ snp_transaction_pending = true;
+ id = ++snp_transaction_id;
+ mutex_unlock(&snp_transaction_lock);
+
+ return id;
+}
+EXPORT_SYMBOL_GPL(snp_config_transaction_start);
+
+u64 snp_config_transaction_end(void)
+{
+ u64 id;
+
+ mutex_lock(&snp_transaction_lock);
+ snp_transaction_pending = false;
+ id = snp_transaction_id;
+ mutex_unlock(&snp_transaction_lock);
+
+ return id;
+}
+EXPORT_SYMBOL_GPL(snp_config_transaction_end);
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index c5b26b3fe7ff..d81f86d2697a 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2035,6 +2035,44 @@ static int sev_ioctl_do_snp_set_config(struct sev_issue_cmd *argp, bool writable
return __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
}
+static int sev_ioctl_do_snp_set_config_start(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_user_data_snp_config_transaction transaction = {0};
+ struct sev_device *sev = psp_master->sev_data;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ transaction.id = snp_config_transaction_start();
+
+ if (copy_to_user((void __user *)argp->data, &transaction, sizeof(transaction)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int sev_ioctl_do_snp_set_config_end(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_user_data_snp_config_transaction transaction = {0};
+ struct sev_device *sev = psp_master->sev_data;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ transaction.id = snp_config_transaction_end();
+
+ if (copy_to_user((void __user *)argp->data, &transaction, sizeof(transaction)))
+ return -EFAULT;
+
+ return 0;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -2095,6 +2133,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_SET_CONFIG:
ret = sev_ioctl_do_snp_set_config(&input, writable);
break;
+ case SNP_SET_CONFIG_START:
+ ret = sev_ioctl_do_snp_set_config_start(&input, writable);
+ break;
+ case SNP_SET_CONFIG_END:
+ ret = sev_ioctl_do_snp_set_config_end(&input, writable);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index f28d4fb5bc21..9deacb894b1e 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -31,6 +31,8 @@ enum {
SNP_PLATFORM_STATUS,
SNP_COMMIT,
SNP_SET_CONFIG,
+ SNP_SET_CONFIG_START,
+ SNP_SET_CONFIG_END,
SEV_MAX,
};
@@ -214,6 +216,16 @@ struct sev_user_data_snp_config {
__u8 rsvd1[52];
} __packed;
+/**
+ * struct sev_user_data_snp_config_transaction - metadata for config transactions
+ *
+ * @id: the ID of the transaction started/ended by a call to SNP_SET_CONFIG_START
+ * or SNP_SET_CONFIG_END, respectively.
+ */
+struct sev_user_data_snp_config_transaction {
+ __u64 id; /* Out */
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1
From: Brijesh Singh <[email protected]>
Version 2 of the GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. Other request types are available as well, but the
specifics of what guest requests are being made are opaque to the
hypervisor, which only serves as a proxy for the guest requests and
firmware responses.
Implement handling for these events.
Co-developed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 83 ++++++++++++++++++++++++++++++++++
include/uapi/linux/sev-guest.h | 9 ++++
2 files changed, 92 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 30a2e75fd94f..05051e36926d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
#include <linux/misc_cgroup.h>
#include <linux/processor.h>
#include <linux/trace_events.h>
+#include <uapi/linux/sev-guest.h>
#include <asm/pkru.h>
#include <asm/trapnr.h>
@@ -3109,6 +3110,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
case SVM_VMGEXIT_TERM_REQUEST:
+ case SVM_VMGEXIT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3539,6 +3541,83 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
return ret;
}
+static bool snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_request *data,
+ gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t req_pfn, resp_pfn;
+
+ if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+ return false;
+
+ req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+ if (is_error_noslot_pfn(req_pfn))
+ return false;
+
+ resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+ if (is_error_noslot_pfn(resp_pfn))
+ return false;
+
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ return false;
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+ return true;
+}
+
+static bool snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data)
+{
+ u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+
+ if (snp_page_reclaim(pfn))
+ return false;
+
+ if (rmp_make_shared(pfn, PG_LEVEL_4K))
+ return false;
+
+ return true;
+}
+
+static bool __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa,
+ sev_ret_code *fw_err)
+{
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm_sev_info *sev;
+ bool ret = true;
+
+ if (!sev_snp_guest(kvm))
+ return false;
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa))
+ return false;
+
+ if (sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err))
+ ret = false;
+
+ if (!snp_cleanup_guest_buf(&data))
+ ret = false;
+
+ return ret;
+}
+
+static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ sev_ret_code fw_err = 0;
+ int vmm_ret = 0;
+
+ if (!__snp_handle_guest_req(kvm, req_gpa, resp_gpa, &fw_err))
+ vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3799,6 +3878,10 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->system_event.ndata = 1;
vcpu->run->system_event.data[0] = control->ghcb_gpa;
break;
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index 154a87a1eca9..7bd78e258569 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -89,8 +89,17 @@ struct snp_ext_report_req {
#define SNP_GUEST_FW_ERR_MASK GENMASK_ULL(31, 0)
#define SNP_GUEST_VMM_ERR_SHIFT 32
#define SNP_GUEST_VMM_ERR(x) (((u64)x) << SNP_GUEST_VMM_ERR_SHIFT)
+#define SNP_GUEST_FW_ERR(x) ((x) & SNP_GUEST_FW_ERR_MASK)
+#define SNP_GUEST_ERR(vmm_err, fw_err) (SNP_GUEST_VMM_ERR(vmm_err) | \
+ SNP_GUEST_FW_ERR(fw_err))
+/*
+ * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
+ * a GENERIC error code such that it won't ever conflict with GHCB-defined
+ * errors if any get added in the future.
+ */
#define SNP_GUEST_VMM_ERR_INVALID_LEN 1
#define SNP_GUEST_VMM_ERR_BUSY 2
+#define SNP_GUEST_VMM_ERR_GENERIC BIT(31)
#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.25.1
truncate_inode_pages_range() may attempt to zero pages before truncating
them, and this will occur before arch-specific invalidations can be
triggered via .invalidate_folio/.free_folio hooks via kvm_gmem_aops. For
AMD SEV-SNP this would result in an RMP #PF being generated by the
hardware, which is currently treated as fatal (and even if specifically
allowed for, would not result in anything other than garbage being
written to guest pages due to encryption). On Intel TDX this would also
result in undesirable behavior.
Set the AS_INACCESSIBLE flag to prevent the MM from attempting
unexpected accesses of this sort during operations like truncation.
This may also in some cases yield a decent performance improvement for
guest_memfd userspace implementations that hole-punch ranges immediately
after private->shared conversions via KVM_SET_MEMORY_ATTRIBUTES, since
the current implementation of truncate_inode_pages_range() always ends
up zeroing an entire 4K range if it is backed by a 2M folio.
Link: https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
virt/kvm/guest_memfd.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 784690a664ac..4aaa82227978 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -390,6 +390,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
inode->i_private = (void *)(unsigned long)flags;
inode->i_op = &kvm_gmem_iops;
inode->i_mapping->a_ops = &kvm_gmem_aops;
+ inode->i_mapping->flags |= AS_INACCESSIBLE;
inode->i_mode |= S_IFREG;
inode->i_size = size;
mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
--
2.25.1
Version 2 of the GHCB specification added support for the SNP Extended Guest
Request Message NAE event. This event serves a nearly identical purpose
to the previously-added SNP_GUEST_REQUEST event, but allows for
additional certificate data to be supplied via an additional
guest-supplied buffer to be used mainly for verifying the signature of
an attestation report as returned by firmware.
This certificate data is supplied by userspace, so unlike with
SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
forwarded to userspace via a KVM_EXIT_VMGEXIT exit type, and then the
firmware request is made only afterward.
Implement handling for these events.
Since there is a potential for race conditions where the
userspace-supplied certificate data may be out-of-sync relative to the
reported TCB that firmware will use when signing attestation reports,
make use of the transaction/synchronization mechanisms added by the
SNP_SET_CONFIG_{START,END} SEV device ioctls such that the guest will be
told to retry the request when an update to reported TCB or
userspace-supplied certificates may have occurred or is in progress
while an extended guest request is being processed.
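For illustration, a userspace VMM might service the new exit roughly as
follows from its run loop when it sees KVM_EXIT_VMGEXIT with type
KVM_USER_VMGEXIT_EXT_GUEST_REQ (a hedged sketch; fetch_cached_certs() and
write_guest_memory() are hypothetical VMM helpers, and 'run' is the vCPU's
mmap'd struct kvm_run):
  static void handle_ext_guest_req(struct kvm_run *run)
  {
          size_t len;
          void *certs = fetch_cached_certs(&len);           /* hypothetical */
          __u64 npages = (len + 4095) / 4096;

          if (npages > run->vmgexit.ext_guest_req.data_npages) {
                  /* Buffer too small: report the required size to the guest. */
                  run->vmgexit.ext_guest_req.data_npages = npages;
                  run->vmgexit.ext_guest_req.ret = 1;
                  return;
          }

          /* hypothetical helper that copies into guest physical memory */
          write_guest_memory(run->vmgexit.ext_guest_req.data_gpa, certs, len);
          run->vmgexit.ext_guest_req.ret = 0;
  }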
Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/kvm/api.rst | 26 ++++++++++++
arch/x86/include/asm/sev.h | 4 ++
arch/x86/kvm/svm/sev.c | 75 ++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++
arch/x86/virt/svm/sev.c | 20 +++++++++
include/uapi/linux/kvm.h | 6 +++
6 files changed, 134 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2a526b4f8e06..960e2153d468 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7037,6 +7037,7 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
struct kvm_user_vmgexit {
#define KVM_USER_VMGEXIT_PSC_MSR 1
#define KVM_USER_VMGEXIT_PSC 2
+ #define KVM_USER_VMGEXIT_EXT_GUEST_REQ 3
__u32 type; /* KVM_USER_VMGEXIT_* type */
union {
struct {
@@ -7050,6 +7051,11 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
__u64 shared_gpa;
__u64 ret;
} psc;
+ struct {
+ __u64 data_gpa;
+ __u64 data_npages;
+ __u32 ret;
+ } ext_guest_req;
};
};
@@ -7079,6 +7085,26 @@ private/shared state. Userspace will return a value in 'ret' that is in
agreement with the GHCB-defined return values that the guest will expect
in the SW_EXITINFO2 field of the GHCB in response to these requests.
+For the KVM_USER_VMGEXIT_EXT_GUEST_REQ type, the ext_guest_req union type
+is used. The kernel will supply in 'data_gpa' the value the guest supplies
+via the RAX field of the GHCB when issued extended guest requests.
+'data_npages' will similarly contain the value the guest supplies in RBX
+denoting the number of shared pages available to write the certificate
+data into.
+
+ - If the supplied number of pages is sufficient, userspace should write
+ the certificate data blob (in the format defined by the GHCB spec) in
+ the address indicated by 'data_gpa' and set 'ret' to 0.
+
+ - If the number of pages supplied is not sufficient, userspace must write
+ the required number of pages in 'data_npages' and then set 'ret' to 1.
+
+ - If userspace is temporarily unable to handle the request, 'ret' should
+ be set to 2 to inform the guest to retry later.
+
+ - If some other error occurred, userspace should set 'ret' to a non-zero
+ value that is distinct from the specific return values mentioned above.
+
6. Capabilities that can be enabled on vCPUs
============================================
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 925578ad34e6..25f533827d62 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -270,6 +270,8 @@ int rmp_make_shared(u64 pfn, enum pg_level level);
void snp_leak_pages(u64 pfn, unsigned int npages);
u64 snp_config_transaction_start(void);
u64 snp_config_transaction_end(void);
+u64 snp_config_transaction_get_id(void);
+bool snp_config_transaction_is_stale(u64 id);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -284,6 +286,8 @@ static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV
static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
static inline u64 snp_config_transaction_start(void) { return 0; }
static inline u64 snp_config_transaction_end(void) { return 0; }
+static inline u64 snp_config_transaction_get_id(void) { return 0; }
+static inline bool snp_config_transaction_is_stale(u64 id) { return false; }
#endif
#endif
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 05051e36926d..dda195999c42 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3111,6 +3111,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_PSC:
case SVM_VMGEXIT_TERM_REQUEST:
case SVM_VMGEXIT_GUEST_REQUEST:
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3618,6 +3619,77 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
}
+static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct vmcb_control_area *control;
+ struct kvm *kvm = vcpu->kvm;
+ sev_ret_code fw_err = 0;
+ int vmm_ret;
+
+ vmm_ret = vcpu->run->vmgexit.ext_guest_req.ret;
+ if (vmm_ret) {
+ if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
+ vcpu->arch.regs[VCPU_REGS_RBX] =
+ vcpu->run->vmgexit.ext_guest_req.data_npages;
+ goto abort_request;
+ }
+
+ control = &svm->vmcb->control;
+
+ if (!__snp_handle_guest_req(kvm, control->exit_info_1, control->exit_info_2,
+ &fw_err))
+ vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+ /*
+ * Give errors related to stale transactions precedence to provide more
+ * potential options for servicing firmware while guests are running.
+ */
+ if (snp_config_transaction_is_stale(svm->snp_transaction_id))
+ vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
+
+abort_request:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+
+ return 1; /* resume guest */
+}
+
+static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+ int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+ struct vcpu_svm *svm = to_svm(vcpu);
+ unsigned long data_npages;
+ sev_ret_code fw_err = 0;
+ gpa_t data_gpa;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ goto abort_request;
+
+ data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+ data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+ if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
+ goto abort_request;
+
+ svm->snp_transaction_id = snp_config_transaction_get_id();
+ if (snp_config_transaction_is_stale(svm->snp_transaction_id)) {
+ vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
+ goto abort_request;
+ }
+
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_EXT_GUEST_REQ;
+ vcpu->run->vmgexit.ext_guest_req.data_gpa = data_gpa;
+ vcpu->run->vmgexit.ext_guest_req.data_npages = data_npages;
+ vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
+
+ return 0; /* forward request to userspace */
+
+abort_request:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+ return 1; /* resume guest */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3882,6 +3954,9 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
ret = 1;
break;
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+ ret = snp_begin_ext_guest_req(vcpu);
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a56109e100ac..a2ac6dc3a79a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -307,6 +307,9 @@ struct vcpu_svm {
/* Guest GIF value, used when vGIF is not enabled */
bool guest_gif;
+
+ /* Transaction ID associated with SNP config updates */
+ u64 snp_transaction_id;
};
struct svm_cpu_data {
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index fc9e1b7fc187..fee273a84030 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -542,3 +542,23 @@ u64 snp_config_transaction_end(void)
return id;
}
EXPORT_SYMBOL_GPL(snp_config_transaction_end);
+
+u64 snp_config_transaction_get_id(void)
+{
+ return snp_transaction_id;
+}
+EXPORT_SYMBOL_GPL(snp_config_transaction_get_id);
+
+bool snp_config_transaction_is_stale(u64 id)
+{
+ bool stale = false;
+
+ mutex_lock(&snp_transaction_lock);
+ if (snp_transaction_pending ||
+ id != snp_transaction_id)
+ stale = true;
+ mutex_unlock(&snp_transaction_lock);
+
+ return stale;
+}
+EXPORT_SYMBOL_GPL(snp_config_transaction_is_stale);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e0599144387b..fe8994b95de9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -170,6 +170,7 @@ struct kvm_xen_exit {
struct kvm_user_vmgexit {
#define KVM_USER_VMGEXIT_PSC_MSR 1
#define KVM_USER_VMGEXIT_PSC 2
+#define KVM_USER_VMGEXIT_EXT_GUEST_REQ 3
__u32 type; /* KVM_USER_VMGEXIT_* type */
union {
struct {
@@ -183,6 +184,11 @@ struct kvm_user_vmgexit {
__u64 shared_gpa;
__u64 ret;
} psc;
+ struct {
+ __u64 data_gpa;
+ __u64 data_npages;
+ __u32 ret;
+ } ext_guest_req;
};
};
--
2.25.1
guest_memfd pages are generally expected to be in some arch-defined
initial state prior to using them for guest memory. For SEV-SNP this
initial state is 'private', or 'guest-owned', and requires additional
operations to move these pages into a 'private' state by updating the
corresponding entries in the RMP table.
Allow for an arch-defined hook to handle updates of this sort, and go
ahead and implement one for x86 so KVM implementations like AMD SVM can
register a kvm_x86_ops callback to handle these updates for SEV-SNP
guests.
The preparation callback is always called when allocating/grabbing
folios via gmem, and it is up to the architecture to keep track of
whether or not the pages are already in the expected state (e.g. the RMP
table in the case of SEV-SNP).
In some cases, it is necessary to defer the preparation of the pages to
handle things like in-place encryption of initial guest memory payloads
before marking these pages as 'private'/'guest-owned', so also add a
helper that performs the same function as kvm_gmem_get_pfn(), but allows
the preparation callback to be bypassed so the pages can be accessed
beforehand.
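A minimal sketch of the deferred flow (the populate/encrypt step is a
hypothetical placeholder standing in for the arch-specific launch-update
path, which leaves the page in its final 'private'/'guest-owned' state):
  kvm_pfn_t pfn;
  int max_order, ret;

  /* Grab the page without running the preparation hook. */
  ret = __kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &max_order, false);
  if (ret)
          return ret;

  /* Populate/encrypt the initial payload in place (hypothetical helper). */
  ret = arch_populate_initial_private_page(kvm, gfn, pfn, src);

  put_page(pfn_to_page(pfn));
  return ret;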
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 6 ++++
include/linux/kvm_host.h | 14 ++++++++
virt/kvm/Kconfig | 4 +++
virt/kvm/guest_memfd.c | 56 +++++++++++++++++++++++++++---
6 files changed, 77 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index ab24ce207988..5e6b58439100 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -139,6 +139,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL(get_untagged_addr)
KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
+KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b0f18d096ed..1fc14aa58913 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1795,6 +1795,7 @@ struct kvm_x86_ops {
gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+ int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
};
struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27e23714e960..67d7c9e1331c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13512,6 +13512,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
+#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
+{
+ return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
+}
+#endif
int kvm_spec_ctrl_test_value(u64 value)
{
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..22feb4910854 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2375,9 +2375,19 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
#ifdef CONFIG_KVM_PRIVATE_MEM
+int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prep);
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
#else
+static inline int __kvm_gmem_get_pfn(struct kvm *kvm,
+ struct kvm_memory_slot *slot, gfn_t gfn,
+ kvm_pfn_t *pfn, int *max_order, bool prep)
+{
+ KVM_BUG_ON(1, kvm);
+ return -EIO;
+}
+
static inline int kvm_gmem_get_pfn(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn,
kvm_pfn_t *pfn, int *max_order)
@@ -2387,4 +2397,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
}
#endif /* CONFIG_KVM_PRIVATE_MEM */
+#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
+#endif
+
#endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 6793211a0b64..f5d6256607d2 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -103,3 +103,7 @@ config KVM_GENERIC_PRIVATE_MEM
select KVM_GENERIC_MEMORY_ATTRIBUTES
select KVM_PRIVATE_MEM
bool
+
+config HAVE_KVM_GMEM_PREPARE
+ bool
+ depends on KVM_PRIVATE_MEM
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4aaa82227978..5e88e525cf75 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -43,7 +43,40 @@ static struct folio *kvm_gmem_get_huge_folio(struct inode *inode, pgoff_t index,
return folio;
}
-static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
+{
+#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+ struct list_head *gmem_list = &inode->i_mapping->i_private_list;
+ struct kvm_gmem *gmem;
+
+ list_for_each_entry(gmem, gmem_list, entry) {
+ struct kvm_memory_slot *slot;
+ struct kvm *kvm = gmem->kvm;
+ struct page *page;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+ int rc;
+
+ slot = xa_load(&gmem->bindings, index);
+ if (!slot)
+ continue;
+
+ page = folio_file_page(folio, index);
+ pfn = page_to_pfn(page);
+ gfn = slot->base_gfn + index - slot->gmem.pgoff;
+ rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
+ if (rc) {
+ pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
+ index, rc);
+ return rc;
+ }
+ }
+
+#endif
+ return 0;
+}
+
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prep)
{
struct folio *folio;
@@ -73,6 +106,12 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
folio_mark_uptodate(folio);
}
+ if (prep && kvm_gmem_prepare_folio(inode, index, folio)) {
+ folio_unlock(folio);
+ folio_put(folio);
+ return NULL;
+ }
+
/*
* Ignore accessed, referenced, and dirty flags. The memory is
* unevictable and there is no storage to write back to.
@@ -177,7 +216,7 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
break;
}
- folio = kvm_gmem_get_folio(inode, index);
+ folio = kvm_gmem_get_folio(inode, index, true);
if (!folio) {
r = -ENOMEM;
break;
@@ -517,8 +556,8 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
fput(file);
}
-int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
- gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prep)
{
pgoff_t index, huge_index;
struct kvm_gmem *gmem;
@@ -539,7 +578,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
goto out_fput;
}
- folio = kvm_gmem_get_folio(file_inode(file), index);
+ folio = kvm_gmem_get_folio(file_inode(file), index, prep);
if (!folio) {
r = -ENOMEM;
goto out_fput;
@@ -580,4 +619,11 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
return r;
}
+EXPORT_SYMBOL_GPL(__kvm_gmem_get_pfn);
+
+int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+{
+ return __kvm_gmem_get_pfn(kvm, slot, gfn, pfn, max_order, true);
+}
EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
--
2.25.1
In some cases the full 64-bit error code for the KVM page fault will be
needed to determine things like whether or not a fault was for a private
or shared guest page, so update related code to accept the full 64-bit
value so it can be plumbed all the way through to where it is needed.
The accessors of fault->error_code are changed as follows:
- FNAME(page_fault): change to explicitly use lower_32_bits() since that
is no longer done in kvm_mmu_page_fault()
- kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
PFERR_NESTED_GUEST_PAGE
- mmutrace: changed u32 -> u64
Signed-off-by: Isaku Yamahata <[email protected]>
Link: https://lore.kernel.org/kvm/[email protected]/T/#mbd0b20c9a2cf50319d5d2a27b63f73c772112076
[mdr: drop references/changes to code not in current gmem tree, update
commit message]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +--
arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
arch/x86/kvm/mmu/mmutrace.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
4 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c844e428684..d3fbfe0686a0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5813,8 +5813,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}
if (r == RET_PF_INVALID) {
- r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
- lower_32_bits(error_code), false,
+ r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
&emulation_type);
if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 0669a8a668ca..21f55e8b4dc6 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -190,7 +190,7 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
struct kvm_page_fault {
/* arguments to kvm_mmu_do_page_fault. */
const gpa_t addr;
- const u32 error_code;
+ const u64 error_code;
const bool prefetch;
/* Derived from error_code. */
@@ -280,7 +280,7 @@ enum {
};
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u32 err, bool prefetch, int *emulation_type)
+ u64 err, bool prefetch, int *emulation_type)
{
struct kvm_page_fault fault = {
.addr = cr2_or_gpa,
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index ae86820cef69..195d98bc8de8 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -260,7 +260,7 @@ TRACE_EVENT(
TP_STRUCT__entry(
__field(int, vcpu_id)
__field(gpa_t, cr2_or_gpa)
- __field(u32, error_code)
+ __field(u64, error_code)
__field(u64 *, sptep)
__field(u64, old_spte)
__field(u64, new_spte)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4d4e98fe4f35..c418f3b1cfca 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -787,7 +787,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* The bit needs to be cleared before walking guest page tables.
*/
r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
- fault->error_code & ~PFERR_RSVD_MASK);
+ lower_32_bits(fault->error_code) & ~PFERR_RSVD_MASK);
/*
* The page is not mapped by the guest. Let the guest handle it.
--
2.25.1
In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Wire up arch-defined hooks to the .free_folio kvm_gmem_aops callback to
allow for special handling of this sort when freeing memory in response
to FALLOC_FL_PUNCH_HOLE operations and when releasing the inode, and go
ahead and define an arch-specific hook for x86 since it will be needed
for handling memory used for SEV-SNP guests.
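For example, an SEV-SNP implementation of the x86 hook might look roughly
like the following (a simplified sketch; a real callback would also need to
handle 2M RMP entries and hugepage PFN ranges):
  static void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
  {
          kvm_pfn_t pfn;

          for (pfn = start; pfn < end; pfn++) {
                  /* Return the page to a shared/hypervisor-owned RMP state. */
                  if (rmp_make_shared(pfn, PG_LEVEL_4K))
                          /* Unrecoverable: leak rather than reuse the page. */
                          snp_leak_pages(pfn, 1);
          }
  }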
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 7 +++++++
include/linux/kvm_host.h | 4 ++++
virt/kvm/Kconfig | 4 ++++
virt/kvm/guest_memfd.c | 14 ++++++++++++++
6 files changed, 31 insertions(+)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 5e6b58439100..c4b7b0db7be3 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL(get_untagged_addr)
KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL(gmem_invalidate)
#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1fc14aa58913..63596fe45013 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1796,6 +1796,7 @@ struct kvm_x86_ops {
gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+ void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
};
struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 67d7c9e1331c..aaf71e5c1d18 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13519,6 +13519,13 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
}
#endif
+#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+ static_call_cond(kvm_x86_gmem_invalidate)(start, end);
+}
+#endif
+
int kvm_spec_ctrl_test_value(u64 value)
{
/*
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 22feb4910854..a2a8331fbb94 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2401,4 +2401,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
#endif
+#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+#endif
+
#endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index f5d6256607d2..734b7d467380 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -107,3 +107,7 @@ config KVM_GENERIC_PRIVATE_MEM
config HAVE_KVM_GMEM_PREPARE
bool
depends on KVM_PRIVATE_MEM
+
+config HAVE_KVM_GMEM_INVALIDATE
+ bool
+ depends on KVM_PRIVATE_MEM
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5e88e525cf75..feec0da93d98 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -370,10 +370,24 @@ static int kvm_gmem_error_folio(struct address_space *mapping,
return MF_DELAYED;
}
+#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+static void kvm_gmem_free_folio(struct folio *folio)
+{
+ struct page *page = folio_page(folio, 0);
+ kvm_pfn_t pfn = page_to_pfn(page);
+ int order = folio_order(folio);
+
+ kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
+}
+#endif
+
static const struct address_space_operations kvm_gmem_aops = {
.dirty_folio = noop_dirty_folio,
.migrate_folio = kvm_gmem_migrate_folio,
.error_remove_folio = kvm_gmem_error_folio,
+#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+ .free_folio = kvm_gmem_free_folio,
+#endif
};
static int kvm_gmem_getattr(struct mnt_idmap *idmap, const struct path *path,
--
2.25.1
From: Brijesh Singh <[email protected]>
When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When the
hardware encounters an RMP check failure caused by a guest memory access,
it raises a #NPF. The error code contains additional information on
the access type. See APM volume 2 for further details.
When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.
Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/sev.h | 3 ++
arch/x86/kvm/svm/sev.c | 92 ++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 21 +++++++--
arch/x86/kvm/svm/svm.h | 1 +
4 files changed, 113 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 435ba9bc4510..e84dd1d2d8ab 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -90,6 +90,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMUPDATE detected 4K page and 2MB page overlap. */
#define RMPUPDATE_FAIL_OVERLAP 4
+/* PSMASH failed due to concurrent access by another CPU */
+#define PSMASH_FAIL_INUSE 3
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
#define RMP_PG_SIZE_2M 1
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8b6143110411..ad1aea7f6266 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3276,6 +3276,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}
+static int snp_rmptable_psmash(kvm_pfn_t pfn)
+{
+ pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+ return psmash(pfn);
+}
+
static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3835,3 +3842,88 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
return p;
}
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ struct kvm_memory_slot *slot;
+ struct kvm *kvm = vcpu->kvm;
+ int order, rmp_level, ret;
+ bool assigned;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+
+ gfn = gpa >> PAGE_SHIFT;
+
+ /*
+ * The only time RMP faults occur for shared pages is when the guest is
+ * triggering an RMP fault for an implicit page-state change from
+ * shared->private. Implicit page-state changes are forwarded to
+ * userspace via KVM_EXIT_MEMORY_FAULT events, however, so RMP faults
+ * for shared pages should not end up here.
+ */
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no private backing page for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret || !assigned) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
+ gpa, pfn, ret);
+ goto out;
+ }
+
+ /*
+ * There are 2 cases where a PSMASH may be needed to resolve an #NPF
+ * with PFERR_GUEST_RMP_BIT set:
+ *
+ * 1) RMPADJUST/PVALIDATE can trigger an #NPF with PFERR_GUEST_SIZEM
+ * bit set if the guest issues them with a smaller granularity than
+ * what is indicated by the page-size bit in the 2MB-aligned RMP
+ * entry for the PFN that backs the GPA.
+ *
+ * 2) Guest access via NPT can trigger an #NPF if the NPT mapping is
+ * smaller than what is indicated by the 2MB-aligned RMP entry for
+ * the PFN that backs the GPA.
+ *
+ * In both these cases, the corresponding 2M RMP entry needs to
+ * be PSMASH'd to 512 4K RMP entries. If the RMP entry is already
+ * split into 4K RMP entries, then this is likely a spurious case which
+ * can occur when there are concurrent accesses by the guest to a 2MB
+ * GPA range that is backed by a 2MB-aligned PFN whose RMP entry is in
+ * the process of being PSMASH'd into 4K entries. These cases should
+ * resolve automatically on subsequent accesses, so just ignore them
+ * here.
+ */
+ if (rmp_level == PG_LEVEL_4K) {
+ pr_debug_ratelimited("%s: Spurious RMP fault for GPA 0x%llx, error_code 0x%llx\n",
+ __func__, gpa, error_code);
+ goto out;
+ }
+
+ pr_debug_ratelimited("%s: Splitting 2M RMP entry for GPA 0x%llx, error_code 0x%llx\n",
+ __func__, gpa, error_code);
+ ret = snp_rmptable_psmash(pfn);
+ if (ret && ret != PSMASH_FAIL_INUSE)
+ pr_err_ratelimited("SEV: Unable to split RMP entry for GPA 0x%llx PFN 0x%llx ret %d\n",
+ gpa, pfn, ret);
+
+ kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+out:
+ put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 18d55df7fa5f..4367da074612 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2051,15 +2051,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ int rc;
u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
trace_kvm_page_fault(vcpu, fault_address, error_code);
- return kvm_mmu_page_fault(vcpu, fault_address, error_code,
- static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
- svm->vmcb->control.insn_bytes : NULL,
- svm->vmcb->control.insn_len);
+ rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+ static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+ svm->vmcb->control.insn_bytes : NULL,
+ svm->vmcb->control.insn_len);
+
+ /*
+ * rc == 0 indicates a userspace exit is needed to handle page
+ * transitions, so do that first before updating the RMP table.
+ */
+ if (error_code & PFERR_GUEST_RMP_MASK) {
+ if (rc == 0)
+ return rc;
+ handle_rmp_page_fault(vcpu, fault_address, error_code);
+ }
+
+ return rc;
}
static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2bee24017bae..fb98d88d8124 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -717,6 +717,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
/* vmenter.S */
--
2.25.1
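A note on the PSMASH path above: KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) is 512
(a 2MB region spans 512 4KB frames), so snp_rmptable_psmash() simply
rounds the faulting PFN down to the start of its 512-frame group before
issuing PSMASH. A standalone sketch of that arithmetic, with the constant
open-coded since this is not kernel code:

    #include <stdio.h>
    #include <stdint.h>

    /* KVM_PAGES_PER_HPAGE(PG_LEVEL_2M): 4KB frames per 2MB page */
    #define PAGES_PER_2M 512ULL

    int main(void)
    {
            uint64_t pfn = 0x123456;
            uint64_t aligned = pfn & ~(PAGES_PER_2M - 1);

            /* Prints: pfn 0x123456 -> 2M-aligned pfn 0x123400 */
            printf("pfn 0x%llx -> 2M-aligned pfn 0x%llx\n",
                   (unsigned long long)pfn, (unsigned long long)aligned);
            return 0;
    }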
On Sat, Dec 30, 2023, Michael Roth wrote:
> In some cases the full 64-bit error code for the KVM page fault will be
> needed to determine things like whether or not a fault was for a private
> or shared guest page, so update related code to accept the full 64-bit
> value so it can be plumbed all the way through to where it is needed.
>
> The accessors of fault->error_code are changed as follows:
>
> - FNAME(page_fault): change to explicitly use lower_32_bits() since that
> is no longer done in kvm_mmu_page_fault()
> - kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
> PFERR_NESTED_GUEST_PAGE
> - mmutrace: changed u32 -> u64
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Link: https://lore.kernel.org/kvm/[email protected]/T/#mbd0b20c9a2cf50319d5d2a27b63f73c772112076
> [mdr: drop references/changes to code not in current gmem tree, update
> commit message]
> Signed-off-by: Michael Roth <[email protected]>
I assume Isaku is the original author? If so, that's missing from this patch.
On Sat, Dec 30, 2023 at 6:26 PM Michael Roth <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The KVM_SNP_INIT command is used by the hypervisor to initialize the
> SEV-SNP platform context. In a typical workflow, this command should be
> the first command issued. When creating an SEV-SNP guest, the VMM must use
> this command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.
>
> The flags value must be zero; it will be extended in future SNP support
> to communicate optional features (such as restricted INT injection,
> etc.).
We have a (preexisting) problem in that KVM_SEV_INIT and
KVM_SEV_ES_INIT are not flexible enough. debug_swap has broken
measurements of the VMSA because it changed the contents of the VMSA
under userspace's feet; therefore, VMSA features need to be passed into
the API somehow. The problem is preexisting, but we need to fix it
before the new KVM_SNP_INIT API makes it worse.
I have started prototyping a change to move SEV-ES/SEV-SNP to
KVM_CREATE_VM, and introduce a single KVM_SEV_INIT_VM operation that
can be used for the PSP initialization.
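For context, the existing init is driven from userspace roughly like the
sketch below (not part of this series; fd setup and error handling
omitted). KVM_SEV_ES_INIT carries no payload at all, which is exactly why
VMSA features such as debug_swap cannot be negotiated today:

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int sev_es_init(int vm_fd, int sev_fd)
    {
            /* KVM_SEV_ES_INIT has no argument struct, only a command id. */
            struct kvm_sev_cmd cmd = {
                    .id     = KVM_SEV_ES_INIT,
                    .data   = 0,
                    .sev_fd = sev_fd,
            };

            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }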
> +The flags bitmap is defined as::
> +
> + /* enable the restricted injection */
> + #define KVM_SEV_SNP_RESTRICTED_INJET (1<<0)
> +
> + /* enable the restricted injection timer */
> + #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1<<1)
These are not yet supported, so they might as well not be documented.
If you want to document them, you need to provide an API to query
SEV_SNP_SUPPORTED_FLAGS. Let's do that later.
> + if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
> + ret = -EOPNOTSUPP;
> +
> + params.flags = SEV_SNP_SUPPORTED_FLAGS;
This assignment is not necessary.
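That is, it is enough to reject unsupported flags and otherwise leave
params.flags exactly as userspace provided it. A minimal sketch (whether
to return directly or jump to the existing cleanup label depends on the
surrounding function):

    if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
            return -EOPNOTSUPP;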
Paolo
On Tue, Feb 6, 2024 at 9:52 PM Sean Christopherson <[email protected]> wrote:
>
> On Sat, Dec 30, 2023, Michael Roth wrote:
> > In some cases the full 64-bit error code for the KVM page fault will be
> > needed to determine things like whether or not a fault was for a private
> > or shared guest page, so update related code to accept the full 64-bit
> > value so it can be plumbed all the way through to where it is needed.
> >
> > The accessors of fault->error_code are changed as follows:
> >
> > - FNAME(page_fault): change to explicitly use lower_32_bits() since that
> > is no longer done in kvm_mmu_page_fault()
> > - kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
> > PFERR_NESTED_GUEST_PAGE
> > - mmutrace: changed u32 -> u64
> >
> > Signed-off-by: Isaku Yamahata <[email protected]>
> > Link: https://lore.kernel.org/kvm/[email protected]/T/#mbd0b20c9a2cf50319d5d2a27b63f73c772112076
> > [mdr: drop references/changes to code not in current gmem tree, update
> > commit message]
> > Signed-off-by: Michael Roth <[email protected]>
>
> I assume Isaku is the original author? If so, that's missing from this patch.
The root of this patch seems to be in a reply to "KVM: x86: Add
'fault_is_private' x86 op"
(https://patchew.org/linux/[email protected]/[email protected]/),
so yes.
Paolo
On Mon, Feb 12, 2024 at 11:00:27AM +0100, Paolo Bonzini wrote:
> On Tue, Feb 6, 2024 at 9:52 PM Sean Christopherson <[email protected]> wrote:
> >
> > On Sat, Dec 30, 2023, Michael Roth wrote:
> > > In some cases the full 64-bit error code for the KVM page fault will be
> > > needed to determine things like whether or not a fault was for a private
> > > or shared guest page, so update related code to accept the full 64-bit
> > > value so it can be plumbed all the way through to where it is needed.
> > >
> > > The accessors of fault->error_code are changed as follows:
> > >
> > > - FNAME(page_fault): change to explicitly use lower_32_bits() since that
> > > is no longer done in kvm_mmu_page_fault()
> > > - kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
> > > PFERR_NESTED_GUEST_PAGE
> > > - mmutrace: changed u32 -> u64
> > >
> > > Signed-off-by: Isaku Yamahata <[email protected]>
> > > Link: https://lore.kernel.org/kvm/[email protected]/T/#mbd0b20c9a2cf50319d5d2a27b63f73c772112076
> > > [mdr: drop references/changes to code not in current gmem tree, update
> > > commit message]
> > > Signed-off-by: Michael Roth <[email protected]>
> >
> > I assume Isaku is the original author? If so, that's missing from this patch.
>
> The root of this patch seems to be in a reply to "KVM: x86: Add
> 'fault_is_private' x86 op"
> (https://patchew.org/linux/[email protected]/[email protected]/),
> so yes.
Yes, this is Isaku's patch; I think the authorship got mangled during a rebase.
I'll make sure to get that fixed up.
-Mike
>
> Paolo
>
On Sat, Dec 30, 2023 at 6:26 PM Michael Roth <[email protected]> wrote:
> + struct kvm_snp_init {
> + __u64 flags;
> + };
> +
> +The flags bitmap is defined as::
> +
> + /* enable the restricted injection */
> + #define KVM_SEV_SNP_RESTRICTED_INJET (1<<0)
> +
> + /* enable the restricted injection timer */
> + #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1<<1)
The flags are the same as the vmsa_features introduced by
KVM_SEV_INIT2, which is great - SNP does not need any change in this
department and this patch almost entirely goes away.
> if (sev_es_debug_swap_enabled)
> save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
>
> + /* Enable the SEV-SNP feature */
> + if (sev_snp_guest(svm->vcpu.kvm))
> + save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
... on the other hand, this raises the question of whether
SVM_SEV_FEAT_SNP_ACTIVE should be exposed in the
KVM_X86_SEV_VMSA_FEATURES attribute. I think it shouldn't.
This means that this patch becomes a two-liner change to
sev_guest_init() that you can squash into patch 14 ("KVM: SEV: Add
initial SEV-SNP support"):
sev->es_active = es_active;
sev->vmsa_features = data->vmsa_features;
+ if (vm_type == KVM_X86_SNP_VM)
+ sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;
Also, since there is now sev->vmsa_features (that wasn't there at the
time of your posting), I'd even drop sev->snp_active in favor of
"sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE". It's only ever used in
sev_snp_guest() so it's a useless duplication.
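A rough sketch of what I mean (names as used elsewhere in this series,
so treat it as illustrative only):

    static __always_inline bool sev_snp_guest(struct kvm *kvm)
    {
            struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

            return sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE;
    }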
Looking forward to seeing v12. :) If you have any problems rebasing on
top of https://lore.kernel.org/kvm/[email protected]/,
please shout.
Paolo
> pr_debug("Virtual Machine Save Area (VMSA):\n");
> print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
>
> @@ -1883,6 +1914,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> }
>
> switch (sev_cmd.id) {
> + case KVM_SEV_SNP_INIT:
> + if (!sev_snp_enabled) {
> + r = -ENOTTY;
> + goto out;
> + }
> + fallthrough;
> case KVM_SEV_ES_INIT:
> if (!sev_es_enabled) {
> r = -ENOTTY;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index a3e27c82866b..07a9eb5b6ce5 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -76,6 +76,9 @@ enum {
> /* TPR and CR2 are always written before VMRUN */
> #define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
>
> +/* Supported init feature flags */
> +#define SEV_SNP_SUPPORTED_FLAGS 0x0
> +
> struct kvm_sev_info {
> bool active; /* SEV enabled guest */
> bool es_active; /* SEV-ES enabled guest */
> @@ -91,6 +94,7 @@ struct kvm_sev_info {
> struct list_head mirror_entry; /* Use as a list entry of mirrors */
> struct misc_cg *misc_cg; /* For misc cgroup accounting */
> atomic_t migration_in_progress;
> + u64 snp_init_flags;
> };
>
> struct kvm_svm {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c3308536482b..73702e9b9d76 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1869,6 +1869,9 @@ enum sev_cmd_id {
> /* Guest Migration Extension */
> KVM_SEV_SEND_CANCEL,
>
> + /* SNP specific commands */
> + KVM_SEV_SNP_INIT,
> +
> KVM_SEV_NR_MAX,
> };
>
> @@ -1965,6 +1968,16 @@ struct kvm_sev_receive_update_data {
> __u32 trans_len;
> };
>
> +/* enable the restricted injection */
> +#define KVM_SEV_SNP_RESTRICTED_INJET (1 << 0)
> +
> +/* enable the restricted injection timer */
> +#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1 << 1)
> +
> +struct kvm_snp_init {
> + __u64 flags;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.25.1
>