From: Ashish Kalra <[email protected]>
This series adds support for the AMD SEV guest live migration commands. To
protect the confidentiality of SEV-protected guest memory while in transit,
we need to use the SEV commands defined in the SEV API spec [1].
SEV guest VMs have the concept of private and shared memory. Private memory
is encrypted with the guest-specific key, while shared memory may be encrypted
with the hypervisor key. The commands provided by the SEV firmware are meant
to be used for the private memory only. The patch series introduces a new
hypercall, which the guest OS uses to notify the hypervisor of a page's
encryption status. If a page is encrypted with the guest-specific key, the
SEV commands are used during migration; otherwise the migration falls back
to the default path.
The series also adds new ioctls, KVM_{SET,GET}_PAGE_ENC_BITMAP, which qemu
can use to retrieve the page encryption bitmap and consult it during
migration to determine whether a page is encrypted.
[1] https://developer.amd.com/wp-content/resources/55766.PDF
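For illustration only (this model is not part of the series), a userspace
consumer such as qemu could consult the bitmap roughly like this, with one
bit per guest page frame and a set bit meaning "encrypted with the guest key":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per guest page frame; set = encrypted with the guest key
 * (matches the default-encrypted convention used by the series). */
static bool gfn_is_encrypted(const uint64_t *bmap, uint64_t gfn)
{
	return (bmap[gfn / 64] >> (gfn % 64)) & 1;
}

/* Pick the transport for one page: the SEV SEND_UPDATE_DATA path for
 * private pages, the regular RAM path for shared ones. */
enum transport { TRANSPORT_SEV, TRANSPORT_PLAIN };

static enum transport pick_transport(const uint64_t *bmap, uint64_t gfn)
{
	return gfn_is_encrypted(bmap, gfn) ? TRANSPORT_SEV : TRANSPORT_PLAIN;
}
```

The helper names are hypothetical; the real decision in qemu would sit in its
RAM migration loop.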
Changes since v5:
- Fix build errors as
Reported-by: kbuild test robot <[email protected]>
Changes since v4:
- Host support has been added to extend KVM capabilities/feature bits to
include a new KVM_FEATURE_SEV_LIVE_MIGRATION, which the guest can
query for host-side support for SEV live migration and a new custom MSR
MSR_KVM_SEV_LIVE_MIG_EN is added for guest to enable the SEV live
migration feature.
- Ensure that _bss_decrypted section is marked as decrypted in the
page encryption bitmap.
- Fixing KVM_GET_PAGE_ENC_BITMAP ioctl to return the correct bitmap
as per the number of pages being requested by the user. Ensure that
we only copy bmap->num_pages bytes in the userspace buffer, if
bmap->num_pages is not byte aligned we read the trailing bits
from the userspace and copy those bits as is. This fixes guest
page(s) corruption issues observed after migration completion.
- Add kexec support for SEV Live Migration to reset the host's
page encryption bitmap related to kernel specific page encryption
status settings before we load a new kernel by kexec. We cannot
reset the complete page encryption bitmap here as we need to
retain the UEFI/OVMF firmware specific settings.
Changes since v3:
- Rebasing to mainline and testing.
- Adding a new KVM_PAGE_ENC_BITMAP_RESET ioctl, which resets the
page encryption bitmap on a guest reboot event.
- Adding a more reliable sanity check for GPA range being passed to
the hypercall to ensure that guest MMIO ranges are also marked
in the page encryption bitmap.
Changes since v2:
- reset the page encryption bitmap on vcpu reboot
Changes since v1:
- Add support to share the page encryption between the source and target
machine.
- Fix review feedbacks from Tom Lendacky.
- Add check to limit the session blob length.
- Update KVM_GET_PAGE_ENC_BITMAP ioctl to use the base_gfn instead of
the memory slot when querying the bitmap.
Ashish Kalra (3):
KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature &
Custom MSR.
KVM: x86: Add kexec support for SEV Live Migration.
Brijesh Singh (11):
KVM: SVM: Add KVM_SEV SEND_START command
KVM: SVM: Add KVM_SEND_UPDATE_DATA command
KVM: SVM: Add KVM_SEV_SEND_FINISH command
KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
KVM: x86: Add AMD SEV specific Hypercall3
KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
mm: x86: Invoke hypercall when page encryption status is changed
KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
.../virt/kvm/amd-memory-encryption.rst | 120 +++
Documentation/virt/kvm/api.rst | 62 ++
Documentation/virt/kvm/cpuid.rst | 4 +
Documentation/virt/kvm/hypercalls.rst | 15 +
Documentation/virt/kvm/msr.rst | 10 +
arch/x86/include/asm/kvm_host.h | 10 +
arch/x86/include/asm/kvm_para.h | 12 +
arch/x86/include/asm/paravirt.h | 10 +
arch/x86/include/asm/paravirt_types.h | 2 +
arch/x86/include/uapi/asm/kvm_para.h | 5 +
arch/x86/kernel/kvm.c | 32 +
arch/x86/kernel/paravirt.c | 1 +
arch/x86/kvm/cpuid.c | 3 +-
arch/x86/kvm/svm.c | 699 +++++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 1 +
arch/x86/kvm/x86.c | 43 ++
arch/x86/mm/mem_encrypt.c | 69 +-
arch/x86/mm/pat/set_memory.c | 7 +
include/linux/psp-sev.h | 8 +-
include/uapi/linux/kvm.h | 53 ++
include/uapi/linux/kvm_para.h | 1 +
21 files changed, 1157 insertions(+), 10 deletions(-)
--
2.17.1
From: Brijesh Singh <[email protected]>
The command is used to finalize the encryption context created with the
KVM_SEV_SEND_START command.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Steve Rutherford <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/amd-memory-encryption.rst | 8 +++++++
arch/x86/kvm/svm.c | 23 +++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index f46817ef7019..a45dcb5f8687 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -314,6 +314,14 @@ Returns: 0 on success, -negative on error
__u32 trans_len;
};
+12. KVM_SEV_SEND_FINISH
+------------------------
+
+After completion of the migration flow, the KVM_SEV_SEND_FINISH command can be
+issued by the hypervisor to delete the encryption context.
+
+Returns: 0 on success, -negative on error
+
References
==========
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8561c47cc4f9..71a4cb3b817d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7399,6 +7399,26 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}
+static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_send_finish *data;
+ int ret;
+
+ if (!sev_guest(kvm))
+ return -ENOTTY;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ data->handle = sev->handle;
+ ret = sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, data, &argp->error);
+
+ kfree(data);
+ return ret;
+}
+
static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -7449,6 +7469,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_SEND_UPDATE_DATA:
r = sev_send_update_data(kvm, &sev_cmd);
break;
+ case KVM_SEV_SEND_FINISH:
+ r = sev_send_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
--
2.17.1
From: Brijesh Singh <[email protected]>
The command finalizes the guest receiving process and makes the SEV guest
ready for execution.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/amd-memory-encryption.rst | 8 +++++++
arch/x86/kvm/svm.c | 23 +++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 554aa33a99cc..93cd95d9a6c0 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -375,6 +375,14 @@ Returns: 0 on success, -negative on error
__u32 trans_len;
};
+15. KVM_SEV_RECEIVE_FINISH
+--------------------------
+
+After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be
+issued by the hypervisor to make the guest ready for execution.
+
+Returns: 0 on success, -negative on error
+
References
==========
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5fc5355536d7..7c2721e18b06 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7573,6 +7573,26 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}
+static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_receive_finish *data;
+ int ret;
+
+ if (!sev_guest(kvm))
+ return -ENOTTY;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ data->handle = sev->handle;
+ ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
+
+ kfree(data);
+ return ret;
+}
+
static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -7632,6 +7652,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_UPDATE_DATA:
r = sev_receive_update_data(kvm, &sev_cmd);
break;
+ case KVM_SEV_RECEIVE_FINISH:
+ r = sev_receive_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
--
2.17.1
From: Brijesh Singh <[email protected]>
The command is used for encrypting the guest memory region using the encryption
context created with KVM_SEV_SEND_START.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Steve Rutherford <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/amd-memory-encryption.rst | 24 ++++
arch/x86/kvm/svm.c | 136 +++++++++++++++++-
include/uapi/linux/kvm.h | 9 ++
3 files changed, 165 insertions(+), 4 deletions(-)
diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 4fd34fc5c7a7..f46817ef7019 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -290,6 +290,30 @@ Returns: 0 on success, -negative on error
__u32 session_len;
};
+11. KVM_SEV_SEND_UPDATE_DATA
+----------------------------
+
+The KVM_SEV_SEND_UPDATE_DATA command can be used by the hypervisor to encrypt the
+outgoing guest memory region with the encryption context created using
+KVM_SEV_SEND_START.
+
+Parameters (in): struct kvm_sev_send_update_data
+
+Returns: 0 on success, -negative on error
+
+::
+
	struct kvm_sev_send_update_data {
+ __u64 hdr_uaddr; /* userspace address containing the packet header */
+ __u32 hdr_len;
+
+ __u64 guest_uaddr; /* the source memory region to be encrypted */
+ __u32 guest_len;
+
+ __u64 trans_uaddr; /* the destination memory region */
+ __u32 trans_len;
+ };
+
References
==========
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 63d172e974ad..8561c47cc4f9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -428,6 +428,7 @@ static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
static unsigned int max_sev_asid;
static unsigned int min_sev_asid;
+static unsigned long sev_me_mask;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;
#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
@@ -1232,16 +1233,22 @@ static int avic_ga_log_notifier(u32 ga_tag)
static __init int sev_hardware_setup(void)
{
struct sev_user_data_status *status;
+ u32 eax, ebx;
int rc;
- /* Maximum number of encrypted guests supported simultaneously */
- max_sev_asid = cpuid_ecx(0x8000001F);
+ /*
+ * Query the memory encryption information.
+ * EBX: Bit 0:5 Pagetable bit position used to indicate encryption
+ * (aka Cbit).
+ * ECX: Maximum number of encrypted guests supported simultaneously.
+ * EDX: Minimum ASID value that should be used for SEV guest.
+ */
+ cpuid(0x8000001f, &eax, &ebx, &max_sev_asid, &min_sev_asid);
if (!max_sev_asid)
return 1;
- /* Minimum ASID value that should be used for SEV guest */
- min_sev_asid = cpuid_edx(0x8000001F);
+ sev_me_mask = 1UL << (ebx & 0x3f);
/* Initialize SEV ASID bitmaps */
sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
@@ -7274,6 +7281,124 @@ static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}
+/* Userspace wants to query either header or trans length. */
+static int
+__sev_send_update_data_query_lengths(struct kvm *kvm, struct kvm_sev_cmd *argp,
+ struct kvm_sev_send_update_data *params)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_send_update_data *data;
+ int ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+ if (!data)
+ return -ENOMEM;
+
+ data->handle = sev->handle;
+ ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
+
+ params->hdr_len = data->hdr_len;
+ params->trans_len = data->trans_len;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
+ sizeof(struct kvm_sev_send_update_data)))
+ ret = -EFAULT;
+
+ kfree(data);
+ return ret;
+}
+
+static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_send_update_data *data;
+ struct kvm_sev_send_update_data params;
+ void *hdr, *trans_data;
+ struct page **guest_page;
+ unsigned long n;
+ int ret, offset;
+
+ if (!sev_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(struct kvm_sev_send_update_data)))
+ return -EFAULT;
+
+ /* userspace wants to query either header or trans length */
+ if (!params.trans_len || !params.hdr_len)
+ return __sev_send_update_data_query_lengths(kvm, argp, ¶ms);
+
+ if (!params.trans_uaddr || !params.guest_uaddr ||
+ !params.guest_len || !params.hdr_uaddr)
+ return -EINVAL;
+
+
+ /* Check if we are crossing the page boundary */
+ offset = params.guest_uaddr & (PAGE_SIZE - 1);
+ if ((params.guest_len + offset > PAGE_SIZE))
+ return -EINVAL;
+
+ /* Pin guest memory */
+ guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
+ PAGE_SIZE, &n, 0);
+ if (!guest_page)
+ return -EFAULT;
+
+ /* allocate memory for header and transport buffer */
+ ret = -ENOMEM;
+ hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
+ if (!hdr)
+ goto e_unpin;
+
+ trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
+ if (!trans_data)
+ goto e_free_hdr;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ goto e_free_trans_data;
+
+ data->hdr_address = __psp_pa(hdr);
+ data->hdr_len = params.hdr_len;
+ data->trans_address = __psp_pa(trans_data);
+ data->trans_len = params.trans_len;
+
+ /* The SEND_UPDATE_DATA command requires C-bit to be always set. */
+ data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
+ offset;
+ data->guest_address |= sev_me_mask;
+ data->guest_len = params.guest_len;
+ data->handle = sev->handle;
+
+ ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
+
+ if (ret)
+ goto e_free;
+
+ /* copy transport buffer to user space */
+ if (copy_to_user((void __user *)(uintptr_t)params.trans_uaddr,
+ trans_data, params.trans_len)) {
+ ret = -EFAULT;
+ goto e_unpin;
+ }
+
+ /* Copy packet header to userspace. */
+ ret = copy_to_user((void __user *)(uintptr_t)params.hdr_uaddr, hdr,
+ params.hdr_len);
+
+e_free:
+ kfree(data);
+e_free_trans_data:
+ kfree(trans_data);
+e_free_hdr:
+ kfree(hdr);
+e_unpin:
+ sev_unpin_memory(kvm, guest_page, n);
+
+ return ret;
+}
+
static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -7321,6 +7446,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_SEND_START:
r = sev_send_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_SEND_UPDATE_DATA:
+ r = sev_send_update_data(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 17bef4c245e1..d9dc81bb9c55 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1570,6 +1570,15 @@ struct kvm_sev_send_start {
__u32 session_len;
};
+struct kvm_sev_send_update_data {
+ __u64 hdr_uaddr;
+ __u32 hdr_len;
+ __u64 guest_uaddr;
+ __u32 guest_len;
+ __u64 trans_uaddr;
+ __u32 trans_len;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.17.1
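The zero-length query convention implemented above (userspace passes an
hdr_len or trans_len of 0 to learn the required buffer sizes, then calls
again with real buffers) can be sketched in plain C. The mock command and
its sizes below are hypothetical, not the real ioctl interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct update_data_params {
	void *hdr;
	uint32_t hdr_len;
	void *trans;
	uint32_t trans_len;
};

/* Mock of the firmware command: report required sizes when either
 * length is zero, otherwise pretend to fill the caller's buffers. */
static int mock_send_update_data(struct update_data_params *p)
{
	const uint32_t need_hdr = 52, need_trans = 4096;	/* made-up sizes */

	if (!p->hdr_len || !p->trans_len) {
		p->hdr_len = need_hdr;
		p->trans_len = need_trans;
		return 0;	/* lengths-only query */
	}
	if (p->hdr_len < need_hdr || p->trans_len < need_trans)
		return -1;
	return 0;
}

/* The two-call pattern userspace would follow. */
static int send_one_page(void)
{
	struct update_data_params p = { 0 };
	int ret;

	ret = mock_send_update_data(&p);	/* 1st call: query sizes */
	if (ret)
		return ret;

	p.hdr = malloc(p.hdr_len);
	p.trans = malloc(p.trans_len);
	if (!p.hdr || !p.trans)
		ret = -1;
	else
		ret = mock_send_update_data(&p);	/* 2nd call: real work */

	free(p.hdr);
	free(p.trans);
	return ret;
}
```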
From: Brijesh Singh <[email protected]>
The KVM hypercall framework relies on the alternatives framework to patch
VMCALL -> VMMCALL on AMD platforms. If a hypercall is made before
apply_alternatives() is called, it defaults to VMCALL. That approach works
fine for a non-SEV guest: the VMCALL causes a #UD, and the hypervisor is
able to decode the instruction and do the right thing. But when SEV is
active, guest memory is encrypted with the guest key and the hypervisor
cannot decode the instruction bytes.
Add an SEV-specific hypercall3 which unconditionally uses VMMCALL. The
hypercall will be used by the SEV guest to notify the hypervisor about
encrypted pages.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9b4df6eaa11a..6c09255633a4 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -84,6 +84,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
return ret;
}
+static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3)
+{
+ long ret;
+
+ asm volatile("vmmcall"
+ : "=a"(ret)
+ : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
+ : "memory");
+ return ret;
+}
+
#ifdef CONFIG_KVM_GUEST
bool kvm_para_available(void);
unsigned int kvm_arch_para_features(void);
--
2.17.1
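As background for why a single-site patch works (opcode encodings taken from
the vendor manuals; the helper below is purely illustrative), VMCALL and
VMMCALL are both three-byte 0F 01 /x instructions that differ only in the
final opcode byte, which is what the alternatives framework rewrites:

```c
#include <assert.h>
#include <stdint.h>

/* VMCALL (Intel) and VMMCALL (AMD) instruction encodings. */
static const uint8_t vmcall[]  = { 0x0f, 0x01, 0xc1 };
static const uint8_t vmmcall[] = { 0x0f, 0x01, 0xd9 };

/* Count how many bytes the alternatives framework would have to patch. */
static int bytes_to_patch(void)
{
	int i, n = 0;

	for (i = 0; i < 3; i++)
		if (vmcall[i] != vmmcall[i])
			n++;
	return n;
}
```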
From: Brijesh Singh <[email protected]>
The command is used for copying the incoming buffer into the
SEV guest memory space.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/amd-memory-encryption.rst | 24 ++++++
arch/x86/kvm/svm.c | 79 +++++++++++++++++++
include/uapi/linux/kvm.h | 9 +++
3 files changed, 112 insertions(+)
diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index ef1f1f3a5b40..554aa33a99cc 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -351,6 +351,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va
For more details, see SEV spec Section 6.12.
+14. KVM_SEV_RECEIVE_UPDATE_DATA
+-------------------------------
+
+The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy
+the incoming buffers into the guest memory region with encryption context
+created during the KVM_SEV_RECEIVE_START.
+
+Parameters (in): struct kvm_sev_receive_update_data
+
+Returns: 0 on success, -negative on error
+
+::
+
	struct kvm_sev_receive_update_data {
+ __u64 hdr_uaddr; /* userspace address containing the packet header */
+ __u32 hdr_len;
+
+ __u64 guest_uaddr; /* the destination guest memory region */
+ __u32 guest_len;
+
+ __u64 trans_uaddr; /* the incoming buffer memory region */
+ __u32 trans_len;
+ };
+
References
==========
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 038b47685733..5fc5355536d7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7497,6 +7497,82 @@ static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}
+static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_receive_update_data params;
+ struct sev_data_receive_update_data *data;
+ void *hdr = NULL, *trans = NULL;
+ struct page **guest_page;
+ unsigned long n;
+ int ret, offset;
+
+ if (!sev_guest(kvm))
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(struct kvm_sev_receive_update_data)))
+ return -EFAULT;
+
+ if (!params.hdr_uaddr || !params.hdr_len ||
+ !params.guest_uaddr || !params.guest_len ||
+ !params.trans_uaddr || !params.trans_len)
+ return -EINVAL;
+
+ /* Check if we are crossing the page boundary */
+ offset = params.guest_uaddr & (PAGE_SIZE - 1);
+ if ((params.guest_len + offset > PAGE_SIZE))
+ return -EINVAL;
+
+ hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
+ if (IS_ERR(hdr))
+ return PTR_ERR(hdr);
+
+ trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ goto e_free_hdr;
+ }
+
+ ret = -ENOMEM;
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ goto e_free_trans;
+
+ data->hdr_address = __psp_pa(hdr);
+ data->hdr_len = params.hdr_len;
+ data->trans_address = __psp_pa(trans);
+ data->trans_len = params.trans_len;
+
+ /* Pin guest memory */
+ ret = -EFAULT;
+ guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
+ PAGE_SIZE, &n, 0);
+ if (!guest_page)
+ goto e_free;
+
+ /* The RECEIVE_UPDATE_DATA command requires C-bit to be always set. */
+ data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
+ offset;
+ data->guest_address |= sev_me_mask;
+ data->guest_len = params.guest_len;
+ data->handle = sev->handle;
+
+ ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
+ &argp->error);
+
+ sev_unpin_memory(kvm, guest_page, n);
+
+e_free:
+ kfree(data);
+e_free_trans:
+ kfree(trans);
+e_free_hdr:
+ kfree(hdr);
+
+ return ret;
+}
+
static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -7553,6 +7629,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_START:
r = sev_receive_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_RECEIVE_UPDATE_DATA:
+ r = sev_receive_update_data(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 74764b9db5fa..4e80c57a3182 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1588,6 +1588,15 @@ struct kvm_sev_receive_start {
__u32 session_len;
};
+struct kvm_sev_receive_update_data {
+ __u64 hdr_uaddr;
+ __u32 hdr_len;
+ __u64 guest_uaddr;
+ __u32 guest_len;
+ __u64 trans_uaddr;
+ __u32 trans_len;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.17.1
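Both SEND_UPDATE_DATA and RECEIVE_UPDATE_DATA reject a guest region whose
span crosses a 4K page boundary. That check is simple enough to model
standalone (a sketch, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Mirror of the kernel-side check: accept a guest_uaddr/guest_len pair
 * only if the region fits entirely within one page. */
static int region_fits_in_page(uint64_t guest_uaddr, uint64_t guest_len)
{
	uint64_t offset = guest_uaddr & (PAGE_SIZE - 1);

	return guest_len + offset <= PAGE_SIZE;
}
```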
From: Brijesh Singh <[email protected]>
This hypercall is used by the SEV guest to notify a change in the page
encryption status to the hypervisor. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default all guest pages are considered encrypted.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
Documentation/virt/kvm/hypercalls.rst | 15 +++++
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 1 +
arch/x86/kvm/x86.c | 6 ++
include/uapi/linux/kvm_para.h | 1 +
6 files changed, 120 insertions(+)
diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
index dbaf207e560d..ff5287e68e81 100644
--- a/Documentation/virt/kvm/hypercalls.rst
+++ b/Documentation/virt/kvm/hypercalls.rst
@@ -169,3 +169,18 @@ a0: destination APIC ID
:Usage example: When sending a call-function IPI-many to vCPUs, yield if
any of the IPI target vCPUs was preempted.
+
+
+8. KVM_HC_PAGE_ENC_STATUS
+-------------------------
+:Architecture: x86
+:Status: active
+:Purpose: Notify the encryption status changes in guest page table (SEV guest)
+
+a0: the guest physical address of the start page
+a1: the number of pages
+a2: encryption attribute
+
+ Where:
+ * 1: Encryption attribute is set
+ * 0: Encryption attribute is cleared
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 98959e8cd448..90718fa3db47 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
+ int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
+ unsigned long sz, unsigned long mode);
};
struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7c2721e18b06..1d8beaf1bceb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -136,6 +136,8 @@ struct kvm_sev_info {
int fd; /* SEV device fd */
unsigned long pages_locked; /* Number of pages locked */
struct list_head regions_list; /* List of registered regions */
+ unsigned long *page_enc_bmap;
+ unsigned long page_enc_bmap_size;
};
struct kvm_svm {
@@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
sev_unbind_asid(kvm, sev->handle);
sev_asid_free(sev->asid);
+
+ kvfree(sev->page_enc_bmap);
+ sev->page_enc_bmap = NULL;
}
static void avic_vm_destroy(struct kvm *kvm)
@@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}
+static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ unsigned long *map;
+ unsigned long sz;
+
+ if (sev->page_enc_bmap_size >= new_size)
+ return 0;
+
+ sz = ALIGN(new_size, BITS_PER_LONG) / 8;
+
+ map = vmalloc(sz);
+ if (!map) {
+ pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
+ sz);
+ return -ENOMEM;
+ }
+
+ /* mark the page encrypted (by default) */
+ memset(map, 0xff, sz);
+
+ bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
+ kvfree(sev->page_enc_bmap);
+
+ sev->page_enc_bmap = map;
+ sev->page_enc_bmap_size = new_size;
+
+ return 0;
+}
+
+static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
+ unsigned long npages, unsigned long enc)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t pfn_start, pfn_end;
+ gfn_t gfn_start, gfn_end;
+ int ret;
+
+ if (!sev_guest(kvm))
+ return -EINVAL;
+
+ if (!npages)
+ return 0;
+
+ gfn_start = gpa_to_gfn(gpa);
+ gfn_end = gfn_start + npages;
+
+ /* out of bound access error check */
+ if (gfn_end <= gfn_start)
+ return -EINVAL;
+
+ /* let's make sure that the gpa exists in our memslot */
+ pfn_start = gfn_to_pfn(kvm, gfn_start);
+ pfn_end = gfn_to_pfn(kvm, gfn_end);
+
+ if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
+ /*
+ * Allow guest MMIO range(s) to be added
+ * to the page encryption bitmap.
+ */
+ return -EINVAL;
+ }
+
+ if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
+ /*
+ * Allow guest MMIO range(s) to be added
+ * to the page encryption bitmap.
+ */
+ return -EINVAL;
+ }
+
+ mutex_lock(&kvm->lock);
+ ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
+ if (ret)
+ goto unlock;
+
+ if (enc)
+ __bitmap_set(sev->page_enc_bmap, gfn_start,
+ gfn_end - gfn_start);
+ else
+ __bitmap_clear(sev->page_enc_bmap, gfn_start,
+ gfn_end - gfn_start);
+
+unlock:
+ mutex_unlock(&kvm->lock);
+ return ret;
+}
+
static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
.apic_init_signal_blocked = svm_apic_init_signal_blocked,
+
+ .page_enc_status_hc = svm_page_enc_status_hc,
};
static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 079d9fbf278e..f68e76ee7f9c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.nested_get_evmcs_version = NULL,
.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
+ .page_enc_status_hc = NULL,
};
static void vmx_cleanup_l1d_flush(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf95c36cb4f4..68428eef2dde 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
kvm_sched_yield(vcpu->kvm, a0);
ret = 0;
break;
+ case KVM_HC_PAGE_ENC_STATUS:
+ ret = -KVM_ENOSYS;
+ if (kvm_x86_ops->page_enc_status_hc)
+ ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
+ a0, a1, a2);
+ break;
default:
ret = -KVM_ENOSYS;
break;
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 8b86609849b9..847b83b75dc8 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -29,6 +29,7 @@
#define KVM_HC_CLOCK_PAIRING 9
#define KVM_HC_SEND_IPI 10
#define KVM_HC_SCHED_YIELD 11
+#define KVM_HC_PAGE_ENC_STATUS 12
/*
* hypercalls use architecture specific
--
2.17.1
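The grow-only resize in sev_resize_page_enc_bitmap(), where newly covered
pages default to encrypted and existing bits are preserved, can be modeled
in userspace. The sketch below uses plain malloc in place of vmalloc and
hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct enc_bitmap {
	uint64_t *bits;
	uint64_t size;	/* in bits, one per gfn */
};

/* Grow-only resize: new bits start as 1 (encrypted by default),
 * existing bits are preserved. */
static int enc_bitmap_resize(struct enc_bitmap *b, uint64_t new_size)
{
	uint64_t words, i;
	uint64_t *map;

	if (b->size >= new_size)
		return 0;

	words = (new_size + 63) / 64;
	map = malloc(words * sizeof(*map));
	if (!map)
		return -1;

	memset(map, 0xff, words * sizeof(*map));	/* default: encrypted */
	for (i = 0; i < (b->size + 63) / 64; i++)
		map[i] = b->bits[i];

	free(b->bits);
	b->bits = map;
	b->size = new_size;
	return 0;
}

static int enc_bitmap_test(const struct enc_bitmap *b, uint64_t gfn)
{
	return (b->bits[gfn / 64] >> (gfn % 64)) & 1;
}
```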
From: Brijesh Singh <[email protected]>
Invoke a hypercall when a memory region is changed from encrypted ->
decrypted and vice versa. The hypervisor needs to know the page encryption
status during guest migration.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/paravirt.h | 10 +++++
arch/x86/include/asm/paravirt_types.h | 2 +
arch/x86/kernel/paravirt.c | 1 +
arch/x86/mm/mem_encrypt.c | 57 ++++++++++++++++++++++++++-
arch/x86/mm/pat/set_memory.c | 7 ++++
5 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 694d8daf4983..8127b9c141bf 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -78,6 +78,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
PVOP_VCALL1(mmu.exit_mmap, mm);
}
+static inline void page_encryption_changed(unsigned long vaddr, int npages,
+ bool enc)
+{
+ PVOP_VCALL3(mmu.page_encryption_changed, vaddr, npages, enc);
+}
+
#ifdef CONFIG_PARAVIRT_XXL
static inline void load_sp0(unsigned long sp0)
{
@@ -946,6 +952,10 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
{
}
+
+static inline void page_encryption_changed(unsigned long vaddr, int npages, bool enc)
+{
+}
#endif
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_PARAVIRT_H */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 732f62e04ddb..03bfd515c59c 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -215,6 +215,8 @@ struct pv_mmu_ops {
/* Hook for intercepting the destruction of an mm_struct. */
void (*exit_mmap)(struct mm_struct *mm);
+ void (*page_encryption_changed)(unsigned long vaddr, int npages,
+ bool enc);
#ifdef CONFIG_PARAVIRT_XXL
struct paravirt_callee_save read_cr2;
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c131ba4e70ef..840c02b23aeb 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -367,6 +367,7 @@ struct paravirt_patch_template pv_ops = {
(void (*)(struct mmu_gather *, void *))tlb_remove_page,
.mmu.exit_mmap = paravirt_nop,
+ .mmu.page_encryption_changed = paravirt_nop,
#ifdef CONFIG_PARAVIRT_XXL
.mmu.read_cr2 = __PV_IS_CALLEE_SAVE(native_read_cr2),
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index f4bd4b431ba1..c9800fa811f6 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -19,6 +19,7 @@
#include <linux/kernel.h>
#include <linux/bitops.h>
#include <linux/dma-mapping.h>
+#include <linux/kvm_para.h>
#include <asm/tlbflush.h>
#include <asm/fixmap.h>
@@ -29,6 +30,7 @@
#include <asm/processor-flags.h>
#include <asm/msr.h>
#include <asm/cmdline.h>
+#include <asm/kvm_para.h>
#include "mm_internal.h"
@@ -196,6 +198,47 @@ void __init sme_early_init(void)
swiotlb_force = SWIOTLB_FORCE;
}
+static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
+ bool enc)
+{
+ unsigned long sz = npages << PAGE_SHIFT;
+ unsigned long vaddr_end, vaddr_next;
+
+ vaddr_end = vaddr + sz;
+
+ for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+ int psize, pmask, level;
+ unsigned long pfn;
+ pte_t *kpte;
+
+ kpte = lookup_address(vaddr, &level);
+ if (!kpte || pte_none(*kpte))
+ return;
+
+ switch (level) {
+ case PG_LEVEL_4K:
+ pfn = pte_pfn(*kpte);
+ break;
+ case PG_LEVEL_2M:
+ pfn = pmd_pfn(*(pmd_t *)kpte);
+ break;
+ case PG_LEVEL_1G:
+ pfn = pud_pfn(*(pud_t *)kpte);
+ break;
+ default:
+ return;
+ }
+
+ psize = page_level_size(level);
+ pmask = page_level_mask(level);
+
+ kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
+ pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
+
+ vaddr_next = (vaddr & pmask) + psize;
+ }
+}
+
static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
{
pgprot_t old_prot, new_prot;
@@ -253,12 +296,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
static int __init early_set_memory_enc_dec(unsigned long vaddr,
unsigned long size, bool enc)
{
- unsigned long vaddr_end, vaddr_next;
+ unsigned long vaddr_end, vaddr_next, start;
unsigned long psize, pmask;
int split_page_size_mask;
int level, ret;
pte_t *kpte;
+ start = vaddr;
vaddr_next = vaddr;
vaddr_end = vaddr + size;
@@ -313,6 +357,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
ret = 0;
+ set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
+ enc);
out:
__flush_tlb_all();
return ret;
@@ -451,6 +497,15 @@ void __init mem_encrypt_init(void)
if (sev_active())
static_branch_enable(&sev_enable_key);
+#ifdef CONFIG_PARAVIRT
+ /*
+ * With SEV, we need to make a hypercall when page encryption state is
+ * changed.
+ */
+ if (sev_active())
+ pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
+#endif
+
pr_info("AMD %s active\n",
sev_active() ? "Secure Encrypted Virtualization (SEV)"
: "Secure Memory Encryption (SME)");
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index c4aedd00c1ba..86b7804129fc 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -26,6 +26,7 @@
#include <asm/proto.h>
#include <asm/memtype.h>
#include <asm/set_memory.h>
+#include <asm/paravirt.h>
#include "../mm_internal.h"
@@ -1987,6 +1988,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
*/
cpa_flush(&cpa, 0);
+ /* Notify hypervisor that a given memory range is mapped encrypted
+ * or decrypted. The hypervisor will use this information during the
+ * VM migration.
+ */
+ page_encryption_changed(addr, numpages, enc);
+
return ret;
}
--
2.17.1
From: Brijesh Singh <[email protected]>
The ioctl can be used to retrieve the page encryption bitmap for a
given gfn range.
Return the correct bitmap as per the number of pages requested by the
user. Ensure that we copy only the bytes covering bmap->num_pages bits
into the userspace buffer; if bmap->num_pages is not byte-aligned, we
read the trailing bits from userspace and copy those bits back as is.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
Documentation/virt/kvm/api.rst | 27 +++++++++++++
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/svm.c | 71 +++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 12 ++++++
include/uapi/linux/kvm.h | 12 ++++++
5 files changed, 124 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index ebd383fba939..8ad800ebb54f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
the clear cpu reset definition in the POP. However, the cpu is not put
into ESA mode. This reset is a superset of the initial reset.
+4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
+----------------------------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_page_enc_bitmap (in/out)
+:Returns: 0 on success, -1 on error
+
+/* for KVM_GET_PAGE_ENC_BITMAP */
+struct kvm_page_enc_bitmap {
+ __u64 start_gfn;
+ __u64 num_pages;
+ union {
+ void __user *enc_bitmap; /* one bit per page */
+ __u64 padding2;
+ };
+};
+
+Encrypted VMs have the concept of private and shared pages. A private
+page is encrypted with the guest-specific key, while a shared page may
+be encrypted with the hypervisor key. KVM_GET_PAGE_ENC_BITMAP can be
+used to get a bitmap indicating whether a guest page is private
+or shared. The bitmap can be used during guest migration: if the page
+is private, userspace needs to use the SEV migration commands to
+transmit the page.
+
5. The kvm_run structure
========================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 90718fa3db47..27e43e3ec9d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
unsigned long sz, unsigned long mode);
+ int (*get_page_enc_bitmap)(struct kvm *kvm,
+ struct kvm_page_enc_bitmap *bmap);
};
struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d8beaf1bceb..bae783cd396a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
return ret;
}
+static int svm_get_page_enc_bitmap(struct kvm *kvm,
+ struct kvm_page_enc_bitmap *bmap)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ unsigned long gfn_start, gfn_end;
+ unsigned long sz, i, sz_bytes;
+ unsigned long *bitmap;
+ int ret, n;
+
+ if (!sev_guest(kvm))
+ return -ENOTTY;
+
+ gfn_start = bmap->start_gfn;
+ gfn_end = gfn_start + bmap->num_pages;
+
+ sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
+ bitmap = kmalloc(sz, GFP_KERNEL);
+ if (!bitmap)
+ return -ENOMEM;
+
+ /* by default all pages are marked encrypted */
+ memset(bitmap, 0xff, sz);
+
+ mutex_lock(&kvm->lock);
+ if (sev->page_enc_bmap) {
+ i = gfn_start;
+ for_each_clear_bit_from(i, sev->page_enc_bmap,
+ min(sev->page_enc_bmap_size, gfn_end))
+ clear_bit(i - gfn_start, bitmap);
+ }
+ mutex_unlock(&kvm->lock);
+
+ ret = -EFAULT;
+
+ n = bmap->num_pages % BITS_PER_BYTE;
+ sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
+
+ /*
+ * Return the correct bitmap as per the number of pages being
+ * requested by the user. Ensure that we only copy bmap->num_pages
+ * bytes in the userspace buffer, if bmap->num_pages is not byte
+ * aligned we read the trailing bits from the userspace and copy
+ * those bits as is.
+ */
+
+ if (n) {
+ unsigned char *bitmap_kernel = (unsigned char *)bitmap;
+ unsigned char bitmap_user;
+ unsigned long offset, mask;
+
+ offset = bmap->num_pages / BITS_PER_BYTE;
+ if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
+ sizeof(unsigned char)))
+ goto out;
+
+ mask = GENMASK(n - 1, 0);
+ bitmap_user &= ~mask;
+ bitmap_kernel[offset] &= mask;
+ bitmap_kernel[offset] |= bitmap_user;
+ }
+
+ if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
+ goto out;
+
+ ret = 0;
+out:
+ kfree(bitmap);
+ return ret;
+}
+
static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.apic_init_signal_blocked = svm_apic_init_signal_blocked,
.page_enc_status_hc = svm_page_enc_status_hc,
+ .get_page_enc_bitmap = svm_get_page_enc_bitmap,
};
static int __init svm_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 68428eef2dde..3c3fea4e20b5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
case KVM_SET_PMU_EVENT_FILTER:
r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
break;
+ case KVM_GET_PAGE_ENC_BITMAP: {
+ struct kvm_page_enc_bitmap bitmap;
+
+ r = -EFAULT;
+ if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
+ goto out;
+
+ r = -ENOTTY;
+ if (kvm_x86_ops->get_page_enc_bitmap)
+ r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
+ break;
+ }
default:
r = -ENOTTY;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4e80c57a3182..db1ebf85e177 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -500,6 +500,16 @@ struct kvm_dirty_log {
};
};
+/* for KVM_GET_PAGE_ENC_BITMAP */
+struct kvm_page_enc_bitmap {
+ __u64 start_gfn;
+ __u64 num_pages;
+ union {
+ void __user *enc_bitmap; /* one bit per page */
+ __u64 padding2;
+ };
+};
+
/* for KVM_CLEAR_DIRTY_LOG */
struct kvm_clear_dirty_log {
__u32 slot;
@@ -1478,6 +1488,8 @@ struct kvm_enc_region {
#define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
#define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
+#define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
+
/* Secure Encrypted Virtualization command */
enum sev_cmd_id {
/* Guest initialization commands */
--
2.17.1
From: Brijesh Singh <[email protected]>
The ioctl can be used to set the page encryption bitmap for an
incoming guest.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
Documentation/virt/kvm/api.rst | 22 +++++++++++++++++
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/svm.c | 42 +++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 12 ++++++++++
include/uapi/linux/kvm.h | 1 +
5 files changed, 79 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 8ad800ebb54f..4d1004a154f6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during guest migration: if the page
is private, userspace needs to use the SEV migration commands to
transmit the page.
+4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
+----------------------------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_page_enc_bitmap (in/out)
+:Returns: 0 on success, -1 on error
+
+/* for KVM_SET_PAGE_ENC_BITMAP */
+struct kvm_page_enc_bitmap {
+ __u64 start_gfn;
+ __u64 num_pages;
+ union {
+ void __user *enc_bitmap; /* one bit per page */
+ __u64 padding2;
+ };
+};
+
+During guest live migration the outgoing guest exports its page encryption
+bitmap, and KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
+bitmap for the incoming guest.
5. The kvm_run structure
========================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 27e43e3ec9d8..d30f770aaaea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
unsigned long sz, unsigned long mode);
int (*get_page_enc_bitmap)(struct kvm *kvm,
struct kvm_page_enc_bitmap *bmap);
+ int (*set_page_enc_bitmap)(struct kvm *kvm,
+ struct kvm_page_enc_bitmap *bmap);
};
struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bae783cd396a..313343a43045 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
return ret;
}
+static int svm_set_page_enc_bitmap(struct kvm *kvm,
+ struct kvm_page_enc_bitmap *bmap)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ unsigned long gfn_start, gfn_end;
+ unsigned long *bitmap;
+ unsigned long sz, i;
+ int ret;
+
+ if (!sev_guest(kvm))
+ return -ENOTTY;
+
+ gfn_start = bmap->start_gfn;
+ gfn_end = gfn_start + bmap->num_pages;
+
+ sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
+ bitmap = kmalloc(sz, GFP_KERNEL);
+ if (!bitmap)
+ return -ENOMEM;
+
+ ret = -EFAULT;
+ if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
+ goto out;
+
+ mutex_lock(&kvm->lock);
+ ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
+ if (ret)
+ goto unlock;
+
+ i = 0;
+ for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
+ clear_bit(i + gfn_start, sev->page_enc_bmap);
+
+ ret = 0;
+unlock:
+ mutex_unlock(&kvm->lock);
+out:
+ kfree(bitmap);
+ return ret;
+}
+
static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.page_enc_status_hc = svm_page_enc_status_hc,
.get_page_enc_bitmap = svm_get_page_enc_bitmap,
+ .set_page_enc_bitmap = svm_set_page_enc_bitmap,
};
static int __init svm_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3c3fea4e20b5..05e953b2ec61 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
break;
}
+ case KVM_SET_PAGE_ENC_BITMAP: {
+ struct kvm_page_enc_bitmap bitmap;
+
+ r = -EFAULT;
+ if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
+ goto out;
+
+ r = -ENOTTY;
+ if (kvm_x86_ops->set_page_enc_bitmap)
+ r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
+ break;
+ }
default:
r = -ENOTTY;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index db1ebf85e177..b4b01d47e568 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1489,6 +1489,7 @@ struct kvm_enc_region {
#define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
#define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
+#define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
/* Secure Encrypted Virtualization command */
enum sev_cmd_id {
--
2.17.1
From: Brijesh Singh <[email protected]>
The command is used to create the encryption context for an incoming
SEV guest. The encryption context can be later used by the hypervisor
to import the incoming data into the SEV guest memory space.
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Radim Krčmář" <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Steve Rutherford <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/amd-memory-encryption.rst | 29 +++++++
arch/x86/kvm/svm.c | 81 +++++++++++++++++++
include/uapi/linux/kvm.h | 9 +++
3 files changed, 119 insertions(+)
diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index a45dcb5f8687..ef1f1f3a5b40 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -322,6 +322,35 @@ issued by the hypervisor to delete the encryption context.
Returns: 0 on success, -negative on error
+13. KVM_SEV_RECEIVE_START
+------------------------
+
+The KVM_SEV_RECEIVE_START command is used for creating the memory encryption
+context for an incoming SEV guest. To create the encryption context, the user must
+provide a guest policy, the platform public Diffie-Hellman (PDH) key and session
+information.
+
+Parameters: struct kvm_sev_receive_start (in/out)
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_receive_start {
+ __u32 handle; /* if zero then firmware creates a new handle */
+ __u32 policy; /* guest's policy */
+
+ __u64 pdh_uaddr; /* userspace address pointing to the PDH key */
+ __u32 pdh_len;
+
+ __u64 session_uaddr; /* userspace address which points to the guest session information */
+ __u32 session_len;
+ };
+
+On success, the 'handle' field contains the handle of the new encryption
+context; on error, a negative value is returned.
+
+For more details, see SEV spec Section 6.12.
+
References
==========
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 71a4cb3b817d..038b47685733 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7419,6 +7419,84 @@ static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}
+static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_receive_start *start;
+ struct kvm_sev_receive_start params;
+ int *error = &argp->error;
+ void *session_data;
+ void *pdh_data;
+ int ret;
+
+ if (!sev_guest(kvm))
+ return -ENOTTY;
+
+ /* Get parameter from the userspace */
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(struct kvm_sev_receive_start)))
+ return -EFAULT;
+
+ /* some sanity checks */
+ if (!params.pdh_uaddr || !params.pdh_len ||
+ !params.session_uaddr || !params.session_len)
+ return -EINVAL;
+
+ pdh_data = psp_copy_user_blob(params.pdh_uaddr, params.pdh_len);
+ if (IS_ERR(pdh_data))
+ return PTR_ERR(pdh_data);
+
+ session_data = psp_copy_user_blob(params.session_uaddr,
+ params.session_len);
+ if (IS_ERR(session_data)) {
+ ret = PTR_ERR(session_data);
+ goto e_free_pdh;
+ }
+
+ ret = -ENOMEM;
+ start = kzalloc(sizeof(*start), GFP_KERNEL);
+ if (!start)
+ goto e_free_session;
+
+ start->handle = params.handle;
+ start->policy = params.policy;
+ start->pdh_cert_address = __psp_pa(pdh_data);
+ start->pdh_cert_len = params.pdh_len;
+ start->session_address = __psp_pa(session_data);
+ start->session_len = params.session_len;
+
+ /* create memory encryption context */
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_RECEIVE_START, start,
+ error);
+ if (ret)
+ goto e_free;
+
+ /* Bind ASID to this guest */
+ ret = sev_bind_asid(kvm, start->handle, error);
+ if (ret)
+ goto e_free;
+
+ params.handle = start->handle;
+ if (copy_to_user((void __user *)(uintptr_t)argp->data,
+ &params, sizeof(struct kvm_sev_receive_start))) {
+ ret = -EFAULT;
+ sev_unbind_asid(kvm, start->handle);
+ goto e_free;
+ }
+
+ sev->handle = start->handle;
+ sev->fd = argp->sev_fd;
+
+e_free:
+ kfree(start);
+e_free_session:
+ kfree(session_data);
+e_free_pdh:
+ kfree(pdh_data);
+
+ return ret;
+}
+
static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -7472,6 +7550,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_SEND_FINISH:
r = sev_send_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_RECEIVE_START:
+ r = sev_receive_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9dc81bb9c55..74764b9db5fa 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1579,6 +1579,15 @@ struct kvm_sev_send_update_data {
__u32 trans_len;
};
+struct kvm_sev_receive_start {
+ __u32 handle;
+ __u32 policy;
+ __u64 pdh_uaddr;
+ __u32 pdh_len;
+ __u64 session_uaddr;
+ __u32 session_len;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.17.1
From: Ashish Kalra <[email protected]>
Add a new KVM_FEATURE_SEV_LIVE_MIGRATION feature for the guest to check
for host-side support for SEV live migration. Also add a new custom
MSR_KVM_SEV_LIVE_MIG_EN for the guest to enable the SEV live migration
feature.
Also, ensure that _bss_decrypted section is marked as decrypted in the
page encryption bitmap.
Signed-off-by: Ashish Kalra <[email protected]>
---
Documentation/virt/kvm/cpuid.rst | 4 ++++
Documentation/virt/kvm/msr.rst | 10 ++++++++++
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/include/uapi/asm/kvm_para.h | 5 +++++
arch/x86/kernel/kvm.c | 4 ++++
arch/x86/kvm/cpuid.c | 3 ++-
arch/x86/kvm/svm.c | 5 +++++
arch/x86/kvm/x86.c | 7 +++++++
arch/x86/mm/mem_encrypt.c | 14 +++++++++++++-
9 files changed, 53 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..fcb191bb3016 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
before using paravirtualized
sched yield.
+KVM_FEATURE_SEV_LIVE_MIGRATION 14 guest checks this feature bit
+ before enabling SEV live
+ migration feature.
+
KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side
per-cpu warps are expected in
kvmclock
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 33892036672d..7cd7786bbb03 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -319,3 +319,13 @@ data:
KVM guests can request the host not to poll on HLT, for example if
they are performing polling themselves.
+
+MSR_KVM_SEV_LIVE_MIG_EN:
+ 0x4b564d06
+
+ Control SEV Live Migration features.
+
+data:
+ Bit 0 enables (1) or disables (0) the host-side SEV live migration feature.
+ Bit 1 enables (1) or disables (0) support for SEV live migration extensions.
+ All other bits are reserved.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a96ef6338cd2..ad5faaed43c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -780,6 +780,9 @@ struct kvm_vcpu_arch {
u64 msr_kvm_poll_control;
+ /* SEV Live Migration MSR (AMD only) */
+ u64 msr_kvm_sev_live_migration_flag;
+
/*
* Indicates the guest is trying to write a gfn that contains one or
* more of the PTEs used to translate the write itself, i.e. the access
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..d9d4953b42ad 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -31,6 +31,7 @@
#define KVM_FEATURE_PV_SEND_IPI 11
#define KVM_FEATURE_POLL_CONTROL 12
#define KVM_FEATURE_PV_SCHED_YIELD 13
+#define KVM_FEATURE_SEV_LIVE_MIGRATION 14
#define KVM_HINTS_REALTIME 0
@@ -50,6 +51,7 @@
#define MSR_KVM_STEAL_TIME 0x4b564d03
#define MSR_KVM_PV_EOI_EN 0x4b564d04
#define MSR_KVM_POLL_CONTROL 0x4b564d05
+#define MSR_KVM_SEV_LIVE_MIG_EN 0x4b564d06
struct kvm_steal_time {
__u64 steal;
@@ -122,4 +124,7 @@ struct kvm_vcpu_pv_apf_data {
#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
#define KVM_PV_EOI_DISABLED 0x0
+#define KVM_SEV_LIVE_MIGRATION_ENABLED (1 << 0)
+#define KVM_SEV_LIVE_MIGRATION_EXTENSIONS_SUPPORTED (1 << 1)
+
#endif /* _UAPI_ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6efe0410fb72..8fcee0b45231 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -418,6 +418,10 @@ static void __init sev_map_percpu_data(void)
if (!sev_active())
return;
+ if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
+ wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN, KVM_SEV_LIVE_MIGRATION_ENABLED);
+ }
+
for_each_possible_cpu(cpu) {
__set_percpu_decrypted(&per_cpu(apf_reason, cpu), sizeof(apf_reason));
__set_percpu_decrypted(&per_cpu(steal_time, cpu), sizeof(steal_time));
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b1c469446b07..74c8b2a7270c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
(1 << KVM_FEATURE_PV_SEND_IPI) |
(1 << KVM_FEATURE_POLL_CONTROL) |
- (1 << KVM_FEATURE_PV_SCHED_YIELD);
+ (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+ (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c99b0207a443..60ddc242a133 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7632,6 +7632,7 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
unsigned long npages, unsigned long enc)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_vcpu *vcpu = kvm->vcpus[0];
kvm_pfn_t pfn_start, pfn_end;
gfn_t gfn_start, gfn_end;
int ret;
@@ -7639,6 +7640,10 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
if (!sev_guest(kvm))
return -EINVAL;
+ if (!(vcpu->arch.msr_kvm_sev_live_migration_flag &
+ KVM_SEV_LIVE_MIGRATION_ENABLED))
+ return -ENOTTY;
+
if (!npages)
return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2127ed937f53..82867b8798f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2880,6 +2880,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.msr_kvm_poll_control = data;
break;
+ case MSR_KVM_SEV_LIVE_MIG_EN:
+ vcpu->arch.msr_kvm_sev_live_migration_flag = data;
+ break;
+
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -3126,6 +3130,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_KVM_POLL_CONTROL:
msr_info->data = vcpu->arch.msr_kvm_poll_control;
break;
+ case MSR_KVM_SEV_LIVE_MIG_EN:
+ msr_info->data = vcpu->arch.msr_kvm_sev_live_migration_flag;
+ break;
case MSR_IA32_P5_MC_ADDR:
case MSR_IA32_P5_MC_TYPE:
case MSR_IA32_MCG_CAP:
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index c9800fa811f6..f6a841494845 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -502,8 +502,20 @@ void __init mem_encrypt_init(void)
* With SEV, we need to make a hypercall when page encryption state is
* changed.
*/
- if (sev_active())
+ if (sev_active()) {
+ unsigned long nr_pages;
+
pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
+
+ /*
+ * Ensure that _bss_decrypted section is marked as decrypted in the
+ * page encryption bitmap.
+ */
+ nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
+ PAGE_SIZE);
+ set_memory_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
+ nr_pages, 0);
+ }
#endif
pr_info("AMD %s active\n",
--
2.17.1
From: Ashish Kalra <[email protected]>
Reset the host's page encryption bitmap related to kernel-specific
page encryption status settings before we load a new kernel via
kexec. We cannot reset the complete page encryption bitmap here,
as we need to retain the UEFI/OVMF firmware-specific settings.
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8fcee0b45231..ba6cce3c84af 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
#include <asm/hypervisor.h>
#include <asm/tlb.h>
#include <asm/cpuidle_haltpoll.h>
+#include <asm/e820/api.h>
static int kvmapf = 1;
@@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
*/
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+ /*
+ * Reset the host's page encryption bitmap related to kernel
+ * specific page encryption status settings before we load a
+ * new kernel by kexec. NOTE: We cannot reset the complete
+ * page encryption bitmap here as we need to retain the
+ * UEFI/OVMF firmware specific settings.
+ */
+ if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
+ (smp_processor_id() == 0)) {
+ unsigned long nr_pages;
+ int i;
+
+ for (i = 0; i < e820_table->nr_entries; i++) {
+ struct e820_entry *entry = &e820_table->entries[i];
+ unsigned long start_pfn, end_pfn;
+
+ if (entry->type != E820_TYPE_RAM)
+ continue;
+
+ start_pfn = entry->addr >> PAGE_SHIFT;
+ end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
+ nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
+
+ kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
+ entry->addr, nr_pages, 1);
+ }
+ }
kvm_pv_disable_apf();
kvm_disable_steal_time();
}
--
2.17.1
From: Ashish Kalra <[email protected]>
This ioctl can be used by the application to reset the page
encryption bitmap managed by the KVM driver. A typical use for
this ioctl is on VM reboot, when the bitmap must be reinitialized.
Signed-off-by: Ashish Kalra <[email protected]>
---
Documentation/virt/kvm/api.rst | 13 +++++++++++++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 16 ++++++++++++++++
arch/x86/kvm/x86.c | 6 ++++++
include/uapi/linux/kvm.h | 1 +
5 files changed, 37 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4d1004a154f6..a11326ccc51d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4698,6 +4698,19 @@ During guest live migration the outgoing guest exports its page encryption
bitmap, and KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
bitmap for the incoming guest.
+4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
+------------------------------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: none
+:Returns: 0 on success, -1 on error
+
+KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
+bitmap during guest reboot; this is done only on the guest's boot vCPU.
+
+
5. The kvm_run structure
========================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d30f770aaaea..a96ef6338cd2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
struct kvm_page_enc_bitmap *bmap);
int (*set_page_enc_bitmap)(struct kvm *kvm,
struct kvm_page_enc_bitmap *bmap);
+ int (*reset_page_enc_bitmap)(struct kvm *kvm);
};
struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 313343a43045..c99b0207a443 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
return ret;
}
+static int svm_reset_page_enc_bitmap(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!sev_guest(kvm))
+ return -ENOTTY;
+
+ mutex_lock(&kvm->lock);
+ /* by default all pages should be marked encrypted */
+ if (sev->page_enc_bmap_size)
+ bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
+ mutex_unlock(&kvm->lock);
+ return 0;
+}
+
static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.page_enc_status_hc = svm_page_enc_status_hc,
.get_page_enc_bitmap = svm_get_page_enc_bitmap,
.set_page_enc_bitmap = svm_set_page_enc_bitmap,
+ .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
};
static int __init svm_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05e953b2ec61..2127ed937f53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
break;
}
+ case KVM_PAGE_ENC_BITMAP_RESET: {
+ r = -ENOTTY;
+ if (kvm_x86_ops->reset_page_enc_bitmap)
+ r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
+ break;
+ }
default:
r = -ENOTTY;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b4b01d47e568..0884a581fc37 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1490,6 +1490,7 @@ struct kvm_enc_region {
#define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
#define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
+#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
/* Secure Encrypted Virtualization command */
enum sev_cmd_id {
--
2.17.1
On 3/30/20 1:23 AM, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> for host-side support for SEV live migration. Also add a new custom
> MSR_KVM_SEV_LIVE_MIG_EN for guest to enable the SEV live migration
> feature.
>
> Also, ensure that _bss_decrypted section is marked as decrypted in the
> page encryption bitmap.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> Documentation/virt/kvm/cpuid.rst | 4 ++++
> Documentation/virt/kvm/msr.rst | 10 ++++++++++
> arch/x86/include/asm/kvm_host.h | 3 +++
> arch/x86/include/uapi/asm/kvm_para.h | 5 +++++
> arch/x86/kernel/kvm.c | 4 ++++
> arch/x86/kvm/cpuid.c | 3 ++-
> arch/x86/kvm/svm.c | 5 +++++
> arch/x86/kvm/x86.c | 7 +++++++
> arch/x86/mm/mem_encrypt.c | 14 +++++++++++++-
> 9 files changed, 53 insertions(+), 2 deletions(-)
IMHO, this patch should be broken into multiple patches, as it touches
the guest and the hypervisor at the same time. The first patch can introduce
the feature flag in KVM, the second can make the SVM-specific changes,
and the third can focus on how the guest makes use of that feature.
Additionally, invoking the HC to clear the __bss_decrypted section should
either be squashed into Patch 10/14 or be a separate patch itself.
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index 01b081f6e7ea..fcb191bb3016 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
> before using paravirtualized
> sched yield.
>
> +KVM_FEATURE_SEV_LIVE_MIGRATION 14 guest checks this feature bit
> + before enabling SEV live
> + migration feature.
> +
> KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side
> per-cpu warps are expected in
> kvmclock
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index 33892036672d..7cd7786bbb03 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -319,3 +319,13 @@ data:
>
> KVM guests can request the host not to poll on HLT, for example if
> they are performing polling themselves.
> +
> +MSR_KVM_SEV_LIVE_MIG_EN:
> + 0x4b564d06
> +
> + Control SEV Live Migration features.
> +
> +data:
> + Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature.
> + Bit 1 enables (1) or disables (0) support for SEV Live Migration extensions.
> + All other bits are reserved.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a96ef6338cd2..ad5faaed43c0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -780,6 +780,9 @@ struct kvm_vcpu_arch {
>
> u64 msr_kvm_poll_control;
>
> + /* SEV Live Migration MSR (AMD only) */
> + u64 msr_kvm_sev_live_migration_flag;
> +
> /*
> * Indicates the guest is trying to write a gfn that contains one or
> * more of the PTEs used to translate the write itself, i.e. the access
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 2a8e0b6b9805..d9d4953b42ad 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -31,6 +31,7 @@
> #define KVM_FEATURE_PV_SEND_IPI 11
> #define KVM_FEATURE_POLL_CONTROL 12
> #define KVM_FEATURE_PV_SCHED_YIELD 13
> +#define KVM_FEATURE_SEV_LIVE_MIGRATION 14
>
> #define KVM_HINTS_REALTIME 0
>
> @@ -50,6 +51,7 @@
> #define MSR_KVM_STEAL_TIME 0x4b564d03
> #define MSR_KVM_PV_EOI_EN 0x4b564d04
> #define MSR_KVM_POLL_CONTROL 0x4b564d05
> +#define MSR_KVM_SEV_LIVE_MIG_EN 0x4b564d06
>
> struct kvm_steal_time {
> __u64 steal;
> @@ -122,4 +124,7 @@ struct kvm_vcpu_pv_apf_data {
> #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> #define KVM_PV_EOI_DISABLED 0x0
>
> +#define KVM_SEV_LIVE_MIGRATION_ENABLED (1 << 0)
> +#define KVM_SEV_LIVE_MIGRATION_EXTENSIONS_SUPPORTED (1 << 1)
> +
> #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 6efe0410fb72..8fcee0b45231 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -418,6 +418,10 @@ static void __init sev_map_percpu_data(void)
> if (!sev_active())
> return;
>
> + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
> + wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN, KVM_SEV_LIVE_MIGRATION_ENABLED);
> + }
> +
> for_each_possible_cpu(cpu) {
> __set_percpu_decrypted(&per_cpu(apf_reason, cpu), sizeof(apf_reason));
> __set_percpu_decrypted(&per_cpu(steal_time, cpu), sizeof(steal_time));
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b1c469446b07..74c8b2a7270c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> (1 << KVM_FEATURE_PV_SEND_IPI) |
> (1 << KVM_FEATURE_POLL_CONTROL) |
> - (1 << KVM_FEATURE_PV_SCHED_YIELD);
> + (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> + (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
Do we want to enable this feature unconditionally? Who will clear the
feature flag for non-SEV guests?
>
> if (sched_info_on())
> entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c99b0207a443..60ddc242a133 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7632,6 +7632,7 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> unsigned long npages, unsigned long enc)
> {
> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct kvm_vcpu *vcpu = kvm->vcpus[0];
> kvm_pfn_t pfn_start, pfn_end;
> gfn_t gfn_start, gfn_end;
> int ret;
> @@ -7639,6 +7640,10 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> if (!sev_guest(kvm))
> return -EINVAL;
>
> + if (!(vcpu->arch.msr_kvm_sev_live_migration_flag &
> + KVM_SEV_LIVE_MIGRATION_ENABLED))
> + return -ENOTTY;
> +
> if (!npages)
> return 0;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2127ed937f53..82867b8798f8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2880,6 +2880,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vcpu->arch.msr_kvm_poll_control = data;
> break;
>
> + case MSR_KVM_SEV_LIVE_MIG_EN:
> + vcpu->arch.msr_kvm_sev_live_migration_flag = data;
> + break;
> +
> case MSR_IA32_MCG_CTL:
> case MSR_IA32_MCG_STATUS:
> case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> @@ -3126,6 +3130,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_KVM_POLL_CONTROL:
> msr_info->data = vcpu->arch.msr_kvm_poll_control;
> break;
> + case MSR_KVM_SEV_LIVE_MIG_EN:
> + msr_info->data = vcpu->arch.msr_kvm_sev_live_migration_flag;
> + break;
> case MSR_IA32_P5_MC_ADDR:
> case MSR_IA32_P5_MC_TYPE:
> case MSR_IA32_MCG_CAP:
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index c9800fa811f6..f6a841494845 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -502,8 +502,20 @@ void __init mem_encrypt_init(void)
> * With SEV, we need to make a hypercall when page encryption state is
> * changed.
> */
> - if (sev_active())
> + if (sev_active()) {
> + unsigned long nr_pages;
> +
> pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> +
> + /*
> + * Ensure that _bss_decrypted section is marked as decrypted in the
> + * page encryption bitmap.
> + */
> + nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
> + PAGE_SIZE);
> + set_memory_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
> + nr_pages, 0);
> + }
Isn't this too late? Shouldn't we be making the hypercall at the same time
we clear the encryption bit?
> #endif
>
> pr_info("AMD %s active\n",
On 3/30/20 1:23 AM, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> Reset the host's page encryption bitmap related to kernel
> specific page encryption status settings before we load a
> new kernel by kexec. We cannot reset the complete
> page encryption bitmap here as we need to retain the
> UEFI/OVMF firmware specific settings.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8fcee0b45231..ba6cce3c84af 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -34,6 +34,7 @@
> #include <asm/hypervisor.h>
> #include <asm/tlb.h>
> #include <asm/cpuidle_haltpoll.h>
> +#include <asm/e820/api.h>
>
> static int kvmapf = 1;
>
> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> */
> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> + /*
> + * Reset the host's page encryption bitmap related to kernel
> + * specific page encryption status settings before we load a
> + * new kernel by kexec. NOTE: We cannot reset the complete
> + * page encryption bitmap here as we need to retain the
> + * UEFI/OVMF firmware specific settings.
> + */
> + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> + (smp_processor_id() == 0)) {
In patch 13/14, KVM_FEATURE_SEV_LIVE_MIGRATION is set unconditionally,
and because of that the code below will now be executed on non-SEV
guests. IMO, this feature must be cleared for non-SEV guests to avoid
making unnecessary hypercalls.
> + unsigned long nr_pages;
> + int i;
> +
> + for (i = 0; i < e820_table->nr_entries; i++) {
> + struct e820_entry *entry = &e820_table->entries[i];
> + unsigned long start_pfn, end_pfn;
> +
> + if (entry->type != E820_TYPE_RAM)
> + continue;
> +
> + start_pfn = entry->addr >> PAGE_SHIFT;
> + end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> + nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> +
> + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> + entry->addr, nr_pages, 1);
> + }
> + }
> kvm_pv_disable_apf();
> kvm_disable_steal_time();
> }
Hello Brijesh,
On Mon, Mar 30, 2020 at 10:52:16AM -0500, Brijesh Singh wrote:
>
> On 3/30/20 1:23 AM, Ashish Kalra wrote:
> > From: Ashish Kalra <[email protected]>
> >
> > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > for host-side support for SEV live migration. Also add a new custom
> > MSR_KVM_SEV_LIVE_MIG_EN for guest to enable the SEV live migration
> > feature.
> >
> > Also, ensure that _bss_decrypted section is marked as decrypted in the
> > page encryption bitmap.
> >
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > Documentation/virt/kvm/cpuid.rst | 4 ++++
> > Documentation/virt/kvm/msr.rst | 10 ++++++++++
> > arch/x86/include/asm/kvm_host.h | 3 +++
> > arch/x86/include/uapi/asm/kvm_para.h | 5 +++++
> > arch/x86/kernel/kvm.c | 4 ++++
> > arch/x86/kvm/cpuid.c | 3 ++-
> > arch/x86/kvm/svm.c | 5 +++++
> > arch/x86/kvm/x86.c | 7 +++++++
> > arch/x86/mm/mem_encrypt.c | 14 +++++++++++++-
> > 9 files changed, 53 insertions(+), 2 deletions(-)
>
>
> IMHO, this patch should be broken into multiple patches, as it touches
> the guest and the hypervisor at the same time. The first patch can introduce
> the feature flag in KVM, the second can make the SVM-specific changes,
> and the third can focus on how the guest makes use of that feature.
> Additionally, invoking the HC to clear the __bss_decrypted section should
> either be squashed into Patch 10/14 or be a separate patch itself.
>
>
Ok.
I will also move the __bss_decrypted section HC to a separate patch.
> > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > index 01b081f6e7ea..fcb191bb3016 100644
> > --- a/Documentation/virt/kvm/cpuid.rst
> > +++ b/Documentation/virt/kvm/cpuid.rst
> > @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
> > before using paravirtualized
> > sched yield.
> >
> > +KVM_FEATURE_SEV_LIVE_MIGRATION 14 guest checks this feature bit
> > + before enabling SEV live
> > + migration feature.
> > +
> > KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side
> > per-cpu warps are expected in
> > kvmclock
> > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > index 33892036672d..7cd7786bbb03 100644
> > --- a/Documentation/virt/kvm/msr.rst
> > +++ b/Documentation/virt/kvm/msr.rst
> > @@ -319,3 +319,13 @@ data:
> >
> > KVM guests can request the host not to poll on HLT, for example if
> > they are performing polling themselves.
> > +
> > +MSR_KVM_SEV_LIVE_MIG_EN:
> > + 0x4b564d06
> > +
> > + Control SEV Live Migration features.
> > +
> > +data:
> > + Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature.
> > + Bit 1 enables (1) or disables (0) support for SEV Live Migration extensions.
> > + All other bits are reserved.
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index a96ef6338cd2..ad5faaed43c0 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -780,6 +780,9 @@ struct kvm_vcpu_arch {
> >
> > u64 msr_kvm_poll_control;
> >
> > + /* SEV Live Migration MSR (AMD only) */
> > + u64 msr_kvm_sev_live_migration_flag;
> > +
> > /*
> > * Indicates the guest is trying to write a gfn that contains one or
> > * more of the PTEs used to translate the write itself, i.e. the access
> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > index 2a8e0b6b9805..d9d4953b42ad 100644
> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > @@ -31,6 +31,7 @@
> > #define KVM_FEATURE_PV_SEND_IPI 11
> > #define KVM_FEATURE_POLL_CONTROL 12
> > #define KVM_FEATURE_PV_SCHED_YIELD 13
> > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 14
> >
> > #define KVM_HINTS_REALTIME 0
> >
> > @@ -50,6 +51,7 @@
> > #define MSR_KVM_STEAL_TIME 0x4b564d03
> > #define MSR_KVM_PV_EOI_EN 0x4b564d04
> > #define MSR_KVM_POLL_CONTROL 0x4b564d05
> > +#define MSR_KVM_SEV_LIVE_MIG_EN 0x4b564d06
> >
> > struct kvm_steal_time {
> > __u64 steal;
> > @@ -122,4 +124,7 @@ struct kvm_vcpu_pv_apf_data {
> > #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > #define KVM_PV_EOI_DISABLED 0x0
> >
> > +#define KVM_SEV_LIVE_MIGRATION_ENABLED (1 << 0)
> > +#define KVM_SEV_LIVE_MIGRATION_EXTENSIONS_SUPPORTED (1 << 1)
> > +
> > #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 6efe0410fb72..8fcee0b45231 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -418,6 +418,10 @@ static void __init sev_map_percpu_data(void)
> > if (!sev_active())
> > return;
> >
> > + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
> > + wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN, KVM_SEV_LIVE_MIGRATION_ENABLED);
> > + }
> > +
> > for_each_possible_cpu(cpu) {
> > __set_percpu_decrypted(&per_cpu(apf_reason, cpu), sizeof(apf_reason));
> > __set_percpu_decrypted(&per_cpu(steal_time, cpu), sizeof(steal_time));
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index b1c469446b07..74c8b2a7270c 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> > (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> > (1 << KVM_FEATURE_PV_SEND_IPI) |
> > (1 << KVM_FEATURE_POLL_CONTROL) |
> > - (1 << KVM_FEATURE_PV_SCHED_YIELD);
> > + (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> > + (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
>
>
> Do we want to enable this feature unconditionally? Who will clear the
> feature flag for non-SEV guests?
>
The guest only enables/activates this feature if SEV is active.
> >
> > if (sched_info_on())
> > entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index c99b0207a443..60ddc242a133 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7632,6 +7632,7 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > unsigned long npages, unsigned long enc)
> > {
> > struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > + struct kvm_vcpu *vcpu = kvm->vcpus[0];
> > kvm_pfn_t pfn_start, pfn_end;
> > gfn_t gfn_start, gfn_end;
> > int ret;
> > @@ -7639,6 +7640,10 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > if (!sev_guest(kvm))
> > return -EINVAL;
> >
> > + if (!(vcpu->arch.msr_kvm_sev_live_migration_flag &
> > + KVM_SEV_LIVE_MIGRATION_ENABLED))
> > + return -ENOTTY;
> > +
> > if (!npages)
> > return 0;
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 2127ed937f53..82867b8798f8 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2880,6 +2880,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > vcpu->arch.msr_kvm_poll_control = data;
> > break;
> >
> > + case MSR_KVM_SEV_LIVE_MIG_EN:
> > + vcpu->arch.msr_kvm_sev_live_migration_flag = data;
> > + break;
> > +
> > case MSR_IA32_MCG_CTL:
> > case MSR_IA32_MCG_STATUS:
> > case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> > @@ -3126,6 +3130,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > case MSR_KVM_POLL_CONTROL:
> > msr_info->data = vcpu->arch.msr_kvm_poll_control;
> > break;
> > + case MSR_KVM_SEV_LIVE_MIG_EN:
> > + msr_info->data = vcpu->arch.msr_kvm_sev_live_migration_flag;
> > + break;
> > case MSR_IA32_P5_MC_ADDR:
> > case MSR_IA32_P5_MC_TYPE:
> > case MSR_IA32_MCG_CAP:
> > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > index c9800fa811f6..f6a841494845 100644
> > --- a/arch/x86/mm/mem_encrypt.c
> > +++ b/arch/x86/mm/mem_encrypt.c
> > @@ -502,8 +502,20 @@ void __init mem_encrypt_init(void)
> > * With SEV, we need to make a hypercall when page encryption state is
> > * changed.
> > */
> > - if (sev_active())
> > + if (sev_active()) {
> > + unsigned long nr_pages;
> > +
> > pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> > +
> > + /*
> > + * Ensure that _bss_decrypted section is marked as decrypted in the
> > + * page encryption bitmap.
> > + */
> > + nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
> > + PAGE_SIZE);
> > + set_memory_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
> > + nr_pages, 0);
> > + }
>
>
> Isn't this too late? Shouldn't we be making the hypercall at the same time
> we clear the encryption bit?
>
>
Actually this is being done somewhat lazily, after the guest enables/activates the live migration feature. It should be fine to do it
here, or it can be moved into sev_map_percpu_data() where the first hypercalls are done; in both cases the __bss_decrypted section will
be marked before the live migration process is initiated.
> > #endif
> >
> > pr_info("AMD %s active\n",
Thanks,
Ashish
Hello Brijesh,
On Mon, Mar 30, 2020 at 11:00:14AM -0500, Brijesh Singh wrote:
>
> On 3/30/20 1:23 AM, Ashish Kalra wrote:
> > From: Ashish Kalra <[email protected]>
> >
> > Reset the host's page encryption bitmap related to kernel
> > specific page encryption status settings before we load a
> > new kernel by kexec. We cannot reset the complete
> > page encryption bitmap here as we need to retain the
> > UEFI/OVMF firmware specific settings.
> >
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
> > 1 file changed, 28 insertions(+)
> >
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 8fcee0b45231..ba6cce3c84af 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -34,6 +34,7 @@
> > #include <asm/hypervisor.h>
> > #include <asm/tlb.h>
> > #include <asm/cpuidle_haltpoll.h>
> > +#include <asm/e820/api.h>
> >
> > static int kvmapf = 1;
> >
> > @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> > */
> > if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> > wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> > + /*
> > + * Reset the host's page encryption bitmap related to kernel
> > + * specific page encryption status settings before we load a
> > + * new kernel by kexec. NOTE: We cannot reset the complete
> > + * page encryption bitmap here as we need to retain the
> > + * UEFI/OVMF firmware specific settings.
> > + */
> > + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> > + (smp_processor_id() == 0)) {
>
>
> In patch 13/14, KVM_FEATURE_SEV_LIVE_MIGRATION is set unconditionally,
> and because of that the code below will now be executed on non-SEV
> guests. IMO, this feature must be cleared for non-SEV guests to avoid
> making unnecessary hypercalls.
>
>
I will additionally add a sev_active() check here to ensure that we don't make unnecessary hypercalls on non-SEV guests.
> > + unsigned long nr_pages;
> > + int i;
> > +
> > + for (i = 0; i < e820_table->nr_entries; i++) {
> > + struct e820_entry *entry = &e820_table->entries[i];
> > + unsigned long start_pfn, end_pfn;
> > +
> > + if (entry->type != E820_TYPE_RAM)
> > + continue;
> > +
> > + start_pfn = entry->addr >> PAGE_SHIFT;
> > + end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> > + nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> > +
> > + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> > + entry->addr, nr_pages, 1);
> > + }
> > + }
> > kvm_pv_disable_apf();
> > kvm_disable_steal_time();
> > }
Thanks,
Ashish
On 2020-03-30 06:19:27 +0000, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> The series adds support for AMD SEV guest live migration commands. To protect
> the confidentiality of SEV-protected guest memory while in transit, we need to
> use the SEV commands defined in the SEV API spec [1].
>
> SEV guest VMs have the concept of private and shared memory. Private memory
> is encrypted with the guest-specific key, while shared memory may be encrypted
> with the hypervisor key. The commands provided by the SEV FW are meant to be
> used for private memory only. The patch series introduces a new hypercall.
> The guest OS can use this hypercall to notify the page encryption status.
> If a page is encrypted with the guest-specific key, we use the SEV commands
> during migration. If a page is not encrypted, we fall back to the default.
>
> The series adds new ioctls, KVM_{SET,GET}_PAGE_ENC_BITMAP. The ioctls can be
> used by qemu to get the page encryption bitmap. Qemu can consult this bitmap
> during migration to know whether a page is encrypted.
>
> [1] https://developer.amd.com/wp-content/resources/55766.PDF
>
> Changes since v5:
> - Fix build errors as
> Reported-by: kbuild test robot <[email protected]>
Which upstream tag should I use to apply this patch set? I tried the
top of Linus's tree, and I get the following error when I apply this
patch set.
$ git am PATCH-v6-01-14-KVM-SVM-Add-KVM_SEV-SEND_START-command.mbox
Applying: KVM: SVM: Add KVM_SEV SEND_START command
Applying: KVM: SVM: Add KVM_SEND_UPDATE_DATA command
Applying: KVM: SVM: Add KVM_SEV_SEND_FINISH command
Applying: KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
error: patch failed: Documentation/virt/kvm/amd-memory-encryption.rst:375
error: Documentation/virt/kvm/amd-memory-encryption.rst: patch does not apply
error: patch failed: arch/x86/kvm/svm.c:7632
error: arch/x86/kvm/svm.c: patch does not apply
Patch failed at 0004 KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
Thanks,
Venu
>
> Changes since v4:
> - Host support has been added to extend KVM capabilities/feature bits to
> include a new KVM_FEATURE_SEV_LIVE_MIGRATION, which the guest can
> query for host-side support for SEV live migration and a new custom MSR
> MSR_KVM_SEV_LIVE_MIG_EN is added for guest to enable the SEV live
> migration feature.
> - Ensure that _bss_decrypted section is marked as decrypted in the
> page encryption bitmap.
> - Fixing KVM_GET_PAGE_ENC_BITMAP ioctl to return the correct bitmap
> as per the number of pages being requested by the user. Ensure that
> we only copy bmap->num_pages bytes to the userspace buffer; if
> bmap->num_pages is not byte-aligned, we read the trailing bits
> from userspace and copy those bits as is. This fixes guest
> page corruption issues observed after migration completion.
> - Add kexec support for SEV Live Migration to reset the host's
> page encryption bitmap related to kernel specific page encryption
> status settings before we load a new kernel by kexec. We cannot
> reset the complete page encryption bitmap here as we need to
> retain the UEFI/OVMF firmware specific settings.
>
> Changes since v3:
> - Rebasing to mainline and testing.
> - Adding a new KVM_PAGE_ENC_BITMAP_RESET ioctl, which resets the
> page encryption bitmap on a guest reboot event.
> - Adding a more reliable sanity check for GPA range being passed to
> the hypercall to ensure that guest MMIO ranges are also marked
> in the page encryption bitmap.
>
> Changes since v2:
> - reset the page encryption bitmap on vcpu reboot
>
> Changes since v1:
> - Add support to share the page encryption between the source and target
> machine.
> - Fix review feedbacks from Tom Lendacky.
> - Add check to limit the session blob length.
> - Update KVM_GET_PAGE_ENC_BITMAP ioctl to use the base_gfn instead of
> the memory slot when querying the bitmap.
>
> Ashish Kalra (3):
> KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
> KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature &
> Custom MSR.
> KVM: x86: Add kexec support for SEV Live Migration.
>
> Brijesh Singh (11):
> KVM: SVM: Add KVM_SEV SEND_START command
> KVM: SVM: Add KVM_SEND_UPDATE_DATA command
> KVM: SVM: Add KVM_SEV_SEND_FINISH command
> KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
> KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
> KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
> KVM: x86: Add AMD SEV specific Hypercall3
> KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
> KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
> mm: x86: Invoke hypercall when page encryption status is changed
> KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
>
> .../virt/kvm/amd-memory-encryption.rst | 120 +++
> Documentation/virt/kvm/api.rst | 62 ++
> Documentation/virt/kvm/cpuid.rst | 4 +
> Documentation/virt/kvm/hypercalls.rst | 15 +
> Documentation/virt/kvm/msr.rst | 10 +
> arch/x86/include/asm/kvm_host.h | 10 +
> arch/x86/include/asm/kvm_para.h | 12 +
> arch/x86/include/asm/paravirt.h | 10 +
> arch/x86/include/asm/paravirt_types.h | 2 +
> arch/x86/include/uapi/asm/kvm_para.h | 5 +
> arch/x86/kernel/kvm.c | 32 +
> arch/x86/kernel/paravirt.c | 1 +
> arch/x86/kvm/cpuid.c | 3 +-
> arch/x86/kvm/svm.c | 699 +++++++++++++++++-
> arch/x86/kvm/vmx/vmx.c | 1 +
> arch/x86/kvm/x86.c | 43 ++
> arch/x86/mm/mem_encrypt.c | 69 +-
> arch/x86/mm/pat/set_memory.c | 7 +
> include/linux/psp-sev.h | 8 +-
> include/uapi/linux/kvm.h | 53 ++
> include/uapi/linux/kvm_para.h | 1 +
> 21 files changed, 1157 insertions(+), 10 deletions(-)
>
> --
> 2.17.1
>
This is applied on top of Linux 5.6, as per the commit below:
commit 7111951b8d4973bda27ff663f2cf18b663d15b48 (tag: v5.6, origin/master, origin/HEAD)
Author: Linus Torvalds <[email protected]>
Date: Sun Mar 29 15:25:41 2020 -0700
Linux 5.6
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Thanks,
Ashish
On Mon, Mar 30, 2020 at 12:24:46PM -0500, Venu Busireddy wrote:
> On 2020-03-30 06:19:27 +0000, Ashish Kalra wrote:
> > From: Ashish Kalra <[email protected]>
> >
> > The series adds support for AMD SEV guest live migration commands. To protect
> > the confidentiality of SEV-protected guest memory while in transit, we need to
> > use the SEV commands defined in the SEV API spec [1].
> >
> > SEV guest VMs have the concept of private and shared memory. Private memory
> > is encrypted with the guest-specific key, while shared memory may be encrypted
> > with the hypervisor key. The commands provided by the SEV FW are meant to be
> > used for private memory only. The patch series introduces a new hypercall.
> > The guest OS can use this hypercall to notify the page encryption status.
> > If a page is encrypted with the guest-specific key, we use the SEV commands
> > during migration. If a page is not encrypted, we fall back to the default.
> >
> > The series adds new ioctls, KVM_{SET,GET}_PAGE_ENC_BITMAP. The ioctls can be
> > used by qemu to get the page encryption bitmap. Qemu can consult this bitmap
> > during migration to know whether a page is encrypted.
> >
> > [1] https://developer.amd.com/wp-content/resources/55766.PDF
> >
> > Changes since v5:
> > - Fix build errors as
> > Reported-by: kbuild test robot <[email protected]>
>
> Which upstream tag should I use to apply this patch set? I tried the
> top of Linus's tree, and I get the following error when I apply this
> patch set.
>
> $ git am PATCH-v6-01-14-KVM-SVM-Add-KVM_SEV-SEND_START-command.mbox
> Applying: KVM: SVM: Add KVM_SEV SEND_START command
> Applying: KVM: SVM: Add KVM_SEND_UPDATE_DATA command
> Applying: KVM: SVM: Add KVM_SEV_SEND_FINISH command
> Applying: KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
> error: patch failed: Documentation/virt/kvm/amd-memory-encryption.rst:375
> error: Documentation/virt/kvm/amd-memory-encryption.rst: patch does not apply
> error: patch failed: arch/x86/kvm/svm.c:7632
> error: arch/x86/kvm/svm.c: patch does not apply
> Patch failed at 0004 KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
>
> Thanks,
>
> Venu
>
> >
> > Changes since v4:
> > - Host support has been added to extend KVM capabilities/feature bits to
> > include a new KVM_FEATURE_SEV_LIVE_MIGRATION, which the guest can
> > query for host-side support for SEV live migration and a new custom MSR
> > MSR_KVM_SEV_LIVE_MIG_EN is added for guest to enable the SEV live
> > migration feature.
> > - Ensure that _bss_decrypted section is marked as decrypted in the
> > page encryption bitmap.
> > - Fixing KVM_GET_PAGE_ENC_BITMAP ioctl to return the correct bitmap
> > as per the number of pages being requested by the user. Ensure that
> > we only copy bmap->num_pages bytes to the userspace buffer; if
> > bmap->num_pages is not byte-aligned, we read the trailing bits
> > from userspace and copy those bits as is. This fixes guest
> > page corruption issues observed after migration completion.
> > - Add kexec support for SEV Live Migration to reset the host's
> > page encryption bitmap related to kernel specific page encryption
> > status settings before we load a new kernel by kexec. We cannot
> > reset the complete page encryption bitmap here as we need to
> > retain the UEFI/OVMF firmware specific settings.
> >
> > Changes since v3:
> > - Rebasing to mainline and testing.
> > - Adding a new KVM_PAGE_ENC_BITMAP_RESET ioctl, which resets the
> > page encryption bitmap on a guest reboot event.
> > - Adding a more reliable sanity check for GPA range being passed to
> > the hypercall to ensure that guest MMIO ranges are also marked
> > in the page encryption bitmap.
> >
> > Changes since v2:
> > - reset the page encryption bitmap on vcpu reboot
> >
> > Changes since v1:
> > - Add support to share the page encryption between the source and target
> > machine.
> > - Fix review feedbacks from Tom Lendacky.
> > - Add check to limit the session blob length.
> > - Update KVM_GET_PAGE_ENC_BITMAP ioctl to use the base_gfn instead of
> > the memory slot when querying the bitmap.
> >
> > Ashish Kalra (3):
> > KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
> > KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature &
> > Custom MSR.
> > KVM: x86: Add kexec support for SEV Live Migration.
> >
> > Brijesh Singh (11):
> > KVM: SVM: Add KVM_SEV SEND_START command
> > KVM: SVM: Add KVM_SEND_UPDATE_DATA command
> > KVM: SVM: Add KVM_SEV_SEND_FINISH command
> > KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
> > KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
> > KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
> > KVM: x86: Add AMD SEV specific Hypercall3
> > KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
> > KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
> > mm: x86: Invoke hypercall when page encryption status is changed
> > KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
> >
> > .../virt/kvm/amd-memory-encryption.rst | 120 +++
> > Documentation/virt/kvm/api.rst | 62 ++
> > Documentation/virt/kvm/cpuid.rst | 4 +
> > Documentation/virt/kvm/hypercalls.rst | 15 +
> > Documentation/virt/kvm/msr.rst | 10 +
> > arch/x86/include/asm/kvm_host.h | 10 +
> > arch/x86/include/asm/kvm_para.h | 12 +
> > arch/x86/include/asm/paravirt.h | 10 +
> > arch/x86/include/asm/paravirt_types.h | 2 +
> > arch/x86/include/uapi/asm/kvm_para.h | 5 +
> > arch/x86/kernel/kvm.c | 32 +
> > arch/x86/kernel/paravirt.c | 1 +
> > arch/x86/kvm/cpuid.c | 3 +-
> > arch/x86/kvm/svm.c | 699 +++++++++++++++++-
> > arch/x86/kvm/vmx/vmx.c | 1 +
> > arch/x86/kvm/x86.c | 43 ++
> > arch/x86/mm/mem_encrypt.c | 69 +-
> > arch/x86/mm/pat/set_memory.c | 7 +
> > include/linux/psp-sev.h | 8 +-
> > include/uapi/linux/kvm.h | 53 ++
> > include/uapi/linux/kvm_para.h | 1 +
> > 21 files changed, 1157 insertions(+), 10 deletions(-)
> >
> > --
> > 2.17.1
> >
On 2020-03-30 18:28:45 +0000, Ashish Kalra wrote:
> This is applied on top of Linux 5.6, as per commit below :
>
> commit 7111951b8d4973bda27ff663f2cf18b663d15b48 (tag: v5.6, origin/master, origin/HEAD)
> Author: Linus Torvalds <[email protected]>
> Date: Sun Mar 29 15:25:41 2020 -0700
>
> Linux 5.6
>
> Makefile | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Not sure what I am missing here! This is the current state of my sandbox:
$ git remote -v
origin git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (fetch)
origin git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (push)
$ git log --oneline
12acbbfef749 (HEAD -> master) KVM: SVM: Add KVM_SEV_SEND_FINISH command
e5f21e48bfff KVM: SVM: Add KVM_SEND_UPDATE_DATA command
6b2bcf682d08 KVM: SVM: Add KVM_SEV SEND_START command
7111951b8d49 (tag: v5.6, origin/master, origin/HEAD) Linux 5.6
$ git status
On branch master
Your branch is ahead of 'origin/master' by 3 commits.
As can be seen, I started with the commit (7111951b8d49) you mentioned.
I could apply 3 of the patches, but 04/14 is failing.
Any suggestions?
Thanks,
Venu
I just did a fresh clone of Linus's tree, and I can apply these
patches cleanly on top of it.
Thanks,
Ashish
On 3/30/20 11:45 AM, Ashish Kalra wrote:
> Hello Brijesh,
>
> On Mon, Mar 30, 2020 at 11:00:14AM -0500, Brijesh Singh wrote:
>> On 3/30/20 1:23 AM, Ashish Kalra wrote:
>>> From: Ashish Kalra <[email protected]>
>>>
>>> Reset the host's page encryption bitmap related to kernel
>>> specific page encryption status settings before we load a
>>> new kernel by kexec. We cannot reset the complete
>>> page encryption bitmap here as we need to retain the
>>> UEFI/OVMF firmware specific settings.
>>>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> ---
>>> arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
>>> 1 file changed, 28 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>>> index 8fcee0b45231..ba6cce3c84af 100644
>>> --- a/arch/x86/kernel/kvm.c
>>> +++ b/arch/x86/kernel/kvm.c
>>> @@ -34,6 +34,7 @@
>>> #include <asm/hypervisor.h>
>>> #include <asm/tlb.h>
>>> #include <asm/cpuidle_haltpoll.h>
>>> +#include <asm/e820/api.h>
>>>
>>> static int kvmapf = 1;
>>>
>>> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
>>> */
>>> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>>> wrmsrl(MSR_KVM_PV_EOI_EN, 0);
>>> + /*
>>> + * Reset the host's page encryption bitmap related to kernel
>>> + * specific page encryption status settings before we load a
>>> + * new kernel by kexec. NOTE: We cannot reset the complete
>>> + * page encryption bitmap here as we need to retain the
>>> + * UEFI/OVMF firmware specific settings.
>>> + */
>>> + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
>>> + (smp_processor_id() == 0)) {
>>
>> In patch 13/14, the KVM_FEATURE_SEV_LIVE_MIGRATION is set
>> unconditionally and because of that now the below code will be executed
>> on non-SEV guest. IMO, this feature must be cleared for non-SEV guest to
>> avoid making unnecessary hypercall's.
>>
>>
> I will additionally add a sev_active() check here to ensure that we don't make unnecessary hypercalls on non-SEV guests.
IMO, instead of using sev_active(), we should make sure that the
feature is not enabled when SEV is not active.
>>> + unsigned long nr_pages;
>>> + int i;
>>> +
>>> + for (i = 0; i < e820_table->nr_entries; i++) {
>>> + struct e820_entry *entry = &e820_table->entries[i];
>>> + unsigned long start_pfn, end_pfn;
>>> +
>>> + if (entry->type != E820_TYPE_RAM)
>>> + continue;
>>> +
>>> + start_pfn = entry->addr >> PAGE_SHIFT;
>>> + end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
>>> + nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
>>> +
>>> + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
>>> + entry->addr, nr_pages, 1);
>>> + }
>>> + }
>>> kvm_pv_disable_apf();
>>> kvm_disable_steal_time();
>>> }
> Thanks,
> Ashish
On 2020-03-30 21:52:45 +0000, Ashish Kalra wrote:
> I just did a fresh install of Linus's tree and i can install these
> patches cleanly on top of the tree.
Figured out what the problem was. Though the patches are listed in order at
https://lore.kernel.org/kvm/[email protected]/,
the patches inside
https://lore.kernel.org/kvm/[email protected]/t.mbox.gz
are not in sequential order. Hence, they were being applied out of
order by 'git am ....mbox', which caused the error. I had to either
edit the mbox file by hand or use a tool such as b4 (suggested by a
colleague) to sort the patches in the mbox file. Once that was done,
I was able to apply the entire patch set to my code base.
Thanks,
Venu
On 2020-03-30 06:20:33 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command is used for encrypting the guest memory region using the encryption
> context created with KVM_SEV_SEND_START.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Reviewed-by: Steve Rutherford <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 24 ++++
> arch/x86/kvm/svm.c | 136 +++++++++++++++++-
> include/uapi/linux/kvm.h | 9 ++
> 3 files changed, 165 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 4fd34fc5c7a7..f46817ef7019 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -290,6 +290,30 @@ Returns: 0 on success, -negative on error
> __u32 session_len;
> };
>
> +11. KVM_SEV_SEND_UPDATE_DATA
> +----------------------------
> +
> +The KVM_SEV_SEND_UPDATE_DATA command can be used by the hypervisor to encrypt the
> +outgoing guest memory region with the encryption context created using
> +KVM_SEV_SEND_START.
> +
> +Parameters (in): struct kvm_sev_send_update_data
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_send_update_data {
> + __u64 hdr_uaddr; /* userspace address containing the packet header */
> + __u32 hdr_len;
> +
> + __u64 guest_uaddr; /* the source memory region to be encrypted */
> + __u32 guest_len;
> +
> + __u64 trans_uaddr; /* the destination memory region */
> + __u32 trans_len;
> + };
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 63d172e974ad..8561c47cc4f9 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -428,6 +428,7 @@ static DECLARE_RWSEM(sev_deactivate_lock);
> static DEFINE_MUTEX(sev_bitmap_lock);
> static unsigned int max_sev_asid;
> static unsigned int min_sev_asid;
> +static unsigned long sev_me_mask;
> static unsigned long *sev_asid_bitmap;
> static unsigned long *sev_reclaim_asid_bitmap;
> #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
> @@ -1232,16 +1233,22 @@ static int avic_ga_log_notifier(u32 ga_tag)
> static __init int sev_hardware_setup(void)
> {
> struct sev_user_data_status *status;
> + u32 eax, ebx;
> int rc;
>
> - /* Maximum number of encrypted guests supported simultaneously */
> - max_sev_asid = cpuid_ecx(0x8000001F);
> + /*
> + * Query the memory encryption information.
> + * EBX: Bits 5:0 Pagetable bit position used to indicate encryption
> + * (aka Cbit).
> + * ECX: Maximum number of encrypted guests supported simultaneously.
> + * EDX: Minimum ASID value that should be used for SEV guest.
> + */
> + cpuid(0x8000001f, &eax, &ebx, &max_sev_asid, &min_sev_asid);
>
> if (!max_sev_asid)
> return 1;
>
> - /* Minimum ASID value that should be used for SEV guest */
> - min_sev_asid = cpuid_edx(0x8000001F);
> + sev_me_mask = 1UL << (ebx & 0x3f);
>
> /* Initialize SEV ASID bitmaps */
> sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
> @@ -7274,6 +7281,124 @@ static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +/* Userspace wants to query either header or trans length. */
> +static int
> +__sev_send_update_data_query_lengths(struct kvm *kvm, struct kvm_sev_cmd *argp,
> + struct kvm_sev_send_update_data *params)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_send_update_data *data;
> + int ret;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> + if (!data)
> + return -ENOMEM;
> +
> + data->handle = sev->handle;
> + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
> +
> + params->hdr_len = data->hdr_len;
> + params->trans_len = data->trans_len;
> +
> + if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
> + sizeof(struct kvm_sev_send_update_data)))
> + ret = -EFAULT;
> +
> + kfree(data);
> + return ret;
> +}
> +
> +static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_send_update_data *data;
> + struct kvm_sev_send_update_data params;
> + void *hdr, *trans_data;
> + struct page **guest_page;
> + unsigned long n;
> + int ret, offset;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> + sizeof(struct kvm_sev_send_update_data)))
> + return -EFAULT;
> +
> + /* userspace wants to query either header or trans length */
> + if (!params.trans_len || !params.hdr_len)
> + return __sev_send_update_data_query_lengths(kvm, argp, ¶ms);
> +
> + if (!params.trans_uaddr || !params.guest_uaddr ||
> + !params.guest_len || !params.hdr_uaddr)
> + return -EINVAL;
> +
> +
> + /* Check if we are crossing the page boundary */
> + offset = params.guest_uaddr & (PAGE_SIZE - 1);
> + if ((params.guest_len + offset > PAGE_SIZE))
> + return -EINVAL;
> +
> + /* Pin guest memory */
> + guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> + PAGE_SIZE, &n, 0);
> + if (!guest_page)
> + return -EFAULT;
> +
> + /* allocate memory for header and transport buffer */
> + ret = -ENOMEM;
> + hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
> + if (!hdr)
> + goto e_unpin;
> +
> + trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
> + if (!trans_data)
> + goto e_free_hdr;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + goto e_free_trans_data;
> +
> + data->hdr_address = __psp_pa(hdr);
> + data->hdr_len = params.hdr_len;
> + data->trans_address = __psp_pa(trans_data);
> + data->trans_len = params.trans_len;
> +
> + /* The SEND_UPDATE_DATA command requires C-bit to be always set. */
> + data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> + offset;
> + data->guest_address |= sev_me_mask;
> + data->guest_len = params.guest_len;
> + data->handle = sev->handle;
> +
> + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
> +
> + if (ret)
> + goto e_free;
> +
> + /* copy transport buffer to user space */
> + if (copy_to_user((void __user *)(uintptr_t)params.trans_uaddr,
> + trans_data, params.trans_len)) {
> + ret = -EFAULT;
> + goto e_unpin;
Shouldn't this be
goto e_free;
?
> + }
> +
> + /* Copy packet header to userspace. */
> + ret = copy_to_user((void __user *)(uintptr_t)params.hdr_uaddr, hdr,
> + params.hdr_len);
> +
> +e_free:
> + kfree(data);
> +e_free_trans_data:
> + kfree(trans_data);
> +e_free_hdr:
> + kfree(hdr);
> +e_unpin:
> + sev_unpin_memory(kvm, guest_page, n);
> +
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7321,6 +7446,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SEND_START:
> r = sev_send_start(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SEND_UPDATE_DATA:
> + r = sev_send_update_data(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 17bef4c245e1..d9dc81bb9c55 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1570,6 +1570,15 @@ struct kvm_sev_send_start {
> __u32 session_len;
> };
>
> +struct kvm_sev_send_update_data {
> + __u64 hdr_uaddr;
> + __u32 hdr_len;
> + __u64 guest_uaddr;
> + __u32 guest_len;
> + __u64 trans_uaddr;
> + __u32 trans_len;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.17.1
>
On 2020-03-30 06:20:49 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command is used to finalize the encryption context created with
> the KVM_SEV_SEND_START command.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Reviewed-by: Steve Rutherford <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 8 +++++++
> arch/x86/kvm/svm.c | 23 +++++++++++++++++++
> 2 files changed, 31 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index f46817ef7019..a45dcb5f8687 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -314,6 +314,14 @@ Returns: 0 on success, -negative on error
> __u32 trans_len;
> };
>
> +12. KVM_SEV_SEND_FINISH
> +------------------------
> +
> +After completion of the migration flow, the KVM_SEV_SEND_FINISH command can be
> +issued by the hypervisor to delete the encryption context.
> +
> +Returns: 0 on success, -negative on error
Didn't notice this earlier. I would suggest changing all occurrences of
"-negative" to either "negative" or "less than 0" in this file.
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 8561c47cc4f9..71a4cb3b817d 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7399,6 +7399,26 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_send_finish *data;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> +
> + data->handle = sev->handle;
> + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, data, &argp->error);
> +
> + kfree(data);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7449,6 +7469,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SEND_UPDATE_DATA:
> r = sev_send_update_data(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SEND_FINISH:
> + r = sev_send_finish(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> --
> 2.17.1
>
On 3/29/20 11:20 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command is used to finalize the encryption context created with
> the KVM_SEV_SEND_START command.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Reviewed-by: Steve Rutherford <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 8 +++++++
> arch/x86/kvm/svm.c | 23 +++++++++++++++++++
> 2 files changed, 31 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index f46817ef7019..a45dcb5f8687 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -314,6 +314,14 @@ Returns: 0 on success, -negative on error
> __u32 trans_len;
> };
>
> +12. KVM_SEV_SEND_FINISH
> +------------------------
> +
> +After completion of the migration flow, the KVM_SEV_SEND_FINISH command can be
> +issued by the hypervisor to delete the encryption context.
> +
> +Returns: 0 on success, -negative on error
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 8561c47cc4f9..71a4cb3b817d 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7399,6 +7399,26 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_send_finish *data;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> +
> + data->handle = sev->handle;
> + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, data, &argp->error);
> +
> + kfree(data);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7449,6 +7469,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SEND_UPDATE_DATA:
> r = sev_send_update_data(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SEND_FINISH:
> + r = sev_send_finish(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
Reviewed-by: Krish Sadhukhan <[email protected]>
On 3/29/20 11:20 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command is used for encrypting the guest memory region using the encryption
> context created with KVM_SEV_SEND_START.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Reviewed-by: Steve Rutherford <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 24 ++++
> arch/x86/kvm/svm.c | 136 +++++++++++++++++-
> include/uapi/linux/kvm.h | 9 ++
> 3 files changed, 165 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 4fd34fc5c7a7..f46817ef7019 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -290,6 +290,30 @@ Returns: 0 on success, -negative on error
> __u32 session_len;
> };
>
> +11. KVM_SEV_SEND_UPDATE_DATA
> +----------------------------
> +
> +The KVM_SEV_SEND_UPDATE_DATA command can be used by the hypervisor to encrypt the
> +outgoing guest memory region with the encryption context created using
> +KVM_SEV_SEND_START.
> +
> +Parameters (in): struct kvm_sev_send_update_data
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_send_update_data {
> + __u64 hdr_uaddr; /* userspace address containing the packet header */
> + __u32 hdr_len;
> +
> + __u64 guest_uaddr; /* the source memory region to be encrypted */
> + __u32 guest_len;
> +
> + __u64 trans_uaddr; /* the destination memory region */
> + __u32 trans_len;
> + };
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 63d172e974ad..8561c47cc4f9 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -428,6 +428,7 @@ static DECLARE_RWSEM(sev_deactivate_lock);
> static DEFINE_MUTEX(sev_bitmap_lock);
> static unsigned int max_sev_asid;
> static unsigned int min_sev_asid;
> +static unsigned long sev_me_mask;
> static unsigned long *sev_asid_bitmap;
> static unsigned long *sev_reclaim_asid_bitmap;
> #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
> @@ -1232,16 +1233,22 @@ static int avic_ga_log_notifier(u32 ga_tag)
> static __init int sev_hardware_setup(void)
> {
> struct sev_user_data_status *status;
> + u32 eax, ebx;
> int rc;
>
> - /* Maximum number of encrypted guests supported simultaneously */
> - max_sev_asid = cpuid_ecx(0x8000001F);
> + /*
> + * Query the memory encryption information.
> + * EBX: Bits 5:0 Pagetable bit position used to indicate encryption
> + * (aka Cbit).
> + * ECX: Maximum number of encrypted guests supported simultaneously.
> + * EDX: Minimum ASID value that should be used for SEV guest.
> + */
> + cpuid(0x8000001f, &eax, &ebx, &max_sev_asid, &min_sev_asid);
Will max_sev_asid and the max number of guests supported always be the
same number?
>
> if (!max_sev_asid)
> return 1;
>
> - /* Minimum ASID value that should be used for SEV guest */
> - min_sev_asid = cpuid_edx(0x8000001F);
> + sev_me_mask = 1UL << (ebx & 0x3f);
>
> /* Initialize SEV ASID bitmaps */
> sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
> @@ -7274,6 +7281,124 @@ static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +/* Userspace wants to query either header or trans length. */
> +static int
> +__sev_send_update_data_query_lengths(struct kvm *kvm, struct kvm_sev_cmd *argp,
> + struct kvm_sev_send_update_data *params)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_send_update_data *data;
> + int ret;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> + if (!data)
> + return -ENOMEM;
> +
> + data->handle = sev->handle;
> + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
> +
> + params->hdr_len = data->hdr_len;
> + params->trans_len = data->trans_len;
> +
> + if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
> + sizeof(struct kvm_sev_send_update_data)))
> + ret = -EFAULT;
> +
> + kfree(data);
> + return ret;
> +}
> +
> +static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_send_update_data *data;
> + struct kvm_sev_send_update_data params;
> + void *hdr, *trans_data;
> + struct page **guest_page;
> + unsigned long n;
> + int ret, offset;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
Do we need to check the following conditions here ?
"The platform must be in the PSTATE.WORKING state.
The guest must be in the GSTATE.SUPDATE state."
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> + sizeof(struct kvm_sev_send_update_data)))
> + return -EFAULT;
> +
> + /* userspace wants to query either header or trans length */
> + if (!params.trans_len || !params.hdr_len)
> + return __sev_send_update_data_query_lengths(kvm, argp, ¶ms);
> +
> + if (!params.trans_uaddr || !params.guest_uaddr ||
> + !params.guest_len || !params.hdr_uaddr)
> + return -EINVAL;
> +
> +
> + /* Check if we are crossing the page boundary */
> + offset = params.guest_uaddr & (PAGE_SIZE - 1);
> + if ((params.guest_len + offset > PAGE_SIZE))
> + return -EINVAL;
> +
> + /* Pin guest memory */
> + guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> + PAGE_SIZE, &n, 0);
> + if (!guest_page)
> + return -EFAULT;
> +
> + /* allocate memory for header and transport buffer */
> + ret = -ENOMEM;
> + hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
> + if (!hdr)
> + goto e_unpin;
> +
> + trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
> + if (!trans_data)
> + goto e_free_hdr;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + goto e_free_trans_data;
> +
> + data->hdr_address = __psp_pa(hdr);
> + data->hdr_len = params.hdr_len;
> + data->trans_address = __psp_pa(trans_data);
> + data->trans_len = params.trans_len;
> +
> + /* The SEND_UPDATE_DATA command requires C-bit to be always set. */
> + data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> + offset;
> + data->guest_address |= sev_me_mask;
Why not name the variable 'sev_cbit_mask' instead of sev_me_mask?
> + data->guest_len = params.guest_len;
> + data->handle = sev->handle;
> +
> + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
> +
> + if (ret)
> + goto e_free;
> +
> + /* copy transport buffer to user space */
> + if (copy_to_user((void __user *)(uintptr_t)params.trans_uaddr,
> + trans_data, params.trans_len)) {
> + ret = -EFAULT;
> + goto e_unpin;
> + }
> +
> + /* Copy packet header to userspace. */
> + ret = copy_to_user((void __user *)(uintptr_t)params.hdr_uaddr, hdr,
> + params.hdr_len);
> +
> +e_free:
> + kfree(data);
> +e_free_trans_data:
> + kfree(trans_data);
> +e_free_hdr:
> + kfree(hdr);
> +e_unpin:
> + sev_unpin_memory(kvm, guest_page, n);
> +
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7321,6 +7446,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SEND_START:
> r = sev_send_start(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SEND_UPDATE_DATA:
> + r = sev_send_update_data(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 17bef4c245e1..d9dc81bb9c55 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1570,6 +1570,15 @@ struct kvm_sev_send_start {
> __u32 session_len;
> };
>
> +struct kvm_sev_send_update_data {
> + __u64 hdr_uaddr;
> + __u32 hdr_len;
> + __u64 guest_uaddr;
> + __u32 guest_len;
> + __u64 trans_uaddr;
> + __u32 trans_len;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
Reviewed-by: Krish Sadhukhan <[email protected]>
On 2020-03-30 06:21:04 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command is used to create the encryption context for an incoming
> SEV guest. The encryption context can be later used by the hypervisor
> to import the incoming data into the SEV guest memory space.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Reviewed-by: Steve Rutherford <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 29 +++++++
> arch/x86/kvm/svm.c | 81 +++++++++++++++++++
> include/uapi/linux/kvm.h | 9 +++
> 3 files changed, 119 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index a45dcb5f8687..ef1f1f3a5b40 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -322,6 +322,35 @@ issued by the hypervisor to delete the encryption context.
>
> Returns: 0 on success, -negative on error
>
> +13. KVM_SEV_RECEIVE_START
> +-------------------------
> +
> +The KVM_SEV_RECEIVE_START command is used for creating the memory encryption
> +context for an incoming SEV guest. To create the encryption context, the user must
> +provide a guest policy, the platform public Diffie-Hellman (PDH) key and session
> +information.
> +
> +Parameters: struct kvm_sev_receive_start (in/out)
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_receive_start {
> + __u32 handle; /* if zero then firmware creates a new handle */
> + __u32 policy; /* guest's policy */
> +
> + __u64 pdh_uaddr; /* userspace address pointing to the PDH key */
> + __u32 dh_len;
Could dh_len be changed to pdh_len, to match the names in
kvm_sev_receive_start in include/uapi/linux/kvm.h?
> +
> + __u64 session_addr; /* userspace address which points to the guest session information */
Also, session_addr to session_uaddr?
> + __u32 session_len;
> + };
> +
> +On success, the 'handle' field contains a new handle and on error, a negative value.
> +
> +For more details, see SEV spec Section 6.12.
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 71a4cb3b817d..038b47685733 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7419,6 +7419,84 @@ static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_receive_start *start;
> + struct kvm_sev_receive_start params;
> + int *error = &argp->error;
> + void *session_data;
> + void *pdh_data;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + /* Get parameter from the userspace */
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> + sizeof(struct kvm_sev_receive_start)))
> + return -EFAULT;
> +
> + /* some sanity checks */
> + if (!params.pdh_uaddr || !params.pdh_len ||
> + !params.session_uaddr || !params.session_len)
> + return -EINVAL;
> +
> + pdh_data = psp_copy_user_blob(params.pdh_uaddr, params.pdh_len);
> + if (IS_ERR(pdh_data))
> + return PTR_ERR(pdh_data);
> +
> + session_data = psp_copy_user_blob(params.session_uaddr,
> + params.session_len);
> + if (IS_ERR(session_data)) {
> + ret = PTR_ERR(session_data);
> + goto e_free_pdh;
> + }
> +
> + ret = -ENOMEM;
> + start = kzalloc(sizeof(*start), GFP_KERNEL);
> + if (!start)
> + goto e_free_session;
> +
> + start->handle = params.handle;
> + start->policy = params.policy;
> + start->pdh_cert_address = __psp_pa(pdh_data);
> + start->pdh_cert_len = params.pdh_len;
> + start->session_address = __psp_pa(session_data);
> + start->session_len = params.session_len;
> +
> + /* create memory encryption context */
> + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_RECEIVE_START, start,
> + error);
> + if (ret)
> + goto e_free;
> +
> + /* Bind ASID to this guest */
> + ret = sev_bind_asid(kvm, start->handle, error);
> + if (ret)
> + goto e_free;
> +
> + params.handle = start->handle;
> + if (copy_to_user((void __user *)(uintptr_t)argp->data,
> + &params, sizeof(struct kvm_sev_receive_start))) {
> + ret = -EFAULT;
> + sev_unbind_asid(kvm, start->handle);
> + goto e_free;
> + }
> +
> + sev->handle = start->handle;
> + sev->fd = argp->sev_fd;
> +
> +e_free:
> + kfree(start);
> +e_free_session:
> + kfree(session_data);
> +e_free_pdh:
> + kfree(pdh_data);
> +
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7472,6 +7550,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SEND_FINISH:
> r = sev_send_finish(kvm, &sev_cmd);
> break;
> + case KVM_SEV_RECEIVE_START:
> + r = sev_receive_start(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d9dc81bb9c55..74764b9db5fa 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1579,6 +1579,15 @@ struct kvm_sev_send_update_data {
> __u32 trans_len;
> };
>
> +struct kvm_sev_receive_start {
> + __u32 handle;
> + __u32 policy;
> + __u64 pdh_uaddr;
> + __u32 pdh_len;
> + __u64 session_uaddr;
> + __u32 session_len;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.17.1
>
On 3/29/20 11:21 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command finalizes the guest receiving process and makes the SEV
> guest ready for execution.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 8 +++++++
> arch/x86/kvm/svm.c | 23 +++++++++++++++++++
> 2 files changed, 31 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 554aa33a99cc..93cd95d9a6c0 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -375,6 +375,14 @@ Returns: 0 on success, -negative on error
> __u32 trans_len;
> };
>
> +15. KVM_SEV_RECEIVE_FINISH
> +--------------------------
> +
> +After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be
> +issued by the hypervisor to make the guest ready for execution.
> +
> +Returns: 0 on success, -negative on error
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 5fc5355536d7..7c2721e18b06 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7573,6 +7573,26 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_receive_finish *data;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> +
> + data->handle = sev->handle;
> + ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
> +
> + kfree(data);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7632,6 +7652,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_RECEIVE_UPDATE_DATA:
> r = sev_receive_update_data(kvm, &sev_cmd);
> break;
> + case KVM_SEV_RECEIVE_FINISH:
> + r = sev_receive_finish(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
Reviewed-by: Krish Sadhukhan <[email protected]>
On 3/29/20 11:21 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command is used to create the encryption context for an incoming
> SEV guest. The encryption context can be later used by the hypervisor
> to import the incoming data into the SEV guest memory space.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Reviewed-by: Steve Rutherford <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 29 +++++++
> arch/x86/kvm/svm.c | 81 +++++++++++++++++++
> include/uapi/linux/kvm.h | 9 +++
> 3 files changed, 119 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index a45dcb5f8687..ef1f1f3a5b40 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -322,6 +322,35 @@ issued by the hypervisor to delete the encryption context.
>
> Returns: 0 on success, -negative on error
>
> +13. KVM_SEV_RECEIVE_START
> +-------------------------
> +
> +The KVM_SEV_RECEIVE_START command is used for creating the memory encryption
> +context for an incoming SEV guest. To create the encryption context, the user must
> +provide a guest policy, the platform public Diffie-Hellman (PDH) key and session
> +information.
> +
> +Parameters: struct kvm_sev_receive_start (in/out)
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_receive_start {
> + __u32 handle; /* if zero then firmware creates a new handle */
> + __u32 policy; /* guest's policy */
> +
> + __u64 pdh_uaddr; /* userspace address pointing to the PDH key */
> + __u32 dh_len;
> +
> + __u64 session_addr; /* userspace address which points to the guest session information */
> + __u32 session_len;
> + };
> +
> +On success, the 'handle' field contains a new handle and on error, a negative value.
> +
> +For more details, see SEV spec Section 6.12.
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 71a4cb3b817d..038b47685733 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7419,6 +7419,84 @@ static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_receive_start *start;
> + struct kvm_sev_receive_start params;
> + int *error = &argp->error;
> + void *session_data;
> + void *pdh_data;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + /* Get parameter from the userspace */
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> + sizeof(struct kvm_sev_receive_start)))
> + return -EFAULT;
> +
> + /* some sanity checks */
> + if (!params.pdh_uaddr || !params.pdh_len ||
> + !params.session_uaddr || !params.session_len)
> + return -EINVAL;
> +
> + pdh_data = psp_copy_user_blob(params.pdh_uaddr, params.pdh_len);
> + if (IS_ERR(pdh_data))
> + return PTR_ERR(pdh_data);
> +
> + session_data = psp_copy_user_blob(params.session_uaddr,
> + params.session_len);
> + if (IS_ERR(session_data)) {
> + ret = PTR_ERR(session_data);
> + goto e_free_pdh;
> + }
> +
> + ret = -ENOMEM;
> + start = kzalloc(sizeof(*start), GFP_KERNEL);
> + if (!start)
> + goto e_free_session;
> +
> + start->handle = params.handle;
> + start->policy = params.policy;
> + start->pdh_cert_address = __psp_pa(pdh_data);
> + start->pdh_cert_len = params.pdh_len;
> + start->session_address = __psp_pa(session_data);
> + start->session_len = params.session_len;
> +
> + /* create memory encryption context */
> + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_RECEIVE_START, start,
> + error);
> + if (ret)
> + goto e_free;
> +
> + /* Bind ASID to this guest */
> + ret = sev_bind_asid(kvm, start->handle, error);
> + if (ret)
> + goto e_free;
> +
> + params.handle = start->handle;
> + if (copy_to_user((void __user *)(uintptr_t)argp->data,
> + &params, sizeof(struct kvm_sev_receive_start))) {
> + ret = -EFAULT;
> + sev_unbind_asid(kvm, start->handle);
> + goto e_free;
> + }
> +
> + sev->handle = start->handle;
> + sev->fd = argp->sev_fd;
> +
> +e_free:
> + kfree(start);
> +e_free_session:
> + kfree(session_data);
> +e_free_pdh:
> + kfree(pdh_data);
> +
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7472,6 +7550,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SEND_FINISH:
> r = sev_send_finish(kvm, &sev_cmd);
> break;
> + case KVM_SEV_RECEIVE_START:
> + r = sev_receive_start(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d9dc81bb9c55..74764b9db5fa 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1579,6 +1579,15 @@ struct kvm_sev_send_update_data {
> __u32 trans_len;
> };
>
> +struct kvm_sev_receive_start {
> + __u32 handle;
> + __u32 policy;
> + __u64 pdh_uaddr;
> + __u32 pdh_len;
Why not 'pdh_cert_uaddr' and 'pdh_cert_len' ? That's the naming
convention you have followed in previous patches.
> + __u64 session_uaddr;
> + __u32 session_len;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
Reviewed-by: Krish Sadhukhan <[email protected]>
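For reference, a minimal userspace sketch of how a VMM such as qemu might drive the KVM_SEV_RECEIVE_START command reviewed above through KVM_MEMORY_ENCRYPT_OP. The struct layouts mirror the uapi additions in this patch, but the command id value and the helper name here are illustrative assumptions, not taken from the series:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the uapi structs added by this patch (illustrative). */
struct kvm_sev_receive_start {
	uint32_t handle;        /* zero asks firmware to create a new handle */
	uint32_t policy;
	uint64_t pdh_uaddr;
	uint32_t pdh_len;
	uint64_t session_uaddr;
	uint32_t session_len;
};

struct kvm_sev_cmd {
	uint32_t id;
	uint64_t data;
	uint32_t error;
	uint32_t sev_fd;
};

enum { SEV_CMD_ID_RECEIVE_START = 13 };  /* assumed command id */

/*
 * Fill in the command; the caller then issues
 * ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) and, on success, reads the
 * firmware-assigned handle back from params->handle.
 */
static void prep_receive_start(struct kvm_sev_cmd *cmd,
			       struct kvm_sev_receive_start *params,
			       int sev_fd, uint32_t policy,
			       const void *pdh, uint32_t pdh_len,
			       const void *session, uint32_t session_len)
{
	memset(params, 0, sizeof(*params));
	params->policy = policy;
	params->pdh_uaddr = (uint64_t)(uintptr_t)pdh;
	params->pdh_len = pdh_len;
	params->session_uaddr = (uint64_t)(uintptr_t)session;
	params->session_len = session_len;

	memset(cmd, 0, sizeof(*cmd));
	cmd->id = SEV_CMD_ID_RECEIVE_START;
	cmd->data = (uint64_t)(uintptr_t)params;
	cmd->sev_fd = sev_fd;
}
```

The actual ioctl is omitted since it needs a live /dev/kvm and SEV firmware; the sketch only shows the parameter marshalling.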
On 2020-03-30 06:21:36 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command finalizes the guest receiving process and makes the SEV guest
> ready for execution.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 8 +++++++
> arch/x86/kvm/svm.c | 23 +++++++++++++++++++
> 2 files changed, 31 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 554aa33a99cc..93cd95d9a6c0 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -375,6 +375,14 @@ Returns: 0 on success, -negative on error
> __u32 trans_len;
> };
>
> +15. KVM_SEV_RECEIVE_FINISH
> +--------------------------
> +
> +After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be
> +issued by the hypervisor to make the guest ready for execution.
> +
> +Returns: 0 on success, -negative on error
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 5fc5355536d7..7c2721e18b06 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7573,6 +7573,26 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_receive_finish *data;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
Noticed this in earlier patches too. Is -ENOTTY the best return value?
Aren't one of -ENXIO, or -ENODEV, or -EINVAL a better choice? What is
the rationale for using -ENOTTY?
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> +
> + data->handle = sev->handle;
> + ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
> +
> + kfree(data);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7632,6 +7652,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_RECEIVE_UPDATE_DATA:
> r = sev_receive_update_data(kvm, &sev_cmd);
> break;
> + case KVM_SEV_RECEIVE_FINISH:
> + r = sev_receive_finish(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> --
> 2.17.1
>
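Since the kernel side of KVM_SEV_RECEIVE_FINISH only consumes the handle already stored in sev->handle, the userspace side is minimal. A sketch under the same illustrative assumptions as before (local struct mirror, assumed command id):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct kvm_sev_cmd {
	uint32_t id;
	uint64_t data;   /* RECEIVE_FINISH takes no parameter block */
	uint32_t error;
	uint32_t sev_fd;
};

enum { SEV_CMD_ID_RECEIVE_FINISH = 15 };  /* assumed command id */

/* After this, the caller issues ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, cmd)
 * and the guest is ready to run on the destination. */
static void prep_receive_finish(struct kvm_sev_cmd *cmd, int sev_fd)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->id = SEV_CMD_ID_RECEIVE_FINISH;
	cmd->sev_fd = sev_fd;
}
```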
On 3/29/20 11:21 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command is used for copying the incoming buffer into the
> SEV guest memory space.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 24 ++++++
> arch/x86/kvm/svm.c | 79 +++++++++++++++++++
> include/uapi/linux/kvm.h | 9 +++
> 3 files changed, 112 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index ef1f1f3a5b40..554aa33a99cc 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -351,6 +351,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va
>
> For more details, see SEV spec Section 6.12.
>
> +14. KVM_SEV_RECEIVE_UPDATE_DATA
> +-------------------------------
> +
> +The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy
> +the incoming buffers into the guest memory region with encryption context
> +created during the KVM_SEV_RECEIVE_START.
> +
> +Parameters (in): struct kvm_sev_receive_update_data
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_receive_update_data {
> + __u64 hdr_uaddr; /* userspace address containing the packet header */
> + __u32 hdr_len;
> +
> + __u64 guest_uaddr; /* the destination guest memory region */
> + __u32 guest_len;
> +
> + __u64 trans_uaddr; /* the incoming buffer memory region */
> + __u32 trans_len;
> + };
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 038b47685733..5fc5355536d7 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7497,6 +7497,82 @@ static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct kvm_sev_receive_update_data params;
> + struct sev_data_receive_update_data *data;
> + void *hdr = NULL, *trans = NULL;
> + struct page **guest_page;
> + unsigned long n;
> + int ret, offset;
> +
> + if (!sev_guest(kvm))
> + return -EINVAL;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> + sizeof(struct kvm_sev_receive_update_data)))
> + return -EFAULT;
> +
> + if (!params.hdr_uaddr || !params.hdr_len ||
> + !params.guest_uaddr || !params.guest_len ||
> + !params.trans_uaddr || !params.trans_len)
> + return -EINVAL;
> +
> + /* Check if we are crossing the page boundary */
> + offset = params.guest_uaddr & (PAGE_SIZE - 1);
> + if ((params.guest_len + offset > PAGE_SIZE))
> + return -EINVAL;
> +
> + hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
> + if (IS_ERR(hdr))
> + return PTR_ERR(hdr);
> +
> + trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
> + if (IS_ERR(trans)) {
> + ret = PTR_ERR(trans);
> + goto e_free_hdr;
> + }
> +
> + ret = -ENOMEM;
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + goto e_free_trans;
> +
> + data->hdr_address = __psp_pa(hdr);
> + data->hdr_len = params.hdr_len;
> + data->trans_address = __psp_pa(trans);
> + data->trans_len = params.trans_len;
> +
> + /* Pin guest memory */
> + ret = -EFAULT;
> + guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> + PAGE_SIZE, &n, 0);
> + if (!guest_page)
> + goto e_free;
> +
> + /* The RECEIVE_UPDATE_DATA command requires C-bit to be always set. */
> + data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> + offset;
> + data->guest_address |= sev_me_mask;
> + data->guest_len = params.guest_len;
> + data->handle = sev->handle;
> +
> + ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
> + &argp->error);
> +
> + sev_unpin_memory(kvm, guest_page, n);
> +
> +e_free:
> + kfree(data);
> +e_free_trans:
> + kfree(trans);
> +e_free_hdr:
> + kfree(hdr);
> +
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7553,6 +7629,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_RECEIVE_START:
> r = sev_receive_start(kvm, &sev_cmd);
> break;
> + case KVM_SEV_RECEIVE_UPDATE_DATA:
> + r = sev_receive_update_data(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 74764b9db5fa..4e80c57a3182 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1588,6 +1588,15 @@ struct kvm_sev_receive_start {
> __u32 session_len;
> };
>
> +struct kvm_sev_receive_update_data {
> + __u64 hdr_uaddr;
> + __u32 hdr_len;
> + __u64 guest_uaddr;
> + __u32 guest_len;
> + __u64 trans_uaddr;
> + __u32 trans_len;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
Reviewed-by: Krish Sadhukhan <[email protected]>
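One detail of sev_receive_update_data() worth calling out is that a single command may only touch one guest page: the kernel rejects a (guest_uaddr, guest_len) pair whose span crosses a page boundary, so a sender has to chunk its transport stream accordingly. A standalone sketch of that check (PAGE_SIZE hard-coded to 4 KiB for illustration; the helper name is mine):

```c
#include <assert.h>
#include <stdint.h>

#define SEV_PAGE_SIZE 4096UL  /* illustrative; the kernel uses PAGE_SIZE */

/*
 * Mirror of the bounds check in sev_receive_update_data():
 *   offset = guest_uaddr & (PAGE_SIZE - 1);
 *   reject when guest_len + offset > PAGE_SIZE.
 */
static int update_fits_one_page(uint64_t guest_uaddr, uint32_t guest_len)
{
	uint64_t offset = guest_uaddr & (SEV_PAGE_SIZE - 1);

	return (uint64_t)guest_len + offset <= SEV_PAGE_SIZE;
}
```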
On 2020-03-30 06:21:20 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The command is used for copying the incoming buffer into the
> SEV guest memory space.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Venu Busireddy <[email protected]>
> ---
> .../virt/kvm/amd-memory-encryption.rst | 24 ++++++
> arch/x86/kvm/svm.c | 79 +++++++++++++++++++
> include/uapi/linux/kvm.h | 9 +++
> 3 files changed, 112 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index ef1f1f3a5b40..554aa33a99cc 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -351,6 +351,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va
>
> For more details, see SEV spec Section 6.12.
>
> +14. KVM_SEV_RECEIVE_UPDATE_DATA
> +-------------------------------
> +
> +The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy
> +the incoming buffers into the guest memory region with encryption context
> +created during the KVM_SEV_RECEIVE_START.
> +
> +Parameters (in): struct kvm_sev_receive_update_data
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_receive_update_data {
> + __u64 hdr_uaddr; /* userspace address containing the packet header */
> + __u32 hdr_len;
> +
> + __u64 guest_uaddr; /* the destination guest memory region */
> + __u32 guest_len;
> +
> + __u64 trans_uaddr; /* the incoming buffer memory region */
> + __u32 trans_len;
> + };
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 038b47685733..5fc5355536d7 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7497,6 +7497,82 @@ static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct kvm_sev_receive_update_data params;
> + struct sev_data_receive_update_data *data;
> + void *hdr = NULL, *trans = NULL;
> + struct page **guest_page;
> + unsigned long n;
> + int ret, offset;
> +
> + if (!sev_guest(kvm))
> + return -EINVAL;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> + sizeof(struct kvm_sev_receive_update_data)))
> + return -EFAULT;
> +
> + if (!params.hdr_uaddr || !params.hdr_len ||
> + !params.guest_uaddr || !params.guest_len ||
> + !params.trans_uaddr || !params.trans_len)
> + return -EINVAL;
> +
> + /* Check if we are crossing the page boundary */
> + offset = params.guest_uaddr & (PAGE_SIZE - 1);
> + if ((params.guest_len + offset > PAGE_SIZE))
> + return -EINVAL;
> +
> + hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
> + if (IS_ERR(hdr))
> + return PTR_ERR(hdr);
> +
> + trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
> + if (IS_ERR(trans)) {
> + ret = PTR_ERR(trans);
> + goto e_free_hdr;
> + }
> +
> + ret = -ENOMEM;
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + goto e_free_trans;
> +
> + data->hdr_address = __psp_pa(hdr);
> + data->hdr_len = params.hdr_len;
> + data->trans_address = __psp_pa(trans);
> + data->trans_len = params.trans_len;
> +
> + /* Pin guest memory */
> + ret = -EFAULT;
> + guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> + PAGE_SIZE, &n, 0);
> + if (!guest_page)
> + goto e_free;
> +
> + /* The RECEIVE_UPDATE_DATA command requires C-bit to be always set. */
> + data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> + offset;
> + data->guest_address |= sev_me_mask;
> + data->guest_len = params.guest_len;
> + data->handle = sev->handle;
> +
> + ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
> + &argp->error);
> +
> + sev_unpin_memory(kvm, guest_page, n);
> +
> +e_free:
> + kfree(data);
> +e_free_trans:
> + kfree(trans);
> +e_free_hdr:
> + kfree(hdr);
> +
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7553,6 +7629,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_RECEIVE_START:
> r = sev_receive_start(kvm, &sev_cmd);
> break;
> + case KVM_SEV_RECEIVE_UPDATE_DATA:
> + r = sev_receive_update_data(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 74764b9db5fa..4e80c57a3182 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1588,6 +1588,15 @@ struct kvm_sev_receive_start {
> __u32 session_len;
> };
>
> +struct kvm_sev_receive_update_data {
> + __u64 hdr_uaddr;
> + __u32 hdr_len;
> + __u64 guest_uaddr;
> + __u32 guest_len;
> + __u64 trans_uaddr;
> + __u32 trans_len;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.17.1
>
On 2020-03-30 06:21:52 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> KVM hypercall framework relies on alternative framework to patch the
> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> apply_alternative() is called then it defaults to VMCALL. The approach
> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
^^^^^^
cause
> will be able to decode the instruction and do the right things. But
> when SEV is active, guest memory is encrypted with guest key and
> hypervisor will not be able to decode the instruction bytes.
>
> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
^^
which
> will be used by the SEV guest to notify encrypted pages to the hypervisor.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Venu Busireddy <[email protected]>
> ---
> arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9b4df6eaa11a..6c09255633a4 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -84,6 +84,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
> return ret;
> }
>
> +static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
> + unsigned long p2, unsigned long p3)
> +{
> + long ret;
> +
> + asm volatile("vmmcall"
> + : "=a"(ret)
> + : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
> + : "memory");
> + return ret;
> +}
> +
> #ifdef CONFIG_KVM_GUEST
> bool kvm_para_available(void);
> unsigned int kvm_arch_para_features(void);
> --
> 2.17.1
>
Hello Brijesh,
>
> On Tue, Mar 31, 2020 at 05:13:36PM +0000, Ashish Kalra wrote:
> > Hello Brijesh,
> >
> > > > Actually this is being done somewhat lazily, after the guest
> > > > enables/activates the live migration feature, it should be fine to do it
> > > > here or it can be moved into sev_map_percpu_data() where the first
> > > > hypercalls are done, in both cases the __bss_decrypted section will be
> > > > marked before the live migration process is initiated.
> > >
> > >
> > > IMO, its not okay to do it here or inside sev_map_percpu_data(). So far,
> > > as soon as C-bit state is changed in page table we make a hypercall. It
> > > will be good idea to stick to that approach. I don't see any reason why
> > > we need to make an exception for the __bss_decrypted unless I am missing
> > > something. What will happen if VMM initiate the migration while guest
> > > BIOS is booting?? Are you saying its not supported ?
> > >
> >
> > The one thing this will require is checking for KVM para capability
> > KVM_FEATURE_SEV_LIVE_MIGRATION as part of this code in startup_64(), i
> > need to verify if i can check for this feature so early in startup code.
> >
> > I need to check for this capability and do the wrmsrl() here as this
> > will be the 1st hypercall in the guest kernel and i will need to
> > enable live migration feature and hypercall support on the host
> > before making the hypercall.
> >
I added the KVM para feature capability check here in startup_64(), and
as I suspected this does "not" work; as a side effect it also breaks the
KVM paravirtualization check, so KVM paravirtualization is not detected
later during kernel boot and all KVM paravirt features remain disabled.
Dug deeper into this and here's what happens ...
kvm_para_has_feature() calls kvm_arch_para_features(), which in turn calls
kvm_cpuid_base(), and this invokes __kvm_cpuid_base(). As "boot_cpu_data"
is not yet populated at this point, __kvm_cpuid_base() does not detect
X86_FEATURE_HYPERVISOR and, as a side effect, sets the static variable
kvm_cpuid_base to 0.
So the feature is not detected in startup_64() and the hypercall never
gets invoked; worse, because "kvm_cpuid_base" is now 0, the later
hypervisor detection (kvm_detect) returns failure, and hence KVM
paravirtualization features don't get enabled for the guest kernel at all.
Calling kvm_para_has_feature() this early in startup_64() is therefore
not going to work, so it is probably best to do the hypercall that marks
the __bss_decrypted section as decrypted (lazily) as part of
sev_map_percpu_data(), as per my original thought.
Thanks,
Ashish
Hello Brijesh,
On Tue, Mar 31, 2020 at 09:26:26AM -0500, Brijesh Singh wrote:
>
> On 3/30/20 11:45 AM, Ashish Kalra wrote:
> > Hello Brijesh,
> >
> > On Mon, Mar 30, 2020 at 11:00:14AM -0500, Brijesh Singh wrote:
> >> On 3/30/20 1:23 AM, Ashish Kalra wrote:
> >>> From: Ashish Kalra <[email protected]>
> >>>
> >>> Reset the host's page encryption bitmap related to kernel
> >>> specific page encryption status settings before we load a
> >>> new kernel by kexec. We cannot reset the complete
> >>> page encryption bitmap here as we need to retain the
> >>> UEFI/OVMF firmware specific settings.
> >>>
> >>> Signed-off-by: Ashish Kalra <[email protected]>
> >>> ---
> >>> arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
> >>> 1 file changed, 28 insertions(+)
> >>>
> >>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> >>> index 8fcee0b45231..ba6cce3c84af 100644
> >>> --- a/arch/x86/kernel/kvm.c
> >>> +++ b/arch/x86/kernel/kvm.c
> >>> @@ -34,6 +34,7 @@
> >>> #include <asm/hypervisor.h>
> >>> #include <asm/tlb.h>
> >>> #include <asm/cpuidle_haltpoll.h>
> >>> +#include <asm/e820/api.h>
> >>>
> >>> static int kvmapf = 1;
> >>>
> >>> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> >>> */
> >>> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> >>> wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> >>> + /*
> >>> + * Reset the host's page encryption bitmap related to kernel
> >>> + * specific page encryption status settings before we load a
> >>> + * new kernel by kexec. NOTE: We cannot reset the complete
> >>> + * page encryption bitmap here as we need to retain the
> >>> + * UEFI/OVMF firmware specific settings.
> >>> + */
> >>> + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> >>> + (smp_processor_id() == 0)) {
> >>
> >> In patch 13/14, the KVM_FEATURE_SEV_LIVE_MIGRATION is set
> >> unconditionally and because of that now the below code will be executed
> >> on non-SEV guest. IMO, this feature must be cleared for non-SEV guest to
> >> avoid making unnecessary hypercall's.
> >>
> >>
> > I will additionally add a sev_active() check here to ensure that we don't make the unnecessary hypercalls on non-SEV guests.
>
>
> IMO, instead of using the sev_active() we should make sure that the
> feature is not enabled when SEV is not active.
>
Yes, now the KVM_FEATURE_SEV_LIVE_MIGRATION feature is enabled
dynamically in svm_cpuid_update() after it gets called from
svm_launch_finish(), which ensures that it only gets set when a SEV
guest is active.
Thanks,
Ashish
>
> >>> + unsigned long nr_pages;
> >>> + int i;
> >>> +
> >>> + for (i = 0; i < e820_table->nr_entries; i++) {
> >>> + struct e820_entry *entry = &e820_table->entries[i];
> >>> + unsigned long start_pfn, end_pfn;
> >>> +
> >>> + if (entry->type != E820_TYPE_RAM)
> >>> + continue;
> >>> +
> >>> + start_pfn = entry->addr >> PAGE_SHIFT;
> >>> + end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> >>> + nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> >>> +
> >>> + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> >>> + entry->addr, nr_pages, 1);
> >>> + }
> >>> + }
> >>> kvm_pv_disable_apf();
> >>> kvm_disable_steal_time();
> >>> }
> > Thanks,
> > Ashish
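The loop in the hunk quoted above derives the hypercall arguments from each E820_TYPE_RAM entry; the interesting bit is the pfn arithmetic. A standalone sketch of that computation (4 KiB pages assumed, names mine), handy for convincing oneself that nr_pages also covers a trailing partial page when the entry size is not page-aligned:

```c
#include <assert.h>
#include <stdint.h>

#define E820_PAGE_SIZE 4096ULL
#define E820_PAGE_SHIFT 12

/* Mirror of the arithmetic in the kexec reset loop above. */
static void e820_entry_pages(uint64_t addr, uint64_t size,
			     uint64_t *start_pfn, uint64_t *nr_pages)
{
	*start_pfn = addr >> E820_PAGE_SHIFT;
	/* DIV_ROUND_UP(size, PAGE_SIZE): count a trailing partial page too */
	*nr_pages = (size + E820_PAGE_SIZE - 1) / E820_PAGE_SIZE;
}
```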
On 3/29/20 11:21 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> KVM hypercall framework relies on alternative framework to patch the
> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> apply_alternative()
s/apply_alternative/apply_alternatives/
> is called then it defaults to VMCALL. The approach
> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
> will be able to decode the instruction and do the right things. But
> when SEV is active, guest memory is encrypted with guest key and
> hypervisor will not be able to decode the instruction bytes.
>
> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
> will be used by the SEV guest to notify encrypted pages to the hypervisor.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9b4df6eaa11a..6c09255633a4 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -84,6 +84,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
> return ret;
> }
>
> +static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
> + unsigned long p2, unsigned long p3)
> +{
> + long ret;
> +
> + asm volatile("vmmcall"
> + : "=a"(ret)
> + : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
> + : "memory");
> + return ret;
> +}
> +
> #ifdef CONFIG_KVM_GUEST
> bool kvm_para_available(void);
> unsigned int kvm_arch_para_features(void);
Reviewed-by: Krish Sadhukhan <[email protected]>
On 2020-03-30 06:22:07 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Venu Busireddy <[email protected]>
> ---
> Documentation/virt/kvm/hypercalls.rst | 15 +++++
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.c | 1 +
> arch/x86/kvm/x86.c | 6 ++
> include/uapi/linux/kvm_para.h | 1 +
> 6 files changed, 120 insertions(+)
>
> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> index dbaf207e560d..ff5287e68e81 100644
> --- a/Documentation/virt/kvm/hypercalls.rst
> +++ b/Documentation/virt/kvm/hypercalls.rst
> @@ -169,3 +169,18 @@ a0: destination APIC ID
>
> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> any of the IPI target vCPUs was preempted.
> +
> +
> +8. KVM_HC_PAGE_ENC_STATUS
> +-------------------------
> +:Architecture: x86
> +:Status: active
> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> +
> +a0: the guest physical address of the start page
> +a1: the number of pages
> +a2: encryption attribute
> +
> + Where:
> + * 1: Encryption attribute is set
> + * 0: Encryption attribute is cleared
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 98959e8cd448..90718fa3db47 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>
> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> + unsigned long sz, unsigned long mode);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 7c2721e18b06..1d8beaf1bceb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> int fd; /* SEV device fd */
> unsigned long pages_locked; /* Number of pages locked */
> struct list_head regions_list; /* List of registered regions */
> + unsigned long *page_enc_bmap;
> + unsigned long page_enc_bmap_size;
> };
>
> struct kvm_svm {
> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>
> sev_unbind_asid(kvm, sev->handle);
> sev_asid_free(sev->asid);
> +
> + kvfree(sev->page_enc_bmap);
> + sev->page_enc_bmap = NULL;
> }
>
> static void avic_vm_destroy(struct kvm *kvm)
> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + unsigned long *map;
> + unsigned long sz;
> +
> + if (sev->page_enc_bmap_size >= new_size)
> + return 0;
> +
> + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> +
> + map = vmalloc(sz);
> + if (!map) {
> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> + sz);
> + return -ENOMEM;
> + }
> +
> + /* mark the page encrypted (by default) */
> + memset(map, 0xff, sz);
> +
> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> + kvfree(sev->page_enc_bmap);
> +
> + sev->page_enc_bmap = map;
> + sev->page_enc_bmap_size = new_size;
> +
> + return 0;
> +}
> +
> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> + unsigned long npages, unsigned long enc)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + kvm_pfn_t pfn_start, pfn_end;
> + gfn_t gfn_start, gfn_end;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -EINVAL;
> +
> + if (!npages)
> + return 0;
> +
> + gfn_start = gpa_to_gfn(gpa);
> + gfn_end = gfn_start + npages;
> +
> + /* out of bound access error check */
> + if (gfn_end <= gfn_start)
> + return -EINVAL;
> +
> + /* let's make sure that the gpa exists in our memslot */
> + pfn_start = gfn_to_pfn(kvm, gfn_start);
> + pfn_end = gfn_to_pfn(kvm, gfn_end);
> +
> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> + /*
> + * Allow guest MMIO range(s) to be added
> + * to the page encryption bitmap.
> + */
> + return -EINVAL;
> + }
> +
> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> + /*
> + * Allow guest MMIO range(s) to be added
> + * to the page encryption bitmap.
> + */
> + return -EINVAL;
> + }
> +
> + mutex_lock(&kvm->lock);
> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> + if (ret)
> + goto unlock;
> +
> + if (enc)
> + __bitmap_set(sev->page_enc_bmap, gfn_start,
> + gfn_end - gfn_start);
> + else
> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> + gfn_end - gfn_start);
> +
> +unlock:
> + mutex_unlock(&kvm->lock);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>
> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> +
> + .page_enc_status_hc = svm_page_enc_status_hc,
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 079d9fbf278e..f68e76ee7f9c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> .nested_get_evmcs_version = NULL,
> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> + .page_enc_status_hc = NULL,
> };
>
> static void vmx_cleanup_l1d_flush(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf95c36cb4f4..68428eef2dde 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> kvm_sched_yield(vcpu->kvm, a0);
> ret = 0;
> break;
> + case KVM_HC_PAGE_ENC_STATUS:
> + ret = -KVM_ENOSYS;
> + if (kvm_x86_ops->page_enc_status_hc)
> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> + a0, a1, a2);
> + break;
> default:
> ret = -KVM_ENOSYS;
> break;
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..847b83b75dc8 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
> #define KVM_HC_CLOCK_PAIRING 9
> #define KVM_HC_SEND_IPI 10
> #define KVM_HC_SCHED_YIELD 11
> +#define KVM_HC_PAGE_ENC_STATUS 12
>
> /*
> * hypercalls use architecture specific
> --
> 2.17.1
>
On 3/29/20 11:22 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> Documentation/virt/kvm/hypercalls.rst | 15 +++++
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.c | 1 +
> arch/x86/kvm/x86.c | 6 ++
> include/uapi/linux/kvm_para.h | 1 +
> 6 files changed, 120 insertions(+)
>
> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> index dbaf207e560d..ff5287e68e81 100644
> --- a/Documentation/virt/kvm/hypercalls.rst
> +++ b/Documentation/virt/kvm/hypercalls.rst
> @@ -169,3 +169,18 @@ a0: destination APIC ID
>
> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> any of the IPI target vCPUs was preempted.
> +
> +
> +8. KVM_HC_PAGE_ENC_STATUS
> +-------------------------
> +:Architecture: x86
> +:Status: active
> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> +
> +a0: the guest physical address of the start page
> +a1: the number of pages
> +a2: encryption attribute
> +
> + Where:
> + * 1: Encryption attribute is set
> + * 0: Encryption attribute is cleared
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 98959e8cd448..90718fa3db47 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>
> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> + unsigned long sz, unsigned long mode);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 7c2721e18b06..1d8beaf1bceb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> int fd; /* SEV device fd */
> unsigned long pages_locked; /* Number of pages locked */
> struct list_head regions_list; /* List of registered regions */
> + unsigned long *page_enc_bmap;
> + unsigned long page_enc_bmap_size;
> };
>
> struct kvm_svm {
> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>
> sev_unbind_asid(kvm, sev->handle);
> sev_asid_free(sev->asid);
> +
> + kvfree(sev->page_enc_bmap);
> + sev->page_enc_bmap = NULL;
> }
>
> static void avic_vm_destroy(struct kvm *kvm)
> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + unsigned long *map;
> + unsigned long sz;
> +
> + if (sev->page_enc_bmap_size >= new_size)
> + return 0;
> +
> + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> +
> + map = vmalloc(sz);
Just wondering why we can't directly modify sev->page_enc_bmap.
> + if (!map) {
> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> + sz);
> + return -ENOMEM;
> + }
> +
> + /* mark the page encrypted (by default) */
> + memset(map, 0xff, sz);
> +
> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> + kvfree(sev->page_enc_bmap);
> +
> + sev->page_enc_bmap = map;
> + sev->page_enc_bmap_size = new_size;
> +
> + return 0;
> +}
> +
> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> + unsigned long npages, unsigned long enc)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + kvm_pfn_t pfn_start, pfn_end;
> + gfn_t gfn_start, gfn_end;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -EINVAL;
> +
> + if (!npages)
> + return 0;
> +
> + gfn_start = gpa_to_gfn(gpa);
> + gfn_end = gfn_start + npages;
> +
> + /* out of bound access error check */
> + if (gfn_end <= gfn_start)
> + return -EINVAL;
> +
> + /* lets make sure that gpa exist in our memslot */
> + pfn_start = gfn_to_pfn(kvm, gfn_start);
> + pfn_end = gfn_to_pfn(kvm, gfn_end);
> +
> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> + /*
> + * Allow guest MMIO range(s) to be added
> + * to the page encryption bitmap.
> + */
> + return -EINVAL;
> + }
> +
> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> + /*
> + * Allow guest MMIO range(s) to be added
> + * to the page encryption bitmap.
> + */
> + return -EINVAL;
> + }
It seems is_error_noslot_pfn() covers both cases: i) the gfn slot is
absent, and ii) failure to translate to a pfn. So do we still need
is_noslot_pfn()?
> +
> + mutex_lock(&kvm->lock);
> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> + if (ret)
> + goto unlock;
> +
> + if (enc)
> + __bitmap_set(sev->page_enc_bmap, gfn_start,
> + gfn_end - gfn_start);
> + else
> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> + gfn_end - gfn_start);
> +
> +unlock:
> + mutex_unlock(&kvm->lock);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>
> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> +
> + .page_enc_status_hc = svm_page_enc_status_hc,
Why not place it where the other encryption ops are located?
...
.mem_enc_unreg_region
+ .page_enc_status_hc = svm_page_enc_status_hc
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 079d9fbf278e..f68e76ee7f9c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> .nested_get_evmcs_version = NULL,
> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> + .page_enc_status_hc = NULL,
> };
>
> static void vmx_cleanup_l1d_flush(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf95c36cb4f4..68428eef2dde 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> kvm_sched_yield(vcpu->kvm, a0);
> ret = 0;
> break;
> + case KVM_HC_PAGE_ENC_STATUS:
> + ret = -KVM_ENOSYS;
> + if (kvm_x86_ops->page_enc_status_hc)
> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> + a0, a1, a2);
> + break;
> default:
> ret = -KVM_ENOSYS;
> break;
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..847b83b75dc8 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
> #define KVM_HC_CLOCK_PAIRING 9
> #define KVM_HC_SEND_IPI 10
> #define KVM_HC_SCHED_YIELD 11
> +#define KVM_HC_PAGE_ENC_STATUS 12
>
> /*
> * hypercalls use architecture specific
On Thu, Apr 02, 2020 at 06:31:54PM -0700, Krish Sadhukhan wrote:
>
> On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > This hypercall is used by the SEV guest to notify a change in the page
> > encryption status to the hypervisor. The hypercall should be invoked
> > only when the encryption attribute is changed from encrypted -> decrypted
> > and vice versa. By default all guest pages are considered encrypted.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: "Radim Krčmář" <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
> > arch/x86/kvm/vmx/vmx.c | 1 +
> > arch/x86/kvm/x86.c | 6 ++
> > include/uapi/linux/kvm_para.h | 1 +
> > 6 files changed, 120 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > index dbaf207e560d..ff5287e68e81 100644
> > --- a/Documentation/virt/kvm/hypercalls.rst
> > +++ b/Documentation/virt/kvm/hypercalls.rst
> > @@ -169,3 +169,18 @@ a0: destination APIC ID
> > :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > any of the IPI target vCPUs was preempted.
> > +
> > +
> > +8. KVM_HC_PAGE_ENC_STATUS
> > +-------------------------
> > +:Architecture: x86
> > +:Status: active
> > +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > +
> > +a0: the guest physical address of the start page
> > +a1: the number of pages
> > +a2: encryption attribute
> > +
> > + Where:
> > + * 1: Encryption attribute is set
> > + * 0: Encryption attribute is cleared
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 98959e8cd448..90718fa3db47 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> > bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> > int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > + unsigned long sz, unsigned long mode);
> > };
> > struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 7c2721e18b06..1d8beaf1bceb 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -136,6 +136,8 @@ struct kvm_sev_info {
> > int fd; /* SEV device fd */
> > unsigned long pages_locked; /* Number of pages locked */
> > struct list_head regions_list; /* List of registered regions */
> > + unsigned long *page_enc_bmap;
> > + unsigned long page_enc_bmap_size;
> > };
> > struct kvm_svm {
> > @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> > sev_unbind_asid(kvm, sev->handle);
> > sev_asid_free(sev->asid);
> > +
> > + kvfree(sev->page_enc_bmap);
> > + sev->page_enc_bmap = NULL;
> > }
> > static void avic_vm_destroy(struct kvm *kvm)
> > @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > return ret;
> > }
> > +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > +{
> > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > + unsigned long *map;
> > + unsigned long sz;
> > +
> > + if (sev->page_enc_bmap_size >= new_size)
> > + return 0;
> > +
> > + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > +
> > + map = vmalloc(sz);
>
>
> Just wondering why we can't directly modify sev->page_enc_bmap.
>
Because the page_enc_bmap needs to be resized, i.e. expanded, here.
> > + if (!map) {
> > + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > + sz);
> > + return -ENOMEM;
> > + }
> > +
> > + /* mark the page encrypted (by default) */
> > + memset(map, 0xff, sz);
> > +
> > + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > + kvfree(sev->page_enc_bmap);
> > +
> > + sev->page_enc_bmap = map;
> > + sev->page_enc_bmap_size = new_size;
> > +
> > + return 0;
> > +}
> > +
> > +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > + unsigned long npages, unsigned long enc)
> > +{
> > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > + kvm_pfn_t pfn_start, pfn_end;
> > + gfn_t gfn_start, gfn_end;
> > + int ret;
> > +
> > + if (!sev_guest(kvm))
> > + return -EINVAL;
> > +
> > + if (!npages)
> > + return 0;
> > +
> > + gfn_start = gpa_to_gfn(gpa);
> > + gfn_end = gfn_start + npages;
> > +
> > + /* out of bound access error check */
> > + if (gfn_end <= gfn_start)
> > + return -EINVAL;
> > +
> > + /* lets make sure that gpa exist in our memslot */
> > + pfn_start = gfn_to_pfn(kvm, gfn_start);
> > + pfn_end = gfn_to_pfn(kvm, gfn_end);
> > +
> > + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > + /*
> > + * Allow guest MMIO range(s) to be added
> > + * to the page encryption bitmap.
> > + */
> > + return -EINVAL;
> > + }
> > +
> > + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > + /*
> > + * Allow guest MMIO range(s) to be added
> > + * to the page encryption bitmap.
> > + */
> > + return -EINVAL;
> > + }
>
>
> It seems is_error_noslot_pfn() covers both cases - i) gfn slot is absent,
> ii) failure to translate to pfn. So do we still need is_noslot_pfn() ?
>
We additionally need to check for !is_noslot_pfn(..) because the MMIO ranges
will not have a slot allocated.
Thanks,
Ashish
> > +
> > + mutex_lock(&kvm->lock);
> > + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > + if (ret)
> > + goto unlock;
> > +
> > + if (enc)
> > + __bitmap_set(sev->page_enc_bmap, gfn_start,
> > + gfn_end - gfn_start);
> > + else
> > + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > + gfn_end - gfn_start);
> > +
> > +unlock:
> > + mutex_unlock(&kvm->lock);
> > + return ret;
> > +}
> > +
> > static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > {
> > struct kvm_sev_cmd sev_cmd;
> > @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > +
> > + .page_enc_status_hc = svm_page_enc_status_hc,
>
>
> Why not place it where other encryption ops are located ?
>
> ...
>
> .mem_enc_unreg_region
>
> + .page_enc_status_hc = svm_page_enc_status_hc
>
> > };
> > static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 079d9fbf278e..f68e76ee7f9c 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > .nested_get_evmcs_version = NULL,
> > .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > + .page_enc_status_hc = NULL,
> > };
> > static void vmx_cleanup_l1d_flush(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index cf95c36cb4f4..68428eef2dde 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > kvm_sched_yield(vcpu->kvm, a0);
> > ret = 0;
> > break;
> > + case KVM_HC_PAGE_ENC_STATUS:
> > + ret = -KVM_ENOSYS;
> > + if (kvm_x86_ops->page_enc_status_hc)
> > + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > + a0, a1, a2);
> > + break;
> > default:
> > ret = -KVM_ENOSYS;
> > break;
> > diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > index 8b86609849b9..847b83b75dc8 100644
> > --- a/include/uapi/linux/kvm_para.h
> > +++ b/include/uapi/linux/kvm_para.h
> > @@ -29,6 +29,7 @@
> > #define KVM_HC_CLOCK_PAIRING 9
> > #define KVM_HC_SEND_IPI 10
> > #define KVM_HC_SCHED_YIELD 11
> > +#define KVM_HC_PAGE_ENC_STATUS 12
> > /*
> > * hypercalls use architecture specific
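
The interplay of the two pfn predicates discussed above can be modeled as a
standalone sketch. The error-pfn encodings below mirror what
include/linux/kvm_host.h defines in trees of this era, but they are reproduced
here only for illustration and should be checked against the actual tree; the
point is that the combined check rejects real translation failures while
letting no-slot (MMIO) pfns through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t kvm_pfn_t;

/* Error pfn encodings, modeled on include/linux/kvm_host.h. */
#define KVM_PFN_ERR_MASK        (0x1ULL << 63)
#define KVM_PFN_ERR_NOSLOT_MASK (0x3ULL << 62)
#define KVM_PFN_NOSLOT          (0x1ULL << 62)
#define KVM_PFN_ERR_FAULT       (KVM_PFN_ERR_MASK)

/* True for both translation errors and no-slot (MMIO) pfns. */
static bool is_error_noslot_pfn(kvm_pfn_t pfn)
{
	return !!(pfn & KVM_PFN_ERR_NOSLOT_MASK);
}

/* True only for no-slot (MMIO) pfns. */
static bool is_noslot_pfn(kvm_pfn_t pfn)
{
	return pfn == KVM_PFN_NOSLOT;
}

/*
 * The combined check used by svm_page_enc_status_hc(): reject real
 * translation errors, but allow MMIO ranges (which have no memslot)
 * to be recorded in the page encryption bitmap.
 */
static bool reject_pfn(kvm_pfn_t pfn)
{
	return is_error_noslot_pfn(pfn) && !is_noslot_pfn(pfn);
}
```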
On Fri, Apr 03, 2020 at 01:57:48AM +0000, Ashish Kalra wrote:
> On Thu, Apr 02, 2020 at 06:31:54PM -0700, Krish Sadhukhan wrote:
> >
> > On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > > From: Brijesh Singh <[email protected]>
> > >
> > > This hypercall is used by the SEV guest to notify a change in the page
> > > encryption status to the hypervisor. The hypercall should be invoked
> > > only when the encryption attribute is changed from encrypted -> decrypted
> > > and vice versa. By default all guest pages are considered encrypted.
> > >
> > > Cc: Thomas Gleixner <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: "H. Peter Anvin" <[email protected]>
> > > Cc: Paolo Bonzini <[email protected]>
> > > Cc: "Radim Krčmář" <[email protected]>
> > > Cc: Joerg Roedel <[email protected]>
> > > Cc: Borislav Petkov <[email protected]>
> > > Cc: Tom Lendacky <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Signed-off-by: Brijesh Singh <[email protected]>
> > > Signed-off-by: Ashish Kalra <[email protected]>
> > > ---
> > > Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > > arch/x86/include/asm/kvm_host.h | 2 +
> > > arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
> > > arch/x86/kvm/vmx/vmx.c | 1 +
> > > arch/x86/kvm/x86.c | 6 ++
> > > include/uapi/linux/kvm_para.h | 1 +
> > > 6 files changed, 120 insertions(+)
> > >
> > > diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > > index dbaf207e560d..ff5287e68e81 100644
> > > --- a/Documentation/virt/kvm/hypercalls.rst
> > > +++ b/Documentation/virt/kvm/hypercalls.rst
> > > @@ -169,3 +169,18 @@ a0: destination APIC ID
> > > :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > > any of the IPI target vCPUs was preempted.
> > > +
> > > +
> > > +8. KVM_HC_PAGE_ENC_STATUS
> > > +-------------------------
> > > +:Architecture: x86
> > > +:Status: active
> > > +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > > +
> > > +a0: the guest physical address of the start page
> > > +a1: the number of pages
> > > +a2: encryption attribute
> > > +
> > > + Where:
> > > + * 1: Encryption attribute is set
> > > + * 0: Encryption attribute is cleared
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 98959e8cd448..90718fa3db47 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> > > bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> > > int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > > + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > > + unsigned long sz, unsigned long mode);
> > > };
> > > struct kvm_arch_async_pf {
> > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > index 7c2721e18b06..1d8beaf1bceb 100644
> > > --- a/arch/x86/kvm/svm.c
> > > +++ b/arch/x86/kvm/svm.c
> > > @@ -136,6 +136,8 @@ struct kvm_sev_info {
> > > int fd; /* SEV device fd */
> > > unsigned long pages_locked; /* Number of pages locked */
> > > struct list_head regions_list; /* List of registered regions */
> > > + unsigned long *page_enc_bmap;
> > > + unsigned long page_enc_bmap_size;
> > > };
> > > struct kvm_svm {
> > > @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> > > sev_unbind_asid(kvm, sev->handle);
> > > sev_asid_free(sev->asid);
> > > +
> > > + kvfree(sev->page_enc_bmap);
> > > + sev->page_enc_bmap = NULL;
> > > }
> > > static void avic_vm_destroy(struct kvm *kvm)
> > > @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > > return ret;
> > > }
> > > +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > > +{
> > > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > + unsigned long *map;
> > > + unsigned long sz;
> > > +
> > > + if (sev->page_enc_bmap_size >= new_size)
> > > + return 0;
> > > +
> > > + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > > +
> > > + map = vmalloc(sz);
> >
> >
> > Just wondering why we can't directly modify sev->page_enc_bmap.
> >
>
> Because the page_enc_bitmap needs to be re-sized here, it needs to be
> expanded here.
>
I don't believe there is anything like a realloc() equivalent
for the kmalloc() interfaces.
Thanks,
Ashish
> > > + if (!map) {
> > > + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > > + sz);
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + /* mark the page encrypted (by default) */
> > > + memset(map, 0xff, sz);
> > > +
> > > + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > + kvfree(sev->page_enc_bmap);
> > > +
> > > + sev->page_enc_bmap = map;
> > > + sev->page_enc_bmap_size = new_size;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > + unsigned long npages, unsigned long enc)
> > > +{
> > > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > + kvm_pfn_t pfn_start, pfn_end;
> > > + gfn_t gfn_start, gfn_end;
> > > + int ret;
> > > +
> > > + if (!sev_guest(kvm))
> > > + return -EINVAL;
> > > +
> > > + if (!npages)
> > > + return 0;
> > > +
> > > + gfn_start = gpa_to_gfn(gpa);
> > > + gfn_end = gfn_start + npages;
> > > +
> > > + /* out of bound access error check */
> > > + if (gfn_end <= gfn_start)
> > > + return -EINVAL;
> > > +
> > > + /* lets make sure that gpa exist in our memslot */
> > > + pfn_start = gfn_to_pfn(kvm, gfn_start);
> > > + pfn_end = gfn_to_pfn(kvm, gfn_end);
> > > +
> > > + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > > + /*
> > > + * Allow guest MMIO range(s) to be added
> > > + * to the page encryption bitmap.
> > > + */
> > > + return -EINVAL;
> > > + }
> > > +
> > > + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > > + /*
> > > + * Allow guest MMIO range(s) to be added
> > > + * to the page encryption bitmap.
> > > + */
> > > + return -EINVAL;
> > > + }
> >
> >
> > It seems is_error_noslot_pfn() covers both cases - i) gfn slot is absent,
> > ii) failure to translate to pfn. So do we still need is_noslot_pfn() ?
> >
>
> We do need to check for !is_noslot_pfn(..) additionally as the MMIO ranges will not
> be having a slot allocated.
>
> Thanks,
> Ashish
>
> > > +
> > > + mutex_lock(&kvm->lock);
> > > + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > + if (ret)
> > > + goto unlock;
> > > +
> > > + if (enc)
> > > + __bitmap_set(sev->page_enc_bmap, gfn_start,
> > > + gfn_end - gfn_start);
> > > + else
> > > + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > > + gfn_end - gfn_start);
> > > +
> > > +unlock:
> > > + mutex_unlock(&kvm->lock);
> > > + return ret;
> > > +}
> > > +
> > > static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > {
> > > struct kvm_sev_cmd sev_cmd;
> > > @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > > .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > > +
> > > + .page_enc_status_hc = svm_page_enc_status_hc,
> >
> >
> > Why not place it where other encryption ops are located ?
> >
> > ...
> >
> > .mem_enc_unreg_region
> >
> > + .page_enc_status_hc = svm_page_enc_status_hc
> >
> > > };
> > > static int __init svm_init(void)
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 079d9fbf278e..f68e76ee7f9c 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > > .nested_get_evmcs_version = NULL,
> > > .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > > .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > > + .page_enc_status_hc = NULL,
> > > };
> > > static void vmx_cleanup_l1d_flush(void)
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index cf95c36cb4f4..68428eef2dde 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > > kvm_sched_yield(vcpu->kvm, a0);
> > > ret = 0;
> > > break;
> > > + case KVM_HC_PAGE_ENC_STATUS:
> > > + ret = -KVM_ENOSYS;
> > > + if (kvm_x86_ops->page_enc_status_hc)
> > > + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > > + a0, a1, a2);
> > > + break;
> > > default:
> > > ret = -KVM_ENOSYS;
> > > break;
> > > diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > > index 8b86609849b9..847b83b75dc8 100644
> > > --- a/include/uapi/linux/kvm_para.h
> > > +++ b/include/uapi/linux/kvm_para.h
> > > @@ -29,6 +29,7 @@
> > > #define KVM_HC_CLOCK_PAIRING 9
> > > #define KVM_HC_SEND_IPI 10
> > > #define KVM_HC_SCHED_YIELD 11
> > > +#define KVM_HC_PAGE_ENC_STATUS 12
> > > /*
> > > * hypercalls use architecture specific
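
Given that there is no realloc()-style helper for these allocators,
sev_resize_page_enc_bitmap() grows by allocate-copy-free. The pattern can be
sketched in userspace, with malloc()/free() standing in for
vmalloc()/kvfree() and memcpy() for bitmap_copy():

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

struct enc_bitmap {
	unsigned long *map;
	unsigned long size;	/* in bits (one bit per gfn) */
};

/*
 * Grow-and-copy resize mirroring sev_resize_page_enc_bitmap():
 * newly covered pages default to encrypted (all bits set), and the
 * previous contents are carried over.
 */
static int resize_enc_bitmap(struct enc_bitmap *b, unsigned long new_size)
{
	unsigned long sz, *map;

	if (b->size >= new_size)
		return 0;

	sz = ALIGN_UP(new_size, BITS_PER_LONG) / 8;
	map = malloc(sz);		/* vmalloc() in the kernel */
	if (!map)
		return -1;		/* -ENOMEM in the kernel */

	memset(map, 0xff, sz);		/* default: encrypted */
	if (b->map) {
		/* the old allocation is always long-aligned, so a byte
		 * copy is equivalent to bitmap_copy() here */
		memcpy(map, b->map, ALIGN_UP(b->size, BITS_PER_LONG) / 8);
		free(b->map);		/* kvfree() in the kernel */
	}
	b->map = map;
	b->size = new_size;
	return 0;
}
```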
Cc'ing the kexec list.
Ashish, could you cc the kexec list if you repost later?
On 03/30/20 at 06:23am, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> Reset the host's page encryption bitmap related to kernel
> specific page encryption status settings before we load a
> new kernel by kexec. We cannot reset the complete
> page encryption bitmap here as we need to retain the
> UEFI/OVMF firmware specific settings.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8fcee0b45231..ba6cce3c84af 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -34,6 +34,7 @@
> #include <asm/hypervisor.h>
> #include <asm/tlb.h>
> #include <asm/cpuidle_haltpoll.h>
> +#include <asm/e820/api.h>
>
> static int kvmapf = 1;
>
> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> */
> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> + /*
> + * Reset the host's page encryption bitmap related to kernel
> + * specific page encryption status settings before we load a
> + * new kernel by kexec. NOTE: We cannot reset the complete
> + * page encryption bitmap here as we need to retain the
> + * UEFI/OVMF firmware specific settings.
> + */
> + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> + (smp_processor_id() == 0)) {
> + unsigned long nr_pages;
> + int i;
> +
> + for (i = 0; i < e820_table->nr_entries; i++) {
> + struct e820_entry *entry = &e820_table->entries[i];
> + unsigned long start_pfn, end_pfn;
> +
> + if (entry->type != E820_TYPE_RAM)
> + continue;
> +
> + start_pfn = entry->addr >> PAGE_SHIFT;
> + end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> + nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> +
> + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> + entry->addr, nr_pages, 1);
> + }
> + }
> kvm_pv_disable_apf();
> kvm_disable_steal_time();
> }
> --
> 2.17.1
>
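
The pfn arithmetic in the e820 loop above can be checked in isolation. Note
the asymmetry: nr_pages rounds a partial trailing page up via DIV_ROUND_UP(),
while end_pfn truncates; the hypercall itself is driven by entry->addr and
nr_pages. A minimal sketch:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct pfn_range {
	unsigned long start_pfn, end_pfn, nr_pages;
};

/* Per-entry pfn math as in the kvm_pv_guest_cpu_reboot() loop above. */
static struct pfn_range e820_entry_to_pfns(uint64_t addr, uint64_t size)
{
	struct pfn_range r;

	r.start_pfn = addr >> PAGE_SHIFT;
	r.end_pfn   = (addr + size) >> PAGE_SHIFT;	/* truncates */
	r.nr_pages  = DIV_ROUND_UP(size, PAGE_SIZE);	/* rounds up */
	return r;
}
```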
On 2020-03-30 06:22:23 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The ioctl can be used to retrieve page encryption bitmap for a given
> gfn range.
>
> Return the correct bitmap as per the number of pages being requested
> by the user. Ensure that we only copy bmap->num_pages bytes in the
> userspace buffer, if bmap->num_pages is not byte aligned we read
> the trailing bits from the userspace and copy those bits as is.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
With the suggestions below...
Reviewed-by: Venu Busireddy <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 27 +++++++++++++
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/svm.c | 71 +++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 12 ++++++
> include/uapi/linux/kvm.h | 12 ++++++
> 5 files changed, 124 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index ebd383fba939..8ad800ebb54f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
> the clear cpu reset definition in the POP. However, the cpu is not put
> into ESA mode. This reset is a superset of the initial reset.
>
> +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_GET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> + __u64 start_gfn;
> + __u64 num_pages;
> + union {
> + void __user *enc_bitmap; /* one bit per page */
> + __u64 padding2;
> + };
> +};
> +
> +The encrypted VMs have concept of private and shared pages. The private
s/have concept/have the concept/
> +page is encrypted with the guest-specific key, while shared page may
s/page is/pages are/
s/shared page/the shared pages/
> +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> +be used to get the bitmap indicating whether the guest page is private
> +or shared. The bitmap can be used during the guest migration, if the page
s/, if/. If/
> +is private then userspace need to use SEV migration commands to transmit
s/then userspace need/then the userspace needs/
> +the page.
> +
>
> 5. The kvm_run structure
> ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 90718fa3db47..27e43e3ec9d8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> unsigned long sz, unsigned long mode);
> + int (*get_page_enc_bitmap)(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 1d8beaf1bceb..bae783cd396a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> return ret;
> }
>
> +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + unsigned long gfn_start, gfn_end;
> + unsigned long sz, i, sz_bytes;
> + unsigned long *bitmap;
> + int ret, n;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + gfn_start = bmap->start_gfn;
> + gfn_end = gfn_start + bmap->num_pages;
> +
> + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> + bitmap = kmalloc(sz, GFP_KERNEL);
> + if (!bitmap)
> + return -ENOMEM;
> +
> + /* by default all pages are marked encrypted */
> + memset(bitmap, 0xff, sz);
> +
> + mutex_lock(&kvm->lock);
> + if (sev->page_enc_bmap) {
> + i = gfn_start;
> + for_each_clear_bit_from(i, sev->page_enc_bmap,
> + min(sev->page_enc_bmap_size, gfn_end))
> + clear_bit(i - gfn_start, bitmap);
> + }
> + mutex_unlock(&kvm->lock);
> +
> + ret = -EFAULT;
> +
> + n = bmap->num_pages % BITS_PER_BYTE;
> + sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> +
> + /*
> + * Return the correct bitmap as per the number of pages being
> + * requested by the user. Ensure that we only copy bmap->num_pages
> + * bytes in the userspace buffer, if bmap->num_pages is not byte
> + * aligned we read the trailing bits from the userspace and copy
> + * those bits as is.
> + */
> +
> + if (n) {
> + unsigned char *bitmap_kernel = (unsigned char *)bitmap;
> + unsigned char bitmap_user;
> + unsigned long offset, mask;
> +
> + offset = bmap->num_pages / BITS_PER_BYTE;
> + if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> + sizeof(unsigned char)))
> + goto out;
> +
> + mask = GENMASK(n - 1, 0);
> + bitmap_user &= ~mask;
> + bitmap_kernel[offset] &= mask;
> + bitmap_kernel[offset] |= bitmap_user;
> + }
> +
> + if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
> + goto out;
> +
> + ret = 0;
> +out:
> + kfree(bitmap);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>
> .page_enc_status_hc = svm_page_enc_status_hc,
> + .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 68428eef2dde..3c3fea4e20b5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> case KVM_SET_PMU_EVENT_FILTER:
> r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
> break;
> + case KVM_GET_PAGE_ENC_BITMAP: {
> + struct kvm_page_enc_bitmap bitmap;
> +
> + r = -EFAULT;
> + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> + goto out;
> +
> + r = -ENOTTY;
> + if (kvm_x86_ops->get_page_enc_bitmap)
> + r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> + break;
> + }
> default:
> r = -ENOTTY;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4e80c57a3182..db1ebf85e177 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -500,6 +500,16 @@ struct kvm_dirty_log {
> };
> };
>
> +/* for KVM_GET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> + __u64 start_gfn;
> + __u64 num_pages;
> + union {
> + void __user *enc_bitmap; /* one bit per page */
> + __u64 padding2;
> + };
> +};
> +
> /* for KVM_CLEAR_DIRTY_LOG */
> struct kvm_clear_dirty_log {
> __u32 slot;
> @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
> #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
> #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
>
> +#define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +
> /* Secure Encrypted Virtualization command */
> enum sev_cmd_id {
> /* Guest initialization commands */
> --
> 2.17.1
>
On 3/29/20 11:22 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The ioctl can be used to retrieve page encryption bitmap for a given
> gfn range.
>
> Return the correct bitmap as per the number of pages being requested
> by the user. Ensure that we only copy bmap->num_pages bytes in the
> userspace buffer, if bmap->num_pages is not byte aligned we read
> the trailing bits from the userspace and copy those bits as is.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 27 +++++++++++++
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/svm.c | 71 +++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 12 ++++++
> include/uapi/linux/kvm.h | 12 ++++++
> 5 files changed, 124 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index ebd383fba939..8ad800ebb54f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
> the clear cpu reset definition in the POP. However, the cpu is not put
> into ESA mode. This reset is a superset of the initial reset.
>
> +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_GET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> + __u64 start_gfn;
> + __u64 num_pages;
> + union {
> + void __user *enc_bitmap; /* one bit per page */
> + __u64 padding2;
> + };
> +};
> +
> +The encrypted VMs have concept of private and shared pages. The private
> +page is encrypted with the guest-specific key, while shared page may
> +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> +be used to get the bitmap indicating whether the guest page is private
> +or shared. The bitmap can be used during the guest migration, if the page
> +is private then userspace need to use SEV migration commands to transmit
> +the page.
> +
>
> 5. The kvm_run structure
> ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 90718fa3db47..27e43e3ec9d8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> unsigned long sz, unsigned long mode);
> + int (*get_page_enc_bitmap)(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap);
Looking back at the previous patch, it seems that these two are
basically the setter/getter action for page encryption, though one is
implemented as a hypercall while the other as an ioctl. If we consider
the setter/getter aspect, isn't it better to have some sort of symmetry
in the naming of the ops ? For example,
set_page_enc_hc
get_page_enc_ioctl
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 1d8beaf1bceb..bae783cd396a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> return ret;
> }
>
> +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + unsigned long gfn_start, gfn_end;
> + unsigned long sz, i, sz_bytes;
> + unsigned long *bitmap;
> + int ret, n;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + gfn_start = bmap->start_gfn;
What if bmap->start_gfn is junk ?
> + gfn_end = gfn_start + bmap->num_pages;
> +
> + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> + bitmap = kmalloc(sz, GFP_KERNEL);
> + if (!bitmap)
> + return -ENOMEM;
> +
> + /* by default all pages are marked encrypted */
> + memset(bitmap, 0xff, sz);
> +
> + mutex_lock(&kvm->lock);
> + if (sev->page_enc_bmap) {
> + i = gfn_start;
> + for_each_clear_bit_from(i, sev->page_enc_bmap,
> + min(sev->page_enc_bmap_size, gfn_end))
> + clear_bit(i - gfn_start, bitmap);
> + }
> + mutex_unlock(&kvm->lock);
> +
> + ret = -EFAULT;
> +
> + n = bmap->num_pages % BITS_PER_BYTE;
> + sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> +
> + /*
> + * Return the correct bitmap as per the number of pages being
> + * requested by the user. Ensure that we only copy bmap->num_pages
> + * bytes in the userspace buffer, if bmap->num_pages is not byte
> + * aligned we read the trailing bits from the userspace and copy
> + * those bits as is.
> + */
> +
> + if (n) {
Is it better to check for 'num_pages' at the beginning of the function
rather than coming this far if bmap->num_pages is zero ?
> + unsigned char *bitmap_kernel = (unsigned char *)bitmap;
Just trying to understand why you need this extra variable instead of
using 'bitmap' directly.
> + unsigned char bitmap_user;
> + unsigned long offset, mask;
> +
> + offset = bmap->num_pages / BITS_PER_BYTE;
> + if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> + sizeof(unsigned char)))
> + goto out;
> +
> + mask = GENMASK(n - 1, 0);
> + bitmap_user &= ~mask;
> + bitmap_kernel[offset] &= mask;
> + bitmap_kernel[offset] |= bitmap_user;
> + }
> +
> + if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
If 'n' is zero, we are still copying stuff back to the user. Is that
what is expected from userland ?
Another point. Since copy_from_user() was done in the caller, isn't it
better to move this to the caller to keep a symmetry ?
> + goto out;
> +
> + ret = 0;
> +out:
> + kfree(bitmap);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>
> .page_enc_status_hc = svm_page_enc_status_hc,
> + .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 68428eef2dde..3c3fea4e20b5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> case KVM_SET_PMU_EVENT_FILTER:
> r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
> break;
> + case KVM_GET_PAGE_ENC_BITMAP: {
> + struct kvm_page_enc_bitmap bitmap;
> +
> + r = -EFAULT;
> + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> + goto out;
> +
> + r = -ENOTTY;
> + if (kvm_x86_ops->get_page_enc_bitmap)
> + r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> + break;
> + }
> default:
> r = -ENOTTY;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4e80c57a3182..db1ebf85e177 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -500,6 +500,16 @@ struct kvm_dirty_log {
> };
> };
>
> +/* for KVM_GET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> + __u64 start_gfn;
> + __u64 num_pages;
> + union {
> + void __user *enc_bitmap; /* one bit per page */
> + __u64 padding2;
> + };
> +};
> +
> /* for KVM_CLEAR_DIRTY_LOG */
> struct kvm_clear_dirty_log {
> __u32 slot;
> @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
> #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
> #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
>
> +#define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +
> /* Secure Encrypted Virtualization command */
> enum sev_cmd_id {
> /* Guest initialization commands */
On Fri, Apr 03, 2020 at 01:18:52PM -0700, Krish Sadhukhan wrote:
>
> On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > The ioctl can be used to retrieve page encryption bitmap for a given
> > gfn range.
> >
> > Return the correct bitmap as per the number of pages being requested
> > by the user. Ensure that we only copy bmap->num_pages bytes in the
> > userspace buffer, if bmap->num_pages is not byte aligned we read
> > the trailing bits from the userspace and copy those bits as is.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: "Radim Krčmář" <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > Documentation/virt/kvm/api.rst | 27 +++++++++++++
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/kvm/svm.c | 71 +++++++++++++++++++++++++++++++++
> > arch/x86/kvm/x86.c | 12 ++++++
> > include/uapi/linux/kvm.h | 12 ++++++
> > 5 files changed, 124 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index ebd383fba939..8ad800ebb54f 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
> > the clear cpu reset definition in the POP. However, the cpu is not put
> > into ESA mode. This reset is a superset of the initial reset.
> > +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> > +---------------------------------------
> > +
> > +:Capability: basic
> > +:Architectures: x86
> > +:Type: vm ioctl
> > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > +:Returns: 0 on success, -1 on error
> > +
> > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > + __u64 start_gfn;
> > + __u64 num_pages;
> > + union {
> > + void __user *enc_bitmap; /* one bit per page */
> > + __u64 padding2;
> > + };
> > +};
> > +
> > +The encrypted VMs have concept of private and shared pages. The private
> > +page is encrypted with the guest-specific key, while shared page may
> > +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> > +be used to get the bitmap indicating whether the guest page is private
> > +or shared. The bitmap can be used during the guest migration, if the page
> > +is private then userspace need to use SEV migration commands to transmit
> > +the page.
> > +
> > 5. The kvm_run structure
> > ========================
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 90718fa3db47..27e43e3ec9d8 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
> > int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > unsigned long sz, unsigned long mode);
> > + int (*get_page_enc_bitmap)(struct kvm *kvm,
> > + struct kvm_page_enc_bitmap *bmap);
>
>
> Looking back at the previous patch, it seems that these two are basically
> the setter/getter action for page encryption, though one is implemented as a
> hypercall while the other as an ioctl. If we consider the setter/getter
> aspect, isn't it better to have some sort of symmetry in the naming of the
> ops ? For example,
>
> set_page_enc_hc
>
> get_page_enc_ioctl
>
> > };
These are named as per their usage: page_enc_status_hc is a hypercall
used by the guest to update the page encryption bitmap, while the
others are ioctl interfaces used by Qemu (or a Qemu alternative) to
get/set the page encryption bitmap, so they are named accordingly.
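As a hedged illustration of that usage, a VMM could query the bitmap roughly as below. The struct layout, the KVMIO value 0xAE, and the ioctl number 0xc5 come from the uapi hunk in this series; the helper names, the vm_fd handling, and the error handling are assumptions for the sketch (Linux-only, since it relies on _IOW from <sys/ioctl.h>):

```c
/* Userspace sketch for KVM_GET_PAGE_ENC_BITMAP. The struct layout and
 * ioctl number mirror the uapi additions in this series; helper names
 * are illustrative, not part of the patch. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

struct kvm_page_enc_bitmap {
	uint64_t start_gfn;
	uint64_t num_pages;
	union {
		void *enc_bitmap;	/* one bit per page */
		uint64_t padding2;
	};
};

/* KVMIO is 0xAE; 0xc5 matches the number chosen in this patch. */
#define KVM_GET_PAGE_ENC_BITMAP \
	_IOW(0xAE, 0xc5, struct kvm_page_enc_bitmap)

/* Bytes userspace must allocate to hold num_pages bits, rounded up. */
size_t enc_bitmap_bytes(uint64_t num_pages)
{
	return (size_t)((num_pages + 7) / 8);
}

/* Ask KVM which guest pages in [start_gfn, start_gfn + num_pages) are
 * private; buf must hold enc_bitmap_bytes(num_pages) bytes. */
int get_page_enc_bitmap(int vm_fd, uint64_t start_gfn,
			uint64_t num_pages, unsigned char *buf)
{
	struct kvm_page_enc_bitmap bmap;

	memset(&bmap, 0, sizeof(bmap));
	bmap.start_gfn = start_gfn;
	bmap.num_pages = num_pages;
	bmap.enc_bitmap = buf;
	return ioctl(vm_fd, KVM_GET_PAGE_ENC_BITMAP, &bmap);
}
```

A migration loop would then walk the returned bits, routing set (private) pages through the SEV migration commands and clear (shared) pages through the normal transmission path.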
> > struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 1d8beaf1bceb..bae783cd396a 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > return ret;
> > }
> > +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > + struct kvm_page_enc_bitmap *bmap)
> > +{
> > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > + unsigned long gfn_start, gfn_end;
> > + unsigned long sz, i, sz_bytes;
> > + unsigned long *bitmap;
> > + int ret, n;
> > +
> > + if (!sev_guest(kvm))
> > + return -ENOTTY;
> > +
> > + gfn_start = bmap->start_gfn;
>
>
> What if bmap->start_gfn is junk ?
>
> > + gfn_end = gfn_start + bmap->num_pages;
> > +
> > + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> > + bitmap = kmalloc(sz, GFP_KERNEL);
> > + if (!bitmap)
> > + return -ENOMEM;
> > +
> > + /* by default all pages are marked encrypted */
> > + memset(bitmap, 0xff, sz);
> > +
> > + mutex_lock(&kvm->lock);
> > + if (sev->page_enc_bmap) {
> > + i = gfn_start;
> > + for_each_clear_bit_from(i, sev->page_enc_bmap,
> > + min(sev->page_enc_bmap_size, gfn_end))
> > + clear_bit(i - gfn_start, bitmap);
> > + }
> > + mutex_unlock(&kvm->lock);
> > +
> > + ret = -EFAULT;
> > +
> > + n = bmap->num_pages % BITS_PER_BYTE;
> > + sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> > +
> > + /*
> > + * Return the correct bitmap as per the number of pages being
> > + * requested by the user. Ensure that we only copy bmap->num_pages
> > + * bytes in the userspace buffer, if bmap->num_pages is not byte
> > + * aligned we read the trailing bits from the userspace and copy
> > + * those bits as is.
> > + */
> > +
> > + if (n) {
>
>
> Is it better to check for 'num_pages' at the beginning of the function
> rather than coming this far if bmap->num_pages is zero ?
>
This is not checking whether "num_pages" is zero; it is checking
whether bmap->num_pages is byte aligned, so that the trailing bits of
the last byte can be handled correctly.
> > + unsigned char *bitmap_kernel = (unsigned char *)bitmap;
>
>
> Just trying to understand why you need this extra variable instead of using
> 'bitmap' directly.
>
It makes the code more readable: the byte-wise accesses below operate
on an explicitly byte-typed view of the bitmap.
> > + unsigned char bitmap_user;
> > + unsigned long offset, mask;
> > +
> > + offset = bmap->num_pages / BITS_PER_BYTE;
> > + if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> > + sizeof(unsigned char)))
> > + goto out;
> > +
> > + mask = GENMASK(n - 1, 0);
> > + bitmap_user &= ~mask;
> > + bitmap_kernel[offset] &= mask;
> > + bitmap_kernel[offset] |= bitmap_user;
> > + }
> > +
> > + if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
>
>
> If 'n' is zero, we are still copying stuff back to the user. Is that what is
> expected from userland ?
>
> Another point. Since copy_from_user() was done in the caller, isn't it
> better to move this to the caller to keep a symmetry ?
>
As per the comments above, note that if n is non-zero, bmap->num_pages
is not byte aligned, so we read the trailing bits of the last byte
from userspace and copy those bits back as is. If n is zero,
bmap->num_pages is byte aligned and we simply copy all of the bytes back.
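That trailing-bit merge can be restated in a userspace-testable form as below. This is a hedged sketch: the GENMASK definition matches the kernel macro for these arguments, but the function name and the byte-at-a-time framing are illustrative, not code from the patch:

```c
/* Sketch of the trailing-bit handling: when num_pages is not a
 * multiple of 8, the bits of the final byte at or above num_pages
 * belong to userspace, so they are re-read from the user's copy and
 * merged back before the byte is written out. */
#include <stddef.h>

#define BITS_PER_BYTE 8
/* GENMASK(h, l): bits l..h set; equivalent to the kernel macro. */
#define GENMASK(h, l) \
	(((~0UL) << (l)) & \
	 (~0UL >> (BITS_PER_BYTE * sizeof(long) - 1 - (h))))

/* Merge the user's trailing bits into the kernel's final bitmap byte. */
unsigned char merge_trailing_bits(unsigned char kernel_byte,
				  unsigned char user_byte,
				  unsigned long num_pages)
{
	unsigned int n = num_pages % BITS_PER_BYTE;
	unsigned long mask;

	if (!n)			/* byte aligned: kernel byte used as is */
		return kernel_byte;

	mask = GENMASK(n - 1, 0);  /* low n bits are valid kernel data */
	return (kernel_byte & mask) | (user_byte & ~mask);
}
```

For example, with num_pages = 13 only the low 5 bits of the second byte carry kernel data; the top 3 bits are taken unchanged from the userspace buffer.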
Thanks,
Ashish
> > + goto out;
> > +
> > + ret = 0;
> > +out:
> > + kfree(bitmap);
> > + return ret;
> > +}
> > +
> > static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > {
> > struct kvm_sev_cmd sev_cmd;
> > @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > .page_enc_status_hc = svm_page_enc_status_hc,
> > + .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > };
> > static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 68428eef2dde..3c3fea4e20b5 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > case KVM_SET_PMU_EVENT_FILTER:
> > r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
> > break;
> > + case KVM_GET_PAGE_ENC_BITMAP: {
> > + struct kvm_page_enc_bitmap bitmap;
> > +
> > + r = -EFAULT;
> > + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> > + goto out;
> > +
> > + r = -ENOTTY;
> > + if (kvm_x86_ops->get_page_enc_bitmap)
> > + r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> > + break;
> > + }
> > default:
> > r = -ENOTTY;
> > }
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 4e80c57a3182..db1ebf85e177 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -500,6 +500,16 @@ struct kvm_dirty_log {
> > };
> > };
> > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > + __u64 start_gfn;
> > + __u64 num_pages;
> > + union {
> > + void __user *enc_bitmap; /* one bit per page */
> > + __u64 padding2;
> > + };
> > +};
> > +
> > /* for KVM_CLEAR_DIRTY_LOG */
> > struct kvm_clear_dirty_log {
> > __u32 slot;
> > @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
> > #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
> > #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
> > +#define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > +
> > /* Secure Encrypted Virtualization command */
> > enum sev_cmd_id {
> > /* Guest initialization commands */
On 2020-04-03 13:18:52 -0700, Krish Sadhukhan wrote:
>
> On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > The ioctl can be used to retrieve page encryption bitmap for a given
> > gfn range.
> >
> > Return the correct bitmap as per the number of pages being requested
> > by the user. Ensure that we only copy bmap->num_pages bytes in the
> > userspace buffer, if bmap->num_pages is not byte aligned we read
> > the trailing bits from the userspace and copy those bits as is.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: "Radim Krčmář" <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > Documentation/virt/kvm/api.rst | 27 +++++++++++++
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/kvm/svm.c | 71 +++++++++++++++++++++++++++++++++
> > arch/x86/kvm/x86.c | 12 ++++++
> > include/uapi/linux/kvm.h | 12 ++++++
> > 5 files changed, 124 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index ebd383fba939..8ad800ebb54f 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
> > the clear cpu reset definition in the POP. However, the cpu is not put
> > into ESA mode. This reset is a superset of the initial reset.
> > +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> > +---------------------------------------
> > +
> > +:Capability: basic
> > +:Architectures: x86
> > +:Type: vm ioctl
> > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > +:Returns: 0 on success, -1 on error
> > +
> > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > + __u64 start_gfn;
> > + __u64 num_pages;
> > + union {
> > + void __user *enc_bitmap; /* one bit per page */
> > + __u64 padding2;
> > + };
> > +};
> > +
> > +The encrypted VMs have concept of private and shared pages. The private
> > +page is encrypted with the guest-specific key, while shared page may
> > +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> > +be used to get the bitmap indicating whether the guest page is private
> > +or shared. The bitmap can be used during the guest migration, if the page
> > +is private then userspace need to use SEV migration commands to transmit
> > +the page.
> > +
> > 5. The kvm_run structure
> > ========================
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 90718fa3db47..27e43e3ec9d8 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
> > int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > unsigned long sz, unsigned long mode);
> > + int (*get_page_enc_bitmap)(struct kvm *kvm,
> > + struct kvm_page_enc_bitmap *bmap);
>
>
> Looking back at the previous patch, it seems that these two are basically
> the setter/getter action for page encryption, though one is implemented as a
> hypercall while the other as an ioctl. If we consider the setter/getter
> aspect, isn't it better to have some sort of symmetry in the naming of the
> ops ? For example,
>
> set_page_enc_hc
>
> get_page_enc_ioctl
>
> > };
> > struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 1d8beaf1bceb..bae783cd396a 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > return ret;
> > }
> > +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > + struct kvm_page_enc_bitmap *bmap)
> > +{
> > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > + unsigned long gfn_start, gfn_end;
> > + unsigned long sz, i, sz_bytes;
> > + unsigned long *bitmap;
> > + int ret, n;
> > +
> > + if (!sev_guest(kvm))
> > + return -ENOTTY;
> > +
> > + gfn_start = bmap->start_gfn;
>
>
> What if bmap->start_gfn is junk ?
>
> > + gfn_end = gfn_start + bmap->num_pages;
> > +
> > + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> > + bitmap = kmalloc(sz, GFP_KERNEL);
> > + if (!bitmap)
> > + return -ENOMEM;
> > +
> > + /* by default all pages are marked encrypted */
> > + memset(bitmap, 0xff, sz);
> > +
> > + mutex_lock(&kvm->lock);
> > + if (sev->page_enc_bmap) {
> > + i = gfn_start;
> > + for_each_clear_bit_from(i, sev->page_enc_bmap,
> > + min(sev->page_enc_bmap_size, gfn_end))
> > + clear_bit(i - gfn_start, bitmap);
> > + }
> > + mutex_unlock(&kvm->lock);
> > +
> > + ret = -EFAULT;
> > +
> > + n = bmap->num_pages % BITS_PER_BYTE;
> > + sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> > +
> > + /*
> > + * Return the correct bitmap as per the number of pages being
> > + * requested by the user. Ensure that we only copy bmap->num_pages
> > + * bytes in the userspace buffer, if bmap->num_pages is not byte
> > + * aligned we read the trailing bits from the userspace and copy
> > + * those bits as is.
> > + */
> > +
> > + if (n) {
>
>
> Is it better to check for 'num_pages' at the beginning of the function
> rather than coming this far if bmap->num_pages is zero ?
>
> > + unsigned char *bitmap_kernel = (unsigned char *)bitmap;
>
>
> Just trying to understand why you need this extra variable instead of using
> 'bitmap' directly.
>
> > + unsigned char bitmap_user;
> > + unsigned long offset, mask;
> > +
> > + offset = bmap->num_pages / BITS_PER_BYTE;
> > + if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> > + sizeof(unsigned char)))
> > + goto out;
> > +
> > + mask = GENMASK(n - 1, 0);
> > + bitmap_user &= ~mask;
> > + bitmap_kernel[offset] &= mask;
> > + bitmap_kernel[offset] |= bitmap_user;
> > + }
> > +
> > + if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
>
>
> If 'n' is zero, we are still copying stuff back to the user. Is that what is
> expected from userland ?
>
> Another point. Since copy_from_user() was done in the caller, isn't it
> better to move this to the caller to keep a symmetry ?
That would require changing the .get_page_enc_bitmap interface to pass
the local bitmap back to the caller, which would then have to do the
copy_to_user() and free the bitmap itself. I think it is better to
call copy_to_user() here and free the bitmap before returning.
>
> > + goto out;
> > +
> > + ret = 0;
> > +out:
> > + kfree(bitmap);
> > + return ret;
> > +}
> > +
> > static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > {
> > struct kvm_sev_cmd sev_cmd;
> > @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > .page_enc_status_hc = svm_page_enc_status_hc,
> > + .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > };
> > static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 68428eef2dde..3c3fea4e20b5 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > case KVM_SET_PMU_EVENT_FILTER:
> > r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
> > break;
> > + case KVM_GET_PAGE_ENC_BITMAP: {
> > + struct kvm_page_enc_bitmap bitmap;
> > +
> > + r = -EFAULT;
> > + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> > + goto out;
> > +
> > + r = -ENOTTY;
> > + if (kvm_x86_ops->get_page_enc_bitmap)
> > + r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> > + break;
> > + }
> > default:
> > r = -ENOTTY;
> > }
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 4e80c57a3182..db1ebf85e177 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -500,6 +500,16 @@ struct kvm_dirty_log {
> > };
> > };
> > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > + __u64 start_gfn;
> > + __u64 num_pages;
> > + union {
> > + void __user *enc_bitmap; /* one bit per page */
> > + __u64 padding2;
> > + };
> > +};
> > +
> > /* for KVM_CLEAR_DIRTY_LOG */
> > struct kvm_clear_dirty_log {
> > __u32 slot;
> > @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
> > #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
> > #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
> > +#define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > +
> > /* Secure Encrypted Virtualization command */
> > enum sev_cmd_id {
> > /* Guest initialization commands */
On Fri, Apr 03, 2020 at 03:55:07PM -0500, Venu Busireddy wrote:
> On 2020-04-03 13:18:52 -0700, Krish Sadhukhan wrote:
> >
> > On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > > From: Brijesh Singh <[email protected]>
> > >
> > > The ioctl can be used to retrieve page encryption bitmap for a given
> > > gfn range.
> > >
> > > Return the correct bitmap as per the number of pages being requested
> > > by the user. Ensure that we only copy bmap->num_pages bytes in the
> > > userspace buffer, if bmap->num_pages is not byte aligned we read
> > > the trailing bits from the userspace and copy those bits as is.
> > >
> > > Cc: Thomas Gleixner <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: "H. Peter Anvin" <[email protected]>
> > > Cc: Paolo Bonzini <[email protected]>
> > > Cc: "Radim Krčmář" <[email protected]>
> > > Cc: Joerg Roedel <[email protected]>
> > > Cc: Borislav Petkov <[email protected]>
> > > Cc: Tom Lendacky <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Signed-off-by: Brijesh Singh <[email protected]>
> > > Signed-off-by: Ashish Kalra <[email protected]>
> > > ---
> > > Documentation/virt/kvm/api.rst | 27 +++++++++++++
> > > arch/x86/include/asm/kvm_host.h | 2 +
> > > arch/x86/kvm/svm.c | 71 +++++++++++++++++++++++++++++++++
> > > arch/x86/kvm/x86.c | 12 ++++++
> > > include/uapi/linux/kvm.h | 12 ++++++
> > > 5 files changed, 124 insertions(+)
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index ebd383fba939..8ad800ebb54f 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
> > > the clear cpu reset definition in the POP. However, the cpu is not put
> > > into ESA mode. This reset is a superset of the initial reset.
> > > +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> > > +---------------------------------------
> > > +
> > > +:Capability: basic
> > > +:Architectures: x86
> > > +:Type: vm ioctl
> > > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > > +:Returns: 0 on success, -1 on error
> > > +
> > > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > > +struct kvm_page_enc_bitmap {
> > > + __u64 start_gfn;
> > > + __u64 num_pages;
> > > + union {
> > > + void __user *enc_bitmap; /* one bit per page */
> > > + __u64 padding2;
> > > + };
> > > +};
> > > +
> > > +The encrypted VMs have concept of private and shared pages. The private
> > > +page is encrypted with the guest-specific key, while shared page may
> > > +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> > > +be used to get the bitmap indicating whether the guest page is private
> > > +or shared. The bitmap can be used during the guest migration, if the page
> > > +is private then userspace need to use SEV migration commands to transmit
> > > +the page.
> > > +
> > > 5. The kvm_run structure
> > > ========================
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 90718fa3db47..27e43e3ec9d8 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
> > > int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > > int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > > unsigned long sz, unsigned long mode);
> > > + int (*get_page_enc_bitmap)(struct kvm *kvm,
> > > + struct kvm_page_enc_bitmap *bmap);
> >
> >
> > Looking back at the previous patch, it seems that these two are basically
> > the setter/getter action for page encryption, though one is implemented as a
> > hypercall while the other as an ioctl. If we consider the setter/getter
> > aspect, isn't it better to have some sort of symmetry in the naming of the
> > ops ? For example,
> >
> > set_page_enc_hc
> >
> > get_page_enc_ioctl
> >
> > > };
> > > struct kvm_arch_async_pf {
> > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > index 1d8beaf1bceb..bae783cd396a 100644
> > > --- a/arch/x86/kvm/svm.c
> > > +++ b/arch/x86/kvm/svm.c
> > > @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > return ret;
> > > }
> > > +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > > + struct kvm_page_enc_bitmap *bmap)
> > > +{
> > > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > + unsigned long gfn_start, gfn_end;
> > > + unsigned long sz, i, sz_bytes;
> > > + unsigned long *bitmap;
> > > + int ret, n;
> > > +
> > > + if (!sev_guest(kvm))
> > > + return -ENOTTY;
> > > +
> > > + gfn_start = bmap->start_gfn;
> >
> >
> > What if bmap->start_gfn is junk ?
> >
> > > + gfn_end = gfn_start + bmap->num_pages;
> > > +
> > > + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> > > + bitmap = kmalloc(sz, GFP_KERNEL);
> > > + if (!bitmap)
> > > + return -ENOMEM;
> > > +
> > > + /* by default all pages are marked encrypted */
> > > + memset(bitmap, 0xff, sz);
> > > +
> > > + mutex_lock(&kvm->lock);
> > > + if (sev->page_enc_bmap) {
> > > + i = gfn_start;
> > > + for_each_clear_bit_from(i, sev->page_enc_bmap,
> > > + min(sev->page_enc_bmap_size, gfn_end))
> > > + clear_bit(i - gfn_start, bitmap);
> > > + }
> > > + mutex_unlock(&kvm->lock);
> > > +
> > > + ret = -EFAULT;
> > > +
> > > + n = bmap->num_pages % BITS_PER_BYTE;
> > > + sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> > > +
> > > + /*
> > > + * Return the correct bitmap as per the number of pages being
> > > + * requested by the user. Ensure that we only copy bmap->num_pages
> > > + * bytes in the userspace buffer, if bmap->num_pages is not byte
> > > + * aligned we read the trailing bits from the userspace and copy
> > > + * those bits as is.
> > > + */
> > > +
> > > + if (n) {
> >
> >
> > Is it better to check for 'num_pages' at the beginning of the function
> > rather than coming this far if bmap->num_pages is zero ?
> >
> > > + unsigned char *bitmap_kernel = (unsigned char *)bitmap;
> >
> >
> > Just trying to understand why you need this extra variable instead of using
> > 'bitmap' directly.
> >
> > > + unsigned char bitmap_user;
> > > + unsigned long offset, mask;
> > > +
> > > + offset = bmap->num_pages / BITS_PER_BYTE;
> > > + if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> > > + sizeof(unsigned char)))
> > > + goto out;
> > > +
> > > + mask = GENMASK(n - 1, 0);
> > > + bitmap_user &= ~mask;
> > > + bitmap_kernel[offset] &= mask;
> > > + bitmap_kernel[offset] |= bitmap_user;
> > > + }
> > > +
> > > + if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
> >
> >
> > If 'n' is zero, we are still copying stuff back to the user. Is that what is
> > expected from userland ?
> >
> > Another point. Since copy_from_user() was done in the caller, isn't it
> > better to move this to the caller to keep a symmetry ?
>
> That would need the interface of .get_page_enc_bitmap to change, to pass
> back the local bitmap to the caller for use in copy_to_user() and then
> free it up. I think it is better to call copy_to_user() here and free
> the bitmap before returning.
>
As I replied in my earlier response to this patch, please note that,
as per the comments above, here we check whether bmap->num_pages is
byte aligned; if it is not, we read the trailing bits from userspace
and copy those bits as is.
Thanks,
Ashish
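For reference, the trailing-byte handling being discussed can be sketched as a small userspace-style helper. GENMASK8 below is a hypothetical 8-bit stand-in for the kernel's GENMASK, and the function name is illustrative, not from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit GENMASK: bits h..l set, mirroring the kernel macro. */
#define GENMASK8(h, l) ((0xffU << (l)) & (0xffU >> (7 - (h))))

/*
 * Sketch of what svm_get_page_enc_bitmap() does for the last partial
 * byte when num_pages is not byte aligned: the low n valid bits come
 * from the kernel's bitmap, while the remaining high bits are taken
 * unchanged from the byte already in the userspace buffer.
 */
static uint8_t merge_trailing_byte(uint8_t kernel_byte, uint8_t user_byte,
                                   unsigned int n)
{
        uint8_t mask = GENMASK8(n - 1, 0);

        user_byte &= ~mask;     /* preserve the user's high bits  */
        kernel_byte &= mask;    /* keep the kernel's low n bits   */
        return kernel_byte | user_byte;
}
```

With n == 5, for example, the low five bits of the result come from the kernel and the top three from the user's buffer, which is exactly why the ioctl reads the trailing byte back from userspace first.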
> >
> > > + goto out;
> > > +
> > > + ret = 0;
> > > +out:
> > > + kfree(bitmap);
> > > + return ret;
> > > +}
> > > +
> > > static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > {
> > > struct kvm_sev_cmd sev_cmd;
> > > @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > > .page_enc_status_hc = svm_page_enc_status_hc,
> > > + .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > };
> > > static int __init svm_init(void)
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 68428eef2dde..3c3fea4e20b5 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > case KVM_SET_PMU_EVENT_FILTER:
> > > r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
> > > break;
> > > + case KVM_GET_PAGE_ENC_BITMAP: {
> > > + struct kvm_page_enc_bitmap bitmap;
> > > +
> > > + r = -EFAULT;
> > > + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> > > + goto out;
> > > +
> > > + r = -ENOTTY;
> > > + if (kvm_x86_ops->get_page_enc_bitmap)
> > > + r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> > > + break;
> > > + }
> > > default:
> > > r = -ENOTTY;
> > > }
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 4e80c57a3182..db1ebf85e177 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -500,6 +500,16 @@ struct kvm_dirty_log {
> > > };
> > > };
> > > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > > +struct kvm_page_enc_bitmap {
> > > + __u64 start_gfn;
> > > + __u64 num_pages;
> > > + union {
> > > + void __user *enc_bitmap; /* one bit per page */
> > > + __u64 padding2;
> > > + };
> > > +};
> > > +
> > > /* for KVM_CLEAR_DIRTY_LOG */
> > > struct kvm_clear_dirty_log {
> > > __u32 slot;
> > > @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
> > > #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
> > > #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
> > > +#define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > +
> > > /* Secure Encrypted Virtualization command */
> > > enum sev_cmd_id {
> > > /* Guest initialization commands */
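As a side note on the GET path above, the two size computations (the allocation size sz and the copy-out size sz_bytes) can be sketched in isolation. The helper names here are mine, not the patch's:

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_LONG 64
#define BITS_PER_BYTE 8
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Bytes kmalloc'd for the temporary bitmap in svm_get_page_enc_bitmap():
 * num_pages bits rounded up to a whole number of longs. */
static uint64_t bitmap_alloc_bytes(uint64_t num_pages)
{
        return ALIGN_UP(num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
}

/* Bytes copied back to userspace (sz_bytes in the patch): num_pages
 * bits rounded up to a whole number of bytes. */
static uint64_t bitmap_copy_bytes(uint64_t num_pages)
{
        return ALIGN_UP(num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
}
```

So a request for 9 pages allocates 8 bytes but copies only 2 back, and the unused bits of the final byte are patched with the user's original contents as discussed in the thread.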
On 3/29/20 11:22 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> Invoke a hypercall when a memory region is changed from encrypted ->
> decrypted and vice versa. Hypervisor need to know the page encryption
> status during the guest migration.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/include/asm/paravirt.h | 10 +++++
> arch/x86/include/asm/paravirt_types.h | 2 +
> arch/x86/kernel/paravirt.c | 1 +
> arch/x86/mm/mem_encrypt.c | 57 ++++++++++++++++++++++++++-
> arch/x86/mm/pat/set_memory.c | 7 ++++
> 5 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 694d8daf4983..8127b9c141bf 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -78,6 +78,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
> PVOP_VCALL1(mmu.exit_mmap, mm);
> }
>
> +static inline void page_encryption_changed(unsigned long vaddr, int npages,
> + bool enc)
> +{
> + PVOP_VCALL3(mmu.page_encryption_changed, vaddr, npages, enc);
> +}
> +
> #ifdef CONFIG_PARAVIRT_XXL
> static inline void load_sp0(unsigned long sp0)
> {
> @@ -946,6 +952,10 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
> static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
> {
> }
> +
> +static inline void page_encryption_changed(unsigned long vaddr, int npages, bool enc)
> +{
> +}
> #endif
> #endif /* __ASSEMBLY__ */
> #endif /* _ASM_X86_PARAVIRT_H */
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 732f62e04ddb..03bfd515c59c 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -215,6 +215,8 @@ struct pv_mmu_ops {
>
> /* Hook for intercepting the destruction of an mm_struct. */
> void (*exit_mmap)(struct mm_struct *mm);
> + void (*page_encryption_changed)(unsigned long vaddr, int npages,
> + bool enc);
>
> #ifdef CONFIG_PARAVIRT_XXL
> struct paravirt_callee_save read_cr2;
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index c131ba4e70ef..840c02b23aeb 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -367,6 +367,7 @@ struct paravirt_patch_template pv_ops = {
> (void (*)(struct mmu_gather *, void *))tlb_remove_page,
>
> .mmu.exit_mmap = paravirt_nop,
> + .mmu.page_encryption_changed = paravirt_nop,
>
> #ifdef CONFIG_PARAVIRT_XXL
> .mmu.read_cr2 = __PV_IS_CALLEE_SAVE(native_read_cr2),
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index f4bd4b431ba1..c9800fa811f6 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -19,6 +19,7 @@
> #include <linux/kernel.h>
> #include <linux/bitops.h>
> #include <linux/dma-mapping.h>
> +#include <linux/kvm_para.h>
>
> #include <asm/tlbflush.h>
> #include <asm/fixmap.h>
> @@ -29,6 +30,7 @@
> #include <asm/processor-flags.h>
> #include <asm/msr.h>
> #include <asm/cmdline.h>
> +#include <asm/kvm_para.h>
>
> #include "mm_internal.h"
>
> @@ -196,6 +198,47 @@ void __init sme_early_init(void)
> swiotlb_force = SWIOTLB_FORCE;
> }
>
> +static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
> + bool enc)
> +{
> + unsigned long sz = npages << PAGE_SHIFT;
> + unsigned long vaddr_end, vaddr_next;
> +
> + vaddr_end = vaddr + sz;
> +
> + for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> + int psize, pmask, level;
> + unsigned long pfn;
> + pte_t *kpte;
> +
> + kpte = lookup_address(vaddr, &level);
> + if (!kpte || pte_none(*kpte))
> + return;
> +
> + switch (level) {
> + case PG_LEVEL_4K:
> + pfn = pte_pfn(*kpte);
> + break;
> + case PG_LEVEL_2M:
> + pfn = pmd_pfn(*(pmd_t *)kpte);
> + break;
> + case PG_LEVEL_1G:
> + pfn = pud_pfn(*(pud_t *)kpte);
> + break;
> + default:
> + return;
> + }
Is it possible to re-use the code in __set_clr_pte_enc()?
> +
> + psize = page_level_size(level);
> + pmask = page_level_mask(level);
> +
> + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> + pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
> +
> + vaddr_next = (vaddr & pmask) + psize;
> + }
> +}
> +
> static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> {
> pgprot_t old_prot, new_prot;
> @@ -253,12 +296,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> static int __init early_set_memory_enc_dec(unsigned long vaddr,
> unsigned long size, bool enc)
> {
> - unsigned long vaddr_end, vaddr_next;
> + unsigned long vaddr_end, vaddr_next, start;
> unsigned long psize, pmask;
> int split_page_size_mask;
> int level, ret;
> pte_t *kpte;
>
> + start = vaddr;
> vaddr_next = vaddr;
> vaddr_end = vaddr + size;
>
> @@ -313,6 +357,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
>
> ret = 0;
>
> + set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> + enc);
If I haven't missed anything, it seems early_set_memory_encrypted()
doesn't have a caller. So is there a possibility that we can end up
calling it in a non-SEV context, and hence do we need the
sev_active() guard here?
> out:
> __flush_tlb_all();
> return ret;
> @@ -451,6 +497,15 @@ void __init mem_encrypt_init(void)
> if (sev_active())
> static_branch_enable(&sev_enable_key);
>
> +#ifdef CONFIG_PARAVIRT
> + /*
> + * With SEV, we need to make a hypercall when page encryption state is
> + * changed.
> + */
> + if (sev_active())
> + pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> +#endif
> +
> pr_info("AMD %s active\n",
> sev_active() ? "Secure Encrypted Virtualization (SEV)"
> : "Secure Memory Encryption (SME)");
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index c4aedd00c1ba..86b7804129fc 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -26,6 +26,7 @@
> #include <asm/proto.h>
> #include <asm/memtype.h>
> #include <asm/set_memory.h>
> +#include <asm/paravirt.h>
>
> #include "../mm_internal.h"
>
> @@ -1987,6 +1988,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> */
> cpa_flush(&cpa, 0);
>
> + /* Notify hypervisor that a given memory range is mapped encrypted
> + * or decrypted. The hypervisor will use this information during the
> + * VM migration.
> + */
> + page_encryption_changed(addr, numpages, enc);
> +
> return ret;
> }
>
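The walk in set_memory_enc_dec_hypercall() above advances by the size of whatever mapping backs the current address, so one hypercall covers a whole large page. A minimal sketch of that stepping, with illustrative level constants and names:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mapping sizes for 4K/2M/1G, mirroring page_level_size(). */
enum { LVL_4K = 1, LVL_2M = 2, LVL_1G = 3 };

static uint64_t level_size(int level)
{
        switch (level) {
        case LVL_4K: return 1ULL << 12;
        case LVL_2M: return 1ULL << 21;
        case LVL_1G: return 1ULL << 30;
        default:     return 0;
        }
}

/*
 * Sketch of vaddr_next = (vaddr & pmask) + psize from the patch:
 * round vaddr down to the start of the current mapping, then step
 * over the whole mapping.
 */
static uint64_t next_vaddr(uint64_t vaddr, int level)
{
        uint64_t psize = level_size(level);
        uint64_t pmask = ~(psize - 1);

        return (vaddr & pmask) + psize;
}
```

An address in the middle of a 2M mapping therefore jumps straight to the next 2M boundary rather than iterating 4K at a time.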
On 3/29/20 11:22 PM, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The ioctl can be used to set page encryption bitmap for an
> incoming guest.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 22 +++++++++++++++++
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/svm.c | 42 +++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 12 ++++++++++
> include/uapi/linux/kvm.h | 1 +
> 5 files changed, 79 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8ad800ebb54f..4d1004a154f6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> is private then userspace need to use SEV migration commands to transmit
> the page.
>
> +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_SET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> + __u64 start_gfn;
> + __u64 num_pages;
> + union {
> + void __user *enc_bitmap; /* one bit per page */
> + __u64 padding2;
> + };
> +};
> +
> +During the guest live migration the outgoing guest exports its page encryption
> +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> +bitmap for an incoming guest.
>
> 5. The kvm_run structure
> ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 27e43e3ec9d8..d30f770aaaea 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> unsigned long sz, unsigned long mode);
> int (*get_page_enc_bitmap)(struct kvm *kvm,
> struct kvm_page_enc_bitmap *bmap);
> + int (*set_page_enc_bitmap)(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index bae783cd396a..313343a43045 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> return ret;
> }
>
> +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + unsigned long gfn_start, gfn_end;
> + unsigned long *bitmap;
> + unsigned long sz, i;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + gfn_start = bmap->start_gfn;
> + gfn_end = gfn_start + bmap->num_pages;
Same comment as the previous one. Do we continue if num_pages is zero?
> +
> + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> + bitmap = kmalloc(sz, GFP_KERNEL);
> + if (!bitmap)
> + return -ENOMEM;
> +
> + ret = -EFAULT;
> + if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> + goto out;
> +
> + mutex_lock(&kvm->lock);
> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> + if (ret)
> + goto unlock;
> +
> + i = gfn_start;
> + for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> + clear_bit(i + gfn_start, sev->page_enc_bmap);
> +
> + ret = 0;
> +unlock:
> + mutex_unlock(&kvm->lock);
> +out:
> + kfree(bitmap);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>
> .page_enc_status_hc = svm_page_enc_status_hc,
> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> + .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3c3fea4e20b5..05e953b2ec61 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> break;
> }
> + case KVM_SET_PAGE_ENC_BITMAP: {
> + struct kvm_page_enc_bitmap bitmap;
> +
> + r = -EFAULT;
> + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> + goto out;
> +
> + r = -ENOTTY;
> + if (kvm_x86_ops->set_page_enc_bitmap)
> + r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> + break;
> + }
> default:
> r = -ENOTTY;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index db1ebf85e177..b4b01d47e568 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1489,6 +1489,7 @@ struct kvm_enc_region {
> #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
>
> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +#define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
>
> /* Secure Encrypted Virtualization command */
> enum sev_cmd_id {
Reviewed-by: Krish Sadhukhan <[email protected]>
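The merge performed by svm_set_page_enc_bitmap(), where clear bits in the incoming bitmap mark shared pages, can be sketched with plain bit operations. The straightforward 0-based indexing here is illustrative and not a literal copy of the patch's loop:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the merge in svm_set_page_enc_bitmap(): every bit that is
 * clear in the incoming (userspace) bitmap marks a shared page, and
 * the corresponding bit at start_gfn + i is cleared in the
 * destination bitmap, which defaults to all-encrypted (all ones).
 */
static void merge_enc_bitmap(uint64_t *dst, const uint64_t *src,
                             uint64_t start_gfn, uint64_t num_pages)
{
        for (uint64_t i = 0; i < num_pages; i++) {
                if (!(src[i / 64] & (1ULL << (i % 64))))
                        dst[(start_gfn + i) / 64] &=
                                ~(1ULL << ((start_gfn + i) % 64));
        }
}
```

This is the inverse of the GET path: the incoming guest starts with everything marked encrypted and only the pages the outgoing guest reported as shared are cleared.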
On 3/29/20 11:23 PM, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> This ioctl can be used by the application to reset the page
> encryption bitmap managed by the KVM driver. A typical usage
> for this ioctl is on VM reboot, on reboot, we must reinitialize
> the bitmap.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> arch/x86/kvm/x86.c | 6 ++++++
> include/uapi/linux/kvm.h | 1 +
> 5 files changed, 37 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4d1004a154f6..a11326ccc51d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> bitmap for an incoming guest.
>
> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> +-----------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: none
> +:Returns: 0 on success, -1 on error
> +
> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> +
> +
> 5. The kvm_run structure
> ========================
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d30f770aaaea..a96ef6338cd2 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> struct kvm_page_enc_bitmap *bmap);
> int (*set_page_enc_bitmap)(struct kvm *kvm,
> struct kvm_page_enc_bitmap *bmap);
> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 313343a43045..c99b0207a443 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> return ret;
> }
>
> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + mutex_lock(&kvm->lock);
> + /* by default all pages should be marked encrypted */
> + if (sev->page_enc_bmap_size)
> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> + mutex_unlock(&kvm->lock);
> + return 0;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .page_enc_status_hc = svm_page_enc_status_hc,
> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
Don't we need to initialize the Intel ops to NULL? They aren't
initialized in the previous patch either.
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 05e953b2ec61..2127ed937f53 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> break;
> }
> + case KVM_PAGE_ENC_BITMAP_RESET: {
> + r = -ENOTTY;
> + if (kvm_x86_ops->reset_page_enc_bitmap)
> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> + break;
> + }
> default:
> r = -ENOTTY;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b4b01d47e568..0884a581fc37 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
>
> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
>
> /* Secure Encrypted Virtualization command */
> enum sev_cmd_id {
Reviewed-by: Krish Sadhukhan <[email protected]>
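On reset the whole bitmap goes back to the default all-encrypted state. A userspace-style sketch of that semantic; the kernel uses bitmap_fill(), which operates on bits, so the byte-granular memset here is a simplification:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of svm_reset_page_enc_bitmap(): after KVM_PAGE_ENC_BITMAP_RESET
 * every page reads back as encrypted again. Setting whole longs with
 * memset gives the same result as bitmap_fill() for the bits in range.
 */
static void reset_enc_bitmap(uint64_t *bmap, uint64_t nbits)
{
        memset(bmap, 0xff, ((nbits + 63) / 64) * sizeof(uint64_t));
}
```

This matches the reboot case described in the commit message: the guest starts over with every page assumed private until it reports otherwise.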
On Fri, Apr 03, 2020 at 02:07:02PM -0700, Krish Sadhukhan wrote:
>
> On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > Invoke a hypercall when a memory region is changed from encrypted ->
> > decrypted and vice versa. Hypervisor need to know the page encryption
> > status during the guest migration.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: "Radim Krčmář" <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > arch/x86/include/asm/paravirt.h | 10 +++++
> > arch/x86/include/asm/paravirt_types.h | 2 +
> > arch/x86/kernel/paravirt.c | 1 +
> > arch/x86/mm/mem_encrypt.c | 57 ++++++++++++++++++++++++++-
> > arch/x86/mm/pat/set_memory.c | 7 ++++
> > 5 files changed, 76 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> > index 694d8daf4983..8127b9c141bf 100644
> > --- a/arch/x86/include/asm/paravirt.h
> > +++ b/arch/x86/include/asm/paravirt.h
> > @@ -78,6 +78,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
> > PVOP_VCALL1(mmu.exit_mmap, mm);
> > }
> > +static inline void page_encryption_changed(unsigned long vaddr, int npages,
> > + bool enc)
> > +{
> > + PVOP_VCALL3(mmu.page_encryption_changed, vaddr, npages, enc);
> > +}
> > +
> > #ifdef CONFIG_PARAVIRT_XXL
> > static inline void load_sp0(unsigned long sp0)
> > {
> > @@ -946,6 +952,10 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
> > static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
> > {
> > }
> > +
> > +static inline void page_encryption_changed(unsigned long vaddr, int npages, bool enc)
> > +{
> > +}
> > #endif
> > #endif /* __ASSEMBLY__ */
> > #endif /* _ASM_X86_PARAVIRT_H */
> > diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> > index 732f62e04ddb..03bfd515c59c 100644
> > --- a/arch/x86/include/asm/paravirt_types.h
> > +++ b/arch/x86/include/asm/paravirt_types.h
> > @@ -215,6 +215,8 @@ struct pv_mmu_ops {
> > /* Hook for intercepting the destruction of an mm_struct. */
> > void (*exit_mmap)(struct mm_struct *mm);
> > + void (*page_encryption_changed)(unsigned long vaddr, int npages,
> > + bool enc);
> > #ifdef CONFIG_PARAVIRT_XXL
> > struct paravirt_callee_save read_cr2;
> > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> > index c131ba4e70ef..840c02b23aeb 100644
> > --- a/arch/x86/kernel/paravirt.c
> > +++ b/arch/x86/kernel/paravirt.c
> > @@ -367,6 +367,7 @@ struct paravirt_patch_template pv_ops = {
> > (void (*)(struct mmu_gather *, void *))tlb_remove_page,
> > .mmu.exit_mmap = paravirt_nop,
> > + .mmu.page_encryption_changed = paravirt_nop,
> > #ifdef CONFIG_PARAVIRT_XXL
> > .mmu.read_cr2 = __PV_IS_CALLEE_SAVE(native_read_cr2),
> > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > index f4bd4b431ba1..c9800fa811f6 100644
> > --- a/arch/x86/mm/mem_encrypt.c
> > +++ b/arch/x86/mm/mem_encrypt.c
> > @@ -19,6 +19,7 @@
> > #include <linux/kernel.h>
> > #include <linux/bitops.h>
> > #include <linux/dma-mapping.h>
> > +#include <linux/kvm_para.h>
> > #include <asm/tlbflush.h>
> > #include <asm/fixmap.h>
> > @@ -29,6 +30,7 @@
> > #include <asm/processor-flags.h>
> > #include <asm/msr.h>
> > #include <asm/cmdline.h>
> > +#include <asm/kvm_para.h>
> > #include "mm_internal.h"
> > @@ -196,6 +198,47 @@ void __init sme_early_init(void)
> > swiotlb_force = SWIOTLB_FORCE;
> > }
> > +static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
> > + bool enc)
> > +{
> > + unsigned long sz = npages << PAGE_SHIFT;
> > + unsigned long vaddr_end, vaddr_next;
> > +
> > + vaddr_end = vaddr + sz;
> > +
> > + for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> > + int psize, pmask, level;
> > + unsigned long pfn;
> > + pte_t *kpte;
> > +
> > + kpte = lookup_address(vaddr, &level);
> > + if (!kpte || pte_none(*kpte))
> > + return;
> > +
> > + switch (level) {
> > + case PG_LEVEL_4K:
> > + pfn = pte_pfn(*kpte);
> > + break;
> > + case PG_LEVEL_2M:
> > + pfn = pmd_pfn(*(pmd_t *)kpte);
> > + break;
> > + case PG_LEVEL_1G:
> > + pfn = pud_pfn(*(pud_t *)kpte);
> > + break;
> > + default:
> > + return;
> > + }
>
>
> Is it possible to re-use the code in __set_clr_pte_enc() ?
>
> > +
> > + psize = page_level_size(level);
> > + pmask = page_level_mask(level);
> > +
> > + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> > + pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
> > +
> > + vaddr_next = (vaddr & pmask) + psize;
> > + }
> > +}
> > +
> > static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> > {
> > pgprot_t old_prot, new_prot;
> > @@ -253,12 +296,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> > static int __init early_set_memory_enc_dec(unsigned long vaddr,
> > unsigned long size, bool enc)
> > {
> > - unsigned long vaddr_end, vaddr_next;
> > + unsigned long vaddr_end, vaddr_next, start;
> > unsigned long psize, pmask;
> > int split_page_size_mask;
> > int level, ret;
> > pte_t *kpte;
> > + start = vaddr;
> > vaddr_next = vaddr;
> > vaddr_end = vaddr + size;
> > @@ -313,6 +357,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
> > ret = 0;
> > + set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> > + enc);
>
>
> If I haven't missed anything, it seems early_set_memory_encrypted() doesn't
> have a caller. So is there a possibility that we can end up calling it in
> non-SEV context and hence do we need to have the sev_active() guard here ?
>
As of now early_set_memory_encrypted() has no callers, but
early_set_memory_decrypted() is used in __set_percpu_decrypted(),
which is called under the sev_active() check.
Thanks,
Ashish
> > out:
> > __flush_tlb_all();
> > return ret;
> > @@ -451,6 +497,15 @@ void __init mem_encrypt_init(void)
> > if (sev_active())
> > static_branch_enable(&sev_enable_key);
> > +#ifdef CONFIG_PARAVIRT
> > + /*
> > + * With SEV, we need to make a hypercall when page encryption state is
> > + * changed.
> > + */
> > + if (sev_active())
> > + pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> > +#endif
> > +
> > pr_info("AMD %s active\n",
> > sev_active() ? "Secure Encrypted Virtualization (SEV)"
> > : "Secure Memory Encryption (SME)");
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index c4aedd00c1ba..86b7804129fc 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -26,6 +26,7 @@
> > #include <asm/proto.h>
> > #include <asm/memtype.h>
> > #include <asm/set_memory.h>
> > +#include <asm/paravirt.h>
> > #include "../mm_internal.h"
> > @@ -1987,6 +1988,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> > */
> > cpa_flush(&cpa, 0);
> > + /* Notify hypervisor that a given memory range is mapped encrypted
> > + * or decrypted. The hypervisor will use this information during the
> > + * VM migration.
> > + */
> > + page_encryption_changed(addr, numpages, enc);
> > +
> > return ret;
> > }
On 2020-03-30 06:22:38 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> Invoke a hypercall when a memory region is changed from encrypted ->
> decrypted and vice versa. Hypervisor need to know the page encryption
s/need/needs/
> status during the guest migration.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Venu Busireddy <[email protected]>
> ---
> arch/x86/include/asm/paravirt.h | 10 +++++
> arch/x86/include/asm/paravirt_types.h | 2 +
> arch/x86/kernel/paravirt.c | 1 +
> arch/x86/mm/mem_encrypt.c | 57 ++++++++++++++++++++++++++-
> arch/x86/mm/pat/set_memory.c | 7 ++++
> 5 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 694d8daf4983..8127b9c141bf 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -78,6 +78,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
> PVOP_VCALL1(mmu.exit_mmap, mm);
> }
>
> +static inline void page_encryption_changed(unsigned long vaddr, int npages,
> + bool enc)
> +{
> + PVOP_VCALL3(mmu.page_encryption_changed, vaddr, npages, enc);
> +}
> +
> #ifdef CONFIG_PARAVIRT_XXL
> static inline void load_sp0(unsigned long sp0)
> {
> @@ -946,6 +952,10 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
> static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
> {
> }
> +
> +static inline void page_encryption_changed(unsigned long vaddr, int npages, bool enc)
> +{
> +}
> #endif
> #endif /* __ASSEMBLY__ */
> #endif /* _ASM_X86_PARAVIRT_H */
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 732f62e04ddb..03bfd515c59c 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -215,6 +215,8 @@ struct pv_mmu_ops {
>
> /* Hook for intercepting the destruction of an mm_struct. */
> void (*exit_mmap)(struct mm_struct *mm);
> + void (*page_encryption_changed)(unsigned long vaddr, int npages,
> + bool enc);
>
> #ifdef CONFIG_PARAVIRT_XXL
> struct paravirt_callee_save read_cr2;
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index c131ba4e70ef..840c02b23aeb 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -367,6 +367,7 @@ struct paravirt_patch_template pv_ops = {
> (void (*)(struct mmu_gather *, void *))tlb_remove_page,
>
> .mmu.exit_mmap = paravirt_nop,
> + .mmu.page_encryption_changed = paravirt_nop,
>
> #ifdef CONFIG_PARAVIRT_XXL
> .mmu.read_cr2 = __PV_IS_CALLEE_SAVE(native_read_cr2),
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index f4bd4b431ba1..c9800fa811f6 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -19,6 +19,7 @@
> #include <linux/kernel.h>
> #include <linux/bitops.h>
> #include <linux/dma-mapping.h>
> +#include <linux/kvm_para.h>
>
> #include <asm/tlbflush.h>
> #include <asm/fixmap.h>
> @@ -29,6 +30,7 @@
> #include <asm/processor-flags.h>
> #include <asm/msr.h>
> #include <asm/cmdline.h>
> +#include <asm/kvm_para.h>
>
> #include "mm_internal.h"
>
> @@ -196,6 +198,47 @@ void __init sme_early_init(void)
> swiotlb_force = SWIOTLB_FORCE;
> }
>
> +static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
> + bool enc)
> +{
> + unsigned long sz = npages << PAGE_SHIFT;
> + unsigned long vaddr_end, vaddr_next;
> +
> + vaddr_end = vaddr + sz;
> +
> + for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> + int psize, pmask, level;
> + unsigned long pfn;
> + pte_t *kpte;
> +
> + kpte = lookup_address(vaddr, &level);
> + if (!kpte || pte_none(*kpte))
> + return;
> +
> + switch (level) {
> + case PG_LEVEL_4K:
> + pfn = pte_pfn(*kpte);
> + break;
> + case PG_LEVEL_2M:
> + pfn = pmd_pfn(*(pmd_t *)kpte);
> + break;
> + case PG_LEVEL_1G:
> + pfn = pud_pfn(*(pud_t *)kpte);
> + break;
> + default:
> + return;
> + }
> +
> + psize = page_level_size(level);
> + pmask = page_level_mask(level);
> +
> + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> + pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
> +
> + vaddr_next = (vaddr & pmask) + psize;
> + }
> +}
> +
> static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> {
> pgprot_t old_prot, new_prot;
> @@ -253,12 +296,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> static int __init early_set_memory_enc_dec(unsigned long vaddr,
> unsigned long size, bool enc)
> {
> - unsigned long vaddr_end, vaddr_next;
> + unsigned long vaddr_end, vaddr_next, start;
> unsigned long psize, pmask;
> int split_page_size_mask;
> int level, ret;
> pte_t *kpte;
>
> + start = vaddr;
> vaddr_next = vaddr;
> vaddr_end = vaddr + size;
>
> @@ -313,6 +357,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
>
> ret = 0;
>
> + set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> + enc);
> out:
> __flush_tlb_all();
> return ret;
> @@ -451,6 +497,15 @@ void __init mem_encrypt_init(void)
> if (sev_active())
> static_branch_enable(&sev_enable_key);
>
> +#ifdef CONFIG_PARAVIRT
> + /*
> + * With SEV, we need to make a hypercall when page encryption state is
> + * changed.
> + */
> + if (sev_active())
> + pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> +#endif
> +
> pr_info("AMD %s active\n",
> sev_active() ? "Secure Encrypted Virtualization (SEV)"
> : "Secure Memory Encryption (SME)");
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index c4aedd00c1ba..86b7804129fc 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -26,6 +26,7 @@
> #include <asm/proto.h>
> #include <asm/memtype.h>
> #include <asm/set_memory.h>
> +#include <asm/paravirt.h>
>
> #include "../mm_internal.h"
>
> @@ -1987,6 +1988,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> */
> cpa_flush(&cpa, 0);
>
> + /* Notify hypervisor that a given memory range is mapped encrypted
> + * or decrypted. The hypervisor will use this information during the
> + * VM migration.
> + */
> + page_encryption_changed(addr, numpages, enc);
> +
> return ret;
> }
>
> --
> 2.17.1
>
On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
>
> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > From: Ashish Kalra <[email protected]>
> >
> > This ioctl can be used by the application to reset the page
> > encryption bitmap managed by the KVM driver. A typical usage
> > for this ioctl is on VM reboot, on reboot, we must reinitialize
> > the bitmap.
> >
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > Documentation/virt/kvm/api.rst | 13 +++++++++++++
> > arch/x86/include/asm/kvm_host.h | 1 +
> > arch/x86/kvm/svm.c | 16 ++++++++++++++++
> > arch/x86/kvm/x86.c | 6 ++++++
> > include/uapi/linux/kvm.h | 1 +
> > 5 files changed, 37 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 4d1004a154f6..a11326ccc51d 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > bitmap for an incoming guest.
> > +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > +-----------------------------------------
> > +
> > +:Capability: basic
> > +:Architectures: x86
> > +:Type: vm ioctl
> > +:Parameters: none
> > +:Returns: 0 on success, -1 on error
> > +
> > +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > +
> > +
> > 5. The kvm_run structure
> > ========================
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d30f770aaaea..a96ef6338cd2 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > struct kvm_page_enc_bitmap *bmap);
> > int (*set_page_enc_bitmap)(struct kvm *kvm,
> > struct kvm_page_enc_bitmap *bmap);
> > + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > };
> > struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 313343a43045..c99b0207a443 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > return ret;
> > }
> > +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > +{
> > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +
> > + if (!sev_guest(kvm))
> > + return -ENOTTY;
> > +
> > + mutex_lock(&kvm->lock);
> > + /* by default all pages should be marked encrypted */
> > + if (sev->page_enc_bmap_size)
> > + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > + mutex_unlock(&kvm->lock);
> > + return 0;
> > +}
> > +
> > static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > {
> > struct kvm_sev_cmd sev_cmd;
> > @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > .page_enc_status_hc = svm_page_enc_status_hc,
> > .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
>
>
> We don't need to initialize the intel ops to NULL ? It's not initialized in
> the previous patch either.
>
> > };
This struct has static storage duration, so won't the uninitialized
members be zero-initialized?
> > static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 05e953b2ec61..2127ed937f53 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > break;
> > }
> > + case KVM_PAGE_ENC_BITMAP_RESET: {
> > + r = -ENOTTY;
> > + if (kvm_x86_ops->reset_page_enc_bitmap)
> > + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > + break;
> > + }
> > default:
> > r = -ENOTTY;
> > }
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index b4b01d47e568..0884a581fc37 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> > /* Secure Encrypted Virtualization command */
> > enum sev_cmd_id {
> Reviewed-by: Krish Sadhukhan <[email protected]>
On 2020-03-30 06:22:55 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The ioctl can be used to set page encryption bitmap for an
> incoming guest.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Venu Busireddy <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 22 +++++++++++++++++
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/svm.c | 42 +++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 12 ++++++++++
> include/uapi/linux/kvm.h | 1 +
> 5 files changed, 79 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8ad800ebb54f..4d1004a154f6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> is private then userspace need to use SEV migration commands to transmit
> the page.
>
> +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_SET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> + __u64 start_gfn;
> + __u64 num_pages;
> + union {
> + void __user *enc_bitmap; /* one bit per page */
> + __u64 padding2;
> + };
> +};
> +
> +During the guest live migration the outgoing guest exports its page encryption
> +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> +bitmap for an incoming guest.
>
> 5. The kvm_run structure
> ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 27e43e3ec9d8..d30f770aaaea 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> unsigned long sz, unsigned long mode);
> int (*get_page_enc_bitmap)(struct kvm *kvm,
> struct kvm_page_enc_bitmap *bmap);
> + int (*set_page_enc_bitmap)(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index bae783cd396a..313343a43045 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> return ret;
> }
>
> +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + unsigned long gfn_start, gfn_end;
> + unsigned long *bitmap;
> + unsigned long sz, i;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + gfn_start = bmap->start_gfn;
> + gfn_end = gfn_start + bmap->num_pages;
> +
> + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> + bitmap = kmalloc(sz, GFP_KERNEL);
> + if (!bitmap)
> + return -ENOMEM;
> +
> + ret = -EFAULT;
> + if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> + goto out;
> +
> + mutex_lock(&kvm->lock);
> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> + if (ret)
> + goto unlock;
> +
> + i = gfn_start;
> + for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> + clear_bit(i + gfn_start, sev->page_enc_bmap);
> +
> + ret = 0;
> +unlock:
> + mutex_unlock(&kvm->lock);
> +out:
> + kfree(bitmap);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>
> .page_enc_status_hc = svm_page_enc_status_hc,
> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> + .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3c3fea4e20b5..05e953b2ec61 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> break;
> }
> + case KVM_SET_PAGE_ENC_BITMAP: {
> + struct kvm_page_enc_bitmap bitmap;
> +
> + r = -EFAULT;
> + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> + goto out;
> +
> + r = -ENOTTY;
> + if (kvm_x86_ops->set_page_enc_bitmap)
> + r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> + break;
> + }
> default:
> r = -ENOTTY;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index db1ebf85e177..b4b01d47e568 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1489,6 +1489,7 @@ struct kvm_enc_region {
> #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
>
> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +#define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
>
> /* Secure Encrypted Virtualization command */
> enum sev_cmd_id {
> --
> 2.17.1
>
On 2020-03-30 06:23:10 +0000, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> This ioctl can be used by the application to reset the page
> encryption bitmap managed by the KVM driver. A typical usage
> for this ioctl is on VM reboot, on reboot, we must reinitialize
> the bitmap.
>
> Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Venu Busireddy <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> arch/x86/kvm/x86.c | 6 ++++++
> include/uapi/linux/kvm.h | 1 +
> 5 files changed, 37 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4d1004a154f6..a11326ccc51d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> bitmap for an incoming guest.
>
> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> +-----------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: none
> +:Returns: 0 on success, -1 on error
> +
> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> +
> +
> 5. The kvm_run structure
> ========================
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d30f770aaaea..a96ef6338cd2 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> struct kvm_page_enc_bitmap *bmap);
> int (*set_page_enc_bitmap)(struct kvm *kvm,
> struct kvm_page_enc_bitmap *bmap);
> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 313343a43045..c99b0207a443 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> return ret;
> }
>
> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + mutex_lock(&kvm->lock);
> + /* by default all pages should be marked encrypted */
> + if (sev->page_enc_bmap_size)
> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> + mutex_unlock(&kvm->lock);
> + return 0;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .page_enc_status_hc = svm_page_enc_status_hc,
> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 05e953b2ec61..2127ed937f53 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> break;
> }
> + case KVM_PAGE_ENC_BITMAP_RESET: {
> + r = -ENOTTY;
> + if (kvm_x86_ops->reset_page_enc_bitmap)
> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> + break;
> + }
> default:
> r = -ENOTTY;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b4b01d47e568..0884a581fc37 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
>
> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
>
> /* Secure Encrypted Virtualization command */
> enum sev_cmd_id {
> --
> 2.17.1
>
On 3/29/20 11:23 PM, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> for host-side support for SEV live migration. Also add a new custom
> MSR_KVM_SEV_LIVE_MIG_EN for guest to enable the SEV live migration
> feature.
>
> Also, ensure that _bss_decrypted section is marked as decrypted in the
> page encryption bitmap.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> Documentation/virt/kvm/cpuid.rst | 4 ++++
> Documentation/virt/kvm/msr.rst | 10 ++++++++++
> arch/x86/include/asm/kvm_host.h | 3 +++
> arch/x86/include/uapi/asm/kvm_para.h | 5 +++++
> arch/x86/kernel/kvm.c | 4 ++++
> arch/x86/kvm/cpuid.c | 3 ++-
> arch/x86/kvm/svm.c | 5 +++++
> arch/x86/kvm/x86.c | 7 +++++++
> arch/x86/mm/mem_encrypt.c | 14 +++++++++++++-
> 9 files changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index 01b081f6e7ea..fcb191bb3016 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
> before using paravirtualized
> sched yield.
>
> +KVM_FEATURE_SEV_LIVE_MIGRATION 14 guest checks this feature bit
> + before enabling SEV live
> + migration feature.
> +
> KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
> per-cpu warps are expeced in
> kvmclock
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index 33892036672d..7cd7786bbb03 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -319,3 +319,13 @@ data:
>
> KVM guests can request the host not to poll on HLT, for example if
> they are performing polling themselves.
> +
> +MSR_KVM_SEV_LIVE_MIG_EN:
> + 0x4b564d06
> +
> + Control SEV Live Migration features.
> +
> +data:
> + Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature.
> + Bit 1 enables (1) or disables (0) support for SEV Live Migration extensions.
> + All other bits are reserved.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a96ef6338cd2..ad5faaed43c0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -780,6 +780,9 @@ struct kvm_vcpu_arch {
>
> u64 msr_kvm_poll_control;
>
> + /* SEV Live Migration MSR (AMD only) */
> + u64 msr_kvm_sev_live_migration_flag;
> +
> /*
> * Indicates the guest is trying to write a gfn that contains one or
> * more of the PTEs used to translate the write itself, i.e. the access
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 2a8e0b6b9805..d9d4953b42ad 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -31,6 +31,7 @@
> #define KVM_FEATURE_PV_SEND_IPI 11
> #define KVM_FEATURE_POLL_CONTROL 12
> #define KVM_FEATURE_PV_SCHED_YIELD 13
> +#define KVM_FEATURE_SEV_LIVE_MIGRATION 14
>
> #define KVM_HINTS_REALTIME 0
>
> @@ -50,6 +51,7 @@
> #define MSR_KVM_STEAL_TIME 0x4b564d03
> #define MSR_KVM_PV_EOI_EN 0x4b564d04
> #define MSR_KVM_POLL_CONTROL 0x4b564d05
> +#define MSR_KVM_SEV_LIVE_MIG_EN 0x4b564d06
>
> struct kvm_steal_time {
> __u64 steal;
> @@ -122,4 +124,7 @@ struct kvm_vcpu_pv_apf_data {
> #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> #define KVM_PV_EOI_DISABLED 0x0
>
> +#define KVM_SEV_LIVE_MIGRATION_ENABLED (1 << 0)
> +#define KVM_SEV_LIVE_MIGRATION_EXTENSIONS_SUPPORTED (1 << 1)
> +
> #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 6efe0410fb72..8fcee0b45231 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -418,6 +418,10 @@ static void __init sev_map_percpu_data(void)
> if (!sev_active())
> return;
>
> + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
> + wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN, KVM_SEV_LIVE_MIGRATION_ENABLED);
> + }
> +
> for_each_possible_cpu(cpu) {
> __set_percpu_decrypted(&per_cpu(apf_reason, cpu), sizeof(apf_reason));
> __set_percpu_decrypted(&per_cpu(steal_time, cpu), sizeof(steal_time));
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b1c469446b07..74c8b2a7270c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> (1 << KVM_FEATURE_PV_SEND_IPI) |
> (1 << KVM_FEATURE_POLL_CONTROL) |
> - (1 << KVM_FEATURE_PV_SCHED_YIELD);
> + (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> + (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
>
> if (sched_info_on())
> entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c99b0207a443..60ddc242a133 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7632,6 +7632,7 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> unsigned long npages, unsigned long enc)
> {
> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct kvm_vcpu *vcpu = kvm->vcpus[0];
> kvm_pfn_t pfn_start, pfn_end;
> gfn_t gfn_start, gfn_end;
> int ret;
> @@ -7639,6 +7640,10 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> if (!sev_guest(kvm))
> return -EINVAL;
>
> + if (!(vcpu->arch.msr_kvm_sev_live_migration_flag &
> + KVM_SEV_LIVE_MIGRATION_ENABLED))
> + return -ENOTTY;
> +
> if (!npages)
> return 0;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2127ed937f53..82867b8798f8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2880,6 +2880,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vcpu->arch.msr_kvm_poll_control = data;
> break;
>
> + case MSR_KVM_SEV_LIVE_MIG_EN:
> + vcpu->arch.msr_kvm_sev_live_migration_flag = data;
> + break;
> +
> case MSR_IA32_MCG_CTL:
> case MSR_IA32_MCG_STATUS:
> case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> @@ -3126,6 +3130,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_KVM_POLL_CONTROL:
> msr_info->data = vcpu->arch.msr_kvm_poll_control;
> break;
> + case MSR_KVM_SEV_LIVE_MIG_EN:
> + msr_info->data = vcpu->arch.msr_kvm_sev_live_migration_flag;
> + break;
> case MSR_IA32_P5_MC_ADDR:
> case MSR_IA32_P5_MC_TYPE:
> case MSR_IA32_MCG_CAP:
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index c9800fa811f6..f6a841494845 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -502,8 +502,20 @@ void __init mem_encrypt_init(void)
> * With SEV, we need to make a hypercall when page encryption state is
> * changed.
> */
> - if (sev_active())
> + if (sev_active()) {
> + unsigned long nr_pages;
> +
> pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> +
> + /*
> + * Ensure that _bss_decrypted section is marked as decrypted in the
> + * page encryption bitmap.
> + */
> + nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
> + PAGE_SIZE);
> + set_memory_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
> + nr_pages, 0);
> + }
> #endif
>
> pr_info("AMD %s active\n",
Reviewed-by: Krish Sadhukhan <[email protected]>
On 3/29/20 11:23 PM, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> Reset the host's page encryption bitmap related to kernel
> specific page encryption status settings before we load a
> new kernel by kexec. We cannot reset the complete
> page encryption bitmap here as we need to retain the
> UEFI/OVMF firmware specific settings.
Can the commit message mention why the host's page encryption bitmap
needs to be reset? Since the theme of these patches is guest migration
in the SEV context, it might be useful to mention why the host context
comes in here.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8fcee0b45231..ba6cce3c84af 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -34,6 +34,7 @@
> #include <asm/hypervisor.h>
> #include <asm/tlb.h>
> #include <asm/cpuidle_haltpoll.h>
> +#include <asm/e820/api.h>
>
> static int kvmapf = 1;
>
> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> */
> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> + /*
> + * Reset the host's page encryption bitmap related to kernel
> + * specific page encryption status settings before we load a
> + * new kernel by kexec. NOTE: We cannot reset the complete
> + * page encryption bitmap here as we need to retain the
> + * UEFI/OVMF firmware specific settings.
> + */
> + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> + (smp_processor_id() == 0)) {
> + unsigned long nr_pages;
> + int i;
> +
> + for (i = 0; i < e820_table->nr_entries; i++) {
> + struct e820_entry *entry = &e820_table->entries[i];
> + unsigned long start_pfn, end_pfn;
> +
> + if (entry->type != E820_TYPE_RAM)
> + continue;
> +
> + start_pfn = entry->addr >> PAGE_SHIFT;
> + end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> + nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> +
> + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> + entry->addr, nr_pages, 1);
> + }
> + }
> kvm_pv_disable_apf();
> kvm_disable_steal_time();
> }
The host's page encryption bitmap is maintained for the guest to track the
encrypted/decrypted state of the guest pages; therefore, we need to explicitly
mark all shared pages as encrypted again before rebooting into the new guest
kernel.
On Fri, Apr 03, 2020 at 05:55:52PM -0700, Krish Sadhukhan wrote:
>
> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > From: Ashish Kalra <[email protected]>
> >
> > Reset the host's page encryption bitmap related to kernel
> > specific page encryption status settings before we load a
> > new kernel by kexec. We cannot reset the complete
> > page encryption bitmap here as we need to retain the
> > UEFI/OVMF firmware specific settings.
>
>
> Can the commit message mention why host page encryption needs to be reset ?
> Since the theme of these patches is guest migration in-SEV context, it might
> be useful to mention why the host context comes in here.
>
> >
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
> > 1 file changed, 28 insertions(+)
> >
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 8fcee0b45231..ba6cce3c84af 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -34,6 +34,7 @@
> > #include <asm/hypervisor.h>
> > #include <asm/tlb.h>
> > #include <asm/cpuidle_haltpoll.h>
> > +#include <asm/e820/api.h>
> > static int kvmapf = 1;
> > @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> > */
> > if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> > wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> > + /*
> > + * Reset the host's page encryption bitmap related to kernel
> > + * specific page encryption status settings before we load a
> > + * new kernel by kexec. NOTE: We cannot reset the complete
> > + * page encryption bitmap here as we need to retain the
> > + * UEFI/OVMF firmware specific settings.
> > + */
> > + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> > + (smp_processor_id() == 0)) {
> > + unsigned long nr_pages;
> > + int i;
> > +
> > + for (i = 0; i < e820_table->nr_entries; i++) {
> > + struct e820_entry *entry = &e820_table->entries[i];
> > + unsigned long start_pfn, end_pfn;
> > +
> > + if (entry->type != E820_TYPE_RAM)
> > + continue;
> > +
> > + start_pfn = entry->addr >> PAGE_SHIFT;
> > + end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> > + nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> > +
> > + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> > + entry->addr, nr_pages, 1);
> > + }
> > + }
> > kvm_pv_disable_apf();
> > kvm_disable_steal_time();
> > }
On 4/4/20 2:57 PM, Ashish Kalra wrote:
> The host's page encryption bitmap is maintained for the guest to track the encrypted/decrypted state
> of the guest pages; therefore, we need to explicitly mark all shared pages as encrypted again before
> rebooting into the new guest kernel.
>
> On Fri, Apr 03, 2020 at 05:55:52PM -0700, Krish Sadhukhan wrote:
>> On 3/29/20 11:23 PM, Ashish Kalra wrote:
>>> From: Ashish Kalra <[email protected]>
>>>
>>> Reset the host's page encryption bitmap related to kernel
>>> specific page encryption status settings before we load a
>>> new kernel by kexec. We cannot reset the complete
>>> page encryption bitmap here as we need to retain the
>>> UEFI/OVMF firmware specific settings.
>>
>> Can the commit message mention why host page encryption needs to be reset ?
>> Since the theme of these patches is guest migration in-SEV context, it might
>> be useful to mention why the host context comes in here.
>>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> ---
>>> arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
>>> 1 file changed, 28 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>>> index 8fcee0b45231..ba6cce3c84af 100644
>>> --- a/arch/x86/kernel/kvm.c
>>> +++ b/arch/x86/kernel/kvm.c
>>> @@ -34,6 +34,7 @@
>>> #include <asm/hypervisor.h>
>>> #include <asm/tlb.h>
>>> #include <asm/cpuidle_haltpoll.h>
>>> +#include <asm/e820/api.h>
>>> static int kvmapf = 1;
>>> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
>>> */
>>> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>>> wrmsrl(MSR_KVM_PV_EOI_EN, 0);
>>> + /*
>>> + * Reset the host's page encryption bitmap related to kernel
>>> + * specific page encryption status settings before we load a
>>> + * new kernel by kexec. NOTE: We cannot reset the complete
>>> + * page encryption bitmap here as we need to retain the
>>> + * UEFI/OVMF firmware specific settings.
>>> + */
>>> + if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
>>> + (smp_processor_id() == 0)) {
>>> + unsigned long nr_pages;
>>> + int i;
>>> +
>>> + for (i = 0; i < e820_table->nr_entries; i++) {
>>> + struct e820_entry *entry = &e820_table->entries[i];
>>> + unsigned long start_pfn, end_pfn;
>>> +
>>> + if (entry->type != E820_TYPE_RAM)
>>> + continue;
>>> +
>>> + start_pfn = entry->addr >> PAGE_SHIFT;
>>> + end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
>>> + nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
>>> +
>>> + kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
>>> + entry->addr, nr_pages, 1);
>>> + }
>>> + }
>>> kvm_pv_disable_apf();
>>> kvm_disable_steal_time();
>>> }
Thanks for the explanation. It will certainly help one understand the
context better if you add it to the commit message.
Reviewed-by: Krish Sadhukhan <[email protected]>
On 4/3/20 2:45 PM, Ashish Kalra wrote:
> On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
>> On 3/29/20 11:23 PM, Ashish Kalra wrote:
>>> From: Ashish Kalra <[email protected]>
>>>
>>> This ioctl can be used by the application to reset the page
>>> encryption bitmap managed by the KVM driver. A typical usage
>>> for this ioctl is on VM reboot, on reboot, we must reinitialize
>>> the bitmap.
>>>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> ---
>>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
>>> arch/x86/include/asm/kvm_host.h | 1 +
>>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
>>> arch/x86/kvm/x86.c | 6 ++++++
>>> include/uapi/linux/kvm.h | 1 +
>>> 5 files changed, 37 insertions(+)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index 4d1004a154f6..a11326ccc51d 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
>>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
>>> bitmap for an incoming guest.
>>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
>>> +-----------------------------------------
>>> +
>>> +:Capability: basic
>>> +:Architectures: x86
>>> +:Type: vm ioctl
>>> +:Parameters: none
>>> +:Returns: 0 on success, -1 on error
>>> +
>>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
>>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
>>> +
>>> +
>>> 5. The kvm_run structure
>>> ========================
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index d30f770aaaea..a96ef6338cd2 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
>>> struct kvm_page_enc_bitmap *bmap);
>>> int (*set_page_enc_bitmap)(struct kvm *kvm,
>>> struct kvm_page_enc_bitmap *bmap);
>>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
>>> };
>>> struct kvm_arch_async_pf {
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index 313343a43045..c99b0207a443 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
>>> return ret;
>>> }
>>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
>>> +{
>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>> +
>>> + if (!sev_guest(kvm))
>>> + return -ENOTTY;
>>> +
>>> + mutex_lock(&kvm->lock);
>>> + /* by default all pages should be marked encrypted */
>>> + if (sev->page_enc_bmap_size)
>>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
>>> + mutex_unlock(&kvm->lock);
>>> + return 0;
>>> +}
>>> +
>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>> {
>>> struct kvm_sev_cmd sev_cmd;
>>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>> .page_enc_status_hc = svm_page_enc_status_hc,
>>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
>>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
>>> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
>>
>> We don't need to initialize the intel ops to NULL ? It's not initialized in
>> the previous patch either.
>>
>>> };
> This struct is declared as "static storage", so won't the non-initialized
> members be 0 ?
Correct. Although I see that 'nested_enable_evmcs' is explicitly
initialized, so perhaps we should maintain that convention.
>
>>> static int __init svm_init(void)
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 05e953b2ec61..2127ed937f53 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
>>> break;
>>> }
>>> + case KVM_PAGE_ENC_BITMAP_RESET: {
>>> + r = -ENOTTY;
>>> + if (kvm_x86_ops->reset_page_enc_bitmap)
>>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
>>> + break;
>>> + }
>>> default:
>>> r = -ENOTTY;
>>> }
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index b4b01d47e568..0884a581fc37 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
>>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
>>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
>>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
>>> /* Secure Encrypted Virtualization command */
>>> enum sev_cmd_id {
>> Reviewed-by: Krish Sadhukhan <[email protected]>
On 4/3/20 1:47 PM, Ashish Kalra wrote:
> On Fri, Apr 03, 2020 at 01:18:52PM -0700, Krish Sadhukhan wrote:
>> On 3/29/20 11:22 PM, Ashish Kalra wrote:
>>> From: Brijesh Singh <[email protected]>
>>>
>>> The ioctl can be used to retrieve page encryption bitmap for a given
>>> gfn range.
>>>
>>> Return the correct bitmap as per the number of pages being requested
>>> by the user. Ensure that we only copy bmap->num_pages bytes in the
>>> userspace buffer, if bmap->num_pages is not byte aligned we read
>>> the trailing bits from the userspace and copy those bits as is.
>>>
>>> Cc: Thomas Gleixner <[email protected]>
>>> Cc: Ingo Molnar <[email protected]>
>>> Cc: "H. Peter Anvin" <[email protected]>
>>> Cc: Paolo Bonzini <[email protected]>
>>> Cc: "Radim Krčmář" <[email protected]>
>>> Cc: Joerg Roedel <[email protected]>
>>> Cc: Borislav Petkov <[email protected]>
>>> Cc: Tom Lendacky <[email protected]>
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> ---
>>> Documentation/virt/kvm/api.rst | 27 +++++++++++++
>>> arch/x86/include/asm/kvm_host.h | 2 +
>>> arch/x86/kvm/svm.c | 71 +++++++++++++++++++++++++++++++++
>>> arch/x86/kvm/x86.c | 12 ++++++
>>> include/uapi/linux/kvm.h | 12 ++++++
>>> 5 files changed, 124 insertions(+)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index ebd383fba939..8ad800ebb54f 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
>>> the clear cpu reset definition in the POP. However, the cpu is not put
>>> into ESA mode. This reset is a superset of the initial reset.
>>> +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
>>> +---------------------------------------
>>> +
>>> +:Capability: basic
>>> +:Architectures: x86
>>> +:Type: vm ioctl
>>> +:Parameters: struct kvm_page_enc_bitmap (in/out)
>>> +:Returns: 0 on success, -1 on error
>>> +
>>> +/* for KVM_GET_PAGE_ENC_BITMAP */
>>> +struct kvm_page_enc_bitmap {
>>> + __u64 start_gfn;
>>> + __u64 num_pages;
>>> + union {
>>> + void __user *enc_bitmap; /* one bit per page */
>>> + __u64 padding2;
>>> + };
>>> +};
>>> +
>>> +The encrypted VMs have concept of private and shared pages. The private
>>> +page is encrypted with the guest-specific key, while shared page may
>>> +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
>>> +be used to get the bitmap indicating whether the guest page is private
>>> +or shared. The bitmap can be used during the guest migration, if the page
>>> +is private then userspace need to use SEV migration commands to transmit
>>> +the page.
>>> +
>>> 5. The kvm_run structure
>>> ========================
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 90718fa3db47..27e43e3ec9d8 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
>>> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>>> int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>>> unsigned long sz, unsigned long mode);
>>> + int (*get_page_enc_bitmap)(struct kvm *kvm,
>>> + struct kvm_page_enc_bitmap *bmap);
>>
>> Looking back at the previous patch, it seems that these two are basically
>> the setter/getter action for page encryption, though one is implemented as a
>> hypercall while the other as an ioctl. If we consider the setter/getter
>> aspect, isn't it better to have some sort of symmetry in the naming of the
>> ops ? For example,
>>
>> set_page_enc_hc
>>
>> get_page_enc_ioctl
>>
>>> };
> These are named as per their usage. While the page_enc_status_hc is a
> hypercall used by a guest to mark the page encryption bitmap, the other
> ones are ioctl interfaces used by Qemu (or Qemu alternative) to get/set
> the page encryption bitmaps, so these are named accordingly.
OK.
Please rename 'set_page_enc_hc' to 'set_page_enc_hypercall' to match
'patch_hypercall'.
Reviewed-by: Krish Sadhukhan <[email protected]>
>
>>> struct kvm_arch_async_pf {
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index 1d8beaf1bceb..bae783cd396a 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>>> return ret;
>>> }
>>> +static int svm_get_page_enc_bitmap(struct kvm *kvm,
>>> + struct kvm_page_enc_bitmap *bmap)
>>> +{
>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>> + unsigned long gfn_start, gfn_end;
>>> + unsigned long sz, i, sz_bytes;
>>> + unsigned long *bitmap;
>>> + int ret, n;
>>> +
>>> + if (!sev_guest(kvm))
>>> + return -ENOTTY;
>>> +
>>> + gfn_start = bmap->start_gfn;
>>
>> What if bmap->start_gfn is junk ?
>>
>>> + gfn_end = gfn_start + bmap->num_pages;
>>> +
>>> + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
>>> + bitmap = kmalloc(sz, GFP_KERNEL);
>>> + if (!bitmap)
>>> + return -ENOMEM;
>>> +
>>> + /* by default all pages are marked encrypted */
>>> + memset(bitmap, 0xff, sz);
>>> +
>>> + mutex_lock(&kvm->lock);
>>> + if (sev->page_enc_bmap) {
>>> + i = gfn_start;
>>> + for_each_clear_bit_from(i, sev->page_enc_bmap,
>>> + min(sev->page_enc_bmap_size, gfn_end))
>>> + clear_bit(i - gfn_start, bitmap);
>>> + }
>>> + mutex_unlock(&kvm->lock);
>>> +
>>> + ret = -EFAULT;
>>> +
>>> + n = bmap->num_pages % BITS_PER_BYTE;
>>> + sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
>>> +
>>> + /*
>>> + * Return the correct bitmap as per the number of pages being
>>> + * requested by the user. Ensure that we only copy bmap->num_pages
>>> + * bytes in the userspace buffer, if bmap->num_pages is not byte
>>> + * aligned we read the trailing bits from the userspace and copy
>>> + * those bits as is.
>>> + */
>>> +
>>> + if (n) {
>>
>> Is it better to check for 'num_pages' at the beginning of the function
>> rather than coming this far if bmap->num_pages is zero ?
>>
> This is not checking for "num_pages"; it is checking whether
> bmap->num_pages is byte aligned.
>
>>> + unsigned char *bitmap_kernel = (unsigned char *)bitmap;
>>
>> Just trying to understand why you need this extra variable instead of using
>> 'bitmap' directly.
>>
> Makes the code much more readable/understandable.
>
>>> + unsigned char bitmap_user;
>>> + unsigned long offset, mask;
>>> +
>>> + offset = bmap->num_pages / BITS_PER_BYTE;
>>> + if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
>>> + sizeof(unsigned char)))
>>> + goto out;
>>> +
>>> + mask = GENMASK(n - 1, 0);
>>> + bitmap_user &= ~mask;
>>> + bitmap_kernel[offset] &= mask;
>>> + bitmap_kernel[offset] |= bitmap_user;
>>> + }
>>> +
>>> + if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
>>
>> If 'n' is zero, we are still copying stuff back to the user. Is that what is
>> expected from userland ?
>>
>> Another point. Since copy_from_user() was done in the caller, isn't it
>> better to move this to the caller to keep a symmetry ?
>>
> As per the comments above, note that if n is not zero, bmap->num_pages
> is not byte aligned, so we read the trailing bits from userspace and
> copy those bits as is. If n is zero, bmap->num_pages is byte aligned
> and we copy all the bytes back.
>
> Thanks,
> Ashish
>
>>> + goto out;
>>> +
>>> + ret = 0;
>>> +out:
>>> + kfree(bitmap);
>>> + return ret;
>>> +}
>>> +
>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>> {
>>> struct kvm_sev_cmd sev_cmd;
>>> @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>>> .page_enc_status_hc = svm_page_enc_status_hc,
>>> + .get_page_enc_bitmap = svm_get_page_enc_bitmap,
>>> };
>>> static int __init svm_init(void)
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 68428eef2dde..3c3fea4e20b5 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>> case KVM_SET_PMU_EVENT_FILTER:
>>> r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
>>> break;
>>> + case KVM_GET_PAGE_ENC_BITMAP: {
>>> + struct kvm_page_enc_bitmap bitmap;
>>> +
>>> + r = -EFAULT;
>>> + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
>>> + goto out;
>>> +
>>> + r = -ENOTTY;
>>> + if (kvm_x86_ops->get_page_enc_bitmap)
>>> + r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
>>> + break;
>>> + }
>>> default:
>>> r = -ENOTTY;
>>> }
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 4e80c57a3182..db1ebf85e177 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -500,6 +500,16 @@ struct kvm_dirty_log {
>>> };
>>> };
>>> +/* for KVM_GET_PAGE_ENC_BITMAP */
>>> +struct kvm_page_enc_bitmap {
>>> + __u64 start_gfn;
>>> + __u64 num_pages;
>>> + union {
>>> + void __user *enc_bitmap; /* one bit per page */
>>> + __u64 padding2;
>>> + };
>>> +};
>>> +
>>> /* for KVM_CLEAR_DIRTY_LOG */
>>> struct kvm_clear_dirty_log {
>>> __u32 slot;
>>> @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
>>> #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
>>> #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
>>> +#define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
>>> +
>>> /* Secure Encrypted Virtualization command */
>>> enum sev_cmd_id {
>>> /* Guest initialization commands */
On 4/2/20 7:58 PM, Ashish Kalra wrote:
> On Fri, Apr 03, 2020 at 01:57:48AM +0000, Ashish Kalra wrote:
>> On Thu, Apr 02, 2020 at 06:31:54PM -0700, Krish Sadhukhan wrote:
>>> On 3/29/20 11:22 PM, Ashish Kalra wrote:
>>>> From: Brijesh Singh <[email protected]>
>>>>
>>>> This hypercall is used by the SEV guest to notify a change in the page
>>>> encryption status to the hypervisor. The hypercall should be invoked
>>>> only when the encryption attribute is changed from encrypted -> decrypted
>>>> and vice versa. By default all guest pages are considered encrypted.
>>>>
>>>> Cc: Thomas Gleixner <[email protected]>
>>>> Cc: Ingo Molnar <[email protected]>
>>>> Cc: "H. Peter Anvin" <[email protected]>
>>>> Cc: Paolo Bonzini <[email protected]>
>>>> Cc: "Radim Krčmář" <[email protected]>
>>>> Cc: Joerg Roedel <[email protected]>
>>>> Cc: Borislav Petkov <[email protected]>
>>>> Cc: Tom Lendacky <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>> ---
>>>> Documentation/virt/kvm/hypercalls.rst | 15 +++++
>>>> arch/x86/include/asm/kvm_host.h | 2 +
>>>> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
>>>> arch/x86/kvm/vmx/vmx.c | 1 +
>>>> arch/x86/kvm/x86.c | 6 ++
>>>> include/uapi/linux/kvm_para.h | 1 +
>>>> 6 files changed, 120 insertions(+)
>>>>
>>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
>>>> index dbaf207e560d..ff5287e68e81 100644
>>>> --- a/Documentation/virt/kvm/hypercalls.rst
>>>> +++ b/Documentation/virt/kvm/hypercalls.rst
>>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
>>>> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>>>> any of the IPI target vCPUs was preempted.
>>>> +
>>>> +
>>>> +8. KVM_HC_PAGE_ENC_STATUS
>>>> +-------------------------
>>>> +:Architecture: x86
>>>> +:Status: active
>>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
>>>> +
>>>> +a0: the guest physical address of the start page
>>>> +a1: the number of pages
>>>> +a2: encryption attribute
>>>> +
>>>> + Where:
>>>> + * 1: Encryption attribute is set
>>>> + * 0: Encryption attribute is cleared
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 98959e8cd448..90718fa3db47 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>>>> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>>>> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>>>> + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>>>> + unsigned long sz, unsigned long mode);
>>>> };
>>>> struct kvm_arch_async_pf {
>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>> index 7c2721e18b06..1d8beaf1bceb 100644
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
>>>> int fd; /* SEV device fd */
>>>> unsigned long pages_locked; /* Number of pages locked */
>>>> struct list_head regions_list; /* List of registered regions */
>>>> + unsigned long *page_enc_bmap;
>>>> + unsigned long page_enc_bmap_size;
>>>> };
>>>> struct kvm_svm {
>>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>>>> sev_unbind_asid(kvm, sev->handle);
>>>> sev_asid_free(sev->asid);
>>>> +
>>>> + kvfree(sev->page_enc_bmap);
>>>> + sev->page_enc_bmap = NULL;
>>>> }
>>>> static void avic_vm_destroy(struct kvm *kvm)
>>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>> return ret;
>>>> }
>>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
>>>> +{
>>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> + unsigned long *map;
>>>> + unsigned long sz;
>>>> +
>>>> + if (sev->page_enc_bmap_size >= new_size)
>>>> + return 0;
>>>> +
>>>> + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
>>>> +
>>>> + map = vmalloc(sz);
>>>
>>> Just wondering why we can't directly modify sev->page_enc_bmap.
>>>
>> Because the page_enc_bitmap needs to be resized, i.e. expanded,
>> here when the new range extends beyond its current size.
OK.
> I don't believe there is anything like a realloc() equivalent
> for the kmalloc() interfaces.
>
> Thanks,
> Ashish
>
>>>> + if (!map) {
>>>> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
>>>> + sz);
>>>> + return -ENOMEM;
>>>> + }
>>>> +
>>>> + /* mark the page encrypted (by default) */
>>>> + memset(map, 0xff, sz);
>>>> +
>>>> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
>>>> + kvfree(sev->page_enc_bmap);
>>>> +
>>>> + sev->page_enc_bmap = map;
>>>> + sev->page_enc_bmap_size = new_size;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>>>> + unsigned long npages, unsigned long enc)
>>>> +{
>>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> + kvm_pfn_t pfn_start, pfn_end;
>>>> + gfn_t gfn_start, gfn_end;
>>>> + int ret;
>>>> +
>>>> + if (!sev_guest(kvm))
>>>> + return -EINVAL;
>>>> +
>>>> + if (!npages)
>>>> + return 0;
>>>> +
>>>> + gfn_start = gpa_to_gfn(gpa);
>>>> + gfn_end = gfn_start + npages;
>>>> +
>>>> + /* out of bound access error check */
>>>> + if (gfn_end <= gfn_start)
>>>> + return -EINVAL;
>>>> +
>>>> + /* lets make sure that gpa exist in our memslot */
>>>> + pfn_start = gfn_to_pfn(kvm, gfn_start);
>>>> + pfn_end = gfn_to_pfn(kvm, gfn_end);
>>>> +
>>>> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
>>>> + /*
>>>> + * Allow guest MMIO range(s) to be added
>>>> + * to the page encryption bitmap.
>>>> + */
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
>>>> + /*
>>>> + * Allow guest MMIO range(s) to be added
>>>> + * to the page encryption bitmap.
>>>> + */
>>>> + return -EINVAL;
>>>> + }
>>>
>>> It seems is_error_noslot_pfn() covers both cases - i) gfn slot is absent,
>>> ii) failure to translate to pfn. So do we still need is_noslot_pfn() ?
>>>
>> We do need to check for !is_noslot_pfn(..) additionally, as the MMIO
>> ranges will not have a slot allocated.
The comments above is_error_noslot_pfn() seem to indicate that it covers
both cases, "not in slot" and "failure to translate"...
Reviewed-by: Krish Sadhukhan <[email protected]>
>>
>> Thanks,
>> Ashish
>>
>>>> +
>>>> + mutex_lock(&kvm->lock);
>>>> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
>>>> + if (ret)
>>>> + goto unlock;
>>>> +
>>>> + if (enc)
>>>> + __bitmap_set(sev->page_enc_bmap, gfn_start,
>>>> + gfn_end - gfn_start);
>>>> + else
>>>> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
>>>> + gfn_end - gfn_start);
>>>> +
>>>> +unlock:
>>>> + mutex_unlock(&kvm->lock);
>>>> + return ret;
>>>> +}
>>>> +
>>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>> {
>>>> struct kvm_sev_cmd sev_cmd;
>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>>>> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>>>> +
>>>> + .page_enc_status_hc = svm_page_enc_status_hc,
>>>
>>> Why not place it where other encryption ops are located ?
>>>
>>> ...
>>>
>>> .mem_enc_unreg_region
>>>
>>> + .page_enc_status_hc = svm_page_enc_status_hc
>>>
>>>> };
>>>> static int __init svm_init(void)
>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>> index 079d9fbf278e..f68e76ee7f9c 100644
>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>>> .nested_get_evmcs_version = NULL,
>>>> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>>>> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>>>> + .page_enc_status_hc = NULL,
>>>> };
>>>> static void vmx_cleanup_l1d_flush(void)
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index cf95c36cb4f4..68428eef2dde 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>>>> kvm_sched_yield(vcpu->kvm, a0);
>>>> ret = 0;
>>>> break;
>>>> + case KVM_HC_PAGE_ENC_STATUS:
>>>> + ret = -KVM_ENOSYS;
>>>> + if (kvm_x86_ops->page_enc_status_hc)
>>>> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
>>>> + a0, a1, a2);
>>>> + break;
>>>> default:
>>>> ret = -KVM_ENOSYS;
>>>> break;
>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
>>>> index 8b86609849b9..847b83b75dc8 100644
>>>> --- a/include/uapi/linux/kvm_para.h
>>>> +++ b/include/uapi/linux/kvm_para.h
>>>> @@ -29,6 +29,7 @@
>>>> #define KVM_HC_CLOCK_PAIRING 9
>>>> #define KVM_HC_SEND_IPI 10
>>>> #define KVM_HC_SCHED_YIELD 11
>>>> +#define KVM_HC_PAGE_ENC_STATUS 12
>>>> /*
>>>> * hypercalls use architecture specific
On Thu, Apr 2, 2020 at 3:31 PM Venu Busireddy <[email protected]> wrote:
>
> On 2020-03-30 06:21:20 +0000, Ashish Kalra wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > The command is used for copying the incoming buffer into the
> > SEV guest memory space.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: "Radim Krčmář" <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
>
> Reviewed-by: Venu Busireddy <[email protected]>
>
> > ---
> > .../virt/kvm/amd-memory-encryption.rst | 24 ++++++
> > arch/x86/kvm/svm.c | 79 +++++++++++++++++++
> > include/uapi/linux/kvm.h | 9 +++
> > 3 files changed, 112 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> > index ef1f1f3a5b40..554aa33a99cc 100644
> > --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> > +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> > @@ -351,6 +351,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va
> >
> > For more details, see SEV spec Section 6.12.
> >
> > +14. KVM_SEV_RECEIVE_UPDATE_DATA
> > +----------------------------
> > +
> > +The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy
> > +the incoming buffers into the guest memory region with encryption context
> > +created during the KVM_SEV_RECEIVE_START.
> > +
> > +Parameters (in): struct kvm_sev_receive_update_data
> > +
> > +Returns: 0 on success, -negative on error
> > +
> > +::
> > +
> > + struct kvm_sev_launch_receive_update_data {
> > + __u64 hdr_uaddr; /* userspace address containing the packet header */
> > + __u32 hdr_len;
> > +
> > + __u64 guest_uaddr; /* the destination guest memory region */
> > + __u32 guest_len;
> > +
> > + __u64 trans_uaddr; /* the incoming buffer memory region */
> > + __u32 trans_len;
> > + };
> > +
> > References
> > ==========
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 038b47685733..5fc5355536d7 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7497,6 +7497,82 @@ static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > return ret;
> > }
> >
> > +static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > +{
> > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > + struct kvm_sev_receive_update_data params;
> > + struct sev_data_receive_update_data *data;
> > + void *hdr = NULL, *trans = NULL;
> > + struct page **guest_page;
> > + unsigned long n;
> > + int ret, offset;
> > +
> > + if (!sev_guest(kvm))
> > + return -EINVAL;
> > +
> > + if (copy_from_user(¶ms, (void __user *)(uintptr_t)argp->data,
> > + sizeof(struct kvm_sev_receive_update_data)))
> > + return -EFAULT;
> > +
> > + if (!params.hdr_uaddr || !params.hdr_len ||
> > + !params.guest_uaddr || !params.guest_len ||
> > + !params.trans_uaddr || !params.trans_len)
> > + return -EINVAL;
> > +
> > + /* Check if we are crossing the page boundary */
> > + offset = params.guest_uaddr & (PAGE_SIZE - 1);
> > + if ((params.guest_len + offset > PAGE_SIZE))
> > + return -EINVAL;
Check for overflow.
>
> > +
> > + hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
> > + if (IS_ERR(hdr))
> > + return PTR_ERR(hdr);
> > +
> > + trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
> > + if (IS_ERR(trans)) {
> > + ret = PTR_ERR(trans);
> > + goto e_free_hdr;
> > + }
> > +
> > + ret = -ENOMEM;
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data)
> > + goto e_free_trans;
> > +
> > + data->hdr_address = __psp_pa(hdr);
> > + data->hdr_len = params.hdr_len;
> > + data->trans_address = __psp_pa(trans);
> > + data->trans_len = params.trans_len;
> > +
> > + /* Pin guest memory */
> > + ret = -EFAULT;
> > + guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> > + PAGE_SIZE, &n, 0);
> > + if (!guest_page)
> > + goto e_free;
> > +
> > + /* The RECEIVE_UPDATE_DATA command requires C-bit to be always set. */
> > + data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> > + offset;
> > + data->guest_address |= sev_me_mask;
> > + data->guest_len = params.guest_len;
> > + data->handle = sev->handle;
> > +
> > + ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
> > + &argp->error);
> > +
> > + sev_unpin_memory(kvm, guest_page, n);
> > +
> > +e_free:
> > + kfree(data);
> > +e_free_trans:
> > + kfree(trans);
> > +e_free_hdr:
> > + kfree(hdr);
> > +
> > + return ret;
> > +}
> > +
> > static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > {
> > struct kvm_sev_cmd sev_cmd;
> > @@ -7553,6 +7629,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > case KVM_SEV_RECEIVE_START:
> > r = sev_receive_start(kvm, &sev_cmd);
> > break;
> > + case KVM_SEV_RECEIVE_UPDATE_DATA:
> > + r = sev_receive_update_data(kvm, &sev_cmd);
> > + break;
> > default:
> > r = -EINVAL;
> > goto out;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 74764b9db5fa..4e80c57a3182 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1588,6 +1588,15 @@ struct kvm_sev_receive_start {
> > __u32 session_len;
> > };
> >
> > +struct kvm_sev_receive_update_data {
> > + __u64 hdr_uaddr;
> > + __u32 hdr_len;
> > + __u64 guest_uaddr;
> > + __u32 guest_len;
> > + __u64 trans_uaddr;
> > + __u32 trans_len;
> > +};
> > +
> > #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> > #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> > #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> > --
> > 2.17.1
> >
Otherwise looks fine to my eye.
Reviewed-by: Steve Rutherford <[email protected]>
On Thu, Apr 2, 2020 at 3:27 PM Krish Sadhukhan
<[email protected]> wrote:
>
>
> On 3/29/20 11:21 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > The command finalizes the guest receiving process and makes the SEV guest
> > ready for execution.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: "Radim Krčmář" <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > .../virt/kvm/amd-memory-encryption.rst | 8 +++++++
> > arch/x86/kvm/svm.c | 23 +++++++++++++++++++
> > 2 files changed, 31 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> > index 554aa33a99cc..93cd95d9a6c0 100644
> > --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> > +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> > @@ -375,6 +375,14 @@ Returns: 0 on success, -negative on error
> > __u32 trans_len;
> > };
> >
> > +15. KVM_SEV_RECEIVE_FINISH
> > +--------------------------
> > +
> > +After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be
> > +issued by the hypervisor to make the guest ready for execution.
> > +
> > +Returns: 0 on success, -negative on error
> > +
> > References
> > ==========
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 5fc5355536d7..7c2721e18b06 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7573,6 +7573,26 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > return ret;
> > }
> >
> > +static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > +{
> > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > + struct sev_data_receive_finish *data;
> > + int ret;
> > +
> > + if (!sev_guest(kvm))
> > + return -ENOTTY;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data)
> > + return -ENOMEM;
> > +
> > + data->handle = sev->handle;
> > + ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
> > +
> > + kfree(data);
> > + return ret;
> > +}
> > +
> > static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > {
> > struct kvm_sev_cmd sev_cmd;
> > @@ -7632,6 +7652,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > case KVM_SEV_RECEIVE_UPDATE_DATA:
> > r = sev_receive_update_data(kvm, &sev_cmd);
> > break;
> > + case KVM_SEV_RECEIVE_FINISH:
> > + r = sev_receive_finish(kvm, &sev_cmd);
> > + break;
> > default:
> > r = -EINVAL;
> > goto out;
> Reviewed-by: Krish Sadhukhan <[email protected]>
As to ENOTTY, the man page for ioctl(2) translates it as "The specified
request does not apply to the kind of object that the file descriptor
fd references", which seems appropriate here.
Reviewed-by: Steve Rutherford <[email protected]>
On Thu, Apr 2, 2020 at 4:56 PM Krish Sadhukhan
<[email protected]> wrote:
>
>
> On 3/29/20 11:21 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > KVM hypercall framework relies on alternative framework to patch the
> > VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> > apply_alternative()
>
> s/apply_alternative/apply_alternatives/
> > is called then it defaults to VMCALL. The approach
> > works fine on non-SEV guests. A VMCALL would cause #UD, and the hypervisor
> > will be able to decode the instruction and do the right thing. But
> > when SEV is active, guest memory is encrypted with the guest key and the
> > hypervisor will not be able to decode the instruction bytes.
> >
> > Add an SEV-specific hypercall3 that unconditionally uses VMMCALL. The hypercall
> > will be used by the SEV guest to notify the hypervisor of encrypted pages.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: "Radim Krčmář" <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > index 9b4df6eaa11a..6c09255633a4 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -84,6 +84,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
> > return ret;
> > }
> >
> > +static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
> > + unsigned long p2, unsigned long p3)
> > +{
> > + long ret;
> > +
> > + asm volatile("vmmcall"
> > + : "=a"(ret)
> > + : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
> > + : "memory");
> > + return ret;
> > +}
> > +
> > #ifdef CONFIG_KVM_GUEST
> > bool kvm_para_available(void);
> > unsigned int kvm_arch_para_features(void);
> Reviewed-by: Krish Sadhukhan <[email protected]>
Nit: I'd personally have named this kvm_vmmcall3, since it's about
invoking a particular instruction. The usage happens to be for SEV.
Reviewed-by: Steve Rutherford <[email protected]>
On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> Documentation/virt/kvm/hypercalls.rst | 15 +++++
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.c | 1 +
> arch/x86/kvm/x86.c | 6 ++
> include/uapi/linux/kvm_para.h | 1 +
> 6 files changed, 120 insertions(+)
>
> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> index dbaf207e560d..ff5287e68e81 100644
> --- a/Documentation/virt/kvm/hypercalls.rst
> +++ b/Documentation/virt/kvm/hypercalls.rst
> @@ -169,3 +169,18 @@ a0: destination APIC ID
>
> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> any of the IPI target vCPUs was preempted.
> +
> +
> +8. KVM_HC_PAGE_ENC_STATUS
> +-------------------------
> +:Architecture: x86
> +:Status: active
> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> +
> +a0: the guest physical address of the start page
> +a1: the number of pages
> +a2: encryption attribute
> +
> + Where:
> + * 1: Encryption attribute is set
> + * 0: Encryption attribute is cleared
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 98959e8cd448..90718fa3db47 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>
> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> + unsigned long sz, unsigned long mode);
Nit: spell out size instead of sz.
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 7c2721e18b06..1d8beaf1bceb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> int fd; /* SEV device fd */
> unsigned long pages_locked; /* Number of pages locked */
> struct list_head regions_list; /* List of registered regions */
> + unsigned long *page_enc_bmap;
> + unsigned long page_enc_bmap_size;
> };
>
> struct kvm_svm {
> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>
> sev_unbind_asid(kvm, sev->handle);
> sev_asid_free(sev->asid);
> +
> + kvfree(sev->page_enc_bmap);
> + sev->page_enc_bmap = NULL;
> }
>
> static void avic_vm_destroy(struct kvm *kvm)
> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + unsigned long *map;
> + unsigned long sz;
> +
> + if (sev->page_enc_bmap_size >= new_size)
> + return 0;
> +
> + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> +
> + map = vmalloc(sz);
> + if (!map) {
> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> + sz);
> + return -ENOMEM;
> + }
> +
> + /* mark the page encrypted (by default) */
> + memset(map, 0xff, sz);
> +
> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> + kvfree(sev->page_enc_bmap);
> +
> + sev->page_enc_bmap = map;
> + sev->page_enc_bmap_size = new_size;
> +
> + return 0;
> +}
> +
> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> + unsigned long npages, unsigned long enc)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + kvm_pfn_t pfn_start, pfn_end;
> + gfn_t gfn_start, gfn_end;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -EINVAL;
> +
> + if (!npages)
> + return 0;
> +
> + gfn_start = gpa_to_gfn(gpa);
> + gfn_end = gfn_start + npages;
> +
> + /* out-of-bounds access check */
> + if (gfn_end <= gfn_start)
> + return -EINVAL;
> +
> + /* let's make sure that the gpa exists in our memslot */
> + pfn_start = gfn_to_pfn(kvm, gfn_start);
> + pfn_end = gfn_to_pfn(kvm, gfn_end);
> +
> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> + /*
> + * Allow guest MMIO range(s) to be added
> + * to the page encryption bitmap.
> + */
> + return -EINVAL;
> + }
> +
> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> + /*
> + * Allow guest MMIO range(s) to be added
> + * to the page encryption bitmap.
> + */
> + return -EINVAL;
> + }
> +
> + mutex_lock(&kvm->lock);
> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> + if (ret)
> + goto unlock;
> +
> + if (enc)
> + __bitmap_set(sev->page_enc_bmap, gfn_start,
> + gfn_end - gfn_start);
> + else
> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> + gfn_end - gfn_start);
> +
> +unlock:
> + mutex_unlock(&kvm->lock);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>
> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> +
> + .page_enc_status_hc = svm_page_enc_status_hc,
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 079d9fbf278e..f68e76ee7f9c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> .nested_get_evmcs_version = NULL,
> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> + .page_enc_status_hc = NULL,
> };
>
> static void vmx_cleanup_l1d_flush(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf95c36cb4f4..68428eef2dde 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> kvm_sched_yield(vcpu->kvm, a0);
> ret = 0;
> break;
> + case KVM_HC_PAGE_ENC_STATUS:
> + ret = -KVM_ENOSYS;
> + if (kvm_x86_ops->page_enc_status_hc)
> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> + a0, a1, a2);
> + break;
> default:
> ret = -KVM_ENOSYS;
> break;
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..847b83b75dc8 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
> #define KVM_HC_CLOCK_PAIRING 9
> #define KVM_HC_SEND_IPI 10
> #define KVM_HC_SCHED_YIELD 11
> +#define KVM_HC_PAGE_ENC_STATUS 12
>
> /*
> * hypercalls use architecture specific
> --
> 2.17.1
>
I'm still not excited by the dynamic resizing. I believe the guest
hypercall can be called in atomic contexts, which makes me
particularly unexcited to see a potentially large vmalloc on the host
followed by filling the buffer. Particularly when the buffer might be
non-trivial in size (~1MB per 32GB, per some back of the envelope
math).
I'd like to see an enable cap for preallocating this. Yes, the first
call might not be the right value because of hotplug, but won't the
VMM know when hotplug is happening? If the VMM asks for the wrong
size, and does not update the size correctly before granting the VM
access to more RAM, that seems like the VMM's fault.
Hello Steve,
On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
>
> I'm still not excited by the dynamic resizing. I believe the guest
> hypercall can be called in atomic contexts, which makes me
> particularly unexcited to see a potentially large vmalloc on the host
> followed by filling the buffer. Particularly when the buffer might be
> non-trivial in size (~1MB per 32GB, per some back of the envelope
> math).
>
In practice, most of these hypercalls will happen during the boot
stage, when device-specific initialization is happening, so the
maximum page encryption bitmap size would typically be allocated
early enough.
In fact, initial hypercalls made by OVMF will probably allocate the
maximum page bitmap size even before the kernel comes up, especially
as they will be setting up page enc/dec status for MMIO, ROM, ACPI
regions, PCI device memory, etc., and most importantly for
"non-existent" high memory range (which will probably be the
maximum size page encryption bitmap allocated/resized).
Let me know if you have different thoughts on this?
> I'd like to see an enable cap for preallocating this. Yes, the first
> call might not be the right value because of hotplug, but won't the
> VMM know when hotplug is happening? If the VMM asks for the wrong
> size, and does not update the size correctly before granting the VM
> access to more RAM, that seems like the VMM's fault.
Thanks,
Ashish
On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <[email protected]> wrote:
>
> Hello Steve,
>
> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> > On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
> >
> > I'm still not excited by the dynamic resizing. I believe the guest
> > hypercall can be called in atomic contexts, which makes me
> > particularly unexcited to see a potentially large vmalloc on the host
> > followed by filling the buffer. Particularly when the buffer might be
> > non-trivial in size (~1MB per 32GB, per some back of the envelope
> > math).
> >
>
> In practice, most of these hypercalls will happen during the boot
> stage, when device-specific initialization is happening, so the
> maximum page encryption bitmap size would typically be allocated
> early enough.
>
> In fact, initial hypercalls made by OVMF will probably allocate the
> maximum page bitmap size even before the kernel comes up, especially
> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> regions, PCI device memory, etc., and most importantly for
> "non-existent" high memory range (which will probably be the
> maximum size page encryption bitmap allocated/resized).
>
> Let me know if you have different thoughts on this ?
Hi Ashish,
If this is not an issue in practice, we can just move past it. If we
are basically guaranteed that OVMF will trigger hypercalls that expand
the bitmap beyond the top of memory, then, yes, that should work. It
still leaves me slightly nervous that OVMF might regress, since it's
not obvious that making a hypercall beyond the top of memory would be
"required" for avoiding a somewhat indirectly related issue in guest
kernels.
Adding a kvm_enable_cap doesn't seem particularly complicated and
sidesteps all of these concerns, so I still prefer it. Caveat: I
haven't reviewed the patches about the feature bits yet, but the
enable cap would also make it possible for kernels that support live
migration to avoid advertising it if host userspace does not want it
advertised. This seems pretty important, since hosts that don't plan
to live migrate should have the ability to tell the guest to stop
making the hypercall.
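For illustration, the userspace side of that enable cap might look like
the following sketch. Everything here is assumed, not from the patches:
the capability number KVM_CAP_SEV_LIVE_MIGRATION is a hypothetical
placeholder, the struct is a local mirror of struct kvm_enable_cap, and
the helper name is made up.

```c
#include <assert.h>
#include <string.h>

/* Illustrative only: the capability number below is a placeholder (no
 * such constant exists upstream at this point), and the struct is a
 * local mirror of struct kvm_enable_cap. Userspace would pass this to
 * ioctl(vm_fd, KVM_ENABLE_CAP, &cap) to opt the VM in or out.
 */
#define KVM_CAP_SEV_LIVE_MIGRATION 1000	/* hypothetical number */

struct enable_cap {
	unsigned int cap;
	unsigned int flags;
	unsigned long long args[4];
};

static struct enable_cap sev_live_mig_cap(int enable)
{
	struct enable_cap c;

	memset(&c, 0, sizeof(c));
	c.cap = KVM_CAP_SEV_LIVE_MIGRATION;
	c.args[0] = enable;	/* 0: hide feature from guest, 1: advertise */
	return c;
}
```

With this shape, a host that never migrates simply never enables the
cap, and the guest never sees the feature bit.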
Thanks,
Steve
On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The ioctl can be used to set page encryption bitmap for an
> incoming guest.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: "Radim Krčmář" <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 22 +++++++++++++++++
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/svm.c | 42 +++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 12 ++++++++++
> include/uapi/linux/kvm.h | 1 +
> 5 files changed, 79 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8ad800ebb54f..4d1004a154f6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> is private then userspace need to use SEV migration commands to transmit
> the page.
>
> +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_SET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> + __u64 start_gfn;
> + __u64 num_pages;
> + union {
> + void __user *enc_bitmap; /* one bit per page */
> + __u64 padding2;
> + };
> +};
> +
> +During the guest live migration the outgoing guest exports its page encryption
> +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> +bitmap for an incoming guest.
>
> 5. The kvm_run structure
> ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 27e43e3ec9d8..d30f770aaaea 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> unsigned long sz, unsigned long mode);
> int (*get_page_enc_bitmap)(struct kvm *kvm,
> struct kvm_page_enc_bitmap *bmap);
> + int (*set_page_enc_bitmap)(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index bae783cd396a..313343a43045 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> return ret;
> }
>
> +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> + struct kvm_page_enc_bitmap *bmap)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + unsigned long gfn_start, gfn_end;
> + unsigned long *bitmap;
> + unsigned long sz, i;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + gfn_start = bmap->start_gfn;
> + gfn_end = gfn_start + bmap->num_pages;
> +
> + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> + bitmap = kmalloc(sz, GFP_KERNEL);
> + if (!bitmap)
> + return -ENOMEM;
> +
> + ret = -EFAULT;
> + if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> + goto out;
> +
> + mutex_lock(&kvm->lock);
> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
I realize now that usermode could use this for initializing the
minimum size of the enc bitmap, which probably solves my issue from
the other thread.
> + if (ret)
> + goto unlock;
> +
> + i = gfn_start;
> + for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> + clear_bit(i + gfn_start, sev->page_enc_bmap);
This API seems a bit strange, since it can only clear bits. I would
expect "set" to force the values to match the values passed down,
instead of only ensuring that cleared bits in the input are also
cleared in the kernel.
This should copy the values from userspace (and fix up the ends since
byte alignment makes that complicated), instead of iterating in this
way.
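A userspace model of the masked copy being suggested (illustrative
only, not the kernel implementation, which would use the kernel's
bitmap helpers): interior bytes get a 0xff mask, i.e. a straight byte
copy, and only the two partially covered end bytes need the fixup.

```c
#include <assert.h>
#include <limits.h>

/* mask with bits [from, to) set; 0 <= from < to <= CHAR_BIT */
static unsigned char range_mask(unsigned int from, unsigned int to)
{
	return (unsigned char)(((1u << to) - 1) & ~((1u << from) - 1));
}

/* Copy the bit range [start, start + nbits) from src into dst while
 * preserving dst bits outside the range (assumes nbits may be zero,
 * and CHAR_BIT == 8 for the byte math).
 */
static void copy_bit_range(unsigned char *dst, const unsigned char *src,
			   unsigned long start, unsigned long nbits)
{
	unsigned long end = start + nbits;
	unsigned long first = start / CHAR_BIT;
	unsigned long last;

	if (!nbits)
		return;
	last = (end - 1) / CHAR_BIT;

	for (unsigned long b = first; b <= last; b++) {
		unsigned int from = (b == first) ? start % CHAR_BIT : 0;
		unsigned int to = (b == last && end % CHAR_BIT)
					? end % CHAR_BIT : CHAR_BIT;
		unsigned char m = range_mask(from, to);

		/* full interior bytes: m == 0xff, a plain copy */
		dst[b] = (dst[b] & ~m) | (src[b] & m);
	}
}
```

The point is that "set" then forces the kernel bits to match the
userspace bits in both directions, instead of only clearing.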
> +
> + ret = 0;
> +unlock:
> + mutex_unlock(&kvm->lock);
> +out:
> + kfree(bitmap);
> + return ret;
> +}
> +
> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>
> .page_enc_status_hc = svm_page_enc_status_hc,
> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> + .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3c3fea4e20b5..05e953b2ec61 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> break;
> }
> + case KVM_SET_PAGE_ENC_BITMAP: {
> + struct kvm_page_enc_bitmap bitmap;
> +
> + r = -EFAULT;
> + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> + goto out;
> +
> + r = -ENOTTY;
> + if (kvm_x86_ops->set_page_enc_bitmap)
> + r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> + break;
> + }
> default:
> r = -ENOTTY;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index db1ebf85e177..b4b01d47e568 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1489,6 +1489,7 @@ struct kvm_enc_region {
> #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
>
> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +#define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
>
> /* Secure Encrypted Virtualization command */
> enum sev_cmd_id {
> --
> 2.17.1
>
On 4/7/20 7:01 PM, Steve Rutherford wrote:
> On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <[email protected]> wrote:
>> Hello Steve,
>>
>> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
>>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
>>>> From: Brijesh Singh <[email protected]>
>>>>
>>>> This hypercall is used by the SEV guest to notify a change in the page
>>>> encryption status to the hypervisor. The hypercall should be invoked
>>>> only when the encryption attribute is changed from encrypted -> decrypted
>>>> and vice versa. By default all guest pages are considered encrypted.
>>>>
>>>> Cc: Thomas Gleixner <[email protected]>
>>>> Cc: Ingo Molnar <[email protected]>
>>>> Cc: "H. Peter Anvin" <[email protected]>
>>>> Cc: Paolo Bonzini <[email protected]>
>>>> Cc: "Radim Krčmář" <[email protected]>
>>>> Cc: Joerg Roedel <[email protected]>
>>>> Cc: Borislav Petkov <[email protected]>
>>>> Cc: Tom Lendacky <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>> ---
>>>> Documentation/virt/kvm/hypercalls.rst | 15 +++++
>>>> arch/x86/include/asm/kvm_host.h | 2 +
>>>> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
>>>> arch/x86/kvm/vmx/vmx.c | 1 +
>>>> arch/x86/kvm/x86.c | 6 ++
>>>> include/uapi/linux/kvm_para.h | 1 +
>>>> 6 files changed, 120 insertions(+)
>>>>
>>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
>>>> index dbaf207e560d..ff5287e68e81 100644
>>>> --- a/Documentation/virt/kvm/hypercalls.rst
>>>> +++ b/Documentation/virt/kvm/hypercalls.rst
>>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
>>>>
>>>> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>>>> any of the IPI target vCPUs was preempted.
>>>> +
>>>> +
>>>> +8. KVM_HC_PAGE_ENC_STATUS
>>>> +-------------------------
>>>> +:Architecture: x86
>>>> +:Status: active
>>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
>>>> +
>>>> +a0: the guest physical address of the start page
>>>> +a1: the number of pages
>>>> +a2: encryption attribute
>>>> +
>>>> + Where:
>>>> + * 1: Encryption attribute is set
>>>> + * 0: Encryption attribute is cleared
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 98959e8cd448..90718fa3db47 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>>>>
>>>> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>>>> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>>>> + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>>>> + unsigned long sz, unsigned long mode);
>>> Nit: spell out size instead of sz.
>>>> };
>>>>
>>>> struct kvm_arch_async_pf {
>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>> index 7c2721e18b06..1d8beaf1bceb 100644
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
>>>> int fd; /* SEV device fd */
>>>> unsigned long pages_locked; /* Number of pages locked */
>>>> struct list_head regions_list; /* List of registered regions */
>>>> + unsigned long *page_enc_bmap;
>>>> + unsigned long page_enc_bmap_size;
>>>> };
>>>>
>>>> struct kvm_svm {
>>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>>>>
>>>> sev_unbind_asid(kvm, sev->handle);
>>>> sev_asid_free(sev->asid);
>>>> +
>>>> + kvfree(sev->page_enc_bmap);
>>>> + sev->page_enc_bmap = NULL;
>>>> }
>>>>
>>>> static void avic_vm_destroy(struct kvm *kvm)
>>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>> return ret;
>>>> }
>>>>
>>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
>>>> +{
>>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> + unsigned long *map;
>>>> + unsigned long sz;
>>>> +
>>>> + if (sev->page_enc_bmap_size >= new_size)
>>>> + return 0;
>>>> +
>>>> + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
>>>> +
>>>> + map = vmalloc(sz);
>>>> + if (!map) {
>>>> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
>>>> + sz);
>>>> + return -ENOMEM;
>>>> + }
>>>> +
>>>> + /* mark the page encrypted (by default) */
>>>> + memset(map, 0xff, sz);
>>>> +
>>>> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
>>>> + kvfree(sev->page_enc_bmap);
>>>> +
>>>> + sev->page_enc_bmap = map;
>>>> + sev->page_enc_bmap_size = new_size;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>>>> + unsigned long npages, unsigned long enc)
>>>> +{
>>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> + kvm_pfn_t pfn_start, pfn_end;
>>>> + gfn_t gfn_start, gfn_end;
>>>> + int ret;
>>>> +
>>>> + if (!sev_guest(kvm))
>>>> + return -EINVAL;
>>>> +
>>>> + if (!npages)
>>>> + return 0;
>>>> +
>>>> + gfn_start = gpa_to_gfn(gpa);
>>>> + gfn_end = gfn_start + npages;
>>>> +
>>>> + /* out of bound access error check */
>>>> + if (gfn_end <= gfn_start)
>>>> + return -EINVAL;
>>>> +
>>>> + /* lets make sure that gpa exist in our memslot */
>>>> + pfn_start = gfn_to_pfn(kvm, gfn_start);
>>>> + pfn_end = gfn_to_pfn(kvm, gfn_end);
>>>> +
>>>> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
>>>> + /*
>>>> + * Allow guest MMIO range(s) to be added
>>>> + * to the page encryption bitmap.
>>>> + */
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
>>>> + /*
>>>> + * Allow guest MMIO range(s) to be added
>>>> + * to the page encryption bitmap.
>>>> + */
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + mutex_lock(&kvm->lock);
>>>> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
>>>> + if (ret)
>>>> + goto unlock;
>>>> +
>>>> + if (enc)
>>>> + __bitmap_set(sev->page_enc_bmap, gfn_start,
>>>> + gfn_end - gfn_start);
>>>> + else
>>>> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
>>>> + gfn_end - gfn_start);
>>>> +
>>>> +unlock:
>>>> + mutex_unlock(&kvm->lock);
>>>> + return ret;
>>>> +}
>>>> +
>>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>> {
>>>> struct kvm_sev_cmd sev_cmd;
>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>>>>
>>>> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>>>> +
>>>> + .page_enc_status_hc = svm_page_enc_status_hc,
>>>> };
>>>>
>>>> static int __init svm_init(void)
>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>> index 079d9fbf278e..f68e76ee7f9c 100644
>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>>> .nested_get_evmcs_version = NULL,
>>>> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>>>> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>>>> + .page_enc_status_hc = NULL,
>>>> };
>>>>
>>>> static void vmx_cleanup_l1d_flush(void)
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index cf95c36cb4f4..68428eef2dde 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>>>> kvm_sched_yield(vcpu->kvm, a0);
>>>> ret = 0;
>>>> break;
>>>> + case KVM_HC_PAGE_ENC_STATUS:
>>>> + ret = -KVM_ENOSYS;
>>>> + if (kvm_x86_ops->page_enc_status_hc)
>>>> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
>>>> + a0, a1, a2);
>>>> + break;
>>>> default:
>>>> ret = -KVM_ENOSYS;
>>>> break;
>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
>>>> index 8b86609849b9..847b83b75dc8 100644
>>>> --- a/include/uapi/linux/kvm_para.h
>>>> +++ b/include/uapi/linux/kvm_para.h
>>>> @@ -29,6 +29,7 @@
>>>> #define KVM_HC_CLOCK_PAIRING 9
>>>> #define KVM_HC_SEND_IPI 10
>>>> #define KVM_HC_SCHED_YIELD 11
>>>> +#define KVM_HC_PAGE_ENC_STATUS 12
>>>>
>>>> /*
>>>> * hypercalls use architecture specific
>>>> --
>>>> 2.17.1
>>>>
>>> I'm still not excited by the dynamic resizing. I believe the guest
>>> hypercall can be called in atomic contexts, which makes me
>>> particularly unexcited to see a potentially large vmalloc on the host
>>> followed by filling the buffer. Particularly when the buffer might be
>>> non-trivial in size (~1MB per 32GB, per some back of the envelope
>>> math).
>>>
>> I think looking at more practical situations, most hypercalls will
>> happen during the boot stage, when device specific initializations are
>> happening, so typically the maximum page encryption bitmap size would
>> be allocated early enough.
>>
>> In fact, initial hypercalls made by OVMF will probably allocate the
>> maximum page bitmap size even before the kernel comes up, especially
>> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
>> regions, PCI device memory, etc., and most importantly for
>> "non-existent" high memory range (which will probably be the
>> maximum size page encryption bitmap allocated/resized).
>>
>> Let me know if you have different thoughts on this ?
> Hi Ashish,
>
> If this is not an issue in practice, we can just move past this. If we
> are basically guaranteed that OVMF will trigger hypercalls that expand
> the bitmap beyond the top of memory, then, yes, that should work. That
> leaves me slightly nervous that OVMF might regress since it's not
> obvious that calling a hypercall beyond the top of memory would be
> "required" for avoiding a somewhat indirectly related issue in guest
> kernels.
If possible we should try to avoid growing/shrinking the bitmap.
Today OVMF may not be accessing memory beyond the top of RAM, but a
malicious guest could send down a hypercall that triggers a huge memory
allocation on the host side and may eventually cause denial of service
for others.
I am in favor of finding some solution to handle this case. How
about Steve's suggestion of the VMM making a call down to the kernel to
tell it how big the bitmap should be? Initially it would be equal to
the guest RAM, and if the VMM ever expands guest memory it can send
down another notification to grow the bitmap.
Optionally, instead of adding a new ioctl, I was wondering if we can
extend kvm_arch_prepare_memory_region() with an SVM-specific x86_ops
callback which can read the userspace-provided memory region, calculate
the amount of guest RAM managed by KVM, and grow/shrink the bitmap
based on that information. I have not looked deeply enough to see if
it's doable, but if it can work then we can avoid adding yet another
ioctl.
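For what it's worth, the sizing described here is cheap to compute. A
toy userspace model (illustrative names, not kernel code) of deriving
the bitmap size from the registered memory regions:

```c
#include <assert.h>

/* Toy model: given the userspace-registered memory regions, compute
 * the highest guest frame number so the encryption bitmap can be
 * sized once up front instead of being grown from inside the
 * hypercall. Struct and function names are made up for illustration.
 */
struct region {
	unsigned long long gpa;		/* guest physical base address */
	unsigned long long size;	/* bytes */
};

static unsigned long long max_gfn(const struct region *r, int n)
{
	unsigned long long max = 0;

	for (int i = 0; i < n; i++) {
		unsigned long long end = (r[i].gpa + r[i].size) >> 12; /* 4K pages */

		if (end > max)
			max = end;
	}
	return max;
}

/* one bit per page, rounded up to 64-bit words */
static unsigned long long bitmap_bytes(unsigned long long gfns)
{
	return ((gfns + 63) / 64) * 8;
}
```

For a 32GB guest this gives 8388608 gfns and a 1MiB bitmap, matching
the back-of-the-envelope math earlier in the thread.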
>
> Adding a kvm_enable_cap doesn't seem particularly complicated and side
> steps all of these concerns, so I still prefer it. Caveat, haven't
> reviewed the patches about the feature bits yet: the enable cap would
> also make it possible for kernels that support live migration to avoid
> advertising live migration if host usermode does not want it to be
> advertised. This seems pretty important, since hosts that don't plan
> to live migrate should have the ability to tell the guest to stop
> calling.
>
> Thanks,
> Steve
On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <[email protected]> wrote:
>
>
> On 4/7/20 7:01 PM, Steve Rutherford wrote:
> > On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <[email protected]> wrote:
> >> Hello Steve,
> >>
> >> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> >>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
> >>>> From: Brijesh Singh <[email protected]>
> >>>>
> >>>> This hypercall is used by the SEV guest to notify a change in the page
> >>>> encryption status to the hypervisor. The hypercall should be invoked
> >>>> only when the encryption attribute is changed from encrypted -> decrypted
> >>>> and vice versa. By default all guest pages are considered encrypted.
> >>>>
> >>>> Cc: Thomas Gleixner <[email protected]>
> >>>> Cc: Ingo Molnar <[email protected]>
> >>>> Cc: "H. Peter Anvin" <[email protected]>
> >>>> Cc: Paolo Bonzini <[email protected]>
> >>>> Cc: "Radim Krčmář" <[email protected]>
> >>>> Cc: Joerg Roedel <[email protected]>
> >>>> Cc: Borislav Petkov <[email protected]>
> >>>> Cc: Tom Lendacky <[email protected]>
> >>>> Cc: [email protected]
> >>>> Cc: [email protected]
> >>>> Cc: [email protected]
> >>>> Signed-off-by: Brijesh Singh <[email protected]>
> >>>> Signed-off-by: Ashish Kalra <[email protected]>
> >>>> ---
> >>>> Documentation/virt/kvm/hypercalls.rst | 15 +++++
> >>>> arch/x86/include/asm/kvm_host.h | 2 +
> >>>> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
> >>>> arch/x86/kvm/vmx/vmx.c | 1 +
> >>>> arch/x86/kvm/x86.c | 6 ++
> >>>> include/uapi/linux/kvm_para.h | 1 +
> >>>> 6 files changed, 120 insertions(+)
> >>>>
> >>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> >>>> index dbaf207e560d..ff5287e68e81 100644
> >>>> --- a/Documentation/virt/kvm/hypercalls.rst
> >>>> +++ b/Documentation/virt/kvm/hypercalls.rst
> >>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
> >>>>
> >>>> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> >>>> any of the IPI target vCPUs was preempted.
> >>>> +
> >>>> +
> >>>> +8. KVM_HC_PAGE_ENC_STATUS
> >>>> +-------------------------
> >>>> +:Architecture: x86
> >>>> +:Status: active
> >>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> >>>> +
> >>>> +a0: the guest physical address of the start page
> >>>> +a1: the number of pages
> >>>> +a2: encryption attribute
> >>>> +
> >>>> + Where:
> >>>> + * 1: Encryption attribute is set
> >>>> + * 0: Encryption attribute is cleared
> >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>>> index 98959e8cd448..90718fa3db47 100644
> >>>> --- a/arch/x86/include/asm/kvm_host.h
> >>>> +++ b/arch/x86/include/asm/kvm_host.h
> >>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> >>>>
> >>>> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> >>>> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> >>>> + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> >>>> + unsigned long sz, unsigned long mode);
> >>> Nit: spell out size instead of sz.
> >>>> };
> >>>>
> >>>> struct kvm_arch_async_pf {
> >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>>> index 7c2721e18b06..1d8beaf1bceb 100644
> >>>> --- a/arch/x86/kvm/svm.c
> >>>> +++ b/arch/x86/kvm/svm.c
> >>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> >>>> int fd; /* SEV device fd */
> >>>> unsigned long pages_locked; /* Number of pages locked */
> >>>> struct list_head regions_list; /* List of registered regions */
> >>>> + unsigned long *page_enc_bmap;
> >>>> + unsigned long page_enc_bmap_size;
> >>>> };
> >>>>
> >>>> struct kvm_svm {
> >>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> >>>>
> >>>> sev_unbind_asid(kvm, sev->handle);
> >>>> sev_asid_free(sev->asid);
> >>>> +
> >>>> + kvfree(sev->page_enc_bmap);
> >>>> + sev->page_enc_bmap = NULL;
> >>>> }
> >>>>
> >>>> static void avic_vm_destroy(struct kvm *kvm)
> >>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>>> return ret;
> >>>> }
> >>>>
> >>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> >>>> +{
> >>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>> + unsigned long *map;
> >>>> + unsigned long sz;
> >>>> +
> >>>> + if (sev->page_enc_bmap_size >= new_size)
> >>>> + return 0;
> >>>> +
> >>>> + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> >>>> +
> >>>> + map = vmalloc(sz);
> >>>> + if (!map) {
> >>>> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> >>>> + sz);
> >>>> + return -ENOMEM;
> >>>> + }
> >>>> +
> >>>> + /* mark the page encrypted (by default) */
> >>>> + memset(map, 0xff, sz);
> >>>> +
> >>>> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> >>>> + kvfree(sev->page_enc_bmap);
> >>>> +
> >>>> + sev->page_enc_bmap = map;
> >>>> + sev->page_enc_bmap_size = new_size;
> >>>> +
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> >>>> + unsigned long npages, unsigned long enc)
> >>>> +{
> >>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>> + kvm_pfn_t pfn_start, pfn_end;
> >>>> + gfn_t gfn_start, gfn_end;
> >>>> + int ret;
> >>>> +
> >>>> + if (!sev_guest(kvm))
> >>>> + return -EINVAL;
> >>>> +
> >>>> + if (!npages)
> >>>> + return 0;
> >>>> +
> >>>> + gfn_start = gpa_to_gfn(gpa);
> >>>> + gfn_end = gfn_start + npages;
> >>>> +
> >>>> + /* out of bound access error check */
> >>>> + if (gfn_end <= gfn_start)
> >>>> + return -EINVAL;
> >>>> +
> >>>> + /* lets make sure that gpa exist in our memslot */
> >>>> + pfn_start = gfn_to_pfn(kvm, gfn_start);
> >>>> + pfn_end = gfn_to_pfn(kvm, gfn_end);
> >>>> +
> >>>> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> >>>> + /*
> >>>> + * Allow guest MMIO range(s) to be added
> >>>> + * to the page encryption bitmap.
> >>>> + */
> >>>> + return -EINVAL;
> >>>> + }
> >>>> +
> >>>> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> >>>> + /*
> >>>> + * Allow guest MMIO range(s) to be added
> >>>> + * to the page encryption bitmap.
> >>>> + */
> >>>> + return -EINVAL;
> >>>> + }
> >>>> +
> >>>> + mutex_lock(&kvm->lock);
> >>>> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> >>>> + if (ret)
> >>>> + goto unlock;
> >>>> +
> >>>> + if (enc)
> >>>> + __bitmap_set(sev->page_enc_bmap, gfn_start,
> >>>> + gfn_end - gfn_start);
> >>>> + else
> >>>> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> >>>> + gfn_end - gfn_start);
> >>>> +
> >>>> +unlock:
> >>>> + mutex_unlock(&kvm->lock);
> >>>> + return ret;
> >>>> +}
> >>>> +
> >>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>>> {
> >>>> struct kvm_sev_cmd sev_cmd;
> >>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >>>> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> >>>>
> >>>> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> >>>> +
> >>>> + .page_enc_status_hc = svm_page_enc_status_hc,
> >>>> };
> >>>>
> >>>> static int __init svm_init(void)
> >>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >>>> index 079d9fbf278e..f68e76ee7f9c 100644
> >>>> --- a/arch/x86/kvm/vmx/vmx.c
> >>>> +++ b/arch/x86/kvm/vmx/vmx.c
> >>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >>>> .nested_get_evmcs_version = NULL,
> >>>> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> >>>> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> >>>> + .page_enc_status_hc = NULL,
> >>>> };
> >>>>
> >>>> static void vmx_cleanup_l1d_flush(void)
> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>> index cf95c36cb4f4..68428eef2dde 100644
> >>>> --- a/arch/x86/kvm/x86.c
> >>>> +++ b/arch/x86/kvm/x86.c
> >>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >>>> kvm_sched_yield(vcpu->kvm, a0);
> >>>> ret = 0;
> >>>> break;
> >>>> + case KVM_HC_PAGE_ENC_STATUS:
> >>>> + ret = -KVM_ENOSYS;
> >>>> + if (kvm_x86_ops->page_enc_status_hc)
> >>>> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> >>>> + a0, a1, a2);
> >>>> + break;
> >>>> default:
> >>>> ret = -KVM_ENOSYS;
> >>>> break;
> >>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> >>>> index 8b86609849b9..847b83b75dc8 100644
> >>>> --- a/include/uapi/linux/kvm_para.h
> >>>> +++ b/include/uapi/linux/kvm_para.h
> >>>> @@ -29,6 +29,7 @@
> >>>> #define KVM_HC_CLOCK_PAIRING 9
> >>>> #define KVM_HC_SEND_IPI 10
> >>>> #define KVM_HC_SCHED_YIELD 11
> >>>> +#define KVM_HC_PAGE_ENC_STATUS 12
> >>>>
> >>>> /*
> >>>> * hypercalls use architecture specific
> >>>> --
> >>>> 2.17.1
> >>>>
> >>> I'm still not excited by the dynamic resizing. I believe the guest
> >>> hypercall can be called in atomic contexts, which makes me
> >>> particularly unexcited to see a potentially large vmalloc on the host
> >>> followed by filling the buffer. Particularly when the buffer might be
> >>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> >>> math).
> >>>
> >> I think looking at more practical situations, most hypercalls will
> >> happen during the boot stage, when device specific initializations are
> >> happening, so typically the maximum page encryption bitmap size would
> >> be allocated early enough.
> >>
> >> In fact, initial hypercalls made by OVMF will probably allocate the
> >> maximum page bitmap size even before the kernel comes up, especially
> >> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> >> regions, PCI device memory, etc., and most importantly for
> >> "non-existent" high memory range (which will probably be the
> >> maximum size page encryption bitmap allocated/resized).
> >>
> >> Let me know if you have different thoughts on this ?
> > Hi Ashish,
> >
> > If this is not an issue in practice, we can just move past this. If we
> > are basically guaranteed that OVMF will trigger hypercalls that expand
> > the bitmap beyond the top of memory, then, yes, that should work. That
> > leaves me slightly nervous that OVMF might regress since it's not
> > obvious that calling a hypercall beyond the top of memory would be
> > "required" for avoiding a somewhat indirectly related issue in guest
> > kernels.
>
>
> If possible then we should try to avoid growing/shrinking the bitmap .
> Today OVMF may not be accessing beyond memory but a malicious guest
> could send a hypercall down which can trigger a huge memory allocation
> on the host side and may eventually cause denial of service for other.
Nice catch! Was just writing up an email about this.
> I am in favor if we can find some solution to handle this case. How
> about Steve's suggestion about VMM making a call down to the kernel to
> tell how big the bitmap should be? Initially it should be equal to the
> guest RAM and if VMM ever did the memory expansion then it can send down
> another notification to increase the bitmap ?
>
> Optionally, instead of adding a new ioctl, I was wondering if we can
> extend the kvm_arch_prepare_memory_region() to make svm specific x86_ops
> which can take read the userspace provided memory region and calculate
> the amount of guest RAM managed by the KVM and grow/shrink the bitmap
> based on that information. I have not looked deep enough to see if its
> doable but if it can work then we can avoid adding yet another ioctl.
We also have the set-bitmap ioctl in a later patch in this series. We
could use that ioctl for initialization as well (it's a little
excessive for initialization, since there would be an additional
ephemeral allocation and a few extra buffer copies, but that's
probably fine). An enable_cap has the added benefit of probably being
necessary anyway, so usermode can disable the migration feature flag.
In general, userspace is going to have to be in direct control of the
buffer and its size.
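As a rough sketch of that initialization path, userspace could build an
"all pages encrypted" bitmap covering guest RAM and hand it to
KVM_SET_PAGE_ENC_BITMAP at VM setup to size the kernel-side bitmap up
front. The struct below is a local mirror of kvm_page_enc_bitmap, the
helper name is made up, and the ioctl call itself is elided:

```c
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct kvm_page_enc_bitmap for illustration. */
struct page_enc_bitmap {
	unsigned long long start_gfn;
	unsigned long long num_pages;
	void *enc_bitmap;	/* one bit per page */
};

/* Hypothetical helper: allocate a bitmap with every bit set, i.e. all
 * pages encrypted, which is the SEV default state.
 */
static struct page_enc_bitmap make_initial_bitmap(unsigned long long ram_bytes)
{
	struct page_enc_bitmap b = { .start_gfn = 0 };
	size_t bytes;

	b.num_pages = ram_bytes >> 12;		/* 4K pages */
	bytes = (b.num_pages + 7) / 8;
	b.enc_bitmap = malloc(bytes);
	if (b.enc_bitmap)
		memset(b.enc_bitmap, 0xff, bytes);	/* encrypted by default */
	return b;
}
```

The same structure then doubles as the export/import vehicle during
migration, so no extra uAPI is needed for the initialization case.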
Hello Steve, Brijesh,
On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
> On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <[email protected]> wrote:
> >
> >
> > On 4/7/20 7:01 PM, Steve Rutherford wrote:
> > > On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <[email protected]> wrote:
> > >> Hello Steve,
> > >>
> > >> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> > >>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
> > >>>> From: Brijesh Singh <[email protected]>
> > >>>>
> > >>>> This hypercall is used by the SEV guest to notify a change in the page
> > >>>> encryption status to the hypervisor. The hypercall should be invoked
> > >>>> only when the encryption attribute is changed from encrypted -> decrypted
> > >>>> and vice versa. By default all guest pages are considered encrypted.
> > >>>>
> > >>>> Cc: Thomas Gleixner <[email protected]>
> > >>>> Cc: Ingo Molnar <[email protected]>
> > >>>> Cc: "H. Peter Anvin" <[email protected]>
> > >>>> Cc: Paolo Bonzini <[email protected]>
> > >>>> Cc: "Radim Krčmář" <[email protected]>
> > >>>> Cc: Joerg Roedel <[email protected]>
> > >>>> Cc: Borislav Petkov <[email protected]>
> > >>>> Cc: Tom Lendacky <[email protected]>
> > >>>> Cc: [email protected]
> > >>>> Cc: [email protected]
> > >>>> Cc: [email protected]
> > >>>> Signed-off-by: Brijesh Singh <[email protected]>
> > >>>> Signed-off-by: Ashish Kalra <[email protected]>
> > >>>> ---
> > >>>> Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > >>>> arch/x86/include/asm/kvm_host.h | 2 +
> > >>>> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
> > >>>> arch/x86/kvm/vmx/vmx.c | 1 +
> > >>>> arch/x86/kvm/x86.c | 6 ++
> > >>>> include/uapi/linux/kvm_para.h | 1 +
> > >>>> 6 files changed, 120 insertions(+)
> > >>>>
> > >>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > >>>> index dbaf207e560d..ff5287e68e81 100644
> > >>>> --- a/Documentation/virt/kvm/hypercalls.rst
> > >>>> +++ b/Documentation/virt/kvm/hypercalls.rst
> > >>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
> > >>>>
> > >>>> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > >>>> any of the IPI target vCPUs was preempted.
> > >>>> +
> > >>>> +
> > >>>> +8. KVM_HC_PAGE_ENC_STATUS
> > >>>> +-------------------------
> > >>>> +:Architecture: x86
> > >>>> +:Status: active
> > >>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > >>>> +
> > >>>> +a0: the guest physical address of the start page
> > >>>> +a1: the number of pages
> > >>>> +a2: encryption attribute
> > >>>> +
> > >>>> + Where:
> > >>>> + * 1: Encryption attribute is set
> > >>>> + * 0: Encryption attribute is cleared
> > >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > >>>> index 98959e8cd448..90718fa3db47 100644
> > >>>> --- a/arch/x86/include/asm/kvm_host.h
> > >>>> +++ b/arch/x86/include/asm/kvm_host.h
> > >>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> > >>>>
> > >>>> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> > >>>> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > >>>> + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > >>>> + unsigned long sz, unsigned long mode);
> > >>> Nit: spell out size instead of sz.
> > >>>> };
> > >>>>
> > >>>> struct kvm_arch_async_pf {
> > >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > >>>> index 7c2721e18b06..1d8beaf1bceb 100644
> > >>>> --- a/arch/x86/kvm/svm.c
> > >>>> +++ b/arch/x86/kvm/svm.c
> > >>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> > >>>> int fd; /* SEV device fd */
> > >>>> unsigned long pages_locked; /* Number of pages locked */
> > >>>> struct list_head regions_list; /* List of registered regions */
> > >>>> + unsigned long *page_enc_bmap;
> > >>>> + unsigned long page_enc_bmap_size;
> > >>>> };
> > >>>>
> > >>>> struct kvm_svm {
> > >>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> > >>>>
> > >>>> sev_unbind_asid(kvm, sev->handle);
> > >>>> sev_asid_free(sev->asid);
> > >>>> +
> > >>>> + kvfree(sev->page_enc_bmap);
> > >>>> + sev->page_enc_bmap = NULL;
> > >>>> }
> > >>>>
> > >>>> static void avic_vm_destroy(struct kvm *kvm)
> > >>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > >>>> return ret;
> > >>>> }
> > >>>>
> > >>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > >>>> +{
> > >>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > >>>> + unsigned long *map;
> > >>>> + unsigned long sz;
> > >>>> +
> > >>>> + if (sev->page_enc_bmap_size >= new_size)
> > >>>> + return 0;
> > >>>> +
> > >>>> + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > >>>> +
> > >>>> + map = vmalloc(sz);
> > >>>> + if (!map) {
> > >>>> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > >>>> + sz);
> > >>>> + return -ENOMEM;
> > >>>> + }
> > >>>> +
> > >>>> + /* mark the page encrypted (by default) */
> > >>>> + memset(map, 0xff, sz);
> > >>>> +
> > >>>> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > >>>> + kvfree(sev->page_enc_bmap);
> > >>>> +
> > >>>> + sev->page_enc_bmap = map;
> > >>>> + sev->page_enc_bmap_size = new_size;
> > >>>> +
> > >>>> + return 0;
> > >>>> +}
> > >>>> +
> > >>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > >>>> + unsigned long npages, unsigned long enc)
> > >>>> +{
> > >>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > >>>> + kvm_pfn_t pfn_start, pfn_end;
> > >>>> + gfn_t gfn_start, gfn_end;
> > >>>> + int ret;
> > >>>> +
> > >>>> + if (!sev_guest(kvm))
> > >>>> + return -EINVAL;
> > >>>> +
> > >>>> + if (!npages)
> > >>>> + return 0;
> > >>>> +
> > >>>> + gfn_start = gpa_to_gfn(gpa);
> > >>>> + gfn_end = gfn_start + npages;
> > >>>> +
> > >>>> + /* out of bound access error check */
> > >>>> + if (gfn_end <= gfn_start)
> > >>>> + return -EINVAL;
> > >>>> +
> > >>>> + /* let's make sure that the gpa exists in our memslot */
> > >>>> + pfn_start = gfn_to_pfn(kvm, gfn_start);
> > >>>> + pfn_end = gfn_to_pfn(kvm, gfn_end);
> > >>>> +
> > >>>> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > >>>> + /*
> > >>>> + * Allow guest MMIO range(s) to be added
> > >>>> + * to the page encryption bitmap.
> > >>>> + */
> > >>>> + return -EINVAL;
> > >>>> + }
> > >>>> +
> > >>>> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > >>>> + /*
> > >>>> + * Allow guest MMIO range(s) to be added
> > >>>> + * to the page encryption bitmap.
> > >>>> + */
> > >>>> + return -EINVAL;
> > >>>> + }
> > >>>> +
> > >>>> + mutex_lock(&kvm->lock);
> > >>>> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > >>>> + if (ret)
> > >>>> + goto unlock;
> > >>>> +
> > >>>> + if (enc)
> > >>>> + __bitmap_set(sev->page_enc_bmap, gfn_start,
> > >>>> + gfn_end - gfn_start);
> > >>>> + else
> > >>>> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > >>>> + gfn_end - gfn_start);
> > >>>> +
> > >>>> +unlock:
> > >>>> + mutex_unlock(&kvm->lock);
> > >>>> + return ret;
> > >>>> +}
> > >>>> +
> > >>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > >>>> {
> > >>>> struct kvm_sev_cmd sev_cmd;
> > >>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > >>>> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > >>>>
> > >>>> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > >>>> +
> > >>>> + .page_enc_status_hc = svm_page_enc_status_hc,
> > >>>> };
> > >>>>
> > >>>> static int __init svm_init(void)
> > >>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > >>>> index 079d9fbf278e..f68e76ee7f9c 100644
> > >>>> --- a/arch/x86/kvm/vmx/vmx.c
> > >>>> +++ b/arch/x86/kvm/vmx/vmx.c
> > >>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > >>>> .nested_get_evmcs_version = NULL,
> > >>>> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > >>>> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > >>>> + .page_enc_status_hc = NULL,
> > >>>> };
> > >>>>
> > >>>> static void vmx_cleanup_l1d_flush(void)
> > >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > >>>> index cf95c36cb4f4..68428eef2dde 100644
> > >>>> --- a/arch/x86/kvm/x86.c
> > >>>> +++ b/arch/x86/kvm/x86.c
> > >>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > >>>> kvm_sched_yield(vcpu->kvm, a0);
> > >>>> ret = 0;
> > >>>> break;
> > >>>> + case KVM_HC_PAGE_ENC_STATUS:
> > >>>> + ret = -KVM_ENOSYS;
> > >>>> + if (kvm_x86_ops->page_enc_status_hc)
> > >>>> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > >>>> + a0, a1, a2);
> > >>>> + break;
> > >>>> default:
> > >>>> ret = -KVM_ENOSYS;
> > >>>> break;
> > >>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > >>>> index 8b86609849b9..847b83b75dc8 100644
> > >>>> --- a/include/uapi/linux/kvm_para.h
> > >>>> +++ b/include/uapi/linux/kvm_para.h
> > >>>> @@ -29,6 +29,7 @@
> > >>>> #define KVM_HC_CLOCK_PAIRING 9
> > >>>> #define KVM_HC_SEND_IPI 10
> > >>>> #define KVM_HC_SCHED_YIELD 11
> > >>>> +#define KVM_HC_PAGE_ENC_STATUS 12
> > >>>>
> > >>>> /*
> > >>>> * hypercalls use architecture specific
> > >>>> --
> > >>>> 2.17.1
> > >>>>
> > >>> I'm still not excited by the dynamic resizing. I believe the guest
> > >>> hypercall can be called in atomic contexts, which makes me
> > >>> particularly unexcited to see a potentially large vmalloc on the host
> > >>> followed by filling the buffer. Particularly when the buffer might be
> > >>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> > >>> math).
> > >>>
> > >> I think looking at more practical situations, most hypercalls will
> > >> happen during the boot stage, when device specific initializations are
> > >> happening, so typically the maximum page encryption bitmap size would
> > >> be allocated early enough.
> > >>
> > >> In fact, initial hypercalls made by OVMF will probably allocate the
> > >> maximum page bitmap size even before the kernel comes up, especially
> > >> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> > >> regions, PCI device memory, etc., and most importantly for
> > >> "non-existent" high memory range (which will probably be the
> > >> maximum size page encryption bitmap allocated/resized).
> > >>
> > >> Let me know if you have different thoughts on this ?
> > > Hi Ashish,
> > >
> > > If this is not an issue in practice, we can just move past this. If we
> > > are basically guaranteed that OVMF will trigger hypercalls that expand
> > > the bitmap beyond the top of memory, then, yes, that should work. That
> > > leaves me slightly nervous that OVMF might regress since it's not
> > > obvious that calling a hypercall beyond the top of memory would be
> > > "required" for avoiding a somewhat indirectly related issue in guest
> > > kernels.
> >
> >
> > If possible then we should try to avoid growing/shrinking the bitmap.
> > Today OVMF may not be accessing beyond memory, but a malicious guest
> > could send a hypercall down which can trigger a huge memory allocation
> > on the host side and may eventually cause denial of service for others.
> Nice catch! Was just writing up an email about this.
> > I am in favor of finding some solution to handle this case. How
> > about Steve's suggestion about VMM making a call down to the kernel to
> > tell how big the bitmap should be? Initially it should be equal to the
> > guest RAM and if VMM ever did the memory expansion then it can send down
> > another notification to increase the bitmap ?
> >
> > Optionally, instead of adding a new ioctl, I was wondering if we can
> > extend kvm_arch_prepare_memory_region() to add an svm-specific x86_op
> > which can read the userspace-provided memory region and calculate
> > the amount of guest RAM managed by KVM, then grow/shrink the bitmap
> > based on that information. I have not looked deep enough to see if it's
> > doable, but if it can work then we can avoid adding yet another ioctl.
> We also have the set bitmap ioctl in a later patch in this series. We
> could also use the set ioctl for initialization (it's a little
> excessive for initialization since there will be an additional
> ephemeral allocation and a few additional buffer copies, but that's
> probably fine). An enable_cap has the added benefit of probably being
> necessary anyway so usermode can disable the migration feature flag.
>
> In general, userspace is going to have to be in direct control of the
> buffer and its size.
My only practical concern about sizing the bitmap statically from guest
memory is the hypercalls OVMF makes early on to set page enc/dec status
for ROM, ACPI regions and, especially, the non-existent high memory
range. The new ioctl will statically set the bitmap size to whatever
guest RAM is specified, say 4G, 8G, etc., but OVMF's hypercall for
non-existent memory will cover a guest physical range like ~6G->64G
(for a 4G guest RAM setup); that hypercall will basically have to
return doing nothing, because the allocated bitmap won't cover this
guest physical range ?
Also, hypercalls for ROM, ACPI, device regions and any memory holes
within the static bitmap sized per the guest RAM config will work, but
what about hypercalls for device regions beyond the guest RAM config ?
Thanks,
Ashish
On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
<[email protected]> wrote:
>
>
> On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> >>> From: Ashish Kalra <[email protected]>
> >>>
> >>> This ioctl can be used by the application to reset the page
> >>> encryption bitmap managed by the KVM driver. A typical usage
> >>> for this ioctl is on VM reboot; on reboot, we must reinitialize
> >>> the bitmap.
> >>>
> >>> Signed-off-by: Ashish Kalra <[email protected]>
> >>> ---
> >>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> >>> arch/x86/include/asm/kvm_host.h | 1 +
> >>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> >>> arch/x86/kvm/x86.c | 6 ++++++
> >>> include/uapi/linux/kvm.h | 1 +
> >>> 5 files changed, 37 insertions(+)
> >>>
> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >>> index 4d1004a154f6..a11326ccc51d 100644
> >>> --- a/Documentation/virt/kvm/api.rst
> >>> +++ b/Documentation/virt/kvm/api.rst
> >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> >>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> >>> bitmap for an incoming guest.
> >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> >>> +-----------------------------------------
> >>> +
> >>> +:Capability: basic
> >>> +:Architectures: x86
> >>> +:Type: vm ioctl
> >>> +:Parameters: none
> >>> +:Returns: 0 on success, -1 on error
> >>> +
> >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> >>> +
> >>> +
> >>> 5. The kvm_run structure
> >>> ========================
> >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>> index d30f770aaaea..a96ef6338cd2 100644
> >>> --- a/arch/x86/include/asm/kvm_host.h
> >>> +++ b/arch/x86/include/asm/kvm_host.h
> >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> >>> struct kvm_page_enc_bitmap *bmap);
> >>> int (*set_page_enc_bitmap)(struct kvm *kvm,
> >>> struct kvm_page_enc_bitmap *bmap);
> >>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> >>> };
> >>> struct kvm_arch_async_pf {
> >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>> index 313343a43045..c99b0207a443 100644
> >>> --- a/arch/x86/kvm/svm.c
> >>> +++ b/arch/x86/kvm/svm.c
> >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> >>> return ret;
> >>> }
> >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> >>> +{
> >>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>> +
> >>> + if (!sev_guest(kvm))
> >>> + return -ENOTTY;
> >>> +
> >>> + mutex_lock(&kvm->lock);
> >>> + /* by default all pages should be marked encrypted */
> >>> + if (sev->page_enc_bmap_size)
> >>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> >>> + mutex_unlock(&kvm->lock);
> >>> + return 0;
> >>> +}
> >>> +
> >>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>> {
> >>> struct kvm_sev_cmd sev_cmd;
> >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >>> .page_enc_status_hc = svm_page_enc_status_hc,
> >>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> >>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> >>> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> >>
> >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> >> the previous patch either.
> >>
> >>> };
> > This struct is declared as "static storage", so won't the non-initialized
> > members be 0 ?
>
>
> Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> initialized. We should maintain the convention, perhaps.
>
> >
> >>> static int __init svm_init(void)
> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>> index 05e953b2ec61..2127ed937f53 100644
> >>> --- a/arch/x86/kvm/x86.c
> >>> +++ b/arch/x86/kvm/x86.c
> >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> >>> break;
> >>> }
> >>> + case KVM_PAGE_ENC_BITMAP_RESET: {
> >>> + r = -ENOTTY;
> >>> + if (kvm_x86_ops->reset_page_enc_bitmap)
> >>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> >>> + break;
> >>> + }
> >>> default:
> >>> r = -ENOTTY;
> >>> }
> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>> index b4b01d47e568..0884a581fc37 100644
> >>> --- a/include/uapi/linux/kvm.h
> >>> +++ b/include/uapi/linux/kvm.h
> >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> >>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> >>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> >>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> >>> /* Secure Encrypted Virtualization command */
> >>> enum sev_cmd_id {
> >> Reviewed-by: Krish Sadhukhan <[email protected]>
Doesn't this overlap with the set ioctl? Yes, obviously, you have to
copy the new value down and do a bit more work, but I don't think
resetting the bitmap is going to be the bottleneck on reboot. Seems
excessive to add another ioctl for this.
On Tue, Apr 7, 2020 at 6:17 PM Ashish Kalra <[email protected]> wrote:
>
> Hello Steve, Brijesh,
>
> On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
> > On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <[email protected]> wrote:
> > >
> > >
> > > On 4/7/20 7:01 PM, Steve Rutherford wrote:
> > > > On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <[email protected]> wrote:
> > > >> Hello Steve,
> > > >>
> > > >> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> > > >>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
> > > >>>> From: Brijesh Singh <[email protected]>
> > > >>>>
> > > >>>> This hypercall is used by the SEV guest to notify a change in the page
> > > >>>> encryption status to the hypervisor. The hypercall should be invoked
> > > >>>> only when the encryption attribute is changed from encrypted -> decrypted
> > > >>>> and vice versa. By default all guest pages are considered encrypted.
> > > >>>>
> > > >>>> Cc: Thomas Gleixner <[email protected]>
> > > >>>> Cc: Ingo Molnar <[email protected]>
> > > >>>> Cc: "H. Peter Anvin" <[email protected]>
> > > >>>> Cc: Paolo Bonzini <[email protected]>
> > > >>>> Cc: "Radim Krčmář" <[email protected]>
> > > >>>> Cc: Joerg Roedel <[email protected]>
> > > >>>> Cc: Borislav Petkov <[email protected]>
> > > >>>> Cc: Tom Lendacky <[email protected]>
> > > >>>> Cc: [email protected]
> > > >>>> Cc: [email protected]
> > > >>>> Cc: [email protected]
> > > >>>> Signed-off-by: Brijesh Singh <[email protected]>
> > > >>>> Signed-off-by: Ashish Kalra <[email protected]>
> > > >>>> ---
> > > >>>> Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > > >>>> arch/x86/include/asm/kvm_host.h | 2 +
> > > >>>> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
> > > >>>> arch/x86/kvm/vmx/vmx.c | 1 +
> > > >>>> arch/x86/kvm/x86.c | 6 ++
> > > >>>> include/uapi/linux/kvm_para.h | 1 +
> > > >>>> 6 files changed, 120 insertions(+)
> > > >>>>
> > > >>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > > >>>> index dbaf207e560d..ff5287e68e81 100644
> > > >>>> --- a/Documentation/virt/kvm/hypercalls.rst
> > > >>>> +++ b/Documentation/virt/kvm/hypercalls.rst
> > > >>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
> > > >>>>
> > > >>>> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > > >>>> any of the IPI target vCPUs was preempted.
> > > >>>> +
> > > >>>> +
> > > >>>> +8. KVM_HC_PAGE_ENC_STATUS
> > > >>>> +-------------------------
> > > >>>> +:Architecture: x86
> > > >>>> +:Status: active
> > > >>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > > >>>> +
> > > >>>> +a0: the guest physical address of the start page
> > > >>>> +a1: the number of pages
> > > >>>> +a2: encryption attribute
> > > >>>> +
> > > >>>> + Where:
> > > >>>> + * 1: Encryption attribute is set
> > > >>>> + * 0: Encryption attribute is cleared
> > > >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > >>>> index 98959e8cd448..90718fa3db47 100644
> > > >>>> --- a/arch/x86/include/asm/kvm_host.h
> > > >>>> +++ b/arch/x86/include/asm/kvm_host.h
> > > >>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> > > >>>>
> > > >>>> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> > > >>>> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > > >>>> + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > > >>>> + unsigned long sz, unsigned long mode);
> > > >>> Nit: spell out size instead of sz.
> > > >>>> };
> > > >>>>
> > > >>>> struct kvm_arch_async_pf {
> > > >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > >>>> index 7c2721e18b06..1d8beaf1bceb 100644
> > > >>>> --- a/arch/x86/kvm/svm.c
> > > >>>> +++ b/arch/x86/kvm/svm.c
> > > >>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> > > >>>> int fd; /* SEV device fd */
> > > >>>> unsigned long pages_locked; /* Number of pages locked */
> > > >>>> struct list_head regions_list; /* List of registered regions */
> > > >>>> + unsigned long *page_enc_bmap;
> > > >>>> + unsigned long page_enc_bmap_size;
> > > >>>> };
> > > >>>>
> > > >>>> struct kvm_svm {
> > > >>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> > > >>>>
> > > >>>> sev_unbind_asid(kvm, sev->handle);
> > > >>>> sev_asid_free(sev->asid);
> > > >>>> +
> > > >>>> + kvfree(sev->page_enc_bmap);
> > > >>>> + sev->page_enc_bmap = NULL;
> > > >>>> }
> > > >>>>
> > > >>>> static void avic_vm_destroy(struct kvm *kvm)
> > > >>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > > >>>> return ret;
> > > >>>> }
> > > >>>>
> > > >>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > > >>>> +{
> > > >>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > >>>> + unsigned long *map;
> > > >>>> + unsigned long sz;
> > > >>>> +
> > > >>>> + if (sev->page_enc_bmap_size >= new_size)
> > > >>>> + return 0;
> > > >>>> +
> > > >>>> + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > > >>>> +
> > > >>>> + map = vmalloc(sz);
> > > >>>> + if (!map) {
> > > >>>> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > > >>>> + sz);
> > > >>>> + return -ENOMEM;
> > > >>>> + }
> > > >>>> +
> > > >>>> + /* mark the page encrypted (by default) */
> > > >>>> + memset(map, 0xff, sz);
> > > >>>> +
> > > >>>> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > >>>> + kvfree(sev->page_enc_bmap);
> > > >>>> +
> > > >>>> + sev->page_enc_bmap = map;
> > > >>>> + sev->page_enc_bmap_size = new_size;
> > > >>>> +
> > > >>>> + return 0;
> > > >>>> +}
> > > >>>> +
> > > >>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > >>>> + unsigned long npages, unsigned long enc)
> > > >>>> +{
> > > >>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > >>>> + kvm_pfn_t pfn_start, pfn_end;
> > > >>>> + gfn_t gfn_start, gfn_end;
> > > >>>> + int ret;
> > > >>>> +
> > > >>>> + if (!sev_guest(kvm))
> > > >>>> + return -EINVAL;
> > > >>>> +
> > > >>>> + if (!npages)
> > > >>>> + return 0;
> > > >>>> +
> > > >>>> + gfn_start = gpa_to_gfn(gpa);
> > > >>>> + gfn_end = gfn_start + npages;
> > > >>>> +
> > > >>>> + /* out of bound access error check */
> > > >>>> + if (gfn_end <= gfn_start)
> > > >>>> + return -EINVAL;
> > > >>>> +
> > > >>>> + /* let's make sure that the gpa exists in our memslot */
> > > >>>> + pfn_start = gfn_to_pfn(kvm, gfn_start);
> > > >>>> + pfn_end = gfn_to_pfn(kvm, gfn_end);
> > > >>>> +
> > > >>>> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > > >>>> + /*
> > > >>>> + * Allow guest MMIO range(s) to be added
> > > >>>> + * to the page encryption bitmap.
> > > >>>> + */
> > > >>>> + return -EINVAL;
> > > >>>> + }
> > > >>>> +
> > > >>>> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > > >>>> + /*
> > > >>>> + * Allow guest MMIO range(s) to be added
> > > >>>> + * to the page encryption bitmap.
> > > >>>> + */
> > > >>>> + return -EINVAL;
> > > >>>> + }
> > > >>>> +
> > > >>>> + mutex_lock(&kvm->lock);
> > > >>>> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > >>>> + if (ret)
> > > >>>> + goto unlock;
> > > >>>> +
> > > >>>> + if (enc)
> > > >>>> + __bitmap_set(sev->page_enc_bmap, gfn_start,
> > > >>>> + gfn_end - gfn_start);
> > > >>>> + else
> > > >>>> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > > >>>> + gfn_end - gfn_start);
> > > >>>> +
> > > >>>> +unlock:
> > > >>>> + mutex_unlock(&kvm->lock);
> > > >>>> + return ret;
> > > >>>> +}
> > > >>>> +
> > > >>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > >>>> {
> > > >>>> struct kvm_sev_cmd sev_cmd;
> > > >>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > >>>> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > > >>>>
> > > >>>> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > > >>>> +
> > > >>>> + .page_enc_status_hc = svm_page_enc_status_hc,
> > > >>>> };
> > > >>>>
> > > >>>> static int __init svm_init(void)
> > > >>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > >>>> index 079d9fbf278e..f68e76ee7f9c 100644
> > > >>>> --- a/arch/x86/kvm/vmx/vmx.c
> > > >>>> +++ b/arch/x86/kvm/vmx/vmx.c
> > > >>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > > >>>> .nested_get_evmcs_version = NULL,
> > > >>>> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > > >>>> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > > >>>> + .page_enc_status_hc = NULL,
> > > >>>> };
> > > >>>>
> > > >>>> static void vmx_cleanup_l1d_flush(void)
> > > >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > >>>> index cf95c36cb4f4..68428eef2dde 100644
> > > >>>> --- a/arch/x86/kvm/x86.c
> > > >>>> +++ b/arch/x86/kvm/x86.c
> > > >>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > > >>>> kvm_sched_yield(vcpu->kvm, a0);
> > > >>>> ret = 0;
> > > >>>> break;
> > > >>>> + case KVM_HC_PAGE_ENC_STATUS:
> > > >>>> + ret = -KVM_ENOSYS;
> > > >>>> + if (kvm_x86_ops->page_enc_status_hc)
> > > >>>> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > > >>>> + a0, a1, a2);
> > > >>>> + break;
> > > >>>> default:
> > > >>>> ret = -KVM_ENOSYS;
> > > >>>> break;
> > > >>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > > >>>> index 8b86609849b9..847b83b75dc8 100644
> > > >>>> --- a/include/uapi/linux/kvm_para.h
> > > >>>> +++ b/include/uapi/linux/kvm_para.h
> > > >>>> @@ -29,6 +29,7 @@
> > > >>>> #define KVM_HC_CLOCK_PAIRING 9
> > > >>>> #define KVM_HC_SEND_IPI 10
> > > >>>> #define KVM_HC_SCHED_YIELD 11
> > > >>>> +#define KVM_HC_PAGE_ENC_STATUS 12
> > > >>>>
> > > >>>> /*
> > > >>>> * hypercalls use architecture specific
> > > >>>> --
> > > >>>> 2.17.1
> > > >>>>
> > > >>> I'm still not excited by the dynamic resizing. I believe the guest
> > > >>> hypercall can be called in atomic contexts, which makes me
> > > >>> particularly unexcited to see a potentially large vmalloc on the host
> > > >>> followed by filling the buffer. Particularly when the buffer might be
> > > >>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> > > >>> math).
> > > >>>
> > > >> I think looking at more practical situations, most hypercalls will
> > > >> happen during the boot stage, when device specific initializations are
> > > >> happening, so typically the maximum page encryption bitmap size would
> > > >> be allocated early enough.
> > > >>
> > > >> In fact, initial hypercalls made by OVMF will probably allocate the
> > > >> maximum page bitmap size even before the kernel comes up, especially
> > > >> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> > > >> regions, PCI device memory, etc., and most importantly for
> > > >> "non-existent" high memory range (which will probably be the
> > > >> maximum size page encryption bitmap allocated/resized).
> > > >>
> > > >> Let me know if you have different thoughts on this ?
> > > > Hi Ashish,
> > > >
> > > > If this is not an issue in practice, we can just move past this. If we
> > > > are basically guaranteed that OVMF will trigger hypercalls that expand
> > > > the bitmap beyond the top of memory, then, yes, that should work. That
> > > > leaves me slightly nervous that OVMF might regress since it's not
> > > > obvious that calling a hypercall beyond the top of memory would be
> > > > "required" for avoiding a somewhat indirectly related issue in guest
> > > > kernels.
> > >
> > >
> > > If possible then we should try to avoid growing/shrinking the bitmap.
> > > Today OVMF may not be accessing beyond memory, but a malicious guest
> > > could send a hypercall down which can trigger a huge memory allocation
> > > on the host side and may eventually cause denial of service for others.
> > Nice catch! Was just writing up an email about this.
> > > I am in favor of finding some solution to handle this case. How
> > > about Steve's suggestion about VMM making a call down to the kernel to
> > > tell how big the bitmap should be? Initially it should be equal to the
> > > guest RAM and if VMM ever did the memory expansion then it can send down
> > > another notification to increase the bitmap ?
> > >
> > > Optionally, instead of adding a new ioctl, I was wondering if we can
> > > extend kvm_arch_prepare_memory_region() to add an svm-specific x86_op
> > > which can read the userspace-provided memory region and calculate
> > > the amount of guest RAM managed by KVM, then grow/shrink the bitmap
> > > based on that information. I have not looked deep enough to see if it's
> > > doable, but if it can work then we can avoid adding yet another ioctl.
> > We also have the set bitmap ioctl in a later patch in this series. We
> > could also use the set ioctl for initialization (it's a little
> > excessive for initialization since there will be an additional
> > ephemeral allocation and a few additional buffer copies, but that's
> > probably fine). An enable_cap has the added benefit of probably being
> > necessary anyway so usermode can disable the migration feature flag.
> >
> > In general, userspace is going to have to be in direct control of the
> > buffer and its size.
>
> My only practical concern about sizing the bitmap statically from guest
> memory is the hypercalls OVMF makes early on to set page enc/dec status
> for ROM, ACPI regions and, especially, the non-existent high memory
> range. The new ioctl will statically set the bitmap size to whatever
> guest RAM is specified, say 4G, 8G, etc., but OVMF's hypercall for
> non-existent memory will cover a guest physical range like ~6G->64G
> (for a 4G guest RAM setup); that hypercall will basically have to
> return doing nothing, because the allocated bitmap won't cover this
> guest physical range ?
>
> Also, hypercalls for ROM, ACPI, device regions and any memory holes
> within the static bitmap sized per the guest RAM config will work, but
> what about hypercalls for device regions beyond the guest RAM config ?
>
> Thanks,
> Ashish
I'm not super familiar with what the addresses beyond the top of RAM
are used for. If the memory is not backed by RAM, will it even matter
for migration? Sounds like the encryption for SEV won't even apply to
it. If we don't need to know what the C-bit state of an address is, we
don't need to track it. It doesn't hurt to track it (which is why I'm
not super concerned about tracking the memory holes).
Hello Steve,
On Tue, Apr 07, 2020 at 05:26:33PM -0700, Steve Rutherford wrote:
> On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <[email protected]> wrote:
> >
> > From: Brijesh Singh <[email protected]>
> >
> > The ioctl can be used to set page encryption bitmap for an
> > incoming guest.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: "Radim Krčmář" <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > Documentation/virt/kvm/api.rst | 22 +++++++++++++++++
> > arch/x86/include/asm/kvm_host.h | 2 ++
> > arch/x86/kvm/svm.c | 42 +++++++++++++++++++++++++++++++++
> > arch/x86/kvm/x86.c | 12 ++++++++++
> > include/uapi/linux/kvm.h | 1 +
> > 5 files changed, 79 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 8ad800ebb54f..4d1004a154f6 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> > is private then userspace need to use SEV migration commands to transmit
> > the page.
> >
> > +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> > +---------------------------------------
> > +
> > +:Capability: basic
> > +:Architectures: x86
> > +:Type: vm ioctl
> > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > +:Returns: 0 on success, -1 on error
> > +
> > +/* for KVM_SET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > + __u64 start_gfn;
> > + __u64 num_pages;
> > + union {
> > + void __user *enc_bitmap; /* one bit per page */
> > + __u64 padding2;
> > + };
> > +};
> > +
> > +During the guest live migration the outgoing guest exports its page encryption
> > +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > +bitmap for an incoming guest.
> >
> > 5. The kvm_run structure
> > ========================
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 27e43e3ec9d8..d30f770aaaea 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> > unsigned long sz, unsigned long mode);
> > int (*get_page_enc_bitmap)(struct kvm *kvm,
> > struct kvm_page_enc_bitmap *bmap);
> > + int (*set_page_enc_bitmap)(struct kvm *kvm,
> > + struct kvm_page_enc_bitmap *bmap);
> > };
> >
> > struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index bae783cd396a..313343a43045 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > return ret;
> > }
> >
> > +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > + struct kvm_page_enc_bitmap *bmap)
> > +{
> > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > + unsigned long gfn_start, gfn_end;
> > + unsigned long *bitmap;
> > + unsigned long sz, i;
> > + int ret;
> > +
> > + if (!sev_guest(kvm))
> > + return -ENOTTY;
> > +
> > + gfn_start = bmap->start_gfn;
> > + gfn_end = gfn_start + bmap->num_pages;
> > +
> > + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> > + bitmap = kmalloc(sz, GFP_KERNEL);
> > + if (!bitmap)
> > + return -ENOMEM;
> > +
> > + ret = -EFAULT;
> > + if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> > + goto out;
> > +
> > + mutex_lock(&kvm->lock);
> > + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> I realize now that usermode could use this for initializing the
> minimum size of the enc bitmap, which probably solves my issue from
> the other thread.
> > + if (ret)
> > + goto unlock;
> > +
> > + i = gfn_start;
> > + for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> > + clear_bit(i + gfn_start, sev->page_enc_bmap);
> This API seems a bit strange, since it can only clear bits. I would
> expect "set" to force the values to match the values passed down,
> instead of only ensuring that cleared bits in the input are also
> cleared in the kernel.
>
sev_resize_page_enc_bitmap() allocates a new bitmap with all bits set
(0xFF), i.e. all pages encrypted by default, so the code here only needs
to clear the bits that are also cleared in the input bitmap.
Thanks,
Ashish
> This should copy the values from userspace (and fix up the ends since
> byte alignment makes that complicated), instead of iterating in this
> way.
> > +
> > + ret = 0;
> > +unlock:
> > + mutex_unlock(&kvm->lock);
> > +out:
> > + kfree(bitmap);
> > + return ret;
> > +}
> > +
> > static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > {
> > struct kvm_sev_cmd sev_cmd;
> > @@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >
> > .page_enc_status_hc = svm_page_enc_status_hc,
> > .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > + .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > };
> >
> > static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 3c3fea4e20b5..05e953b2ec61 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> > break;
> > }
> > + case KVM_SET_PAGE_ENC_BITMAP: {
> > + struct kvm_page_enc_bitmap bitmap;
> > +
> > + r = -EFAULT;
> > + if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> > + goto out;
> > +
> > + r = -ENOTTY;
> > + if (kvm_x86_ops->set_page_enc_bitmap)
> > + r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > + break;
> > + }
> > default:
> > r = -ENOTTY;
> > }
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index db1ebf85e177..b4b01d47e568 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1489,6 +1489,7 @@ struct kvm_enc_region {
> > #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
> >
> > #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > +#define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> >
> > /* Secure Encrypted Virtualization command */
> > enum sev_cmd_id {
> > --
> > 2.17.1
> >
Hello Steve,
On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> <[email protected]> wrote:
> >
> >
> > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > >>> From: Ashish Kalra <[email protected]>
> > >>>
> > >>> This ioctl can be used by the application to reset the page
> > >>> encryption bitmap managed by the KVM driver. A typical usage
> > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > >>> the bitmap.
> > >>>
> > >>> Signed-off-by: Ashish Kalra <[email protected]>
> > >>> ---
> > >>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> > >>> arch/x86/include/asm/kvm_host.h | 1 +
> > >>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> > >>> arch/x86/kvm/x86.c | 6 ++++++
> > >>> include/uapi/linux/kvm.h | 1 +
> > >>> 5 files changed, 37 insertions(+)
> > >>>
> > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > >>> index 4d1004a154f6..a11326ccc51d 100644
> > >>> --- a/Documentation/virt/kvm/api.rst
> > >>> +++ b/Documentation/virt/kvm/api.rst
> > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > >>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > >>> bitmap for an incoming guest.
> > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > >>> +-----------------------------------------
> > >>> +
> > >>> +:Capability: basic
> > >>> +:Architectures: x86
> > >>> +:Type: vm ioctl
> > >>> +:Parameters: none
> > >>> +:Returns: 0 on success, -1 on error
> > >>> +
> > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > >>> +
> > >>> +
> > >>> 5. The kvm_run structure
> > >>> ========================
> > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > >>> index d30f770aaaea..a96ef6338cd2 100644
> > >>> --- a/arch/x86/include/asm/kvm_host.h
> > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > >>> struct kvm_page_enc_bitmap *bmap);
> > >>> int (*set_page_enc_bitmap)(struct kvm *kvm,
> > >>> struct kvm_page_enc_bitmap *bmap);
> > >>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > >>> };
> > >>> struct kvm_arch_async_pf {
> > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > >>> index 313343a43045..c99b0207a443 100644
> > >>> --- a/arch/x86/kvm/svm.c
> > >>> +++ b/arch/x86/kvm/svm.c
> > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > >>> return ret;
> > >>> }
> > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > >>> +{
> > >>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > >>> +
> > >>> + if (!sev_guest(kvm))
> > >>> + return -ENOTTY;
> > >>> +
> > >>> + mutex_lock(&kvm->lock);
> > >>> + /* by default all pages should be marked encrypted */
> > >>> + if (sev->page_enc_bmap_size)
> > >>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > >>> + mutex_unlock(&kvm->lock);
> > >>> + return 0;
> > >>> +}
> > >>> +
> > >>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > >>> {
> > >>> struct kvm_sev_cmd sev_cmd;
> > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > >>> .page_enc_status_hc = svm_page_enc_status_hc,
> > >>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > >>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > >>> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > >>
> > >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> > >> the previous patch either.
> > >>
> > >>> };
> > > This struct is declared as "static storage", so won't the non-initialized
> > > members be 0 ?
> >
> >
> > Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> > initialized. We should maintain the convention, perhaps.
> >
> > >
> > >>> static int __init svm_init(void)
> > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > >>> index 05e953b2ec61..2127ed937f53 100644
> > >>> --- a/arch/x86/kvm/x86.c
> > >>> +++ b/arch/x86/kvm/x86.c
> > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > >>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > >>> break;
> > >>> }
> > >>> + case KVM_PAGE_ENC_BITMAP_RESET: {
> > >>> + r = -ENOTTY;
> > >>> + if (kvm_x86_ops->reset_page_enc_bitmap)
> > >>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > >>> + break;
> > >>> + }
> > >>> default:
> > >>> r = -ENOTTY;
> > >>> }
> > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > >>> index b4b01d47e568..0884a581fc37 100644
> > >>> --- a/include/uapi/linux/kvm.h
> > >>> +++ b/include/uapi/linux/kvm.h
> > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > >>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > >>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > >>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> > >>> /* Secure Encrypted Virtualization command */
> > >>> enum sev_cmd_id {
> > >> Reviewed-by: Krish Sadhukhan <[email protected]>
>
>
> Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> copy the new value down and do a bit more work, but I don't think
> resetting the bitmap is going to be the bottleneck on reboot. Seems
> excessive to add another ioctl for this.
The set ioctl is provided mainly for the incoming VM to set up the page
encryption bitmap, whereas this reset ioctl gives the source VM a simple
interface to reset the whole page encryption bitmap, for example on reboot.
Thanks,
Ashish
On 4/7/20 8:38 PM, Steve Rutherford wrote:
> On Tue, Apr 7, 2020 at 6:17 PM Ashish Kalra <[email protected]> wrote:
>> Hello Steve, Brijesh,
>>
>> On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
>>> On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <[email protected]> wrote:
>>>>
>>>> On 4/7/20 7:01 PM, Steve Rutherford wrote:
>>>>> On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <[email protected]> wrote:
>>>>>> Hello Steve,
>>>>>>
>>>>>> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
>>>>>>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
>>>>>>>> From: Brijesh Singh <[email protected]>
>>>>>>>>
>>>>>>>> This hypercall is used by the SEV guest to notify a change in the page
>>>>>>>> encryption status to the hypervisor. The hypercall should be invoked
>>>>>>>> only when the encryption attribute is changed from encrypted -> decrypted
>>>>>>>> and vice versa. By default all guest pages are considered encrypted.
>>>>>>>>
>>>>>>>> Cc: Thomas Gleixner <[email protected]>
>>>>>>>> Cc: Ingo Molnar <[email protected]>
>>>>>>>> Cc: "H. Peter Anvin" <[email protected]>
>>>>>>>> Cc: Paolo Bonzini <[email protected]>
>>>>>>>> Cc: "Radim Krčmář" <[email protected]>
>>>>>>>> Cc: Joerg Roedel <[email protected]>
>>>>>>>> Cc: Borislav Petkov <[email protected]>
>>>>>>>> Cc: Tom Lendacky <[email protected]>
>>>>>>>> Cc: [email protected]
>>>>>>>> Cc: [email protected]
>>>>>>>> Cc: [email protected]
>>>>>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>>>>> ---
>>>>>>>> Documentation/virt/kvm/hypercalls.rst | 15 +++++
>>>>>>>> arch/x86/include/asm/kvm_host.h | 2 +
>>>>>>>> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
>>>>>>>> arch/x86/kvm/vmx/vmx.c | 1 +
>>>>>>>> arch/x86/kvm/x86.c | 6 ++
>>>>>>>> include/uapi/linux/kvm_para.h | 1 +
>>>>>>>> 6 files changed, 120 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
>>>>>>>> index dbaf207e560d..ff5287e68e81 100644
>>>>>>>> --- a/Documentation/virt/kvm/hypercalls.rst
>>>>>>>> +++ b/Documentation/virt/kvm/hypercalls.rst
>>>>>>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
>>>>>>>>
>>>>>>>> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>>>>>>>> any of the IPI target vCPUs was preempted.
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +8. KVM_HC_PAGE_ENC_STATUS
>>>>>>>> +-------------------------
>>>>>>>> +:Architecture: x86
>>>>>>>> +:Status: active
>>>>>>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
>>>>>>>> +
>>>>>>>> +a0: the guest physical address of the start page
>>>>>>>> +a1: the number of pages
>>>>>>>> +a2: encryption attribute
>>>>>>>> +
>>>>>>>> + Where:
>>>>>>>> + * 1: Encryption attribute is set
>>>>>>>> + * 0: Encryption attribute is cleared
>>>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>>>>> index 98959e8cd448..90718fa3db47 100644
>>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>>>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>>>>>>>>
>>>>>>>> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>>>>>>>> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>>>>>>>> + int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>>>>>>>> + unsigned long sz, unsigned long mode);
>>>>>>> Nit: spell out size instead of sz.
>>>>>>>> };
>>>>>>>>
>>>>>>>> struct kvm_arch_async_pf {
>>>>>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>>>>>> index 7c2721e18b06..1d8beaf1bceb 100644
>>>>>>>> --- a/arch/x86/kvm/svm.c
>>>>>>>> +++ b/arch/x86/kvm/svm.c
>>>>>>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
>>>>>>>> int fd; /* SEV device fd */
>>>>>>>> unsigned long pages_locked; /* Number of pages locked */
>>>>>>>> struct list_head regions_list; /* List of registered regions */
>>>>>>>> + unsigned long *page_enc_bmap;
>>>>>>>> + unsigned long page_enc_bmap_size;
>>>>>>>> };
>>>>>>>>
>>>>>>>> struct kvm_svm {
>>>>>>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>>>>>>>>
>>>>>>>> sev_unbind_asid(kvm, sev->handle);
>>>>>>>> sev_asid_free(sev->asid);
>>>>>>>> +
>>>>>>>> + kvfree(sev->page_enc_bmap);
>>>>>>>> + sev->page_enc_bmap = NULL;
>>>>>>>> }
>>>>>>>>
>>>>>>>> static void avic_vm_destroy(struct kvm *kvm)
>>>>>>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>>>>>> return ret;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
>>>>>>>> +{
>>>>>>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>> + unsigned long *map;
>>>>>>>> + unsigned long sz;
>>>>>>>> +
>>>>>>>> + if (sev->page_enc_bmap_size >= new_size)
>>>>>>>> + return 0;
>>>>>>>> +
>>>>>>>> + sz = ALIGN(new_size, BITS_PER_LONG) / 8;
>>>>>>>> +
>>>>>>>> + map = vmalloc(sz);
>>>>>>>> + if (!map) {
>>>>>>>> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
>>>>>>>> + sz);
>>>>>>>> + return -ENOMEM;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + /* mark the page encrypted (by default) */
>>>>>>>> + memset(map, 0xff, sz);
>>>>>>>> +
>>>>>>>> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
>>>>>>>> + kvfree(sev->page_enc_bmap);
>>>>>>>> +
>>>>>>>> + sev->page_enc_bmap = map;
>>>>>>>> + sev->page_enc_bmap_size = new_size;
>>>>>>>> +
>>>>>>>> + return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>>>>>>>> + unsigned long npages, unsigned long enc)
>>>>>>>> +{
>>>>>>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>> + kvm_pfn_t pfn_start, pfn_end;
>>>>>>>> + gfn_t gfn_start, gfn_end;
>>>>>>>> + int ret;
>>>>>>>> +
>>>>>>>> + if (!sev_guest(kvm))
>>>>>>>> + return -EINVAL;
>>>>>>>> +
>>>>>>>> + if (!npages)
>>>>>>>> + return 0;
>>>>>>>> +
>>>>>>>> + gfn_start = gpa_to_gfn(gpa);
>>>>>>>> + gfn_end = gfn_start + npages;
>>>>>>>> +
>>>>>>>> + /* out of bound access error check */
>>>>>>>> + if (gfn_end <= gfn_start)
>>>>>>>> + return -EINVAL;
>>>>>>>> +
>>>>>>>> + /* lets make sure that gpa exist in our memslot */
>>>>>>>> + pfn_start = gfn_to_pfn(kvm, gfn_start);
>>>>>>>> + pfn_end = gfn_to_pfn(kvm, gfn_end);
>>>>>>>> +
>>>>>>>> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
>>>>>>>> + /*
>>>>>>>> + * Allow guest MMIO range(s) to be added
>>>>>>>> + * to the page encryption bitmap.
>>>>>>>> + */
>>>>>>>> + return -EINVAL;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
>>>>>>>> + /*
>>>>>>>> + * Allow guest MMIO range(s) to be added
>>>>>>>> + * to the page encryption bitmap.
>>>>>>>> + */
>>>>>>>> + return -EINVAL;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + mutex_lock(&kvm->lock);
>>>>>>>> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
>>>>>>>> + if (ret)
>>>>>>>> + goto unlock;
>>>>>>>> +
>>>>>>>> + if (enc)
>>>>>>>> + __bitmap_set(sev->page_enc_bmap, gfn_start,
>>>>>>>> + gfn_end - gfn_start);
>>>>>>>> + else
>>>>>>>> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
>>>>>>>> + gfn_end - gfn_start);
>>>>>>>> +
>>>>>>>> +unlock:
>>>>>>>> + mutex_unlock(&kvm->lock);
>>>>>>>> + return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>>>>>> {
>>>>>>>> struct kvm_sev_cmd sev_cmd;
>>>>>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>>>>>> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>>>>>>>>
>>>>>>>> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>>>>>>>> +
>>>>>>>> + .page_enc_status_hc = svm_page_enc_status_hc,
>>>>>>>> };
>>>>>>>>
>>>>>>>> static int __init svm_init(void)
>>>>>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>>>>>> index 079d9fbf278e..f68e76ee7f9c 100644
>>>>>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>>>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>>>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>>>>>>> .nested_get_evmcs_version = NULL,
>>>>>>>> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>>>>>>>> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>>>>>>>> + .page_enc_status_hc = NULL,
>>>>>>>> };
>>>>>>>>
>>>>>>>> static void vmx_cleanup_l1d_flush(void)
>>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>>>> index cf95c36cb4f4..68428eef2dde 100644
>>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>>>>>>>> kvm_sched_yield(vcpu->kvm, a0);
>>>>>>>> ret = 0;
>>>>>>>> break;
>>>>>>>> + case KVM_HC_PAGE_ENC_STATUS:
>>>>>>>> + ret = -KVM_ENOSYS;
>>>>>>>> + if (kvm_x86_ops->page_enc_status_hc)
>>>>>>>> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
>>>>>>>> + a0, a1, a2);
>>>>>>>> + break;
>>>>>>>> default:
>>>>>>>> ret = -KVM_ENOSYS;
>>>>>>>> break;
>>>>>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
>>>>>>>> index 8b86609849b9..847b83b75dc8 100644
>>>>>>>> --- a/include/uapi/linux/kvm_para.h
>>>>>>>> +++ b/include/uapi/linux/kvm_para.h
>>>>>>>> @@ -29,6 +29,7 @@
>>>>>>>> #define KVM_HC_CLOCK_PAIRING 9
>>>>>>>> #define KVM_HC_SEND_IPI 10
>>>>>>>> #define KVM_HC_SCHED_YIELD 11
>>>>>>>> +#define KVM_HC_PAGE_ENC_STATUS 12
>>>>>>>>
>>>>>>>> /*
>>>>>>>> * hypercalls use architecture specific
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>> I'm still not excited by the dynamic resizing. I believe the guest
>>>>>>> hypercall can be called in atomic contexts, which makes me
>>>>>>> particularly unexcited to see a potentially large vmalloc on the host
>>>>>>> followed by filling the buffer. Particularly when the buffer might be
>>>>>>> non-trivial in size (~1MB per 32GB, per some back of the envelope
>>>>>>> math).
>>>>>>>
>>>>>> I think looking at more practical situations, most hypercalls will
>>>>>> happen during the boot stage, when device specific initializations are
>>>>>> happening, so typically the maximum page encryption bitmap size would
>>>>>> be allocated early enough.
>>>>>>
>>>>>> In fact, initial hypercalls made by OVMF will probably allocate the
>>>>>> maximum page bitmap size even before the kernel comes up, especially
>>>>>> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
>>>>>> regions, PCI device memory, etc., and most importantly for
>>>>>> "non-existent" high memory range (which will probably be the
>>>>>> maximum size page encryption bitmap allocated/resized).
>>>>>>
>>>>>> Let me know if you have different thoughts on this ?
>>>>> Hi Ashish,
>>>>>
>>>>> If this is not an issue in practice, we can just move past this. If we
>>>>> are basically guaranteed that OVMF will trigger hypercalls that expand
>>>>> the bitmap beyond the top of memory, then, yes, that should work. That
>>>>> leaves me slightly nervous that OVMF might regress since it's not
>>>>> obvious that calling a hypercall beyond the top of memory would be
>>>>> "required" for avoiding a somewhat indirectly related issue in guest
>>>>> kernels.
>>>>
>>>> If possible then we should try to avoid growing/shrinking the bitmap .
>>>> Today OVMF may not be accessing beyond memory but a malicious guest
>>>> could send a hypercall down which can trigger a huge memory allocation
>>>> on the host side and may eventually cause denial of service for other.
>>> Nice catch! Was just writing up an email about this.
>>>> I am in favor if we can find some solution to handle this case. How
>>>> about Steve's suggestion about VMM making a call down to the kernel to
>>>> tell how big the bitmap should be? Initially it should be equal to the
>>>> guest RAM and if VMM ever did the memory expansion then it can send down
>>>> another notification to increase the bitmap ?
>>>>
>>>> Optionally, instead of adding a new ioctl, I was wondering if we can
>>>> extend the kvm_arch_prepare_memory_region() to make svm specific x86_ops
>>>> which can take read the userspace provided memory region and calculate
>>>> the amount of guest RAM managed by the KVM and grow/shrink the bitmap
>>>> based on that information. I have not looked deep enough to see if its
>>>> doable but if it can work then we can avoid adding yet another ioctl.
>>> We also have the set bitmap ioctl in a later patch in this series. We
>>> could also use the set ioctl for initialization (it's a little
>>> excessive for initialization since there will be an additional
>>> ephemeral allocation and a few additional buffer copies, but that's
>>> probably fine). An enable_cap has the added benefit of probably being
>>> necessary anyway so usermode can disable the migration feature flag.
>>>
>>> In general, userspace is going to have to be in direct control of the
>>> buffer and its size.
>> My only practical concern about setting a static bitmap size based on guest
>> memory is about the hypercalls being made initially by OVMF to set page
>> enc/dec status for ROM, ACPI regions and especially the non-existent
>> high memory range. The new ioctl will statically setup bitmap size to
>> whatever guest RAM is specified, say 4G, 8G, etc., but the OVMF
>> hypercall for non-existent memory will try to do a hypercall for guest
>> physical memory range like ~6G->64G (for 4G guest RAM setup), this
>> hypercall will basically have to just return doing nothing, because
>> the allocated bitmap won't have this guest physical range available ?
IMO, OVMF issuing a hypercall beyond the guest RAM is simply wrong; it
should *not* do that. There is a feature request I submitted some time
back to Tianocore https://bugzilla.tianocore.org/show_bug.cgi?id=623 as
I saw this coming. I tried highlighting the problem that the
MdeModulePkg does not provide a notifier to tell OVMF when the core
creates the MMIO holes etc. It was not a big problem with SEV
initially, because we never got down to the hypervisor to do anything
about those non-existent regions. But with migration it is now
important that we restart the discussion with the UEFI folks and
see what can be done. In the kernel patches we should do what is right
for the kernel and not work around the OVMF limitation.
>> Also, hypercalls for ROM, ACPI, device regions and any memory holes within
>> the static bitmap setup as per guest RAM config will work, but what
>> about hypercalls for any device regions beyond the guest RAM config ?
>>
>> Thanks,
>> Ashish
> I'm not super familiar with what the address range beyond the top of RAM is
> used for. If the memory is not backed by RAM, will it even matter for
> migration? It sounds like SEV encryption won't even apply to it.
> If we don't need to know what the c-bit state of an address is, we
> don't need to track it. It doesn't hurt to track it (which is why I'm
> not overly concerned about tracking the memory holes).
Hello Brijesh,
On Tue, Apr 07, 2020 at 09:34:15PM -0500, Brijesh Singh wrote:
>
> On 4/7/20 8:38 PM, Steve Rutherford wrote:
> > On Tue, Apr 7, 2020 at 6:17 PM Ashish Kalra <[email protected]> wrote:
> >> Hello Steve, Brijesh,
> >>
> >> On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
> >>> On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <[email protected]> wrote:
> >>>>
> >>>> On 4/7/20 7:01 PM, Steve Rutherford wrote:
> >>>>> On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <[email protected]> wrote:
> >>>>>> Hello Steve,
> >>>>>>
> >>>>>> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> >>>>>>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
> >>>>>>>> [...]
> >>>>>>>> + mutex_lock(&kvm->lock);
> >>>>>>>> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> >>>>>>>> + if (ret)
> >>>>>>>> + goto unlock;
> >>>>>>>> +
> >>>>>>>> + if (enc)
> >>>>>>>> + __bitmap_set(sev->page_enc_bmap, gfn_start,
> >>>>>>>> + gfn_end - gfn_start);
> >>>>>>>> + else
> >>>>>>>> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> >>>>>>>> + gfn_end - gfn_start);
> >>>>>>>> +
> >>>>>>>> +unlock:
> >>>>>>>> + mutex_unlock(&kvm->lock);
> >>>>>>>> + return ret;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>>>>>>> {
> >>>>>>>> struct kvm_sev_cmd sev_cmd;
> >>>>>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >>>>>>>> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> >>>>>>>>
> >>>>>>>> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> >>>>>>>> +
> >>>>>>>> + .page_enc_status_hc = svm_page_enc_status_hc,
> >>>>>>>> };
> >>>>>>>>
> >>>>>>>> static int __init svm_init(void)
> >>>>>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >>>>>>>> index 079d9fbf278e..f68e76ee7f9c 100644
> >>>>>>>> --- a/arch/x86/kvm/vmx/vmx.c
> >>>>>>>> +++ b/arch/x86/kvm/vmx/vmx.c
> >>>>>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >>>>>>>> .nested_get_evmcs_version = NULL,
> >>>>>>>> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> >>>>>>>> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> >>>>>>>> + .page_enc_status_hc = NULL,
> >>>>>>>> };
> >>>>>>>>
> >>>>>>>> static void vmx_cleanup_l1d_flush(void)
> >>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>>>>>> index cf95c36cb4f4..68428eef2dde 100644
> >>>>>>>> --- a/arch/x86/kvm/x86.c
> >>>>>>>> +++ b/arch/x86/kvm/x86.c
> >>>>>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >>>>>>>> kvm_sched_yield(vcpu->kvm, a0);
> >>>>>>>> ret = 0;
> >>>>>>>> break;
> >>>>>>>> + case KVM_HC_PAGE_ENC_STATUS:
> >>>>>>>> + ret = -KVM_ENOSYS;
> >>>>>>>> + if (kvm_x86_ops->page_enc_status_hc)
> >>>>>>>> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> >>>>>>>> + a0, a1, a2);
> >>>>>>>> + break;
> >>>>>>>> default:
> >>>>>>>> ret = -KVM_ENOSYS;
> >>>>>>>> break;
> >>>>>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> >>>>>>>> index 8b86609849b9..847b83b75dc8 100644
> >>>>>>>> --- a/include/uapi/linux/kvm_para.h
> >>>>>>>> +++ b/include/uapi/linux/kvm_para.h
> >>>>>>>> @@ -29,6 +29,7 @@
> >>>>>>>> #define KVM_HC_CLOCK_PAIRING 9
> >>>>>>>> #define KVM_HC_SEND_IPI 10
> >>>>>>>> #define KVM_HC_SCHED_YIELD 11
> >>>>>>>> +#define KVM_HC_PAGE_ENC_STATUS 12
> >>>>>>>>
> >>>>>>>> /*
> >>>>>>>> * hypercalls use architecture specific
> >>>>>>>> --
> >>>>>>>> 2.17.1
> >>>>>>>>
> >>>>>>> I'm still not excited by the dynamic resizing. I believe the guest
> >>>>>>> hypercall can be called in atomic contexts, which makes me
> >>>>>>> particularly unexcited to see a potentially large vmalloc on the host
> >>>>>>> followed by filling the buffer. Particularly when the buffer might be
> >>>>>>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> >>>>>>> math).
> >>>>>>>
> >>>>>> Looking at practical situations, most hypercalls will happen
> >>>>>> during the boot stage, when device-specific initializations are
> >>>>>> happening, so typically the maximum page encryption bitmap size
> >>>>>> would be allocated early enough.
> >>>>>>
> >>>>>> In fact, the initial hypercalls made by OVMF will probably allocate
> >>>>>> the maximum page bitmap size even before the kernel comes up,
> >>>>>> especially as they will be setting up page enc/dec status for MMIO,
> >>>>>> ROM, ACPI regions, PCI device memory, etc., and most importantly for
> >>>>>> the "non-existent" high memory range (which will probably be the
> >>>>>> maximum size the page encryption bitmap is allocated/resized to).
> >>>>>>
> >>>>>> Let me know if you have different thoughts on this?
> >>>>> Hi Ashish,
> >>>>>
> >>>>> If this is not an issue in practice, we can just move past this. If we
> >>>>> are basically guaranteed that OVMF will trigger hypercalls that expand
> >>>>> the bitmap beyond the top of memory, then, yes, that should work. That
> >>>>> leaves me slightly nervous that OVMF might regress since it's not
> >>>>> obvious that calling a hypercall beyond the top of memory would be
> >>>>> "required" for avoiding a somewhat indirectly related issue in guest
> >>>>> kernels.
> >>>>
> >>>> If possible then we should try to avoid growing/shrinking the bitmap.
> >>>> Today OVMF may not be accessing beyond memory, but a malicious guest
> >>>> could send a hypercall down which can trigger a huge memory allocation
> >>>> on the host side and may eventually cause denial of service for others.
> >>> Nice catch! Was just writing up an email about this.
> >>>> I am in favor of finding some solution to handle this case. How
> >>>> about Steve's suggestion of the VMM making a call down to the kernel
> >>>> to tell it how big the bitmap should be? Initially it should be equal
> >>>> to the guest RAM, and if the VMM ever does a memory expansion then it
> >>>> can send down another notification to increase the bitmap?
> >>>>
> >>>> Optionally, instead of adding a new ioctl, I was wondering if we can
> >>>> extend kvm_arch_prepare_memory_region() to make an SVM-specific
> >>>> x86_ops callback which can read the userspace-provided memory region,
> >>>> calculate the amount of guest RAM managed by KVM, and grow/shrink the
> >>>> bitmap based on that information. I have not looked deep enough to see
> >>>> if it's doable, but if it can work then we can avoid adding yet another ioctl.
> >>> We also have the set bitmap ioctl in a later patch in this series. We
> >>> could also use the set ioctl for initialization (it's a little
> >>> excessive for initialization since there will be an additional
> >>> ephemeral allocation and a few additional buffer copies, but that's
> >>> probably fine). An enable_cap has the added benefit of probably being
> >>> necessary anyway so usermode can disable the migration feature flag.
> >>>
> >>> In general, userspace is going to have to be in direct control of the
> >>> buffer and its size.
> >> My only practical concern about setting a static bitmap size based on
> >> guest memory is the hypercalls made initially by OVMF to set page
> >> enc/dec status for ROM, ACPI regions and especially the non-existent
> >> high memory range. The new ioctl will statically set the bitmap size to
> >> whatever guest RAM is specified, say 4G, 8G, etc., but the OVMF
> >> hypercall for non-existent memory will cover a guest physical range
> >> like ~6G->64G (for a 4G guest RAM setup); this hypercall will basically
> >> have to return without doing anything, because the allocated bitmap
> >> won't have this guest physical range available?
>
>
> IMO, OVMF issuing a hypercall beyond the guest RAM is simply wrong; it
> should *not* do that. There was a feature request I submitted some time
> back to Tianocore, https://bugzilla.tianocore.org/show_bug.cgi?id=623, as
> I saw this coming in the future. I tried highlighting the problem that
> the MdeModulePkg does not provide a notifier to tell OVMF when the core
> creates the MMIO holes etc. It was not a big problem with SEV
> initially because we were never getting down to the hypervisor to do
> something about those non-existent regions. But with migration it is
> now important that we restart the discussion with the UEFI folks and
> see what can be done. In the kernel patches we should do what is right
> for the kernel and not work around the OVMF limitation.
Ok, this makes sense. I will start exploring
kvm_arch_prepare_memory_region() to see if it can assist in computing
the guest RAM; otherwise I will look at adding a new ioctl interface
for the same.
Thanks,
Ashish
>
>
> >> Also, hypercalls for ROM, ACPI, device regions and any memory holes within
> >> the static bitmap setup as per guest RAM config will work, but what
> >> about hypercalls for any device regions beyond the guest RAM config ?
> >>
> >> Thanks,
> >> Ashish
> > I'm not super familiar with what the addresses beyond the top of RAM
> > are used for. If the memory is not backed by RAM, will it even matter
> > for migration? Sounds like the encryption for SEV won't even apply to
> > it. If we don't need to know what the C-bit state of an address is, we
> > don't need to track it. It doesn't hurt to track it (which is why I'm
> > not super concerned about tracking the memory holes).
Hello Brijesh, Steve,
On Wed, Apr 08, 2020 at 03:18:18AM +0000, Ashish Kalra wrote:
> Hello Brijesh,
>
> On Tue, Apr 07, 2020 at 09:34:15PM -0500, Brijesh Singh wrote:
> >
> > On 4/7/20 8:38 PM, Steve Rutherford wrote:
> > > On Tue, Apr 7, 2020 at 6:17 PM Ashish Kalra <[email protected]> wrote:
> > >> Hello Steve, Brijesh,
> > >>
> > >> On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
> > >>> On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <[email protected]> wrote:
> > >>>>
> > >>>> On 4/7/20 7:01 PM, Steve Rutherford wrote:
> > >>>>> On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <[email protected]> wrote:
> > >>>>>> Hello Steve,
> > >>>>>>
> > >>>>>> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> > >>>>>>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <[email protected]> wrote:
> > >>>>>>>> From: Brijesh Singh <[email protected]>
> > >>>>>>>>
> > >>>>>>>> This hypercall is used by the SEV guest to notify a change in the page
> > >>>>>>>> encryption status to the hypervisor. The hypercall should be invoked
> > >>>>>>>> only when the encryption attribute is changed from encrypted -> decrypted
> > >>>>>>>> and vice versa. By default all guest pages are considered encrypted.
> > >>>>>>>>
> > >>>>>>>> Cc: Thomas Gleixner <[email protected]>
> > >>>>>>>> Cc: Ingo Molnar <[email protected]>
> > >>>>>>>> Cc: "H. Peter Anvin" <[email protected]>
> > >>>>>>>> Cc: Paolo Bonzini <[email protected]>
> > >>>>>>>> Cc: "Radim Krčmář" <[email protected]>
> > >>>>>>>> Cc: Joerg Roedel <[email protected]>
> > >>>>>>>> Cc: Borislav Petkov <[email protected]>
> > >>>>>>>> Cc: Tom Lendacky <[email protected]>
> > >>>>>>>> Cc: [email protected]
> > >>>>>>>> Cc: [email protected]
> > >>>>>>>> Cc: [email protected]
> > >>>>>>>> Signed-off-by: Brijesh Singh <[email protected]>
> > >>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
> > >>>>>>>> ---
> > >>>>>>>> Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > >>>>>>>> arch/x86/include/asm/kvm_host.h | 2 +
> > >>>>>>>> arch/x86/kvm/svm.c | 95 +++++++++++++++++++++++++++
> > >>>>>>>> arch/x86/kvm/vmx/vmx.c | 1 +
> > >>>>>>>> arch/x86/kvm/x86.c | 6 ++
> > >>>>>>>> include/uapi/linux/kvm_para.h | 1 +
> > >>>>>>>> 6 files changed, 120 insertions(+)
> > >>>>>>>>
> > >>>>>>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > >>>>>>>> index dbaf207e560d..ff5287e68e81 100644
> > >>>>>>>> --- a/Documentation/virt/kvm/hypercalls.rst
> > >>>>>>>> +++ b/Documentation/virt/kvm/hypercalls.rst
> > >>>>>>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
> > >>>>>>>>
> > >>>>>>>> :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > >>>>>>>> any of the IPI target vCPUs was preempted.
> > >>>>>>>> +
> > >>>>>>>> +
> > >>>>>>>> +8. KVM_HC_PAGE_ENC_STATUS
> > >>>>>>>> +-------------------------
> > >>>>>>>> +:Architecture: x86
> > >>>>>>>> +:Status: active
> > >>>>>>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > >>>>>>>> +
> > >>>>>>>> +a0: the guest physical address of the start page
> > >>>>>>>> +a1: the number of pages
> > >>>>>>>> +a2: encryption attribute
> > >>>>>>>> +
> > >>>>>>>> + Where:
> > >>>>>>>> + * 1: Encryption attribute is set
> > >>>>>>>> + * 0: Encryption attribute is cleared
> > [...]
>
> Ok, this makes sense. I will start exploring
> kvm_arch_prepare_memory_region() to see if it can assist in computing
> the guest RAM; otherwise I will look at adding a new ioctl interface
> for the same.
>
I looked at kvm_arch_prepare_memory_region() and
kvm_arch_commit_memory_region(), and kvm_arch_commit_memory_region()
looks ideal to use for this.
I walked the kvm_memslots in this function and I can compute the
approximate guest RAM mapped by KVM, though I get the guest RAM size as
"twice" the configured size because of the two address spaces on x86 KVM;
I believe there is one additional address space for SMM/SMRAM use.
I don't think we have a use case for migrating an SEV guest with SMM
support?
Considering that, I believe I can just compute the guest RAM size using
the memslots for address space #0 and use that to grow/shrink the bitmap.
As you mentioned, I will need to add a new SVM-specific x86_ops
callback, invoked as part of kvm_arch_commit_memory_region(), which will
in turn call sev_resize_page_enc_bitmap().
Thanks,
Ashish
On Thu, Apr 9, 2020 at 9:18 AM Ashish Kalra <[email protected]> wrote:
> [...]
> > > >>>>>>>> + pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > > >>>>>>>> + sz);
> > > >>>>>>>> + return -ENOMEM;
> > > >>>>>>>> + }
> > > >>>>>>>> +
> > > >>>>>>>> + /* mark the page encrypted (by default) */
> > > >>>>>>>> + memset(map, 0xff, sz);
> > > >>>>>>>> +
> > > >>>>>>>> + bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > >>>>>>>> + kvfree(sev->page_enc_bmap);
> > > >>>>>>>> +
> > > >>>>>>>> + sev->page_enc_bmap = map;
> > > >>>>>>>> + sev->page_enc_bmap_size = new_size;
> > > >>>>>>>> +
> > > >>>>>>>> + return 0;
> > > >>>>>>>> +}
> > > >>>>>>>> +
> > > >>>>>>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > >>>>>>>> + unsigned long npages, unsigned long enc)
> > > >>>>>>>> +{
> > > >>>>>>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > >>>>>>>> + kvm_pfn_t pfn_start, pfn_end;
> > > >>>>>>>> + gfn_t gfn_start, gfn_end;
> > > >>>>>>>> + int ret;
> > > >>>>>>>> +
> > > >>>>>>>> + if (!sev_guest(kvm))
> > > >>>>>>>> + return -EINVAL;
> > > >>>>>>>> +
> > > >>>>>>>> + if (!npages)
> > > >>>>>>>> + return 0;
> > > >>>>>>>> +
> > > >>>>>>>> + gfn_start = gpa_to_gfn(gpa);
> > > >>>>>>>> + gfn_end = gfn_start + npages;
> > > >>>>>>>> +
> > > >>>>>>>> + /* out of bound access error check */
> > > >>>>>>>> + if (gfn_end <= gfn_start)
> > > >>>>>>>> + return -EINVAL;
> > > >>>>>>>> +
> > > >>>>>>>> + /* lets make sure that gpa exist in our memslot */
> > > >>>>>>>> + pfn_start = gfn_to_pfn(kvm, gfn_start);
> > > >>>>>>>> + pfn_end = gfn_to_pfn(kvm, gfn_end);
> > > >>>>>>>> +
> > > >>>>>>>> + if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > > >>>>>>>> + /*
> > > >>>>>>>> + * Allow guest MMIO range(s) to be added
> > > >>>>>>>> + * to the page encryption bitmap.
> > > >>>>>>>> + */
> > > >>>>>>>> + return -EINVAL;
> > > >>>>>>>> + }
> > > >>>>>>>> +
> > > >>>>>>>> + if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > > >>>>>>>> + /*
> > > >>>>>>>> + * Allow guest MMIO range(s) to be added
> > > >>>>>>>> + * to the page encryption bitmap.
> > > >>>>>>>> + */
> > > >>>>>>>> + return -EINVAL;
> > > >>>>>>>> + }
> > > >>>>>>>> +
> > > >>>>>>>> + mutex_lock(&kvm->lock);
> > > >>>>>>>> + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > >>>>>>>> + if (ret)
> > > >>>>>>>> + goto unlock;
> > > >>>>>>>> +
> > > >>>>>>>> + if (enc)
> > > >>>>>>>> + __bitmap_set(sev->page_enc_bmap, gfn_start,
> > > >>>>>>>> + gfn_end - gfn_start);
> > > >>>>>>>> + else
> > > >>>>>>>> + __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > > >>>>>>>> + gfn_end - gfn_start);
> > > >>>>>>>> +
> > > >>>>>>>> +unlock:
> > > >>>>>>>> + mutex_unlock(&kvm->lock);
> > > >>>>>>>> + return ret;
> > > >>>>>>>> +}
> > > >>>>>>>> +
> > > >>>>>>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > >>>>>>>> {
> > > >>>>>>>> struct kvm_sev_cmd sev_cmd;
> > > >>>>>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > >>>>>>>> .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > > >>>>>>>>
> > > >>>>>>>> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > > >>>>>>>> +
> > > >>>>>>>> + .page_enc_status_hc = svm_page_enc_status_hc,
> > > >>>>>>>> };
> > > >>>>>>>>
> > > >>>>>>>> static int __init svm_init(void)
> > > >>>>>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > >>>>>>>> index 079d9fbf278e..f68e76ee7f9c 100644
> > > >>>>>>>> --- a/arch/x86/kvm/vmx/vmx.c
> > > >>>>>>>> +++ b/arch/x86/kvm/vmx/vmx.c
> > > >>>>>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > > >>>>>>>> .nested_get_evmcs_version = NULL,
> > > >>>>>>>> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > > >>>>>>>> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > > >>>>>>>> + .page_enc_status_hc = NULL,
> > > >>>>>>>> };
> > > >>>>>>>>
> > > >>>>>>>> static void vmx_cleanup_l1d_flush(void)
> > > >>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > >>>>>>>> index cf95c36cb4f4..68428eef2dde 100644
> > > >>>>>>>> --- a/arch/x86/kvm/x86.c
> > > >>>>>>>> +++ b/arch/x86/kvm/x86.c
> > > >>>>>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > > >>>>>>>> kvm_sched_yield(vcpu->kvm, a0);
> > > >>>>>>>> ret = 0;
> > > >>>>>>>> break;
> > > >>>>>>>> + case KVM_HC_PAGE_ENC_STATUS:
> > > >>>>>>>> + ret = -KVM_ENOSYS;
> > > >>>>>>>> + if (kvm_x86_ops->page_enc_status_hc)
> > > >>>>>>>> + ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > > >>>>>>>> + a0, a1, a2);
> > > >>>>>>>> + break;
> > > >>>>>>>> default:
> > > >>>>>>>> ret = -KVM_ENOSYS;
> > > >>>>>>>> break;
> > > >>>>>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > > >>>>>>>> index 8b86609849b9..847b83b75dc8 100644
> > > >>>>>>>> --- a/include/uapi/linux/kvm_para.h
> > > >>>>>>>> +++ b/include/uapi/linux/kvm_para.h
> > > >>>>>>>> @@ -29,6 +29,7 @@
> > > >>>>>>>> #define KVM_HC_CLOCK_PAIRING 9
> > > >>>>>>>> #define KVM_HC_SEND_IPI 10
> > > >>>>>>>> #define KVM_HC_SCHED_YIELD 11
> > > >>>>>>>> +#define KVM_HC_PAGE_ENC_STATUS 12
> > > >>>>>>>>
> > > >>>>>>>> /*
> > > >>>>>>>> * hypercalls use architecture specific
> > > >>>>>>>> --
> > > >>>>>>>> 2.17.1
> > > >>>>>>>>
> > > >>>>>>> I'm still not excited by the dynamic resizing. I believe the guest
> > > >>>>>>> hypercall can be called in atomic contexts, which makes me
> > > >>>>>>> particularly unexcited to see a potentially large vmalloc on the host
> > > >>>>>>> followed by filling the buffer. Particularly when the buffer might be
> > > >>>>>>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> > > >>>>>>> math).
> > > >>>>>>>
> > > >>>>>> I think, looking at more practical situations, most hypercalls will
> > > >>>>>> happen during the boot stage, when device-specific initializations are
> > > >>>>>> happening, so typically the maximum page encryption bitmap size would
> > > >>>>>> be allocated early enough.
> > > >>>>>>
> > > >>>>>> In fact, initial hypercalls made by OVMF will probably allocate the
> > > >>>>>> maximum page bitmap size even before the kernel comes up, especially
> > > >>>>>> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> > > >>>>>> regions, PCI device memory, etc., and most importantly for
> > > >>>>>> "non-existent" high memory range (which will probably be the
> > > >>>>>> maximum size page encryption bitmap allocated/resized).
> > > >>>>>>
> > > >>>>>> Let me know if you have different thoughts on this?
> > > >>>>> Hi Ashish,
> > > >>>>>
> > > >>>>> If this is not an issue in practice, we can just move past this. If we
> > > >>>>> are basically guaranteed that OVMF will trigger hypercalls that expand
> > > >>>>> the bitmap beyond the top of memory, then, yes, that should work. That
> > > >>>>> leaves me slightly nervous that OVMF might regress since it's not
> > > >>>>> obvious that calling a hypercall beyond the top of memory would be
> > > >>>>> "required" for avoiding a somewhat indirectly related issue in guest
> > > >>>>> kernels.
> > > >>>>
> > > >>>> If possible we should try to avoid growing/shrinking the bitmap.
> > > >>>> Today OVMF may not access beyond guest memory, but a malicious guest
> > > >>>> could send down a hypercall that triggers a huge memory allocation
> > > >>>> on the host side and may eventually cause denial of service for others.
> > > >>> Nice catch! Was just writing up an email about this.
> > > >>>> I am in favor of finding some solution to handle this case. How
> > > >>>> about Steve's suggestion of the VMM making a call down to the kernel to
> > > >>>> tell it how big the bitmap should be? Initially it should be equal to the
> > > >>>> guest RAM, and if the VMM ever expands the memory it can send down
> > > >>>> another notification to grow the bitmap.
> > > >>>>
> > > >>>> Optionally, instead of adding a new ioctl, I was wondering if we can
> > > >>>> extend kvm_arch_prepare_memory_region() with an SVM-specific x86_ops
> > > >>>> callback which can read the userspace-provided memory region, calculate
> > > >>>> the amount of guest RAM managed by KVM, and grow/shrink the bitmap
> > > >>>> based on that information. I have not looked deep enough to see if it's
> > > >>>> doable, but if it can work then we can avoid adding yet another ioctl.
> > > >>> We also have the set bitmap ioctl in a later patch in this series. We
> > > >>> could also use the set ioctl for initialization (it's a little
> > > >>> excessive for initialization since there will be an additional
> > > >>> ephemeral allocation and a few additional buffer copies, but that's
> > > >>> probably fine). An enable_cap has the added benefit of probably being
> > > >>> necessary anyway so usermode can disable the migration feature flag.
> > > >>>
> > > >>> In general, userspace is going to have to be in direct control of the
> > > >>> buffer and its size.
> > > >> My only practical concern about setting a static bitmap size based on guest
> > > >> memory is the hypercalls made initially by OVMF to set page
> > > >> enc/dec status for ROM, ACPI regions and especially the non-existent
> > > >> high memory range. The new ioctl will statically set the bitmap size to
> > > >> whatever guest RAM is specified, say 4G, 8G, etc., but the OVMF
> > > >> hypercall for non-existent memory will cover a guest
> > > >> physical range like ~6G->64G (for a 4G guest RAM setup); this
> > > >> hypercall will basically have to return doing nothing, because
> > > >> the allocated bitmap won't cover this guest physical range?
> > >
> > >
> > > IMO, OVMF issuing a hypercall beyond the guest RAM is simply wrong; it
> > > should *not* do that. There was a feature request I submitted some time
> > > back to Tianocore https://bugzilla.tianocore.org/show_bug.cgi?id=623 as
> > > I saw this coming in the future. I tried highlighting the problem that
> > > MdeModulePkg does not provide a notifier to tell OVMF when the core
> > > creates the MMIO holes etc. It was not a big problem with SEV
> > > initially because we were never getting down to the hypervisor to do
> > > something about those non-existent regions. But with migration it's
> > > now important that we restart the discussion with the UEFI folks and
> > > see what can be done. In the kernel patches we should do what is right
> > > for the kernel and not work around the OVMF limitation.
> >
> > Ok, this makes sense. I will start exploring
> > kvm_arch_prepare_memory_region() to see if it can assist in computing
> > the guest RAM; otherwise I will look at adding a new ioctl interface
> > for the same.
> >
>
> I looked at kvm_arch_prepare_memory_region() and
> kvm_arch_commit_memory_region(), and kvm_arch_commit_memory_region()
> looks ideal to use for this.
>
> I walked the kvm_memslots in this function and I can compute the
> approximate guest RAM mapped by KVM, though I get the guest RAM size as
> "twice" the configured size because of the two address spaces on x86 KVM;
> I believe there is one additional address space for SMM/SMRAM use.
>
> I don't think we have a use case for migrating an SEV guest with SMM
> support?
>
> Considering that, I believe I can just compute the guest RAM size
> using the memslots for address space #0 and use that to grow/shrink the bitmap.
>
> As you mentioned, I will need to add a new SVM-specific x86_ops
> callback, invoked as part of kvm_arch_commit_memory_region(), which will
> in turn call sev_resize_page_enc_bitmap().
>
> Thanks,
> Ashish
>
> > >
> > >
> > > >> Also, hypercalls for ROM, ACPI, device regions and any memory holes within
> > > >> the static bitmap setup as per guest RAM config will work, but what
> > > >> about hypercalls for any device regions beyond the guest RAM config ?
> > > >>
> > > >> Thanks,
> > > >> Ashish
Supporting migration of an SEV VM with SMM seems unnecessary, but I
wouldn't go out of my way to make it harder to fix. The bitmap is
as_id unaware already (as it likely should be, since the PA space is
shared), so I would personally ignore summing up the sizes, and
instead look for the highest PA that is mapped by a memslot. If I'm
not mistaken, the address spaces should (more or less) entirely
overlap. This will be suboptimal if the PA space is sparse, but that
would be weird anyway.
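The sizing logic described above (take the highest PA mapped by any memslot rather than summing the slot sizes) can be sketched in self-contained userspace C. The slot struct below is a hypothetical stand-in for KVM's memslot metadata, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for KVM memslot metadata: base_gfn is the first
 * guest frame number covered by the slot, npages its length in pages.
 * This is a userspace model, not struct kvm_memory_slot.
 */
struct slot {
	unsigned long base_gfn;
	unsigned long npages;
};

/*
 * Size the bitmap from the highest mapped PA, not the sum of the slot
 * sizes: slots from overlapping address spaces (e.g. SMM) would
 * otherwise be counted twice.
 */
static unsigned long max_mapped_gfn(const struct slot *slots, size_t n)
{
	unsigned long max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long end = slots[i].base_gfn + slots[i].npages;

		if (end > max)
			max = end;
	}
	return max;
}
```

With two fully overlapping slots of 0x100 pages, summing sizes would give 0x200, while the highest mapped PA correctly stays at 0x100.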
You could also go back to a suggestion I had much earlier and have the
memslots own the bitmaps. I believe there is space to add flags for
enabling and disabling features like this on memslots. Basically, you
could treat this in the same way as dirty bitmaps, which seem pretty
similar. This would work well even if PA space were sparse.
The current patch set is nowhere near sufficient for supporting SMM
migration securely, which is fine: we would need to separate out
access to the hypercall for particular pages, so that non-SMM could
not corrupt SMM. And then the code in SMM would also need to support
the hypercall if it ever used shared pages. Until someone asks for
this I don't believe it is worth doing. If we put the bitmaps on the
memslots, we could probably limit access to a memslot's bitmap to
vcpus that can access that memslot. This seems unnecessary for now.
--Steve
On Tue, Apr 7, 2020 at 6:49 PM Ashish Kalra <[email protected]> wrote:
>
> Hello Steve,
>
> On Tue, Apr 07, 2020 at 05:26:33PM -0700, Steve Rutherford wrote:
> > On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <[email protected]> wrote:
> > >
> > > From: Brijesh Singh <[email protected]>
> > >
> > > The ioctl can be used to set page encryption bitmap for an
> > > incoming guest.
> > >
> > > Cc: Thomas Gleixner <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: "H. Peter Anvin" <[email protected]>
> > > Cc: Paolo Bonzini <[email protected]>
> > > Cc: "Radim Krčmář" <[email protected]>
> > > Cc: Joerg Roedel <[email protected]>
> > > Cc: Borislav Petkov <[email protected]>
> > > Cc: Tom Lendacky <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Signed-off-by: Brijesh Singh <[email protected]>
> > > Signed-off-by: Ashish Kalra <[email protected]>
> > > ---
> > > Documentation/virt/kvm/api.rst | 22 +++++++++++++++++
> > > arch/x86/include/asm/kvm_host.h | 2 ++
> > > arch/x86/kvm/svm.c | 42 +++++++++++++++++++++++++++++++++
> > > arch/x86/kvm/x86.c | 12 ++++++++++
> > > include/uapi/linux/kvm.h | 1 +
> > > 5 files changed, 79 insertions(+)
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index 8ad800ebb54f..4d1004a154f6 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> > > is private then userspace need to use SEV migration commands to transmit
> > > the page.
> > >
> > > +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> > > +---------------------------------------
> > > +
> > > +:Capability: basic
> > > +:Architectures: x86
> > > +:Type: vm ioctl
> > > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > > +:Returns: 0 on success, -1 on error
> > > +
> > > +/* for KVM_SET_PAGE_ENC_BITMAP */
> > > +struct kvm_page_enc_bitmap {
> > > + __u64 start_gfn;
> > > + __u64 num_pages;
> > > + union {
> > > + void __user *enc_bitmap; /* one bit per page */
> > > + __u64 padding2;
> > > + };
> > > +};
> > > +
> > > +During the guest live migration the outgoing guest exports its page encryption
> > > +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > +bitmap for an incoming guest.
> > >
> > > 5. The kvm_run structure
> > > ========================
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 27e43e3ec9d8..d30f770aaaea 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> > > unsigned long sz, unsigned long mode);
> > > int (*get_page_enc_bitmap)(struct kvm *kvm,
> > > struct kvm_page_enc_bitmap *bmap);
> > > + int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > + struct kvm_page_enc_bitmap *bmap);
> > > };
> > >
> > > struct kvm_arch_async_pf {
> > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > index bae783cd396a..313343a43045 100644
> > > --- a/arch/x86/kvm/svm.c
> > > +++ b/arch/x86/kvm/svm.c
> > > @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > > return ret;
> > > }
> > >
> > > +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > + struct kvm_page_enc_bitmap *bmap)
> > > +{
> > > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > + unsigned long gfn_start, gfn_end;
> > > + unsigned long *bitmap;
> > > + unsigned long sz, i;
> > > + int ret;
> > > +
> > > + if (!sev_guest(kvm))
> > > + return -ENOTTY;
> > > +
> > > + gfn_start = bmap->start_gfn;
> > > + gfn_end = gfn_start + bmap->num_pages;
> > > +
> > > + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> > > + bitmap = kmalloc(sz, GFP_KERNEL);
> > > + if (!bitmap)
> > > + return -ENOMEM;
> > > +
> > > + ret = -EFAULT;
> > > + if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> > > + goto out;
> > > +
> > > + mutex_lock(&kvm->lock);
> > > + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > I realize now that usermode could use this for initializing the
> > minimum size of the enc bitmap, which probably solves my issue from
> > the other thread.
> > > + if (ret)
> > > + goto unlock;
> > > +
> > > + i = gfn_start;
> > > + for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> > > + clear_bit(i + gfn_start, sev->page_enc_bmap);
> > This API seems a bit strange, since it can only clear bits. I would
> > expect "set" to force the values to match the values passed down,
> > instead of only ensuring that cleared bits in the input are also
> > cleared in the kernel.
> >
>
> The sev_resize_page_enc_bitmap() will allocate a new bitmap and
> set it to all 0xFF's, therefore, the code here simply clears the bits
> in the bitmap as per the cleared bits in the input.
If I'm not mistaken, resize only reinitializes the newly extended part
of the buffer, and copies the old values for the rest.
With the API you proposed you could probably reimplement a normal set
call by calling get, then reset, and then set, but this feels
cumbersome.
--Steve
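The resize behavior being discussed (reinitialize only the newly extended tail, preserve existing values) can be modeled in self-contained userspace C. resize_bitmap() below is a hypothetical stand-in for the kernel's sev_resize_page_enc_bitmap(), using byte-granular copies instead of bitmap_copy():

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace model of the resize semantics: the new buffer is filled
 * with 0xff ("encrypted by default") and the old contents are copied
 * over it, so only the newly extended tail keeps the default value.
 * Bit counts are in pages.
 */
static unsigned long *resize_bitmap(unsigned long *old, size_t old_bits,
				    size_t new_bits)
{
	size_t old_bytes = (old_bits + CHAR_BIT - 1) / CHAR_BIT;
	size_t new_bytes = (new_bits + CHAR_BIT - 1) / CHAR_BIT;
	unsigned long *map;

	if (new_bits <= old_bits)	/* never shrink */
		return old;

	/* round up to whole longs, as the kernel bitmap helpers do */
	new_bytes = (new_bytes + sizeof(long) - 1) / sizeof(long) * sizeof(long);

	map = malloc(new_bytes);
	if (!map)
		return NULL;

	memset(map, 0xff, new_bytes);	/* mark pages encrypted by default */
	if (old)
		memcpy(map, old, old_bytes);	/* preserve old values */
	free(old);
	return map;
}
```

A bit cleared before the resize stays cleared afterwards; only bits past the old size come back as 0xff, which is why a plain resize cannot serve as a full "set" operation.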
On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <[email protected]> wrote:
>
> Hello Steve,
>
> On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> > <[email protected]> wrote:
> > >
> > >
> > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > >>> From: Ashish Kalra <[email protected]>
> > > >>>
> > > >>> This ioctl can be used by the application to reset the page
> > > >>> encryption bitmap managed by the KVM driver. A typical usage
> > > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > > >>> the bitmap.
> > > >>>
> > > >>> Signed-off-by: Ashish Kalra <[email protected]>
> > > >>> ---
> > > >>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> > > >>> arch/x86/include/asm/kvm_host.h | 1 +
> > > >>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> > > >>> arch/x86/kvm/x86.c | 6 ++++++
> > > >>> include/uapi/linux/kvm.h | 1 +
> > > >>> 5 files changed, 37 insertions(+)
> > > >>>
> > > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > >>> index 4d1004a154f6..a11326ccc51d 100644
> > > >>> --- a/Documentation/virt/kvm/api.rst
> > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > >>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > >>> bitmap for an incoming guest.
> > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > >>> +-----------------------------------------
> > > >>> +
> > > >>> +:Capability: basic
> > > >>> +:Architectures: x86
> > > >>> +:Type: vm ioctl
> > > >>> +:Parameters: none
> > > >>> +:Returns: 0 on success, -1 on error
> > > >>> +
> > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > >>> +
> > > >>> +
> > > >>> 5. The kvm_run structure
> > > >>> ========================
> > > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > >>> index d30f770aaaea..a96ef6338cd2 100644
> > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > >>> struct kvm_page_enc_bitmap *bmap);
> > > >>> int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > >>> struct kvm_page_enc_bitmap *bmap);
> > > >>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > >>> };
> > > >>> struct kvm_arch_async_pf {
> > > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > >>> index 313343a43045..c99b0207a443 100644
> > > >>> --- a/arch/x86/kvm/svm.c
> > > >>> +++ b/arch/x86/kvm/svm.c
> > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > >>> return ret;
> > > >>> }
> > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > > >>> +{
> > > >>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > >>> +
> > > >>> + if (!sev_guest(kvm))
> > > >>> + return -ENOTTY;
> > > >>> +
> > > >>> + mutex_lock(&kvm->lock);
> > > >>> + /* by default all pages should be marked encrypted */
> > > >>> + if (sev->page_enc_bmap_size)
> > > >>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > >>> + mutex_unlock(&kvm->lock);
> > > >>> + return 0;
> > > >>> +}
> > > >>> +
> > > >>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > >>> {
> > > >>> struct kvm_sev_cmd sev_cmd;
> > > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > >>> .page_enc_status_hc = svm_page_enc_status_hc,
> > > >>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > >>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > >>> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > > >>
> > > >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> > > >> the previous patch either.
> > > >>
> > > >>> };
> > > > This struct is declared as "static storage", so won't the non-initialized
> > > > members be 0?
> > >
> > >
> > > Correct, although I see that 'nested_enable_evmcs' is explicitly
> > > initialized. We should maintain the convention, perhaps.
> > >
> > > >
> > > >>> static int __init svm_init(void)
> > > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > >>> index 05e953b2ec61..2127ed937f53 100644
> > > >>> --- a/arch/x86/kvm/x86.c
> > > >>> +++ b/arch/x86/kvm/x86.c
> > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > >>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > >>> break;
> > > >>> }
> > > >>> + case KVM_PAGE_ENC_BITMAP_RESET: {
> > > >>> + r = -ENOTTY;
> > > >>> + if (kvm_x86_ops->reset_page_enc_bitmap)
> > > >>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > >>> + break;
> > > >>> + }
> > > >>> default:
> > > >>> r = -ENOTTY;
> > > >>> }
> > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > >>> index b4b01d47e568..0884a581fc37 100644
> > > >>> --- a/include/uapi/linux/kvm.h
> > > >>> +++ b/include/uapi/linux/kvm.h
> > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > >>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > >>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> > > >>> /* Secure Encrypted Virtualization command */
> > > >>> enum sev_cmd_id {
> > > >> Reviewed-by: Krish Sadhukhan <[email protected]>
> >
> >
> > Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> > copy the new value down and do a bit more work, but I don't think
> > resetting the bitmap is going to be the bottleneck on reboot. Seems
> > excessive to add another ioctl for this.
>
> The set ioctl is generally available/provided for the incoming VM to set up
> the page encryption bitmap; this reset ioctl is meant for the source VM
> as a simple interface to reset the whole page encryption bitmap.
>
> Thanks,
> Ashish
Hey Ashish,
These seem to overlap quite a bit. I think this API should be refactored.
1) Use kvm_vm_ioctl_enable_cap to control whether or not this
hypercall (and related feature bit) is offered to the VM, and also the
size of the buffer.
2) Use set for manipulating values in the bitmap, including resetting
the bitmap. Set the bitmap pointer to null if you want to reset to all
0xFFs. When the bitmap pointer is set, it should set the values to
exactly what is pointed at, instead of only clearing bits, as is done
currently.
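The "set the values to exactly what is pointed at" semantics of point (2) could look like the following self-contained C sketch. The helper names are illustrative byte-granularity stand-ins, not the kernel's bitmap API:

```c
#include <assert.h>
#include <limits.h>

/* Read bit n of a byte-addressed bitmap. */
static int test_bit_(const unsigned char *map, unsigned long n)
{
	return (map[n / CHAR_BIT] >> (n % CHAR_BIT)) & 1;
}

/* Force bit n of the bitmap to the value v (0 or 1). */
static void assign_bit_(unsigned char *map, unsigned long n, int v)
{
	if (v)
		map[n / CHAR_BIT] |= 1u << (n % CHAR_BIT);
	else
		map[n / CHAR_BIT] &= ~(1u << (n % CHAR_BIT));
}

/*
 * "Set" that assigns both ones and zeroes from the user-supplied
 * bitmap 'src' into 'dst' at dst_start, rather than only clearing
 * bits as the current patch does.
 */
static void set_bits_exact(unsigned char *dst, unsigned long dst_start,
			   const unsigned char *src, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++)
		assign_bit_(dst, dst_start + i, test_bit_(src, i));
}
```

Here the target range takes exactly the input bits, set or clear, while bits outside the range keep their previous values.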
3) Use get for fetching values from the kernel. Personally, I'd
require alignment of the base GFN to a multiple of 8 (but the number
of pages could be whatever), so you can just use a memcpy. Optionally,
you may want some way to tell userspace the size of the existing
buffer, so it can ensure that it can ask for the entire buffer without
having to track the size in usermode (not strictly necessary, but nice
to have since it ensures that there is only one place that has to
manage this value).
If you want to expand or contract the bitmap, you can use enable cap
to adjust the size.
If you don't want to offer the hypercall to the guest, don't call the
enable cap.
This API avoids using up another ioctl. Ioctl space is somewhat
scarce. It also gives userspace fine grained control over the buffer,
so it can support both hot-plug and hot-unplug (or at the very least
it is not obviously incompatible with those). It also gives userspace
control over whether or not the feature is offered. The hypercall
isn't free, and being able to tell guests not to call it when the host
wasn't going to migrate them anyway will be useful.
Thanks,
--Steve
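Point (3) of the proposal, requiring the base GFN to be byte-aligned so the copy becomes a plain memcpy, might look like this self-contained userspace sketch (get_bitmap() is a hypothetical model, not the actual KVM_GET_PAGE_ENC_BITMAP implementation):

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/*
 * Model of the suggested "get" path: start_gfn must be a multiple of 8
 * (CHAR_BIT) so the kernel can hand bytes back with a single memcpy,
 * with no bit-shifting across byte boundaries.
 */
static int get_bitmap(const unsigned char *kernel_bmap,
		      unsigned long bmap_pages,
		      unsigned long start_gfn, unsigned long num_pages,
		      unsigned char *out)
{
	if (start_gfn % CHAR_BIT)	/* alignment is what permits memcpy */
		return -1;
	if (start_gfn + num_pages > bmap_pages)	/* stay within the bitmap */
		return -1;

	memcpy(out, kernel_bmap + start_gfn / CHAR_BIT,
	       (num_pages + CHAR_BIT - 1) / CHAR_BIT);
	return 0;
}
```

An unaligned start_gfn or a range past the end of the bitmap is simply rejected, which also gives userspace a way to probe the buffer bounds.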
Hello Steve,
On Thu, Apr 09, 2020 at 05:06:21PM -0700, Steve Rutherford wrote:
> On Tue, Apr 7, 2020 at 6:49 PM Ashish Kalra <[email protected]> wrote:
> >
> > Hello Steve,
> >
> > On Tue, Apr 07, 2020 at 05:26:33PM -0700, Steve Rutherford wrote:
> > > On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <[email protected]> wrote:
> > > >
> > > > [...]
> > > > + ret = -EFAULT;
> > > > + if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> > > > + goto out;
> > > > +
> > > > + mutex_lock(&kvm->lock);
> > > > + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > I realize now that usermode could use this for initializing the
> > > minimum size of the enc bitmap, which probably solves my issue from
> > > the other thread.
> > > > + if (ret)
> > > > + goto unlock;
> > > > +
> > > > + i = gfn_start;
> > > > + for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> > > > + clear_bit(i + gfn_start, sev->page_enc_bmap);
> > > This API seems a bit strange, since it can only clear bits. I would
> > > expect "set" to force the values to match the values passed down,
> > > instead of only ensuring that cleared bits in the input are also
> > > cleared in the kernel.
> > >
> >
> > The sev_resize_page_enc_bitmap() will allocate a new bitmap and
> > set it to all 0xFF's, therefore, the code here simply clears the bits
> > in the bitmap as per the cleared bits in the input.
>
> If I'm not mistaken, resize only reinitializes the newly extended part
> of the buffer, and copies the old values for the rest.
> With the API you proposed you could probably reimplement a normal set
> call by calling get, then reset, and then set, but this feels
> cumbersome.
>
As I mentioned earlier, the set API is basically meant for the incoming
VM. The resize will initialize the incoming VM's bitmap to all 0xFFs,
and as there won't be any bitmap allocated initially on the incoming VM,
the bitmap copy will not do anything; the clear_bit later will then
clear the incoming VM's bits as per the input.
Thanks,
Ashish
Hello Steve,
On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
> On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <[email protected]> wrote:
> >
> > Hello Steve,
> >
> > On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> > > <[email protected]> wrote:
> > > >
> > > >
> > > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > > >>> From: Ashish Kalra <[email protected]>
> > > > >>>
> > > > >>> This ioctl can be used by the application to reset the page
> > > > >>> encryption bitmap managed by the KVM driver. A typical usage
> > > > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > > > >>> the bitmap.
> > > > >>>
> > > > >>> Signed-off-by: Ashish Kalra <[email protected]>
> > > > >>> ---
> > > > >>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> > > > >>> arch/x86/include/asm/kvm_host.h | 1 +
> > > > >>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> > > > >>> arch/x86/kvm/x86.c | 6 ++++++
> > > > >>> include/uapi/linux/kvm.h | 1 +
> > > > >>> 5 files changed, 37 insertions(+)
> > > > >>>
> > > > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > >>> index 4d1004a154f6..a11326ccc51d 100644
> > > > >>> --- a/Documentation/virt/kvm/api.rst
> > > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > > >>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > >>> bitmap for an incoming guest.
> > > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > > >>> +-----------------------------------------
> > > > >>> +
> > > > >>> +:Capability: basic
> > > > >>> +:Architectures: x86
> > > > >>> +:Type: vm ioctl
> > > > >>> +:Parameters: none
> > > > >>> +:Returns: 0 on success, -1 on error
> > > > >>> +
> > > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > > > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > > >>> +
> > > > >>> +
> > > > >>> 5. The kvm_run structure
> > > > >>> ========================
> > > > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > >>> index d30f770aaaea..a96ef6338cd2 100644
> > > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > > >>> struct kvm_page_enc_bitmap *bmap);
> > > > >>> int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > >>> struct kvm_page_enc_bitmap *bmap);
> > > > >>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > > >>> };
> > > > >>> struct kvm_arch_async_pf {
> > > > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > >>> index 313343a43045..c99b0207a443 100644
> > > > >>> --- a/arch/x86/kvm/svm.c
> > > > >>> +++ b/arch/x86/kvm/svm.c
> > > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > >>> return ret;
> > > > >>> }
> > > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > > > >>> +{
> > > > >>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > >>> +
> > > > >>> + if (!sev_guest(kvm))
> > > > >>> + return -ENOTTY;
> > > > >>> +
> > > > >>> + mutex_lock(&kvm->lock);
> > > > >>> + /* by default all pages should be marked encrypted */
> > > > >>> + if (sev->page_enc_bmap_size)
> > > > >>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > > >>> + mutex_unlock(&kvm->lock);
> > > > >>> + return 0;
> > > > >>> +}
> > > > >>> +
> > > > >>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > > >>> {
> > > > >>> struct kvm_sev_cmd sev_cmd;
> > > > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > > >>> .page_enc_status_hc = svm_page_enc_status_hc,
> > > > >>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > > >>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > > >>> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > > > >>
> > > > >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> > > > >> the previous patch either.
> > > > >>
> > > > >>> };
> > > > > This struct is declared as "static storage", so won't the non-initialized
> > > > > members be 0 ?
> > > >
> > > >
> > > > Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> > > > initialized. We should maintain the convention, perhaps.
> > > >
> > > > >
> > > > >>> static int __init svm_init(void)
> > > > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > >>> index 05e953b2ec61..2127ed937f53 100644
> > > > >>> --- a/arch/x86/kvm/x86.c
> > > > >>> +++ b/arch/x86/kvm/x86.c
> > > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > > >>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > > >>> break;
> > > > >>> }
> > > > >>> + case KVM_PAGE_ENC_BITMAP_RESET: {
> > > > >>> + r = -ENOTTY;
> > > > >>> + if (kvm_x86_ops->reset_page_enc_bitmap)
> > > > >>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > > >>> + break;
> > > > >>> + }
> > > > >>> default:
> > > > >>> r = -ENOTTY;
> > > > >>> }
> > > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > >>> index b4b01d47e568..0884a581fc37 100644
> > > > >>> --- a/include/uapi/linux/kvm.h
> > > > >>> +++ b/include/uapi/linux/kvm.h
> > > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > > >>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > > >>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> > > > >>> /* Secure Encrypted Virtualization command */
> > > > >>> enum sev_cmd_id {
> > > > >> Reviewed-by: Krish Sadhukhan <[email protected]>
> > >
> > >
> > > Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> > > copy the new value down and do a bit more work, but I don't think
> > > resetting the bitmap is going to be the bottleneck on reboot. Seems
> > > excessive to add another ioctl for this.
> >
> > The set ioctl is generally available/provided for the incoming VM to setup
> > the page encryption bitmap, this reset ioctl is meant for the source VM
> > as a simple interface to reset the whole page encryption bitmap.
> >
> > Thanks,
> > Ashish
>
>
> Hey Ashish,
>
> These seem very overlapping. I think this API should be refactored a bit.
>
> 1) Use kvm_vm_ioctl_enable_cap to control whether or not this
> hypercall (and related feature bit) is offered to the VM, and also the
> size of the buffer.
If you look at patch 13/14, I have added a new KVM para feature called
"KVM_FEATURE_SEV_LIVE_MIGRATION", which indicates host-side support for
SEV live migration, and a new custom MSR which the guest writes via
wrmsr to enable the live migration feature, so this is like the enable
cap support.
There are further extensions to this support I am adding, so patch 13/14
of this patch-set is still being enhanced and will have full support
when I repost next.
> 2) Use set for manipulating values in the bitmap, including resetting
> the bitmap. Set the bitmap pointer to null if you want to reset to all
> 0xFFs. When the bitmap pointer is set, it should set the values to
> exactly what is pointed at, instead of only clearing bits, as is done
> currently.
As I mentioned in my earlier email, the set API is supposed to be for
the incoming VM, but if you really need to use it for the outgoing VM
then it can be modified.
> 3) Use get for fetching values from the kernel. Personally, I'd
> require alignment of the base GFN to a multiple of 8 (but the number
> of pages could be whatever), so you can just use a memcpy. Optionally,
> you may want some way to tell userspace the size of the existing
> buffer, so it can ensure that it can ask for the entire buffer without
> having to track the size in usermode (not strictly necessary, but nice
> to have since it ensures that there is only one place that has to
> manage this value).
>
> If you want to expand or contract the bitmap, you can use enable cap
> to adjust the size.
As discussed on the earlier mail thread, we are now doing this
dynamically by computing the guest RAM size when the
set_user_memory_region ioctl is invoked. I believe that should handle
the hot-plug and hot-unplug events too, as any hot memory updates will
need the KVM memslots to be updated.
> If you don't want to offer the hypercall to the guest, don't call the
> enable cap.
> This API avoids using up another ioctl. Ioctl space is somewhat
> scarce. It also gives userspace fine grained control over the buffer,
> so it can support both hot-plug and hot-unplug (or at the very least
> it is not obviously incompatible with those). It also gives userspace
> control over whether or not the feature is offered. The hypercall
> isn't free, and being able to tell guests to not call when the host
> wasn't going to migrate it anyway will be useful.
>
As I mentioned above, the host now indicates whether it supports the
live migration feature, and the feature and the hypercall are only
enabled on the host when the guest checks for this support and does a
wrmsr() to enable the feature. Also, the guest will not make the
hypercall if the host does not indicate support for it.
Thanks,
Ashish
On Thu, Apr 9, 2020 at 6:23 PM Ashish Kalra <[email protected]> wrote:
>
> Hello Steve,
>
> On Thu, Apr 09, 2020 at 05:06:21PM -0700, Steve Rutherford wrote:
> > On Tue, Apr 7, 2020 at 6:49 PM Ashish Kalra <[email protected]> wrote:
> > >
> > > Hello Steve,
> > >
> > > On Tue, Apr 07, 2020 at 05:26:33PM -0700, Steve Rutherford wrote:
> > > > On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <[email protected]> wrote:
> > > > >
> > > > > From: Brijesh Singh <[email protected]>
> > > > >
> > > > > The ioctl can be used to set page encryption bitmap for an
> > > > > incoming guest.
> > > > >
> > > > > Cc: Thomas Gleixner <[email protected]>
> > > > > Cc: Ingo Molnar <[email protected]>
> > > > > Cc: "H. Peter Anvin" <[email protected]>
> > > > > Cc: Paolo Bonzini <[email protected]>
> > > > > Cc: "Radim Krčmář" <[email protected]>
> > > > > Cc: Joerg Roedel <[email protected]>
> > > > > Cc: Borislav Petkov <[email protected]>
> > > > > Cc: Tom Lendacky <[email protected]>
> > > > > Cc: [email protected]
> > > > > Cc: [email protected]
> > > > > Cc: [email protected]
> > > > > Signed-off-by: Brijesh Singh <[email protected]>
> > > > > Signed-off-by: Ashish Kalra <[email protected]>
> > > > > ---
> > > > > Documentation/virt/kvm/api.rst | 22 +++++++++++++++++
> > > > > arch/x86/include/asm/kvm_host.h | 2 ++
> > > > > arch/x86/kvm/svm.c | 42 +++++++++++++++++++++++++++++++++
> > > > > arch/x86/kvm/x86.c | 12 ++++++++++
> > > > > include/uapi/linux/kvm.h | 1 +
> > > > > 5 files changed, 79 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > index 8ad800ebb54f..4d1004a154f6 100644
> > > > > --- a/Documentation/virt/kvm/api.rst
> > > > > +++ b/Documentation/virt/kvm/api.rst
> > > > > @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> > > > > is private then userspace need to use SEV migration commands to transmit
> > > > > the page.
> > > > >
> > > > > +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> > > > > +---------------------------------------
> > > > > +
> > > > > +:Capability: basic
> > > > > +:Architectures: x86
> > > > > +:Type: vm ioctl
> > > > > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > > > > +:Returns: 0 on success, -1 on error
> > > > > +
> > > > > +/* for KVM_SET_PAGE_ENC_BITMAP */
> > > > > +struct kvm_page_enc_bitmap {
> > > > > + __u64 start_gfn;
> > > > > + __u64 num_pages;
> > > > > + union {
> > > > > + void __user *enc_bitmap; /* one bit per page */
> > > > > + __u64 padding2;
> > > > > + };
> > > > > +};
> > > > > +
> > > > > +During the guest live migration the outgoing guest exports its page encryption
> > > > > +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > > +bitmap for an incoming guest.
> > > > >
> > > > > 5. The kvm_run structure
> > > > > ========================
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > index 27e43e3ec9d8..d30f770aaaea 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> > > > > unsigned long sz, unsigned long mode);
> > > > > int (*get_page_enc_bitmap)(struct kvm *kvm,
> > > > > struct kvm_page_enc_bitmap *bmap);
> > > > > + int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > > + struct kvm_page_enc_bitmap *bmap);
> > > > > };
> > > > >
> > > > > struct kvm_arch_async_pf {
> > > > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > > index bae783cd396a..313343a43045 100644
> > > > > --- a/arch/x86/kvm/svm.c
> > > > > +++ b/arch/x86/kvm/svm.c
> > > > > @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > > > > return ret;
> > > > > }
> > > > >
> > > > > +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > > + struct kvm_page_enc_bitmap *bmap)
> > > > > +{
> > > > > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > + unsigned long gfn_start, gfn_end;
> > > > > + unsigned long *bitmap;
> > > > > + unsigned long sz, i;
> > > > > + int ret;
> > > > > +
> > > > > + if (!sev_guest(kvm))
> > > > > + return -ENOTTY;
> > > > > +
> > > > > + gfn_start = bmap->start_gfn;
> > > > > + gfn_end = gfn_start + bmap->num_pages;
> > > > > +
> > > > > + sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> > > > > + bitmap = kmalloc(sz, GFP_KERNEL);
> > > > > + if (!bitmap)
> > > > > + return -ENOMEM;
> > > > > +
> > > > > + ret = -EFAULT;
> > > > > + if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> > > > > + goto out;
> > > > > +
> > > > > + mutex_lock(&kvm->lock);
> > > > > + ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > > I realize now that usermode could use this for initializing the
> > > > minimum size of the enc bitmap, which probably solves my issue from
> > > > the other thread.
> > > > > + if (ret)
> > > > > + goto unlock;
> > > > > +
> > > > > + i = gfn_start;
> > > > > + for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> > > > > + clear_bit(i + gfn_start, sev->page_enc_bmap);
> > > > This API seems a bit strange, since it can only clear bits. I would
> > > > expect "set" to force the values to match the values passed down,
> > > > instead of only ensuring that cleared bits in the input are also
> > > > cleared in the kernel.
> > > >
> > >
> > > The sev_resize_page_enc_bitmap() will allocate a new bitmap and
> > > set it to all 0xFF's, therefore, the code here simply clears the bits
> > > in the bitmap as per the cleared bits in the input.
> >
> > If I'm not mistaken, resize only reinitializes the newly extended part
> > of the buffer, and copies the old values for the rest.
> > With the API you proposed you could probably reimplement a normal set
> > call by calling get, then reset, and then set, but this feels
> > cumbersome.
> >
>
> As I mentioned earlier, the set API is basically meant for the incoming
> VM. The resize will initialize the incoming VM's bitmap to all 0xFFs,
> and as there won't be any bitmap allocated initially on the incoming VM,
> the bitmap copy will not do anything; the clear_bit later will then
> clear the incoming VM's bits as per the input.
The documentation does not make that super clear. A typical set call
in the KVM API lets you go to any state, not just a subset of states.
Yes, this works in the common case of migrating a VM to a particular
target, once. I find the behavior of the current API surprising, and I
prefer APIs that are unsurprising. If I had not read the code, it would
have been very easy for me to assume it worked like a normal set call.
You could rename the ioctl to something like "CLEAR_BITS", but a
set-based API is more common.
Thanks,
Steve
On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <[email protected]> wrote:
>
> Hello Steve,
>
> On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
> > On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <[email protected]> wrote:
> > >
> > > Hello Steve,
> > >
> > > On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > > > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> > > > <[email protected]> wrote:
> > > > >
> > > > >
> > > > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > > > >>> From: Ashish Kalra <[email protected]>
> > > > > >>>
> > > > > >>> This ioctl can be used by the application to reset the page
> > > > > >>> encryption bitmap managed by the KVM driver. A typical usage
> > > > > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > > > > >>> the bitmap.
> > > > > >>>
> > > > > >>> Signed-off-by: Ashish Kalra <[email protected]>
> > > > > >>> ---
> > > > > >>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> > > > > >>> arch/x86/include/asm/kvm_host.h | 1 +
> > > > > >>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> > > > > >>> arch/x86/kvm/x86.c | 6 ++++++
> > > > > >>> include/uapi/linux/kvm.h | 1 +
> > > > > >>> 5 files changed, 37 insertions(+)
> > > > > >>>
> > > > > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > >>> index 4d1004a154f6..a11326ccc51d 100644
> > > > > >>> --- a/Documentation/virt/kvm/api.rst
> > > > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > > > >>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > > >>> bitmap for an incoming guest.
> > > > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > > > >>> +-----------------------------------------
> > > > > >>> +
> > > > > >>> +:Capability: basic
> > > > > >>> +:Architectures: x86
> > > > > >>> +:Type: vm ioctl
> > > > > >>> +:Parameters: none
> > > > > >>> +:Returns: 0 on success, -1 on error
> > > > > >>> +
> > > > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > > > > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > > > >>> +
> > > > > >>> +
> > > > > >>> 5. The kvm_run structure
> > > > > >>> ========================
> > > > > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > >>> index d30f770aaaea..a96ef6338cd2 100644
> > > > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > > > >>> struct kvm_page_enc_bitmap *bmap);
> > > > > >>> int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > > >>> struct kvm_page_enc_bitmap *bmap);
> > > > > >>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > > > >>> };
> > > > > >>> struct kvm_arch_async_pf {
> > > > > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > > >>> index 313343a43045..c99b0207a443 100644
> > > > > >>> --- a/arch/x86/kvm/svm.c
> > > > > >>> +++ b/arch/x86/kvm/svm.c
> > > > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > > >>> return ret;
> > > > > >>> }
> > > > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > > > > >>> +{
> > > > > >>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > >>> +
> > > > > >>> + if (!sev_guest(kvm))
> > > > > >>> + return -ENOTTY;
> > > > > >>> +
> > > > > >>> + mutex_lock(&kvm->lock);
> > > > > >>> + /* by default all pages should be marked encrypted */
> > > > > >>> + if (sev->page_enc_bmap_size)
> > > > > >>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > > > >>> + mutex_unlock(&kvm->lock);
> > > > > >>> + return 0;
> > > > > >>> +}
> > > > > >>> +
> > > > > >>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > > > >>> {
> > > > > >>> struct kvm_sev_cmd sev_cmd;
> > > > > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > > > >>> .page_enc_status_hc = svm_page_enc_status_hc,
> > > > > >>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > > > >>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > > > >>> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > > > > >>
> > > > > >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> > > > > >> the previous patch either.
> > > > > >>
> > > > > >>> };
> > > > > > This struct is declared as "static storage", so won't the non-initialized
> > > > > > members be 0 ?
> > > > >
> > > > >
> > > > > Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> > > > > initialized. We should maintain the convention, perhaps.
> > > > >
> > > > > >
> > > > > >>> static int __init svm_init(void)
> > > > > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > >>> index 05e953b2ec61..2127ed937f53 100644
> > > > > >>> --- a/arch/x86/kvm/x86.c
> > > > > >>> +++ b/arch/x86/kvm/x86.c
> > > > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > > > >>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > > > >>> break;
> > > > > >>> }
> > > > > >>> + case KVM_PAGE_ENC_BITMAP_RESET: {
> > > > > >>> + r = -ENOTTY;
> > > > > >>> + if (kvm_x86_ops->reset_page_enc_bitmap)
> > > > > >>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > > > >>> + break;
> > > > > >>> + }
> > > > > >>> default:
> > > > > >>> r = -ENOTTY;
> > > > > >>> }
> > > > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > > >>> index b4b01d47e568..0884a581fc37 100644
> > > > > >>> --- a/include/uapi/linux/kvm.h
> > > > > >>> +++ b/include/uapi/linux/kvm.h
> > > > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > > > >>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > > > >>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > > > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> > > > > >>> /* Secure Encrypted Virtualization command */
> > > > > >>> enum sev_cmd_id {
> > > > > >> Reviewed-by: Krish Sadhukhan <[email protected]>
> > > >
> > > >
> > > > Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> > > > copy the new value down and do a bit more work, but I don't think
> > > > resetting the bitmap is going to be the bottleneck on reboot. Seems
> > > > excessive to add another ioctl for this.
> > >
> > > The set ioctl is generally available/provided for the incoming VM to setup
> > > the page encryption bitmap, this reset ioctl is meant for the source VM
> > > as a simple interface to reset the whole page encryption bitmap.
> > >
> > > Thanks,
> > > Ashish
> >
> >
> > Hey Ashish,
> >
> > These seem very overlapping. I think this API should be refactored a bit.
> >
> > 1) Use kvm_vm_ioctl_enable_cap to control whether or not this
> > hypercall (and related feature bit) is offered to the VM, and also the
> > size of the buffer.
>
> If you look at patch 13/14, I have added a new KVM para feature called
> "KVM_FEATURE_SEV_LIVE_MIGRATION", which indicates host-side support for
> SEV live migration, and a new custom MSR which the guest writes via
> wrmsr to enable the live migration feature, so this is like the enable
> cap support.
>
> There are further extensions to this support I am adding, so patch 13/14
> of this patch-set is still being enhanced and will have full support
> when I repost next.
>
> > 2) Use set for manipulating values in the bitmap, including resetting
> > the bitmap. Set the bitmap pointer to null if you want to reset to all
> > 0xFFs. When the bitmap pointer is set, it should set the values to
> > exactly what is pointed at, instead of only clearing bits, as is done
> > currently.
>
> As I mentioned in my earlier email, the set API is supposed to be for
> the incoming VM, but if you really need to use it for the outgoing VM
> then it can be modified.
>
> > 3) Use get for fetching values from the kernel. Personally, I'd
> > require alignment of the base GFN to a multiple of 8 (but the number
> > of pages could be whatever), so you can just use a memcpy. Optionally,
> > you may want some way to tell userspace the size of the existing
> > buffer, so it can ensure that it can ask for the entire buffer without
> > having to track the size in usermode (not strictly necessary, but nice
> > to have since it ensures that there is only one place that has to
> > manage this value).
> >
> > If you want to expand or contract the bitmap, you can use enable cap
> > to adjust the size.
>
> As discussed on the earlier mail thread, we are now doing this
> dynamically by computing the guest RAM size when the
> set_user_memory_region ioctl is invoked. I believe that should handle
> the hot-plug and hot-unplug events too, as any hot memory updates will
> need the KVM memslots to be updated.
Ahh, sorry, I forgot you mentioned this: yes, this can work. The host
needs to be able to decide not to allocate, but this should be workable.
>
> > If you don't want to offer the hypercall to the guest, don't call the
> > enable cap.
> > This API avoids using up another ioctl. Ioctl space is somewhat
> > scarce. It also gives userspace fine grained control over the buffer,
> > so it can support both hot-plug and hot-unplug (or at the very least
> > it is not obviously incompatible with those). It also gives userspace
> > control over whether or not the feature is offered. The hypercall
> > isn't free, and being able to tell guests to not call when the host
> > wasn't going to migrate it anyway will be useful.
> >
>
> As I mentioned above, the host now indicates whether it supports the
> live migration feature, and the feature and the hypercall are only
> enabled on the host when the guest checks for this support and does a
> wrmsr() to enable the feature. Also, the guest will not make the
> hypercall if the host does not indicate support for it.
If my read of those patches is correct, the host will always advertise
support for the hypercall, and the only bit controlling whether or not
the hypercall is advertised is essentially the kernel version; you need
to roll out a new kernel to disable the hypercall.
An enable cap could give the host control of this feature bit. It
could also give the host control of whether or not the bitmaps are
allocated. This is important since I assume not every SEV VM is
planned to be migrated. And if the host does not plan to migrate it
probably doesn't want to waste memory or cycles managing this bitmap.
Thanks,
Steve
On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford
<[email protected]> wrote:
>
> On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <[email protected]> wrote:
> >
> > Hello Steve,
> >
> > On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
> > > On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <[email protected]> wrote:
> > > >
> > > > Hello Steve,
> > > >
> > > > On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > > > > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > >
> > > > > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > > > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > > > > >>> From: Ashish Kalra <[email protected]>
> > > > > > >>>
> > > > > > >>> This ioctl can be used by the application to reset the page
> > > > > > >>> encryption bitmap managed by the KVM driver. A typical usage
> > > > > > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > > > > > >>> the bitmap.
> > > > > > >>>
> > > > > > >>> Signed-off-by: Ashish Kalra <[email protected]>
> > > > > > >>> ---
> > > > > > >>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> > > > > > >>> arch/x86/include/asm/kvm_host.h | 1 +
> > > > > > >>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> > > > > > >>> arch/x86/kvm/x86.c | 6 ++++++
> > > > > > >>> include/uapi/linux/kvm.h | 1 +
> > > > > > >>> 5 files changed, 37 insertions(+)
> > > > > > >>>
> > > > > > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > > >>> index 4d1004a154f6..a11326ccc51d 100644
> > > > > > >>> --- a/Documentation/virt/kvm/api.rst
> > > > > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > > > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > > > > >>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > > > >>> bitmap for an incoming guest.
> > > > > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > > > > >>> +-----------------------------------------
> > > > > > >>> +
> > > > > > >>> +:Capability: basic
> > > > > > >>> +:Architectures: x86
> > > > > > >>> +:Type: vm ioctl
> > > > > > >>> +:Parameters: none
> > > > > > >>> +:Returns: 0 on success, -1 on error
> > > > > > >>> +
> > > > > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > > > > > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > > > > >>> +
> > > > > > >>> +
> > > > > > >>> 5. The kvm_run structure
> > > > > > >>> ========================
> > > > > > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > > >>> index d30f770aaaea..a96ef6338cd2 100644
> > > > > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > > > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > > > > >>> struct kvm_page_enc_bitmap *bmap);
> > > > > > >>> int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > > > >>> struct kvm_page_enc_bitmap *bmap);
> > > > > > >>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > > > > >>> };
> > > > > > >>> struct kvm_arch_async_pf {
> > > > > > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > > > >>> index 313343a43045..c99b0207a443 100644
> > > > > > >>> --- a/arch/x86/kvm/svm.c
> > > > > > >>> +++ b/arch/x86/kvm/svm.c
> > > > > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > > > >>> return ret;
> > > > > > >>> }
> > > > > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > > > > > >>> +{
> > > > > > >>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > > >>> +
> > > > > > >>> + if (!sev_guest(kvm))
> > > > > > >>> + return -ENOTTY;
> > > > > > >>> +
> > > > > > >>> + mutex_lock(&kvm->lock);
> > > > > > >>> + /* by default all pages should be marked encrypted */
> > > > > > >>> + if (sev->page_enc_bmap_size)
> > > > > > >>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > > > > >>> + mutex_unlock(&kvm->lock);
> > > > > > >>> + return 0;
> > > > > > >>> +}
> > > > > > >>> +
> > > > > > >>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > > > > >>> {
> > > > > > >>> struct kvm_sev_cmd sev_cmd;
> > > > > > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > > > > >>> .page_enc_status_hc = svm_page_enc_status_hc,
> > > > > > >>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > > > > >>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > > > > >>> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > > > > > >>
> > > > > > >> We don't need to initialize the intel ops to NULL? It's not initialized in
> > > > > > >> the previous patch either.
> > > > > > >>
> > > > > > >>> };
> > > > > > > This struct is declared as "static storage", so won't the non-initialized
> > > > > > > members be 0 ?
> > > > > >
> > > > > >
> > > > > > Correct. Although I see that 'nested_enable_evmcs' is explicitly
> > > > > > initialized. We should maintain the convention, perhaps.
> > > > > >
> > > > > > >
> > > > > > >>> static int __init svm_init(void)
> > > > > > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > >>> index 05e953b2ec61..2127ed937f53 100644
> > > > > > >>> --- a/arch/x86/kvm/x86.c
> > > > > > >>> +++ b/arch/x86/kvm/x86.c
> > > > > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > > > > >>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > > > > >>> break;
> > > > > > >>> }
> > > > > > >>> + case KVM_PAGE_ENC_BITMAP_RESET: {
> > > > > > >>> + r = -ENOTTY;
> > > > > > >>> + if (kvm_x86_ops->reset_page_enc_bitmap)
> > > > > > >>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > > > > >>> + break;
> > > > > > >>> + }
> > > > > > >>> default:
> > > > > > >>> r = -ENOTTY;
> > > > > > >>> }
> > > > > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > > > >>> index b4b01d47e568..0884a581fc37 100644
> > > > > > >>> --- a/include/uapi/linux/kvm.h
> > > > > > >>> +++ b/include/uapi/linux/kvm.h
> > > > > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > > > > >>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > > > > >>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > > > > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> > > > > > >>> /* Secure Encrypted Virtualization command */
> > > > > > >>> enum sev_cmd_id {
> > > > > > >> Reviewed-by: Krish Sadhukhan <[email protected]>
> > > > >
> > > > >
> > > > > Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> > > > > copy the new value down and do a bit more work, but I don't think
> > > > > resetting the bitmap is going to be the bottleneck on reboot. Seems
> > > > > excessive to add another ioctl for this.
> > > >
> > > > The set ioctl is generally available/provided for the incoming VM to set up
> > > > the page encryption bitmap; this reset ioctl is meant for the source VM
> > > > as a simple interface to reset the whole page encryption bitmap.
> > > >
> > > > Thanks,
> > > > Ashish
> > >
> > >
> > > Hey Ashish,
> > >
> > > These seem very overlapping. I think this API should be refactored a bit.
> > >
> > > 1) Use kvm_vm_ioctl_enable_cap to control whether or not this
> > > hypercall (and related feature bit) is offered to the VM, and also the
> > > size of the buffer.
> >
> > If you look at patch 13/14, I have added a new kvm para feature called
> > "KVM_FEATURE_SEV_LIVE_MIGRATION" which indicates host support for SEV
> > Live Migration, and a new custom MSR to which the guest does a wrmsr to
> > enable the Live Migration feature, so this is like the enable cap
> > support.
> >
> > There are further extensions to this support I am adding, so patch 13/14
> > of this patch-set is still being enhanced and will have full support
> > when I repost next.
> >
> > > 2) Use set for manipulating values in the bitmap, including resetting
> > > the bitmap. Set the bitmap pointer to null if you want to reset to all
> > > 0xFFs. When the bitmap pointer is set, it should set the values to
> > > exactly what is pointed at, instead of only clearing bits, as is done
> > > currently.
> >
> > As I mentioned in my earlier email, the set API is supposed to be for
> > the incoming VM, but if you really need to use it for the outgoing VM
> > then it can be modified.
> >
> > > 3) Use get for fetching values from the kernel. Personally, I'd
> > > require alignment of the base GFN to a multiple of 8 (but the number
> > > of pages could be whatever), so you can just use a memcpy. Optionally,
> > > you may want some way to tell userspace the size of the existing
> > > buffer, so it can ensure that it can ask for the entire buffer without
> > > having to track the size in usermode (not strictly necessary, but nice
> > > to have since it ensures that there is only one place that has to
> > > manage this value).
> > >
> > > If you want to expand or contract the bitmap, you can use enable cap
> > > to adjust the size.
> >
> > As discussed on the earlier mail thread, we are doing this
> > dynamically now by computing the guest RAM size when the
> > set_user_memory_region ioctl is invoked. I believe that should handle
> > the hot-plug and hot-unplug events too, as any hot memory updates will
> > need KVM memslots to be updated.
> Ahh, sorry, forgot you mentioned this: yes this can work. Host needs
> to be able to decide not to allocate, but this should be workable.
> >
> > > If you don't want to offer the hypercall to the guest, don't call the
> > > enable cap.
> > > This API avoids using up another ioctl. Ioctl space is somewhat
> > > scarce. It also gives userspace fine grained control over the buffer,
> > > so it can support both hot-plug and hot-unplug (or at the very least
> > > it is not obviously incompatible with those). It also gives userspace
> > > control over whether or not the feature is offered. The hypercall
> > > isn't free, and being able to tell guests to not call when the host
> > > wasn't going to migrate it anyway will be useful.
> > >
> >
> > As I mentioned above, the host now indicates whether it supports the Live
> > Migration feature, and the feature and the hypercall are only enabled on
> > the host when the guest checks for this support and does a wrmsr() to
> > enable the feature. Also, the guest will not make the hypercall if the
> > host does not indicate support for it.
> If my read of those patches was correct, the host will always
> advertise support for the hypercall. And the only bit controlling
> whether or not the hypercall is advertised is essentially the kernel
> version. You need to rollout a new kernel to disable the hypercall.
Ahh, awesome, I see I misunderstood how the CPUID bits get passed
through: usermode can still override them. Forgot about the back and
forth for CPUID with usermode. My point about informing the guest
kernel is clearly moot. The host still needs the ability to prevent
allocations, but that is more minor. Maybe use a flag on the memslots
directly?
On Fri, Apr 10, 2020 at 1:16 PM Steve Rutherford <[email protected]> wrote:
> [...]
>
> Ahh, awesome, I see I misunderstood how the CPUID bits get passed
> through: usermode can still override them. Forgot about the back and
> forth for CPUID with usermode. My point about informing the guest
> kernel is clearly moot. The host still needs the ability to prevent
> allocations, but that is more minor. Maybe use a flag on the memslots
> directly?
On second thought: burning the memslot flag for 30 MB per TB of VM
seems like a waste.
Hello Steve,
-----Original Message-----
From: Steve Rutherford <[email protected]>
Sent: Friday, April 10, 2020 3:19 PM
To: Kalra, Ashish <[email protected]>
Cc: Krish Sadhukhan <[email protected]>; Paolo Bonzini <[email protected]>; Thomas Gleixner <[email protected]>; Ingo Molnar <[email protected]>; H. Peter Anvin <[email protected]>; Joerg Roedel <[email protected]>; Borislav Petkov <[email protected]>; Lendacky, Thomas <[email protected]>; X86 ML <[email protected]>; KVM list <[email protected]>; LKML <[email protected]>; David Rientjes <[email protected]>; Andy Lutomirski <[email protected]>; Singh, Brijesh <[email protected]>
Subject: Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
> [...]
> On second thought: burning the memslot flag for 30mb per tb of VM seems like a waste.
Currently, I am still using the approach of a "unified" page encryption bitmap instead of a
bitmap per memslot, with the main change being that the bitmap is now resized only when
the memslots are updated, via the kvm_arch_commit_memory_region() interface.
Thanks,
Ashish
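Steve's suggestion above — requiring the base GFN of a KVM_GET_PAGE_ENC_BITMAP request to be a multiple of 8 — can be illustrated with a small userspace sketch (hypothetical helper and parameter names, not the actual KVM code): when the base is byte-aligned in the bitmap, the requested range can be extracted with a plain memcpy instead of per-bit shifting, and the caller simply ignores any trailing bits in the last byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy the encryption status of num_pages pages,
 * starting at base_gfn, out of a unified page-encryption bitmap.
 * Requiring base_gfn % 8 == 0 means the range starts on a byte
 * boundary, so a plain memcpy suffices; num_pages itself may be any
 * value, and the caller ignores trailing bits in the final byte. */
static int get_enc_bitmap_range(const uint8_t *bmap, uint64_t bmap_pages,
                                uint64_t base_gfn, uint64_t num_pages,
                                uint8_t *out)
{
	if (base_gfn % 8 != 0)			/* enforce byte alignment */
		return -1;
	if (base_gfn + num_pages > bmap_pages)	/* range must fit */
		return -1;
	memcpy(out, bmap + base_gfn / 8, (num_pages + 7) / 8);
	return 0;
}
```

The in-kernel version would additionally need copy_to_user() and locking, but the alignment requirement is what removes the bit-shifting entirely.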
On 4/10/20 3:55 PM, Kalra, Ashish wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hello Steve,
>
> -----Original Message-----
> From: Steve Rutherford <[email protected]>
> Sent: Friday, April 10, 2020 3:19 PM
> To: Kalra, Ashish <[email protected]>
> Cc: Krish Sadhukhan <[email protected]>; Paolo Bonzini <[email protected]>; Thomas Gleixner <[email protected]>; Ingo Molnar <[email protected]>; H. Peter Anvin <[email protected]>; Joerg Roedel <[email protected]>; Borislav Petkov <[email protected]>; Lendacky, Thomas <[email protected]>; X86 ML <[email protected]>; KVM list <[email protected]>; LKML <[email protected]>; David Rientjes <[email protected]>; Andy Lutomirski <[email protected]>; Singh, Brijesh <[email protected]>
> Subject: Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
>
> On Fri, Apr 10, 2020 at 1:16 PM Steve Rutherford <[email protected]> wrote:
>> On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford
>> <[email protected]> wrote:
>>> On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <[email protected]> wrote:
>>>> Hello Steve,
>>>>
>>>> On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
>>>>> On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <[email protected]> wrote:
>>>>>> Hello Steve,
>>>>>>
>>>>>> On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
>>>>>>> On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On 4/3/20 2:45 PM, Ashish Kalra wrote:
>>>>>>>>> On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
>>>>>>>>>> On 3/29/20 11:23 PM, Ashish Kalra wrote:
>>>>>>>>>>> From: Ashish Kalra <[email protected]>
>>>>>>>>>>>
>>>>>>>>>>> This ioctl can be used by the application to reset the
>>>>>>>>>>> page encryption bitmap managed by the KVM driver. A
>>>>>>>>>>> typical usage for this ioctl is on VM reboot, on
>>>>>>>>>>> reboot, we must reinitialize the bitmap.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>>>>>>>> ---
>>>>>>>>>>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
>>>>>>>>>>> arch/x86/include/asm/kvm_host.h | 1 +
>>>>>>>>>>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
>>>>>>>>>>> arch/x86/kvm/x86.c | 6 ++++++
>>>>>>>>>>> include/uapi/linux/kvm.h | 1 +
>>>>>>>>>>> 5 files changed, 37 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/Documentation/virt/kvm/api.rst
>>>>>>>>>>> b/Documentation/virt/kvm/api.rst index
>>>>>>>>>>> 4d1004a154f6..a11326ccc51d 100644
>>>>>>>>>>> --- a/Documentation/virt/kvm/api.rst
>>>>>>>>>>> +++ b/Documentation/virt/kvm/api.rst
>>>>>>>>>>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
>>>>>>>>>>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
>>>>>>>>>>> bitmap for an incoming guest.
>>>>>>>>>>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
>>>>>>>>>>> +-----------------------------------------
>>>>>>>>>>> +
>>>>>>>>>>> +:Capability: basic
>>>>>>>>>>> +:Architectures: x86
>>>>>>>>>>> +:Type: vm ioctl
>>>>>>>>>>> +:Parameters: none
>>>>>>>>>>> +:Returns: 0 on success, -1 on error
>>>>>>>>>>> +
>>>>>>>>>>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the
>>>>>>>>>>> +guest's page encryption bitmap during guest reboot and this is only done on the guest's boot vCPU.
>>>>>>>>>>> +
>>>>>>>>>>> +
>>>>>>>>>>> 5. The kvm_run structure
>>>>>>>>>>> ======================== diff --git
>>>>>>>>>>> a/arch/x86/include/asm/kvm_host.h
>>>>>>>>>>> b/arch/x86/include/asm/kvm_host.h index
>>>>>>>>>>> d30f770aaaea..a96ef6338cd2 100644
>>>>>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>>>>>>>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
>>>>>>>>>>> struct kvm_page_enc_bitmap *bmap);
>>>>>>>>>>> int (*set_page_enc_bitmap)(struct kvm *kvm,
>>>>>>>>>>> struct kvm_page_enc_bitmap
>>>>>>>>>>> *bmap);
>>>>>>>>>>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
>>>>>>>>>>> };
>>>>>>>>>>> struct kvm_arch_async_pf { diff --git
>>>>>>>>>>> a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index
>>>>>>>>>>> 313343a43045..c99b0207a443 100644
>>>>>>>>>>> --- a/arch/x86/kvm/svm.c
>>>>>>>>>>> +++ b/arch/x86/kvm/svm.c
>>>>>>>>>>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
>>>>>>>>>>> return ret;
>>>>>>>>>>> }
>>>>>>>>>>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
>>>>>>>>>>> +{
>>>>>>>>>>> + struct kvm_sev_info *sev =
>>>>>>>>>>> +&to_kvm_svm(kvm)->sev_info;
>>>>>>>>>>> +
>>>>>>>>>>> + if (!sev_guest(kvm))
>>>>>>>>>>> + return -ENOTTY;
>>>>>>>>>>> +
>>>>>>>>>>> + mutex_lock(&kvm->lock);
>>>>>>>>>>> + /* by default all pages should be marked encrypted */
>>>>>>>>>>> + if (sev->page_enc_bmap_size)
>>>>>>>>>>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
>>>>>>>>>>> + mutex_unlock(&kvm->lock);
>>>>>>>>>>> + return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>>>>>>>>> {
>>>>>>>>>>> struct kvm_sev_cmd sev_cmd; @@ -8203,6 +8218,7 @@
>>>>>>>>>>> static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>>>>>>>>> .page_enc_status_hc = svm_page_enc_status_hc,
>>>>>>>>>>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
>>>>>>>>>>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
>>>>>>>>>>> + .reset_page_enc_bitmap =
>>>>>>>>>>> + svm_reset_page_enc_bitmap,
>>>>>>>>>> We don't need to initialize the intel ops to NULL ?
>>>>>>>>>> It's not initialized in the previous patch either.
>>>>>>>>>>
>>>>>>>>>>> };
>>>>>>>>> This struct is declared as "static storage", so won't
>>>>>>>>> the non-initialized members be 0 ?
>>>>>>>>
>>>>>>>> Correct. Although, I see that 'nested_enable_evmcs' is
>>>>>>>> explicitly initialized. We should maintain the convention, perhaps.
>>>>>>>>
>>>>>>>>>>> static int __init svm_init(void) diff --git
>>>>>>>>>>> a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index
>>>>>>>>>>> 05e953b2ec61..2127ed937f53 100644
>>>>>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>>>>>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>>>>>>>>>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
>>>>>>>>>>> break;
>>>>>>>>>>> }
>>>>>>>>>>> + case KVM_PAGE_ENC_BITMAP_RESET: {
>>>>>>>>>>> + r = -ENOTTY;
>>>>>>>>>>> + if (kvm_x86_ops->reset_page_enc_bitmap)
>>>>>>>>>>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
>>>>>>>>>>> + break;
>>>>>>>>>>> + }
>>>>>>>>>>> default:
>>>>>>>>>>> r = -ENOTTY;
>>>>>>>>>>> }
>>>>>>>>>>> diff --git a/include/uapi/linux/kvm.h
>>>>>>>>>>> b/include/uapi/linux/kvm.h index
>>>>>>>>>>> b4b01d47e568..0884a581fc37 100644
>>>>>>>>>>> --- a/include/uapi/linux/kvm.h
>>>>>>>>>>> +++ b/include/uapi/linux/kvm.h
>>>>>>>>>>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
>>>>>>>>>>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
>>>>>>>>>>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6,
>>>>>>>>>>> struct kvm_page_enc_bitmap)
>>>>>>>>>>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
>>>>>>>>>>> /* Secure Encrypted Virtualization command */
>>>>>>>>>>> enum sev_cmd_id {
>>>>>>>>>> Reviewed-by: Krish Sadhukhan
>>>>>>>>>> <[email protected]>
>>>>>>>
>>>>>>> Doesn't this overlap with the set ioctl? Yes, obviously, you
>>>>>>> have to copy the new value down and do a bit more work, but
>>>>>>> I don't think resetting the bitmap is going to be the
>>>>>>> bottleneck on reboot. Seems excessive to add another ioctl for this.
>>>>>> The set ioctl is generally available/provided for the incoming
>>>>>> VM to setup the page encryption bitmap, this reset ioctl is
>>>>>> meant for the source VM as a simple interface to reset the whole page encryption bitmap.
>>>>>>
>>>>>> Thanks,
>>>>>> Ashish
>>>>>
>>>>> Hey Ashish,
>>>>>
>>>>> These seem very overlapping. I think this API should be refactored a bit.
>>>>>
>>>>> 1) Use kvm_vm_ioctl_enable_cap to control whether or not this
>>>>> hypercall (and related feature bit) is offered to the VM, and
>>>>> also the size of the buffer.
>>>> If you look at patch 13/14, i have added a new kvm para feature
>>>> called "KVM_FEATURE_SEV_LIVE_MIGRATION" which indicates host
>>>> support for SEV Live Migration and a new Custom MSR which the
>>>> guest does a wrmsr to enable the Live Migration feature, so this
>>>> is like the enable cap support.
>>>>
>>>> There are further extensions to this support i am adding, so patch
>>>> 13/14 of this patch-set is still being enhanced and will have full
>>>> support when i repost next.
>>>>
>>>>> 2) Use set for manipulating values in the bitmap, including
>>>>> resetting the bitmap. Set the bitmap pointer to null if you want
>>>>> to reset to all 0xFFs. When the bitmap pointer is set, it should
>>>>> set the values to exactly what is pointed at, instead of only
>>>>> clearing bits, as is done currently.
>>>> As i mentioned in my earlier email, the set api is supposed to be
>>>> for the incoming VM, but if you really need to use it for the
>>>> outgoing VM then it can be modified.
>>>>
>>>>> 3) Use get for fetching values from the kernel. Personally, I'd
>>>>> require alignment of the base GFN to a multiple of 8 (but the
>>>>> number of pages could be whatever), so you can just use a
>>>>> memcpy. Optionally, you may want some way to tell userspace the
>>>>> size of the existing buffer, so it can ensure that it can ask
>>>>> for the entire buffer without having to track the size in
>>>>> usermode (not strictly necessary, but nice to have since it
>>>>> ensures that there is only one place that has to manage this value).
>>>>>
>>>>> If you want to expand or contract the bitmap, you can use enable
>>>>> cap to adjust the size.
>>>> As being discussed on the earlier mail thread, we are doing this
>>>> dynamically now by computing the guest RAM size when the
>>>> set_user_memory_region ioctl is invoked. I believe that should
>>>> handle the hot-plug and hot-unplug events too, as any hot memory
>>>> updates will need KVM memslots to be updated.
>>> Ahh, sorry, forgot you mentioned this: yes this can work. Host needs
>>> to be able to decide not to allocate, but this should be workable.
>>>>> If you don't want to offer the hypercall to the guest, don't
>>>>> call the enable cap.
>>>>> This API avoids using up another ioctl. Ioctl space is somewhat
>>>>> scarce. It also gives userspace fine grained control over the
>>>>> buffer, so it can support both hot-plug and hot-unplug (or at
>>>>> the very least it is not obviously incompatible with those). It
>>>>> also gives userspace control over whether or not the feature is
>>>>> offered. The hypercall isn't free, and being able to tell guests
>>>>> to not call when the host wasn't going to migrate it anyway will be useful.
>>>>>
>>>> As i mentioned above, now the host indicates if it supports the
>>>> Live Migration feature and the feature and the hypercall are only
>>>> enabled on the host when the guest checks for this support and
>>>> does a wrmsr() to enable the feature. Also the guest will not make
>>>> the hypercall if the host does not indicate support for it.
>>> If my read of those patches was correct, the host will always
>>> advertise support for the hypercall. And the only bit controlling
>>> whether or not the hypercall is advertised is essentially the kernel
>>> version. You need to rollout a new kernel to disable the hypercall.
>> Ahh, awesome, I see I misunderstood how the CPUID bits get passed
>> through: usermode can still override them. Forgot about the back and
>> forth for CPUID with usermode. My point about informing the guest
>> kernel is clearly moot. The host still needs the ability to prevent
>> allocations, but that is more minor. Maybe use a flag on the memslots
>> directly?
>> On second thought: burning the memslot flag for 30mb per tb of VM seems like a waste.
> Currently, I am still using the approach of a "unified" page encryption bitmap instead of a
> bitmap per memslot, with the main change being that the resizing is only done whenever
> there are any updates in memslots, when memslots are updated using the
> kvm_arch_commit_memory_region() interface.
Just a note: I believe kvm_arch_commit_memory_region() may be getting
called every time there is a change in a memory region (e.g. add,
update, delete, etc.), so your architecture-specific hook now needs to
be aware of all those changes and act accordingly. This basically
means that svm.c will probably need to understand all those memory
slot flags, etc. IMO, having a separate ioctl to hint the size makes more
sense if you are using a unified bitmap; if the bitmap is per memslot,
then calculating the size from the memslot information makes more sense.
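The resize-on-commit approach being discussed can be sketched in plain C (hypothetical names; the in-kernel version would use kvzalloc()/bitmap helpers under kvm->lock): when a memslot update raises the highest guest page frame number, the bitmap grows, existing status bits are preserved, and the newly covered pages default to encrypted, matching the "by default all pages should be marked encrypted" convention from the reset patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of growing a unified page-encryption bitmap when
 * a memslot update raises the highest guest page frame number. Old
 * bits are preserved; pages beyond the old size default to encrypted
 * (1), since SEV guest memory is private unless marked shared. Any
 * trailing bits in the old bitmap's last byte are assumed to already
 * be 1 by the same convention. */
static uint8_t *resize_enc_bitmap(uint8_t *old, uint64_t old_pages,
				  uint64_t new_pages)
{
	size_t old_bytes = (old_pages + 7) / 8;
	size_t new_bytes = (new_pages + 7) / 8;
	uint8_t *bmap;

	if (new_pages <= old_pages)
		return old;		/* never shrink on slot updates */

	bmap = malloc(new_bytes);
	if (!bmap)
		return NULL;
	memset(bmap, 0xFF, new_bytes);	/* default: all pages encrypted */
	if (old) {
		memcpy(bmap, old, old_bytes);
		free(old);
	}
	return bmap;
}
```

Note this sketch only grows the bitmap; handling slot deletion (hot-unplug) is one of the cases the commit-region hook would have to decide on explicitly, per the note above.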
On Fri, Apr 10, 2020 at 04:42:29PM -0500, Brijesh Singh wrote:
>
> On 4/10/20 3:55 PM, Kalra, Ashish wrote:
> > [AMD Official Use Only - Internal Distribution Only]
Can you please resend the original mail without the above header, so us
non-AMD folks can follow along? :-)
On 4/10/20 4:46 PM, Sean Christopherson wrote:
> On Fri, Apr 10, 2020 at 04:42:29PM -0500, Brijesh Singh wrote:
>> On 4/10/20 3:55 PM, Kalra, Ashish wrote:
>>> [AMD Official Use Only - Internal Distribution Only]
> Can you please resend the original mail without the above header, so us
> non-AMD folks can follow along? :-)
Haha :) Looks like one of us probably used Outlook for responding. These
days IT has been auto-adding some of those tags. Let me resend with it
removed.
resend with internal distribution tag removed.
On 4/10/20 3:55 PM, Kalra, Ashish wrote:
[snip]
..
Hello Brijesh,
On Fri, Apr 10, 2020 at 05:02:46PM -0500, Brijesh Singh wrote:
> resend with internal distribution tag removed.
>
>
> On 4/10/20 3:55 PM, Kalra, Ashish wrote:
> [snip]
> ..
> >
> > Hello Steve,
> >
> > -----Original Message-----
> > From: Steve Rutherford <[email protected]>
> > Sent: Friday, April 10, 2020 3:19 PM
> > To: Kalra, Ashish <[email protected]>
> > Cc: Krish Sadhukhan <[email protected]>; Paolo Bonzini <[email protected]>; Thomas Gleixner <[email protected]>; Ingo Molnar <[email protected]>; H. Peter Anvin <[email protected]>; Joerg Roedel <[email protected]>; Borislav Petkov <[email protected]>; Lendacky, Thomas <[email protected]>; X86 ML <[email protected]>; KVM list <[email protected]>; LKML <[email protected]>; David Rientjes <[email protected]>; Andy Lutomirski <[email protected]>; Singh, Brijesh <[email protected]>
> > Subject: Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
> >
> > On Fri, Apr 10, 2020 at 1:16 PM Steve Rutherford <[email protected]> wrote:
> >> On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford
> >> <[email protected]> wrote:
> >>> On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <[email protected]> wrote:
> >>>> Hello Steve,
> >>>>
> >>>> On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
> >>>>> On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <[email protected]> wrote:
> >>>>>> Hello Steve,
> >>>>>>
> >>>>>> On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> >>>>>>> On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> >>>>>>> <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> On 4/3/20 2:45 PM, Ashish Kalra wrote:
> >>>>>>>>> On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> >>>>>>>>>> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> >>>>>>>>>>> From: Ashish Kalra <[email protected]>
> >>>>>>>>>>>
> >>>>>>>>>>> This ioctl can be used by the application to reset the
> >>>>>>>>>>> page encryption bitmap managed by the KVM driver. A
> >>>>>>>>>>> typical usage for this ioctl is on VM reboot, on
> >>>>>>>>>>> reboot, we must reinitialize the bitmap.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Ashish Kalra <[email protected]>
> >>>>>>>>>>> ---
> >>>>>>>>>>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> >>>>>>>>>>> arch/x86/include/asm/kvm_host.h | 1 +
> >>>>>>>>>>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> >>>>>>>>>>> arch/x86/kvm/x86.c | 6 ++++++
> >>>>>>>>>>> include/uapi/linux/kvm.h | 1 +
> >>>>>>>>>>> 5 files changed, 37 insertions(+)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/Documentation/virt/kvm/api.rst
> >>>>>>>>>>> b/Documentation/virt/kvm/api.rst index
> >>>>>>>>>>> 4d1004a154f6..a11326ccc51d 100644
> >>>>>>>>>>> --- a/Documentation/virt/kvm/api.rst
> >>>>>>>>>>> +++ b/Documentation/virt/kvm/api.rst
> >>>>>>>>>>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> >>>>>>>>>>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> >>>>>>>>>>> bitmap for an incoming guest.
> >>>>>>>>>>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> >>>>>>>>>>> +-----------------------------------------
> >>>>>>>>>>> +
> >>>>>>>>>>> +:Capability: basic
> >>>>>>>>>>> +:Architectures: x86
> >>>>>>>>>>> +:Type: vm ioctl
> >>>>>>>>>>> +:Parameters: none
> >>>>>>>>>>> +:Returns: 0 on success, -1 on error
> >>>>>>>>>>> +
> >>>>>>>>>>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the
> >>>>>>>>>>> +guest's page encryption bitmap during guest reboot and this is only done on the guest's boot vCPU.
> >>>>>>>>>>> +
> >>>>>>>>>>> +
> >>>>>>>>>>> 5. The kvm_run structure
> >>>>>>>>>>> ======================== diff --git
> >>>>>>>>>>> a/arch/x86/include/asm/kvm_host.h
> >>>>>>>>>>> b/arch/x86/include/asm/kvm_host.h index
> >>>>>>>>>>> d30f770aaaea..a96ef6338cd2 100644
> >>>>>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
> >>>>>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
> >>>>>>>>>>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> >>>>>>>>>>> struct kvm_page_enc_bitmap *bmap);
> >>>>>>>>>>> int (*set_page_enc_bitmap)(struct kvm *kvm,
> >>>>>>>>>>> struct kvm_page_enc_bitmap
> >>>>>>>>>>> *bmap);
> >>>>>>>>>>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> >>>>>>>>>>> };
> >>>>>>>>>>> struct kvm_arch_async_pf { diff --git
> >>>>>>>>>>> a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index
> >>>>>>>>>>> 313343a43045..c99b0207a443 100644
> >>>>>>>>>>> --- a/arch/x86/kvm/svm.c
> >>>>>>>>>>> +++ b/arch/x86/kvm/svm.c
> >>>>>>>>>>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> >>>>>>>>>>> return ret;
> >>>>>>>>>>> }
> >>>>>>>>>>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> >>>>>>>>>>> +{
> >>>>>>>>>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>>>>>>>>> +
> >>>>>>>>>>> + if (!sev_guest(kvm))
> >>>>>>>>>>> + return -ENOTTY;
> >>>>>>>>>>> +
> >>>>>>>>>>> + mutex_lock(&kvm->lock);
> >>>>>>>>>>> + /* by default all pages should be marked encrypted */
> >>>>>>>>>>> + if (sev->page_enc_bmap_size)
> >>>>>>>>>>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> >>>>>>>>>>> + mutex_unlock(&kvm->lock);
> >>>>>>>>>>> + return 0;
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>>>>>>>>>> {
> >>>>>>>>>>> struct kvm_sev_cmd sev_cmd;
> >>>>>>>>>>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >>>>>>>>>>> .page_enc_status_hc = svm_page_enc_status_hc,
> >>>>>>>>>>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> >>>>>>>>>>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> >>>>>>>>>>> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> >>>>>>>>>> Don't we need to initialize the Intel ops to NULL?
> >>>>>>>>>> It's not initialized in the previous patch either.
> >>>>>>>>>>
> >>>>>>>>>>> };
> >>>>>>>>> This struct is declared as "static storage", so won't
> >>>>>>>>> the non-initialized members be 0?
> >>>>>>>>
> >>>>>>>> Correct. Although I see that 'nested_enable_evmcs' is
> >>>>>>>> explicitly initialized. We should maintain the convention, perhaps.
> >>>>>>>>
> >>>>>>>>>>> static int __init svm_init(void)
> >>>>>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>>>>>>>>> index 05e953b2ec61..2127ed937f53 100644
> >>>>>>>>>>> --- a/arch/x86/kvm/x86.c
> >>>>>>>>>>> +++ b/arch/x86/kvm/x86.c
> >>>>>>>>>>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >>>>>>>>>>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> >>>>>>>>>>> break;
> >>>>>>>>>>> }
> >>>>>>>>>>> + case KVM_PAGE_ENC_BITMAP_RESET: {
> >>>>>>>>>>> + r = -ENOTTY;
> >>>>>>>>>>> + if (kvm_x86_ops->reset_page_enc_bitmap)
> >>>>>>>>>>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> >>>>>>>>>>> + break;
> >>>>>>>>>>> + }
> >>>>>>>>>>> default:
> >>>>>>>>>>> r = -ENOTTY;
> >>>>>>>>>>> }
> >>>>>>>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>>>>>>>>>> index b4b01d47e568..0884a581fc37 100644
> >>>>>>>>>>> --- a/include/uapi/linux/kvm.h
> >>>>>>>>>>> +++ b/include/uapi/linux/kvm.h
> >>>>>>>>>>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> >>>>>>>>>>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> >>>>>>>>>>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> >>>>>>>>>>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> >>>>>>>>>>> /* Secure Encrypted Virtualization command */
> >>>>>>>>>>> enum sev_cmd_id {
> >>>>>>>>>> Reviewed-by: Krish Sadhukhan <[email protected]>
> >>>>>>>
> >>>>>>> Doesn't this overlap with the set ioctl? Yes, obviously, you
> >>>>>>> have to copy the new value down and do a bit more work, but
> >>>>>>> I don't think resetting the bitmap is going to be the
> >>>>>>> bottleneck on reboot. Seems excessive to add another ioctl for this.
> >>>>>> The set ioctl is generally provided for the incoming
> >>>>>> VM to set up the page encryption bitmap; this reset ioctl is
> >>>>>> meant for the source VM as a simple interface to reset the whole page encryption bitmap.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Ashish
> >>>>>
> >>>>> Hey Ashish,
> >>>>>
> >>>>> These seem very overlapping. I think this API should be refactored a bit.
> >>>>>
> >>>>> 1) Use kvm_vm_ioctl_enable_cap to control whether or not this
> >>>>> hypercall (and related feature bit) is offered to the VM, and
> >>>>> also the size of the buffer.
> >>>> If you look at patch 13/14, I have added a new kvm para feature
> >>>> called "KVM_FEATURE_SEV_LIVE_MIGRATION" which indicates host
> >>>> support for SEV Live Migration, and a new custom MSR which the
> >>>> guest writes via wrmsr to enable the Live Migration feature, so
> >>>> this is like the enable cap support.
> >>>>
> >>>> There are further extensions to this support that I am adding,
> >>>> so patch 13/14 of this patch-set is still being enhanced and will
> >>>> have full support when I repost next.
> >>>>
> >>>>> 2) Use set for manipulating values in the bitmap, including
> >>>>> resetting the bitmap. Set the bitmap pointer to null if you want
> >>>>> to reset to all 0xFFs. When the bitmap pointer is set, it should
> >>>>> set the values to exactly what is pointed at, instead of only
> >>>>> clearing bits, as is done currently.
> >>>> As I mentioned in my earlier email, the set API is supposed to be
> >>>> for the incoming VM, but if you really need to use it for the
> >>>> outgoing VM then it can be modified.
> >>>>
> >>>>> 3) Use get for fetching values from the kernel. Personally, I'd
> >>>>> require alignment of the base GFN to a multiple of 8 (but the
> >>>>> number of pages could be whatever), so you can just use a
> >>>>> memcpy. Optionally, you may want some way to tell userspace the
> >>>>> size of the existing buffer, so it can ensure that it can ask
> >>>>> for the entire buffer without having to track the size in
> >>>>> usermode (not strictly necessary, but nice to have since it
> >>>>> ensures that there is only one place that has to manage this value).
> >>>>>
> >>>>> If you want to expand or contract the bitmap, you can use enable
> >>>>> cap to adjust the size.
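[An aside to make the byte-alignment suggestion above concrete. This is only an illustrative sketch; the helper name and checks below are mine, not code from the patch:]

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: if the base GFN is required to be a multiple of 8,
 * the requested slice of the page encryption bitmap starts on a byte
 * boundary, so a plain memcpy suffices and no cross-byte bit shifting
 * is needed. The trailing partial byte (when num_pages is not a
 * multiple of 8) is simply copied whole here.
 */
static int copy_enc_bitmap_slice(const uint8_t *bmap, uint64_t bmap_npages,
				 uint64_t base_gfn, uint64_t num_pages,
				 uint8_t *out)
{
	if (base_gfn % 8 != 0)			/* enforce byte alignment */
		return -1;
	if (base_gfn + num_pages > bmap_npages)	/* stay within the bitmap */
		return -1;
	/* round the bit count up to whole bytes for the copy */
	memcpy(out, bmap + base_gfn / 8, (num_pages + 7) / 8);
	return 0;
}
```

With base GFNs constrained this way, only the final partial byte of a request needs any special handling.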
> >>>> As discussed on the earlier mail thread, we are doing this
> >>>> dynamically now by computing the guest RAM size when the
> >>>> set_user_memory_region ioctl is invoked. I believe that should
> >>>> handle the hot-plug and hot-unplug events too, as any hot memory
> >>>> updates will need KVM memslots to be updated.
> >>> Ahh, sorry, forgot you mentioned this: yes this can work. Host needs
> >>> to be able to decide not to allocate, but this should be workable.
> >>>>> If you don't want to offer the hypercall to the guest, don't
> >>>>> call the enable cap.
> >>>>> This API avoids using up another ioctl. Ioctl space is somewhat
> >>>>> scarce. It also gives userspace fine grained control over the
> >>>>> buffer, so it can support both hot-plug and hot-unplug (or at
> >>>>> the very least it is not obviously incompatible with those). It
> >>>>> also gives userspace control over whether or not the feature is
> >>>>> offered. The hypercall isn't free, and being able to tell guests
> >>>>> to not call when the host wasn't going to migrate it anyway will be useful.
> >>>>>
> >>>> As I mentioned above, the host now indicates whether it supports
> >>>> the Live Migration feature, and the feature and the hypercall are
> >>>> only enabled on the host when the guest checks for this support
> >>>> and does a wrmsr() to enable the feature. Also, the guest will not
> >>>> make the hypercall if the host does not indicate support for it.
> >>> If my read of those patches was correct, the host will always
> >>> advertise support for the hypercall. And the only bit controlling
> >>> whether or not the hypercall is advertised is essentially the kernel
> >>> version. You need to roll out a new kernel to disable the hypercall.
> >> Ahh, awesome, I see I misunderstood how the CPUID bits get passed
> >> through: usermode can still override them. Forgot about the back and
> >> forth for CPUID with usermode. My point about informing the guest
> >> kernel is clearly moot. The host still needs the ability to prevent
> >> allocations, but that is more minor. Maybe use a flag on the memslots
> >> directly?
> >> On second thought: burning a memslot flag for 30 MB per TB of VM seems like a waste.
> > Currently, I am still using the approach of a "unified" page encryption bitmap instead of a
> > bitmap per memslot, with the main change being that the resizing is now only done when
> > memslots are updated, via the kvm_arch_commit_memory_region() interface.
>
>
> Just a note: I believe kvm_arch_commit_memory_region() may be getting
> called every time there is a change in a memory region (e.g. add,
> update, delete, etc.). So your architecture-specific hook now needs to
> be well aware of all those changes and act accordingly. This basically
> means that svm.c will probably need to understand all those memory
> slot flags, etc. IMO, having a separate ioctl to hint the size makes
> more sense if you are doing a unified bitmap, but if the bitmap is per
> memslot then calculating the size from the memslot information makes
> more sense.
>
Even if I use a bitmap-per-memslot approach instead of a unified bitmap,
the svm/sev code will still need to have knowledge of the memslot flags
etc., because the svm/sev code is responsible for syncing the page
encryption bitmap to userspace, not the generic KVM x86 code which gets
invoked for the dirty page bitmap sync.

Currently, the architecture hooks need to be aware of the
KVM_MR_CREATE/KVM_MR_DELETE changes, and the resize only happens
if the highest guest PA that is mapped by a memslot gets modified;
otherwise there will typically be just one resize, at the initial
guest launch.
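To sketch the sizing logic I have in mind (purely illustrative; the names below are made up and this is not the patch code): the unified bitmap is sized from the highest guest PFN covered by any memslot, so the commit hook only needs to trigger a reallocation when that maximum grows.

```c
#include <stdint.h>

/* Illustrative sketch only: one bit per guest page, sized from the
 * highest guest PFN covered by any memslot. */
struct slot_desc {
	uint64_t base_gfn;
	uint64_t npages;
};

static uint64_t required_bitmap_bytes(const struct slot_desc *slots, int n)
{
	uint64_t max_gfn = 0;

	for (int i = 0; i < n; i++) {
		uint64_t end = slots[i].base_gfn + slots[i].npages;

		if (end > max_gfn)
			max_gfn = end;
	}
	/* one bit per guest page, rounded up to whole bytes */
	return (max_gfn + 7) / 8;
}
```

With this, the commit hook only has to reallocate the unified bitmap when the returned size exceeds the currently allocated one.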
Thanks,
Ashish