Previously, when a protected VM was rebooted or when it was shut down,
its memory was made unprotected, and then the protected VM itself was
destroyed. Looping over the whole address space can take some time,
considering the overhead of the various Ultravisor Calls (UVCs). This
means that a reboot or a shutdown would take a potentially long amount
of time, depending on the amount of used memory.
This patchseries implements a deferred destroy mechanism for protected
guests. When a protected guest is destroyed, its memory can be cleared
in background, allowing the guest to restart or terminate significantly
faster than before.
There are 2 possibilities when a protected VM is torn down:
* it still has an address space associated (reboot case)
* it does not have an address space anymore (shutdown case)
For the reboot case, two new commands are available for the
KVM_S390_PV_COMMAND:
KVM_PV_ASYNC_CLEANUP_PREPARE: prepares the current protected VM for
asynchronous teardown. The current VM will then continue immediately
as non-protected. If a protected VM had already been set aside without
starting the teardown process, this call will fail. In this case the
userspace process should issue a normal KVM_PV_DISABLE
KVM_PV_ASYNC_CLEANUP_PERFORM: tears down the protected VM previously
set aside for asychronous teardown. This PV command should ideally be
issued by userspace from a separate thread. If a fatal signal is
received (or the process terminates naturally), the command will
terminate immediately without completing. The rest of the normal KVM
teardown process will take care of properly cleaning up all leftovers.
The idea is that userspace should first issue the
KVM_PV_ASYNC_CLEANUP_PREPARE command, and in case of success, create a
new thread and issue KVM_PV_ASYNC_CLEANUP_PERFORM from there. This also
allows for proper accounting of the CPU time needed for the
asynchronous teardown.
This means that the same address space can have memory belonging to
more than one protected guest, although only one will be running, the
others will in fact not even have any CPUs.
The shutdown case should be dealt with in userspace (e.g. using
clone(CLONE_VM)).
A module parameter is also provided to disable the new functionality,
which is otherwise enabled by default. This should not be an issue
since the new functionality is opt-in anyway. This is mainly thought to
aid debugging.
v12->v13
* drop the patches that have been already merged
* rebase
Claudio Imbrenda (6):
KVM: s390: pv: asynchronous destroy for reboot
KVM: s390: pv: api documentation for asynchronous destroy
KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE
KVM: s390: pv: avoid export before import if possible
KVM: s390: pv: support for Destroy fast UVC
KVM: s390: pv: module parameter to fence asynchronous destroy
Documentation/virt/kvm/api.rst | 30 ++-
arch/s390/include/asm/kvm_host.h | 2 +
arch/s390/include/asm/uv.h | 10 +
arch/s390/kernel/uv.c | 2 +
arch/s390/kvm/kvm-s390.c | 55 ++++-
arch/s390/kvm/kvm-s390.h | 3 +
arch/s390/kvm/pv.c | 331 ++++++++++++++++++++++++++++++-
include/uapi/linux/kvm.h | 3 +
8 files changed, 416 insertions(+), 20 deletions(-)
--
2.37.1
Add support for the Destroy Secure Configuration Fast Ultravisor call,
and take advantage of it for asynchronous destroy.
When supported, the protected guest is destroyed immediately using the
new UVC, leaving only the memory to be cleaned up asynchronously.
Signed-off-by: Claudio Imbrenda <[email protected]>
---
arch/s390/include/asm/uv.h | 10 +++++++
arch/s390/kvm/pv.c | 57 ++++++++++++++++++++++++++++++++------
2 files changed, 59 insertions(+), 8 deletions(-)
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index be3ef9dd6972..28a9ad57b6f1 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -34,6 +34,7 @@
#define UVC_CMD_INIT_UV 0x000f
#define UVC_CMD_CREATE_SEC_CONF 0x0100
#define UVC_CMD_DESTROY_SEC_CONF 0x0101
+#define UVC_CMD_DESTROY_SEC_CONF_FAST 0x0102
#define UVC_CMD_CREATE_SEC_CPU 0x0120
#define UVC_CMD_DESTROY_SEC_CPU 0x0121
#define UVC_CMD_CONV_TO_SEC_STOR 0x0200
@@ -81,6 +82,7 @@ enum uv_cmds_inst {
BIT_UVC_CMD_UNSHARE_ALL = 20,
BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
+ BIT_UVC_CMD_DESTROY_SEC_CONF_FAST = 23,
BIT_UVC_CMD_DUMP_INIT = 24,
BIT_UVC_CMD_DUMP_CONFIG_STOR_STATE = 25,
BIT_UVC_CMD_DUMP_CPU = 26,
@@ -230,6 +232,14 @@ struct uv_cb_nodata {
u64 reserved20[4];
} __packed __aligned(8);
+/* Destroy Configuration Fast */
+struct uv_cb_destroy_fast {
+ struct uv_cb_header header;
+ u64 reserved08[2];
+ u64 handle;
+ u64 reserved20[5];
+} __packed __aligned(8);
+
/* Set Shared Access */
struct uv_cb_share {
struct uv_cb_header header;
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index eba4f3fc2138..160ef95cea68 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -201,6 +201,9 @@ static int kvm_s390_pv_dispose_one_leftover(struct kvm *kvm,
{
int cc;
+ /* It used the destroy-fast UVC, nothing left to do here */
+ if (!leftover->handle)
+ goto done_fast;
cc = uv_cmd_nodata(leftover->handle, UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY LEFTOVER VM: rc %x rrc %x", *rc, *rrc);
WARN_ONCE(cc, "protvirt destroy leftover vm failed rc %x rrc %x", *rc, *rrc);
@@ -215,6 +218,7 @@ static int kvm_s390_pv_dispose_one_leftover(struct kvm *kvm,
free_pages(leftover->stor_base, get_order(uv_info.guest_base_stor_len));
free_pages(leftover->old_gmap_table, CRST_ALLOC_ORDER);
vfree(leftover->stor_var);
+done_fast:
atomic_dec(&kvm->mm->context.protected_count);
return 0;
}
@@ -248,6 +252,32 @@ static void kvm_s390_destroy_lower_2g(struct kvm *kvm)
srcu_read_unlock(&kvm->srcu, srcu_idx);
}
+static int kvm_s390_pv_deinit_vm_fast(struct kvm *kvm, u16 *rc, u16 *rrc)
+{
+ struct uv_cb_destroy_fast uvcb = {
+ .header.cmd = UVC_CMD_DESTROY_SEC_CONF_FAST,
+ .header.len = sizeof(uvcb),
+ .handle = kvm_s390_pv_get_handle(kvm),
+ };
+ int cc;
+
+ cc = uv_call_sched(0, (u64)&uvcb);
+ *rc = uvcb.header.rc;
+ *rrc = uvcb.header.rrc;
+ WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
+ KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM FAST: rc %x rrc %x", *rc, *rrc);
+ WARN_ONCE(cc, "protvirt destroy vm fast failed rc %x rrc %x", *rc, *rrc);
+ /* Inteded memory leak on "impossible" error */
+ if (!cc)
+ kvm_s390_pv_dealloc_vm(kvm);
+ return cc ? -EIO : 0;
+}
+
+static inline bool is_destroy_fast_available(void)
+{
+ return test_bit_inv(BIT_UVC_CMD_DESTROY_SEC_CONF_FAST, uv_info.inst_calls_list);
+}
+
/**
* kvm_s390_pv_set_aside - Set aside a protected VM for later teardown.
* @kvm: the VM
@@ -269,6 +299,7 @@ static void kvm_s390_destroy_lower_2g(struct kvm *kvm)
int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
{
struct pv_vm_to_be_destroyed *priv;
+ int res = 0;
/*
* If another protected VM was already prepared, refuse.
@@ -280,14 +311,21 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
if (!priv)
return -ENOMEM;
- priv->stor_var = kvm->arch.pv.stor_var;
- priv->stor_base = kvm->arch.pv.stor_base;
- priv->handle = kvm_s390_pv_get_handle(kvm);
- priv->old_gmap_table = (unsigned long)kvm->arch.gmap->table;
- WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
- if (s390_replace_asce(kvm->arch.gmap)) {
+ if (is_destroy_fast_available()) {
+ res = kvm_s390_pv_deinit_vm_fast(kvm, rc, rrc);
+ } else {
+ priv->stor_var = kvm->arch.pv.stor_var;
+ priv->stor_base = kvm->arch.pv.stor_base;
+ priv->handle = kvm_s390_pv_get_handle(kvm);
+ priv->old_gmap_table = (unsigned long)kvm->arch.gmap->table;
+ WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
+ if (s390_replace_asce(kvm->arch.gmap))
+ res = -ENOMEM;
+ }
+
+ if (res) {
kfree(priv);
- return -ENOMEM;
+ return res;
}
kvm_s390_destroy_lower_2g(kvm);
@@ -464,6 +502,7 @@ static void kvm_s390_pv_mmu_notifier_release(struct mmu_notifier *subscription,
{
struct kvm *kvm = container_of(subscription, struct kvm, arch.pv.mmu_notifier);
u16 dummy;
+ int r;
/*
* No locking is needed since this is the last thread of the last user of this
@@ -472,7 +511,9 @@ static void kvm_s390_pv_mmu_notifier_release(struct mmu_notifier *subscription,
* unregistered. This means that if this notifier runs, then the
* struct kvm is still valid.
*/
- kvm_s390_cpus_from_pv(kvm, &dummy, &dummy);
+ r = kvm_s390_cpus_from_pv(kvm, &dummy, &dummy);
+ if (!r && is_destroy_fast_available())
+ kvm_s390_pv_deinit_vm_fast(kvm, &dummy, &dummy);
}
static const struct mmu_notifier_ops kvm_s390_pv_mmu_notifier_ops = {
--
2.37.1
Add the module parameter "async_destroy", to allow the asynchronous
destroy mechanism to be switched off. This might be useful for
debugging purposes.
The parameter is enabled by default since the feature is opt-in anyway.
Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4a20a6be6601..8c7af96c4546 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -209,7 +209,13 @@ unsigned int diag9c_forwarding_hz;
module_param(diag9c_forwarding_hz, uint, 0644);
MODULE_PARM_DESC(diag9c_forwarding_hz, "Maximum diag9c forwarding per second, 0 to turn off");
-static int async_destroy;
+/*
+ * allow asynchronous deinit for protected guests; enable by default since
+ * the feature is opt-in anyway
+ */
+static int async_destroy = 1;
+module_param(async_destroy, int, 0444);
+MODULE_PARM_DESC(async_destroy, "Asynchronous destroy for protected guests");
/*
* For now we handle at most 16 double words as this is what the s390 base
--
2.37.1
Quoting Claudio Imbrenda (2022-08-10 14:56:24)
> Add support for the Destroy Secure Configuration Fast Ultravisor call,
> and take advantage of it for asynchronous destroy.
>
> When supported, the protected guest is destroyed immediately using the
> new UVC, leaving only the memory to be cleaned up asynchronously.
>
> Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Nico Boehr <[email protected]>
On 8/10/22 14:56, Claudio Imbrenda wrote:
> Add support for the Destroy Secure Configuration Fast Ultravisor call,
> and take advantage of it for asynchronous destroy.
>
> When supported, the protected guest is destroyed immediately using the
> new UVC, leaving only the memory to be cleaned up asynchronously.
>
> Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
> ---
> arch/s390/include/asm/uv.h | 10 +++++++
> arch/s390/kvm/pv.c | 57 ++++++++++++++++++++++++++++++++------
> 2 files changed, 59 insertions(+), 8 deletions(-)
>
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index be3ef9dd6972..28a9ad57b6f1 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -34,6 +34,7 @@
> #define UVC_CMD_INIT_UV 0x000f
> #define UVC_CMD_CREATE_SEC_CONF 0x0100
> #define UVC_CMD_DESTROY_SEC_CONF 0x0101
> +#define UVC_CMD_DESTROY_SEC_CONF_FAST 0x0102
> #define UVC_CMD_CREATE_SEC_CPU 0x0120
> #define UVC_CMD_DESTROY_SEC_CPU 0x0121
> #define UVC_CMD_CONV_TO_SEC_STOR 0x0200
> @@ -81,6 +82,7 @@ enum uv_cmds_inst {
> BIT_UVC_CMD_UNSHARE_ALL = 20,
> BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
> BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
> + BIT_UVC_CMD_DESTROY_SEC_CONF_FAST = 23,
> BIT_UVC_CMD_DUMP_INIT = 24,
> BIT_UVC_CMD_DUMP_CONFIG_STOR_STATE = 25,
> BIT_UVC_CMD_DUMP_CPU = 26,
> @@ -230,6 +232,14 @@ struct uv_cb_nodata {
> u64 reserved20[4];
> } __packed __aligned(8);
>
> +/* Destroy Configuration Fast */
> +struct uv_cb_destroy_fast {
> + struct uv_cb_header header;
> + u64 reserved08[2];
> + u64 handle;
> + u64 reserved20[5];
> +} __packed __aligned(8);