Received: by 2002:a05:6a10:6d10:0:0:0:0 with SMTP id gq16csp769525pxb; Fri, 15 Apr 2022 10:51:28 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzvSMSgKaeKj+Ekus252Si+656TsA1E/ZQP5mT8EpKY4essRyQ/PCmyqFaUea49hNUYEh/6 X-Received: by 2002:a17:90a:8c8e:b0:1c9:c81d:9e13 with SMTP id b14-20020a17090a8c8e00b001c9c81d9e13mr15553pjo.123.1650045088545; Fri, 15 Apr 2022 10:51:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1650045088; cv=none; d=google.com; s=arc-20160816; b=iA/+DIwZAWBF9HktI0cUAvfnhq1G2d+5jGfLHK6dhYbc3lopINLZXLzkEU4EFz3hNt 8hli614vANtXl6EGGfhDV/DE+JMP0lW7JqNUW6xK69RCAlrZnDizXdsQKkLls6TXvvVC 4DZzkGkstl4rDeWV3hW94yQSTVQoNxderh0mRE5gqeBuAxP6KJnQd60GMEMOnkj4ZtOY usvxFBpjRN+5renFEfl6QWa8gYsALGVB345YdDVQRoyJy+0SePwNCih30hEGgrlx3Ksj /qHDtl0QnDBE1ZUm9NVAc3beEJuCDa+17aWJqrEOd342XKG1kHAA5WtS3nZVpmqlKehO yMng== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6MJLcIGPz/ye8Ef9utDxaWLPKjIMbFIfQ1aVZE2v3u4=; b=i9VwLR5r0uOl6x2mF+5umDijKJ0nDrGTalxHCpfwtzoGJM4u2saGiacg0+i6rkwdx2 FrNQczEz3R4Gr00mZJukiN81qlmU0uaWc7j9Ck2PitQcP0JFTNgqbxJWkMt45jzDafj9 hGkQBxw0+bdRj5yqx9NcR1T2QPIg5Y7DRsrFPBIz8c5QZKNxrC8wKvZcSowltJlGG8dh zdRiYGHkwnn5XeaGbISEMjsuq2oJIMGuyW4en7xKb00fYmEdzn8O9jz+h7U9BtkU18Fk ZZzhI6ARO5kb6bi2EmRFkqF+tsHRK23QI5/VUuC9YFblNbb7KyL0Pa4uAGrlMqRlYPsS rVJA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=WnszT0xM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q15-20020a17090311cf00b00153b2d16571si2077973plh.377.2022.04.15.10.51.01; Fri, 15 Apr 2022 10:51:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=WnszT0xM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240879AbiDNIHJ (ORCPT + 99 others); Thu, 14 Apr 2022 04:07:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56056 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240396AbiDNIGF (ORCPT ); Thu, 14 Apr 2022 04:06:05 -0400 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 09A444DF6D; Thu, 14 Apr 2022 01:03:32 -0700 (PDT) Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 23E7EAHn022718; Thu, 14 Apr 2022 08:03:32 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=6MJLcIGPz/ye8Ef9utDxaWLPKjIMbFIfQ1aVZE2v3u4=; b=WnszT0xMedfK6aXHx0Dpn56oSvOPABY6yuvk0OHbKs/I2E5pbs3uTchquA6geiB7IaIX +yxVb/Jfc4/v3iqv70VET4cxySC0X7KVVttBEfGFT638i5CWB9h2nbTbB3CpgBf56rNz MJM9mIlnKdFnyT6R/7GVaN/NMIasChKOLaXuiojPYbsDujkAZW8lL+hj7AvM0LI/zg0E 9sP22AqrI6lvgxB/EZ9tpGRCY1rs5rIcDl40xGFsdY4lGfxOa69jqellI9AfzEKzdcwI 2coCAeo78NQ36bSd/HMitqLFR8uu5LbkcmM2wk98bR3oWDkLV0I/FatKqIVh/Us4YSXN NQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3fef1p0tcb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Apr 2022 08:03:31 +0000 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 23E7xDIf013209; Thu, 14 Apr 2022 08:03:31 GMT Received: from ppma06ams.nl.ibm.com (66.31.33a9.ip4.static.sl-reverse.com [169.51.49.102]) by mx0a-001b2d01.pphosted.com with ESMTP id 3fef1p0tc1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Apr 2022 08:03:31 +0000 Received: from pps.filterd (ppma06ams.nl.ibm.com [127.0.0.1]) by ppma06ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 23E7m6EZ029540; Thu, 14 Apr 2022 08:03:29 GMT Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by ppma06ams.nl.ibm.com with ESMTP id 3fb1dj7yhf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Apr 2022 08:03:29 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 23E83Q3L53346706 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Apr 2022 08:03:26 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7174EAE05D; Thu, 14 Apr 2022 08:03:26 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E26C5AE05A; Thu, 14 Apr 2022 08:03:25 +0000 (GMT) Received: from localhost.localdomain (unknown [9.145.1.140]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 14 Apr 2022 08:03:25 +0000 (GMT) From: Claudio Imbrenda To: kvm@vger.kernel.org Cc: borntraeger@de.ibm.com, frankja@linux.ibm.com, thuth@redhat.com, pasic@linux.ibm.com, david@redhat.com, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, scgl@linux.ibm.com, mimu@linux.ibm.com, nrb@linux.ibm.com Subject: [PATCH v10 15/19] KVM: s390: pv: asynchronous destroy for reboot Date: Thu, 14 Apr 2022 10:03:06 +0200 Message-Id: <20220414080311.1084834-16-imbrenda@linux.ibm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220414080311.1084834-1-imbrenda@linux.ibm.com> References: <20220414080311.1084834-1-imbrenda@linux.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 6zum5KKHbVy1Qm3JWAszBIU_IvVGWwgp X-Proofpoint-ORIG-GUID: 9RsxcQUe2_lGxhiyt6wlKR5sRxXK3mcg X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.858,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-04-14_02,2022-04-13_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 spamscore=0 malwarescore=0 priorityscore=1501 phishscore=0 lowpriorityscore=0 adultscore=0 mlxscore=0 suspectscore=0 impostorscore=0 clxscore=1015 mlxlogscore=975 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2204140044 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Until now, destroying a protected guest was an entirely synchronous operation that could potentially take a very long time, depending on the size of the guest, due to the time needed to clean up the address space from protected pages. This patch implements an asynchronous destroy mechanism, that allows a protected guest to reboot significantly faster than previously. This is achieved by clearing the pages of the old guest in background. In case of reboot, the new guest will be able to run in the same address space almost immediately. The old protected guest is then only destroyed when all of its memory has been destroyed or otherwise made non protected. Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl: KVM_PV_ASYNC_DISABLE_PREPARE: prepares the current protected VM for asynchronous teardown. The current VM will then continue immediately as non-protected. If a protected VM had already been set aside without starting the teardown process, this call will fail. KVM_PV_ASYNC_DISABLE: tears down the protected VM previously set aside for asynchronous teardown. This PV command should ideally be issued by userspace from a separate thread. If a fatal signal is received (or the process terminates naturally), the command will terminate immediately without completing. Leftover protected VMs are cleaned up when a KVM VM is torn down normally (either via IOCTL or when the process terminates); this cleanup has been implemented in a previous patch. Signed-off-by: Claudio Imbrenda --- arch/s390/kvm/kvm-s390.c | 24 ++++++++ arch/s390/kvm/kvm-s390.h | 2 + arch/s390/kvm/pv.c | 126 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/kvm.h | 2 + 4 files changed, 154 insertions(+) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 05c976bf2438..e00283b48a47 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -2292,6 +2292,30 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd) set_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs); break; } + case KVM_PV_ASYNC_DISABLE_PREPARE: + r = -EINVAL; + if (!kvm_s390_pv_is_protected(kvm) || !lazy_destroy) + break; + + r = kvm_s390_cpus_from_pv(kvm, &cmd->rc, &cmd->rrc); + /* + * If a CPU could not be destroyed, destroy VM will also fail. + * There is no point in trying to destroy it. Instead return + * the rc and rrc from the first CPU that failed destroying. + */ + if (r) + break; + r = kvm_s390_pv_deinit_vm_async_prepare(kvm, &cmd->rc, &cmd->rrc); + + /* no need to block service interrupts any more */ + clear_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs); + break; + case KVM_PV_ASYNC_DISABLE: + r = -EINVAL; + if (!kvm->arch.pv.async_deinit) + break; + r = kvm_s390_pv_deinit_vm_async(kvm, &cmd->rc, &cmd->rrc); + break; case KVM_PV_DISABLE: { r = -EINVAL; if (!kvm_s390_pv_is_protected(kvm)) diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index d3abedafa7a8..d296afb6041c 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -243,6 +243,8 @@ static inline u32 kvm_s390_get_gisa_desc(struct kvm *kvm) /* implemented in pv.c */ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc); int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc); +int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc); +int kvm_s390_pv_deinit_vm_async(struct kvm *kvm, u16 *rc, u16 *rrc); int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc); int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc); int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length, u16 *rc, diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c index b20f2cbd43d9..36bc107bbd7d 100644 --- a/arch/s390/kvm/pv.c +++ b/arch/s390/kvm/pv.c @@ -262,6 +262,132 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc) return cc ? -EIO : 0; } +/** + * kvm_s390_clear_2g - Clear the first 2GB of guest memory. + * @kvm the VM whose memory is to be cleared. + * Clear the first 2GB of guest memory, to avoid prefix issues after reboot. + */ +static void kvm_s390_clear_2g(struct kvm *kvm) +{ + struct kvm_memory_slot *slot; + unsigned long lim; + int srcu_idx; + + srcu_idx = srcu_read_lock(&kvm->srcu); + + slot = gfn_to_memslot(kvm, 0); + /* Clear all slots that are completely below 2GB */ + while (slot && slot->base_gfn + slot->npages < SZ_2G / PAGE_SIZE) { + lim = slot->userspace_addr + slot->npages * PAGE_SIZE; + s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim); + slot = gfn_to_memslot(kvm, slot->base_gfn + slot->npages); + } + /* Last slot crosses the 2G boundary, clear only up to 2GB */ + if (slot && slot->base_gfn < SZ_2G / PAGE_SIZE) { + lim = slot->userspace_addr + SZ_2G - slot->base_gfn * PAGE_SIZE; + s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim); + } + + srcu_read_unlock(&kvm->srcu, srcu_idx); +} + +/** + * kvm_s390_pv_deinit_vm_async_prepare - Prepare a protected VM for + * asynchronous teardown. + * @kvm the VM + * @rc return value for the RC field of the UVCB + * @rrc return value for the RRC field of the UVCB + * + * Prepare the protected VM for asynchronous teardown. The VM will be able + * to continue immediately as a non-secure VM, and the information needed to + * properly tear down the protected VM is set aside. If another protected VM + * was already set aside without starting a teardown, the function will + * fail. + * + * Context: kvm->lock needs to be held + * + * Return: 0 in case of success, -EINVAL if another protected VM was already set + * aside, -ENOMEM if the system ran out of memory. + */ +int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc) +{ + struct deferred_priv *priv; + + /* + * If an asynchronous deinitialization is already pending, refuse. + * A synchronous deinitialization has to be performed instead. + */ + if (kvm->arch.pv.async_deinit) + return -EINVAL; + priv = kmalloc(sizeof(*priv), GFP_KERNEL | __GFP_ZERO); + if (!priv) + return -ENOMEM; + + priv->stor_var = kvm->arch.pv.stor_var; + priv->stor_base = kvm->arch.pv.stor_base; + priv->handle = kvm_s390_pv_get_handle(kvm); + priv->old_table = (unsigned long)kvm->arch.gmap->table; + WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); + if (s390_replace_asce(kvm->arch.gmap)) { + kfree(priv); + return -ENOMEM; + } + + kvm_s390_clear_2g(kvm); + kvm_s390_clear_pv_state(kvm); + kvm->arch.pv.async_deinit = priv; + + *rc = 1; + *rrc = 42; + return 0; +} + +/** + * kvm_s390_pv_deinit_vm_async - Perform an asynchronous teardown of a + * protected VM. + * @kvm the VM previously associated with the protected VM + * @rc return value for the RC field of the UVCB + * @rrc return value for the RRC field of the UVCB + * + * Tear down the protected VM that had previously been set aside using + * kvm_s390_pv_deinit_vm_async_prepare. + * + * Context: kvm->lock needs to be held + * + * Return: 0 in case of success, -EINVAL if no protected VM had been + * prepared for asynchronous teardowm, -EIO in case of other errors. + */ +int kvm_s390_pv_deinit_vm_async(struct kvm *kvm, u16 *rc, u16 *rrc) +{ + struct deferred_priv *p = kvm->arch.pv.async_deinit; + int ret = 0; + + if (!p) + return -EINVAL; + kvm->arch.pv.async_deinit = NULL; + mutex_unlock(&kvm->lock); + + /* When a fatal signal is received, stop immediately */ + if (s390_uv_destroy_range_interruptible(kvm->mm, 0, TASK_SIZE_MAX)) + goto done; + if (kvm_s390_pv_cleanup_deferred(kvm, p)) + ret = -EIO; + else + atomic_dec(&kvm->mm->context.protected_count); + kfree(p); + p = NULL; +done: + /* The caller expects the lock to be held */ + mutex_lock(&kvm->lock); + /* + * p is not NULL if we aborted because of a fatal signal, in which + * case queue the leftover for later cleanup. + */ + if (p) + list_add(&p->list, &kvm->arch.pv.need_cleanup); + return ret; +} + static void kvm_s390_pv_mmu_notifier_release(struct mmu_notifier *subscription, struct mm_struct *mm) { diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 91a6fe4e02c0..0abad46d5910 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1653,6 +1653,8 @@ enum pv_cmd_id { KVM_PV_VERIFY, KVM_PV_PREP_RESET, KVM_PV_UNSHARE_ALL, + KVM_PV_ASYNC_DISABLE_PREPARE, + KVM_PV_ASYNC_DISABLE, }; struct kvm_pv_cmd { -- 2.34.1