Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp384007iog; Wed, 15 Jun 2022 04:24:19 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzMSTnq2oizWz1RBlGd6+bR4Qhn/autrZ9zF8331xAZJ9XDLBKWE7unLn9pYEMkU86p5cSo X-Received: by 2002:aa7:c54b:0:b0:42d:be18:c261 with SMTP id s11-20020aa7c54b000000b0042dbe18c261mr12033786edr.267.1655292259524; Wed, 15 Jun 2022 04:24:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655292259; cv=none; d=google.com; s=arc-20160816; b=QJQ5Rs42vle9CcdEHjfYxSF3CJldgfz7WEZScIJC3Z1JqqbN1sP+oZ45Qsvfa1SmYB Mxqef0f19z96ElrHPOMuvjPDXpgTRdHl+RvetbLCklIMrqZZUnmNj64hZs3BBTjDOYY0 Tj53FtHQkMeEf1joeNHLKsxxzDx56CsB2lgrwEvuoCSFvBg8CAIyJhn1FKOiRlZsBNTK lH3R3RdeOTVKAHSTsZPxMR74i4QhJ87oIIhQ2xU9siX1cLjb2zrncZmC/ZHHu0UlBktG aEdcyO7i/gBNxxh16D+8A8Ht5cSmz+XkGCEpHO0to5RxBY35Cfa/2N8MZKQ3z5WCtrHY 3AjA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :from:references:cc:to:content-language:user-agent:mime-version:date :message-id:dkim-signature; bh=C4vMRRi1D2cJwV0PhiCWcFn+hJT19+STCZkVDOc5sqU=; b=sfx+Fq6GWaPevhX06d9ENYXsr9sNN0lF+UDp/y+iGjrz1tLN5rM2LbMttcpZpmdZMH KHYnakTcJxr8LW+leQM1n1nkhD+e6c39IJoCvOwsbzwllWK5lnXsayvtbItv8LEHk124 8l/p8XyNs4hdNXubNFR1XrK/7/b8bhyH1waPmRUKw8lm163ZSrv7MCXNPwDAzZaRSYLj TlG4VTnPI6nW/1S1pi9sPOpu8v0xh4bs4GYW4VOEgCGqBDTA+Ho5zAYi9xdBGxJt20cR n2BUcKHlLd+4kKebIaOmT6Ig6o+gutmSN5eV+gj5BMcBZ5bljnIWSFiI+QQAtKVgDGAU pknQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b="I/2iGZ0p"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id eg3-20020a056402288300b0042dbd57bb65si13595951edb.154.2022.06.15.04.23.54; Wed, 15 Jun 2022 04:24:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b="I/2iGZ0p"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345662AbiFOK6S (ORCPT + 99 others); Wed, 15 Jun 2022 06:58:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50658 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242799AbiFOK6N (ORCPT ); Wed, 15 Jun 2022 06:58:13 -0400 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 50FF9252A9; Wed, 15 Jun 2022 03:58:08 -0700 (PDT) Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 25FAJX03011824; Wed, 15 Jun 2022 10:58:07 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : to : cc : references : from : subject : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=C4vMRRi1D2cJwV0PhiCWcFn+hJT19+STCZkVDOc5sqU=; b=I/2iGZ0pvMREkFB6BwfZI/zLUACk6n67iRPHWN7wbgG7GTbE90FJBeWmm8FeHPXgJflH moBP+7V5VP5aT0gAhFUQuqqNrgIdoXREtpHvA3v1t6eyE38GQ0s+YG5mnGpaad9n5VMD 1ak+npE3Tcii8ytu0gISX4cvP/LkK+Jem29SPxH04eRBPkce9g7ukG7ojNFRBAjwff3k EvmA9C9FMxFqEunWx1+BFp3xBDQp1I3pSzy+aSnVVwM4RNd0C2XltfACyDsAZ7Ni+XWz lnW52fTFzRsQkqbSTtVuXhbL2wwJBPDnGnw1MlhzT6kIeDDy0WcHUiS619BQhvpsbI/V kA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gpr3g425r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 15 Jun 2022 10:58:07 +0000 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 25F8oVBN031273; Wed, 15 Jun 2022 10:58:06 GMT Received: from ppma06ams.nl.ibm.com (66.31.33a9.ip4.static.sl-reverse.com [169.51.49.102]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gpr3g4251-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 15 Jun 2022 10:58:06 +0000 Received: from pps.filterd (ppma06ams.nl.ibm.com [127.0.0.1]) by ppma06ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 25FAoknN009479; Wed, 15 Jun 2022 10:58:04 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma06ams.nl.ibm.com with ESMTP id 3gmjajdptj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 15 Jun 2022 10:58:04 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 25FAw1BH21758372 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 15 Jun 2022 10:58:01 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B85EBAE045; Wed, 15 Jun 2022 10:58:01 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0C598AE055; Wed, 15 Jun 2022 10:58:01 +0000 (GMT) Received: from [9.145.158.83] (unknown [9.145.158.83]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 15 Jun 2022 10:58:00 +0000 (GMT) Message-ID: <60d0d472-ca46-7a34-c1a2-b4342a4eb9f7@linux.ibm.com> Date: Wed, 15 Jun 2022 12:58:00 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.0 Content-Language: en-US To: Claudio Imbrenda , kvm@vger.kernel.org Cc: borntraeger@de.ibm.com, thuth@redhat.com, pasic@linux.ibm.com, david@redhat.com, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, scgl@linux.ibm.com, mimu@linux.ibm.com, nrb@linux.ibm.com References: <20220603065645.10019-1-imbrenda@linux.ibm.com> <20220603065645.10019-16-imbrenda@linux.ibm.com> From: Janosch Frank Subject: Re: [PATCH v11 15/19] KVM: s390: pv: asynchronous destroy for reboot In-Reply-To: <20220603065645.10019-16-imbrenda@linux.ibm.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: sLkvmm6zY2Dj_-IB-x04UoqclrtDjlTw X-Proofpoint-GUID: DhV2RtsGLyCk748KZ6RxrBmkf9g6Qfcc X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.517,FMLib:17.11.64.514 definitions=2022-06-15_03,2022-06-13_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 malwarescore=0 adultscore=0 phishscore=0 priorityscore=1501 clxscore=1015 bulkscore=0 suspectscore=0 spamscore=0 mlxlogscore=999 mlxscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2204290000 definitions=main-2206150040 X-Spam-Status: No, score=-3.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 6/3/22 08:56, Claudio Imbrenda wrote: > Until now, destroying a protected guest was an entirely synchronous > operation that could potentially take a very long time, depending on > the size of the guest, due to the time needed to clean up the address > space from protected pages. > > This patch implements an asynchronous destroy mechanism, that allows a > protected guest to reboot significantly faster than previously. > > This is achieved by clearing the pages of the old guest in background. > In case of reboot, the new guest will be able to run in the same > address space almost immediately. > > The old protected guest is then only destroyed when all of its memory has > been destroyed or otherwise made non protected. > > Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl: > > KVM_PV_ASYNC_DISABLE_PREPARE: prepares the current protected VM for > asynchronous teardown. The current VM will then continue immediately > as non-protected. If a protected VM had already been set aside without > starting the teardown process, this call will fail. > > KVM_PV_ASYNC_DISABLE: tears down the protected VM previously set aside > for asynchronous teardown. This PV command should ideally be issued by > userspace from a separate thread. If a fatal signal is received (or the > process terminates naturally), the command will terminate immediately > without completing. > > Leftover protected VMs are cleaned up when a KVM VM is torn down > normally (either via IOCTL or when the process terminates); this > cleanup has been implemented in a previous patch. > > Signed-off-by: Claudio Imbrenda > --- > arch/s390/kvm/kvm-s390.c | 34 +++++++++- > arch/s390/kvm/kvm-s390.h | 2 + > arch/s390/kvm/pv.c | 131 +++++++++++++++++++++++++++++++++++++++ > include/uapi/linux/kvm.h | 2 + > 4 files changed, 166 insertions(+), 3 deletions(-) > > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c > index 369de8377116..842419092c0c 100644 > --- a/arch/s390/kvm/kvm-s390.c > +++ b/arch/s390/kvm/kvm-s390.c > @@ -2256,9 +2256,13 @@ static int kvm_s390_cpus_to_pv(struct kvm *kvm, u16 *rc, u16 *rrc) > > static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd) > { > + const bool needslock = (cmd->cmd != KVM_PV_ASYNC_DISABLE); > + void __user *argp = (void __user *)cmd->data; > int r = 0; > u16 dummy; > - void __user *argp = (void __user *)cmd->data; > + > + if (needslock) > + mutex_lock(&kvm->lock); > > switch (cmd->cmd) { > case KVM_PV_ENABLE: { > @@ -2292,6 +2296,28 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd) > set_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs); > break; > } > + case KVM_PV_ASYNC_DISABLE_PREPARE: > + r = -EINVAL; > + if (!kvm_s390_pv_is_protected(kvm) || !async_destroy) > + break; > + > + r = kvm_s390_cpus_from_pv(kvm, &cmd->rc, &cmd->rrc); > + /* > + * If a CPU could not be destroyed, destroy VM will also fail. > + * There is no point in trying to destroy it. Instead return > + * the rc and rrc from the first CPU that failed destroying. > + */ > + if (r) > + break; > + r = kvm_s390_pv_deinit_vm_async_prepare(kvm, &cmd->rc, &cmd->rrc); > + > + /* no need to block service interrupts any more */ > + clear_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs); > + break; > + case KVM_PV_ASYNC_DISABLE: > + /* This must not be called while holding kvm->lock */ > + r = kvm_s390_pv_deinit_vm_async(kvm, &cmd->rc, &cmd->rrc); > + break; > case KVM_PV_DISABLE: { > r = -EINVAL; > if (!kvm_s390_pv_is_protected(kvm)) > @@ -2393,6 +2419,9 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd) > default: > r = -ENOTTY; > } > + if (needslock) > + mutex_unlock(&kvm->lock); > + > return r; > } > > @@ -2597,9 +2626,8 @@ long kvm_arch_vm_ioctl(struct file *filp, > r = -EINVAL; > break; > } > - mutex_lock(&kvm->lock); > + /* must be called without kvm->lock */ > r = kvm_s390_handle_pv(kvm, &args); > - mutex_unlock(&kvm->lock); > if (copy_to_user(argp, &args, sizeof(args))) { > r = -EFAULT; > break; > diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h > index d3abedafa7a8..d296afb6041c 100644 > --- a/arch/s390/kvm/kvm-s390.h > +++ b/arch/s390/kvm/kvm-s390.h > @@ -243,6 +243,8 @@ static inline u32 kvm_s390_get_gisa_desc(struct kvm *kvm) > /* implemented in pv.c */ > int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc); > int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc); > +int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc); > +int kvm_s390_pv_deinit_vm_async(struct kvm *kvm, u16 *rc, u16 *rrc); > int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc); > int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc); > int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length, u16 *rc, > diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c > index 8471c17d538c..ab06fa366e49 100644 > --- a/arch/s390/kvm/pv.c > +++ b/arch/s390/kvm/pv.c > @@ -279,6 +279,137 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc) > return cc ? -EIO : 0; > } > > +/** > + * kvm_s390_destroy_lower_2g - Destroy the first 2GB of protected guest memory. > + * @kvm the VM whose memory is to be cleared. > + * Destroy the first 2GB of guest memory, to avoid prefix issues after reboot. > + */ > +static void kvm_s390_destroy_lower_2g(struct kvm *kvm) > +{ > + struct kvm_memory_slot *slot; > + unsigned long lim; > + int srcu_idx; > + > + srcu_idx = srcu_read_lock(&kvm->srcu); > + > + /* Take the memslot containing guest absolute address 0 */ > + slot = gfn_to_memslot(kvm, 0); > + /* Clear all slots that are completely below 2GB */ > + while (slot && slot->base_gfn + slot->npages < SZ_2G / PAGE_SIZE) { > + lim = slot->userspace_addr + slot->npages * PAGE_SIZE; > + s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim); > + /* Take the next memslot */ > + slot = gfn_to_memslot(kvm, slot->base_gfn + slot->npages); > + } > + /* Last slot crosses the 2G boundary, clear only up to 2GB */ > + if (slot && slot->base_gfn < SZ_2G / PAGE_SIZE) { > + lim = slot->userspace_addr + SZ_2G - slot->base_gfn * PAGE_SIZE; > + s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim); > + } Any reason why you split that up instead of always calculating a length and using MIN()?