From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: borntraeger@de.ibm.com, frankja@linux.ibm.com, thuth@redhat.com,
    pasic@linux.ibm.com, david@redhat.com, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, scgl@linux.ibm.com
Subject: [PATCH v7 01/17] KVM: s390: pv: leak the topmost page table when destroy fails
Date: Fri, 4 Feb 2022 16:53:33 +0100
Message-Id: <20220204155349.63238-2-imbrenda@linux.ibm.com>
In-Reply-To: <20220204155349.63238-1-imbrenda@linux.ibm.com>
References: <20220204155349.63238-1-imbrenda@linux.ibm.com>

Each secure guest must have a unique ASCE (address space control
element); we must avoid that new guests use the same page for their
ASCE, to avoid errors.

Since the ASCE mostly consists of the address of the topmost page table
(plus some flags), we must not return that memory to the pool unless the
ASCE is no longer in use.

Only a successful Destroy Secure Configuration UVC will make the ASCE
reusable again.

If the Destroy Configuration UVC fails, the ASCE cannot be reused for a
secure guest (either for the ASCE or for other memory areas). To avoid
a collision, it must not be used again. This is a permanent error and
the page becomes in practice unusable, so we set it aside and leak it.
On failure we already leak other memory that belongs to the ultravisor
(i.e. the variable and base storage for a guest) and not leaking the
topmost page table was an oversight.

This error (and thus the leakage) should not happen unless the hardware
is broken or KVM has some unknown serious bug.

Signed-off-by: Claudio Imbrenda
Fixes: 29b40f105ec8d55 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling")
---
 arch/s390/include/asm/gmap.h |  2 ++
 arch/s390/kvm/pv.c           |  9 +++--
 arch/s390/mm/gmap.c          | 69 ++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 40264f60b0da..746e18bf8984 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -148,4 +148,6 @@ void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
 int gmap_mark_unmergeable(void);
 void s390_reset_acc(struct mm_struct *mm);
+void s390_remove_old_asce(struct gmap *gmap);
+int s390_replace_asce(struct gmap *gmap);
 #endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 7f7c0d6af2ce..3c59ef763dde 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -166,10 +166,13 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	atomic_set(&kvm->mm->context.is_protected, 0);
 	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
 	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
-	/* Inteded memory leak on "impossible" error */
-	if (!cc)
+	/* Intended memory leak on "impossible" error */
+	if (!cc) {
 		kvm_s390_pv_dealloc_vm(kvm);
-	return cc ? -EIO : 0;
+		return 0;
+	}
+	s390_replace_asce(kvm->arch.gmap);
+	return -EIO;
 }
 
 int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index dfee0ebb2fac..ce6cac4463f2 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2714,3 +2714,72 @@ void s390_reset_acc(struct mm_struct *mm)
 	mmput(mm);
 }
 EXPORT_SYMBOL_GPL(s390_reset_acc);
+
+/**
+ * s390_remove_old_asce - Remove the topmost level of page tables from the
+ * list of page tables of the gmap.
+ * @gmap: the gmap whose table is to be removed
+ *
+ * This means that it will not be freed when the VM is torn down, and needs
+ * to be handled separately by the caller, unless a leak is actually
+ * intended.
+ */
+void s390_remove_old_asce(struct gmap *gmap)
+{
+	struct page *old;
+
+	old = virt_to_page(gmap->table);
+	spin_lock(&gmap->guest_table_lock);
+	list_del(&old->lru);
+	/*
+	 * In case the ASCE needs to be "removed" multiple times, for example
+	 * if the VM is rebooted into secure mode several times
+	 * concurrently.
+	 */
+	INIT_LIST_HEAD(&old->lru);
+	spin_unlock(&gmap->guest_table_lock);
+}
+EXPORT_SYMBOL_GPL(s390_remove_old_asce);
+
+/**
+ * s390_replace_asce - Try to replace the current ASCE of a gmap with
+ * another equivalent one.
+ * @gmap: the gmap
+ *
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is always removed from the list. Therefore the
+ * caller has to make sure to save a pointer to it beforehand, unless a
+ * leak is actually intended.
+ */
+int s390_replace_asce(struct gmap *gmap)
+{
+	unsigned long asce;
+	struct page *page;
+	void *table;
+
+	s390_remove_old_asce(gmap);
+
+	page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+	if (!page)
+		return -ENOMEM;
+	table = page_to_virt(page);
+	memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
+
+	/*
+	 * The caller has to deal with the old ASCE, but here we make sure
+	 * the new one is properly added to the list of page tables, so that
+	 * it will be freed when the VM is torn down.
+	 */
+	spin_lock(&gmap->guest_table_lock);
+	list_add(&page->lru, &gmap->crst_list);
+	spin_unlock(&gmap->guest_table_lock);
+
+	asce = (gmap->asce & ~PAGE_MASK) | __pa(table);
+	WRITE_ONCE(gmap->asce, asce);
+	WRITE_ONCE(gmap->mm->context.gmap_asce, asce);
+	WRITE_ONCE(gmap->table, table);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(s390_replace_asce);
-- 
2.34.1
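
For readers unfamiliar with the ASCE layout, the recomputation at the end
of s390_replace_asce() can be illustrated with a small userspace sketch.
This is not kernel code and not part of the patch: the SKETCH_* constants,
rebuild_asce() and rebuild_asce_example() are hypothetical names, the 4 KiB
page size is an assumption, and the flag value 0x00f is made up purely for
illustration. It only shows that the low (flag) bits of the old ASCE are
kept while the table origin is replaced by the address of the new table.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages; mirrors the usual PAGE_SHIFT/PAGE_MASK definitions. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/*
 * Keep the flag bits (below the page boundary) of the old ASCE and
 * combine them with the page-aligned address of the new top-level table.
 * new_table_pa is assumed to be page aligned.
 */
uint64_t rebuild_asce(uint64_t old_asce, uint64_t new_table_pa)
{
	return (old_asce & ~SKETCH_PAGE_MASK) | new_table_pa;
}

/* Hypothetical values: old table at 0x12345000, flags 0x00f. */
void rebuild_asce_example(void)
{
	uint64_t old_asce = UINT64_C(0x12345000) | UINT64_C(0x00f);
	uint64_t new_asce = rebuild_asce(old_asce, UINT64_C(0xabcde000));

	/* Flags survive, origin now points at the new table. */
	assert((new_asce & ~SKETCH_PAGE_MASK) == UINT64_C(0x00f));
	assert((new_asce & SKETCH_PAGE_MASK) == UINT64_C(0xabcde000));
}
```

The point of the sketch is that the old and new ASCE differ only in the
table origin, which is why leaking the old top-level page is enough to
guarantee that its address can never show up in another guest's ASCE.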