Received: by 2002:a05:7412:98c1:b0:fa:551:50a7 with SMTP id kc1csp1623056rdb; Mon, 8 Jan 2024 05:29:02 -0800 (PST) X-Google-Smtp-Source: AGHT+IHZS1F6l+b5ITjOVu3YUHpMk4G8nXRxm2kjopxhCHYO1KLh+hkgqft22FpM34TId9Ff6hie X-Received: by 2002:a05:6e02:154d:b0:360:8f34:583d with SMTP id j13-20020a056e02154d00b003608f34583dmr2817756ilu.94.1704720541832; Mon, 08 Jan 2024 05:29:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1704720541; cv=none; d=google.com; s=arc-20160816; b=x9QLONcpSDOSj9gd9Pr8VY9qu6SDawXXJB/rLNQ5Y7lNnvNVaITM92+EHiWMdP2U8h /3djmB8/aUQ/J64pvGUItaTDX+SsEVJgLi0QLD8aSrOyBV8jDegGsp8LMORJ4VI3Vd1/ oSaWhvzj0ijHZZITSUOl9hpHbMpBav41aSgo96m+Uau8jXQ6I53riTkcjLlb7z7IwVFb 1JF0u89AO2QMn2PXU9GI89NBnnZk6GtYLWSf7eVZuyDDLm25i/01w+lOjXoNQpOvvMcS 49A6neWkLffaY4mHT0rMMxAiLbA2GGkKL7ZnCY+tp/yaKBSjkcubDYl0HWX7UuJjGQpm McRw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=eroLOb5YGmfTzlQLd8h3TE8H8OwqUddeDVt011pdTMY=; fh=4w46s1ayy3YIUOIGaG7mWmB/wwSQj8/Ao7ZPTNB1Fw0=; b=jIYHW1j04h5pCaNDKNckMsvFBAc+it6K4kH2K3nSagiHsvTLInXd64/89psEfL4JDr Ec+cE1tN5Rwssq5T0Bche7sSosuIyYyYgY8xqN5uckgfnHRtu8Y23lv2E5hN4IwMEkyv Fh3dQqOBg65WGWi83DRC6ajKU63CMqpuvElSFJyUfKE7V5vrSYnlNplK1IHEm6ENdk5P oFawBQMLLPKs+KxBSWeLMU+a1rPYuHwh9rKjkQ2ES9iQvVi4NBxhbmamh4/zfZfw5kgA XQTYy5pe573QlFHDsN/Uu0GFNR0QVlhXC5xM9UvfeiFso3e4gNpYWvDQok8kBlGHvkd2 pIgw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=Ft9RQiWy; spf=pass (google.com: domain of linux-kernel+bounces-19616-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-19616-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id s2-20020a656902000000b005ce0160641fsi5901935pgq.551.2024.01.08.05.29.01 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jan 2024 05:29:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-19616-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=Ft9RQiWy; spf=pass (google.com: domain of linux-kernel+bounces-19616-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-19616-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 75F55283EFC for ; Mon, 8 Jan 2024 13:29:01 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B0E7745BF8; Mon, 8 Jan 2024 13:28:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="Ft9RQiWy" X-Original-To: linux-kernel@vger.kernel.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2F87A4597C; Mon, 8 Jan 2024 13:28:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0353726.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 408DLC0L020841; Mon, 8 Jan 2024 13:28:05 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=eroLOb5YGmfTzlQLd8h3TE8H8OwqUddeDVt011pdTMY=; b=Ft9RQiWy8oCfYavwTtkpQDUEaZYSV0fS4fxmyBhA3g+ww+b0HoPb8DD30XJ68R1V9H23 x2fgfxaEW5z0mGJAfmpUzprjMV/2wGqp/HWC3QK572+eHlE/WQ92FQyT1QbUlUi3YzGG JTJbFNpQ7WUJx5jUoqeJwmuqWpLUOFqMU7KuZELTsaFDIZ6iBIG4I0+EJJ6n2Ca91O5g 6BwpOv0vWGculP8lNlanWVfmDZFk1oBCg+tG1zoSFHL1v9Dy83R87MdJlQ/hLhtCGxtC A0goITuVk+qZIwM3fhl0s3IRnI5lkso+2kaEzCLNjpajxaVY46ALOuixSeDPEF9Dg8gx AQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3vf1uypqdf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 08 Jan 2024 13:28:05 +0000 Received: from m0353726.ppops.net (m0353726.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 408DMIXK024263; Mon, 8 Jan 2024 13:28:04 GMT Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3vf1uypqcc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 08 Jan 2024 13:28:04 +0000 Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 408CcujI022877; Mon, 8 Jan 2024 13:28:03 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3vfj6n82k9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 08 Jan 2024 13:28:03 +0000 Received: from smtpav05.fra02v.mail.ibm.com (smtpav05.fra02v.mail.ibm.com [10.20.54.104]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 408DRw6045482458 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 8 Jan 2024 13:28:00 GMT Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9F9DF20040; Mon, 8 Jan 2024 13:27:58 +0000 (GMT) Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3F74420063; Mon, 8 Jan 2024 13:27:58 +0000 (GMT) Received: from tuxmaker.boeblingen.de.ibm.com (unknown [9.152.85.9]) by smtpav05.fra02v.mail.ibm.com (Postfix) with ESMTP; Mon, 8 Jan 2024 13:27:58 +0000 (GMT) From: Sumanth Korikkar To: linux-mm , Andrew Morton , David Hildenbrand Cc: Oscar Salvador , Michal Hocko , "Aneesh Kumar K.V" , Anshuman Khandual , Gerald Schaefer , Alexander Gordeev , Heiko Carstens , Vasily Gorbik , linux-s390 , LKML , Sumanth Korikkar Subject: [PATCH v6 2/5] s390/mm: allocate vmemmap pages from self-contained memory range Date: Mon, 8 Jan 2024 14:27:44 +0100 Message-Id: <20240108132747.3238763-3-sumanthk@linux.ibm.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240108132747.3238763-1-sumanthk@linux.ibm.com> References: <20240108132747.3238763-1-sumanthk@linux.ibm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 4fLSsujhrM_e219FyLS-SVDTCsvCtWA- X-Proofpoint-GUID: vcvD5MqpipMI0C1xqu8Pbay2fc3h3lKM X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-01-08_04,2024-01-08_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 priorityscore=1501 clxscore=1015 adultscore=0 mlxlogscore=890 malwarescore=0 bulkscore=0 impostorscore=0 lowpriorityscore=0 spamscore=0 suspectscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2401080115 Allocate memory map (struct pages array) from the hotplugged memory range, rather than using system memory. The change addresses the issue where standby memory, when configured to be much larger than online memory, could potentially lead to ipl failure due to memory map allocation from online memory. For example, 16MB of memory map allocation is needed for a memory block size of 1GB and when standby memory is configured much larger than online memory, this could lead to ipl failure. To address this issue, the solution involves introducing "memmap on memory" using the vmem_altmap structure on s390. Architectures that want to implement it should pass the altmap to the vmemmap_populate() function and its associated callchain. This enhancement is discussed in commit 4b94ffdc4163 ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()") Provide "memmap on memory" support for s390 by passing the altmap in vmemmap_populate() and its callchain. The allocation path is described as follows: * When altmap is NULL in vmemmap_populate(), memory map allocation occurs using the existing vmemmap_alloc_block_buf(). * When altmap is not NULL in vmemmap_populate(), memory map allocation still uses vmemmap_alloc_block_buf(), but this function internally calls altmap_alloc_block_buf(). For deallocation, the process is outlined as follows: * When altmap is NULL in vmemmap_free(), memory map deallocation happens through free_pages(). * When altmap is not NULL in vmemmap_free(), memory map deallocation occurs via vmem_altmap_free(). While memory map allocation is primarily handled through the self-contained memory map range, there might still be a small amount of system memory allocation required for vmemmap pagetables. To mitigate this impact, this feature will be limited to machines with EDAT1 support. Reviewed-by: Gerald Schaefer Signed-off-by: Sumanth Korikkar --- arch/s390/mm/init.c | 3 --- arch/s390/mm/vmem.c | 62 +++++++++++++++++++++++++-------------------- 2 files changed, 35 insertions(+), 30 deletions(-) diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index 43e612bc2bcd..8d9a60ccb777 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -281,9 +281,6 @@ int arch_add_memory(int nid, u64 start, u64 size, unsigned long size_pages = PFN_DOWN(size); int rc; - if (WARN_ON_ONCE(params->altmap)) - return -EINVAL; - if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot)) return -EINVAL; diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index 186a020857cf..eb100479f7be 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ -33,8 +33,12 @@ static void __ref *vmem_alloc_pages(unsigned int order) return memblock_alloc(size, size); } -static void vmem_free_pages(unsigned long addr, int order) +static void vmem_free_pages(unsigned long addr, int order, struct vmem_altmap *altmap) { + if (altmap) { + vmem_altmap_free(altmap, 1 << order); + return; + } /* We don't expect boot memory to be removed ever. */ if (!slab_is_available() || WARN_ON_ONCE(PageReserved(virt_to_page((void *)addr)))) @@ -156,7 +160,8 @@ static bool vmemmap_unuse_sub_pmd(unsigned long start, unsigned long end) /* __ref: we'll only call vmemmap_alloc_block() via vmemmap_populate() */ static int __ref modify_pte_table(pmd_t *pmd, unsigned long addr, - unsigned long end, bool add, bool direct) + unsigned long end, bool add, bool direct, + struct vmem_altmap *altmap) { unsigned long prot, pages = 0; int ret = -ENOMEM; @@ -172,11 +177,11 @@ static int __ref modify_pte_table(pmd_t *pmd, unsigned long addr, if (pte_none(*pte)) continue; if (!direct) - vmem_free_pages((unsigned long) pfn_to_virt(pte_pfn(*pte)), 0); + vmem_free_pages((unsigned long)pfn_to_virt(pte_pfn(*pte)), get_order(PAGE_SIZE), altmap); pte_clear(&init_mm, addr, pte); } else if (pte_none(*pte)) { if (!direct) { - void *new_page = vmemmap_alloc_block(PAGE_SIZE, NUMA_NO_NODE); + void *new_page = vmemmap_alloc_block_buf(PAGE_SIZE, NUMA_NO_NODE, altmap); if (!new_page) goto out; @@ -213,7 +218,8 @@ static void try_free_pte_table(pmd_t *pmd, unsigned long start) /* __ref: we'll only call vmemmap_alloc_block() via vmemmap_populate() */ static int __ref modify_pmd_table(pud_t *pud, unsigned long addr, - unsigned long end, bool add, bool direct) + unsigned long end, bool add, bool direct, + struct vmem_altmap *altmap) { unsigned long next, prot, pages = 0; int ret = -ENOMEM; @@ -234,11 +240,11 @@ static int __ref modify_pmd_table(pud_t *pud, unsigned long addr, if (IS_ALIGNED(addr, PMD_SIZE) && IS_ALIGNED(next, PMD_SIZE)) { if (!direct) - vmem_free_pages(pmd_deref(*pmd), get_order(PMD_SIZE)); + vmem_free_pages(pmd_deref(*pmd), get_order(PMD_SIZE), altmap); pmd_clear(pmd); pages++; } else if (!direct && vmemmap_unuse_sub_pmd(addr, next)) { - vmem_free_pages(pmd_deref(*pmd), get_order(PMD_SIZE)); + vmem_free_pages(pmd_deref(*pmd), get_order(PMD_SIZE), altmap); pmd_clear(pmd); } continue; @@ -261,7 +267,7 @@ static int __ref modify_pmd_table(pud_t *pud, unsigned long addr, * page tables since vmemmap_populate gets * called for each section separately. */ - new_page = vmemmap_alloc_block(PMD_SIZE, NUMA_NO_NODE); + new_page = vmemmap_alloc_block_buf(PMD_SIZE, NUMA_NO_NODE, altmap); if (new_page) { set_pmd(pmd, __pmd(__pa(new_page) | prot)); if (!IS_ALIGNED(addr, PMD_SIZE) || @@ -280,7 +286,7 @@ static int __ref modify_pmd_table(pud_t *pud, unsigned long addr, vmemmap_use_sub_pmd(addr, next); continue; } - ret = modify_pte_table(pmd, addr, next, add, direct); + ret = modify_pte_table(pmd, addr, next, add, direct, altmap); if (ret) goto out; if (!add) @@ -302,12 +308,12 @@ static void try_free_pmd_table(pud_t *pud, unsigned long start) for (i = 0; i < PTRS_PER_PMD; i++, pmd++) if (!pmd_none(*pmd)) return; - vmem_free_pages(pud_deref(*pud), CRST_ALLOC_ORDER); + vmem_free_pages(pud_deref(*pud), CRST_ALLOC_ORDER, NULL); pud_clear(pud); } static int modify_pud_table(p4d_t *p4d, unsigned long addr, unsigned long end, - bool add, bool direct) + bool add, bool direct, struct vmem_altmap *altmap) { unsigned long next, prot, pages = 0; int ret = -ENOMEM; @@ -347,7 +353,7 @@ static int modify_pud_table(p4d_t *p4d, unsigned long addr, unsigned long end, } else if (pud_large(*pud)) { continue; } - ret = modify_pmd_table(pud, addr, next, add, direct); + ret = modify_pmd_table(pud, addr, next, add, direct, altmap); if (ret) goto out; if (!add) @@ -370,12 +376,12 @@ static void try_free_pud_table(p4d_t *p4d, unsigned long start) if (!pud_none(*pud)) return; } - vmem_free_pages(p4d_deref(*p4d), CRST_ALLOC_ORDER); + vmem_free_pages(p4d_deref(*p4d), CRST_ALLOC_ORDER, NULL); p4d_clear(p4d); } static int modify_p4d_table(pgd_t *pgd, unsigned long addr, unsigned long end, - bool add, bool direct) + bool add, bool direct, struct vmem_altmap *altmap) { unsigned long next; int ret = -ENOMEM; @@ -394,7 +400,7 @@ static int modify_p4d_table(pgd_t *pgd, unsigned long addr, unsigned long end, goto out; p4d_populate(&init_mm, p4d, pud); } - ret = modify_pud_table(p4d, addr, next, add, direct); + ret = modify_pud_table(p4d, addr, next, add, direct, altmap); if (ret) goto out; if (!add) @@ -415,12 +421,12 @@ static void try_free_p4d_table(pgd_t *pgd, unsigned long start) if (!p4d_none(*p4d)) return; } - vmem_free_pages(pgd_deref(*pgd), CRST_ALLOC_ORDER); + vmem_free_pages(pgd_deref(*pgd), CRST_ALLOC_ORDER, NULL); pgd_clear(pgd); } static int modify_pagetable(unsigned long start, unsigned long end, bool add, - bool direct) + bool direct, struct vmem_altmap *altmap) { unsigned long addr, next; int ret = -ENOMEM; @@ -445,7 +451,7 @@ static int modify_pagetable(unsigned long start, unsigned long end, bool add, goto out; pgd_populate(&init_mm, pgd, p4d); } - ret = modify_p4d_table(pgd, addr, next, add, direct); + ret = modify_p4d_table(pgd, addr, next, add, direct, altmap); if (ret) goto out; if (!add) @@ -458,14 +464,16 @@ static int modify_pagetable(unsigned long start, unsigned long end, bool add, return ret; } -static int add_pagetable(unsigned long start, unsigned long end, bool direct) +static int add_pagetable(unsigned long start, unsigned long end, bool direct, + struct vmem_altmap *altmap) { - return modify_pagetable(start, end, true, direct); + return modify_pagetable(start, end, true, direct, altmap); } -static int remove_pagetable(unsigned long start, unsigned long end, bool direct) +static int remove_pagetable(unsigned long start, unsigned long end, bool direct, + struct vmem_altmap *altmap) { - return modify_pagetable(start, end, false, direct); + return modify_pagetable(start, end, false, direct, altmap); } /* @@ -474,7 +482,7 @@ static int remove_pagetable(unsigned long start, unsigned long end, bool direct) static int vmem_add_range(unsigned long start, unsigned long size) { start = (unsigned long)__va(start); - return add_pagetable(start, start + size, true); + return add_pagetable(start, start + size, true, NULL); } /* @@ -483,7 +491,7 @@ static int vmem_add_range(unsigned long start, unsigned long size) static void vmem_remove_range(unsigned long start, unsigned long size) { start = (unsigned long)__va(start); - remove_pagetable(start, start + size, true); + remove_pagetable(start, start + size, true, NULL); } /* @@ -496,9 +504,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, mutex_lock(&vmem_mutex); /* We don't care about the node, just use NUMA_NO_NODE on allocations */ - ret = add_pagetable(start, end, false); + ret = add_pagetable(start, end, false, altmap); if (ret) - remove_pagetable(start, end, false); + remove_pagetable(start, end, false, altmap); mutex_unlock(&vmem_mutex); return ret; } @@ -509,7 +517,7 @@ void vmemmap_free(unsigned long start, unsigned long end, struct vmem_altmap *altmap) { mutex_lock(&vmem_mutex); - remove_pagetable(start, end, false); + remove_pagetable(start, end, false, altmap); mutex_unlock(&vmem_mutex); } -- 2.40.1