From: madvenka@linux.microsoft.com
To: gregkh@linuxfoundation.org, pbonzini@redhat.com, rppt@kernel.org,
    jgowans@amazon.com, graf@amazon.de, arnd@arndb.de, keescook@chromium.org,
    stanislav.kinsburskii@gmail.com, anthony.yznaga@oracle.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    madvenka@linux.microsoft.com, jamorris@linux.microsoft.com
Subject: [RFC PATCH v1 10/10] mm/prmem: Implement dynamic expansion of prmem.
Date: Mon, 16 Oct 2023 18:32:15 -0500
Message-Id: <20231016233215.13090-11-madvenka@linux.microsoft.com>
In-Reply-To: <20231016233215.13090-1-madvenka@linux.microsoft.com>
References: <1b1bc25eb87355b91fcde1de7c2f93f38abb2bf9>
    <20231016233215.13090-1-madvenka@linux.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: "Madhavan T. Venkataraman"

For some use cases, it is hard to predict how much memory is needed to
store persistent data, since this depends on the workload. Either we
overcommit memory for persistent data, or we allow prmem to expand
dynamically.

Implement dynamic expansion of prmem. When the allocator runs out of
memory, it calls alloc_pages() with order MAX_ORDER to allocate a max
order page.
It creates a region for that memory and adds it to the list of regions.
Then, the allocator can allocate from that region.

To allow this, extend the command line parameter:

	prmem=size[KMG][,max_size[KMG]]

Size is allocated upfront as mentioned before. Between size and max_size,
prmem is expanded dynamically as mentioned above.

Choosing a max order page means that no fragmentation is created for
transparent huge pages and kmem slabs. But fragmentation may be created
for 1GB pages. This is not a problem for 1GB pages that are reserved up
front. This could be a problem for 1GB pages that are allocated at run
time dynamically.

If max_size is omitted from the command line parameter, no dynamic
expansion will happen.

Signed-off-by: Madhavan T. Venkataraman
---
 include/linux/prmem.h          |  8 +++++++
 kernel/prmem/prmem_allocator.c | 38 ++++++++++++++++++++++++++++++++++
 kernel/prmem/prmem_init.c      |  1 +
 kernel/prmem/prmem_misc.c      |  3 ++-
 kernel/prmem/prmem_parse.c     | 20 +++++++++++++++++-
 kernel/prmem/prmem_region.c    |  1 +
 kernel/prmem/prmem_reserve.c   |  1 +
 7 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/include/linux/prmem.h b/include/linux/prmem.h
index c7034690f7cb..bb552946cb5b 100644
--- a/include/linux/prmem.h
+++ b/include/linux/prmem.h
@@ -83,6 +83,9 @@ struct prmem_instance {
  * metadata	Physical address of the metadata page.
  * size		Size of initial memory allocated to prmem.
  *
+ * cur_size	Current amount of memory allocated to prmem.
+ * max_size	Maximum amount of memory that can be allocated to prmem.
+ *
  * regions	List of memory regions.
  *
  * instances	Persistent instances.
@@ -95,6 +98,10 @@ struct prmem {
 	unsigned long		metadata;
 	size_t			size;
 
+	/* Dynamic expansion. */
+	size_t			cur_size;
+	size_t			max_size;
+
 	/* Persistent Regions. */
 	struct list_head	regions;
 
@@ -109,6 +116,7 @@ extern struct prmem *prmem;
 extern unsigned long	prmem_metadata;
 extern unsigned long	prmem_pa;
 extern size_t		prmem_size;
+extern size_t		prmem_max_size;
 extern bool		prmem_inited;
 extern spinlock_t	prmem_lock;
 
diff --git a/kernel/prmem/prmem_allocator.c b/kernel/prmem/prmem_allocator.c
index f12975bc6777..1cb3eae8a3e7 100644
--- a/kernel/prmem/prmem_allocator.c
+++ b/kernel/prmem/prmem_allocator.c
@@ -9,17 +9,55 @@
 
 /* Page Allocation functions. */
 
+static void prmem_expand(void)
+{
+	struct prmem_region	*region;
+	struct page		*pages;
+	unsigned int		order = MAX_ORDER;
+	size_t			size = (1UL << order) << PAGE_SHIFT;
+
+	if (prmem->cur_size + size > prmem->max_size)
+		return;
+
+	spin_unlock(&prmem_lock);
+	pages = alloc_pages(GFP_NOWAIT, order);
+	spin_lock(&prmem_lock);
+
+	if (!pages)
+		return;
+
+	/* cur_size may have changed. Recheck. */
+	if (prmem->cur_size + size > prmem->max_size)
+		goto free;
+
+	region = prmem_add_region(page_to_phys(pages), size);
+	if (!region)
+		goto free;
+
+	pr_warn("%s: prmem expanded by %ld\n", __func__, size);
+	return;
+free:
+	__free_pages(pages, order);
+}
+
 void *prmem_alloc_pages_locked(unsigned int order)
 {
 	struct prmem_region	*region;
 	void			*va;
 	size_t			size = (1UL << order) << PAGE_SHIFT;
+	bool			expand = true;
 
+retry:
 	list_for_each_entry(region, &prmem->regions, node) {
 		va = prmem_alloc_pool(region, size, size);
 		if (va)
 			return va;
 	}
+	if (expand) {
+		expand = false;
+		prmem_expand();
+		goto retry;
+	}
 
 	return NULL;
 }
diff --git a/kernel/prmem/prmem_init.c b/kernel/prmem/prmem_init.c
index 166fca688ab3..f4814cc88508 100644
--- a/kernel/prmem/prmem_init.c
+++ b/kernel/prmem/prmem_init.c
@@ -20,6 +20,7 @@ void __init prmem_init(void)
 	/* Cold boot. */
 	prmem->metadata = prmem_metadata;
 	prmem->size = prmem_size;
+	prmem->max_size = prmem_max_size;
 	INIT_LIST_HEAD(&prmem->regions);
 	INIT_LIST_HEAD(&prmem->instances);
 
diff --git a/kernel/prmem/prmem_misc.c b/kernel/prmem/prmem_misc.c
index 49b6a7232c1a..3100662d2cbe 100644
--- a/kernel/prmem/prmem_misc.c
+++ b/kernel/prmem/prmem_misc.c
@@ -68,7 +68,8 @@ bool __init prmem_validate(void)
 	unsigned long	checksum;
 
 	/* Sanity check the boot parameter. */
-	if (prmem_metadata != prmem->metadata || prmem_size != prmem->size) {
+	if (prmem_metadata != prmem->metadata || prmem_size != prmem->size ||
+	    prmem_max_size != prmem->max_size) {
 		pr_warn("%s: Boot parameter mismatch\n", __func__);
 		return false;
 	}
diff --git a/kernel/prmem/prmem_parse.c b/kernel/prmem/prmem_parse.c
index 6c1a23c6b84e..3a57b37fa191 100644
--- a/kernel/prmem/prmem_parse.c
+++ b/kernel/prmem/prmem_parse.c
@@ -8,9 +8,11 @@
 #include
 
 /*
- * Syntax: prmem=size[KMG]
+ * Syntax: prmem=size[KMG][,max_size[KMG]]
  *
  * Specifies the size of the initial memory to be allocated to prmem.
+ * Optionally, specifies the maximum amount of memory to be allocated to
+ * prmem. prmem will expand dynamically between size and max_size.
  */
 static int __init prmem_size_parse(char *cmdline)
 {
@@ -28,6 +30,22 @@ static int __init prmem_size_parse(char *cmdline)
 	}
 
 	prmem_size = size;
+	prmem_max_size = size;
+
+	cur = tmp;
+	if (*cur++ == ',') {
+		/* Get max size. */
+		size = memparse(cur, &tmp);
+		if (cur == tmp || !size || size & (PAGE_SIZE - 1) ||
+		    size <= prmem_size) {
+			prmem_size = 0;
+			prmem_max_size = 0;
+			pr_warn("%s: Incorrect max size %lx\n", __func__, size);
+			return -EINVAL;
+		}
+		prmem_max_size = size;
+	}
+
 	return 0;
 }
 early_param("prmem", prmem_size_parse);
diff --git a/kernel/prmem/prmem_region.c b/kernel/prmem/prmem_region.c
index 6dc88c74d9c8..390329a34b74 100644
--- a/kernel/prmem/prmem_region.c
+++ b/kernel/prmem/prmem_region.c
@@ -82,5 +82,6 @@ struct prmem_region *prmem_add_region(unsigned long pa, size_t size)
 		return NULL;
 
 	list_add_tail(&region->node, &prmem->regions);
+	prmem->cur_size += size;
 	return region;
 }
diff --git a/kernel/prmem/prmem_reserve.c b/kernel/prmem/prmem_reserve.c
index 8000fff05402..c5ae5d7d8f0a 100644
--- a/kernel/prmem/prmem_reserve.c
+++ b/kernel/prmem/prmem_reserve.c
@@ -11,6 +11,7 @@ struct prmem *prmem;
 unsigned long	prmem_metadata;
 unsigned long	prmem_pa;
 unsigned long	prmem_size;
+unsigned long	prmem_max_size;
 
 void __init prmem_reserve_early(void)
 {
-- 
2.25.1
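
Illustration only, not part of the patch: the rules for the extended
prmem=size[KMG][,max_size[KMG]] parameter and the budget check in
prmem_expand() can be modeled in userspace roughly as below. Everything
prefixed sketch_ is a hypothetical name. sketch_memparse() is a simplified
stand-in for the kernel's memparse() that handles only the K, M and G
suffixes, and a 4 KiB page size is assumed. The validation of the first
size is also an assumption (it is taken to mirror the max_size check),
because that part of prmem_size_parse() is not visible in this diff.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Simplified userspace stand-in for memparse(): number plus optional K/M/G. */
static unsigned long long sketch_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse "size[KMG][,max_size[KMG]]". Returns 0 on success, -1 on error.
 * On error both outputs are zeroed, as the patch does for a bad max_size.
 */
static int sketch_prmem_parse(const char *arg, unsigned long long *size,
			      unsigned long long *max_size)
{
	unsigned long long v;
	char *end;

	assert(arg && size && max_size);
	*size = *max_size = 0;

	/* Assumed check for the initial size: non-empty, non-zero, page aligned. */
	v = sketch_memparse(arg, &end);
	if (end == arg || !v || (v & (SKETCH_PAGE_SIZE - 1)))
		return -1;

	/* Without max_size, max equals size, so no expansion can happen. */
	*size = *max_size = v;
	if (*end != ',')
		return 0;

	/* max_size must be non-empty, page aligned and strictly above size. */
	arg = end + 1;
	v = sketch_memparse(arg, &end);
	if (end == arg || !v || (v & (SKETCH_PAGE_SIZE - 1)) || v <= *size) {
		*size = *max_size = 0;
		return -1;
	}
	*max_size = v;
	return 0;
}

/* Budget check from prmem_expand(): does one more chunk fit under max? */
static int sketch_can_expand(unsigned long long cur, unsigned long long chunk,
			     unsigned long long max)
{
	return cur + chunk <= max;
}
```

Under these assumptions, sketch_prmem_parse("512M,2G", ...) yields a size of
512 MiB and a max_size of 2 GiB, while "1G" alone leaves max_size equal to
size, so sketch_can_expand() rejects any further chunk. "1G,1G" is rejected
because max_size must be strictly larger than size.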