Date: Wed, 23 Aug 2023 15:35:56 +0100
From: Mike Rapoport
To: Liam Ni
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 loongarch@lists.linux.dev, zhoubinbin@loongson.cn,
 chenfeiyang@loongson.cn, jiaxun.yang@flygoat.com, Andrew Morton,
 "H. Peter Anvin", x86@kernel.org, Borislav Petkov, Ingo Molnar,
 Thomas Gleixner, peterz@infradead.org, luto@kernel.org, Dave Hansen,
 kernel@xen0n.name, chenhuacai@kernel.org
Subject: Re: [RESEND PATCH V3] NUMA:Improve the efficiency of calculating pages loss
Message-ID: <20230823143556.GA188089@kernel.org>
References: <20230814155911.GN2607694@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 22, 2023 at 07:49:05PM +0800, Liam Ni wrote:
> On Tue, 15 Aug 2023 at 00:00, Mike Rapoport wrote:
> >
> > On Fri, Aug 04, 2023 at 11:32:51PM +0800, Liam Ni wrote:
> > > Optimize the way of calculating missing pages.
> > >
> > > In the previous implementation, We calculate missing pages as follows:
> > > 1. calculate numaram by traverse all the numa_meminfo's and for each of
> > > them traverse all the regions in memblock.memory to prepare for
> > > counting missing pages.
> > >
> > > 2. Traverse all the regions in memblock.memory again to get e820ram.
> > >
> > > 3. the missing page is (e820ram - numaram )
> > >
> > > But,it's enough to count memory in ‘memblock.memory’ that doesn't have
> > > the node assigned.
> > >
> > > V2:https://lore.kernel.org/all/20230619075315.49114-1-zhiguangni01@gmail.com/
> > > V1:https://lore.kernel.org/all/20230615142016.419570-1-zhiguangni01@gmail.com/
> > >
> > > Signed-off-by: Liam Ni
> > > ---
> > >  arch/loongarch/kernel/numa.c | 23 ++++++++---------------
> > >  arch/x86/mm/numa.c           | 26 +++++++-------------------
> > >  include/linux/mm.h           |  1 +
> > >  mm/mm_init.c                 | 20 ++++++++++++++++++++
> > >  4 files changed, 36 insertions(+), 34 deletions(-)
> > >
> > > diff --git a/arch/loongarch/kernel/numa.c b/arch/loongarch/kernel/numa.c
> > > index 708665895b47..0239891e4d19 100644
> > > --- a/arch/loongarch/kernel/numa.c
> > > +++ b/arch/loongarch/kernel/numa.c
> > > @@ -262,25 +262,18 @@ static void __init node_mem_init(unsigned int node)
> > >   * Sanity check to catch more bad NUMA configurations (they are amazingly
> > >   * common).  Make sure the nodes cover all memory.
> > >   */
> > > -static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
> > > +static bool __init memblock_validate_numa_coverage(const u64 limit)
> >
> > There is no need to have arch specific memblock_validate_numa_coverage().
> > You can add this function to memblock and call it from NUMA initialization
> > instead of numa_meminfo_cover_memory().
>
> Remove implementation of numa_meminfo_cover_memory function?

Yes, that's the idea.

> > The memblock_validate_numa_coverage() will count all the pages without node
> > ID set and compare to the threshold provided by the architectures.
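
To make the suggestion concrete, here is a minimal userspace sketch of such a check, not actual memblock code. It assumes 4 KiB pages (PAGE_SHIFT of 12); struct mem_range, pages_without_node() and validate_numa_coverage() are hypothetical names standing in for the memblock.memory regions and the proposed helper. The threshold is taken in bytes and converted to pages, which matches the old (1 << (20 - PAGE_SHIFT)) comparison when SZ_1M is passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_SHIFT	12		/* assumption: 4 KiB pages */
#define NUMA_NO_NODE	(-1)
#define SZ_1M		(1UL << 20)

/* Hypothetical stand-in for one memblock.memory region, in PFNs. */
struct mem_range {
	unsigned long start_pfn;
	unsigned long end_pfn;		/* exclusive */
	int nid;			/* NUMA_NO_NODE if no node is assigned */
};

/* Count the pages that sit in ranges with no node assigned. */
static unsigned long pages_without_node(const struct mem_range *r, size_t n)
{
	unsigned long nr_pages = 0;	/* accumulator starts at zero */
	size_t i;

	for (i = 0; i < n; i++) {
		assert(r[i].end_pfn >= r[i].start_pfn);
		if (r[i].nid == NUMA_NO_NODE)
			nr_pages += r[i].end_pfn - r[i].start_pfn;
	}
	return nr_pages;
}

/* Return true if less than threshold_bytes of memory lacks a node. */
static bool validate_numa_coverage(const struct mem_range *r, size_t n,
				   unsigned long threshold_bytes)
{
	unsigned long threshold_pages = threshold_bytes >> PAGE_SHIFT;
	unsigned long nr_pages = pages_without_node(r, n);

	if (nr_pages >= threshold_pages) {
		fprintf(stderr, "NUMA: no nodes coverage for %luMB of RAM\n",
			(nr_pages << PAGE_SHIFT) >> 20);
		return false;
	}
	return true;
}
```

With this shape, an architecture would only pass its own slack, e.g. validate_numa_coverage(ranges, n, SZ_1M), and a single pass over the ranges replaces the two traversals described in the commit message.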
> >
> > >  {
> > > -	int i;
> > > -	u64 numaram, biosram;
> > > +	u64 lo_pg;
> > >
> > > -	numaram = 0;
> > > -	for (i = 0; i < mi->nr_blks; i++) {
> > > -		u64 s = mi->blk[i].start >> PAGE_SHIFT;
> > > -		u64 e = mi->blk[i].end >> PAGE_SHIFT;
> > > +	lo_pg = max_pfn - calculate_without_node_pages_in_range();
> > >
> > > -		numaram += e - s;
> > > -		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
> > > -		if ((s64)numaram < 0)
> > > -			numaram = 0;
> > > +	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
> > > +	if (lo_pg >= limit) {
> > > +		pr_err("NUMA: We lost 1m size page.\n");
> > > +		return false;
> > >  	}
> > > -	max_pfn = max_low_pfn;
> > > -	biosram = max_pfn - absent_pages_in_range(0, max_pfn);
> > >
> > > -	BUG_ON((s64)(biosram - numaram) >= (1 << (20 - PAGE_SHIFT)));
> > >  	return true;
> > >  }
> > >
> > > @@ -428,7 +421,7 @@ int __init init_numa_memory(void)
> > >  		return -EINVAL;
> > >
> > >  	init_node_memblock();
> > > -	if (numa_meminfo_cover_memory(&numa_meminfo) == false)
> > > +	if (memblock_validate_numa_coverage(SZ_1M) == false)
> > >  		return -EINVAL;
> > >
> > >  	for_each_node_mask(node, node_possible_map) {
> > > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > > index 2aadb2019b4f..14feec144675 100644
> > > --- a/arch/x86/mm/numa.c
> > > +++ b/arch/x86/mm/numa.c
> > > @@ -451,30 +451,18 @@ EXPORT_SYMBOL(__node_distance);
> > >   * Sanity check to catch more bad NUMA configurations (they are amazingly
> > >   * common).  Make sure the nodes cover all memory.
> > >   */
> > > -static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
> > > +static bool __init memblock_validate_numa_coverage(const u64 limit)
> > >  {
> > > -	u64 numaram, e820ram;
> > > -	int i;
> > > +	u64 lo_pg;
> > >
> > > -	numaram = 0;
> > > -	for (i = 0; i < mi->nr_blks; i++) {
> > > -		u64 s = mi->blk[i].start >> PAGE_SHIFT;
> > > -		u64 e = mi->blk[i].end >> PAGE_SHIFT;
> > > -		numaram += e - s;
> > > -		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
> > > -		if ((s64)numaram < 0)
> > > -			numaram = 0;
> > > -	}
> > > -
> > > -	e820ram = max_pfn - absent_pages_in_range(0, max_pfn);
> > > +	lo_pg = max_pfn - calculate_without_node_pages_in_range();
> > >
> > >  	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
> > > -	if ((s64)(e820ram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
> > > -		printk(KERN_ERR "NUMA: nodes only cover %LuMB of your
> > > %LuMB e820 RAM. Not used.\n",
> > > -		       (numaram << PAGE_SHIFT) >> 20,
> > > -		       (e820ram << PAGE_SHIFT) >> 20);
> > > +	if (lo_pg >= limit) {
> > > +		pr_err("NUMA: We lost 1m size page.\n");
> > >  		return false;
> > >  	}
> > > +
> > >  	return true;
> > >  }
> > >
> > > @@ -583,7 +571,7 @@ static int __init numa_register_memblks(struct
> > > numa_meminfo *mi)
> > >  			return -EINVAL;
> > >  		}
> > >  	}
> > > -	if (!numa_meminfo_cover_memory(mi))
> > > +	if (!memblock_validate_numa_coverage(SZ_1M))
> > >  		return -EINVAL;
> > >
> > >  	/* Finally register nodes. */
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 0daef3f2f029..b32457ad1ae3 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -3043,6 +3043,7 @@ unsigned long __absent_pages_in_range(int nid,
> > > unsigned long start_pfn,
> > >  						unsigned long end_pfn);
> > >  extern unsigned long absent_pages_in_range(unsigned long start_pfn,
> > >  						unsigned long end_pfn);
> > > +extern unsigned long calculate_without_node_pages_in_range(void);
> > >  extern void get_pfn_range_for_nid(unsigned int nid,
> > >  			unsigned long *start_pfn, unsigned long *end_pfn);
> > >
> > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > index 3ddd18a89b66..13a4883787e3 100644
> > > --- a/mm/mm_init.c
> > > +++ b/mm/mm_init.c
> > > @@ -1132,6 +1132,26 @@ static void __init
> > > adjust_zone_range_for_zone_movable(int nid,
> > >  	}
> > >  }
> > >
> > > +/**
> > > + * @start_pfn: The start PFN to start searching for holes
> > > + * @end_pfn: The end PFN to stop searching for holes
> > > + *
> > > + * Return: Return the number of page frames without node assigned
> > > within a range.
> > > + */
> > > +unsigned long __init calculate_without_node_pages_in_range(void)
> > > +{
> > > +	unsigned long num_pages;
> > > +	unsigned long start_pfn, end_pfn;
> > > +	int nid, i;
> > > +
> > > +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> > > +		if (nid == NUMA_NO_NODE)
> > > +			num_pages += end_pfn - start_pfn;
> > > +	}
> > > +
> > > +	return num_pages;
> > > +}
> > > +
> > >  /*
> > >   * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
> > >   * then all holes in the requested range will be accounted for.
> > > --
> > > 2.25.1
> >
> > --
> > Sincerely yours,
> > Mike.

-- 
Sincerely yours,
Mike.