Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp840763imm; Fri, 27 Jul 2018 07:05:17 -0700 (PDT) X-Google-Smtp-Source: AAOMgpdiFaTLK3nwDGisJ4gqNLjNgiNFmRICidhT/MmFChfWzrp29wOh5A0q+rToDIo+HJ2ficH9 X-Received: by 2002:a62:c00c:: with SMTP id x12-v6mr6801866pff.216.1532700317724; Fri, 27 Jul 2018 07:05:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1532700317; cv=none; d=google.com; s=arc-20160816; b=G6nPriGHZsepXn58dKbwu29zkZbD12BnEyt2mshm6t5EGyz9rxQdN7ruCx8LmRfRs2 IYI6grugcTg0Da/lUcWAysmKF0MXJBC+73nD6TpaBUfAAvwzCcHOj0aKauDmGZ8EOtLG x/qqdb3nrd/XTAjT5zreCxQ9N2VUAbOglR2mr36yVcc4+6GdXOvIk2CYuX3AtScsb9Qd BNNVgK+gJTK61IrIBKzmZPCUnj+NgY7VNTZThcsqAaMoaACbbdmO0S3yrdNyI1uj24/I h2/FC85mSw6yQc1kei0LBUJQ1XIJ0LNy05W6WteCYxp9LisVCp6RnMyDzsxCdda7Tbod ym3w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:arc-authentication-results; bh=1mWj6kw4V2h85RIilg8mu7E83vFWeK2uvy+173htq1k=; b=w+0HzNHqdFM2gB4B7MjugVMcaAwBNNN5peLsmSwlo6Lm/KmttJsUXkzJOWqJgtMnvO cxCtA3FOrJdz8f/CrghEp2PZYgk7HAij6UYP0PkZF1MQdBaMdmxlWqz71pJ0BShrGj6u Tb9vLcg+B5CZO8uTEPooDRv7KWOlRmfsXHbeMjrEqVSJHroIM/R2VuUXLWjjujBdC+x4 qh6aJLfp/5zndTP6PPGTGZwGpg1lQWxPSrWd/XF1h12W2/zNoDIelauXeKoQzsXqcKxo E8RBKP6eSyOv8jL+LRcg5mvlH+F1vug84UJoFRKiaNKu3Iz6NzeP4DA09fJUHA8TozKK oXJA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n1-v6si3577834pld.429.2018.07.27.07.05.02; Fri, 27 Jul 2018 07:05:17 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388745AbeG0PZn (ORCPT + 99 others); Fri, 27 Jul 2018 11:25:43 -0400 Received: from mail-wr1-f67.google.com ([209.85.221.67]:33904 "EHLO mail-wr1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730737AbeG0PZl (ORCPT ); Fri, 27 Jul 2018 11:25:41 -0400 Received: by mail-wr1-f67.google.com with SMTP id c13-v6so5175455wrt.1 for ; Fri, 27 Jul 2018 07:03:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=1mWj6kw4V2h85RIilg8mu7E83vFWeK2uvy+173htq1k=; b=qzsFW0n8d8OyKhojQs+dsJGC/Xd0brYiUc77jStMnlfP4hKLuhDTDX4ncuLFkMo+ee kDuRu3CNXZQ3wdLAKDCTMJC4iMQZUG9ebK2VS3jIpVSC07kgWw+fTxlQP3JZjMe2h4gs lXBAIaKLbFELjVtPLlIvCJ2SvC1upa4okaUBM2f0K3GMAZ5OY5Phv7p2le9uvhqfU6t7 bjSr/S6Nj0idYT2yXXqWmjp9Qk/6hkzR++LiMUp1eBWSvJROne3Og/jBUXRMHwwaWCKF XfeN/0yZplxrWMUfLKzStvSEgMTAfK0VMF+0C1y2DrtU9nFJMoyKhWuXoJMQqvnO2Mwg EELg== X-Gm-Message-State: AOUpUlGvywNWKsK9rmiFFPv40TbVyJYI7DRxt5mUEihqCOVeyv3u8BEH hthO1lxsZvSb6ltK9OWS8KE= X-Received: by 2002:adf:96c3:: with SMTP id u61-v6mr4838403wrb.72.1532700215535; Fri, 27 Jul 2018 07:03:35 -0700 (PDT) Received: from techadventures.net (techadventures.net. [62.201.165.239]) by smtp.gmail.com with ESMTPSA id b126-v6sm6507972wmf.41.2018.07.27.07.03.32 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 27 Jul 2018 07:03:33 -0700 (PDT) Received: from d104.suse.de (nat.nue.novell.com [195.135.221.2]) by techadventures.net (Postfix) with ESMTPA id 29F14124408; Fri, 27 Jul 2018 16:03:32 +0200 (CEST) From: osalvador@techadventures.net To: akpm@linux-foundation.org Cc: mhocko@suse.com, vbabka@suse.cz, pasha.tatashin@oracle.com, mgorman@techsingularity.net, aaron.lu@intel.com, iamjoonsoo.kim@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, dan.j.williams@intel.com, Oscar Salvador Subject: [PATCH v4 4/4] mm/page_alloc: Introduce free_area_init_core_hotplug Date: Fri, 27 Jul 2018 16:03:25 +0200 Message-Id: <20180727140325.11881-5-osalvador@techadventures.net> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180727140325.11881-1-osalvador@techadventures.net> References: <20180727140325.11881-1-osalvador@techadventures.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Oscar Salvador Currently, whenever a new node is created/re-used from the memhotplug path, we call free_area_init_node()->free_area_init_core(). But there is some code that we do not really need to run when we are coming from such path. free_area_init_core() performs the following actions: 1) Initializes pgdat internals, such as spinlock, waitqueues and more. 2) Account # nr_all_pages and # nr_kernel_pages. These values are used later on when creating hash tables. 3) Account number of managed_pages per zone, substracting dma_reserved and memmap pages. 4) Initializes some fields of the zone structure data 5) Calls init_currently_empty_zone to initialize all the freelists 6) Calls memmap_init to initialize all pages belonging to certain zone When called from memhotplug path, free_area_init_core() only performs actions #1 and #4. Action #2 is pointless as the zones do not have any pages since either the node was freed, or we are re-using it, eitherway all zones belonging to this node should have 0 pages. For the same reason, action #3 results always in manages_pages being 0. Action #5 and #6 are performed later on when onlining the pages: online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone() online_pages()->move_pfn_range_to_zone()->memmap_init_zone() This patch does two things: First, moves the node/zone initializtion to their own function, so it allows us to create a small version of free_area_init_core, where we only perform: 1) Initialization of pgdat internals, such as spinlock, waitqueues and more 4) Initialization of some fields of the zone structure data These two functions are: pgdat_init_internals() and zone_init_internals(). The second thing this patch does, is to introduce free_area_init_core_hotplug(), the memhotplug version of free_area_init_core(): Currently, we call free_area_init_node() from the memhotplug path. In there, we set some pgdat's fields, and call calculate_node_totalpages(). calculate_node_totalpages() calculates the # of pages the node has. Since the node is either new, or we are re-using it, the zones belonging to this node should not have any pages, so there is no point to calculate this now. Actually, we re-set these values to 0 later on with the calls to: reset_node_managed_pages() reset_node_present_pages() The # of pages per node and the # of pages per zone will be calculated when onlining the pages: online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range() online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range() Also, since free_area_init_core() will now only get called during early init, let us replace __paginginit with __init, so its code gets freed up. Signed-off-by: Oscar Salvador --- include/linux/mm.h | 4 +++ mm/memory_hotplug.c | 16 ++++-------- mm/page_alloc.c | 71 ++++++++++++++++++++++++++++++++++++++--------------- 3 files changed, 60 insertions(+), 31 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 6954ad183159..17bab044c1bc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2002,6 +2002,10 @@ extern void free_area_init_node(int nid, unsigned long * zones_size, unsigned long zone_start_pfn, unsigned long *zholes_size); extern void free_initmem(void); +#ifdef CONFIG_MEMORY_HOTPLUG +extern void free_area_init_core_hotplug(int nid); +#endif + /* * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK) * into the buddy system. The freed pages will be poisoned with pattern diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 4eb6e824a80c..9eea6e809a4e 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -982,8 +982,6 @@ static void reset_node_present_pages(pg_data_t *pgdat) static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) { struct pglist_data *pgdat; - unsigned long zones_size[MAX_NR_ZONES] = {0}; - unsigned long zholes_size[MAX_NR_ZONES] = {0}; unsigned long start_pfn = PFN_DOWN(start); pgdat = NODE_DATA(nid); @@ -1006,8 +1004,11 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) /* we can use NODE_DATA(nid) from here */ + pgdat->node_id = nid; + pgdat->node_start_pfn = start_pfn; + /* init node's zones as empty zones, we don't have any present pages.*/ - free_area_init_node(nid, zones_size, start_pfn, zholes_size); + free_area_init_core_hotplug(nid); pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat); /* @@ -1017,18 +1018,11 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) build_all_zonelists(pgdat); /* - * zone->managed_pages is set to an approximate value in - * free_area_init_core(), which will cause - * /sys/device/system/node/nodeX/meminfo has wrong data. - * So reset it to 0 before any memory is onlined. - */ - reset_node_managed_pages(pgdat); - - /* * When memory is hot-added, all the memory is in offline state. So * clear all zones' present_pages because they will be updated in * online_pages() and offline_pages(). */ + reset_node_managed_pages(pgdat); reset_node_present_pages(pgdat); return pgdat; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4e84a17a5030..4e386423622a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6237,21 +6237,9 @@ static void pgdat_init_kcompactd(struct pglist_data *pgdat) static void pgdat_init_kcompactd(struct pglist_data *pgdat) {} #endif -/* - * Set up the zone data structures: - * - mark all pages reserved - * - mark all memory queues empty - * - clear the memory bitmaps - * - * NOTE: pgdat should get zeroed by caller. - */ -static void __paginginit free_area_init_core(struct pglist_data *pgdat) +static void __paginginit pgdat_init_internals(struct pglist_data *pgdat) { - enum zone_type j; - int nid = pgdat->node_id; - pgdat_resize_init(pgdat); - pgdat_init_numabalancing(pgdat); pgdat_init_split_queue(pgdat); pgdat_init_kcompactd(pgdat); @@ -6262,7 +6250,56 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat) pgdat_page_ext_init(pgdat); spin_lock_init(&pgdat->lru_lock); lruvec_init(node_lruvec(pgdat)); +} + +static void __paginginit zone_init_internals(struct zone *zone, enum zone_type idx, int nid, + unsigned long remaining_pages) +{ + zone->managed_pages = remaining_pages; + zone_set_nid(zone, nid); + zone->name = zone_names[idx]; + zone->zone_pgdat = NODE_DATA(nid); + spin_lock_init(&zone->lock); + zone_seqlock_init(zone); + zone_pcp_init(zone); +} + +/* + * Set up the zone data structures + * - init pgdat internals + * - init all zones belonging to this node + * + * NOTE: this function is only called during memory hotplug + */ +#ifdef CONFIG_MEMORY_HOTPLUG +void __paginginit free_area_init_core_hotplug(int nid) +{ + enum zone_type j; + pg_data_t *pgdat = NODE_DATA(nid); + + pgdat_init_internals(pgdat); + for (j = 0; j < MAX_NR_ZONES; j++) { + struct zone *zone = pgdat->node_zones + j; + zone_init_internals(zone, j, nid, 0); + } +} +#endif + +/* + * Set up the zone data structures: + * - mark all pages reserved + * - mark all memory queues empty + * - clear the memory bitmaps + * + * NOTE: pgdat should get zeroed by caller. + * NOTE: this function is only called during early init. + */ +static void __init free_area_init_core(struct pglist_data *pgdat) +{ + enum zone_type j; + int nid = pgdat->node_id; + pgdat_init_internals(pgdat); pgdat->per_cpu_nodestats = &boot_nodestats; for (j = 0; j < MAX_NR_ZONES; j++) { @@ -6310,13 +6347,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat) * when the bootmem allocator frees pages into the buddy system. * And all highmem pages will be managed by the buddy system. */ - zone->managed_pages = freesize; - zone_set_nid(zone, nid); - zone->name = zone_names[j]; - zone->zone_pgdat = pgdat; - spin_lock_init(&zone->lock); - zone_seqlock_init(zone); - zone_pcp_init(zone); + zone_init_internals(zone, j, nid, freesize); if (!size) continue; -- 2.13.6