2018-07-30 10:20:26

by Oscar Salvador

Subject: [PATCH v5 0/4] Refactor free_area_init_core and add free_area_init_core_hotplug

From: Oscar Salvador <[email protected]>

Changes:

v4 -> v5:
- Remove __ref from hotadd_new_pgdat and place it on
free_area_init_core_hotplug. (Suggested by Pavel)
- Since free_area_init_core_hotplug is now marked __ref, drop its
__paginginit annotation.
- Code style fixes in free_area_init_core_hotplug (Suggested by Pavel)
- Replace __paginginit with __init for free_area_init_node/free_area_init_core,
as these functions are now only called during early init.
- Add Reviewed-by from Pavel

v3 -> v4:
- Unify patch-5 and patch-4.
- Make free_area_init_core __init (Suggested by Michal).
- Make zone_init_internals __paginginit (Suggested by Pavel).
- Add Reviewed-by/Acked-by tags.

v2 -> v3:
- Rethink how to split free_area_init_core between the
memhotplug and early init contexts (Suggested by Michal).

This patchset does three things:

1) Clean up/refactor free_area_init_core/free_area_init_node
by moving the ifdefery out of the functions.
2) Move the pgdat/zone initialization in free_area_init_core to its
own function.
3) Introduce free_area_init_core_hotplug, a small subset of free_area_init_core,
which is only called from the memhotplug code path.
In this way, we have:

free_area_init_core: called during early initialization
free_area_init_core_hotplug: called whenever a new node is allocated/re-used (memhotplug path)

Oscar Salvador (3):
mm/page_alloc: Move ifdefery out of free_area_init_core
mm/page_alloc: Inline function to handle
CONFIG_DEFERRED_STRUCT_PAGE_INIT
mm/page_alloc: Introduce free_area_init_core_hotplug

Pavel Tatashin (1):
mm: access zone->node via zone_to_nid() and zone_set_nid()

include/linux/mm.h | 15 ++----
include/linux/mmzone.h | 26 +++++++---
mm/memory_hotplug.c | 16 ++----
mm/mempolicy.c | 4 +-
mm/mm_init.c | 9 +---
mm/page_alloc.c | 134 +++++++++++++++++++++++++++++++++++--------------
6 files changed, 130 insertions(+), 74 deletions(-)

--
2.13.6



2018-07-30 10:19:26

by Oscar Salvador

Subject: [PATCH v5 2/4] mm: access zone->node via zone_to_nid() and zone_set_nid()

From: Pavel Tatashin <[email protected]>

zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
have inline functions to access this field in order to avoid #ifdefs in
C files.
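
For illustration only, the accessor pattern can be modelled in a small,
self-contained C file. The structure, the DEMO_CONFIG_NUMA switch and all
demo_* names below are hypothetical stand-ins for struct zone and
CONFIG_NUMA; the point is that callers compile unchanged whichever way the
switch is set.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for Kconfig's CONFIG_NUMA; set it to 0 to mimic a
 * CONFIG_NUMA=n build. Both settings compile.
 */
#define DEMO_CONFIG_NUMA 1

/* Reduced model of struct zone: the node field only exists on NUMA builds. */
struct demo_zone {
#if DEMO_CONFIG_NUMA
	int node;
#endif
	const char *name;
};

#if DEMO_CONFIG_NUMA
static inline int demo_zone_to_nid(const struct demo_zone *zone)
{
	return zone->node;
}

static inline void demo_zone_set_nid(struct demo_zone *zone, int nid)
{
	assert(nid >= 0);
	zone->node = nid;
}
#else
static inline int demo_zone_to_nid(const struct demo_zone *zone)
{
	(void)zone;
	return 0;		/* UMA: every zone belongs to node 0 */
}

static inline void demo_zone_set_nid(struct demo_zone *zone, int nid)
{
	(void)zone;		/* no field to store the node in */
	(void)nid;
}
#endif

/* A caller written once, with no #ifdef, like zonelist_node_idx() above. */
static int demo_zone_node(const struct demo_zone *zone)
{
	return demo_zone_to_nid(zone);
}
```

With DEMO_CONFIG_NUMA set to 0 the setter becomes a no-op and the getter
always returns 0, mirroring the !CONFIG_NUMA branch in the patch.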

Signed-off-by: Pavel Tatashin <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
include/linux/mm.h | 9 ---------
include/linux/mmzone.h | 26 ++++++++++++++++++++------
mm/mempolicy.c | 4 ++--
mm/mm_init.c | 9 ++-------
mm/page_alloc.c | 10 ++++------
5 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 726e71475144..6954ad183159 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -940,15 +940,6 @@ static inline int page_zone_id(struct page *page)
return (page->flags >> ZONEID_PGSHIFT) & ZONEID_MASK;
}

-static inline int zone_to_nid(struct zone *zone)
-{
-#ifdef CONFIG_NUMA
- return zone->node;
-#else
- return 0;
-#endif
-}
-
#ifdef NODE_NOT_IN_PAGE_FLAGS
extern int page_to_nid(const struct page *page);
#else
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ae1a034c3e2c..17fdff3bfb41 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -842,6 +842,25 @@ static inline bool populated_zone(struct zone *zone)
return zone->present_pages;
}

+#ifdef CONFIG_NUMA
+static inline int zone_to_nid(struct zone *zone)
+{
+ return zone->node;
+}
+
+static inline void zone_set_nid(struct zone *zone, int nid)
+{
+ zone->node = nid;
+}
+#else
+static inline int zone_to_nid(struct zone *zone)
+{
+ return 0;
+}
+
+static inline void zone_set_nid(struct zone *zone, int nid) {}
+#endif
+
extern int movable_zone;

#ifdef CONFIG_HIGHMEM
@@ -957,12 +976,7 @@ static inline int zonelist_zone_idx(struct zoneref *zoneref)

static inline int zonelist_node_idx(struct zoneref *zoneref)
{
-#ifdef CONFIG_NUMA
- /* zone_to_nid not available in this context */
- return zoneref->zone->node;
-#else
- return 0;
-#endif /* CONFIG_NUMA */
+ return zone_to_nid(zoneref->zone);
}

struct zoneref *__next_zones_zonelist(struct zoneref *z,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f0fcf70bcec7..8c1c09b3852a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1784,7 +1784,7 @@ unsigned int mempolicy_slab_node(void)
zonelist = &NODE_DATA(node)->node_zonelists[ZONELIST_FALLBACK];
z = first_zones_zonelist(zonelist, highest_zoneidx,
&policy->v.nodes);
- return z->zone ? z->zone->node : node;
+ return z->zone ? zone_to_nid(z->zone) : node;
}

default:
@@ -2326,7 +2326,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
node_zonelist(numa_node_id(), GFP_HIGHUSER),
gfp_zone(GFP_HIGHUSER),
&pol->v.nodes);
- polnid = z->zone->node;
+ polnid = zone_to_nid(z->zone);
break;

default:
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 5b72266b4b03..6838a530789b 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -53,13 +53,8 @@ void __init mminit_verify_zonelist(void)
zone->name);

/* Iterate the zonelist */
- for_each_zone_zonelist(zone, z, zonelist, zoneid) {
-#ifdef CONFIG_NUMA
- pr_cont("%d:%s ", zone->node, zone->name);
-#else
- pr_cont("0:%s ", zone->name);
-#endif /* CONFIG_NUMA */
- }
+ for_each_zone_zonelist(zone, z, zonelist, zoneid)
+ pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
pr_cont("\n");
}
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8a73305f7c55..10b754fba5fa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2909,10 +2909,10 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
if (!static_branch_likely(&vm_numa_stat_key))
return;

- if (z->node != numa_node_id())
+ if (zone_to_nid(z) != numa_node_id())
local_stat = NUMA_OTHER;

- if (z->node == preferred_zone->node)
+ if (zone_to_nid(z) == zone_to_nid(preferred_zone))
__inc_numa_state(z, NUMA_HIT);
else {
__inc_numa_state(z, NUMA_MISS);
@@ -5287,7 +5287,7 @@ int local_memory_node(int node)
z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
gfp_zone(GFP_KERNEL),
NULL);
- return z->zone->node;
+ return zone_to_nid(z->zone);
}
#endif

@@ -6311,9 +6311,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
* And all highmem pages will be managed by the buddy system.
*/
zone->managed_pages = freesize;
-#ifdef CONFIG_NUMA
- zone->node = nid;
-#endif
+ zone_set_nid(zone, nid);
zone->name = zone_names[j];
zone->zone_pgdat = pgdat;
spin_lock_init(&zone->lock);
--
2.13.6


2018-07-30 10:19:43

by Oscar Salvador

Subject: [PATCH v5 4/4] mm/page_alloc: Introduce free_area_init_core_hotplug

From: Oscar Salvador <[email protected]>

Currently, whenever a new node is created/re-used from the memhotplug path,
we call free_area_init_node()->free_area_init_core().
But some of that code does not really need to run when we are coming
from that path.

free_area_init_core() performs the following actions:

1) Initializes pgdat internals, such as spinlock, waitqueues and more.
2) Accounts nr_all_pages and nr_kernel_pages. These values are used later on
when creating hash tables.
3) Accounts the number of managed_pages per zone, subtracting dma_reserve and memmap pages.
4) Initializes some fields of the zone structure data.
5) Calls init_currently_empty_zone to initialize all the freelists.
6) Calls memmap_init to initialize all pages belonging to a certain zone.

When called from the memhotplug path, free_area_init_core() only performs actions #1 and #4.

Action #2 is pointless, as the zones do not have any pages: either the node is new,
or we are re-using a previously freed one, and either way all zones belonging to this
node should have 0 pages.
For the same reason, action #3 always results in managed_pages being 0.
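
To make action #3 concrete, here is a rough arithmetic sketch. It assumes
4 KiB pages and a 64-byte struct page (typical on x86_64, but configuration
dependent), and it ignores the highmem and CONFIG_SPARSEMEM special cases
that the real calc_memmap_size()/free_area_init_core() code handles. All
demo_* names are hypothetical. Under those assumptions, a fully present
1 GiB zone (262144 pages) loses 4096 pages to its memmap, while an empty
zone, as on the hotplug path, ends up with 0 managed pages.

```c
#include <assert.h>

/* Assumed values, typical for x86_64; they vary by architecture/config. */
#define DEMO_PAGE_SHIFT		12UL
#define DEMO_PAGE_SIZE		(1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_ALIGN(x)	(((x) + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1))
#define DEMO_STRUCT_PAGE_SIZE	64UL	/* assumed sizeof(struct page) */

/* Pages needed to hold the struct page array (memmap) for a zone. */
static unsigned long demo_memmap_pages(unsigned long pages)
{
	return DEMO_PAGE_ALIGN(pages * DEMO_STRUCT_PAGE_SIZE) >> DEMO_PAGE_SHIFT;
}

/*
 * Simplified model of action #3: start from the present pages, subtract
 * the memmap and, for the first zone only, the DMA reserve.
 */
static unsigned long demo_managed_pages(unsigned long spanned,
					unsigned long present,
					unsigned long dma_reserve,
					int first_zone)
{
	unsigned long freesize = present;
	unsigned long memmap = demo_memmap_pages(spanned);

	assert(present <= spanned);
	if (freesize >= memmap)
		freesize -= memmap;
	if (first_zone && freesize > dma_reserve)
		freesize -= dma_reserve;
	return freesize;
}
```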

Actions #5 and #6 are performed later on, when onlining the pages:
online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone()
online_pages()->move_pfn_range_to_zone()->memmap_init_zone()
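
As a rough illustration of action #5, emptying a zone's buddy free lists
amounts to initialising one list per (order, migratetype) pair. The sketch
below is loosely modelled on the kernel's zone_init_free_lists() helper;
the demo_* types and the DEMO_MAX_ORDER/DEMO_MIGRATE_TYPES values are
assumptions for the example, not the kernel's definitions.

```c
#include <assert.h>

/* Assumed values for the sketch; the kernel's depend on the configuration. */
#define DEMO_MAX_ORDER		11
#define DEMO_MIGRATE_TYPES	3

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

static void demo_init_list_head(struct demo_list_head *list)
{
	/* An empty circular list points at itself in both directions. */
	list->next = list;
	list->prev = list;
}

struct demo_free_area {
	struct demo_list_head free_list[DEMO_MIGRATE_TYPES];
	unsigned long nr_free;
};

struct demo_zone {
	struct demo_free_area free_area[DEMO_MAX_ORDER];
};

/* Empty every free list and reset the free-block counter of each order. */
static void demo_zone_init_free_lists(struct demo_zone *zone)
{
	unsigned int order, t;

	for (order = 0; order < DEMO_MAX_ORDER; order++) {
		for (t = 0; t < DEMO_MIGRATE_TYPES; t++)
			demo_init_list_head(&zone->free_area[order].free_list[t]);
		zone->free_area[order].nr_free = 0;
	}
}
```

Since this only touches the zone's own lists, doing it at online time
instead of at node creation time loses nothing.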

This patch does two things:

First, it moves the node/zone initialization into their own functions, which allows us
to create a small version of free_area_init_core, where we only perform:

1) Initialization of pgdat internals, such as spinlock, waitqueues and more
4) Initialization of some fields of the zone structure data

These two functions are: pgdat_init_internals() and zone_init_internals().
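
The resulting split can be sketched with heavily reduced, hypothetical
structures (every demo_* name is made up for this example, and the real
pgdat_init_internals() sets up far more state). Both paths share the same
two helpers; the hotplug variant just passes 0 as the number of remaining
pages for every zone.

```c
#include <assert.h>

/* Hypothetical, much-reduced stand-ins for the kernel structures. */
enum demo_zone_type {
	DEMO_ZONE_DMA,
	DEMO_ZONE_NORMAL,
	DEMO_ZONE_MOVABLE,
	DEMO_MAX_NR_ZONES
};
#define DEMO_MAX_NUMNODES 4

static const char *const demo_zone_names[DEMO_MAX_NR_ZONES] = {
	"DMA", "Normal", "Movable"
};

struct demo_pgdat;

struct demo_zone {
	unsigned long managed_pages;
	int node;
	const char *name;
	struct demo_pgdat *zone_pgdat;
};

struct demo_pgdat {
	int node_id;
	int internals_ready;	/* stands in for locks, waitqueues, lruvec... */
	struct demo_zone node_zones[DEMO_MAX_NR_ZONES];
};

static struct demo_pgdat demo_node_data[DEMO_MAX_NUMNODES];
#define DEMO_NODE_DATA(nid) (&demo_node_data[(nid)])

/* Action #1: per-node setup shared by both paths. */
static void demo_pgdat_init_internals(struct demo_pgdat *pgdat)
{
	pgdat->internals_ready = 1;
}

/* Action #4: per-zone setup shared by both paths. */
static void demo_zone_init_internals(struct demo_zone *zone,
				     enum demo_zone_type idx, int nid,
				     unsigned long remaining_pages)
{
	zone->managed_pages = remaining_pages;
	zone->node = nid;
	zone->name = demo_zone_names[idx];
	zone->zone_pgdat = DEMO_NODE_DATA(nid);
}

/* Hotplug variant: only #1 and #4, and every zone starts with 0 pages. */
static void demo_free_area_init_core_hotplug(int nid)
{
	struct demo_pgdat *pgdat;
	int z;

	assert(nid >= 0 && nid < DEMO_MAX_NUMNODES);
	pgdat = DEMO_NODE_DATA(nid);
	demo_pgdat_init_internals(pgdat);
	for (z = 0; z < DEMO_MAX_NR_ZONES; z++)
		demo_zone_init_internals(&pgdat->node_zones[z],
					 (enum demo_zone_type)z, nid, 0);
}
```

As in the patch, the caller is expected to have set pgdat->node_id (and
node_start_pfn) beforehand.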

The second thing this patch does is to introduce free_area_init_core_hotplug(),
the memhotplug version of free_area_init_core():

Currently, we call free_area_init_node() from the memhotplug path.
There, we set some of pgdat's fields and call calculate_node_totalpages(),
which calculates the number of pages the node has.

Since the node is either new or we are re-using it, the zones belonging to
this node should not have any pages, so there is no point in calculating this now.

In fact, we reset these values to 0 later on with calls to:

reset_node_managed_pages()
reset_node_present_pages()

The number of pages per node and per zone will be calculated when
onlining the pages:

online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range()
online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range()
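
A simplified model of how a zone's span grows as ranges are onlined, in
the spirit of resize_zone_range(). This is an assumption-laden sketch
rather than the kernel code: demo_zone is hypothetical and keeps only the
two span fields, and an empty zone, like the ones left behind by the
hotplug path, simply adopts the first onlined range.

```c
#include <assert.h>

/* Hypothetical reduced zone: just the span fields touched when onlining. */
struct demo_zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

static unsigned long demo_zone_end_pfn(const struct demo_zone *zone)
{
	return zone->zone_start_pfn + zone->spanned_pages;
}

/* Grow the zone span so that it covers [start_pfn, start_pfn + nr_pages). */
static void demo_resize_zone_range(struct demo_zone *zone,
				   unsigned long start_pfn,
				   unsigned long nr_pages)
{
	unsigned long old_end_pfn = demo_zone_end_pfn(zone);
	unsigned long new_end_pfn = start_pfn + nr_pages;

	assert(nr_pages > 0);

	/* An empty zone just adopts the onlined range. */
	if (zone->spanned_pages == 0) {
		zone->zone_start_pfn = start_pfn;
		zone->spanned_pages = nr_pages;
		return;
	}

	/* Otherwise extend in either direction; holes stay inside the span. */
	if (start_pfn < zone->zone_start_pfn)
		zone->zone_start_pfn = start_pfn;
	zone->spanned_pages = (new_end_pfn > old_end_pfn ?
			       new_end_pfn : old_end_pfn) - zone->zone_start_pfn;
}
```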

Also, since free_area_init_core/free_area_init_node will now only get called
during early init, let us replace __paginginit with __init, so their code gets
freed after boot.

Signed-off-by: Oscar Salvador <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
---
include/linux/mm.h | 6 ++++-
mm/memory_hotplug.c | 16 ++++--------
mm/page_alloc.c | 71 +++++++++++++++++++++++++++++++++++++----------------
3 files changed, 60 insertions(+), 33 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6954ad183159..af3222785347 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1998,10 +1998,14 @@ static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)

extern void __init pagecache_init(void);
extern void free_area_init(unsigned long * zones_size);
-extern void free_area_init_node(int nid, unsigned long * zones_size,
+extern void __init free_area_init_node(int nid, unsigned long * zones_size,
unsigned long zone_start_pfn, unsigned long *zholes_size);
extern void free_initmem(void);

+#ifdef CONFIG_MEMORY_HOTPLUG
+extern void __ref free_area_init_core_hotplug(int nid);
+#endif
+
/*
* Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
* into the buddy system. The freed pages will be poisoned with pattern
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4eb6e824a80c..9eea6e809a4e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -982,8 +982,6 @@ static void reset_node_present_pages(pg_data_t *pgdat)
static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
{
struct pglist_data *pgdat;
- unsigned long zones_size[MAX_NR_ZONES] = {0};
- unsigned long zholes_size[MAX_NR_ZONES] = {0};
unsigned long start_pfn = PFN_DOWN(start);

pgdat = NODE_DATA(nid);
@@ -1006,8 +1004,11 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)

/* we can use NODE_DATA(nid) from here */

+ pgdat->node_id = nid;
+ pgdat->node_start_pfn = start_pfn;
+
/* init node's zones as empty zones, we don't have any present pages.*/
- free_area_init_node(nid, zones_size, start_pfn, zholes_size);
+ free_area_init_core_hotplug(nid);
pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat);

/*
@@ -1017,18 +1018,11 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
build_all_zonelists(pgdat);

/*
- * zone->managed_pages is set to an approximate value in
- * free_area_init_core(), which will cause
- * /sys/device/system/node/nodeX/meminfo has wrong data.
- * So reset it to 0 before any memory is onlined.
- */
- reset_node_managed_pages(pgdat);
-
- /*
* When memory is hot-added, all the memory is in offline state. So
* clear all zones' present_pages because they will be updated in
* online_pages() and offline_pages().
*/
+ reset_node_managed_pages(pgdat);
reset_node_present_pages(pgdat);

return pgdat;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e84a17a5030..b2ccade42020 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6237,21 +6237,9 @@ static void pgdat_init_kcompactd(struct pglist_data *pgdat)
static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
#endif

-/*
- * Set up the zone data structures:
- * - mark all pages reserved
- * - mark all memory queues empty
- * - clear the memory bitmaps
- *
- * NOTE: pgdat should get zeroed by caller.
- */
-static void __paginginit free_area_init_core(struct pglist_data *pgdat)
+static void __paginginit pgdat_init_internals(struct pglist_data *pgdat)
{
- enum zone_type j;
- int nid = pgdat->node_id;
-
pgdat_resize_init(pgdat);
-
pgdat_init_numabalancing(pgdat);
pgdat_init_split_queue(pgdat);
pgdat_init_kcompactd(pgdat);
@@ -6262,7 +6250,54 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
pgdat_page_ext_init(pgdat);
spin_lock_init(&pgdat->lru_lock);
lruvec_init(node_lruvec(pgdat));
+}
+
+static void __paginginit zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
+ unsigned long remaining_pages)
+{
+ zone->managed_pages = remaining_pages;
+ zone_set_nid(zone, nid);
+ zone->name = zone_names[idx];
+ zone->zone_pgdat = NODE_DATA(nid);
+ spin_lock_init(&zone->lock);
+ zone_seqlock_init(zone);
+ zone_pcp_init(zone);
+}
+
+/*
+ * Set up the zone data structures
+ * - init pgdat internals
+ * - init all zones belonging to this node
+ *
+ * NOTE: this function is only called during memory hotplug
+ */
+#ifdef CONFIG_MEMORY_HOTPLUG
+void __ref free_area_init_core_hotplug(int nid)
+{
+ enum zone_type z;
+ pg_data_t *pgdat = NODE_DATA(nid);
+
+ pgdat_init_internals(pgdat);
+ for (z = 0; z < MAX_NR_ZONES; z++)
+ zone_init_internals(&pgdat->node_zones[z], z, nid, 0);
+}
+#endif
+
+/*
+ * Set up the zone data structures:
+ * - mark all pages reserved
+ * - mark all memory queues empty
+ * - clear the memory bitmaps
+ *
+ * NOTE: pgdat should get zeroed by caller.
+ * NOTE: this function is only called during early init.
+ */
+static void __init free_area_init_core(struct pglist_data *pgdat)
+{
+ enum zone_type j;
+ int nid = pgdat->node_id;

+ pgdat_init_internals(pgdat);
pgdat->per_cpu_nodestats = &boot_nodestats;

for (j = 0; j < MAX_NR_ZONES; j++) {
@@ -6310,13 +6345,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
* when the bootmem allocator frees pages into the buddy system.
* And all highmem pages will be managed by the buddy system.
*/
- zone->managed_pages = freesize;
- zone_set_nid(zone, nid);
- zone->name = zone_names[j];
- zone->zone_pgdat = pgdat;
- spin_lock_init(&zone->lock);
- zone_seqlock_init(zone);
- zone_pcp_init(zone);
+ zone_init_internals(zone, j, nid, freesize);

if (!size)
continue;
@@ -6391,7 +6420,7 @@ static inline void pgdat_set_deferred_range(pg_data_t *pgdat)
static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {}
#endif

-void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
+void __init free_area_init_node(int nid, unsigned long *zones_size,
unsigned long node_start_pfn, unsigned long *zholes_size)
{
pg_data_t *pgdat = NODE_DATA(nid);
--
2.13.6


2018-07-30 10:19:48

by Oscar Salvador

Subject: [PATCH v5 3/4] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT

From: Oscar Salvador <[email protected]>

Move the code guarded by CONFIG_DEFERRED_STRUCT_PAGE_INIT
into an inline function.
Not having an #ifdef inside the function makes the code more readable.
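
As a worked example, the helper's arithmetic can be checked in isolation.
The sketch assumes x86_64-like values (SECTION_SIZE_BITS = 27,
PAGE_SHIFT = 12, so 32768 pages per section); other architectures differ,
and demo_pgdat is a hypothetical reduced pg_data_t. A 16 GiB node therefore
starts with only 32768 statically initialised pages, while a node smaller
than one section is initialised in full.

```c
#include <assert.h>
#include <limits.h>

/* Assumed x86_64-like values: 128 MiB sections and 4 KiB pages. */
#define DEMO_SECTION_SIZE_BITS	27
#define DEMO_PAGE_SHIFT		12
#define DEMO_PAGES_PER_SECTION	(1UL << (DEMO_SECTION_SIZE_BITS - DEMO_PAGE_SHIFT))

/* Hypothetical reduced pg_data_t with only the fields the helper touches. */
struct demo_pgdat {
	unsigned long node_spanned_pages;
	unsigned long static_init_pgcnt;
	unsigned long first_deferred_pfn;
};

/* Same logic as the CONFIG_DEFERRED_STRUCT_PAGE_INIT branch in the patch. */
static inline void demo_pgdat_set_deferred_range(struct demo_pgdat *pgdat)
{
	unsigned long spanned = pgdat->node_spanned_pages;

	/* Initialise at most one section up front; the rest is deferred. */
	pgdat->static_init_pgcnt = spanned < DEMO_PAGES_PER_SECTION ?
				   spanned : DEMO_PAGES_PER_SECTION;
	/* ULONG_MAX: the first deferred pfn is not known yet. */
	pgdat->first_deferred_pfn = ULONG_MAX;
	assert(pgdat->static_init_pgcnt <= spanned);
}
```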

Signed-off-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
---
mm/page_alloc.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 10b754fba5fa..4e84a17a5030 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6376,6 +6376,21 @@ static void __ref alloc_node_mem_map(struct pglist_data *pgdat)
static void __ref alloc_node_mem_map(struct pglist_data *pgdat) { }
#endif /* CONFIG_FLAT_NODE_MEM_MAP */

+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+static inline void pgdat_set_deferred_range(pg_data_t *pgdat)
+{
+ /*
+ * We start only with one section of pages, more pages are added as
+ * needed until the rest of deferred pages are initialized.
+ */
+ pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
+ pgdat->node_spanned_pages);
+ pgdat->first_deferred_pfn = ULONG_MAX;
+}
+#else
+static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {}
+#endif
+
void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
unsigned long node_start_pfn, unsigned long *zholes_size)
{
@@ -6401,16 +6416,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
zones_size, zholes_size);

alloc_node_mem_map(pgdat);
+ pgdat_set_deferred_range(pgdat);

-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
- /*
- * We start only with one section of pages, more pages are added as
- * needed until the rest of deferred pages are initialized.
- */
- pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
- pgdat->node_spanned_pages);
- pgdat->first_deferred_pfn = ULONG_MAX;
-#endif
free_area_init_core(pgdat);
}

--
2.13.6


2018-07-30 10:20:10

by Oscar Salvador

Subject: [PATCH v5 1/4] mm/page_alloc: Move ifdefery out of free_area_init_core

From: Oscar Salvador <[email protected]>

Moving the #ifdefs out of the function makes it easier to follow.
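
The same #ifdef-to-stub pattern, reduced to a self-contained sketch.
DEMO_CONFIG_COMPACTION and the demo_* names are hypothetical stand-ins for
a Kconfig option and the pgdat fields; with the option off, the empty stub
is typically optimised away, so callers pay nothing for the cleaner code.

```c
#include <assert.h>

/* Hypothetical stand-in for a Kconfig option; set to 0 to build the stub. */
#define DEMO_CONFIG_COMPACTION 1

struct demo_pgdat {
#if DEMO_CONFIG_COMPACTION
	int kcompactd_wait_ready;	/* stands in for kcompactd_wait */
#endif
	int kswapd_wait_ready;		/* stands in for kswapd_wait */
};

#if DEMO_CONFIG_COMPACTION
static void demo_pgdat_init_kcompactd(struct demo_pgdat *pgdat)
{
	pgdat->kcompactd_wait_ready = 1;
}
#else
/* Empty stub: keeps the caller free of #ifdefs. */
static void demo_pgdat_init_kcompactd(struct demo_pgdat *pgdat)
{
	(void)pgdat;
}
#endif

/* The caller reads straight through, with no #ifdef blocks in its body. */
static void demo_init_core(struct demo_pgdat *pgdat)
{
	demo_pgdat_init_kcompactd(pgdat);
	pgdat->kswapd_wait_ready = 1;
}
```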

Signed-off-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
---
mm/page_alloc.c | 50 +++++++++++++++++++++++++++++++++++++-------------
1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e357189cd24a..8a73305f7c55 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6206,6 +6206,37 @@ static unsigned long __paginginit calc_memmap_size(unsigned long spanned_pages,
return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
}

+#ifdef CONFIG_NUMA_BALANCING
+static void pgdat_init_numabalancing(struct pglist_data *pgdat)
+{
+ spin_lock_init(&pgdat->numabalancing_migrate_lock);
+ pgdat->numabalancing_migrate_nr_pages = 0;
+ pgdat->numabalancing_migrate_next_window = jiffies;
+}
+#else
+static void pgdat_init_numabalancing(struct pglist_data *pgdat) {}
+#endif
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pgdat_init_split_queue(struct pglist_data *pgdat)
+{
+ spin_lock_init(&pgdat->split_queue_lock);
+ INIT_LIST_HEAD(&pgdat->split_queue);
+ pgdat->split_queue_len = 0;
+}
+#else
+static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
+#endif
+
+#ifdef CONFIG_COMPACTION
+static void pgdat_init_kcompactd(struct pglist_data *pgdat)
+{
+ init_waitqueue_head(&pgdat->kcompactd_wait);
+}
+#else
+static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
+#endif
+
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -6220,21 +6251,14 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
int nid = pgdat->node_id;

pgdat_resize_init(pgdat);
-#ifdef CONFIG_NUMA_BALANCING
- spin_lock_init(&pgdat->numabalancing_migrate_lock);
- pgdat->numabalancing_migrate_nr_pages = 0;
- pgdat->numabalancing_migrate_next_window = jiffies;
-#endif
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- spin_lock_init(&pgdat->split_queue_lock);
- INIT_LIST_HEAD(&pgdat->split_queue);
- pgdat->split_queue_len = 0;
-#endif
+
+ pgdat_init_numabalancing(pgdat);
+ pgdat_init_split_queue(pgdat);
+ pgdat_init_kcompactd(pgdat);
+
init_waitqueue_head(&pgdat->kswapd_wait);
init_waitqueue_head(&pgdat->pfmemalloc_wait);
-#ifdef CONFIG_COMPACTION
- init_waitqueue_head(&pgdat->kcompactd_wait);
-#endif
+
pgdat_page_ext_init(pgdat);
spin_lock_init(&pgdat->lru_lock);
lruvec_init(node_lruvec(pgdat));
--
2.13.6


2018-07-31 10:18:56

by Oscar Salvador

Subject: Re: [PATCH v5 4/4] mm/page_alloc: Introduce free_area_init_core_hotplug

On Mon, Jul 30, 2018 at 12:17:57PM +0200, [email protected] wrote:
> From: Oscar Salvador <[email protected]>
...
> Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace
> __paginginit with __init, so their code gets freed up.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> Reviewed-by: Pavel Tatashin <[email protected]>

Andrew, could you please fold the following cleanup into this patch?
thanks

Pavel, since this has your Reviewed-by, are you ok with the following on top?

set_pageblock_order() is only called from free_area_init_core() and sparse_init().
sparse_init() is only called during early init, and with this patchset the
same now applies to free_area_init_core().

The same goes for calc_memmap_size().

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb11cc23b862..c1cf088607c5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6142,7 +6142,7 @@ static inline void setup_usemap(struct pglist_data *pgdat, struct zone *zone,
#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE

/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
-void __paginginit set_pageblock_order(void)
+void __init set_pageblock_order(void)
{
unsigned int order;

@@ -6170,13 +6170,13 @@ void __paginginit set_pageblock_order(void)
* include/linux/pageblock-flags.h for the values of pageblock_order based on
* the kernel config
*/
-void __paginginit set_pageblock_order(void)
+void __init set_pageblock_order(void)
{
}

#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */

-static unsigned long __paginginit calc_memmap_size(unsigned long spanned_pages,
+static unsigned long __init calc_memmap_size(unsigned long spanned_pages,
unsigned long present_pages)
{
unsigned long pages = spanned_pages;
@@ -6448,7 +6448,7 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
* may be accessed (for example page_to_pfn() on some configuration accesses
* flags). We must explicitly zero those struct pages.
*/
-void __paginginit zero_resv_unavail(void)
+void __init zero_resv_unavail(void)
{
phys_addr_t start, end;
unsigned long pfn;

Thanks
--
Oscar Salvador
SUSE L3

2018-07-31 16:15:17

by Pavel Tatashin

Subject: Re: [PATCH v5 4/4] mm/page_alloc: Introduce free_area_init_core_hotplug

On Tue, Jul 31, 2018 at 6:17 AM Oscar Salvador
<[email protected]> wrote:
>
> On Mon, Jul 30, 2018 at 12:17:57PM +0200, [email protected] wrote:
> > From: Oscar Salvador <[email protected]>
> ...
> > Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace
> > __paginginit with __init, so their code gets freed up.
> >
> > Signed-off-by: Oscar Salvador <[email protected]>
> > Reviewed-by: Pavel Tatashin <[email protected]>
>
> Andrew, could you please fold the following cleanup into this patch?
> thanks
>
> Pavel, since this has your Reviewed-by, are you ok with the following on top?

Yes, looks good to me.

Thank you,
Pavel

2018-08-01 11:48:35

by Michal Hocko

Subject: Re: [PATCH v5 4/4] mm/page_alloc: Introduce free_area_init_core_hotplug

On Mon 30-07-18 12:17:57, [email protected] wrote:
> From: Oscar Salvador <[email protected]>
> ...
> Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace
> __paginginit with __init, so their code gets freed up.

The split up makes sense to me. Section attributes can be handled on
top. Btw. the free_area_init_core_hotplug declaration could have gone into
include/linux/memory_hotplug.h to save the ifdef.

> Signed-off-by: Oscar Salvador <[email protected]>
> Reviewed-by: Pavel Tatashin <[email protected]>

Acked-by: Michal Hocko <[email protected]>

--
Michal Hocko
SUSE Labs

2018-08-01 11:55:01

by Oscar Salvador

Subject: Re: [PATCH v5 4/4] mm/page_alloc: Introduce free_area_init_core_hotplug

On Wed, Aug 01, 2018 at 01:47:26PM +0200, Michal Hocko wrote:
>
> The split up makes sense to me. Sections attributes can be handled on
> top. Btw. free_area_init_core_hotplug declaration could have gone into
> include/linux/memory_hotplug.h to save the ifdef

You are right, I will fix this up.

>
> > Signed-off-by: Oscar Salvador <[email protected]>
> > Reviewed-by: Pavel Tatashin <[email protected]>
>
> Acked-by: Michal Hocko <[email protected]>

Thanks Michal!

Since Pavel and I agreed to include a patch of his that removes __paginginit
in this patchset, I will send out a v6 in a few minutes.

--
Oscar Salvador
SUSE L3

2018-08-01 12:30:45

by Oscar Salvador

Subject: Re: [PATCH v5 4/4] mm/page_alloc: Introduce free_area_init_core_hotplug

On Tue, Jul 31, 2018 at 12:17:52PM +0200, Oscar Salvador wrote:
> On Mon, Jul 30, 2018 at 12:17:57PM +0200, [email protected] wrote:
> > From: Oscar Salvador <[email protected]>
> ...
> > Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace
> > __paginginit with __init, so their code gets freed up.
> >
> > Signed-off-by: Oscar Salvador <[email protected]>
> > Reviewed-by: Pavel Tatashin <[email protected]>
>
> Andrew, could you please fold the following cleanup into this patch?
> thanks

Hi Andrew,

I sent v6, which already includes that cleanup/fixup plus another patch from Pavel and
an Acked-by from Michal Hocko.
So if it looks fine to you, feel free to replace the version currently sitting in -mm (v5) with it.

Thanks
--
Oscar Salvador
SUSE L3