2018-07-26 19:36:50

by Pavel Tatashin

[permalink] [raw]
Subject: [PATCH v2 0/3] memmap_init_zone improvements

Changelog:

v1 - v2
- Merged with linux-next
- Removed inline from functions that have static variables.
- Added a comment to defer_init() that it is called early in
boot and therefore no need to protect static.

Three small patches that improve memmap_init_zone() and also fix a small
deferred pages bug.

The improvements include reducing number of ifdefs and making code more
modular.

The bug is the deferred_init_update() should be called after the mirrored
memory skipping is taken into account.

Pavel Tatashin (3):
mm: make memmap_init a proper function
mm: calculate deferred pages after skipping mirrored memory
mm: move mirrored memory specific code outside of memmap_init_zone

arch/ia64/include/asm/pgtable.h | 1 -
mm/page_alloc.c | 124 ++++++++++++++++----------------
2 files changed, 62 insertions(+), 63 deletions(-)

--
2.18.0



2018-07-26 19:36:55

by Pavel Tatashin

[permalink] [raw]
Subject: [PATCH v2 2/3] mm: calculate deferred pages after skipping mirrored memory

update_defer_init() should be called only when struct page is about to be
initialized. Because it counts number of initialized struct pages, but
there we may skip struct pages if there is some mirrored memory.

So move, update_defer_init() after checking for mirrored memory.

Also, rename update_defer_init() to defer_init() and reverse the return
boolean to emphasize that this is a boolean function, that tells that the
reset of memmap initialization should be deferred.

Make this function self-contained: do not pass number of already
initialized pages in this zone by using static counters.

Signed-off-by: Pavel Tatashin <[email protected]>
---
mm/page_alloc.c | 45 +++++++++++++++++++++++++--------------------
1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6796dacd46ac..4946c73e549b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -306,24 +306,33 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
}

/*
- * Returns false when the remaining initialisation should be deferred until
+ * Returns true when the remaining initialisation should be deferred until
* later in the boot cycle when it can be parallelised.
*/
-static inline bool update_defer_init(pg_data_t *pgdat,
- unsigned long pfn, unsigned long zone_end,
- unsigned long *nr_initialised)
+static bool __meminit
+defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
{
+ static unsigned long prev_end_pfn, nr_initialised;
+
+ /*
+ * prev_end_pfn static that contains the end of previous zone
+ * No need to protect because called very early in boot before smp_init.
+ */
+ if (prev_end_pfn != end_pfn) {
+ prev_end_pfn = end_pfn;
+ nr_initialised = 0;
+ }
+
/* Always populate low zones for address-constrained allocations */
- if (zone_end < pgdat_end_pfn(pgdat))
- return true;
- (*nr_initialised)++;
- if ((*nr_initialised > pgdat->static_init_pgcnt) &&
- (pfn & (PAGES_PER_SECTION - 1)) == 0) {
- pgdat->first_deferred_pfn = pfn;
+ if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
return false;
+ nr_initialised++;
+ if ((nr_initialised > NODE_DATA(nid)->static_init_pgcnt) &&
+ (pfn & (PAGES_PER_SECTION - 1)) == 0) {
+ NODE_DATA(nid)->first_deferred_pfn = pfn;
+ return true;
}
-
- return true;
+ return false;
}
#else
static inline bool early_page_uninitialised(unsigned long pfn)
@@ -331,11 +340,9 @@ static inline bool early_page_uninitialised(unsigned long pfn)
return false;
}

-static inline bool update_defer_init(pg_data_t *pgdat,
- unsigned long pfn, unsigned long zone_end,
- unsigned long *nr_initialised)
+static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
{
- return true;
+ return false;
}
#endif

@@ -5459,9 +5466,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
struct vmem_altmap *altmap)
{
unsigned long end_pfn = start_pfn + size;
- pg_data_t *pgdat = NODE_DATA(nid);
unsigned long pfn;
- unsigned long nr_initialised = 0;
struct page *page;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
struct memblock_region *r = NULL, *tmp;
@@ -5492,8 +5497,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,

if (!early_pfn_in_nid(pfn, nid))
continue;
- if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
- break;

#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
/*
@@ -5516,6 +5519,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
}
}
#endif
+ if (defer_init(nid, pfn, end_pfn))
+ break;

not_early:
page = pfn_to_page(pfn);
--
2.18.0


2018-07-26 19:37:09

by Pavel Tatashin

[permalink] [raw]
Subject: [PATCH v2 3/3] mm: move mirrored memory specific code outside of memmap_init_zone

memmap_init_zone, is getting complex, because it is called from different
contexts: hotplug, and during boot, and also because it must handle some
architecture quirks. One of them is mirroed memory.

Move the code that decides whether to skip mirrored memory outside of
memmap_init_zone, into a separate function.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
---
mm/page_alloc.c | 74 +++++++++++++++++++++++--------------------------
1 file changed, 34 insertions(+), 40 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4946c73e549b..02e4b84038f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5456,6 +5456,30 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
#endif
}

+/* If zone is ZONE_MOVABLE but memory is mirrored, it is an overlapped init */
+static bool __meminit
+overlap_memmap_init(unsigned long zone, unsigned long *pfn)
+{
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+ static struct memblock_region *r;
+
+ if (mirrored_kernelcore && zone == ZONE_MOVABLE) {
+ if (!r || *pfn >= memblock_region_memory_end_pfn(r)) {
+ for_each_memblock(memory, r) {
+ if (*pfn < memblock_region_memory_end_pfn(r))
+ break;
+ }
+ }
+ if (*pfn >= memblock_region_memory_base_pfn(r) &&
+ memblock_is_mirror(r)) {
+ *pfn = memblock_region_memory_end_pfn(r);
+ return true;
+ }
+ }
+#endif
+ return false;
+}
+
/*
* Initially all pages are reserved - free ones are freed
* up by free_all_bootmem() once the early boot process is
@@ -5465,12 +5489,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
unsigned long start_pfn, enum memmap_context context,
struct vmem_altmap *altmap)
{
- unsigned long end_pfn = start_pfn + size;
- unsigned long pfn;
+ unsigned long pfn, end_pfn = start_pfn + size;
struct page *page;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
- struct memblock_region *r = NULL, *tmp;
-#endif

if (highest_memmap_pfn < end_pfn - 1)
highest_memmap_pfn = end_pfn - 1;
@@ -5487,42 +5507,19 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
* There can be holes in boot-time mem_map[]s handed to this
* function. They do not exist on hotplugged memory.
*/
- if (context != MEMMAP_EARLY)
- goto not_early;
-
- if (!early_pfn_valid(pfn)) {
- pfn = next_valid_pfn(pfn) - 1;
- continue;
- }
-
- if (!early_pfn_in_nid(pfn, nid))
- continue;
-
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
- /*
- * Check given memblock attribute by firmware which can affect
- * kernel memory layout. If zone==ZONE_MOVABLE but memory is
- * mirrored, it's an overlapped memmap init. skip it.
- */
- if (mirrored_kernelcore && zone == ZONE_MOVABLE) {
- if (!r || pfn >= memblock_region_memory_end_pfn(r)) {
- for_each_memblock(memory, tmp)
- if (pfn < memblock_region_memory_end_pfn(tmp))
- break;
- r = tmp;
- }
- if (pfn >= memblock_region_memory_base_pfn(r) &&
- memblock_is_mirror(r)) {
- /* already initialized as NORMAL */
- pfn = memblock_region_memory_end_pfn(r);
+ if (context == MEMMAP_EARLY) {
+ if (!early_pfn_valid(pfn)) {
+ pfn = next_valid_pfn(pfn) - 1;
continue;
}
+ if (!early_pfn_in_nid(pfn, nid))
+ continue;
+ if (overlap_memmap_init(zone, &pfn))
+ continue;
+ if (defer_init(nid, pfn, end_pfn))
+ break;
}
-#endif
- if (defer_init(nid, pfn, end_pfn))
- break;

-not_early:
page = pfn_to_page(pfn);
__init_single_page(page, pfn, zone, nid);
if (context == MEMMAP_HOTPLUG)
@@ -5539,9 +5536,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
* can be created for invalid pages (for alignment)
* check here not to call set_pageblock_migratetype() against
* pfn out of zone.
- *
- * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
- * because this is done early in sparse_add_one_section
*/
if (!(pfn & (pageblock_nr_pages - 1))) {
set_pageblock_migratetype(page, MIGRATE_MOVABLE);
--
2.18.0


2018-07-26 19:37:47

by Pavel Tatashin

[permalink] [raw]
Subject: [PATCH v2 1/3] mm: make memmap_init a proper function

memmap_init is sometimes a macro sometimes a function based on
__HAVE_ARCH_MEMMAP_INIT. It is only a function on ia64. Make
memmap_init a weak function instead, and let ia64 redefine it.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
---
arch/ia64/include/asm/pgtable.h | 1 -
mm/page_alloc.c | 9 +++++----
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 165827774bea..b1e7468eb65a 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -544,7 +544,6 @@ extern struct page *zero_page_memmap_ptr;

# ifdef CONFIG_VIRTUAL_MEM_MAP
/* arch mem_map init routine is needed due to holes in a virtual mem_map */
-# define __HAVE_ARCH_MEMMAP_INIT
extern void memmap_init (unsigned long size, int nid, unsigned long zone,
unsigned long start_pfn);
# endif /* CONFIG_VIRTUAL_MEM_MAP */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2dec8056a091..6796dacd46ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5554,10 +5554,11 @@ static void __meminit zone_init_free_lists(struct zone *zone)
}
}

-#ifndef __HAVE_ARCH_MEMMAP_INIT
-#define memmap_init(size, nid, zone, start_pfn) \
- memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY, NULL)
-#endif
+void __meminit __weak memmap_init(unsigned long size, int nid,
+ unsigned long zone, unsigned long start_pfn)
+{
+ memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
+}

static int zone_batchsize(struct zone *zone)
{
--
2.18.0


2018-07-27 11:59:09

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] mm: calculate deferred pages after skipping mirrored memory

On Thu, Jul 26, 2018 at 03:35:08PM -0400, Pavel Tatashin wrote:
> update_defer_init() should be called only when struct page is about to be
> initialized. Because it counts number of initialized struct pages, but
> there we may skip struct pages if there is some mirrored memory.
>
> So move, update_defer_init() after checking for mirrored memory.
>
> Also, rename update_defer_init() to defer_init() and reverse the return
> boolean to emphasize that this is a boolean function, that tells that the
> reset of memmap initialization should be deferred.
>
> Make this function self-contained: do not pass number of already
> initialized pages in this zone by using static counters.
>
> Signed-off-by: Pavel Tatashin <[email protected]>
> ---
> mm/page_alloc.c | 45 +++++++++++++++++++++++++--------------------
> 1 file changed, 25 insertions(+), 20 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6796dacd46ac..4946c73e549b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -306,24 +306,33 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
> }
>
> /*
> - * Returns false when the remaining initialisation should be deferred until
> + * Returns true when the remaining initialisation should be deferred until
> * later in the boot cycle when it can be parallelised.
> */
> -static inline bool update_defer_init(pg_data_t *pgdat,
> - unsigned long pfn, unsigned long zone_end,
> - unsigned long *nr_initialised)
> +static bool __meminit
> +defer_init(int nid, unsigned long pfn, unsigned long end_pfn)

Hi Pavel,

maybe I do not understand properly the __init/__meminit macros, but should not
"defer_init" be __init instead of __meminit?
I think that functions marked as __meminit are not freed up, right?

Besides that, this looks good to me:

Reviewed-by: Oscar Salvador <[email protected]>
--
Oscar Salvador
SUSE L3

2018-07-27 14:55:13

by Pavel Tatashin

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] mm: calculate deferred pages after skipping mirrored memory

unsigned long *nr_initialised)
> > +static bool __meminit
> > +defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
>
> Hi Pavel,
>
> maybe I do not understand properly the __init/__meminit macros, but should not
> "defer_init" be __init instead of __meminit?
> I think that functions marked as __meminit are not freed up, right?

Not exactly. As I understand: __meminit is the same as __init when
CONFIG_MEMORY_HOTPLUG=n. But, when memory hotplug is configured,
__meminit is not freed, because code that adds memory is shared
between boot and hotplug. In this case defer_init() is called only
during boot, and could be __init, but it is called from
memmap_init_zone() which is __meminit and thus section mismatch would
happen.

We could split memmap_init_zone() into two functions: boot and hotplug
variants, or we could use __ref, but I do not think any of that is
really needed. Keeping defer_init() in __meminit is OK, it does not
take that much memory.

>
> Reviewed-by: Oscar Salvador <[email protected]>

Thank you,
Pavel

2018-07-27 15:08:37

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] mm: calculate deferred pages after skipping mirrored memory

On Fri, Jul 27, 2018 at 10:53:24AM -0400, Pavel Tatashin wrote:
> unsigned long *nr_initialised)
> > > +static bool __meminit
> > > +defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> >
> > Hi Pavel,
> >
> > maybe I do not understand properly the __init/__meminit macros, but should not
> > "defer_init" be __init instead of __meminit?
> > I think that functions marked as __meminit are not freed up, right?
>
> Not exactly. As I understand: __meminit is the same as __init when
> CONFIG_MEMORY_HOTPLUG=n. But, when memory hotplug is configured,
> __meminit is not freed, because code that adds memory is shared
> between boot and hotplug. In this case defer_init() is called only
> during boot, and could be __init, but it is called from
> memmap_init_zone() which is __meminit and thus section mismatch would
> happen.

Oh yes, I did not think about memmap_init_zone(), you are right.
Then, nothing to argue about ;-).

Thanks
--
Oscar Salvador
SUSE L3