2018-07-24 23:57:06

by Pavel Tatashin

Subject: [PATCH 0/3] memmap_init_zone improvements

Three small patches that improve memmap_init_zone() and also fix a small
deferred pages bug.

The improvements include reducing the number of ifdefs and making the code
more modular.

The bug is that update_defer_init() should be called only after mirrored
memory skipping is taken into account.

Pavel Tatashin (3):
mm: make memmap_init a proper function
mm: calculate deferred pages after skipping mirrored memory
mm: move mirrored memory specific code outside of memmap_init_zone

arch/ia64/include/asm/pgtable.h | 1 -
mm/page_alloc.c | 115 +++++++++++++++-----------------
2 files changed, 55 insertions(+), 61 deletions(-)

--
2.18.0



2018-07-24 23:57:13

by Pavel Tatashin

Subject: [PATCH 3/3] mm: move mirrored memory specific code outside of memmap_init_zone

memmap_init_zone() is getting complex, because it is called from different
contexts: hotplug and during boot, and also because it must handle some
architecture quirks. One of them is mirrored memory.

Move the code that decides whether to skip mirrored memory outside of
memmap_init_zone, into a separate function.

Signed-off-by: Pavel Tatashin <[email protected]>
---
mm/page_alloc.c | 70 ++++++++++++++++++++++---------------------------
1 file changed, 32 insertions(+), 38 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 86c678cec6bd..d7dce4ccefd5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5454,6 +5454,29 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
#endif
}

+/* If zone is ZONE_MOVABLE but memory is mirrored, it is an overlapped init */
+static inline bool overlap_memmap_init(unsigned long zone, unsigned long *pfn)
+{
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+ static struct memblock_region *r;
+
+ if (mirrored_kernelcore && zone == ZONE_MOVABLE) {
+ if (!r || *pfn >= memblock_region_memory_end_pfn(r)) {
+ for_each_memblock(memory, r) {
+ if (*pfn < memblock_region_memory_end_pfn(r))
+ break;
+ }
+ }
+ if (*pfn >= memblock_region_memory_base_pfn(r) &&
+ memblock_is_mirror(r)) {
+ *pfn = memblock_region_memory_end_pfn(r);
+ return true;
+ }
+ }
+#endif
+ return false;
+}
+
/*
* Initially all pages are reserved - free ones are freed
* up by free_all_bootmem() once the early boot process is
@@ -5463,12 +5486,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
unsigned long start_pfn, enum memmap_context context,
struct vmem_altmap *altmap)
{
- unsigned long end_pfn = start_pfn + size;
- unsigned long pfn;
+ unsigned long pfn, end_pfn = start_pfn + size;
struct page *page;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
- struct memblock_region *r = NULL, *tmp;
-#endif

if (highest_memmap_pfn < end_pfn - 1)
highest_memmap_pfn = end_pfn - 1;
@@ -5485,39 +5504,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
* There can be holes in boot-time mem_map[]s handed to this
* function. They do not exist on hotplugged memory.
*/
- if (context != MEMMAP_EARLY)
- goto not_early;
-
- if (!early_pfn_valid(pfn))
- continue;
- if (!early_pfn_in_nid(pfn, nid))
- continue;
-
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
- /*
- * Check given memblock attribute by firmware which can affect
- * kernel memory layout. If zone==ZONE_MOVABLE but memory is
- * mirrored, it's an overlapped memmap init. skip it.
- */
- if (mirrored_kernelcore && zone == ZONE_MOVABLE) {
- if (!r || pfn >= memblock_region_memory_end_pfn(r)) {
- for_each_memblock(memory, tmp)
- if (pfn < memblock_region_memory_end_pfn(tmp))
- break;
- r = tmp;
- }
- if (pfn >= memblock_region_memory_base_pfn(r) &&
- memblock_is_mirror(r)) {
- /* already initialized as NORMAL */
- pfn = memblock_region_memory_end_pfn(r);
+ if (context == MEMMAP_EARLY) {
+ if (!early_pfn_valid(pfn))
continue;
- }
+ if (!early_pfn_in_nid(pfn, nid))
+ continue;
+ if (overlap_memmap_init(zone, &pfn))
+ continue;
+ if (defer_init(nid, pfn, end_pfn))
+ break;
}
-#endif
- if (defer_init(nid, pfn, end_pfn))
- break;

-not_early:
page = pfn_to_page(pfn);
__init_single_page(page, pfn, zone, nid);
if (context == MEMMAP_HOTPLUG)
@@ -5534,9 +5531,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
* can be created for invalid pages (for alignment)
* check here not to call set_pageblock_migratetype() against
* pfn out of zone.
- *
- * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
- * because this is done early in sparse_add_one_section
*/
if (!(pfn & (pageblock_nr_pages - 1))) {
set_pageblock_migratetype(page, MIGRATE_MOVABLE);
--
2.18.0


2018-07-25 00:27:27

by Pavel Tatashin

Subject: [PATCH 2/3] mm: calculate deferred pages after skipping mirrored memory

update_defer_init() should be called only when a struct page is about to be
initialized, because it counts the number of initialized struct pages, but
we may skip struct pages if there is some mirrored memory.

So, move update_defer_init() after the check for mirrored memory.

Also, rename update_defer_init() to defer_init() and invert the returned
boolean to emphasize that this is a boolean function that tells whether the
rest of memmap initialization should be deferred.

Make this function self-contained: instead of passing the number of already
initialized pages in this zone, use static counters.

Signed-off-by: Pavel Tatashin <[email protected]>
---
mm/page_alloc.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cea749b26394..86c678cec6bd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
}

/*
- * Returns false when the remaining initialisation should be deferred until
+ * Returns true when the remaining initialisation should be deferred until
* later in the boot cycle when it can be parallelised.
*/
-static inline bool update_defer_init(pg_data_t *pgdat,
- unsigned long pfn, unsigned long zone_end,
- unsigned long *nr_initialised)
+static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
{
+ static unsigned long prev_end_pfn, nr_initialised;
+
+ if (prev_end_pfn != end_pfn) {
+ prev_end_pfn = end_pfn;
+ nr_initialised = 0;
+ }
+
/* Always populate low zones for address-constrained allocations */
- if (zone_end < pgdat_end_pfn(pgdat))
- return true;
- (*nr_initialised)++;
- if ((*nr_initialised > pgdat->static_init_pgcnt) &&
- (pfn & (PAGES_PER_SECTION - 1)) == 0) {
- pgdat->first_deferred_pfn = pfn;
+ if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
return false;
+ nr_initialised++;
+ if ((nr_initialised > NODE_DATA(nid)->static_init_pgcnt) &&
+ (pfn & (PAGES_PER_SECTION - 1)) == 0) {
+ NODE_DATA(nid)->first_deferred_pfn = pfn;
+ return true;
}
-
- return true;
+ return false;
}
#else
static inline bool early_page_uninitialised(unsigned long pfn)
@@ -331,11 +335,9 @@ static inline bool early_page_uninitialised(unsigned long pfn)
return false;
}

-static inline bool update_defer_init(pg_data_t *pgdat,
- unsigned long pfn, unsigned long zone_end,
- unsigned long *nr_initialised)
+static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
{
- return true;
+ return false;
}
#endif

@@ -5462,9 +5464,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
struct vmem_altmap *altmap)
{
unsigned long end_pfn = start_pfn + size;
- pg_data_t *pgdat = NODE_DATA(nid);
unsigned long pfn;
- unsigned long nr_initialised = 0;
struct page *page;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
struct memblock_region *r = NULL, *tmp;
@@ -5492,8 +5492,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
continue;
if (!early_pfn_in_nid(pfn, nid))
continue;
- if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
- break;

#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
/*
@@ -5516,6 +5514,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
}
}
#endif
+ if (defer_init(nid, pfn, end_pfn))
+ break;

not_early:
page = pfn_to_page(pfn);
--
2.18.0


2018-07-25 00:27:28

by Pavel Tatashin

Subject: [PATCH 1/3] mm: make memmap_init a proper function

memmap_init is sometimes a macro, sometimes a function, based on
__HAVE_ARCH_MEMMAP_INIT. It is only a function on ia64. Make
memmap_init a weak function instead, and let ia64 redefine it.

Signed-off-by: Pavel Tatashin <[email protected]>
---
arch/ia64/include/asm/pgtable.h | 1 -
mm/page_alloc.c | 9 +++++----
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 165827774bea..b1e7468eb65a 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -544,7 +544,6 @@ extern struct page *zero_page_memmap_ptr;

# ifdef CONFIG_VIRTUAL_MEM_MAP
/* arch mem_map init routine is needed due to holes in a virtual mem_map */
-# define __HAVE_ARCH_MEMMAP_INIT
extern void memmap_init (unsigned long size, int nid, unsigned long zone,
unsigned long start_pfn);
# endif /* CONFIG_VIRTUAL_MEM_MAP */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a790ef4be74e..cea749b26394 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5554,10 +5554,11 @@ static void __meminit zone_init_free_lists(struct zone *zone)
}
}

-#ifndef __HAVE_ARCH_MEMMAP_INIT
-#define memmap_init(size, nid, zone, start_pfn) \
- memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY, NULL)
-#endif
+void __meminit __weak memmap_init(unsigned long size, int nid,
+ unsigned long zone, unsigned long start_pfn)
+{
+ memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
+}

static int zone_batchsize(struct zone *zone)
{
--
2.18.0


2018-07-25 01:13:28

by Andrew Morton

Subject: Re: [PATCH 2/3] mm: calculate deferred pages after skipping mirrored memory

On Tue, 24 Jul 2018 19:55:19 -0400 Pavel Tatashin <[email protected]> wrote:

> update_defer_init() should be called only when a struct page is about to be
> initialized, because it counts the number of initialized struct pages, but
> we may skip struct pages if there is some mirrored memory.

What are the runtime effects of this error?

2018-07-25 01:19:32

by Andrew Morton

Subject: Re: [PATCH 3/3] mm: move mirrored memory specific code outside of memmap_init_zone

On Tue, 24 Jul 2018 19:55:20 -0400 Pavel Tatashin <[email protected]> wrote:

> memmap_init_zone() is getting complex, because it is called from different
> contexts: hotplug and during boot, and also because it must handle some
> architecture quirks. One of them is mirrored memory.
>
> Move the code that decides whether to skip mirrored memory outside of
> memmap_init_zone, into a separate function.

Conflicts a bit with the page_alloc.c hunk from
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-remain-memblock_next_valid_pfn-on-arm-arm64.patch. Please check my fixup:

void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
		unsigned long start_pfn, enum memmap_context context,
		struct vmem_altmap *altmap)
{
	unsigned long pfn, end_pfn = start_pfn + size;
	struct page *page;

	if (highest_memmap_pfn < end_pfn - 1)
		highest_memmap_pfn = end_pfn - 1;

	/*
	 * Honor reservation requested by the driver for this ZONE_DEVICE
	 * memory
	 */
	if (altmap && start_pfn == altmap->base_pfn)
		start_pfn += altmap->reserve;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		/*
		 * There can be holes in boot-time mem_map[]s handed to this
		 * function. They do not exist on hotplugged memory.
		 */
		if (context == MEMMAP_EARLY) {
			if (!early_pfn_valid(pfn)) {
				pfn = next_valid_pfn(pfn) - 1;
				continue;
			}
			if (!early_pfn_in_nid(pfn, nid))
				continue;
			if (overlap_memmap_init(zone, &pfn))
				continue;
			if (defer_init(nid, pfn, end_pfn))
				break;
		}

		page = pfn_to_page(pfn);
		__init_single_page(page, pfn, zone, nid);
		if (context == MEMMAP_HOTPLUG)
			SetPageReserved(page);

		/*
		 * Mark the block movable so that blocks are reserved for
		 * movable at startup. This will force kernel allocations
		 * to reserve their blocks rather than leaking throughout
		 * the address space during boot when many long-lived
		 * kernel allocations are made.
		 *
		 * bitmap is created for zone's valid pfn range. but memmap
		 * can be created for invalid pages (for alignment)
		 * check here not to call set_pageblock_migratetype() against
		 * pfn out of zone.
		 */
		if (!(pfn & (pageblock_nr_pages - 1))) {
			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
			cond_resched();
		}
	}
}


2018-07-25 01:21:39

by Pavel Tatashin

Subject: Re: [PATCH 2/3] mm: calculate deferred pages after skipping mirrored memory

On Tue, Jul 24, 2018 at 9:12 PM Andrew Morton <[email protected]> wrote:
>
> On Tue, 24 Jul 2018 19:55:19 -0400 Pavel Tatashin <[email protected]> wrote:
>
> > update_defer_init() should be called only when a struct page is about to be
> > initialized, because it counts the number of initialized struct pages, but
> > we may skip struct pages if there is some mirrored memory.
>
> What are the runtime effects of this error?

I found this bug by reading the code. The effect is that fewer than
expected struct pages are initialized early in boot, and it is
possible that in some corner cases we may fail to boot when mirrored
pages are used. The deferred-on-demand code should somewhat mitigate
this. But this still brings some inconsistencies compared to
booting without mirrored pages, so it is better to fix it.

Pavel

2018-07-25 01:32:49

by Andrew Morton

Subject: Re: [PATCH 2/3] mm: calculate deferred pages after skipping mirrored memory

On Tue, 24 Jul 2018 19:55:19 -0400 Pavel Tatashin <[email protected]> wrote:

> update_defer_init() should be called only when a struct page is about to be
> initialized, because it counts the number of initialized struct pages, but
> we may skip struct pages if there is some mirrored memory.
>
> So, move update_defer_init() after the check for mirrored memory.
>
> Also, rename update_defer_init() to defer_init() and invert the returned
> boolean to emphasize that this is a boolean function that tells whether the
> rest of memmap initialization should be deferred.
>
> Make this function self-contained: instead of passing the number of already
> initialized pages in this zone, use static counters.
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
> }
>
> /*
> - * Returns false when the remaining initialisation should be deferred until
> + * Returns true when the remaining initialisation should be deferred until
> * later in the boot cycle when it can be parallelised.
> */
> -static inline bool update_defer_init(pg_data_t *pgdat,
> - unsigned long pfn, unsigned long zone_end,
> - unsigned long *nr_initialised)
> +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> {
> + static unsigned long prev_end_pfn, nr_initialised;

So answer me quick, what happens with a static variable in an inlined
function? Is there one copy kernel-wide? One copy per invocation
site? One copy per compilation unit?

Well I didn't know so I wrote a little test. One copy per compilation
unit (.o file), it appears.

It's OK in this case because the function is in .c (and has only one
call site). But if someone moves it into a header and uses it from a
different .c file, they have problems.

So it's dangerous, and poor practice. I'll make this noninline
__meminit.

--- a/mm/page_alloc.c~mm-calculate-deferred-pages-after-skipping-mirrored-memory-fix
+++ a/mm/page_alloc.c
@@ -309,7 +309,8 @@ static inline bool __meminit early_page_
* Returns true when the remaining initialisation should be deferred until
* later in the boot cycle when it can be parallelised.
*/
-static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
+static bool __meminit
+defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
{
static unsigned long prev_end_pfn, nr_initialised;


Also, what locking protects these statics? Our knowledge that this
code is single-threaded, presumably?

2018-07-25 01:33:25

by Pavel Tatashin

Subject: Re: [PATCH 3/3] mm: move mirrored memory specific code outside of memmap_init_zone

On Tue, Jul 24, 2018 at 9:18 PM Andrew Morton <[email protected]> wrote:
>
> On Tue, 24 Jul 2018 19:55:20 -0400 Pavel Tatashin <[email protected]> wrote:
>
> > memmap_init_zone() is getting complex, because it is called from different
> > contexts: hotplug and during boot, and also because it must handle some
> > architecture quirks. One of them is mirrored memory.
> >
> > Move the code that decides whether to skip mirrored memory outside of
> > memmap_init_zone, into a separate function.
>
> Conflicts a bit with the page_alloc.c hunk from
> http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-remain-memblock_next_valid_pfn-on-arm-arm64.patch. Please check my fixup:

The merge looks good to me. Thank you.

>
> void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> 		unsigned long start_pfn, enum memmap_context context,
> 		struct vmem_altmap *altmap)
> {
> 	unsigned long pfn, end_pfn = start_pfn + size;
> 	struct page *page;
>
> 	if (highest_memmap_pfn < end_pfn - 1)
> 		highest_memmap_pfn = end_pfn - 1;
>
> 	/*
> 	 * Honor reservation requested by the driver for this ZONE_DEVICE
> 	 * memory
> 	 */
> 	if (altmap && start_pfn == altmap->base_pfn)
> 		start_pfn += altmap->reserve;
>
> 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> 		/*
> 		 * There can be holes in boot-time mem_map[]s handed to this
> 		 * function. They do not exist on hotplugged memory.
> 		 */
> 		if (context == MEMMAP_EARLY) {
> 			if (!early_pfn_valid(pfn)) {
> 				pfn = next_valid_pfn(pfn) - 1;

I wish we did not have to do next_valid_pfn(pfn) - 1, and instead
could do something like:
for (pfn = start_pfn; pfn < end_pfn; pfn = next_valid_pfn(pfn))

Of course, the performance of next_valid_pfn() should be optimized on
arm for the common case where the next valid pfn is pfn + 1.

Pavel

2018-07-25 01:48:13

by Pavel Tatashin

Subject: Re: [PATCH 2/3] mm: calculate deferred pages after skipping mirrored memory

On Tue, Jul 24, 2018 at 9:31 PM Andrew Morton <[email protected]> wrote:
>
> On Tue, 24 Jul 2018 19:55:19 -0400 Pavel Tatashin <[email protected]> wrote:
>
> > update_defer_init() should be called only when a struct page is about to be
> > initialized, because it counts the number of initialized struct pages, but
> > we may skip struct pages if there is some mirrored memory.
> >
> > So, move update_defer_init() after the check for mirrored memory.
> >
> > Also, rename update_defer_init() to defer_init() and invert the returned
> > boolean to emphasize that this is a boolean function that tells whether the
> > rest of memmap initialization should be deferred.
> >
> > Make this function self-contained: instead of passing the number of already
> > initialized pages in this zone, use static counters.
> >
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
> > }
> >
> > /*
> > - * Returns false when the remaining initialisation should be deferred until
> > + * Returns true when the remaining initialisation should be deferred until
> > * later in the boot cycle when it can be parallelised.
> > */
> > -static inline bool update_defer_init(pg_data_t *pgdat,
> > - unsigned long pfn, unsigned long zone_end,
> > - unsigned long *nr_initialised)
> > +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > {
> > + static unsigned long prev_end_pfn, nr_initialised;
>
> So answer me quick, what happens with a static variable in an inlined
> function? Is there one copy kernel-wide? One copy per invocation
> site? One copy per compilation unit?
>
> Well I didn't know so I wrote a little test. One copy per compilation
> unit (.o file), it appears.
>
> It's OK in this case because the function is in .c (and has only one
> call site). But if someone moves it into a header and uses it from a
> different .c file, they have problems.
>
> So it's dangerous, and poor practice. I'll make this noninline
> __meminit.

I agree, it should not be moved to a header; it is dangerous.

But, on the other hand, this is a hot path. memmap_init_zone() might
need to go through billions of struct pages early in boot, and I did
not want us to waste time on function calls. With defer_init() this is
not a problem, because if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
memmap_init_zone() won't have much work to do; but for
overlap_memmap_init() this is a problem, especially because I expect
the compiler to optimize the pfn dereference usage in an inline function.

>
> --- a/mm/page_alloc.c~mm-calculate-deferred-pages-after-skipping-mirrored-memory-fix
> +++ a/mm/page_alloc.c
> @@ -309,7 +309,8 @@ static inline bool __meminit early_page_
> * Returns true when the remaining initialisation should be deferred until
> * later in the boot cycle when it can be parallelised.
> */
> -static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> +static bool __meminit
> +defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> {
> static unsigned long prev_end_pfn, nr_initialised;
>
>
> Also, what locking protects these statics? Our knowledge that this
> code is single-threaded, presumably?

Correct, this is called only from "context == MEMMAP_EARLY", way
before smp_init().

2018-07-25 11:49:53

by Oscar Salvador

Subject: Re: [PATCH 3/3] mm: move mirrored memory specific code outside of memmap_init_zone

On Tue, Jul 24, 2018 at 07:55:20PM -0400, Pavel Tatashin wrote:
> memmap_init_zone() is getting complex, because it is called from different
> contexts: hotplug and during boot, and also because it must handle some
> architecture quirks. One of them is mirrored memory.
>
> Move the code that decides whether to skip mirrored memory outside of
> memmap_init_zone, into a separate function.
>
> Signed-off-by: Pavel Tatashin <[email protected]>

Hi Pavel,

This looks good to me.
Over the past days I have been wondering whether it would make sense to have
two memmap_init_zone functions, one for hotplug and another for early init,
so we could get rid of the altmap stuff in the early init, and the
MEMMAP_EARLY/HOTPLUG context thing could also be gone.

But I think that they would just share too much of the code, so I do not
think it is worth it.

I am working on doing that for free_area_init_core; let us see what I come
up with.

Anyway, this looks nicer, so thanks for that.
I also gave it a try, and the early init and memory hotplug code seem to
work fine.

Reviewed-by: Oscar Salvador <[email protected]>

Thanks
--
Oscar Salvador
SUSE L3

2018-07-25 12:17:16

by Oscar Salvador

Subject: Re: [PATCH 2/3] mm: calculate deferred pages after skipping mirrored memory

On Tue, Jul 24, 2018 at 07:55:19PM -0400, Pavel Tatashin wrote:
> update_defer_init() should be called only when a struct page is about to be
> initialized, because it counts the number of initialized struct pages, but
> we may skip struct pages if there is some mirrored memory.
>
> So, move update_defer_init() after the check for mirrored memory.
>
> Also, rename update_defer_init() to defer_init() and invert the returned
> boolean to emphasize that this is a boolean function that tells whether the
> rest of memmap initialization should be deferred.
>
> Make this function self-contained: instead of passing the number of already
> initialized pages in this zone, use static counters.
>
> Signed-off-by: Pavel Tatashin <[email protected]>
> ---
> mm/page_alloc.c | 40 ++++++++++++++++++++--------------------
> 1 file changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cea749b26394..86c678cec6bd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
> }
>
> /*
> - * Returns false when the remaining initialisation should be deferred until
> + * Returns true when the remaining initialisation should be deferred until
> * later in the boot cycle when it can be parallelised.
> */
> -static inline bool update_defer_init(pg_data_t *pgdat,
> - unsigned long pfn, unsigned long zone_end,
> - unsigned long *nr_initialised)
> +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> {
> + static unsigned long prev_end_pfn, nr_initialised;
> +
> + if (prev_end_pfn != end_pfn) {
> + prev_end_pfn = end_pfn;
> + nr_initialised = 0;
> + }
Hi Pavel,

What about a comment explaining that "if"?
I am not the brightest one, so it took me a bit to figure out that the "if"
is there because, now that the variables are static, we need some way to
track when we change to another zone.

Thanks
--
Oscar Salvador
SUSE L3

2018-07-25 13:34:39

by Pavel Tatashin

Subject: Re: [PATCH 2/3] mm: calculate deferred pages after skipping mirrored memory

On Wed, Jul 25, 2018 at 8:15 AM Oscar Salvador
<[email protected]> wrote:
>
> On Tue, Jul 24, 2018 at 07:55:19PM -0400, Pavel Tatashin wrote:
> > update_defer_init() should be called only when a struct page is about to be
> > initialized, because it counts the number of initialized struct pages, but
> > we may skip struct pages if there is some mirrored memory.
> >
> > So, move update_defer_init() after the check for mirrored memory.
> >
> > Also, rename update_defer_init() to defer_init() and invert the returned
> > boolean to emphasize that this is a boolean function that tells whether the
> > rest of memmap initialization should be deferred.
> >
> > Make this function self-contained: instead of passing the number of already
> > initialized pages in this zone, use static counters.
> >
> > Signed-off-by: Pavel Tatashin <[email protected]>
> > ---
> > mm/page_alloc.c | 40 ++++++++++++++++++++--------------------
> > 1 file changed, 20 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index cea749b26394..86c678cec6bd 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
> > }
> >
> > /*
> > - * Returns false when the remaining initialisation should be deferred until
> > + * Returns true when the remaining initialisation should be deferred until
> > * later in the boot cycle when it can be parallelised.
> > */
> > -static inline bool update_defer_init(pg_data_t *pgdat,
> > - unsigned long pfn, unsigned long zone_end,
> > - unsigned long *nr_initialised)
> > +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > {
> > + static unsigned long prev_end_pfn, nr_initialised;
> > +
> > + if (prev_end_pfn != end_pfn) {
> > + prev_end_pfn = end_pfn;
> > + nr_initialised = 0;
> > + }
> Hi Pavel,
>
> What about a comment explaining that "if".
> I am not the brightest one, so it took me a bit to figure out that we got that "if" there
> because now that the variables are static, we need to somehow track whenever we change to
> another zone.

Hi Oscar,

Hm, yeah, a comment would be appropriate here. I will send an updated
patch. I will also change the functions from inline to normal
functions, as Andrew pointed out: it is not a good idea to use statics
in inline functions.

Thank you,
Pavel

2018-07-25 21:31:35

by Andrew Morton

Subject: Re: [PATCH 2/3] mm: calculate deferred pages after skipping mirrored memory

On Tue, 24 Jul 2018 21:46:25 -0400 Pavel Tatashin <[email protected]> wrote:

> > > +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > > {
> > > + static unsigned long prev_end_pfn, nr_initialised;
> >
> > So answer me quick, what happens with a static variable in an inlined
> > function? Is there one copy kernel-wide? One copy per invocation
> > site? One copy per compilation unit?
> >
> > Well I didn't know so I wrote a little test. One copy per compilation
> > unit (.o file), it appears.
> >
> > It's OK in this case because the function is in .c (and has only one
> > call site). But if someone moves it into a header and uses it from a
> > different .c file, they have problems.
> >
> > So it's dangerous, and poor practice. I'll make this noninline
> > __meminit.
>
> I agree, it should not be moved to a header; it is dangerous.
>
> But, on the other hand, this is a hot path. memmap_init_zone() might
> need to go through billions of struct pages early in boot, and I did
> not want us to waste time on function calls. With defer_init() this is
> not a problem, because if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
> memmap_init_zone() won't have much work to do; but for
> overlap_memmap_init() this is a problem, especially because I expect
> the compiler to optimize the pfn dereference usage in an inline function.

Well. The compiler will just go and inline defer_init() anyway - it
has a single callsite and is in the same __meminit section as its
calling function. My gcc-7.2.0 does this. Marking it noninline
__meminit is basically syntactic fluff designed to encourage people to
think twice.
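That behaviour is easy to check (a sketch, assuming gcc or clang at -O2; the
file name is invented): a small static function with a single call site is
folded into its caller and its symbol disappears from the object file.

```shell
cat > inline_demo.c <<'EOF'
/* static helper with a single call site: at -O2 the compiler
 * typically inlines it and drops the standalone symbol */
static int helper(int x)
{
	return x * 2;
}

int entry(int x)
{
	return helper(x) + 1;
}
EOF
cc -O2 -c inline_demo.c -o inline_demo.o
# only 'entry' should be emitted; no 'helper' symbol survives
nm inline_demo.o
```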

> >
> > --- a/mm/page_alloc.c~mm-calculate-deferred-pages-after-skipping-mirrored-memory-fix
> > +++ a/mm/page_alloc.c
> > @@ -309,7 +309,8 @@ static inline bool __meminit early_page_
> > * Returns true when the remaining initialisation should be deferred until
> > * later in the boot cycle when it can be parallelised.
> > */
> > -static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > +static bool __meminit
> > +defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > {
> > static unsigned long prev_end_pfn, nr_initialised;
> >
> >
> > Also, what locking protects these statics? Our knowledge that this
> > code is single-threaded, presumably?
>
> Correct, this is called only from "context == MEMMAP_EARLY", way
> before smp_init().

Might be worth a little comment to put readers' minds at ease.

2018-07-26 07:52:35

by Oscar Salvador

Subject: Re: [PATCH 1/3] mm: make memmap_init a proper function

On Tue, Jul 24, 2018 at 07:55:18PM -0400, Pavel Tatashin wrote:
> memmap_init is sometimes a macro, sometimes a function, based on
> __HAVE_ARCH_MEMMAP_INIT. It is only a function on ia64. Make
> memmap_init a weak function instead, and let ia64 redefine it.
>
> Signed-off-by: Pavel Tatashin <[email protected]>

Looks good, and it is easier to read.

Reviewed-by: Oscar Salvador <[email protected]>

Thanks
--
Oscar Salvador
SUSE L3

2018-07-26 15:42:14

by Pavel Tatashin

Subject: Re: [PATCH 2/3] mm: calculate deferred pages after skipping mirrored memory

On Wed, Jul 25, 2018 at 5:30 PM Andrew Morton <[email protected]> wrote:
>
> On Tue, 24 Jul 2018 21:46:25 -0400 Pavel Tatashin <[email protected]> wrote:
>
> > > > +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > > > {
> > > > + static unsigned long prev_end_pfn, nr_initialised;
> > >
> > > So answer me quick, what happens with a static variable in an inlined
> > > function? Is there one copy kernel-wide? One copy per invocation
> > > site? One copy per compilation unit?
> > >
> > > Well I didn't know so I wrote a little test. One copy per compilation
> > > unit (.o file), it appears.
> > >
> > > It's OK in this case because the function is in .c (and has only one
> > > call site). But if someone moves it into a header and uses it from a
> > > different .c file, they have problems.
> > >
> > > So it's dangerous, and poor practice. I'll make this noninline
> > > __meminit.
> >
> > I agree, it should not be moved to a header; it is dangerous.
> >
> > But, on the other hand, this is a hot path. memmap_init_zone() might
> > need to go through billions of struct pages early in boot, and I did
> > not want us to waste time on function calls. With defer_init() this is
> > not a problem, because if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
> > memmap_init_zone() won't have much work to do; but for
> > overlap_memmap_init() this is a problem, especially because I expect
> > the compiler to optimize the pfn dereference usage in an inline function.
>
> Well. The compiler will just go and inline defer_init() anyway - it
> has a single callsite and is in the same __meminit section as its
> calling function. My gcc-7.2.0 does this. Marking it noninline
> __meminit is basically syntactic fluff designed to encourage people to
> think twice.

Makes sense. I will do the change in the next version of the patches.

>
> > >
> > > --- a/mm/page_alloc.c~mm-calculate-deferred-pages-after-skipping-mirrored-memory-fix
> > > +++ a/mm/page_alloc.c
> > > @@ -309,7 +309,8 @@ static inline bool __meminit early_page_
> > > * Returns true when the remaining initialisation should be deferred until
> > > * later in the boot cycle when it can be parallelised.
> > > */
> > > -static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > > +static bool __meminit
> > > +defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > > {
> > > static unsigned long prev_end_pfn, nr_initialised;
> > >
> > >
> > > Also, what locking protects these statics? Our knowledge that this
> > > code is single-threaded, presumably?
> >
> > Correct, this is called only from "context == MEMMAP_EARLY", way
> > before smp_init().
>
> Might be worth a little comment to put readers' minds at ease.

Will add it.

Thank you,
Pavel