2018-09-04 18:36:19

by Alexander Duyck

Subject: [PATCH 0/2] Address issues slowing memory init

This patch series is meant to address some issues I consider to be
low-hanging fruit with regard to memory initialization optimization.

With these two changes I am able to cut the hot-plug memory initialization
times in my environment in half.

---

Alexander Duyck (2):
mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS
mm: Create non-atomic version of SetPageReserved for init use


include/linux/page-flags.h | 1 +
mm/memblock.c | 2 +-
mm/page_alloc.c | 4 ++--
mm/sparse.c | 2 +-
4 files changed, 5 insertions(+), 4 deletions(-)

--


2018-09-04 18:35:17

by Alexander Duyck

Subject: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS

From: Alexander Duyck <[email protected]>

On systems with a large amount of memory it can take a significant amount
of time to initialize all of the page structs with the PAGE_POISON_PATTERN
value. I have seen it take over 2 minutes to initialize a system with
over 12TB of RAM.

In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
the boot time returned to something much more reasonable as the
arch_add_memory call completed in milliseconds versus seconds. However in
doing that I had to disable all of the other VM debugging on the system.

I did a bit of research and it seems like the only function that checks
for this poison value is the PagePoisoned function, and it is only called
in two spots. One is the PF_POISONED_CHECK macro that is only in use when
CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
__dump_page function which is using the check to prevent a recursive
failure in the event of discovering a poisoned page.

With this being the case I am opting to move the poisoning of the page
structs from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS so that we are
only performing the memset if it will be used to test for failures.

Signed-off-by: Alexander Duyck <[email protected]>
---
mm/memblock.c | 2 +-
mm/sparse.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 237944479d25..51e8ae927257 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1444,7 +1444,7 @@ void * __init memblock_virt_alloc_try_nid_raw(

ptr = memblock_virt_alloc_internal(size, align,
min_addr, max_addr, nid);
-#ifdef CONFIG_DEBUG_VM
+#ifdef CONFIG_DEBUG_VM_PGFLAGS
if (ptr && size > 0)
memset(ptr, PAGE_POISON_PATTERN, size);
#endif
diff --git a/mm/sparse.c b/mm/sparse.c
index 10b07eea9a6e..0fd9ad5021b0 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -696,7 +696,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
goto out;
}

-#ifdef CONFIG_DEBUG_VM
+#ifdef CONFIG_DEBUG_VM_PGFLAGS
/*
* Poison uninitialized struct pages in order to catch invalid flags
* combinations.


2018-09-04 18:35:55

by Alexander Duyck

Subject: [PATCH 2/2] mm: Create non-atomic version of SetPageReserved for init use

From: Alexander Duyck <[email protected]>

It doesn't make much sense to use the atomic SetPageReserved at init time
when we are using memset to clear the memory and manipulating the page
flags via simple "&=" and "|=" operations in __init_single_page.

This patch adds a non-atomic version __SetPageReserved that can be used
during page init and shows about a 10% improvement in initialization times
on the systems I have available for testing.

Signed-off-by: Alexander Duyck <[email protected]>
---
include/linux/page-flags.h | 1 +
mm/page_alloc.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 74bee8cecf4c..57ec3fef7e9f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -292,6 +292,7 @@ static inline int PagePoisoned(const struct page *page)

PAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
__CLEARPAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
+ __SETPAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
PAGEFLAG(SwapBacked, swapbacked, PF_NO_TAIL)
__CLEARPAGEFLAG(SwapBacked, swapbacked, PF_NO_TAIL)
__SETPAGEFLAG(SwapBacked, swapbacked, PF_NO_TAIL)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 05e983f42316..9c7d6e971630 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1231,7 +1231,7 @@ void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
/* Avoid false-positive PageTail() */
INIT_LIST_HEAD(&page->lru);

- SetPageReserved(page);
+ __SetPageReserved(page);
}
}
}
@@ -5518,7 +5518,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
page = pfn_to_page(pfn);
__init_single_page(page, pfn, zone, nid);
if (context == MEMMAP_HOTPLUG)
- SetPageReserved(page);
+ __SetPageReserved(page);

/*
* Mark the block movable so that blocks are reserved for


2018-09-04 19:27:13

by Dave Hansen

Subject: Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS

On 09/04/2018 11:33 AM, Alexander Duyck wrote:
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1444,7 +1444,7 @@ void * __init memblock_virt_alloc_try_nid_raw(
>
> ptr = memblock_virt_alloc_internal(size, align,
> min_addr, max_addr, nid);
> -#ifdef CONFIG_DEBUG_VM
> +#ifdef CONFIG_DEBUG_VM_PGFLAGS
> if (ptr && size > 0)
> memset(ptr, PAGE_POISON_PATTERN, size);
> #endif
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 10b07eea9a6e..0fd9ad5021b0 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -696,7 +696,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
> goto out;
> }
>
> -#ifdef CONFIG_DEBUG_VM
> +#ifdef CONFIG_DEBUG_VM_PGFLAGS
> /*
> * Poison uninitialized struct pages in order to catch invalid flags
> * combinations.

I think this is the wrong way to do this. It keeps the setting and
checking still rather tenuously connected. If you were to leave it this
way, it needs commenting. It's also rather odd that we're memsetting
the entire 'struct page' for a config option that's supposedly dealing
with page->flags. That deserves _some_ addressing in a comment or
changelog.

How about:

#ifdef CONFIG_DEBUG_VM_PGFLAGS
#define VM_BUG_ON_PGFLAGS(cond, page) VM_BUG_ON_PAGE(cond, page)
+static inline void poison_struct_pages(struct page *pages, int nr)
+{
+ memset(pages, PAGE_POISON_PATTERN, nr * sizeof(...));
+}
#else
#define VM_BUG_ON_PGFLAGS(cond, page) BUILD_BUG_ON_INVALID(cond)
static inline void poison_struct_pages(struct page *pages, int nr) {}
#endif

That puts the setting and checking in one spot, and also removes a
couple of #ifdefs from .c files.

2018-09-04 19:29:48

by Dave Hansen

Subject: Re: [PATCH 2/2] mm: Create non-atomic version of SetPageReserved for init use

On 09/04/2018 11:33 AM, Alexander Duyck wrote:
> +++ b/mm/page_alloc.c
> @@ -1231,7 +1231,7 @@ void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
> /* Avoid false-positive PageTail() */
> INIT_LIST_HEAD(&page->lru);
>
> - SetPageReserved(page);
> + __SetPageReserved(page);
> }
> }
> }
> @@ -5518,7 +5518,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> page = pfn_to_page(pfn);
> __init_single_page(page, pfn, zone, nid);
> if (context == MEMMAP_HOTPLUG)
> - SetPageReserved(page);
> + __SetPageReserved(page);

Comments needed, please. SetPageReserved() is opaque enough by itself,
but having to discern between it and an __ variant is even less fun.


2018-09-04 19:56:52

by Alexander Duyck

Subject: Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS

On Tue, Sep 4, 2018 at 12:25 PM Dave Hansen <[email protected]> wrote:
>
> On 09/04/2018 11:33 AM, Alexander Duyck wrote:
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1444,7 +1444,7 @@ void * __init memblock_virt_alloc_try_nid_raw(
> >
> > ptr = memblock_virt_alloc_internal(size, align,
> > min_addr, max_addr, nid);
> > -#ifdef CONFIG_DEBUG_VM
> > +#ifdef CONFIG_DEBUG_VM_PGFLAGS
> > if (ptr && size > 0)
> > memset(ptr, PAGE_POISON_PATTERN, size);
> > #endif
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 10b07eea9a6e..0fd9ad5021b0 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -696,7 +696,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
> > goto out;
> > }
> >
> > -#ifdef CONFIG_DEBUG_VM
> > +#ifdef CONFIG_DEBUG_VM_PGFLAGS
> > /*
> > * Poison uninitialized struct pages in order to catch invalid flags
> > * combinations.
>
> I think this is the wrong way to do this. It keeps the setting and
> checking still rather tenuously connected. If you were to leave it this
> way, it needs commenting. It's also rather odd that we're memsetting
> the entire 'struct page' for a config option that's supposedly dealing
> with page->flags. That deserves _some_ addressing in a comment or
> changelog.
>
> How about:
>
> #ifdef CONFIG_DEBUG_VM_PGFLAGS
> #define VM_BUG_ON_PGFLAGS(cond, page) VM_BUG_ON_PAGE(cond, page)
> +static inline void poison_struct_pages(struct page *pages, int nr)
> +{
> + memset(pages, PAGE_POISON_PATTERN, nr * sizeof(...));
> +}
> #else
> #define VM_BUG_ON_PGFLAGS(cond, page) BUILD_BUG_ON_INVALID(cond)
> static inline void poison_struct_pages(struct page *pages, int nr) {}
> #endif
>
> That puts the setting and checking in one spot, and also removes a
> couple of #ifdefs from .c files.

So the only issue with this is that the code here is wrapped in a
check for CONFIG_DEBUG_VM, so if that isn't defined we end up with
build errors.

If the goal is to consolidate things I could look at adding a
function in include/linux/page-flags.h, probably next to PagePoisoned.
I could then wrap the memset call itself with
CONFIG_DEBUG_VM_PGFLAGS instead of the entire function, and place
some code documentation in there explaining why it is wrapped.

- Alex

2018-09-04 20:09:37

by Pasha Tatashin

Subject: Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS

Hi Alexander,

This is a wrong way to do it. memblock_virt_alloc_try_nid_raw() does not
initialize allocated memory, and by setting memory to all ones in debug
build we ensure that no callers rely on this function to return zeroed
memory just by accident.

And, the accidents are frequent because most of the BIOSes and
hypervisors zero memory for us. The exception is kexec reboot.

So, the fact that the page flags code checks this pattern does not
mean that it is the only user. Memory returned by
memblock_virt_alloc_try_nid_raw() is used for page tables as well, and
can be used in other places that don't want memblock to zero the
memory for them for performance reasons.

I am surprised that CONFIG_DEBUG_VM is used in production kernels, but
if so, perhaps a new CONFIG should be added: CONFIG_DEBUG_MEMBLOCK

Thank you,
Pavel

On 9/4/18 2:33 PM, Alexander Duyck wrote:
> From: Alexander Duyck <[email protected]>
>
> On systems with a large amount of memory it can take a significant amount
> of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> value. I have seen it take over 2 minutes to initialize a system with
> over 12TB of RAM.
>
> In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> the boot time returned to something much more reasonable as the
> arch_add_memory call completed in milliseconds versus seconds. However in
> doing that I had to disable all of the other VM debugging on the system.
>
> I did a bit of research and it seems like the only function that checks
> for this poison value is the PagePoisoned function, and it is only called
> in two spots. One is the PF_POISONED_CHECK macro that is only in use when
> CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
> __dump_page function which is using the check to prevent a recursive
> failure in the event of discovering a poisoned page.
>
> With this being the case I am opting to move the poisoning of the page
> structs from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS so that we are
> only performing the memset if it will be used to test for failures.
>
> Signed-off-by: Alexander Duyck <[email protected]>
> ---
> mm/memblock.c | 2 +-
> mm/sparse.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 237944479d25..51e8ae927257 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1444,7 +1444,7 @@ void * __init memblock_virt_alloc_try_nid_raw(
>
> ptr = memblock_virt_alloc_internal(size, align,
> min_addr, max_addr, nid);
> -#ifdef CONFIG_DEBUG_VM
> +#ifdef CONFIG_DEBUG_VM_PGFLAGS
> if (ptr && size > 0)
> memset(ptr, PAGE_POISON_PATTERN, size);
> #endif
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 10b07eea9a6e..0fd9ad5021b0 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -696,7 +696,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
> goto out;
> }
>
> -#ifdef CONFIG_DEBUG_VM
> +#ifdef CONFIG_DEBUG_VM_PGFLAGS
> /*
> * Poison uninitialized struct pages in order to catch invalid flags
> * combinations.
>

2018-09-04 21:15:07

by Alexander Duyck

Subject: Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS

On Tue, Sep 4, 2018 at 1:07 PM Pasha Tatashin
<[email protected]> wrote:
>
> Hi Alexander,
>
> This is a wrong way to do it. memblock_virt_alloc_try_nid_raw() does not
> initialize allocated memory, and by setting memory to all ones in debug
> build we ensure that no callers rely on this function to return zeroed
> memory just by accident.

I get that, but setting this to all 1's is still just debugging code
and that is adding significant overhead.

> And, the accidents are frequent because most of the BIOSes and
> hypervisors zero memory for us. The exception is kexec reboot.
>
> So, the fact that page flags checks this pattern, does not mean that
> this is the only user. Memory that is returned by
> memblock_virt_alloc_try_nid_raw() is used for page table as well, and
> can be used in other places as well that don't want memblock to zero the
> memory for them for performance reasons.

The logic behind this statement is confusing. You are saying they
don't want memblock to zero the memory for performance reasons, yet
you are setting it to all 1's for debugging reasons? I get that it is
wrapped, but in my mind just using CONFIG_DEBUG_VM is too broad of a
brush. Especially with distros like Fedora enabling it by default.

> I am surprised that CONFIG_DEBUG_VM is used in production kernel, but if
> so perhaps a new CONFIG should be added: CONFIG_DEBUG_MEMBLOCK
>
> Thank you,
> Pavel

I don't know about production. I am running a Fedora kernel on my
development system and it has it enabled. It looks like it has been
that way for a while based on a FC20 Bugzilla
(https://bugzilla.redhat.com/show_bug.cgi?id=1074710). A quick look at
one of my CentOS systems shows that it doesn't have it set. I suspect
it will vary from distro to distro. I just know it spooked me when I
was stuck staring at a blank screen for three minutes when I was
booting a system with 12TB of memory since this delay can hit you
early in the boot.

I had considered adding a completely new CONFIG. The only thing is it
doesn't make much sense to have the logic setting the value to all 1's
without any logic to test for it. That is why I thought it made more
sense to just fold it into CONFIG_DEBUG_VM_PGFLAGS. I suppose I could
look at something like CONFIG_DEBUG_PAGE_INIT if we want to go that
route. I figure using something like MEMBLOCK probably wouldn't make
sense since this also impacts sparse section init.

Thanks.

- Alex

2018-09-04 21:47:01

by Pasha Tatashin

Subject: Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS



On 9/4/18 5:13 PM, Alexander Duyck wrote:
> On Tue, Sep 4, 2018 at 1:07 PM Pasha Tatashin
> <[email protected]> wrote:
>>
>> Hi Alexander,
>>
>> This is a wrong way to do it. memblock_virt_alloc_try_nid_raw() does not
>> initialize allocated memory, and by setting memory to all ones in debug
>> build we ensure that no callers rely on this function to return zeroed
>> memory just by accident.
>
> I get that, but setting this to all 1's is still just debugging code
> and that is adding significant overhead.

That's correct: debugging code on a debugging kernel.

>
>> And, the accidents are frequent because most of the BIOSes and
>> hypervisors zero memory for us. The exception is kexec reboot.
>>
>> So, the fact that page flags checks this pattern, does not mean that
>> this is the only user. Memory that is returned by
>> memblock_virt_alloc_try_nid_raw() is used for page table as well, and
>> can be used in other places as well that don't want memblock to zero the
>> memory for them for performance reasons.
>
> The logic behind this statement is confusing. You are saying they
> don't want memblock to zero the memory for performance reasons, yet
> you are setting it to all 1's for debugging reasons? I get that it is
> wrapped, but in my mind just using CONFIG_DEBUG_VM is too broad of a
> brush. Especially with distros like Fedora enabling it by default.

The idea is not to zero memory on the production kernel, and to ensure
that not zeroing memory does not cause any accidental bugs by having
debug code on the debug kernel.

>
>> I am surprised that CONFIG_DEBUG_VM is used in production kernel, but if
>> so perhaps a new CONFIG should be added: CONFIG_DEBUG_MEMBLOCK
>>
>> Thank you,
>> Pavel
>
> I don't know about production. I am running a Fedora kernel on my
> development system and it has it enabled. It looks like it has been
> that way for a while based on a FC20 Bugzilla
> (https://bugzilla.redhat.com/show_bug.cgi?id=1074710). A quick look at
> one of my CentOS systems shows that it doesn't have it set. I suspect
> it will vary from distro to distro. I just know it spooked me when I
> was stuck staring at a blank screen for three minutes when I was
> booting a system with 12TB of memory since this delay can hit you
> early in the boot.

I understand; this is the delay that I fixed when I removed the
memset(0) from the struct page initialization code. However, we still
need to keep this debug memset(1) in order to catch bugs, and it does
catch them from time to time.

For far too long Linux expected that the memory returned by memblock
and the boot allocator is always zeroed.

>
> I had considered adding a completely new CONFIG. The only thing is it
> doesn't make much sense to have the logic setting the value to all 1's
> without any logic to test for it.

When memory is zeroed, page tables work by accident, as the entries
are empty. However, when the entries are all ones and we accidentally
try to use that memory as a page table, the invalid VAs in it will
crash the debug kernel (and this has in the past helped find some
bugs).

So, the testing is not only that uninitialized struct pages are not
accessed, but also that only explicitly initialized page tables are
accessed.


> That is why I thought it made more
> sense to just fold it into CONFIG_DEBUG_VM_PGFLAGS. I suppose I could
> look at something like CONFIG_DEBUG_PAGE_INIT if we want to go that
> route. I figure using something like MEMBLOCK probably wouldn't make
> sense since this also impacts sparse section init.

If distros are using CONFIG_DEBUG_VM in production kernels (as you
pointed out above), it makes sense to add CONFIG_DEBUG_MEMBLOCK.

Thank you,
Pavel

2018-09-05 06:12:43

by Michal Hocko

Subject: Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS

On Tue 04-09-18 11:33:39, Alexander Duyck wrote:
> From: Alexander Duyck <[email protected]>
>
> On systems with a large amount of memory it can take a significant amount
> of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> value. I have seen it take over 2 minutes to initialize a system with
> over 12TB of RAM.
>
> In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> the boot time returned to something much more reasonable as the
> arch_add_memory call completed in milliseconds versus seconds. However in
> doing that I had to disable all of the other VM debugging on the system.

I agree that CONFIG_DEBUG_VM is a big hammer but the primary point of
this check is to catch uninitialized struct pages after the early mem
init rework so the intention was to make it enabled on as many systems
with debugging enabled as possible. DEBUG_VM is not free already so it
sounded like a good idea to sneak it there.

> I did a bit of research and it seems like the only function that checks
> for this poison value is the PagePoisoned function, and it is only called
> in two spots. One is the PF_POISONED_CHECK macro that is only in use when
> CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
> __dump_page function which is using the check to prevent a recursive
> failure in the event of discovering a poisoned page.

Hmm, I have missed the dependency on CONFIG_DEBUG_VM_PGFLAGS when
reviewing the patch. My debugging kernel config doesn't have it enabled
for example. I know that Fedora configs have CONFIG_DEBUG_VM enabled
but I cannot find their config right now to double check for
CONFIG_DEBUG_VM_PGFLAGS.

I am not really sure this dependency was intentional but I strongly
suspect Pavel really wanted to have it DEBUG_VM scoped.
--
Michal Hocko
SUSE Labs

2018-09-05 06:25:50

by Michal Hocko

Subject: Re: [PATCH 2/2] mm: Create non-atomic version of SetPageReserved for init use

On Tue 04-09-18 11:33:45, Alexander Duyck wrote:
> From: Alexander Duyck <[email protected]>
>
> It doesn't make much sense to use the atomic SetPageReserved at init time
> when we are using memset to clear the memory and manipulating the page
> flags via simple "&=" and "|=" operations in __init_single_page.
>
> This patch adds a non-atomic version __SetPageReserved that can be used
> during page init and shows about a 10% improvement in initialization times
> on the systems I have available for testing.

I agree with Dave that a comment is due. I am also quite surprised that
this leads to such a large improvement. Could you be more specific about
your test and machines you were testing on?

Other than that the patch makes sense to me.

> Signed-off-by: Alexander Duyck <[email protected]>

With the above addressed, feel free to add
Acked-by: Michal Hocko <[email protected]>

Thanks!

> ---
> include/linux/page-flags.h | 1 +
> mm/page_alloc.c | 4 ++--
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 74bee8cecf4c..57ec3fef7e9f 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -292,6 +292,7 @@ static inline int PagePoisoned(const struct page *page)
>
> PAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
> __CLEARPAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
> + __SETPAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
> PAGEFLAG(SwapBacked, swapbacked, PF_NO_TAIL)
> __CLEARPAGEFLAG(SwapBacked, swapbacked, PF_NO_TAIL)
> __SETPAGEFLAG(SwapBacked, swapbacked, PF_NO_TAIL)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 05e983f42316..9c7d6e971630 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1231,7 +1231,7 @@ void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
> /* Avoid false-positive PageTail() */
> INIT_LIST_HEAD(&page->lru);
>
> - SetPageReserved(page);
> + __SetPageReserved(page);
> }
> }
> }
> @@ -5518,7 +5518,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> page = pfn_to_page(pfn);
> __init_single_page(page, pfn, zone, nid);
> if (context == MEMMAP_HOTPLUG)
> - SetPageReserved(page);
> + __SetPageReserved(page);
>
> /*
> * Mark the block movable so that blocks are reserved for
>

--
Michal Hocko
SUSE Labs

2018-09-05 15:33:56

by Alexander Duyck

Subject: Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS

On Tue, Sep 4, 2018 at 11:10 PM Michal Hocko <[email protected]> wrote:
>
> On Tue 04-09-18 11:33:39, Alexander Duyck wrote:
> > From: Alexander Duyck <[email protected]>
> >
> > On systems with a large amount of memory it can take a significant amount
> > of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> > value. I have seen it take over 2 minutes to initialize a system with
> > over 12TB of RAM.
> >
> > In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> > the boot time returned to something much more reasonable as the
> > arch_add_memory call completed in milliseconds versus seconds. However in
> > doing that I had to disable all of the other VM debugging on the system.
>
> I agree that CONFIG_DEBUG_VM is a big hammer but the primary point of
> this check is to catch uninitialized struct pages after the early mem
> init rework so the intention was to make it enabled on as many systems
> with debugging enabled as possible. DEBUG_VM is not free already so it
> sounded like a good idea to sneak it there.
>
> > I did a bit of research and it seems like the only function that checks
> > for this poison value is the PagePoisoned function, and it is only called
> > in two spots. One is the PF_POISONED_CHECK macro that is only in use when
> > CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
> > __dump_page function which is using the check to prevent a recursive
> > failure in the event of discovering a poisoned page.
>
> Hmm, I have missed the dependency on CONFIG_DEBUG_VM_PGFLAGS when
> reviewing the patch. My debugging kernel config doesn't have it enabled
> for example. I know that Fedora configs have CONFIG_DEBUG_VM enabled
> but I cannot find their config right now to double check for the
> CONFIG_DEBUG_VM_PGFLAGS right now.
>
> I am not really sure this dependency was intentional but I strongly
> suspect Pavel really wanted to have it DEBUG_VM scoped.

So I think the idea as per the earlier discussion with Pavel is that
by preloading it with all 1's anything that is expecting all 0's will
blow up one way or another. We just aren't explicitly checking for the
value, but it is still possibly going to be discovered via something
like a GPF when we try to access an invalid pointer or counter.

What I think I can do to address some of the concern is make this
something that depends on CONFIG_DEBUG_VM and defaults to Y. That way
for systems that are defaulting their config they should maintain the
same behavior, however for those systems that are running a large
amount of memory they can optionally turn off
CONFIG_DEBUG_VM_PAGE_INIT_POISON instead of having to switch off all
the virtual memory debugging via CONFIG_DEBUG_VM. I guess it would
become more of a peer to CONFIG_DEBUG_VM_PGFLAGS as the poison check
wouldn't really apply after init anyway.
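An option like the one sketched here might look roughly as follows in Kconfig; the symbol name, placement, and help text are illustrative only, not the final upstream form:

```
config DEBUG_VM_PAGE_INIT_POISON
	bool "Poison struct pages on memmap allocation"
	depends on DEBUG_VM
	default y
	help
	  Fill newly allocated memmap sections with PAGE_POISON_PATTERN
	  so that accesses to uninitialized struct pages are caught.
	  Saying N avoids a lengthy memset during boot and memory
	  hot-plug on very large memory systems while keeping the rest
	  of the DEBUG_VM checks enabled.
```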

- Alex

2018-09-05 20:21:08

by Alexander Duyck

Subject: Re: [PATCH 2/2] mm: Create non-atomic version of SetPageReserved for init use

On Tue, Sep 4, 2018 at 11:24 PM Michal Hocko <[email protected]> wrote:
>
> On Tue 04-09-18 11:33:45, Alexander Duyck wrote:
> > From: Alexander Duyck <[email protected]>
> >
> > It doesn't make much sense to use the atomic SetPageReserved at init time
> > when we are using memset to clear the memory and manipulating the page
> > flags via simple "&=" and "|=" operations in __init_single_page.
> >
> > This patch adds a non-atomic version __SetPageReserved that can be used
> > during page init and shows about a 10% improvement in initialization times
> > on the systems I have available for testing.
>
> I agree with Dave that a comment is due. I am also quite surprised that
> this leads to such a large improvement. Could you be more specific about
> your test and machines you were testing on?

So my test case has been just initializing 4 3TB blocks of persistent
memory with a few trace_printk values added to track total time in
move_pfn_range_to_zone.

What I have been seeing is that the time needed for the call drops on
average from 35-36 seconds down to around 31-32.

> Other than that the patch makes sense to me.
>
> > Signed-off-by: Alexander Duyck <[email protected]>
>
> With the above addressed, feel free to add
> Acked-by: Michal Hocko <[email protected]>
>
> Thanks!

As far as adding a comment, are we just talking about why it is
reserved, or do we need a description of __SetPageReserved versus
SetPageReserved? For now I was looking at adding a comment like:
@@ -5517,8 +5517,13 @@ void __meminit memmap_init_zone(unsigned long
size, int nid, unsigned long zone,
not_early:
page = pfn_to_page(pfn);
__init_single_page(page, pfn, zone, nid);
+
+ /*
+ * Mark page reserved as it will need to wait for onlining
+ * phase for it to be fully associated with a zone.
+ */
if (context == MEMMAP_HOTPLUG)
- SetPageReserved(page);
+ __SetPageReserved(page);

/*
* Mark the block movable so that blocks are reserved for

Any thoughts on this?

Thanks.

- Alex

2018-09-05 20:24:13

by Pasha Tatashin

Subject: Re: [PATCH 2/2] mm: Create non-atomic version of SetPageReserved for init use



On 9/5/18 4:18 PM, Alexander Duyck wrote:
> On Tue, Sep 4, 2018 at 11:24 PM Michal Hocko <[email protected]> wrote:
>>
>> On Tue 04-09-18 11:33:45, Alexander Duyck wrote:
>>> From: Alexander Duyck <[email protected]>
>>>
>>> It doesn't make much sense to use the atomic SetPageReserved at init time
>>> when we are using memset to clear the memory and manipulating the page
>>> flags via simple "&=" and "|=" operations in __init_single_page.
>>>
>>> This patch adds a non-atomic version __SetPageReserved that can be used
>>> during page init and shows about a 10% improvement in initialization times
>>> on the systems I have available for testing.
>>
>> I agree with Dave that a comment is due. I am also quite surprised that
>> this leads to such a large improvement. Could you be more specific about
>> your test and machines you were testing on?
>
> So my test case has been just initializing 4 3TB blocks of persistent
> memory with a few trace_printk values added to track total time in
> move_pfn_range_to_zone.
>
> What I have been seeing is that the time needed for the call drops on
> average from 35-36 seconds down to around 31-32.

Just curious, why is there variance? During boot the time is usually
pretty consistent, as there is only one thread and the system is in
pretty much the same state.

A dmesg output in the commit log would be helpful.

Thank you,
Pavel

>
>> Other than that the patch makes sense to me.
>>
>>> Signed-off-by: Alexander Duyck <[email protected]>
>>
>> With the above addressed, feel free to add
>> Acked-by: Michal Hocko <[email protected]>
>>
>> Thanks!
>
> As far as adding a comment, are we just talking about why it is
> reserved, or do we need a description of __SetPageReserved versus
> SetPageReserved? For now I was looking at adding a comment like:
> @@ -5517,8 +5517,13 @@ void __meminit memmap_init_zone(unsigned long
> size, int nid, unsigned long zone,
> not_early:
> page = pfn_to_page(pfn);
> __init_single_page(page, pfn, zone, nid);
> +
> + /*
> + * Mark page reserved as it will need to wait for onlining
> + * phase for it to be fully associated with a zone.
> + */
> if (context == MEMMAP_HOTPLUG)
> - SetPageReserved(page);
> + __SetPageReserved(page);
>
> /*
> * Mark the block movable so that blocks are reserved for
>
> Any thoughts on this?
>
> Thanks.
>
> - Alex
>

2018-09-05 20:38:22

by Alexander Duyck

Subject: Re: [PATCH 2/2] mm: Create non-atomic version of SetPageReserved for init use

On Wed, Sep 5, 2018 at 1:22 PM Pasha Tatashin
<[email protected]> wrote:
>
>
>
> On 9/5/18 4:18 PM, Alexander Duyck wrote:
> > On Tue, Sep 4, 2018 at 11:24 PM Michal Hocko <[email protected]> wrote:
> >>
> >> On Tue 04-09-18 11:33:45, Alexander Duyck wrote:
> >>> From: Alexander Duyck <[email protected]>
> >>>
> >>> It doesn't make much sense to use the atomic SetPageReserved at init time
> >>> when we are using memset to clear the memory and manipulating the page
> >>> flags via simple "&=" and "|=" operations in __init_single_page.
> >>>
> >>> This patch adds a non-atomic version __SetPageReserved that can be used
> >>> during page init and shows about a 10% improvement in initialization times
> >>> on the systems I have available for testing.
> >>
> >> I agree with Dave that a comment is due. I am also quite surprised that
> >> this leads to such a large improvement. Could you be more specific about
> >> your test and the machines you were testing on?
> >
> > So my test case has been just initializing 4 3TB blocks of persistent
> > memory with a few trace_printk values added to track total time in
> > move_pfn_range_to_zone.
> >
> > What I have been seeing is that the time needed for the call drops on
> > average from 35-36 seconds down to around 31-32.
>
> Just curious, why is there variance? Boot time is usually pretty
> consistent, as there is only one thread and the system is in pretty much
> the same state.
>
> A dmesg output in the commit log would be helpful.
>
> Thank you,
> Pavel

The variance has to do with the fact that the memory is being added via
hot-plug. In this case the system boots, and then after 5 minutes it
goes about hot-plugging the memory. The memmap_init_zone call makes
regular calls into cond_resched(), and it seems that any other active
threads can end up impacting the timings and providing a few hundred ms
of variation between runs.

In addition, NUMA locality also plays a role. I have seen values as low
as 25.5s pre-patch, 23.2s after, and values as high as 39.17s pre-patch,
37.3s after. I am assuming that the lowest values just happened to luck
into being node local, and the highest values end up being two nodes
away on the 4-node system I am testing. I'm planning to try to address
the NUMA issues using an approach similar to what the deferred_init code
is already doing, by starting a kernel thread on the correct node and
then waiting on that to complete outside of the hotplug lock. The
solution will probably end up being a hybrid between the work Dan
Williams submitted a couple of months ago and the existing deferred_init
code. But I will be targeting that for 4.20 at the earliest.

- Alex

2018-09-06 05:39:31

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS

On Wed 05-09-18 08:32:05, Alexander Duyck wrote:
> On Tue, Sep 4, 2018 at 11:10 PM Michal Hocko <[email protected]> wrote:
> >
> > On Tue 04-09-18 11:33:39, Alexander Duyck wrote:
> > > From: Alexander Duyck <[email protected]>
> > >
> > > On systems with a large amount of memory it can take a significant amount
> > > of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> > > value. I have seen it take over 2 minutes to initialize a system with
> > > over 12TB of RAM.
> > >
> > > In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> > > the boot time returned to something much more reasonable as the
> > > arch_add_memory call completed in milliseconds versus seconds. However in
> > > doing that I had to disable all of the other VM debugging on the system.
> >
> > I agree that CONFIG_DEBUG_VM is a big hammer but the primary point of
> > this check is to catch uninitialized struct pages after the early mem
> > init rework so the intention was to make it enabled on as many systems
> > with debugging enabled as possible. DEBUG_VM is not free already so it
> > sounded like a good idea to sneak it there.
> >
> > > I did a bit of research and it seems like the only function that checks
> > > for this poison value is the PagePoisoned function, and it is only called
> > > in two spots. One is the PF_POISONED_CHECK macro that is only in use when
> > > CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
> > > __dump_page function which is using the check to prevent a recursive
> > > failure in the event of discovering a poisoned page.
> >
> > Hmm, I have missed the dependency on CONFIG_DEBUG_VM_PGFLAGS when
> > reviewing the patch. My debugging kernel config doesn't have it enabled,
> > for example. I know that Fedora configs have CONFIG_DEBUG_VM enabled,
> > but I cannot find their config right now to double check for
> > CONFIG_DEBUG_VM_PGFLAGS.
> >
> > I am not really sure this dependency was intentional but I strongly
> > suspect Pavel really wanted to have it DEBUG_VM scoped.
>
> So I think the idea, as per the earlier discussion with Pavel, is that
> by preloading it with all 1's, anything that is expecting all 0's will
> blow up one way or another. We just aren't explicitly checking for the
> value, but it is still likely to be discovered via something like a GPF
> when we try to access an invalid pointer or counter.
>
> What I think I can do to address some of the concern is make this
> something that depends on CONFIG_DEBUG_VM and defaults to Y. That way
> for systems that are defaulting their config they should maintain the
> same behavior, however for those systems that are running a large
> amount of memory they can optionally turn off
> CONFIG_DEBUG_VM_PAGE_INIT_POISON instead of having to switch off all
> the virtual memory debugging via CONFIG_DEBUG_VM. I guess it would
> become more of a peer to CONFIG_DEBUG_VM_PGFLAGS as the poison check
> wouldn't really apply after init anyway.
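>
> For concreteness, the option described above could be wired up in
> mm/Kconfig.debug roughly like this -- a sketch of the proposal as
> stated, not necessarily the form that was eventually merged (name and
> help text are illustrative):

```kconfig
config DEBUG_VM_PAGE_INIT_POISON
	bool "Enable early struct page poisoning"
	depends on DEBUG_VM
	default y
	help
	  Poison struct pages with PAGE_POISON_PATTERN before they are
	  initialized, so that uses of uninitialized struct pages can be
	  caught by the CONFIG_DEBUG_VM_PGFLAGS checks.

	  Saying N here avoids the large init-time memset on systems
	  with very large amounts of memory, while leaving the rest of
	  the CONFIG_DEBUG_VM checks enabled.
```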

So the most obvious question is, why don't you simply disable DEBUG_VM?
It is not aimed at production workloads, because it adds asserts in many
places and is quite likely to come with a performance penalty already.

Besides that, initializing memory to all ones is not much different
from initializing it to all zeroes, which we had been doing until
recently, when Pavel removed that. So why do we need to add yet another
debugging config option? We have way too many config options already.
--
Michal Hocko
SUSE Labs

2018-09-06 05:43:19

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Create non-atomic version of SetPageReserved for init use

On Wed 05-09-18 13:18:24, Alexander Duyck wrote:
> On Tue, Sep 4, 2018 at 11:24 PM Michal Hocko <[email protected]> wrote:
> >
> > On Tue 04-09-18 11:33:45, Alexander Duyck wrote:
> > > From: Alexander Duyck <[email protected]>
> > >
> > > It doesn't make much sense to use the atomic SetPageReserved at init time
> > > when we are using memset to clear the memory and manipulating the page
> > > flags via simple "&=" and "|=" operations in __init_single_page.
> > >
> > > This patch adds a non-atomic version __SetPageReserved that can be used
> > > during page init and shows about a 10% improvement in initialization times
> > > on the systems I have available for testing.
> >
> > I agree with Dave that a comment is due. I am also quite surprised that
> > this leads to such a large improvement. Could you be more specific about
> > your test and the machines you were testing on?
>
> So my test case has been just initializing 4 3TB blocks of persistent
> memory with a few trace_printk values added to track total time in
> move_pfn_range_to_zone.
>
> What I have been seeing is that the time needed for the call drops on
> average from 35-36 seconds down to around 31-32.

This information belongs in the changelog.

>
> > Other than that the patch makes sense to me.
> >
> > > Signed-off-by: Alexander Duyck <[email protected]>
> >
> > With the above addressed, feel free to add
> > Acked-by: Michal Hocko <[email protected]>
> >
> > Thanks!
>
> As far as adding a comment, are we just talking about why it is
> reserved, or do we need a description of __SetPageReserved versus
> SetPageReserved? For now I was looking at adding a comment like:

The latter. The reason why we make it reserved should be quite clear. A
comment wouldn't hurt, of course, and what you have is a good start. But
it is usually atomic vs. non-atomic SetPage$Foo which needs some
clarification.

> @@ -5517,8 +5517,13 @@ void __meminit memmap_init_zone(unsigned long
> size, int nid, unsigned long zone,
> not_early:
> page = pfn_to_page(pfn);
> __init_single_page(page, pfn, zone, nid);
> +
> + /*
> + * Mark page reserved as it will need to wait for onlining
> + * phase for it to be fully associated with a zone.
> + */
> if (context == MEMMAP_HOTPLUG)
> - SetPageReserved(page);
> + __SetPageReserved(page);
>
> /*
> * Mark the block movable so that blocks are reserved for
>
> Any thoughts on this?
>
> Thanks.
>
> - Alex

--
Michal Hocko
SUSE Labs