In move_freepages() a BUG_ON() can be triggered on uninitialized page structures
due to pageblock alignment. Aligning the skipped pfns in memmap_init_zone() the
same way as in move_freepages_block() simply fixes those crashes.
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <[email protected]>
Cc: [email protected]
---
mm/page_alloc.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb416723538f..9edee36e6a74 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
- * on our next iteration of the loop.
+ * on our next iteration of the loop. Note that it needs
+ * to be pageblock aligned even when the region itself
+ * is not as move_freepages_block() can shift ahead of
+ * the valid region but still depends on correct page
+ * metadata.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
+ ~(pageblock_nr_pages-1)) - 1;
#endif
continue;
}
--
2.16.2
On Thu 01-03-18 13:47:45, Daniel Vacek wrote:
> In move_freepages() a BUG_ON() can be triggered on uninitialized page structures
> due to pageblock alignment. Aligning the skipped pfns in memmap_init_zone() the
> same way as in move_freepages_block() simply fixes those crashes.
This changelog doesn't describe how the fix works. Why doesn't
memblock_next_valid_pfn return the first valid pfn as one would expect?
It would also be good to put the panic info in the changelog.
> Fixes: b92df1de5d28 ("[mm] page_alloc: skip over regions of invalid pfns where possible")
> Signed-off-by: Daniel Vacek <[email protected]>
> Cc: [email protected]
> ---
> mm/page_alloc.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cb416723538f..9edee36e6a74 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> /*
> * Skip to the pfn preceding the next valid one (or
> * end_pfn), such that we hit a valid pfn (or end_pfn)
> - * on our next iteration of the loop.
> + * on our next iteration of the loop. Note that it needs
> + * to be pageblock aligned even when the region itself
> + * is not as move_freepages_block() can shift ahead of
> + * the valid region but still depends on correct page
> + * metadata.
> */
> - pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
> + pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
> + ~(pageblock_nr_pages-1)) - 1;
> #endif
> continue;
> }
> --
> 2.16.2
>
--
Michal Hocko
SUSE Labs
On Thu, Mar 1, 2018 at 2:10 PM, Michal Hocko <[email protected]> wrote:
> On Thu 01-03-18 13:47:45, Daniel Vacek wrote:
>> In move_freepages() a BUG_ON() can be triggered on uninitialized page structures
>> due to pageblock alignment. Aligning the skipped pfns in memmap_init_zone() the
>> same way as in move_freepages_block() simply fixes those crashes.
>
> This changelog doesn't describe how the fix works. Why doesn't
> memblock_next_valid_pfn return the first valid pfn as one would expect?
Actually it does. The point is that it is not guaranteed to be
pageblock aligned. And we actually want to initialize even those page
structures which are outside of the range. Hence the alignment here.
For example from reproducer machine, memory map from e820/BIOS:
$ grep 7b7ff000 /proc/iomem
7b7ff000-7b7fffff : System RAM
Page structures before commit b92df1de5d28:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
7b800000 7ffff000 80000000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff73941e00000 78000000 0 0 1 1fffff00000000
fffff73941ed7fc0 7b5ff000 0 0 1 1fffff00000000
fffff73941ed8000 7b600000 0 0 1 1fffff00000000
fffff73941edff80 7b7fe000 0 0 1 1fffff00000000
fffff73941edffc0 7b7ff000 ffff8e67e04d3ae0 ad84 1 1fffff00020068
uptodate,lru,active,mappedtodisk <<<< start of the range here
fffff73941ee0000 7b800000 0 0 1 1fffff00000000
fffff73941ffffc0 7ffff000 0 0 1 1fffff00000000
So far so good.
After commit b92df1de5d28 machine eventually crashes with:
BUG at mm/page_alloc.c:1913
> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
From the registers and stack I dug out that start_page points to
ffffe31d01ed8000 (note that this is page ffffe31d01edffc0 aligned down
to its pageblock) and I can see this in the memory dump:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
7b800000 7ffff000 80000000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffe31d01e00000 78000000 0 0 0 0
ffffe31d01ed7fc0 7b5ff000 0 0 0 0
ffffe31d01ed8000 7b600000 0 0 0 0 <<<<
ffffe31d01edff80 7b7fe000 0 0 0 0
ffffe31d01edffc0 7b7ff000 0 0 1 1fffff00000000
ffffe31d01ee0000 7b800000 0 0 1 1fffff00000000
ffffe31d01ffffc0 7ffff000 0 0 1 1fffff00000000
Note that the nodeid and zonenr are encoded in the top bits of the
page flags, which are not initialized for the marked (<<<<) page,
hence the crash :-(
With my fix applied:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
7b800000 7ffff000 80000000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001ed7fc0 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 1 1fffff00000000
<<<< vital data filled in here this time \o/
ffffea0001edff80 7b7fe000 0 0 1 1fffff00000000
ffffea0001edffc0 7b7ff000 ffff88017fb13720 8 2 1fffff00020068
uptodate,lru,active,mappedtodisk
ffffea0001ee0000 7b800000 0 0 1 1fffff00000000
ffffea0001ffffc0 7ffff000 0 0 1 1fffff00000000
We are not interested in the beginning of the whole section. Only the
pages in the first populated pageblock, where the range begins, are
important (actually just the first one, really, but...).
> It would also be good to put the panic info in the changelog.
Of course I forgot to link the related bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=196443
Though it is not very well explained there either. I hope my notes
above make it clear.
>> Fixes: b92df1de5d28 ("[mm] page_alloc: skip over regions of invalid pfns where possible")
>> Signed-off-by: Daniel Vacek <[email protected]>
>> Cc: [email protected]
>> ---
>> mm/page_alloc.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index cb416723538f..9edee36e6a74 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> /*
>> * Skip to the pfn preceding the next valid one (or
>> * end_pfn), such that we hit a valid pfn (or end_pfn)
>> - * on our next iteration of the loop.
>> + * on our next iteration of the loop. Note that it needs
>> + * to be pageblock aligned even when the region itself
>> + * is not as move_freepages_block() can shift ahead of
>> + * the valid region but still depends on correct page
>> + * metadata.
>> */
>> - pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
>> + pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>> + ~(pageblock_nr_pages-1)) - 1;
>> #endif
>> continue;
>> }
>> --
>> 2.16.2
>>
>
> --
> Michal Hocko
> SUSE Labs
On Thu 01-03-18 16:09:35, Daniel Vacek wrote:
[...]
> $ grep 7b7ff000 /proc/iomem
> 7b7ff000-7b7fffff : System RAM
[...]
> After commit b92df1de5d28 machine eventually crashes with:
>
> BUG at mm/page_alloc.c:1913
>
> > VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
This is an important information that should be in the changelog.
> From registers and stack I digged start_page points to
> ffffe31d01ed8000 (note that this is
> page ffffe31d01edffc0 aligned to pageblock) and I can see this in memory dump:
>
> crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
> 7b800000 7ffff000 80000000
> PAGE PHYSICAL MAPPING INDEX CNT FLAGS
> ffffe31d01e00000 78000000 0 0 0 0
> ffffe31d01ed7fc0 7b5ff000 0 0 0 0
> ffffe31d01ed8000 7b600000 0 0 0 0 <<<< note
Are those ranges covered by the System RAM as well?
> that nodeid and zonenr are encoded in top bits of page flags which are
> not initialized here, hence the crash :-(
> ffffe31d01edff80 7b7fe000 0 0 0 0
> ffffe31d01edffc0 7b7ff000 0 0 1 1fffff00000000
> ffffe31d01ee0000 7b800000 0 0 1 1fffff00000000
> ffffe31d01ffffc0 7ffff000 0 0 1 1fffff00000000
It is still not clear why not to do the alignment in
memblock_next_valid_pfn rather than its caller.
--
Michal Hocko
SUSE Labs
On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko <[email protected]> wrote:
> On Thu 01-03-18 16:09:35, Daniel Vacek wrote:
> [...]
>> $ grep 7b7ff000 /proc/iomem
>> 7b7ff000-7b7fffff : System RAM
> [...]
>> After commit b92df1de5d28 machine eventually crashes with:
>>
>> BUG at mm/page_alloc.c:1913
>>
>> > VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
>
> This is an important information that should be in the changelog.
And that's exactly what my very first seven words tried to express in
human-readable form instead of mechanically pasting the source code. I
guess that's a matter of preference. Though I see grepping for it
later can be an issue here.
>> From registers and stack I digged start_page points to
>> ffffe31d01ed8000 (note that this is
>> page ffffe31d01edffc0 aligned to pageblock) and I can see this in memory dump:
>>
>> crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
>> 7b800000 7ffff000 80000000
>> PAGE PHYSICAL MAPPING INDEX CNT FLAGS
>> ffffe31d01e00000 78000000 0 0 0 0
>> ffffe31d01ed7fc0 7b5ff000 0 0 0 0
>> ffffe31d01ed8000 7b600000 0 0 0 0 <<<< note
>
> Are those ranges covered by the System RAM as well?
>
>> that nodeid and zonenr are encoded in top bits of page flags which are
>> not initialized here, hence the crash :-(
>> ffffe31d01edff80 7b7fe000 0 0 0 0
>> ffffe31d01edffc0 7b7ff000 0 0 1 1fffff00000000
>> ffffe31d01ee0000 7b800000 0 0 1 1fffff00000000
>> ffffe31d01ffffc0 7ffff000 0 0 1 1fffff00000000
>
> It is still not clear why not to do the alignment in
> memblock_next_valid_pfn rather than its caller.
As it's the mem init which needs it to be aligned. Other callers may
not, possibly?
Not that there are any other callers at the moment, so it really does
not matter where it is placed. The only difference would be ending the
loop with end_pfn vs an aligned end_pfn. And it looks like the pure
(unaligned) end_pfn would be preferred here. Want me to send a v2?
> --
> Michal Hocko
> SUSE Labs
On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko <[email protected]> wrote:
> On Thu 01-03-18 16:09:35, Daniel Vacek wrote:
>> From registers and stack I digged start_page points to
>> ffffe31d01ed8000 (note that this is
>> page ffffe31d01edffc0 aligned to pageblock) and I can see this in memory dump:
>>
>> crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
>> 7b800000 7ffff000 80000000
>> PAGE PHYSICAL MAPPING INDEX CNT FLAGS
>> ffffe31d01e00000 78000000 0 0 0 0
>> ffffe31d01ed7fc0 7b5ff000 0 0 0 0
>> ffffe31d01ed8000 7b600000 0 0 0 0 <<<< note
>
> Are those ranges covered by the System RAM as well?
Sorry, I forgot to answer this. If they were, the loop wouldn't be
skipping them, right? But it really does not matter here; the kernel
needs (some) page structures initialized anyway. And I do not feel
comfortable with removing the VM_BUG_ON(). The initialization is what
changed with commit b92df1de5d28, hence fixing this.
--nX
>> that nodeid and zonenr are encoded in top bits of page flags which are
>> not initialized here, hence the crash :-(
>> ffffe31d01edff80 7b7fe000 0 0 0 0
>> ffffe31d01edffc0 7b7ff000 0 0 1 1fffff00000000
>> ffffe31d01ee0000 7b800000 0 0 1 1fffff00000000
>> ffffe31d01ffffc0 7ffff000 0 0 1 1fffff00000000
>
> It is still not clear why not to do the alignment in
> memblock_next_valid_pfn rather than its caller.
> --
> Michal Hocko
> SUSE Labs
On Thu, 1 Mar 2018 17:20:04 +0100 Daniel Vacek <[email protected]> wrote:
> Want me to send a v2?
Yes please ;)
On Thu, Mar 1, 2018 at 5:20 PM, Daniel Vacek <[email protected]> wrote:
> On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko <[email protected]> wrote:
>> It is still not clear why not to do the alignment in
>> memblock_next_valid_pfn rather than its caller.
>
> As it's the mem init which needs it to be aligned. Other callers may
> not, possibly?
> Not that there are any other callers at the moment so it really does
> not matter where it is placed. The only difference would be the end of
> the loop with end_pfn vs aligned end_pfn. And it looks like the pure
> (unaligned) end_pfn would be preferred here. Want me to send a v2?
Thinking about it again, memblock has nothing to do with pageblock. And
the function name suggests one shall get the next valid pfn, not
something totally unrelated to memblock. So that's what it returns.
It's the mem init which needs the alignment, and hence the mem init
aligns it for its own purposes. I'd call this the correct design.
To deal with the end_pfn special case I'd actually get rid of it
completely and hardcode -1UL as the max pfn instead (rather than 0).
The caller should handle the max pfn as an error, or as the end of the
loop, as in this case.
I'll send a v2 with this implemented.
Paul> Why is it based on memblock, actually? Wouldn't a generic
mem_section solution work satisfactorily for you? That would be
naturally aligned with the whole section (doing a bit more work as a
result, in the end) and also independent of
CONFIG_HAVE_MEMBLOCK_NODE_MAP availability.
>> --
>> Michal Hocko
>> SUSE Labs
On Thu 01-03-18 17:20:04, Daniel Vacek wrote:
> On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko <[email protected]> wrote:
> > On Thu 01-03-18 16:09:35, Daniel Vacek wrote:
> > [...]
> >> $ grep 7b7ff000 /proc/iomem
> >> 7b7ff000-7b7fffff : System RAM
> > [...]
> >> After commit b92df1de5d28 machine eventually crashes with:
> >>
> >> BUG at mm/page_alloc.c:1913
> >>
> >> > VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
> >
> > This is an important information that should be in the changelog.
>
> And that's exactly what my seven very first words tried to express in
> human readable form instead of mechanically pasting the source code. I
> guess that's a matter of preference. Though I see grepping later can
> be an issue here.
Do not get me wrong, I do not want to nag just for the fun of it. The
changelog should be really clear about the problem. What might be clear
to you based on the debugging might not be so clear to others. And the
struct page initialization code is far from trivial, especially when we
have different alignment requirements from the memory model and the
page allocator.
Therefore being as clear as possible is really valuable. So I would
really love to see the changelog contain:
- What is going on - VM_BUG_ON in move_freepages along with the crash
report
- memory ranges exported by BIOS/FW
- explain why is the pageblock alignment the proper one. How does the
range look from the memory section POV (with SPARSEMEM).
- What about those unaligned pages which are not backed by any memory?
Are they reserved so that they will never get used?
And just to be clear, I am not saying your patch is wrong. It just
raises more questions than answers and I suspect it just papers over
some more fundamental problem. I might be clearly wrong and I cannot
devote more time to this for the next week because I will be offline,
but I would _really_ appreciate it if this all got explained.
Thanks!
--
Michal Hocko
SUSE Labs
On Fri, Mar 2, 2018 at 2:01 PM, Michal Hocko <[email protected]> wrote:
> On Thu 01-03-18 17:20:04, Daniel Vacek wrote:
>> On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko <[email protected]> wrote:
>> > On Thu 01-03-18 16:09:35, Daniel Vacek wrote:
>> > [...]
>> >> $ grep 7b7ff000 /proc/iomem
>> >> 7b7ff000-7b7fffff : System RAM
>> > [...]
>> >> After commit b92df1de5d28 machine eventually crashes with:
>> >>
>> >> BUG at mm/page_alloc.c:1913
>> >>
>> >> > VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
>> >
>> > This is an important information that should be in the changelog.
>>
>> And that's exactly what my seven very first words tried to express in
>> human readable form instead of mechanically pasting the source code. I
>> guess that's a matter of preference. Though I see grepping later can
>> be an issue here.
>
> Do not get me wrong I do not want to nag just for fun of it. The
> changelog should be really clear about the problem. What might be clear
> to you based on the debugging might not be so clear to others. And the
> struct page initialization code is far from trivial especially when we
> have different alignment requirements by the memory model and the page
> allocator.
I get it. I didn't mean to be rude or anything. I just thought I
covered all the relevant details.
> Therefore being as clear as possible is really valuable. So I would
> really love to see the changelog to contain.
> - What is going on - VM_BUG_ON in move_freepages along with the crash
> report
I'll put more details there.
> - memory ranges exported by BIOS/FW
They were not mentioned as they are not really relevant. Any e820 map
can have issues. So far I have only seen reports on a few selected
machines, mostly LENOVO System x3650 M5, some FUJITSU, some Cisco
blades. But the map is always fairly normal. IIUC, the bug only happens
if a range which is not pageblock aligned happens to be the first one
in a zone, or follows a non-populated section.
Again, none of that is really relevant. What is relevant is that commit
b92df1de5d28 changes the way page structures are initialized, so that
with some perfectly fine maps from the BIOS the kernel can now crash as
a result. And my fix tries to keep at least the bare minimum of the
original behavior needed to keep the kernel stable.
> - explain why is the pageblock alignment the proper one. How does the
> range look from the memory section POV (with SPARSEMEM).
The commit message explains that: "the same way as in
move_freepages_block()", to quote myself. The alignment in that
function is the one causing the crash, as the VM_BUG_ON() assert in the
subsequent move_freepages() is checking the (now) uninitialized
structure. If we follow the same alignment, the initialization will not
get skipped for that structure. Again, this is partially restoring the
original behavior rather than rewriting move_freepages{,_block}() to
not crash with data they were not designed for.
I'll try to explain this more transparently in the commit message.
Alternatively you can just revert the b92df1de5d28. That will fix the
crashes as well.
> - What about those unaligned pages which are not backed by any memory?
> Are they reserved so that they will never get used?
They are handled the same way as they used to be before b92df1de5d28.
This patch does not change or touch anything in this regard. Or am
I wrong?
> And just to be clear. I am not saying your patch is wrong. It just
You better not. My patch is totally correct :p
(I hope)
> raises more questions than answers and I suspect it just papers over
> some more fundamental problem. I might be clearly wrong and I cannot
I see. Thank you for looking into it. It's appreciated. I would not
call it a fundamental problem, rather a design decision in
move_freepages{,_block}() which I'd vote for keeping for now. Hopefully
I explained it above.
> devote more time to this for the next week because I will be offline
Enjoy your time off.
> but I would _really_ appreciate if this all got explained.
I'll do my best.
> Thanks!
> --
> Michal Hocko
> SUSE Labs
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") introduced a bug where move_freepages() triggers a
VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
To fix this, simply align the skipped pfns in memmap_init_zone()
the same way as in move_freepages_block().
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <[email protected]>
Cc: [email protected]
---
mm/memblock.c | 13 ++++++-------
mm/page_alloc.c | 9 +++++++--
2 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 5a9ca2a1751b..2a5facd236bb 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1101,13 +1101,12 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
*out_nid = r->nid;
}
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
- unsigned long max_pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
{
struct memblock_type *type = &memblock.memory;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
- phys_addr_t addr = PFN_PHYS(pfn + 1);
+ phys_addr_t addr = PFN_PHYS(++pfn);
do {
mid = (right + left) / 2;
@@ -1118,15 +1117,15 @@ unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
type->regions[mid].size))
left = mid + 1;
else {
- /* addr is within the region, so pfn + 1 is valid */
- return min(pfn + 1, max_pfn);
+ /* addr is within the region, so pfn is valid */
+ return pfn;
}
} while (left < right);
if (right == type->cnt)
- return max_pfn;
+ return -1UL;
else
- return min(PHYS_PFN(type->regions[right].base), max_pfn);
+ return PHYS_PFN(type->regions[right].base);
}
/**
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb416723538f..eb27ccb50928 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
- * on our next iteration of the loop.
+ * on our next iteration of the loop. Note that it needs
+ * to be pageblock aligned even when the region itself
+ * is not as move_freepages_block() can shift ahead of
+ * the valid region but still depends on correct page
+ * metadata.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = (memblock_next_valid_pfn(pfn) &
+ ~(pageblock_nr_pages-1)) - 1;
#endif
continue;
}
--
2.16.2
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") introduced a bug where move_freepages() triggers a
VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
To fix this, simply align the skipped pfns in memmap_init_zone()
the same way as in move_freepages_block().
From one of the RHEL reports:
crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
kernel BUG at mm/page_alloc.c:1389!
invalid opcode: 0000 [#1] SMP
--
RIP: 0010:[<ffffffff8118833e>] [<ffffffff8118833e>] move_freepages+0x15e/0x160
RSP: 0018:ffff88054d727688 EFLAGS: 00010087
--
Call Trace:
[<ffffffff811883b3>] move_freepages_block+0x73/0x80
[<ffffffff81189e63>] __rmqueue+0x263/0x460
[<ffffffff8118c781>] get_page_from_freelist+0x7e1/0x9e0
[<ffffffff8118caf6>] __alloc_pages_nodemask+0x176/0x420
--
RIP [<ffffffff8118833e>] move_freepages+0x15e/0x160
RSP <ffff88054d727688>
crash> page_init_bug -v | grep RAM
<struct resource 0xffff88067fffd2f8> 1000 - 9bfff System RAM (620.00 KiB)
<struct resource 0xffff88067fffd3a0> 100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
<struct resource 0xffff88067fffd410> 4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB)
<struct resource 0xffff88067fffd480> 4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct resource 0xffff88067fffd640> 100000000 - 67fffffff System RAM ( 22.00 GiB)
crash> page_init_bug | head -6
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct page 0xffffea0001ede200> 1fffff00000000 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
<struct page 0xffffea0001ede200> 505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
<struct page 0xffffea0001ed8000> 0 0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA 1 4095
<struct page 0xffffea0001edffc0> 1fffff00000400 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
BUG, zones differ!
Note that this range follows two non-populated sections 68000000-77ffffff
in this zone. 7b788000-7b7fffff is the first one after the gap. This makes
memmap_init_zone() skip all the pfns up to the beginning of this range.
But this range is not pageblock (2M) aligned. In fact no range has to be.
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001ed7fc0 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 0 0 <<<<
ffffea0001ede1c0 7b787000 0 0 0 0
ffffea0001ede200 7b788000 0 0 1 1fffff00000000
The top part of the page flags should contain the nodeid and zonenr,
which is not the case for page ffffea0001ed8000 here (<<<<).
crash> log | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded20
fffea0001edffc0
crash> bt -r | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded00
fffea0001eded20
fffea0001edffc0
Initialization of the whole beginning of the section is skipped up to the
start of the range due to commit b92df1de5d28. Now any code calling
move_freepages_block() (like reusing a page from a freelist, as in this
example) with a page from the beginning of the range will get the page
rounded down to start_page ffffea0001ed8000 and passed to move_freepages(),
which crashes on the assertion as it reads a wrong zonenr.
> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Note, page_zone() derives the zone from the page flags here.
From a similar machine before commit b92df1de5d28:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff73941e00000 78000000 0 0 1 1fffff00000000
fffff73941ed7fc0 7b5ff000 0 0 1 1fffff00000000
fffff73941ed8000 7b600000 0 0 1 1fffff00000000
fffff73941edff80 7b7fe000 0 0 1 1fffff00000000
fffff73941edffc0 7b7ff000 ffff8e67e04d3ae0 ad84 1 1fffff00020068 uptodate,lru,active,mappedtodisk
All the pages since the beginning of the section are initialized, so
move_freepages() is not gonna blow up.
The same machine with this fix applied:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001ed7fc0 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 1 1fffff00000000
ffffea0001edff80 7b7fe000 0 0 1 1fffff00000000
ffffea0001edffc0 7b7ff000 ffff88017fb13720 8 2 1fffff00020068 uptodate,lru,active,mappedtodisk
At least the bare minimum of pages is initialized, preventing the crash as well.
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <[email protected]>
Cc: [email protected]
---
mm/page_alloc.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f2c57da5bbe5..eb27ccb50928 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
- * on our next iteration of the loop.
+ * on our next iteration of the loop. Note that it needs
+ * to be pageblock aligned even when the region itself
+ * is not. move_freepages_block() can shift ahead of
+ * the valid region but still depends on correct page
+ * metadata.
*/
- pfn = memblock_next_valid_pfn(pfn) - 1;
+ pfn = (memblock_next_valid_pfn(pfn) &
+ ~(pageblock_nr_pages-1)) - 1;
#endif
continue;
}
--
2.16.2
The kernel can crash on a failed VM_BUG_ON() assertion in
move_freepages() with some rare physical memory mappings (with huge
range(s) of memory reserved by the BIOS followed by usable memory not
aligned to a pageblock).
crash> page_init_bug -v | grep resource | sed '/RAM .3/,/RAM .4/!d'
<struct resource 0xffff88067fffd480> 4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
<struct resource 0xffff88067fffd4b8> 646b2000 - 793fefff reserved (333.30 MiB = 341300.00 KiB)
<struct resource 0xffff88067fffd4f0> 793ff000 - 7b3fefff ACPI Non-volatile Storage ( 32.00 MiB)
<struct resource 0xffff88067fffd528> 7b3ff000 - 7b787fff ACPI Tables ( 3.54 MiB = 3620.00 KiB)
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
More details in second patch.
v2: Use the -1 constant for max_pfn and remove the parameter. That's
    mostly just cosmetics.
v3: Split into a two-patch series to make clear what is the actual fix
    and what is just a cleanup. No code changes compared to v2, and the
    second patch is identical to the original v1.
Cc: [email protected]
Daniel Vacek (2):
mm/memblock: hardcode the max_pfn being -1
mm/page_alloc: fix memmap_init_zone pageblock alignment
mm/memblock.c | 13 ++++++-------
mm/page_alloc.c | 9 +++++++--
2 files changed, 13 insertions(+), 9 deletions(-)
--
2.16.2
This is just a cleanup. It avoids having to handle the special end_pfn
case in the next commit.
Signed-off-by: Daniel Vacek <[email protected]>
Cc: [email protected]
---
mm/memblock.c | 13 ++++++-------
mm/page_alloc.c | 2 +-
2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 5a9ca2a1751b..2a5facd236bb 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1101,13 +1101,12 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
*out_nid = r->nid;
}
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
- unsigned long max_pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
{
struct memblock_type *type = &memblock.memory;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
- phys_addr_t addr = PFN_PHYS(pfn + 1);
+ phys_addr_t addr = PFN_PHYS(++pfn);
do {
mid = (right + left) / 2;
@@ -1118,15 +1117,15 @@ unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
type->regions[mid].size))
left = mid + 1;
else {
- /* addr is within the region, so pfn + 1 is valid */
- return min(pfn + 1, max_pfn);
+ /* addr is within the region, so pfn is valid */
+ return pfn;
}
} while (left < right);
if (right == type->cnt)
- return max_pfn;
+ return -1UL;
else
- return min(PHYS_PFN(type->regions[right].base), max_pfn);
+ return PHYS_PFN(type->regions[right].base);
}
/**
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb416723538f..f2c57da5bbe5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5361,7 +5361,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
* end_pfn), such that we hit a valid pfn (or end_pfn)
* on our next iteration of the loop.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = memblock_next_valid_pfn(pfn) - 1;
#endif
continue;
}
--
2.16.2
On Sat, 3 Mar 2018 01:12:26 +0100 Daniel Vacek <[email protected]> wrote:
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") introduced a bug where move_freepages() triggers a
> VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
b92df1de5d28 was merged a year ago. Can you suggest why this hasn't
been reported before now?
This makes me wonder whether a -stable backport is really needed...
On Sat, Mar 3, 2018 at 1:40 AM, Andrew Morton <[email protected]> wrote:
> On Sat, 3 Mar 2018 01:12:26 +0100 Daniel Vacek <[email protected]> wrote:
>
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") introduced a bug where move_freepages() triggers a
>> VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
>
> b92df1de5d28 was merged a year ago. Can you suggest why this hasn't
> been reported before now?
Yeah. I was surprised myself I couldn't find a fix to backport to
RHEL. But actually customers started to report this as soon as 7.4
(where b92df1de5d28 was merged in RHEL) was released. I remember
reports from September/October-ish times. It's not easily reproduced
and happens on a handful of machines only. I guess that's why. But
that does not make it less serious, I think.
Though there actually is a report here:
https://bugzilla.kernel.org/show_bug.cgi?id=196443
And there are reports for Fedora from July:
https://bugzilla.redhat.com/show_bug.cgi?id=1473242
and CentOS: https://bugs.centos.org/view.php?id=13964
and we internally track several dozen reports for RHEL bug
https://bugzilla.redhat.com/show_bug.cgi?id=1525121
Enough? ;-)
> This makes me wonder whether a -stable backport is really needed...
For some machines it definitely is. Won't hurt either, IMHO.
--nX
Hi,
I couldn't find the exact mail corresponding to the patch merged in v4.16-rc5,
but commit 864b75f9d6b01 ("mm/page_alloc: fix memmap_init_zone pageblock
alignment") causes a boot hang on my ARM64 platform.
Log:
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem
0x0000000000000000-0x00000009ffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x9fffcb480-0x9fffccf7f]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000080000000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x00000009ffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080000000-0x00000000f8f9afff]
[ 0.000000] node 0: [mem 0x00000000f8f9b000-0x00000000f908ffff]
[ 0.000000] node 0: [mem 0x00000000f9090000-0x00000000f914ffff]
[ 0.000000] node 0: [mem 0x00000000f9150000-0x00000000f920ffff]
[ 0.000000] node 0: [mem 0x00000000f9210000-0x00000000f922ffff]
[ 0.000000] node 0: [mem 0x00000000f9230000-0x00000000f95bffff]
[ 0.000000] node 0: [mem 0x00000000f95c0000-0x00000000fe58ffff]
[ 0.000000] node 0: [mem 0x00000000fe590000-0x00000000fe5cffff]
[ 0.000000] node 0: [mem 0x00000000fe5d0000-0x00000000fe5dffff]
[ 0.000000] node 0: [mem 0x00000000fe5e0000-0x00000000fe62ffff]
[ 0.000000] node 0: [mem 0x00000000fe630000-0x00000000feffffff]
[ 0.000000] node 0: [mem 0x0000000880000000-0x00000009ffffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000009ffffffff]
[...]
On 12 March 2018 at 17:56, Sudeep Holla <[email protected]> wrote:
> Hi,
>
> I couldn't find the exact mail corresponding to the patch merged in v4.16-rc5
> but commit 864b75f9d6b01 "mm/page_alloc: fix memmap_init_zone
> pageblock alignment"
> cause boot hang on my ARM64 platform.
I have also noticed this problem on hi6220 Hikey - arm64.
LKFT: linux-next: Hikey boot failed linux-next-20180308
https://bugs.linaro.org/show_bug.cgi?id=3676
- Naresh
> [...]
On Mon, Mar 12, 2018 at 3:49 PM, Naresh Kamboju
<[email protected]> wrote:
> On 12 March 2018 at 17:56, Sudeep Holla <[email protected]> wrote:
>> Hi,
>>
>> I couldn't find the exact mail corresponding to the patch merged in v4.16-rc5
>> but commit 864b75f9d6b01 "mm/page_alloc: fix memmap_init_zone
>> pageblock alignment"
>> cause boot hang on my ARM64 platform.
>
> I have also noticed this problem on hi6220 Hikey - arm64.
>
> LKFT: linux-next: Hikey boot failed linux-next-20180308
> https://bugs.linaro.org/show_bug.cgi?id=3676
>
> - Naresh
>> [...]
Hmm, does it step back perhaps?
Can you check if below cures the boot hang?
--nX
~~~~
neelx@metal:~/nX/src/linux$ git diff
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3d974cb2a1a1..415571120bbd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5365,8 +5365,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			 * the valid region but still depends on correct page
 			 * metadata.
 			 */
-			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
+			unsigned long next_pfn;
+			next_pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
 					~(pageblock_nr_pages-1)) - 1;
+			pfn = max(next_pfn, pfn);
 #endif
 			continue;
 		}
~~~~
On 12/03/18 16:51, Daniel Vacek wrote:
[...]
>
> Hmm, does it step back perhaps?
>
> Can you check if below cures the boot hang?
>
Yes it does fix the boot hang.
--
Regards,
Sudeep
On 12 March 2018 at 22:21, Daniel Vacek <[email protected]> wrote:
> On Mon, Mar 12, 2018 at 3:49 PM, Naresh Kamboju
> <[email protected]> wrote:
>> [...]
>
> Hmm, does it step back perhaps?
>
> Can you check if below cures the boot hang?
>
> --nX
>
> ~~~~
> neelx@metal:~/nX/src/linux$ git diff
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3d974cb2a1a1..415571120bbd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5365,8 +5365,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  			 * the valid region but still depends on correct page
>  			 * metadata.
>  			 */
> -			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
> +			unsigned long next_pfn;
> +			next_pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>  					~(pageblock_nr_pages-1)) - 1;
> +			pfn = max(next_pfn, pfn);
>  #endif
>  			continue;
>  		}
After applying this patch on linux-next, the boot hang problem is resolved.
Now the hi6220-hikey boots successfully.
Thank you.
- Naresh
> ~~~~
On Tue, Mar 13, 2018 at 7:34 AM, Naresh Kamboju
<[email protected]> wrote:
> On 12 March 2018 at 22:21, Daniel Vacek <[email protected]> wrote:
>> [...]
>
> After applying this patch on linux-next the boot hang problem resolved.
> Now the hi6220-hikey is booting successfully.
> Thank you.
Thank you and Sudeep for testing. I've just sent Andrew a formal patch.