2015-08-25 22:42:06

by Yinghai Lu

Subject: [PATCH] mm: Check if section present during memory block (un)registering

Tony found that on his setup a memory block size of 512M causes a
crash during boot:

BUG: unable to handle kernel paging request at ffffea0074000020
IP: [<ffffffff81670527>] get_nid_for_pfn+0x17/0x40
PGD 128ffcb067 PUD 128ffc9067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 #1
...
Call Trace:
[<ffffffff81453b56>] ? register_mem_sect_under_node+0x66/0xe0
[<ffffffff81453eeb>] register_one_node+0x17b/0x240
[<ffffffff81b1f1ed>] ? pci_iommu_alloc+0x6e/0x6e
[<ffffffff81b1f229>] topology_init+0x3c/0x95
[<ffffffff8100213d>] do_one_initcall+0xcd/0x1f0

The system has non-contiguous RAM address ranges:
BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable
BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable
BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable
BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable
BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable

So the leading sections of a memory block may not be present.
For example:
memory block : [0x2c00000000, 0x2c20000000) 512M
RAM in this block starts at 0x2c18000000, so the first three 128M
sections are not present.
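
The arithmetic behind this example can be checked with a small userspace
sketch. It is not kernel code: the 128M section size is the x86-64
sparsemem value, the 512M block size is taken from the report, and
section_has_ram() and leading_absent_sections() are hypothetical helpers
that treat a section as present whenever a usable e820 range overlaps it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 128M sections (x86-64) and the 512M block from the report. */
#define SECTION_SIZE (128ULL << 20)
#define BLOCK_SIZE   (512ULL << 20)

struct ram_range {
	uint64_t start;
	uint64_t end;	/* inclusive, as printed in the e820 map */
};

/* The "usable" e820 ranges quoted in the report. */
static const struct ram_range e820_usable[] = {
	{ 0x1300000000ULL, 0x1cffffffffULL },
	{ 0x1d70000000ULL, 0x1ec7ffefffULL },
	{ 0x1f00000000ULL, 0x2bffffffffULL },
	{ 0x2c18000000ULL, 0x2d6fffefffULL },
	{ 0x2e00000000ULL, 0x39ffffffffULL },
};

/* Hypothetical stand-in for a presence check: does any usable RAM overlap? */
static bool section_has_ram(uint64_t sec_start)
{
	uint64_t sec_end = sec_start + SECTION_SIZE - 1;
	size_t i;

	for (i = 0; i < sizeof(e820_usable) / sizeof(e820_usable[0]); i++)
		if (e820_usable[i].start <= sec_end &&
		    e820_usable[i].end >= sec_start)
			return true;
	return false;
}

/* Count the leading sections without RAM in the block containing addr. */
static unsigned int leading_absent_sections(uint64_t addr)
{
	uint64_t block_start;
	unsigned int n = 0;

	/* The mask below only works for power-of-two block sizes. */
	assert((BLOCK_SIZE & (BLOCK_SIZE - 1)) == 0);
	block_start = addr & ~(BLOCK_SIZE - 1);

	while (n < BLOCK_SIZE / SECTION_SIZE &&
	       !section_has_ram(block_start + n * SECTION_SIZE))
		n++;
	return n;
}
```

Calling leading_absent_sections(0x2c18000000ULL) rounds the address down
to the block start 0x2c00000000 and walks the block's four sections; the
first three fall into the hole between 0x2bffffffff and 0x2c18000000.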

The current register_mem_sect_under_node() assumes the first section is
present, but the memory block's section number range
[start_section_nr, end_section_nr] can include sections that are not
present.

On architectures that support vmemmap, we don't set up the memmap (the
struct page area) for sections that are not present, so reading a
struct page there faults.

So skip the pfn ranges that belong to sections that are not present.
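
The skip can be tried out in userspace with the sketch below. It reuses
the kernel's names for readability, but every definition here is a
simplified stand-in: PAGES_PER_SECTION assumes 128M sections with 4K
pages, the presence map is a hypothetical four-section block whose first
three sections are absent, and walk_block() only counts pfns instead of
calling get_nid_for_pfn().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed: 128M sections with 4K pages, as on x86-64. */
#define PAGES_PER_SECTION 32768UL
/* Userspace stand-in for the kernel macro; y must be a power of two. */
#define round_down(x, y) ((x) & ~((y) - 1))

/* Hypothetical presence map for one four-section memory block. */
static const bool section_present[4] = { false, false, false, true };

static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn / PAGES_PER_SECTION;
}

static bool present_section_nr(unsigned long nr)
{
	return nr < 4 && section_present[nr];
}

/*
 * Walk sections [start_nr, end_nr] the way the patched loop does.
 * Returns how many pfns would reach get_nid_for_pfn() and stores the
 * number of loop iterations in *iterations.
 */
static unsigned long walk_block(unsigned long start_nr, unsigned long end_nr,
				unsigned long *iterations)
{
	unsigned long sect_start_pfn = start_nr * PAGES_PER_SECTION;
	unsigned long sect_end_pfn = end_nr * PAGES_PER_SECTION +
				     PAGES_PER_SECTION - 1;
	unsigned long pfn, visited = 0;

	*iterations = 0;
	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
		(*iterations)++;
		if (!present_section_nr(pfn_to_section_nr(pfn))) {
			/* Jump to the last pfn of this section; pfn++ moves on. */
			pfn = round_down(pfn + PAGES_PER_SECTION,
					 PAGES_PER_SECTION) - 1;
			continue;
		}
		/* Only pfns in present sections get this far. */
		assert(present_section_nr(pfn_to_section_nr(pfn)));
		visited++;
	}
	return visited;
}
```

For sections 0 to 3 of this map, each absent section costs a single
iteration, and only the 32768 pfns of the one present section are
visited.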

Also fix unregister_mem_sect_under_nodes().

Reported-by: Tony Luck <[email protected]>
Tested-by: Tony Luck <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 31df474d..cc910ad 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -390,8 +390,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
 	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	sect_end_pfn += PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
-		int page_nid;
+		int page_nid, scn_nr;
 
+		scn_nr = pfn_to_section_nr(pfn);
+		if (!present_section_nr(scn_nr)) {
+			pfn = round_down(pfn + PAGES_PER_SECTION,
+					 PAGES_PER_SECTION) - 1;
+			continue;
+		}
 		page_nid = get_nid_for_pfn(pfn);
 		if (page_nid < 0)
 			continue;
@@ -426,10 +432,18 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 		return -ENOMEM;
 	nodes_clear(*unlinked_nodes);
 
-	sect_start_pfn = section_nr_to_pfn(phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
+	sect_end_pfn += PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
-		int nid;
+		int nid, scn_nr;
+
+		scn_nr = pfn_to_section_nr(pfn);
+		if (!present_section_nr(scn_nr)) {
+			pfn = round_down(pfn + PAGES_PER_SECTION,
+					 PAGES_PER_SECTION) - 1;
+			continue;
+		}
 
 		nid = get_nid_for_pfn(pfn);
 		if (nid < 0)


2015-08-25 23:08:37

by Andrew Morton

Subject: Re: [PATCH] mm: Check if section present during memory block (un)registering

On Tue, 25 Aug 2015 15:41:16 -0700 Yinghai Lu <[email protected]> wrote:

> Tony found on his setup, if memory block size 512M will cause crash
> during booting.
>
> BUG: unable to handle kernel paging request at ffffea0074000020
> IP: [<ffffffff81670527>] get_nid_for_pfn+0x17/0x40
> PGD 128ffcb067 PUD 128ffc9067 PMD 0
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 #1
> ...
> Call Trace:
> [<ffffffff81453b56>] ? register_mem_sect_under_node+0x66/0xe0
> [<ffffffff81453eeb>] register_one_node+0x17b/0x240
> [<ffffffff81b1f1ed>] ? pci_iommu_alloc+0x6e/0x6e
> [<ffffffff81b1f229>] topology_init+0x3c/0x95
> [<ffffffff8100213d>] do_one_initcall+0xcd/0x1f0
>
> The system has non continuous RAM address:
> BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable
> BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable
> BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable
> BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable
> BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable
>
> So there are start sections in memory block not present.
> For example:
> memory block : [0x2c18000000, 0x2c20000000) 512M
> first three sections are not present.
>
> Current register_mem_sect_under_node() assume first section is present,
> but memory block section number range [start_section_nr, end_section_nr]
> would include not present section.
>
> For arch that support vmemmap, we don't setup memmap for struct page area
> within not present sections area.
>
> So skip the pfn range that belong to not present section.
>
> Also fixes unregister_mem_sect_under_nodes().

It appears this should be backported into -stable kernels, yes? Do you
know which kernel versions need the fix?

> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -390,8 +390,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
> sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
> sect_end_pfn += PAGES_PER_SECTION - 1;
> for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
> - int page_nid;
> + int page_nid, scn_nr;
>
> + scn_nr = pfn_to_section_nr(pfn);
> + if (!present_section_nr(scn_nr)) {
> + pfn = round_down(pfn + PAGES_PER_SECTION,
> + PAGES_PER_SECTION) - 1;
> + continue;
> + }

Can we please add a comment here telling readers why this is being
done? What scenario is being detected and how it comes about.

> page_nid = get_nid_for_pfn(pfn);
> if (page_nid < 0)
> continue;

2015-08-25 23:24:25

by Tony Luck

Subject: RE: [PATCH] mm: Check if section present during memory block (un)registering

> It appears this should be backported into -stable kernels, yes? Do you
> know which kernel versions need the fix?

For my setup the problem is first seen after:

commit bdee237c0343 "x86: mm: Use 2GB memory block size on large memory x86-64 systems"

which appeared in v3.19 and forced a 2GB memory block size. But it could happen on older
systems depending on the block size picked based on the alignment of max_pfn.

Looking further back (to v3.15) ... we used a fixed MIN_MEMORY_BLOCK_SIZE for
all systems prior to

commit 982792c782ef "x86, mm: probe memory block size for generic x86 64bit"
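
The idea of picking a block size from the alignment of the end of RAM can
be illustrated with the sketch below. It is a simplified illustration
under stated assumptions, not the kernel's probing code:
pick_block_size() is a hypothetical helper, and the 128M minimum and 2G
maximum are assumed bounds.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_BLOCK_SIZE (128ULL << 20)	/* assumed: one 128M section */
#define MAX_BLOCK_SIZE (2ULL << 30)	/* assumed: 2G upper bound */

/*
 * Hypothetical helper: return the largest power-of-two block size, within
 * the assumed bounds, that evenly divides the end of RAM.
 */
static uint64_t pick_block_size(uint64_t end_of_ram)
{
	uint64_t bz = MAX_BLOCK_SIZE;

	assert(end_of_ram != 0);
	/* Halve the candidate until the end of RAM is aligned to it. */
	while (bz > MIN_BLOCK_SIZE && (end_of_ram & (bz - 1)))
		bz >>= 1;
	return bz;
}
```

A choice like this looks only at where RAM ends. Holes in the middle of
the address space play no part in it, so a block larger than one section
can begin inside a hole.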

So maybe:

Cc: [email protected] #v3.15

-Tony

2015-08-26 04:04:46

by Yinghai Lu

Subject: Re: [PATCH] mm: Check if section present during memory block (un)registering

On Tue, Aug 25, 2015 at 4:08 PM, Andrew Morton
<[email protected]> wrote:
> On Tue, 25 Aug 2015 15:41:16 -0700 Yinghai Lu <[email protected]> wrote:
>
>> Tony found on his setup, if memory block size 512M will cause crash
>> during booting.
>>
>> BUG: unable to handle kernel paging request at ffffea0074000020
>> IP: [<ffffffff81670527>] get_nid_for_pfn+0x17/0x40
>> PGD 128ffcb067 PUD 128ffc9067 PMD 0
>> Oops: 0000 [#1] SMP
>> Modules linked in:
>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 #1
>> ...
>> Call Trace:
>> [<ffffffff81453b56>] ? register_mem_sect_under_node+0x66/0xe0
>> [<ffffffff81453eeb>] register_one_node+0x17b/0x240
>> [<ffffffff81b1f1ed>] ? pci_iommu_alloc+0x6e/0x6e
>> [<ffffffff81b1f229>] topology_init+0x3c/0x95
>> [<ffffffff8100213d>] do_one_initcall+0xcd/0x1f0
>>
>> The system has non continuous RAM address:
>> BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable
>> BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable
>> BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable
>> BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable
>> BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable
>>
>> So there are start sections in memory block not present.
>> For example:
>> memory block : [0x2c18000000, 0x2c20000000) 512M
>> first three sections are not present.
>>
>> Current register_mem_sect_under_node() assume first section is present,
>> but memory block section number range [start_section_nr, end_section_nr]
>> would include not present section.
>>
>> For arch that support vmemmap, we don't setup memmap for struct page area
>> within not present sections area.
>>
>> So skip the pfn range that belong to not present section.
>>
>> Also fixes unregister_mem_sect_under_nodes().
>
> It appears this should be backported into -stable kernels, yes? Do you
> know which kernel versions need the fix?

We should add the following, per Tony's email:

Fixes: bdee237c0343 ("x86: mm: Use 2GB memory block size on large memory x86-64 systems")
Fixes: 982792c782ef ("x86, mm: probe memory block size for generic x86 64bit")
Cc: [email protected] #v3.15

>
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -390,8 +390,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
>> sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
>> sect_end_pfn += PAGES_PER_SECTION - 1;
>> for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>> - int page_nid;
>> + int page_nid, scn_nr;
>>
>> + scn_nr = pfn_to_section_nr(pfn);
>> + if (!present_section_nr(scn_nr)) {
>> + pfn = round_down(pfn + PAGES_PER_SECTION,
>> + PAGES_PER_SECTION) - 1;
>> + continue;
>> + }
>
> Can we please add a comment here telling readers why this is being
> done? What scenario is being detected and how it comes about.
>

Yes, we should add:

/* skip pfn range from absent memory section */