2014-01-23 22:49:35

by Dave Hansen

Subject: Panic on 8-node system in memblock_virt_alloc_try_nid()

Linus's current tree doesn't boot on an 8-node/1TB NUMA system that I
have. Its reboots are *LONG*, so I haven't fully bisected it, but it's
down to just a few commits, most of which are changes to the memblock
code. Since the panic is in the memblock code, it looks like a
no-brainer. It's almost certainly the code from Santosh or Grygorii
that's triggering this.

Config and good/bad dmesg with memblock=debug are here:

http://sr71.net/~dave/intel/3.13/

Please let me know if you need it bisected further than this.

The remaining commits are these:

> commit 4883e997b26ed857da8dae6a6e6aeb12830b978d
> commit 560dca27a6b36015e4f69a4ceba0ee5be0707c17
> commit 9a28f9dc8d10b619af9a37b1e27c41ada5415629
> commit b6cb5bab263791d09abe88f24df6c2da53415320
> commit cfb665864e54ee7a160750b4815bfe6b7eb13d0d
> commit 9233d2be108f573caa21eb450411bf8fa68cadbb
> commit 4fc0bc58cb7d983e55baa8dcbb7c1a4ee54e65be
> commit 9e43aa2b8d1cb3137bd7e60d5fead83d0569de2b
> commit 999c17e3de4855af4e829c0871ad32fc76a93991
> commit 0d036e9e33df8befa9348683ba68258fee7f0a00
> commit 8b89a1169437541a2a9b62c8f7b1a5c0ceb0fbde
> commit bb016b84164554725899aef544331085e08cb402
> commit c15295001aa940df4e3cf6574808a4addca9f2e5
> commit 457ff1de2d247d9b8917c4664c2325321a35e313
> commit c2f69cdafebb3a46e43b5ac57ca12b539a2c790f
> commit 6782832eba5e8c87a749a41da8deda1c3ef67ba0
> commit 9da791dfabc60218c81904c7906b45789466e68e
> commit 098b081b50d5eb8c7e0200a4770b0bcd28eab9ce
> commit 26f09e9b3a0696f6fe20b021901300fba26fb579
> commit b115423357e0cda6d8f45d0c81df537d7b004020
> commit 87029ee9390b2297dae699d5fb135b77992116e5
> commit 79f40fab0b3a78e0e41fac79a65a9870f4b05652
> commit 869a84e1ca163b737236dae997db4a6a1e230b9b
> commit 10e89523bf5aade79081f501452fe7f1a16fa189
> commit fd615c4e671979e3e362df537d6be38f8d27aa80
> commit 5b6e529521d35e1bcaa0fe43456d1bbb335cae5d

The oops I see is this:

> [ 0.000000] Kernel panic - not syncing: : Failed to allocate 2143289344 bytes align=0x200000 nid=0 from=0x1000000 max_addr=0x0
> [ 0.000000]
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.13.0-slub-03995-g0dc3fd0-dirty #816
> [ 0.000000] Hardware name: FUJITSU-SV PRIMEQUEST 1800E2/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.24 09/14/2011
> [ 0.000000] 0000000001000000 ffffffff81c01ce8 ffffffff81706941 0000000000000687
> [ 0.000000] ffffffff81a30b48 ffffffff81c01d68 ffffffff817029de 0000000000000000
> [ 0.000000] 0000000000000030 ffffffff81c01d80 ffffffff81c01d18 ffffffff81c01d68
> [ 0.000000] Call Trace:
> [ 0.000000] [<ffffffff81706941>] dump_stack+0x4e/0x68
> [ 0.000000] [<ffffffff817029de>] panic+0xbb/0x1cb
> [ 0.000000] [<ffffffff81d3bef9>] memblock_virt_alloc_try_nid+0xa1/0xa1
> [ 0.000000] [<ffffffff816ff5f9>] __earlyonly_bootmem_alloc.constprop.0+0x21/0x28
> [ 0.000000] [<ffffffff81d3cf27>] sparse_mem_maps_populate_node+0x34/0x132
> [ 0.000000] [<ffffffff81d3cbd3>] ? alloc_usemap_and_memmap+0x10f/0x10f
> [ 0.000000] [<ffffffff81d3cbdc>] sparse_early_mem_maps_alloc_node+0x9/0xb
> [ 0.000000] [<ffffffff81d3cb96>] alloc_usemap_and_memmap+0xd2/0x10f
> [ 0.000000] [<ffffffff81d3ce29>] sparse_init+0x85/0x14f
> [ 0.000000] [<ffffffff81d2adbb>] paging_init+0x13/0x22
> [ 0.000000] [<ffffffff81d1b521>] setup_arch+0xb51/0xc6e
> [ 0.000000] [<ffffffff81703150>] ? printk+0x4d/0x4f
> [ 0.000000] [<ffffffff81d14b1a>] start_kernel+0x85/0x3db
> [ 0.000000] [<ffffffff81d145a8>] x86_64_start_reservations+0x2a/0x2c
> [ 0.000000] [<ffffffff81d1469a>] x86_64_start_kernel+0xf0/0xf7
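
For scale, the failing request is roughly 2 GiB: with 1 TB split across 8 nodes
each node covers about 128 GiB, and (assuming 4 KiB pages and a 64-byte struct
page, which depends on the config) its mem_map[] needs 128 GiB / 4 KiB * 64 B
= 2 GiB. 2143289344 bytes is just under that.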


2014-01-24 00:28:04

by Dave Hansen

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

I've got a second failure mode, too, also memblock-related, on the same
system but with a different config. In this one, the memblock code looks to
have returned an address for which there is no virtual mapping. The PMD
is clear.
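
The access pattern involved is roughly the following (a paraphrase of
numa_alloc_distance()/numa_set_distance(), not the exact kernel code): the
distance table is carved out of memblock as a physical range and then written
through its direct-map address, so if the direct mapping for that physical
range was never populated, the very first store faults with a clear PMD,
exactly as in the oops below.

    /* rough paraphrase only, not the exact kernel code */
    phys = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped), size, PAGE_SIZE);
    memblock_reserve(phys, size);
    numa_distance = __va(phys);             /* direct-map alias of the allocation */
    numa_distance[i * cnt + j] = distance;  /* faults if the direct map for
                                             * __va(phys) was never set up */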

> [ 0.000000] memblock_find_in_range_node():239
> [ 0.000000] __memblock_find_range_top_down():150
> [ 0.000000] __memblock_find_range_top_down():152 i: 600000001
> [ 0.000000] memblock_find_in_range_node():241 ret: 2147479552
> [ 0.000000] memblock_reserve: [0x0000007ffff000-0x0000007ffff03f] flags 0x0 numa_set_distance+0xd2/0x252
> [ 0.000000] numa_distance phys: 2147479552
> [ 0.000000] numa_distance virt: ffff88007ffff000
> [ 0.000000] numa_distance size: 64
> [ 0.000000] numa_alloc_distance() accessing numa_distance[] at byte: 0
> [ 0.000000] BUG: unable to handle kernel paging request at ffff88007ffff000
> [ 0.000000] IP: [<ffffffff81d2c1f1>] numa_set_distance+0x186/0x252
> [ 0.000000] PGD 211e067 PUD 2121067 PMD 0
> [ 0.000000] Oops: 0002 [#1] SMP
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.13.0-slub-04156-g90804ed-dirty #825
> [ 0.000000] Hardware name: FUJITSU-SV PRIMEQUEST 1800E2/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.24 09/14/2011
> [ 0.000000] task: ffffffff81c104a0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
> [ 0.000000] RIP: 0010:[<ffffffff81d2c1f1>] [<ffffffff81d2c1f1>] numa_set_distance+0x186/0x252
> [ 0.000000] RSP: 0000:ffffffff81c01cd8 EFLAGS: 00010002
> [ 0.000000] RAX: 000000000000000a RBX: 0000000000000000 RCX: 0000000000000000
> [ 0.000000] RDX: 0000000000000014 RSI: 0000000000000046 RDI: ffffffff81ea4f84
> [ 0.000000] RBP: ffffffff81c01d68 R08: 000000000000100d R09: ffff88007ffff000
> [ 0.000000] R10: 0000000000000127 R11: 000000000000000d R12: 0000000000000000
> [ 0.000000] R13: 000000000000000a R14: 0000000000000008 R15: 0000000000000001
> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81d00000(0000) knlGS:0000000000000000
> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.000000] CR2: ffff88007ffff000 CR3: 0000000001c0b000 CR4: 00000000000000b0
> [ 0.000000] Stack:
> [ 0.000000] 0000000000000000 ffffffff00000000 0000000000000000 0000004081c01dd0
> [ 0.000000] 00000000000000ff 0000000000000000 0000000000000000 0000000000000000
> [ 0.000000] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 0.000000] Call Trace:
> [ 0.000000] [<ffffffff81d2c480>] acpi_numa_slit_init+0x47/0x70
> [ 0.000000] [<ffffffff81d52c34>] ? acpi_table_print_srat_entry+0x26/0x26
> [ 0.000000] [<ffffffff81d52c9c>] acpi_parse_slit+0x68/0x6c
> [ 0.000000] [<ffffffff81d5156c>] acpi_table_parse+0x6c/0x82
> [ 0.000000] [<ffffffff81d52dcc>] acpi_numa_init+0x94/0xb0
> [ 0.000000] [<ffffffff81d2c6d9>] ? acpi_numa_arch_fixup+0x6/0x6
> [ 0.000000] [<ffffffff81d2c6d9>] ? acpi_numa_arch_fixup+0x6/0x6
> [ 0.000000] [<ffffffff81d2c6e2>] x86_acpi_numa_init+0x9/0x1b
> [ 0.000000] [<ffffffff81d2bbc2>] numa_init+0xe0/0x589
> [ 0.000000] [<ffffffff8108adba>] ? set_pte_vaddr_pud+0x3a/0x60
> [ 0.000000] [<ffffffff8108ae45>] ? set_pte_vaddr+0x65/0xa0
> [ 0.000000] [<ffffffff810902d5>] ? __native_set_fixmap+0x25/0x30
> [ 0.000000] [<ffffffff81d2c2d6>] x86_numa_init+0x19/0x2b
> [ 0.000000] [<ffffffff81d2c419>] initmem_init+0x9/0xb
> [ 0.000000] [<ffffffff81d1b2f3>] setup_arch+0x923/0xc6e
> [ 0.000000] [<ffffffff817032e0>] ? printk+0x4d/0x4f
> [ 0.000000] [<ffffffff81d14b1a>] start_kernel+0x85/0x3db
> [ 0.000000] [<ffffffff81d145a8>] x86_64_start_reservations+0x2a/0x2c
> [ 0.000000] [<ffffffff81d1469a>] x86_64_start_kernel+0xf0/0xf7
> [ 0.000000] Code: ff ff e8 c6 70 9d ff 8b 4d 80 4c 8b 8d 70 ff ff ff b0 0a 4c 03 0d a8 0a 17 00 ba 14 00 00 00 44 39 f9 0f 45 c2 49 ff c7 45 39 fe <41> 88 01 44 8b 85 78 ff ff ff 7f a0 ff c1 45 01 f0 44 39 f1 7c
> [ 0.000000] RIP [<ffffffff81d2c1f1>] numa_set_distance+0x186/0x252
> [ 0.000000] RSP <ffffffff81c01cd8>
> [ 0.000000] CR2: ffff88007ffff000
> [ 0.000000] ---[ end trace 1ac9854e9d9aedf2 ]---
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!

2014-01-24 03:43:58

by Santosh Shilimkar

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

Dave,

On Thursday 23 January 2014 05:49 PM, Dave Hansen wrote:
> Linus's current tree doesn't boot on an 8-node/1TB NUMA system that I
> have. Its reboots are *LONG*, so I haven't fully bisected it, but it's
> down to a just a few commits, most of which are changes to the memblock
> code. Since the panic is in the memblock code, it looks like a
> no-brainer. It's almost certainly the code from Santosh or Grygorii
> that's triggering this.
>
> Config and good/bad dmesg with memblock=debug are here:
>
> http://sr71.net/~dave/intel/3.13/
>
> Please let me know if you need it bisected further than this.
>
Thanks a lot for the debug information; it's pretty useful. The oops
actually seems to be a side effect of the NUMA nodes not being set up
correctly in the first place. At least the setup_node_data() results
indicate that. setup_node_data() operates on the physical memblock
interfaces, which are untouched except for the alignment change, and
that is potentially the reason for the change in behavior.

Would you be able to revert the below commit and give it a quick try to
see if the behavior changes? The revert might impact other APIs, since
they assume SMP_CACHE_BYTES as the default alignment, but I at least
want to see whether setup_node_data() reserves the correct memory
space with the revert in place.

79f40fa mm/memblock: drop WARN and use SMP_CACHE_BYTES as a default alignment

Regards,
Santosh
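
For reference, the behavioral change 79f40fa makes is roughly this
(paraphrased from its subject line, not the exact diff): a zero alignment
passed into the memblock allocators no longer trips a WARN and instead
silently defaults to a cache line.

    /* gist of 79f40fa (paraphrased), in the memblock allocation path */
    if (!align)
            align = SMP_CACHE_BYTES;  /* previously this case warned */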

2014-01-24 05:55:16

by Yinghai Lu

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Thu, Jan 23, 2014 at 2:49 PM, Dave Hansen <[email protected]> wrote:
> Linus's current tree doesn't boot on an 8-node/1TB NUMA system that I
> have. Its reboots are *LONG*, so I haven't fully bisected it, but it's
> down to a just a few commits, most of which are changes to the memblock
> code. Since the panic is in the memblock code, it looks like a
> no-brainer. It's almost certainly the code from Santosh or Grygorii
> that's triggering this.
>
> Config and good/bad dmesg with memblock=debug are here:
>
> http://sr71.net/~dave/intel/3.13/
>
> Please let me know if you need it bisected further than this.

Please check attached patch, and it should fix the problem.

Yinghai


Attachments:
fix_numa_x.patch (1.82 kB)

2014-01-24 06:38:38

by Santosh Shilimkar

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

Yinghai,

On Friday 24 January 2014 12:55 AM, Yinghai Lu wrote:
> On Thu, Jan 23, 2014 at 2:49 PM, Dave Hansen <[email protected]> wrote:
>> > Linus's current tree doesn't boot on an 8-node/1TB NUMA system that I
>> > have. Its reboots are *LONG*, so I haven't fully bisected it, but it's
>> > down to a just a few commits, most of which are changes to the memblock
>> > code. Since the panic is in the memblock code, it looks like a
>> > no-brainer. It's almost certainly the code from Santosh or Grygorii
>> > that's triggering this.
>> >
>> > Config and good/bad dmesg with memblock=debug are here:
>> >
>> > http://sr71.net/~dave/intel/3.13/
>> >
>> > Please let me know if you need it bisected further than this.
> Please check attached patch, and it should fix the problem.
>

[...]

>
> Subject: [PATCH] x86: Fix numa with reverting wrong memblock setting.
>
> Dave reported Numa on x86 is broken on system with 1T memory.
>
> It turns out
> | commit 5b6e529521d35e1bcaa0fe43456d1bbb335cae5d
> | Author: Santosh Shilimkar <[email protected]>
> | Date: Tue Jan 21 15:50:03 2014 -0800
> |
> | x86: memblock: set current limit to max low memory address
>
> set limit to low wrongly.
>
> max_low_pfn_mapped is different from max_pfn_mapped.
> max_low_pfn_mapped is always under 4G.
>
> That will memblock_alloc_nid all go under 4G.
>
> Revert that offending patch.
>
> Reported-by: Dave Hansen <[email protected]>
> Signed-off-by: Yinghai Lu <[email protected]>
>
>
This will mostly fix the $subject issue, but the regression reported by
Andrew [1] will resurface with the revert. It's clear now that even
though the commit fixed that issue, it wasn't the right fix.

It would be great if you could have a look at the thread.

Regards,
Santosh

[1] http://lkml.indiana.edu/hypermail/linux/kernel/1312.1/03770.html
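
The mechanism described in the quoted commit message, in rough terms (a
paraphrase, not the actual diff of 5b6e5295 or of its revert): memblock's
"current limit" acts as the implicit max_addr for allocations that do not
pass one explicitly, so deriving it from max_low_pfn_mapped confines every
early allocation, including the ~2 GiB per-node mem_map[], to the first 4 GiB.

    /* with the offending commit (paraphrased) -- always below 4 GiB */
    memblock_set_current_limit((u64)max_low_pfn_mapped << PAGE_SHIFT);

    /* with the revert (paraphrased) -- the whole directly mapped range */
    memblock_set_current_limit((u64)max_pfn_mapped << PAGE_SHIFT);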

2014-01-24 06:56:42

by Santosh Shilimkar

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Friday 24 January 2014 01:38 AM, Santosh Shilimkar wrote:
> Yinghai,
>
> On Friday 24 January 2014 12:55 AM, Yinghai Lu wrote:
>> On Thu, Jan 23, 2014 at 2:49 PM, Dave Hansen <[email protected]> wrote:
>>>> Linus's current tree doesn't boot on an 8-node/1TB NUMA system that I
>>>> have. Its reboots are *LONG*, so I haven't fully bisected it, but it's
>>>> down to a just a few commits, most of which are changes to the memblock
>>>> code. Since the panic is in the memblock code, it looks like a
>>>> no-brainer. It's almost certainly the code from Santosh or Grygorii
>>>> that's triggering this.
>>>>
>>>> Config and good/bad dmesg with memblock=debug are here:
>>>>
>>>> http://sr71.net/~dave/intel/3.13/
>>>>
>>>> Please let me know if you need it bisected further than this.
>> Please check attached patch, and it should fix the problem.
>>
>
> [...]
>
>>
>> Subject: [PATCH] x86: Fix numa with reverting wrong memblock setting.
>>
>> Dave reported Numa on x86 is broken on system with 1T memory.
>>
>> It turns out
>> | commit 5b6e529521d35e1bcaa0fe43456d1bbb335cae5d
>> | Author: Santosh Shilimkar <[email protected]>
>> | Date: Tue Jan 21 15:50:03 2014 -0800
>> |
>> | x86: memblock: set current limit to max low memory address
>>
>> set limit to low wrongly.
>>
>> max_low_pfn_mapped is different from max_pfn_mapped.
>> max_low_pfn_mapped is always under 4G.
>>
>> That will memblock_alloc_nid all go under 4G.
>>
>> Revert that offending patch.
>>
>> Reported-by: Dave Hansen <[email protected]>
>> Signed-off-by: Yinghai Lu <[email protected]>
>>
>>
> This mostly will fix the $subject issue but the regression
> reported by Andrew [1] will surface with the revert. Its clear
> now that even though commit fixed the issue, it wasn't the fix.
>
> Would be great if you can have a look at the thread.
>
The patch which is now commit 457ff1d {lib/swiotlb.c: use
memblock apis for early memory allocations} was breaking the
boot on Andrew's machine. Looking back at the patch now, based on your
description above, I believe the below hunk was/is the culprit.

@@ -172,8 +172,9 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
/*
* Get the overflow emergency buffer
*/
- v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
- PAGE_ALIGN(io_tlb_overflow));
+ v_overflow_buffer = memblock_virt_alloc_nopanic(
+ PAGE_ALIGN(io_tlb_overflow),
+ PAGE_SIZE);
if (!v_overflow_buffer)
return -ENOMEM;


It looks like 'v_overflow_buffer' must be allocated from low memory in this
case. Is that correct?

Regards,
Santosh
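
Context for why low memory matters here: the swiotlb buffers are bounce
buffers for devices limited to 32-bit DMA, so they have to live below 4 GiB.
alloc_bootmem_low_pages_nopanic() enforced that bound, while
memblock_virt_alloc_nopanic() only honors the global current limit. A bounded
allocation would look roughly like this (a sketch reusing the
memblock_virt_alloc_try_nid_nopanic() signature from this series; the 4 GiB
cap is an illustrative constant, and this is essentially what the helper added
later in the thread wraps):

    /* sketch only: cap the overflow buffer below 4 GiB */
    v_overflow_buffer = memblock_virt_alloc_try_nid_nopanic(
                            PAGE_ALIGN(io_tlb_overflow), PAGE_SIZE,
                            BOOTMEM_LOW_LIMIT, 0xffffffffUL /* 4 GiB */,
                            NUMA_NO_NODE);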

2014-01-24 06:57:10

by Yinghai Lu

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Thu, Jan 23, 2014 at 10:38 PM, Santosh Shilimkar
<[email protected]> wrote:
> Yinghai,
>
> On Friday 24 January 2014 12:55 AM, Yinghai Lu wrote:
>> On Thu, Jan 23, 2014 at 2:49 PM, Dave Hansen <[email protected]> wrote:
>>> > Linus's current tree doesn't boot on an 8-node/1TB NUMA system that I
>>> > have. Its reboots are *LONG*, so I haven't fully bisected it, but it's
>>> > down to a just a few commits, most of which are changes to the memblock
>>> > code. Since the panic is in the memblock code, it looks like a
>>> > no-brainer. It's almost certainly the code from Santosh or Grygorii
>>> > that's triggering this.
>>> >
>>> > Config and good/bad dmesg with memblock=debug are here:
>>> >
>>> > http://sr71.net/~dave/intel/3.13/
>>> >
>>> > Please let me know if you need it bisected further than this.
>> Please check attached patch, and it should fix the problem.
>>
>
> [...]
>
>>
>> Subject: [PATCH] x86: Fix numa with reverting wrong memblock setting.
>>
>> Dave reported Numa on x86 is broken on system with 1T memory.
>>
>> It turns out
>> | commit 5b6e529521d35e1bcaa0fe43456d1bbb335cae5d
>> | Author: Santosh Shilimkar <[email protected]>
>> | Date: Tue Jan 21 15:50:03 2014 -0800
>> |
>> | x86: memblock: set current limit to max low memory address
>>
>> set limit to low wrongly.
>>
>> max_low_pfn_mapped is different from max_pfn_mapped.
>> max_low_pfn_mapped is always under 4G.
>>
>> That will memblock_alloc_nid all go under 4G.
>>
>> Revert that offending patch.
>>
>> Reported-by: Dave Hansen <[email protected]>
>> Signed-off-by: Yinghai Lu <[email protected]>
>>
>>
> This mostly will fix the $subject issue but the regression
> reported by Andrew [1] will surface with the revert. Its clear
> now that even though commit fixed the issue, it wasn't the fix.
>
> Would be great if you can have a look at the thread.

>> [1] http://lkml.indiana.edu/hypermail/linux/kernel/1312.1/03770.html

Andrew,

Did you bisect which patch in that 23 patchset cause your system have problem?

Thanks

Yinghai

2014-01-24 07:01:21

by Andrew Morton

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Thu, 23 Jan 2014 22:57:08 -0800 Yinghai Lu <[email protected]> wrote:

> On Thu, Jan 23, 2014 at 10:38 PM, Santosh Shilimkar
> <[email protected]> wrote:
> > Yinghai,
> >
> > On Friday 24 January 2014 12:55 AM, Yinghai Lu wrote:
> >> On Thu, Jan 23, 2014 at 2:49 PM, Dave Hansen <[email protected]> wrote:
> >>> > Linus's current tree doesn't boot on an 8-node/1TB NUMA system that I
> >>> > have. Its reboots are *LONG*, so I haven't fully bisected it, but it's
> >>> > down to a just a few commits, most of which are changes to the memblock
> >>> > code. Since the panic is in the memblock code, it looks like a
> >>> > no-brainer. It's almost certainly the code from Santosh or Grygorii
> >>> > that's triggering this.
> >>> >
> >>> > Config and good/bad dmesg with memblock=debug are here:
> >>> >
> >>> > http://sr71.net/~dave/intel/3.13/
> >>> >
> >>> > Please let me know if you need it bisected further than this.
> >> Please check attached patch, and it should fix the problem.
> >>
> >
> > [...]
> >
> >>
> >> Subject: [PATCH] x86: Fix numa with reverting wrong memblock setting.
> >>
> >> Dave reported Numa on x86 is broken on system with 1T memory.
> >>
> >> It turns out
> >> | commit 5b6e529521d35e1bcaa0fe43456d1bbb335cae5d
> >> | Author: Santosh Shilimkar <[email protected]>
> >> | Date: Tue Jan 21 15:50:03 2014 -0800
> >> |
> >> | x86: memblock: set current limit to max low memory address
> >>
> >> set limit to low wrongly.
> >>
> >> max_low_pfn_mapped is different from max_pfn_mapped.
> >> max_low_pfn_mapped is always under 4G.
> >>
> >> That will memblock_alloc_nid all go under 4G.
> >>
> >> Revert that offending patch.
> >>
> >> Reported-by: Dave Hansen <[email protected]>
> >> Signed-off-by: Yinghai Lu <[email protected]>
> >>
> >>
> > This mostly will fix the $subject issue but the regression
> > reported by Andrew [1] will surface with the revert. Its clear
> > now that even though commit fixed the issue, it wasn't the fix.
> >
> > Would be great if you can have a look at the thread.
>
> >> [1] http://lkml.indiana.edu/hypermail/linux/kernel/1312.1/03770.html
>
> Andrew,
>
> Did you bisect which patch in that 23 patchset cause your system have problem?
>

Yes - it was caused by the patch which that email was replying to:
"[PATCH v3 13/23] mm/lib/swiotlb: Use memblock apis for early memory
allocations".

2014-01-24 07:04:14

by Yinghai Lu

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Thu, Jan 23, 2014 at 10:56 PM, Santosh Shilimkar
<[email protected]> wrote:
> On Friday 24 January 2014 01:38 AM, Santosh Shilimkar wrote:

> The patch which is now commit 457ff1d {lib/swiotlb.c: use
> memblock apis for early memory allocations} was the breaking the
> boot on Andrew's machine. Now if I look back the patch, based on your
> above description, I believe below hunk waS/is the culprit.
>
> @@ -172,8 +172,9 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> /*
> * Get the overflow emergency buffer
> */
> - v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
> - PAGE_ALIGN(io_tlb_overflow));
> + v_overflow_buffer = memblock_virt_alloc_nopanic(
> + PAGE_ALIGN(io_tlb_overflow),
> + PAGE_SIZE);
> if (!v_overflow_buffer)
> return -ENOMEM;
>
>
> Looks like 'v_overflow_buffer' must be allocated from low memory in this
> case. Is that correct ?

yes.

but shouldn't the change also cover the following hunk?

commit 457ff1de2d247d9b8917c4664c2325321a35e313
Author: Santosh Shilimkar <[email protected]>
Date: Tue Jan 21 15:50:30 2014 -0800

lib/swiotlb.c: use memblock apis for early memory allocations


@@ -215,13 +220,13 @@ swiotlb_init(int verbose)
bytes = io_tlb_nslabs << IO_TLB_SHIFT;

/* Get IO TLB memory from the low pages */
- vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
+ vstart = memblock_virt_alloc_nopanic(PAGE_ALIGN(bytes), PAGE_SIZE);
if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
return;

2014-01-24 07:23:02

by Santosh Shilimkar

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Friday 24 January 2014 02:04 AM, Yinghai Lu wrote:
> On Thu, Jan 23, 2014 at 10:56 PM, Santosh Shilimkar
> <[email protected]> wrote:
>> On Friday 24 January 2014 01:38 AM, Santosh Shilimkar wrote:
>
>> The patch which is now commit 457ff1d {lib/swiotlb.c: use
>> memblock apis for early memory allocations} was the breaking the
>> boot on Andrew's machine. Now if I look back the patch, based on your
>> above description, I believe below hunk waS/is the culprit.
>>
>> @@ -172,8 +172,9 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>> /*
>> * Get the overflow emergency buffer
>> */
>> - v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
>> - PAGE_ALIGN(io_tlb_overflow));
>> + v_overflow_buffer = memblock_virt_alloc_nopanic(
>> + PAGE_ALIGN(io_tlb_overflow),
>> + PAGE_SIZE);
>> if (!v_overflow_buffer)
>> return -ENOMEM;
>>
>>
>> Looks like 'v_overflow_buffer' must be allocated from low memory in this
>> case. Is that correct ?
>
> yes.
>
> but should the change like following
>
> commit 457ff1de2d247d9b8917c4664c2325321a35e313
> Author: Santosh Shilimkar <[email protected]>
> Date: Tue Jan 21 15:50:30 2014 -0800
>
> lib/swiotlb.c: use memblock apis for early memory allocations
>
>
> @@ -215,13 +220,13 @@ swiotlb_init(int verbose)
> bytes = io_tlb_nslabs << IO_TLB_SHIFT;
>
> /* Get IO TLB memory from the low pages */
> - vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
> + vstart = memblock_virt_alloc_nopanic(PAGE_ALIGN(bytes), PAGE_SIZE);
> if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
> return;
>
OK. So we need an '__alloc_bootmem_low()' equivalent memblock API. We will
try to come up with a patch for that. Thanks for the input.

Regards,
Santosh

2014-01-24 07:46:32

by Yinghai Lu

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Thu, Jan 23, 2014 at 11:22 PM, Santosh Shilimkar
<[email protected]> wrote:
> On Friday 24 January 2014 02:04 AM, Yinghai Lu wrote:
>> On Thu, Jan 23, 2014 at 10:56 PM, Santosh Shilimkar
>> <[email protected]> wrote:
>>> On Friday 24 January 2014 01:38 AM, Santosh Shilimkar wrote:
>>
>>> The patch which is now commit 457ff1d {lib/swiotlb.c: use
>>> memblock apis for early memory allocations} was the breaking the
>>> boot on Andrew's machine. Now if I look back the patch, based on your
>>> above description, I believe below hunk waS/is the culprit.
>>>
>>> @@ -172,8 +172,9 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>>> /*
>>> * Get the overflow emergency buffer
>>> */
>>> - v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
>>> - PAGE_ALIGN(io_tlb_overflow));
>>> + v_overflow_buffer = memblock_virt_alloc_nopanic(
>>> + PAGE_ALIGN(io_tlb_overflow),
>>> + PAGE_SIZE);
>>> if (!v_overflow_buffer)
>>> return -ENOMEM;
>>>
>>>
>>> Looks like 'v_overflow_buffer' must be allocated from low memory in this
>>> case. Is that correct ?
>>
>> yes.
>>
>> but should the change like following
>>
>> commit 457ff1de2d247d9b8917c4664c2325321a35e313
>> Author: Santosh Shilimkar <[email protected]>
>> Date: Tue Jan 21 15:50:30 2014 -0800
>>
>> lib/swiotlb.c: use memblock apis for early memory allocations
>>
>>
>> @@ -215,13 +220,13 @@ swiotlb_init(int verbose)
>> bytes = io_tlb_nslabs << IO_TLB_SHIFT;
>>
>> /* Get IO TLB memory from the low pages */
>> - vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
>> + vstart = memblock_virt_alloc_nopanic(PAGE_ALIGN(bytes), PAGE_SIZE);
>> if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
>> return;
>>
> OK. So we need '__alloc_bootmem_low()' equivalent memblock API. We will try
> to come up with a patch for the same. Thanks for inputs.

Yes,

Andrew, can you try attached two patches in your setup?

Assume your system does not have intel iommu support?

Thanks

Yinghai


Attachments:
fix_numa_x.patch (1.82 kB)
revert_memblock_swiotlb_change.patch (3.40 kB)

2014-01-24 07:54:55

by Santosh Shilimkar

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Friday 24 January 2014 02:46 AM, Yinghai Lu wrote:
>> OK. So we need '__alloc_bootmem_low()' equivalent memblock API. We will try
>> > to come up with a patch for the same. Thanks for inputs.
> Yes,
>
> Andrew, can you try attached two patches in your setup?
>
> Assume your system does not have intel iommu support?
>
You are fast... I was cooking up a very similar patch to yours.
Thanks for the help. It should mostly fix the issue on Andrew's
box after the revert of commit 5b6e529521.

>
> ---
> arch/arm/kernel/setup.c | 2 +-
> include/linux/bootmem.h | 37 +++++++++++++++++++++++++++++++++++++
> lib/swiotlb.c | 4 ++--
> 3 files changed, 40 insertions(+), 3 deletions(-)
>
> Index: linux-2.6/include/linux/bootmem.h
> ===================================================================
> --- linux-2.6.orig/include/linux/bootmem.h
> +++ linux-2.6/include/linux/bootmem.h
> @@ -175,6 +175,27 @@ static inline void * __init memblock_vir
> NUMA_NO_NODE);
> }
>
> +#ifndef ARCH_LOW_ADDRESS_LIMIT
> +#define ARCH_LOW_ADDRESS_LIMIT 0xffffffffUL
> +#endif
> +
> +static inline void * __init memblock_virt_alloc_low(
> + phys_addr_t size, phys_addr_t align)
> +{
> + return memblock_virt_alloc_try_nid(size, align,
> + BOOTMEM_LOW_LIMIT,
> + ARCH_LOW_ADDRESS_LIMIT,
> + NUMA_NO_NODE);
> +}
> +static inline void * __init memblock_virt_alloc_low_nopanic(
> + phys_addr_t size, phys_addr_t align)
> +{
> + return memblock_virt_alloc_try_nid_nopanic(size, align,
> + BOOTMEM_LOW_LIMIT,
> + ARCH_LOW_ADDRESS_LIMIT,
> + NUMA_NO_NODE);
> +}
> +
> static inline void * __init memblock_virt_alloc_from_nopanic(
> phys_addr_t size, phys_addr_t align, phys_addr_t min_addr)
> {
> @@ -238,6 +259,22 @@ static inline void * __init memblock_vir
> return __alloc_bootmem_nopanic(size, align, BOOTMEM_LOW_LIMIT);
> }
>
> +static inline void * __init memblock_virt_alloc_low(
> + phys_addr_t size, phys_addr_t align)
> +{
> + if (!align)
> + align = SMP_CACHE_BYTES;
> + return __alloc_bootmem_low(size, align, BOOTMEM_LOW_LIMIT);
> +}
> +
> +static inline void * __init memblock_virt_alloc_low_nopanic(
> + phys_addr_t size, phys_addr_t align)
> +{
> + if (!align)
> + align = SMP_CACHE_BYTES;
> + return __alloc_bootmem_low_nopanic(size, align, BOOTMEM_LOW_LIMIT);
> +}
> +
> static inline void * __init memblock_virt_alloc_from_nopanic(
> phys_addr_t size, phys_addr_t align, phys_addr_t min_addr)
> {
> Index: linux-2.6/lib/swiotlb.c
> ===================================================================
> --- linux-2.6.orig/lib/swiotlb.c
> +++ linux-2.6/lib/swiotlb.c
> @@ -172,7 +172,7 @@ int __init swiotlb_init_with_tbl(char *t
> /*
> * Get the overflow emergency buffer
> */
> - v_overflow_buffer = memblock_virt_alloc_nopanic(
> + v_overflow_buffer = memblock_virt_alloc_low_nopanic(
> PAGE_ALIGN(io_tlb_overflow),
> PAGE_SIZE);
> if (!v_overflow_buffer)
> @@ -220,7 +220,7 @@ swiotlb_init(int verbose)
> bytes = io_tlb_nslabs << IO_TLB_SHIFT;
>
> /* Get IO TLB memory from the low pages */
> - vstart = memblock_virt_alloc_nopanic(PAGE_ALIGN(bytes), PAGE_SIZE);
> + vstart = memblock_virt_alloc_low_nopanic(PAGE_ALIGN(bytes), PAGE_SIZE);
> if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
> return;
>
> Index: linux-2.6/arch/arm/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/arm/kernel/setup.c
> +++ linux-2.6/arch/arm/kernel/setup.c
> @@ -717,7 +717,7 @@ static void __init request_standard_reso
> kernel_data.end = virt_to_phys(_end - 1);
>
> for_each_memblock(memory, region) {
> - res = memblock_virt_alloc(sizeof(*res), 0);
> + res = memblock_virt_alloc_low(sizeof(*res), 0);
> res->name = "System RAM";
> res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
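
A hypothetical caller of the new helper would look like this (illustrative
only; the names follow the patch above): the allocation is bounded by
ARCH_LOW_ADDRESS_LIMIT, 4 GiB unless an architecture overrides it, which
restores the guarantee alloc_bootmem_low_pages_nopanic() used to give swiotlb.
The _nopanic variant returns NULL on failure instead of panicking.

    /* illustrative use of the helper added in the patch above */
    void *buf = memblock_virt_alloc_low_nopanic(PAGE_ALIGN(bytes), PAGE_SIZE);
    if (!buf)
            return -ENOMEM;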

2014-01-24 15:02:36

by Dave Hansen

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On 01/23/2014 09:55 PM, Yinghai Lu wrote:
> On Thu, Jan 23, 2014 at 2:49 PM, Dave Hansen <[email protected]> wrote:
>> Linus's current tree doesn't boot on an 8-node/1TB NUMA system that I
>> have. Its reboots are *LONG*, so I haven't fully bisected it, but it's
>> down to a just a few commits, most of which are changes to the memblock
>> code. Since the panic is in the memblock code, it looks like a
>> no-brainer. It's almost certainly the code from Santosh or Grygorii
>> that's triggering this.
>>
>> Config and good/bad dmesg with memblock=debug are here:
>>
>> http://sr71.net/~dave/intel/3.13/
>>
>> Please let me know if you need it bisected further than this.
>
> Please check attached patch, and it should fix the problem.

There are two failure modes I'm seeing: one where it fails to allocate
the first node's mem_map[], and a second where it oopses accessing the
numa_distance[] table. This is the numa_distance[] one, and it happens
even with the patch you suggested applied.

> [ 0.000000] memblock_find_in_range_node():239
> [ 0.000000] __memblock_find_range_top_down():150
> [ 0.000000] __memblock_find_range_top_down():152 i: 600000001
> [ 0.000000] memblock_find_in_range_node():241 ret: 2147479552
> [ 0.000000] memblock_reserve: [0x0000007ffff000-0x0000007ffff03f] flags 0x0 numa_set_distance+0xd2/0x252
> [ 0.000000] numa_distance phys: 7ffff000
> [ 0.000000] numa_distance virt: ffff88007ffff000
> [ 0.000000] numa_distance size: 64
> [ 0.000000] numa_alloc_distance() accessing numa_distance[] at byte: 0
> [ 0.000000] BUG: unable to handle kernel paging request at ffff88007ffff000
> [ 0.000000] IP: [<ffffffff81d2c1f1>] numa_set_distance+0x186/0x252
> [ 0.000000] PGD 211e067 PUD 2121067 PMD 0
> [ 0.000000] Oops: 0002 [#1] SMP
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.13.0-slub-04156-g90804ed-dirty #826
> [ 0.000000] Hardware name: FUJITSU-SV PRIMEQUEST 1800E2/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.24 09/14/2011
> [ 0.000000] task: ffffffff81c104a0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
> [ 0.000000] RIP: 0010:[<ffffffff81d2c1f1>] [<ffffffff81d2c1f1>] numa_set_distance+0x186/0x252
> [ 0.000000] RSP: 0000:ffffffff81c01cd8 EFLAGS: 00010002
> [ 0.000000] RAX: 000000000000000a RBX: 0000000000000000 RCX: 0000000000000000
> [ 0.000000] RDX: 0000000000000014 RSI: 0000000000000046 RDI: ffffffff81ea4f84
> [ 0.000000] RBP: ffffffff81c01d68 R08: 000000000000100d R09: ffff88007ffff000
> [ 0.000000] R10: 0000000000000127 R11: 000000000000000d R12: 0000000000000000
> [ 0.000000] R13: 000000000000000a R14: 0000000000000008 R15: 0000000000000001
> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81d00000(0000) knlGS:0000000000000000
> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.000000] CR2: ffff88007ffff000 CR3: 0000000001c0b000 CR4: 00000000000000b0
> [ 0.000000] Stack:
> [ 0.000000] 0000000000000000 ffffffff00000000 0000000000000000 0000004081c01dd0
> [ 0.000000] 00000000000000ff 0000000000000000 0000000000000000 0000000000000000
> [ 0.000000] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 0.000000] Call Trace:
> [ 0.000000] [<ffffffff81d2c480>] acpi_numa_slit_init+0x47/0x70
> [ 0.000000] [<ffffffff81d52c34>] ? acpi_table_print_srat_entry+0x26/0x26
> [ 0.000000] [<ffffffff81d52c9c>] acpi_parse_slit+0x68/0x6c
> [ 0.000000] [<ffffffff81d5156c>] acpi_table_parse+0x6c/0x82
> [ 0.000000] [<ffffffff81d52dcc>] acpi_numa_init+0x94/0xb0
> [ 0.000000] [<ffffffff81d2c6d9>] ? acpi_numa_arch_fixup+0x6/0x6
> [ 0.000000] [<ffffffff81d2c6d9>] ? acpi_numa_arch_fixup+0x6/0x6
> [ 0.000000] [<ffffffff81d2c6e2>] x86_acpi_numa_init+0x9/0x1b
> [ 0.000000] [<ffffffff81d2bbc2>] numa_init+0xe0/0x589
> [ 0.000000] [<ffffffff8108adba>] ? set_pte_vaddr_pud+0x3a/0x60
> [ 0.000000] [<ffffffff8108ae45>] ? set_pte_vaddr+0x65/0xa0
> [ 0.000000] [<ffffffff810902d5>] ? __native_set_fixmap+0x25/0x30
> [ 0.000000] [<ffffffff81d2c2d6>] x86_numa_init+0x19/0x2b
> [ 0.000000] [<ffffffff81d2c419>] initmem_init+0x9/0xb
> [ 0.000000] [<ffffffff81d1b2f3>] setup_arch+0x923/0xc6e
> [ 0.000000] [<ffffffff817032e0>] ? printk+0x4d/0x4f
> [ 0.000000] [<ffffffff81d14b1a>] start_kernel+0x85/0x3db
> [ 0.000000] [<ffffffff81d145a8>] x86_64_start_reservations+0x2a/0x2c
> [ 0.000000] [<ffffffff81d1469a>] x86_64_start_kernel+0xf0/0xf7
> [ 0.000000] Code: ff ff e8 c6 70 9d ff 8b 4d 80 4c 8b 8d 70 ff ff ff b0 0a 4c 03 0d a8 0a 17 00 ba 14 00 00 00 44 39 f9 0f 45 c2 49 ff c7 45 39 fe <41> 88 01 44 8b 85 78 ff ff ff 7f a0 ff c1 45 01 f0 44 39 f1 7c
> [ 0.000000] RIP [<ffffffff81d2c1f1>] numa_set_distance+0x186/0x252
> [ 0.000000] RSP <ffffffff81c01cd8>
> [ 0.000000] CR2: ffff88007ffff000
> [ 0.000000] ---[ end trace 8a50456ee7e911cb ]---
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!

2014-01-24 15:25:41

by Dave Hansen

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On 01/24/2014 07:01 AM, Dave Hansen wrote:
> There are two failure modes I'm seeing: one when (failing to) allocate
> the first node's mem_map[], and a second where it oopses accessing the
> numa_distance[] table. This is the numa_distance[] one, and it happens
> even with the patch you suggested applied.

And with my second (lots of debugging enabled) config, I get the
mem_map[] oops. In other words, none of the reverts or patches are
helping either of the conditions that I'm able to trigger.

2014-01-24 17:45:23

by Yinghai Lu

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Fri, Jan 24, 2014 at 7:01 AM, Dave Hansen <[email protected]> wrote:
> There are two failure modes I'm seeing: one when (failing to) allocate
> the first node's mem_map[], and a second where it oopses accessing the
> numa_distance[] table. This is the numa_distance[] one, and it happens
> even with the patch you suggested applied.
>
>> [ 0.000000] memblock_find_in_range_node():239
>> [ 0.000000] __memblock_find_range_top_down():150
>> [ 0.000000] __memblock_find_range_top_down():152 i: 600000001
>> [ 0.000000] memblock_find_in_range_node():241 ret: 2147479552
>> [ 0.000000] memblock_reserve: [0x0000007ffff000-0x0000007ffff03f] flags 0x0 numa_set_distance+0xd2/0x252

That address is wrong.

Can you post the whole log with current Linus' tree plus the two patches
that I sent out yesterday?

>> [ 0.000000] numa_distance phys: 7ffff000
>> [ 0.000000] numa_distance virt: ffff88007ffff000
>> [ 0.000000] numa_distance size: 64
>> [ 0.000000] numa_alloc_distance() accessing numa_distance[] at byte: 0
>> [ 0.000000] BUG: unable to handle kernel paging request at ffff88007ffff000
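
For reference when reading the numbers above: 2147479552 is 0x7ffff000, and
the faulting virtual address ffff88007ffff000 is that physical address seen
through the kernel direct mapping (__va(0x7ffff000)); the "PMD 0" line in the
full oops says the direct-map page tables for that range were never populated.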

2014-01-24 18:10:01

by Dave Hansen

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On 01/24/2014 09:45 AM, Yinghai Lu wrote:
> On Fri, Jan 24, 2014 at 7:01 AM, Dave Hansen <[email protected]> wrote:
>> There are two failure modes I'm seeing: one when (failing to) allocate
>> the first node's mem_map[], and a second where it oopses accessing the
>> numa_distance[] table. This is the numa_distance[] one, and it happens
>> even with the patch you suggested applied.
>>
>>> [ 0.000000] memblock_find_in_range_node():239
>>> [ 0.000000] __memblock_find_range_top_down():150
>>> [ 0.000000] __memblock_find_range_top_down():152 i: 600000001
>>> [ 0.000000] memblock_find_in_range_node():241 ret: 2147479552
>>> [ 0.000000] memblock_reserve: [0x0000007ffff000-0x0000007ffff03f] flags 0x0 numa_set_distance+0xd2/0x252
>
> that address is wrong.
>
> Can you post whole log with current linus' tree + two patches that I
> sent out yesterday?

Here you go. It's still spitting out memblock_reserve messages to the
console. I'm not sure if it's making _some_ progress or not.

https://www.sr71.net/~dave/intel/3.13/dmesg.with-2-patches

But, it's certainly not booting. Do you want to see it without
memblock=debug?

2014-01-24 18:13:56

by Yinghai Lu

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Fri, Jan 24, 2014 at 10:09 AM, Dave Hansen <[email protected]> wrote:
> On 01/24/2014 09:45 AM, Yinghai Lu wrote:
>> On Fri, Jan 24, 2014 at 7:01 AM, Dave Hansen <[email protected]> wrote:
>>> There are two failure modes I'm seeing: one when (failing to) allocate
>>> the first node's mem_map[], and a second where it oopses accessing the
>>> numa_distance[] table. This is the numa_distance[] one, and it happens
>>> even with the patch you suggested applied.
>>>
>>>> [ 0.000000] memblock_find_in_range_node():239
>>>> [ 0.000000] __memblock_find_range_top_down():150
>>>> [ 0.000000] __memblock_find_range_top_down():152 i: 600000001
>>>> [ 0.000000] memblock_find_in_range_node():241 ret: 2147479552
>>>> [ 0.000000] memblock_reserve: [0x0000007ffff000-0x0000007ffff03f] flags 0x0 numa_set_distance+0xd2/0x252
>>
>> that address is wrong.
>>
>> Can you post whole log with current linus' tree + two patches that I
>> sent out yesterday?
>
> Here you go. It's still spitting out memblock_reserve messages to the
> console. I'm not sure if it's making _some_ progress or not.
>
> https://www.sr71.net/~dave/intel/3.13/dmesg.with-2-patches
>
> But, it's certainly not booting. Do you want to see it without
> memblock=debug?

That looks like a different problem, and it cannot set up the memory mapping
properly.

Can you send me the .config?

Thanks

Yinghai

2014-01-24 18:19:20

by Dave Hansen

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On 01/24/2014 10:13 AM, Yinghai Lu wrote:
> On Fri, Jan 24, 2014 at 10:09 AM, Dave Hansen <[email protected]> wrote:
>> On 01/24/2014 09:45 AM, Yinghai Lu wrote:
>> Here you go. It's still spitting out memblock_reserve messages to the
>> console. I'm not sure if it's making _some_ progress or not.
>>
>> https://www.sr71.net/~dave/intel/3.13/dmesg.with-2-patches
>>
>> But, it's certainly not booting. Do you want to see it without
>> memblock=debug?
>
> that looks like different problem. and it can not set memory mapping properly.
>
> can you send me .config ?

Here you go.

FWIW, I did turn off memblock=debug. It eventually booted, but
slooooooooooowly.

How many problems in this code are we tracking, btw? This is at least
3, right?


Attachments:
config-3.13.0-05617-g3aacd62-dirty.txt (75.16 kB)

2014-01-24 18:24:29

by Yinghai Lu

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Fri, Jan 24, 2014 at 10:19 AM, Dave Hansen <[email protected]> wrote:
> On 01/24/2014 10:13 AM, Yinghai Lu wrote:
>> On Fri, Jan 24, 2014 at 10:09 AM, Dave Hansen <[email protected]> wrote:
>>> On 01/24/2014 09:45 AM, Yinghai Lu wrote:
>>> Here you go. It's still spitting out memblock_reserve messages to the
>>> console. I'm not sure if it's making _some_ progress or not.
>>>
>>> https://www.sr71.net/~dave/intel/3.13/dmesg.with-2-patches
>>>
>>> But, it's certainly not booting. Do you want to see it without
>>> memblock=debug?
>>
>> that looks like different problem. and it can not set memory mapping properly.
>>
>> can you send me .config ?
>
> Here you go.
>
> FWIW, I did turn of memblock=debug. It eventually booted, but
> slooooooooooowly.

Then that is not a problem, as you are using 4k page mappings only,
and that printout is just too much spew...

>
> How many problems in this code are we tracking, btw? This is at least
> 3, right?

two problems:
1. big numa system.
2. Andrew's system with swiotlb.

The two patches should address them.

Thanks

Yinghai
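
(Presumably what makes the combination so slow: with KMEMCHECK/DEBUG_PAGEALLOC
the direct map is built from 4 KiB pages, so a 1 TB machine needs hundreds of
thousands of page-table pages, each reserved through memblock, and
memblock=debug prints a line on the console for every one of those
reservations.)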

2014-01-24 18:43:00

by Dave Hansen

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On 01/24/2014 10:24 AM, Yinghai Lu wrote:
> On Fri, Jan 24, 2014 at 10:19 AM, Dave Hansen <[email protected]> wrote:
>> FWIW, I did turn of memblock=debug. It eventually booted, but
>> slooooooooooowly.
>
> then that is not a problem, as you are using 4k page mapping only.
> and that printout is too spew...

This means that, essentially, memblock=debug and
KMEMCHECK/DEBUG_PAGEALLOC can't be used together. That's a shame
because my DEBUG_PAGEALLOC config *broke* this code a few months ago,
right? Oh well.

>> How many problems in this code are we tracking, btw? This is at least
>> 3, right?
>
> two problems:
> 1. big numa system.
> 2. Andrew's system with swiotlb.

Can I ask politely for some more caution on your part in this area?
This is two consecutive kernels where this code has broken my system.

2014-01-24 18:51:42

by Yinghai Lu

Subject: Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

On Fri, Jan 24, 2014 at 10:42 AM, Dave Hansen <[email protected]> wrote:
> On 01/24/2014 10:24 AM, Yinghai Lu wrote:
>> On Fri, Jan 24, 2014 at 10:19 AM, Dave Hansen <[email protected]> wrote:
>>> FWIW, I did turn of memblock=debug. It eventually booted, but
>>> slooooooooooowly.
>>
>> then that is not a problem, as you are using 4k page mapping only.
>> and that printout is too spew...
>
> This means that, essentially, memblock=debug and
> KMEMCHECK/DEBUG_PAGEALLOC can't be used together. That's a shame
> because my DEBUG_PAGEALLOC config *broke* this code a few months ago,
> right? Oh well.

It should only be broken when MOVABLE_NODE is enabled on a big system.

>
>>> How many problems in this code are we tracking, btw? This is at least
>>> 3, right?
>>
>> two problems:
>> 1. big numa system.
>> 2. Andrew's system with swiotlb.
>
> Can I ask politely for some more caution on your part in this area?
> This is two consecutive kernels where this code has broken my system.

I agree, the code got messy now that we have top-down and bottom-up
mapping for different configurations.

I already tried hard to push the parse-SRAT-early solution instead of
that split.

Yinghai