2007-08-22 20:20:49

by Kamalesh Babulal

Subject: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

Hi Andrew,

I see a call trace followed by a kernel BUG with the 2.6.23-rc3-mm1
kernel, and have attached the boot log and config file.

=======================================================
SLUB: Genslabs=12, HWalign=128, Order=0-1, MinObjects=4, CPUs=4, Nodes=16
Bad page state in process 'swapper'
page:cf00000000015818 flags:0x0000020000000400 mapping:0000000000000000
mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Call Trace:
[c0000000005cbab0] [c000000000010344] .show_stack+0x68/0x1b4 (unreliable)
[c0000000005cbb60] [c0000000000a6c54] .bad_page+0x84/0x138
[c0000000005cbbf0] [c0000000000aa9e0] .free_hot_cold_page+0xdc/0x21c
[c0000000005cbc90] [c0000000000ad7ec] .put_page+0x158/0x180
[c0000000005cbd30] [c0000000000d4de8] .kfree+0x74/0xf0
[c0000000005cbdb0] [c0000000000a866c] .process_zones+0x1a8/0x1f8
[c0000000005cbe60] [c0000000004b5160] .setup_per_cpu_pageset+0x24/0x48
[c0000000005cbee0] [c0000000004978d8] .start_kernel+0x304/0x3f4
[c0000000005cbf90] [c0000000003bef10] .start_here_common+0x54/0x58
Hexdump:
000: cf 00 00 00 00 01 57 d0 00 00 02 00 00 00 04 00
010: 00 00 00 01 ff ff ff ff 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: cf 00 00 00 00 01 58 08 cf 00 00 00 00 01 58 08
040: 00 00 02 00 00 00 04 00 00 00 00 00 ff ff ff ff
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 00 00 00 00 00 00 00 00 cf 00 00 00 00 01 58 40
070: cf 00 00 00 00 01 58 40 00 00 02 00 00 00 04 00
080: 00 00 00 01 ff ff ff ff 00 00 00 00 00 00 00 00
090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0a0: cf 00 00 00 00 01 58 78 cf 00 00 00 00 01 58 78
0b0: 00 00 02 00 00 00 04 00 00 00 00 01 ff ff ff ff
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:2876!
cpu 0x0: Vector: 700 (Program Check) at [c0000000005cbbe0]
pc: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
lr: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
sp: c0000000005cbe60
msr: 8000000000029032
current = 0xc0000000004fd1b0
paca = 0xc0000000004fdd80
pid = 0, comm = swapper
kernel BUG at mm/page_alloc.c:2876!
enter ? for help

Thanks & Regards,
Kamalesh Babulal.


Attachments:
boot_log (5.05 kB)
dotconfig (54.43 kB)

2007-08-22 20:48:39

by Andrew Morton

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

On Thu, 23 Aug 2007 01:50:10 +0530
Kamalesh Babulal <[email protected]> wrote:

> Hi Andrew,
>
> I see call trace followed by the kernel bug with the 2.6.23-rc3-mm1
> kernel and have attached the boot log and config file.
>
> =======================================================
> SLUB: Genslabs=12, HWalign=128, Order=0-1, MinObjects=4, CPUs=4, Nodes=16
> Bad page state in process 'swapper'
> page:cf00000000015818 flags:0x0000020000000400 mapping:0000000000000000
> mapcount:0 count:0
> Trying to fix it up, but a reboot is needed
> Backtrace:
> Call Trace:
> [c0000000005cbab0] [c000000000010344] .show_stack+0x68/0x1b4 (unreliable)
> [c0000000005cbb60] [c0000000000a6c54] .bad_page+0x84/0x138
> [c0000000005cbbf0] [c0000000000aa9e0] .free_hot_cold_page+0xdc/0x21c
> [c0000000005cbc90] [c0000000000ad7ec] .put_page+0x158/0x180
> [c0000000005cbd30] [c0000000000d4de8] .kfree+0x74/0xf0
> [c0000000005cbdb0] [c0000000000a866c] .process_zones+0x1a8/0x1f8
> [c0000000005cbe60] [c0000000004b5160] .setup_per_cpu_pageset+0x24/0x48
> [c0000000005cbee0] [c0000000004978d8] .start_kernel+0x304/0x3f4
> [c0000000005cbf90] [c0000000003bef10] .start_here_common+0x54/0x58
> Hexdump:
> 000: cf 00 00 00 00 01 57 d0 00 00 02 00 00 00 04 00
> 010: 00 00 00 01 ff ff ff ff 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: cf 00 00 00 00 01 58 08 cf 00 00 00 00 01 58 08
> 040: 00 00 02 00 00 00 04 00 00 00 00 00 ff ff ff ff
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 060: 00 00 00 00 00 00 00 00 cf 00 00 00 00 01 58 40
> 070: cf 00 00 00 00 01 58 40 00 00 02 00 00 00 04 00
> 080: 00 00 00 01 ff ff ff ff 00 00 00 00 00 00 00 00
> 090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0a0: cf 00 00 00 00 01 58 78 cf 00 00 00 00 01 58 78
> 0b0: 00 00 02 00 00 00 04 00 00 00 00 01 ff ff ff ff
> ------------[ cut here ]------------
> kernel BUG at mm/page_alloc.c:2876!
> cpu 0x0: Vector: 700 (Program Check) at [c0000000005cbbe0]
> pc: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
> lr: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
> sp: c0000000005cbe60
> msr: 8000000000029032
> current = 0xc0000000004fd1b0
> paca = 0xc0000000004fdd80
> pid = 0, comm = swapper
> kernel BUG at mm/page_alloc.c:2876!
>

Looks like process_zones() got a kmalloc_node() failure and then crashed in
the recovery code.

This:

--- a/mm/page_alloc.c~a
+++ a/mm/page_alloc.c
@@ -2814,6 +2814,8 @@ static int __cpuinit process_zones(int c
 	return 0;
 bad:
 	for_each_zone(dzone) {
+		if (!populated_zone(zone))
+			continue;
 		if (dzone == zone)
 			break;
 		kfree(zone_pcp(dzone, cpu));
_

might help avoid the crash, but why did kmalloc_node() fail?


2007-08-22 20:50:58

by Andrew Morton

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

On Wed, 22 Aug 2007 13:48:00 -0700
Andrew Morton <[email protected]> wrote:

> This:
>
> --- a/mm/page_alloc.c~a
> +++ a/mm/page_alloc.c
> @@ -2814,6 +2814,8 @@ static int __cpuinit process_zones(int c
> return 0;
> bad:
> for_each_zone(dzone) {
> + if (!populated_zone(zone))
> + continue;
> if (dzone == zone)
> break;
> kfree(zone_pcp(dzone, cpu));
> _
>
> might help avoid the crash

err, make that

--- a/mm/page_alloc.c~a
+++ a/mm/page_alloc.c
@@ -2814,6 +2814,8 @@ static int __cpuinit process_zones(int c
 	return 0;
 bad:
 	for_each_zone(dzone) {
+		if (!populated_zone(dzone))
+			continue;
 		if (dzone == zone)
 			break;
 		kfree(zone_pcp(dzone, cpu));
_
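
For readers following the error path: the cleanup loop in the patch above can be modelled in user space. This is only a sketch under stated assumptions, not the kernel code. It assumes every zone's per-cpu slot initially points at a static boot-time placeholder and that only populated zones ever get a heap allocation, which is how the trace (kfree -> put_page -> bad_page) appears to arise. All toy_* names and boot_pcp are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a zone and its per-cpu pageset slot. */
struct toy_zone {
    bool populated;
    void *pcp;              /* points at boot_pcp until replaced */
};

static char boot_pcp;       /* static placeholder, never heap-allocated */

static void toy_zones_init(struct toy_zone *zones, size_t nr,
                           const bool *populated)
{
    for (size_t i = 0; i < nr; i++) {
        zones[i].populated = populated[i];
        zones[i].pcp = &boot_pcp;
    }
}

/*
 * Allocate a slot for each populated zone, injecting a failure at index
 * fail_at, then run the cleanup loop. Returns how many times the cleanup
 * would have freed a pointer that never came from malloc().
 */
static int toy_process_zones(struct toy_zone *zones, size_t nr,
                             size_t fail_at, bool skip_unpopulated)
{
    size_t failed = nr;
    int bogus_frees = 0;

    assert(zones != NULL);

    for (size_t i = 0; i < nr; i++) {
        if (!zones[i].populated)
            continue;
        void *p = (i == fail_at) ? NULL : malloc(64);
        if (!p) {
            failed = i;
            break;
        }
        zones[i].pcp = p;
    }
    if (failed == nr)
        return 0;           /* nothing failed, no cleanup needed */

    for (size_t i = 0; i < nr; i++) {
        if (skip_unpopulated && !zones[i].populated)
            continue;       /* the check added by the patch */
        if (i == failed)
            break;
        if (zones[i].pcp == &boot_pcp) {
            bogus_frees++;  /* would be a kfree() of static memory */
            continue;
        }
        free(zones[i].pcp);
        zones[i].pcp = &boot_pcp;
    }
    return bogus_frees;
}
```

With zones {unpopulated, populated, populated} and a failure injected at index 2, the cleanup without the check counts one bogus free and the cleanup with the check counts none. The allocation failure itself is untouched by this, which matches the question of why kmalloc_node() failed in the first place.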


2007-08-22 21:09:32

by Christoph Lameter

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

On Wed, 22 Aug 2007, Andrew Morton wrote:

> On Thu, 23 Aug 2007 01:50:10 +0530
> Kamalesh Babulal <[email protected]> wrote:
>
> > Hi Andrew,
> >
> > I see call trace followed by the kernel bug with the 2.6.23-rc3-mm1
> > kernel and have attached the boot log and config file.

> > =======================================================
> > SLUB: Genslabs=12, HWalign=128, Order=0-1, MinObjects=4, CPUs=4, Nodes=16

16 nodes and 4 cpus? Can I see the zones map that is displayed on
boot? How are the cpus mapped to the nodes?

kmalloc_node walks the zonelists from the node that was specified.
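
As a rough illustration of what "walks the zonelists" means, here is a minimal user-space sketch. It assumes a zonelist is just an ordered list of node ids to try, nearest first; the real struct zonelist holds zone pointers and the real allocator does far more. The toy_* names and TOY_MAX_NODES are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4

/* Hypothetical per-node free page counter. */
struct toy_node {
    long free_pages;
};

/* Hypothetical fallback list: node ids in the order they are tried. */
struct toy_zonelist {
    int nr;                     /* number of valid entries */
    int nodes[TOY_MAX_NODES];   /* fallback order, nearest first */
};

/*
 * Try each node in the zonelist in order and take the pages from the
 * first one that has enough. Returns the node id used, or -1 if the
 * walk ends without finding memory.
 */
static int toy_alloc_node(struct toy_node *nodes,
                          const struct toy_zonelist *zl, long pages)
{
    assert(zl->nr >= 0 && zl->nr <= TOY_MAX_NODES);

    for (int i = 0; i < zl->nr; i++) {
        struct toy_node *n = &nodes[zl->nodes[i]];

        if (n->free_pages >= pages) {
            n->free_pages -= pages;
            return zl->nodes[i];
        }
    }
    return -1;
}
```

If node 0 has no memory but its list is {0, 1, 2}, a request made on node 0 falls through to the first node that has free pages. If the list is empty or wrong, the walk finds nothing even though another node has plenty of memory.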

2007-08-23 13:07:43

by mel

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

On (22/08/07 13:50), Andrew Morton didst pronounce:
> On Wed, 22 Aug 2007 13:48:00 -0700
> Andrew Morton <[email protected]> wrote:
>
> > This:
> >
> > --- a/mm/page_alloc.c~a
> > +++ a/mm/page_alloc.c
> > @@ -2814,6 +2814,8 @@ static int __cpuinit process_zones(int c
> > return 0;
> > bad:
> > for_each_zone(dzone) {
> > + if (!populated_zone(zone))
> > + continue;
> > if (dzone == zone)
> > break;
> > kfree(zone_pcp(dzone, cpu));
> > _
> >
> > might help avoid the crash
>
> err, make that
>

We're already in the error path at this point and it's going to blow up.
The real problem is kmalloc_node() returning NULL for whatever reason.

> --- a/mm/page_alloc.c~a
> +++ a/mm/page_alloc.c
> @@ -2814,6 +2814,8 @@ static int __cpuinit process_zones(int c
> return 0;
> bad:
> for_each_zone(dzone) {
> + if (!populated_zone(dzone))
> + continue;
> if (dzone == zone)
> break;
> kfree(zone_pcp(dzone, cpu));
> _
>
>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

2007-08-23 14:50:05

by mel

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

On (23/08/07 01:50), Kamalesh Babulal didst pronounce:
> Hi Andrew,
>
> I see call trace followed by the kernel bug with the 2.6.23-rc3-mm1
> kernel and have attached the boot log and config file.
>

Note this line in the boot log:

Built 2 zonelists in Node order, mobility grouping off. Total pages: 0

The lack of pages might explain the allocation failures :/ It is not yet
clear why it failed to find any memory.

--
Mel Gorman

2007-08-23 17:17:39

by Kamalesh Babulal

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

Mel Gorman wrote:
> On (22/08/07 13:50), Andrew Morton didst pronounce:
>
>> On Wed, 22 Aug 2007 13:48:00 -0700
>> Andrew Morton <[email protected]> wrote:
>>
>>
>>> This:
>>>
>>> --- a/mm/page_alloc.c~a
>>> +++ a/mm/page_alloc.c
>>> @@ -2814,6 +2814,8 @@ static int __cpuinit process_zones(int c
>>> return 0;
>>> bad:
>>> for_each_zone(dzone) {
>>> + if (!populated_zone(zone))
>>> + continue;
>>> if (dzone == zone)
>>> break;
>>> kfree(zone_pcp(dzone, cpu));
>>> _
>>>
>>> might help avoid the crash
>>>
>> err, make that
>>
>>
>
> We're already in the error path at this point and it's going to blow up.
> The real problem is kmalloc_node() returning NULL for whatever reason.
>
>
>> --- a/mm/page_alloc.c~a
>> +++ a/mm/page_alloc.c
>> @@ -2814,6 +2814,8 @@ static int __cpuinit process_zones(int c
>> return 0;
>> bad:
>> for_each_zone(dzone) {
>> + if (!populated_zone(dzone))
>> + continue;
>> if (dzone == zone)
>> break;
>> kfree(zone_pcp(dzone, cpu));
>> _
>>
>>
>>
>
>
After applying the patch, the call trace is gone, but the kernel BUG
is still hit:


Memory: 4105840k/4194304k available (4964k kernel code, 88464k reserved,
948k data, 571k bss, 264k init)
SLUB: Genslabs=12, HWalign=128, Order=0-1, MinObjects=4, CPUs=4, Nodes=16
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:2878!
cpu 0x0: Vector: 700 (Program Check) at [c0000000005cbbe0]
pc: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
lr: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
sp: c0000000005cbe60
msr: 8000000000029032
current = 0xc0000000004fd1b0
paca = 0xc0000000004fdd80
pid = 0, comm = swapper
kernel BUG at mm/page_alloc.c:2878!
enter ? for help
[c0000000005cbee0] c0000000004978d8 .start_kernel+0x304/0x3f4
[c0000000005cbf90] c0000000003bef1c .start_here_common+0x54/0x58

-
Kamalesh Babulal




2007-08-23 20:05:33

by Christoph Lameter

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

On Thu, 23 Aug 2007, Kamalesh Babulal wrote:

> After applying the patch, the call trace is gone but the kernel bug
> is still hit

Yes, that is what we expected. We need more information to figure out why
kmalloc_node fails there. It should walk through all nodes to find
memory.

I see that you have 4 cpus and 16 nodes. How are the cpus assigned to
nodes? If a cpu were assigned to a nonexistent node, this could be
the result.

Could you post the full boot log?

2007-08-24 06:16:10

by Kamalesh Babulal

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

Christoph Lameter wrote:
> On Thu, 23 Aug 2007, Kamalesh Babulal wrote:
>
>
>> After applying the patch, the call trace is gone but the kernel bug
>> is still hit
>>
>
> Yes that is what we expected. We need more information to figure out why
> the kmalloc_node fails there. It should walk through all nodes to find
> memory.
>
> I see that you have 4 cpus and 16 nodes. How are the cpus assigned to
> nodes? If a cpu would be assigned to a nonexisting node then this could be
> the result.
>
> Could you post the full boot log?
>
>
Boot log with Andrew's patch applied:

Welcome to yaboot version 1.3.13
Enter "help" to get some basic usage information
boot: autobench
Please wait, loading kernel...
Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 02400000, size: 1191 Kbytes
OF stdout device is: /vdevice/vty@30000000
Hypertas detected, assuming LPAR !
command line: ro console=hvc0 autobench_args: root=/dev/sda6
ABAT:1187885681
memory layout at init:
alloc_bottom : 000000000252a000
alloc_top : 0000000008000000
alloc_top_hi : 0000000100000000
rmo_top : 0000000008000000
ram_top : 0000000100000000
Looking for displays
instantiating rtas at 0x00000000077d9000 ... done
0000000000000000 : boot cpu 0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x000000000262b000 -> 0x000000000262c1d3
Device tree struct 0x000000000262d000 -> 0x0000000002635000
Calling quiesce ...
returning from prom_init
Partition configured for 4 cpus.


Starting Linux PPC64 #1 SMP Thu Aug 23 11:54:44 EDT 2007
-----------------------------------------------------
ppc64_pft_size = 0x1a
physicalMemorySize = 0x100000000
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address = 0x0000000000000000
htab_hash_mask = 0x7ffff
-----------------------------------------------------
Linux version 2.6.23-rc3-mm1-autokern1
([email protected]) (gcc version 3.4.6 20060404 (Red Hat
3.4.6-3)) #1 SMP Thu Aug 23 11:54:44 EDT 2007
[boot]0012 Setup Arch
vmemmap cf00000000000000 allocated at c000000001000000, physical
0000000001000000.
vmemmap cf00000001000000 allocated at c000000004000000, physical
0000000004000000.
vmemmap cf00000002000000 allocated at c000000005000000, physical
0000000005000000.
vmemmap cf00000003000000 allocated at c000000006000000, physical
0000000006000000.
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
DMA 0 -> 1048576
Normal 1048576 -> 1048576
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
2: 0 -> 1048576
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping off. Total pages: 0
Policy zone: DMA
Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6
ABAT:1187885681
[boot]0020 XICS Init
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
console handover: boot [udbg0] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 2
Memory: 4105840k/4194304k available (4964k kernel code, 88464k reserved,
948k data, 571k bss, 264k init)
SLUB: Genslabs=12, HWalign=128, Order=0-1, MinObjects=4, CPUs=4, Nodes=16
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:2878!
cpu 0x0: Vector: 700 (Program Check) at [c0000000005cbbe0]
pc: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
lr: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
sp: c0000000005cbe60
msr: 8000000000029032
current = 0xc0000000004fd1b0
paca = 0xc0000000004fdd80
pid = 0, comm = swapper
kernel BUG at mm/page_alloc.c:2878!
enter ? for help
[c0000000005cbee0] c0000000004978d8 .start_kernel+0x304/0x3f4
[c0000000005cbf90] c0000000003bef1c .start_here_common+0x54/0x58

-
Kamalesh Babulal.



2007-08-24 08:59:00

by mel

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

On (24/08/07 11:45), Kamalesh Babulal didst pronounce:
> Christoph Lameter wrote:
> >On Thu, 23 Aug 2007, Kamalesh Babulal wrote:
> >
> >
> >>After applying the patch, the call trace is gone but the kernel bug
> >>is still hit
> >>
> >
> >Yes that is what we expected. We need more information to figure out why
> >the kmalloc_node fails there. It should walk through all nodes to find
> >memory.
> >
> >I see that you have 4 cpus and 16 nodes. How are the cpus assigned to
> >nodes? If a cpu would be assigned to a nonexisting node then this could be
> >the result.
> >
> >Could you post the full boot log?
> >
> >
> boot log with the andrew patch applied
>
> Welcome to yaboot version 1.3.13
> Enter "help" to get some basic usage information
> boot: autobench
> Please wait, loading kernel...
> Elf64 kernel loaded...
> Loading ramdisk...
> ramdisk loaded at 02400000, size: 1191 Kbytes
> OF stdout device is: /vdevice/vty@30000000
> Hypertas detected, assuming LPAR !
> command line: ro console=hvc0 autobench_args: root=/dev/sda6
> ABAT:1187885681
> memory layout at init:
> alloc_bottom : 000000000252a000
> alloc_top : 0000000008000000
> alloc_top_hi : 0000000100000000
> rmo_top : 0000000008000000
> ram_top : 0000000100000000
> Looking for displays
> instantiating rtas at 0x00000000077d9000 ... done
> 0000000000000000 : boot cpu 0000000000000000
> 0000000000000002 : starting cpu hw idx 0000000000000002... done
> copying OF device tree ...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x000000000262b000 -> 0x000000000262c1d3
> Device tree struct 0x000000000262d000 -> 0x0000000002635000
> Calling quiesce ...
> returning from prom_init
> Partition configured for 4 cpus.
>
>
> Starting Linux PPC64 #1 SMP Thu Aug 23 11:54:44 EDT 2007
> -----------------------------------------------------
> ppc64_pft_size = 0x1a
> physicalMemorySize = 0x100000000
> ppc64_caches.dcache_line_size = 0x80
> ppc64_caches.icache_line_size = 0x80
> htab_address = 0x0000000000000000
> htab_hash_mask = 0x7ffff
> -----------------------------------------------------
> Linux version 2.6.23-rc3-mm1-autokern1
> ([email protected]) (gcc version 3.4.6 20060404 (Red Hat
> 3.4.6-3)) #1 SMP Thu Aug 23 11:54:44 EDT 2007
> [boot]0012 Setup Arch
> vmemmap cf00000000000000 allocated at c000000001000000, physical
> 0000000001000000.
> vmemmap cf00000001000000 allocated at c000000004000000, physical
> 0000000004000000.
> vmemmap cf00000002000000 allocated at c000000005000000, physical
> 0000000005000000.
> vmemmap cf00000003000000 allocated at c000000006000000, physical
> 0000000006000000.
> EEH: PCI Enhanced I/O Error Handling Enabled
> PPC64 nvram contains 7168 bytes
> Zone PFN ranges:
> DMA 0 -> 1048576
> Normal 1048576 -> 1048576
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
> 2: 0 -> 1048576
> Could not find start_pfn for node 0
> [boot]0015 Setup Done
> Built 2 zonelists in Node order, mobility grouping off. Total pages: 0

This indicates to me that the zonelists are trashed. All memory is on
node 2 according to early_node_map[], and the CPU is most likely part of
node 0, which doesn't have a proper fallback list.

> Policy zone: DMA
> Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6
> ABAT:1187885681
> [boot]0020 XICS Init
> [boot]0021 XICS Done
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> Console: colour dummy device 80x25
> console handover: boot [udbg0] -> real [hvc0]
> Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> freeing bootmem node 2
> Memory: 4105840k/4194304k available (4964k kernel code, 88464k reserved,
> 948k data, 571k bss, 264k init)
> SLUB: Genslabs=12, HWalign=128, Order=0-1, MinObjects=4, CPUs=4, Nodes=16
> ------------[ cut here ]------------
> kernel BUG at mm/page_alloc.c:2878!
> cpu 0x0: Vector: 700 (Program Check) at [c0000000005cbbe0]
> pc: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
> lr: c0000000004b5160: .setup_per_cpu_pageset+0x24/0x48
> sp: c0000000005cbe60
> msr: 8000000000029032
> current = 0xc0000000004fd1b0
> paca = 0xc0000000004fdd80
> pid = 0, comm = swapper
> kernel BUG at mm/page_alloc.c:2878!
> enter ? for help
> [c0000000005cbee0] c0000000004978d8 .start_kernel+0x304/0x3f4
> [c0000000005cbf90] c0000000003bef1c .start_here_common+0x54/0x58
>
> -
> Kamalesh Babulal.
>
>
>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

2007-08-24 16:54:20

by Christoph Lameter

Subject: Re: [BUG] 2.6.23-rc3-mm1 kernel BUG at mm/page_alloc.c:2876!

On Fri, 24 Aug 2007, Kamalesh Babulal wrote:

> Starting Linux PPC64 #1 SMP Thu Aug 23 11:54:44 EDT 2007

Argh. PPC64. The typical thing that we break on all major NUMA
changes.

> EEH: PCI Enhanced I/O Error Handling Enabled
> PPC64 nvram contains 7168 bytes
> Zone PFN ranges:
> DMA 0 -> 1048576
> Normal 1048576 -> 1048576
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
> 2: 0 -> 1048576
> Could not find start_pfn for node 0
> [boot]0015 Setup Done
> Built 2 zonelists in Node order, mobility grouping off. Total pages: 0
> Policy zone: DMA

Uhhh huh. So we have node 0 and 2 that got zonelists. What happened to
node 1?

> Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> freeing bootmem node 2

Hmmm... The boot occurs on node 2??

There could be something wrong with zonelist generation since various
people worked on it. Could you add some printks to show how the zonelists
are generated?
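
A user-space sketch of the kind of per-node dump being asked for is below. It is not a kernel patch: in the kernel this would be printk() calls where the zonelists are built, and the real struct zonelist holds zone pointers rather than node ids. The toy_* names are hypothetical, and snprintf() stands in for printk() so the formatting can be tried outside the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_NODES 16

/* Hypothetical zonelist: node ids in fallback order. */
struct toy_zonelist {
    int nr;
    int nodes[TOY_MAX_NODES];
};

/*
 * Format one line describing the fallback order for a node, for example
 * "node 2 zonelist: 2 0". An empty list is flagged explicitly so that a
 * node without any fallback stands out in the log.
 */
static void toy_format_zonelist(char *buf, size_t len, int node,
                                const struct toy_zonelist *zl)
{
    assert(len > 0);

    snprintf(buf, len, "node %d zonelist:", node);
    for (int i = 0; i < zl->nr; i++) {
        size_t used = strlen(buf);

        snprintf(buf + used, len - used, " %d", zl->nodes[i]);
    }
    if (zl->nr == 0) {
        size_t used = strlen(buf);

        snprintf(buf + used, len - used, " (empty)");
    }
}
```

Printing one such line per online node right after the zonelists are built would show directly whether node 0 ends up with a usable fallback to node 2.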