2008-06-22 12:56:50

by Vegard Nossum

[permalink] [raw]
Subject: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

Hi,

More cpu-hotplug-related fun... ;-)

I know the kernel version says dirty, but the only thing I applied was my
softirq-debugging patch, which is completely unrelated.


Vegard


Initializing CPU#1
APIC error on CPU0: 00(40)
Stuck ??
Inquiring remote APIC #1...
... APIC #1 ID: failed
... APIC #1 VERSION: failed
... APIC #1 SPIV: failed
BUG: unable to handle kernel NULL pointer dereference at 00000024
IP: [<c01e0f06>] sysfs_remove_group+0x16/0xc0
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

Pid: 3994, comm: bash Not tainted (2.6.26-rc7-00002-g8b2e474-dirty #29)
EIP: 0060:[<c01e0f06>] EFLAGS: 00010282 CPU: 0
EIP is at sysfs_remove_group+0x16/0xc0
EAX: 00000008 EBX: 00000001 ECX: 00000004 EDX: c0694c14
ESI: 00000004 EDI: c07015fc EBP: f3cbfe9c ESP: f3cbfe80
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3994, ti=f3cbe000 task=f3cb4f60 task.ti=f3cbe000)
Stack: c0302262 f4bcda40 f3cbfe98 00000008 00000001 00000004 00000000 f3cbfea8
c05880b2 c07576c4 f3cbfec8 c014f567 00000001 00000004 c0753c90 0000001f
00000001 fffffffb f3cbfedc c014f5d9 0000001f 00000000 00000004 f3cbff00
Call Trace:
[<c0302262>] ? device_unregister+0x12/0x20
[<c05880b2>] ? topology_cpu_callback+0x32/0x70
[<c014f567>] ? notifier_call_chain+0x37/0x70
[<c014f5d9>] ? __raw_notifier_call_chain+0x19/0x20
[<c0586f6e>] ? _cpu_up+0xee/0x100
[<c0586fc9>] ? cpu_up+0x49/0x70
[<c05678d8>] ? store_online+0x58/0x80
[<c0567880>] ? store_online+0x0/0x80
[<c0302a6b>] ? sysdev_store+0x2b/0x40
[<c01df0f2>] ? sysfs_write_file+0xa2/0x100
[<c01a0d06>] ? vfs_write+0x96/0x130
[<c01df050>] ? sysfs_write_file+0x0/0x100
[<c01a13cd>] ? sys_write+0x3d/0x70
[<c010831b>] ? sysenter_past_esp+0x78/0xd1
=======================
Code: 8b 43 04 83 c3 04 85 c0 75 ed 5b 5e 5d c3 8d b4 26 00 00 00 00 55 89 e5 83 ec 1c 89 7d fc 89 d7 89 5d f4 89 75 f8 89 45 f0 8b 12 <8b> 70 1c 85 d2 74 53 89 f0 e8 bc f5 ff ff 85 c0 89 c3 74 59 8b
EIP: [<c01e0f06>] sysfs_remove_group+0x16/0xc0 SS:ESP 0068:f3cbfe80


2008-06-22 14:47:39

by Vegard Nossum

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Sun, Jun 22, 2008 at 2:56 PM, Vegard Nossum <[email protected]> wrote:
> Initializing CPU#1
> APIC error on CPU0: 00(40)
> Stuck ??
> Inquiring remote APIC #1...
> ... APIC #1 ID: failed
> ... APIC #1 VERSION: failed
> ... APIC #1 SPIV: failed

Arch-specific __cpu_up() failed...

> BUG: unable to handle kernel NULL pointer dereference at 00000024
> IP: [<c01e0f06>] sysfs_remove_group+0x16/0xc0
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>
> Pid: 3994, comm: bash Not tainted (2.6.26-rc7-00002-g8b2e474-dirty #29)
> EIP: 0060:[<c01e0f06>] EFLAGS: 00010282 CPU: 0
> EIP is at sysfs_remove_group+0x16/0xc0
> EAX: 00000008 EBX: 00000001 ECX: 00000004 EDX: c0694c14
> ESI: 00000004 EDI: c07015fc EBP: f3cbfe9c ESP: f3cbfe80
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process bash (pid: 3994, ti=f3cbe000 task=f3cb4f60 task.ti=f3cbe000)
> Stack: c0302262 f4bcda40 f3cbfe98 00000008 00000001 00000004 00000000 f3cbfea8
> c05880b2 c07576c4 f3cbfec8 c014f567 00000001 00000004 c0753c90 0000001f
> 00000001 fffffffb f3cbfedc c014f5d9 0000001f 00000000 00000004 f3cbff00
> Call Trace:
> [<c0302262>] ? device_unregister+0x12/0x20

This is line 131 of fs/sysfs/group.c:

struct sysfs_dirent *dir_sd = kobj->sd;

> [<c05880b2>] ? topology_cpu_callback+0x32/0x70
> [<c014f567>] ? notifier_call_chain+0x37/0x70
> [<c014f5d9>] ? __raw_notifier_call_chain+0x19/0x20
> [<c0586f6e>] ? _cpu_up+0xee/0x100

This is line 314 of _cpu_up():

out_notify:
if (ret != 0)
__raw_notifier_call_chain(&cpu_chain,
CPU_UP_CANCELED | mod, hcpu, nr_calls, NULL);

> [<c0586fc9>] ? cpu_up+0x49/0x70
> [<c05678d8>] ? store_online+0x58/0x80
> [<c0567880>] ? store_online+0x0/0x80
> [<c0302a6b>] ? sysdev_store+0x2b/0x40
> [<c01df0f2>] ? sysfs_write_file+0xa2/0x100
> [<c01a0d06>] ? vfs_write+0x96/0x130
> [<c01df050>] ? sysfs_write_file+0x0/0x100
> [<c01a13cd>] ? sys_write+0x3d/0x70
> [<c010831b>] ? sysenter_past_esp+0x78/0xd1
> =======================
> Code: 8b 43 04 83 c3 04 85 c0 75 ed 5b 5e 5d c3 8d b4 26 00 00 00 00 55 89 e5 83 ec 1c 89 7d fc 89 d7 89 5d f4 89 75 f8 89 45 f0 8b 12 <8b> 70 1c 85 d2 74 53 89 f0 e8 bc f5 ff ff 85 c0 89 c3 74 59 8b
> EIP: [<c01e0f06>] sysfs_remove_group+0x16/0xc0 SS:ESP 0068:f3cbfe80
>
>

...adding a few related Ccs.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-22 14:54:17

by Vegard Nossum

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Sun, Jun 22, 2008 at 4:47 PM, Vegard Nossum <[email protected]> wrote:
> On Sun, Jun 22, 2008 at 2:56 PM, Vegard Nossum <[email protected]> wrote:
>> Initializing CPU#1
>> APIC error on CPU0: 00(40)
>> Stuck ??
>> Inquiring remote APIC #1...
>> ... APIC #1 ID: failed
>> ... APIC #1 VERSION: failed
>> ... APIC #1 SPIV: failed
>
> Arch-specific __cpu_up() failed...
>
>> BUG: unable to handle kernel NULL pointer dereference at 00000024
>> IP: [<c01e0f06>] sysfs_remove_group+0x16/0xc0
>> *pde = 00000000
>> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>
>> Pid: 3994, comm: bash Not tainted (2.6.26-rc7-00002-g8b2e474-dirty #29)
>> EIP: 0060:[<c01e0f06>] EFLAGS: 00010282 CPU: 0
>> EIP is at sysfs_remove_group+0x16/0xc0
>> EAX: 00000008 EBX: 00000001 ECX: 00000004 EDX: c0694c14
>> ESI: 00000004 EDI: c07015fc EBP: f3cbfe9c ESP: f3cbfe80
>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>> Process bash (pid: 3994, ti=f3cbe000 task=f3cb4f60 task.ti=f3cbe000)
>> Stack: c0302262 f4bcda40 f3cbfe98 00000008 00000001 00000004 00000000 f3cbfea8
>> c05880b2 c07576c4 f3cbfec8 c014f567 00000001 00000004 c0753c90 0000001f
>> 00000001 fffffffb f3cbfedc c014f5d9 0000001f 00000000 00000004 f3cbff00
>> Call Trace:
>> [<c0302262>] ? device_unregister+0x12/0x20
>
> This is line 131 of fs/sysfs/group.c:
>
> struct sysfs_dirent *dir_sd = kobj->sd;
>
>> [<c05880b2>] ? topology_cpu_callback+0x32/0x70
>> [<c014f567>] ? notifier_call_chain+0x37/0x70
>> [<c014f5d9>] ? __raw_notifier_call_chain+0x19/0x20
>> [<c0586f6e>] ? _cpu_up+0xee/0x100
>
> This is line 314 of _cpu_up():
>
> out_notify:
> if (ret != 0)
> __raw_notifier_call_chain(&cpu_chain,
> CPU_UP_CANCELED | mod, hcpu, nr_calls, NULL);
>
>> [<c0586fc9>] ? cpu_up+0x49/0x70
>> [<c05678d8>] ? store_online+0x58/0x80
>> [<c0567880>] ? store_online+0x0/0x80
>> [<c0302a6b>] ? sysdev_store+0x2b/0x40
>> [<c01df0f2>] ? sysfs_write_file+0xa2/0x100
>> [<c01a0d06>] ? vfs_write+0x96/0x130
>> [<c01df050>] ? sysfs_write_file+0x0/0x100
>> [<c01a13cd>] ? sys_write+0x3d/0x70
>> [<c010831b>] ? sysenter_past_esp+0x78/0xd1
>> =======================
>> Code: 8b 43 04 83 c3 04 85 c0 75 ed 5b 5e 5d c3 8d b4 26 00 00 00 00 55 89 e5 83 ec 1c 89 7d fc 89 d7 89 5d f4 89 75 f8 89 45 f0 8b 12 <8b> 70 1c 85 d2 74 53 89 f0 e8 bc f5 ff ff 85 c0 89 c3 74 59 8b
>> EIP: [<c01e0f06>] sysfs_remove_group+0x16/0xc0 SS:ESP 0068:f3cbfe80
>>
>>
>
> ...adding a few related Ccs.

Sorry, forgot one. This is probably the root cause of the crash:

commit e37d05dad7ff9744efd8ea95a70d389e9a65a6fc
Author: Mike Travis <[email protected]>
Date: Thu May 1 04:35:16 2008 -0700

cpu: change cpu_sys_devices from array to per_cpu variable

Change cpu_sys_devices from array to per_cpu variable in drivers/base/cpu.c.

which had this hunk:

struct sys_device *get_cpu_sysdev(unsigned cpu)
{
- if (cpu < NR_CPUS)
- return cpu_sys_devices[cpu];
+ if (cpu < nr_cpu_ids && cpu_possible(cpu))
+ return per_cpu(cpu_sys_devices, cpu);
else
return NULL;
}

...so now we're trying to unregister a NULL device (which of course
has no ->kobj).


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-22 15:58:28

by Adrian Bunk

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Sun, Jun 22, 2008 at 04:54:01PM +0200, Vegard Nossum wrote:
> On Sun, Jun 22, 2008 at 4:47 PM, Vegard Nossum <[email protected]> wrote:
> > On Sun, Jun 22, 2008 at 2:56 PM, Vegard Nossum <[email protected]> wrote:
> >> Initializing CPU#1
> >> APIC error on CPU0: 00(40)
> >> Stuck ??
> >> Inquiring remote APIC #1...
> >> ... APIC #1 ID: failed
> >> ... APIC #1 VERSION: failed
> >> ... APIC #1 SPIV: failed
> >
> > Arch-specific __cpu_up() failed...
> >
> >> BUG: unable to handle kernel NULL pointer dereference at 00000024
> >> IP: [<c01e0f06>] sysfs_remove_group+0x16/0xc0
> >> *pde = 00000000
> >> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >>
> >> Pid: 3994, comm: bash Not tainted (2.6.26-rc7-00002-g8b2e474-dirty #29)
> >> EIP: 0060:[<c01e0f06>] EFLAGS: 00010282 CPU: 0
> >> EIP is at sysfs_remove_group+0x16/0xc0
> >> EAX: 00000008 EBX: 00000001 ECX: 00000004 EDX: c0694c14
> >> ESI: 00000004 EDI: c07015fc EBP: f3cbfe9c ESP: f3cbfe80
> >> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> >> Process bash (pid: 3994, ti=f3cbe000 task=f3cb4f60 task.ti=f3cbe000)
> >> Stack: c0302262 f4bcda40 f3cbfe98 00000008 00000001 00000004 00000000 f3cbfea8
> >> c05880b2 c07576c4 f3cbfec8 c014f567 00000001 00000004 c0753c90 0000001f
> >> 00000001 fffffffb f3cbfedc c014f5d9 0000001f 00000000 00000004 f3cbff00
> >> Call Trace:
> >> [<c0302262>] ? device_unregister+0x12/0x20
> >
> > This is line 131 of fs/sysfs/group.c:
> >
> > struct sysfs_dirent *dir_sd = kobj->sd;
> >
> >> [<c05880b2>] ? topology_cpu_callback+0x32/0x70
> >> [<c014f567>] ? notifier_call_chain+0x37/0x70
> >> [<c014f5d9>] ? __raw_notifier_call_chain+0x19/0x20
> >> [<c0586f6e>] ? _cpu_up+0xee/0x100
> >
> > This is line 314 of _cpu_up():
> >
> > out_notify:
> > if (ret != 0)
> > __raw_notifier_call_chain(&cpu_chain,
> > CPU_UP_CANCELED | mod, hcpu, nr_calls, NULL);
> >
> >> [<c0586fc9>] ? cpu_up+0x49/0x70
> >> [<c05678d8>] ? store_online+0x58/0x80
> >> [<c0567880>] ? store_online+0x0/0x80
> >> [<c0302a6b>] ? sysdev_store+0x2b/0x40
> >> [<c01df0f2>] ? sysfs_write_file+0xa2/0x100
> >> [<c01a0d06>] ? vfs_write+0x96/0x130
> >> [<c01df050>] ? sysfs_write_file+0x0/0x100
> >> [<c01a13cd>] ? sys_write+0x3d/0x70
> >> [<c010831b>] ? sysenter_past_esp+0x78/0xd1
> >> =======================
> >> Code: 8b 43 04 83 c3 04 85 c0 75 ed 5b 5e 5d c3 8d b4 26 00 00 00 00 55 89 e5 83 ec 1c 89 7d fc 89 d7 89 5d f4 89 75 f8 89 45 f0 8b 12 <8b> 70 1c 85 d2 74 53 89 f0 e8 bc f5 ff ff 85 c0 89 c3 74 59 8b
> >> EIP: [<c01e0f06>] sysfs_remove_group+0x16/0xc0 SS:ESP 0068:f3cbfe80
> >>
> >>
> >
> > ...adding a few related Ccs.
>
> Sorry, forgot one. This is probably the root cause of the crash:
>
> commit e37d05dad7ff9744efd8ea95a70d389e9a65a6fc
> Author: Mike Travis <[email protected]>
> Date: Thu May 1 04:35:16 2008 -0700
>
> cpu: change cpu_sys_devices from array to per_cpu variable
>
> Change cpu_sys_devices from array to per_cpu variable in drivers/base/cpu.c.
>...

Can you confirm whether this is definitely the cause or not?

E.g. if it is and 2.6.25 works fine it might qualify as a 2.6.26-rc
regression.

> Vegard

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-06-22 16:29:19

by Vegard Nossum

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Sun, Jun 22, 2008 at 5:56 PM, Adrian Bunk <[email protected]> wrote:
>> commit e37d05dad7ff9744efd8ea95a70d389e9a65a6fc
>> Author: Mike Travis <[email protected]>
>> Date: Thu May 1 04:35:16 2008 -0700
>>
>> cpu: change cpu_sys_devices from array to per_cpu variable
>>
>> Change cpu_sys_devices from array to per_cpu variable in drivers/base/cpu.c.
>>...
>
> Can you confirm whether this is definitely the cause or not?
>
> E.g. if it is and 2.6.25 works fine it might qualify as a 2.6.26-rc
> regression.

Hm, no. Each time I run this test, I get a different error (has been
NULL pointer, stuck CPU, circular locking dependency, ...) :-D

But if you look at the patch, I KNOW that this function is the one
that returns NULL, and it does so because the check is now stricter
than before. The hunk was:

- if (cpu < NR_CPUS)
- return cpu_sys_devices[cpu];
+ if (cpu < nr_cpu_ids && cpu_possible(cpu))
+ return per_cpu(cpu_sys_devices, cpu);
else
return NULL;

And the (cpu < nr_cpu_ids) fails because the CPU has just been
offlined (or failed to initialize, but it's the same thing), while
NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
former check will always be true).

I don't think it is valid to ask for a per_cpu() variable on a CPU
which does not exist, though, so I don't know what the right fix would
be. A straight revert would be possible, but probably not desirable.
I'm so definitely not an expert in this area, but this "fix" _looks_
correct to me:

diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index fdf4044..3bd95fd 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -143,14 +143,10 @@ static int __cpuinit topology_cpu_callback(struct notifier
int rc = 0;

switch (action) {
- case CPU_UP_PREPARE:
- case CPU_UP_PREPARE_FROZEN:
+ case CPU_ONLINE:
rc = topology_add_dev(cpu);
break;
- case CPU_UP_CANCELED:
- case CPU_UP_CANCELED_FROZEN:
- case CPU_DEAD:
- case CPU_DEAD_FROZEN:
+ case CPU_DOWN_PREPARE:
topology_remove_dev(cpu);
break;
}

I'm sorry, I can't really say whether it's a regression or not. But
I'd bet it is.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-23 03:28:39

by Rusty Russell

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> offlined (or failed to initialize, but it's the same thing), while
> NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
> former check will always be true).
>
> I don't think it is valid to ask for a per_cpu() variable on a CPU
> which does not exist, though

Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.

The number check should be removed: checking cpu_possible() is sufficient.

Hope that helps,
Rusty.

2008-06-23 16:58:59

by Mike Travis

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

Rusty Russell wrote:
> On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
>> And the (cpu < nr_cpu_ids) fails because the CPU has just been
>> offlined (or failed to initialize, but it's the same thing), while
>> NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
>> former check will always be true).
>>
>> I don't think it is valid to ask for a per_cpu() variable on a CPU
>> which does not exist, though
>
> Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.
>
> The number check should be removed: checking cpu_possible() is sufficient.
>
> Hope that helps,
> Rusty.

I don't see a check for index being out of range in cpu_possible().

Thanks,
Mike

2008-06-24 01:42:27

by Rusty Russell

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Tuesday 24 June 2008 02:58:44 Mike Travis wrote:
> Rusty Russell wrote:
> > On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> >> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> >> offlined (or failed to initialize, but it's the same thing), while
> >> NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
> >> former check will always be true).
> >>
> >> I don't think it is valid to ask for a per_cpu() variable on a CPU
> >> which does not exist, though
> >
> > Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.
> >
> > The number check should be removed: checking cpu_possible() is
> > sufficient.
> >
> > Hope that helps,
> > Rusty.
>
> I don't see a check for index being out of range in cpu_possible().

You're right. It assumes cpu is < NR_CPUS. Hmm, I have no idea what's going
on. nr_cpu_ids (ignore that it's a horrible name for a bad idea) should be
fine to test against.

Vegard's analysis is flawed: just because cpu is offline, it still must be <
nr_cpu_ids, which is based on possible cpus. Unless something crazy is
happening, but a quick grep doesn't reveal anyone manipulating nr_cpu_ids.

If changing this fixes the bug, something else is badly wrong...
Rusty.

2008-06-24 07:40:46

by Vegard Nossum

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Tue, Jun 24, 2008 at 3:36 AM, Rusty Russell <[email protected]> wrote:
> Vegard's analysis is flawed: just because cpu is offline, it still must be <
> nr_cpu_ids, which is based on possible cpus. Unless something crazy is
> happening, but a quick grep doesn't reveal anyone manipulating nr_cpu_ids.

Hm, you are right and I was wrong. I'm sorry, it just seemed too
obvious to be any other way, and I made some assumptions about
nr_cpu_ids. (IIRC, nr_node_ids changes dynamically as nodes are
added/removed, so I assumed it was the same for CPUs.)

This doesn't change the fact that get_cpu_sysdev(cpu) returns NULL,
however. This variable, the per-cpu cpu_sys_device, is only ever
changed in two places, register_cpu() and unregister_cpu(); in
register_cpu(), it is set to

per_cpu(cpu_sys_devices, num) = &cpu->sysdev;,

and in unregister_cpu(), it is set to

per_cpu(cpu_sys_devices, logical_cpu) = NULL;.

So it seems *likely* that register_cpu() was never called (after the
previous unregister_cpu(), which we know happened successfully).

register_cpu() is called from arch_register_cpu(), which is called
from toplogy_init() and acpi_processor_hotadd_init(). Now, the
topology_init() call-chain is uninteresting, since it only happens at
boot. The question is whether acpi_processor_hotadd_init() will be
called if the arch-specific __cpu_up() fails...

But I am not able to follow that code.

Thanks for looking at this.


Vegard

PS: I'll withdraw the statement that this is probably a regression. It
seems more likely that nobody ever hit the "cpu failed to init" case
before.

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-24 08:08:29

by Yanmin Zhang

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference


On Tue, 2008-06-24 at 11:36 +1000, Rusty Russell wrote:
> On Tuesday 24 June 2008 02:58:44 Mike Travis wrote:
> > Rusty Russell wrote:
> > > On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> > >> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> > >> offlined (or failed to initialize, but it's the same thing), while
> > >> NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
> > >> former check will always be true).
> > >>
> > >> I don't think it is valid to ask for a per_cpu() variable on a CPU
> > >> which does not exist, though
> > >
> > > Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.
> > >
> > > The number check should be removed: checking cpu_possible() is
> > > sufficient.
> > >
> > > Hope that helps,
> > > Rusty.
> >
> > I don't see a check for index being out of range in cpu_possible().
>
> You're right. It assumes cpu is < NR_CPUS. Hmm, I have no idea what's going
> on. nr_cpu_ids (ignore that it's a horrible name for a bad idea) should be
> fine to test against.
>
> Vegard's analysis is flawed: just because cpu is offline, it still must be <
> nr_cpu_ids, which is based on possible cpus. Unless something crazy is
> happening, but a quick grep doesn't reveal anyone manipulating nr_cpu_ids.
>
> If changing this fixes the bug, something else is badly wrong...
> Rusty.

In function _cpu_up, the panic happens when calling __raw_notifier_call_chain
at the second time. Kernel doesn't panic when calling it at the first time. If
just say because of nr_cpu_ids, that's not right.

By checking source codes, I find function do_boot_cpu is the culprit.
Consider below call chain:
_cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.

So do_boot_cpu is called in the end. In do_boot_cpu, if boot_error==true,
cpu_clear(cpu, cpu_possible_map) is executed. So later on, when _cpu_up
calls __raw_notifier_call_chain at the second time to report CPU_UP_CANCELED,
because this cpu is already cleared from cpu_possible_map, get_cpu_sysdev returns
NULL.

Many resources are related to cpu_possible_map, so it's better not to change it.

Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in cpu_possible_map.

Vegard, would you like to help test it?

Signed-off-by: Zhang Yanmin <[email protected]>

---

diff -Nraup linux-2.6.26-rc7/arch/x86/kernel/smpboot.c linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c
--- linux-2.6.26-rc7/arch/x86/kernel/smpboot.c 2008-06-24 09:03:54.000000000 +0800
+++ linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c 2008-06-24 09:04:45.000000000 +0800
@@ -996,7 +996,6 @@ do_rest:
#endif
cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
- cpu_clear(cpu, cpu_possible_map);
cpu_clear(cpu, cpu_present_map);
per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
}



2008-06-24 08:37:57

by Vegard Nossum

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Tue, Jun 24, 2008 at 10:06 AM, Zhang, Yanmin<[email protected]> wrote:> In function _cpu_up, the panic happens when calling __raw_notifier_call_chain> at the second time. Kernel doesn't panic when calling it at the first time. If> just say because of nr_cpu_ids, that's not right.>> By checking source codes, I find function do_boot_cpu is the culprit.> Consider below call chain:> _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.>> So do_boot_cpu is called in the end. In do_boot_cpu, if boot_error==true,> cpu_clear(cpu, cpu_possible_map) is executed. So later on, when _cpu_up> calls __raw_notifier_call_chain at the second time to report CPU_UP_CANCELED,> because this cpu is already cleared from cpu_possible_map, get_cpu_sysdev returns> NULL.
Ahhha! Well done!
(Whew, I have a lot to learn :-D)
>> Many resources are related to cpu_possible_map, so it's better not to change it.>> Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in cpu_possible_map.>> Vegard, would you like to help test it?
Sure, but it can take a while. 1) I have no idea why the processorfailed to initialize in the first place. So far, it only ever happenedthis one time. 2) There seems to be a couple of other failure casesrelated to cpu hotplug (usually the machine freezes hard), so it's notcertain that we hit this (failed to initialize) first. But I will try!
Thanks for solving the mystery.

Vegard
-- "The animistic metaphor of the bug that maliciously sneaked in whilethe programmer was not looking is intellectually dishonest as itdisguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2008-06-24 13:16:47

by Rusty Russell

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Tuesday 24 June 2008 18:06:23 Zhang, Yanmin wrote:
> On Tue, 2008-06-24 at 11:36 +1000, Rusty Russell wrote:
> > On Tuesday 24 June 2008 02:58:44 Mike Travis wrote:
> > > Rusty Russell wrote:
> > > > On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> > > >> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> > > >> offlined (or failed to initialize, but it's the same thing), while
> > > >> NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
> > > >> former check will always be true).
> > > >>
> > > >> I don't think it is valid to ask for a per_cpu() variable on a CPU
> > > >> which does not exist, though
> > > >
> > > > Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.
> > > >
> > > > The number check should be removed: checking cpu_possible() is
> > > > sufficient.
> > > >
> > > > Hope that helps,
> > > > Rusty.
> > >
> > > I don't see a check for index being out of range in cpu_possible().
> >
> > You're right. It assumes cpu is < NR_CPUS. Hmm, I have no idea what's
> > going on. nr_cpu_ids (ignore that it's a horrible name for a bad idea)
> > should be fine to test against.
> >
> > Vegard's analysis is flawed: just because cpu is offline, it still must
> > be < nr_cpu_ids, which is based on possible cpus. Unless something crazy
> > is happening, but a quick grep doesn't reveal anyone manipulating
> > nr_cpu_ids.
> >
> > If changing this fixes the bug, something else is badly wrong...
> > Rusty.
>
> In function _cpu_up, the panic happens when calling
> __raw_notifier_call_chain at the second time. Kernel doesn't panic when
> calling it at the first time. If just say because of nr_cpu_ids, that's
> not right.
>
> By checking source codes, I find function do_boot_cpu is the culprit.
> Consider below call chain:
> _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
>
> So do_boot_cpu is called in the end. In do_boot_cpu, if boot_error==true,
> cpu_clear(cpu, cpu_possible_map) is executed. So later on, when _cpu_up
> calls __raw_notifier_call_chain at the second time to report
> CPU_UP_CANCELED, because this cpu is already cleared from
> cpu_possible_map, get_cpu_sysdev returns NULL.
>
> Many resources are related to cpu_possible_map, so it's better not to
> change it.
>
> Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in
> cpu_possible_map.
>
> Vegard, would you like to help test it?
>
> Signed-off-by: Zhang Yanmin <[email protected]>
>
> ---
>
> diff -Nraup linux-2.6.26-rc7/arch/x86/kernel/smpboot.c
> linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c ---
> linux-2.6.26-rc7/arch/x86/kernel/smpboot.c 2008-06-24 09:03:54.000000000
> +0800 +++ linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c 2008-06-24
> 09:04:45.000000000 +0800 @@ -996,7 +996,6 @@ do_rest:
> #endif
> cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
> cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
> - cpu_clear(cpu, cpu_possible_map);
> cpu_clear(cpu, cpu_present_map);
> per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
> }

Nice catch. Basically, cpu_possible_map should only be cleared at boot, and
probably not even then.

Acked-by: Rusty Russell <[email protected]>

Thanks,
Rusty.

2008-06-24 14:44:57

by Mike Travis

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

Rusty Russell wrote:
...
> Nice catch. Basically, cpu_possible_map should only be cleared at boot, and
> probably not even then.
>

One thing that should be avoided, is clearing anything but the last bit in the
cpu_possible_map. This is because num_possible_cpus != nr_cpu_ids when there
are holes in the map. (nr_cpu_ids = highest possible cpu # + 1).

Thanks,
Mike

2008-06-25 05:39:03

by Rusty Russell

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Wednesday 25 June 2008 00:44:45 Mike Travis wrote:
> Rusty Russell wrote:
> ...
>
> > Nice catch. Basically, cpu_possible_map should only be cleared at boot,
> > and probably not even then.
>
> One thing that should be avoided, is clearing anything but the last bit in
> the cpu_possible_map. This is because num_possible_cpus != nr_cpu_ids when
> there are holes in the map. (nr_cpu_ids = highest possible cpu # + 1).

It's ok if nr_cpu_ids is an overestimate, isn't it?

But for this corner case, I think clearing cpu_possible_map is wrong.

Cheers,
Rusty.

2008-06-25 15:07:15

by Mike Travis

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

Rusty Russell wrote:
> On Wednesday 25 June 2008 00:44:45 Mike Travis wrote:
>> Rusty Russell wrote:
>> ...
>>
>>> Nice catch. Basically, cpu_possible_map should only be cleared at boot,
>>> and probably not even then.
>> One thing that should be avoided, is clearing anything but the last bit in
>> the cpu_possible_map. This is because num_possible_cpus != nr_cpu_ids when
>> there are holes in the map. (nr_cpu_ids = highest possible cpu # + 1).
>
> It's ok if nr_cpu_ids is an overestimate, isn't it?

Yes. As I see it, nr_cpu_ids is the max index (+1) into anything dealing with
cpu's.
>
> But for this corner case, I think clearing cpu_possible_map is wrong.

Yes, I agree. If for some reason, ACPI discovers a "possible" cpu but it faults
when brought online, then it simply stays offline. It may never come online, or
with some trick hardware, it could be replaced on the running system and then
a new attempt can be made to bring it online.

Thanks,
Mike

2008-06-26 01:02:58

by Yanmin Zhang

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference


On Tue, 2008-06-24 at 16:06 +0800, Zhang, Yanmin wrote:
> On Tue, 2008-06-24 at 11:36 +1000, Rusty Russell wrote:
> > On Tuesday 24 June 2008 02:58:44 Mike Travis wrote:
> > > Rusty Russell wrote:
> > > > On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> > > >> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> > > >> offlined (or failed to initialize, but it's the same thing), while
> > > >> NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
> > > >> former check will always be true).
> > > >>
> > > >> I don't think it is valid to ask for a per_cpu() variable on a CPU
> > > >> which does not exist, though
> > > >
> > > > Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.
> > > >
> > > > The number check should be removed: checking cpu_possible() is
> > > > sufficient.
> > > >
> > > > Hope that helps,
> > > > Rusty.
> > >
> > > I don't see a check for index being out of range in cpu_possible().
> >
> > You're right. It assumes cpu is < NR_CPUS. Hmm, I have no idea what's going
> > on. nr_cpu_ids (ignore that it's a horrible name for a bad idea) should be
> > fine to test against.
> >
> > Vegard's analysis is flawed: just because cpu is offline, it still must be <
> > nr_cpu_ids, which is based on possible cpus. Unless something crazy is
> > happening, but a quick grep doesn't reveal anyone manipulating nr_cpu_ids.
> >
> > If changing this fixes the bug, something else is badly wrong...
> > Rusty.
>
> In function _cpu_up, the panic happens when calling __raw_notifier_call_chain
> at the second time. Kernel doesn't panic when calling it at the first time. If
> just say because of nr_cpu_ids, that's not right.
>
> By checking source codes, I find function do_boot_cpu is the culprit.
> Consider below call chain:
> _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
>
> So do_boot_cpu is called in the end. In do_boot_cpu, if boot_error==true,
> cpu_clear(cpu, cpu_possible_map) is executed. So later on, when _cpu_up
> calls __raw_notifier_call_chain at the second time to report CPU_UP_CANCELED,
> because this cpu is already cleared from cpu_possible_map, get_cpu_sysdev returns
> NULL.
>
> Many resources are related to cpu_possible_map, so it's better not to change it.
>
> Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in cpu_possible_map.
>
> Vegard, would you like to help test it?
>
> Signed-off-by: Zhang Yanmin <[email protected]>
>
> ---
>
> diff -Nraup linux-2.6.26-rc7/arch/x86/kernel/smpboot.c linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c
> --- linux-2.6.26-rc7/arch/x86/kernel/smpboot.c 2008-06-24 09:03:54.000000000 +0800
> +++ linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c 2008-06-24 09:04:45.000000000 +0800
> @@ -996,7 +996,6 @@ do_rest:
> #endif
> cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
> cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
> - cpu_clear(cpu, cpu_possible_map);
> cpu_clear(cpu, cpu_present_map);
> per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
> }
>
Andrew,

Would you like to pick up this patch? Rusty Russell <[email protected]> acked it.

Thanks,
yanmin

2008-06-26 02:17:18

by Andrew Morton

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Thu, 26 Jun 2008 08:59:39 +0800 "Zhang, Yanmin" <[email protected]> wrote:

>
> On Tue, 2008-06-24 at 16:06 +0800, Zhang, Yanmin wrote:
> > On Tue, 2008-06-24 at 11:36 +1000, Rusty Russell wrote:
> > > On Tuesday 24 June 2008 02:58:44 Mike Travis wrote:
> > > > Rusty Russell wrote:
> > > > > On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> > > > >> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> > > > >> offlined (or failed to initialize, but it's the same thing), while
> > > > >> NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
> > > > >> former check will always be true).
> > > > >>
> > > > >> I don't think it is valid to ask for a per_cpu() variable on a CPU
> > > > >> which does not exist, though
> > > > >
> > > > > Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.
> > > > >
> > > > > The number check should be removed: checking cpu_possible() is
> > > > > sufficient.
> > > > >
> > > > > Hope that helps,
> > > > > Rusty.
> > > >
> > > > I don't see a check for index being out of range in cpu_possible().
> > >
> > > You're right. It assumes cpu is < NR_CPUS. Hmm, I have no idea what's going
> > > on. nr_cpu_ids (ignore that it's a horrible name for a bad idea) should be
> > > fine to test against.
> > >
> > > Vegard's analysis is flawed: just because cpu is offline, it still must be <
> > > nr_cpu_ids, which is based on possible cpus. Unless something crazy is
> > > happening, but a quick grep doesn't reveal anyone manipulating nr_cpu_ids.
> > >
> > > If changing this fixes the bug, something else is badly wrong...
> > > Rusty.
> >
> > In function _cpu_up, the panic happens when calling __raw_notifier_call_chain
> > at the second time. Kernel doesn't panic when calling it at the first time. If
> > just say because ___of nr_cpu_ids, that's not right.
> >
> > By checking source codes, I find function do_boot_cpu is the culprit.
> > Consider below call chain:
> > _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
> >
> > So ___do_boot_cpu is called in the end. In ___do_boot_cpu, if boot_error==true,
> > cpu_clear(cpu, cpu_possible_map) is executed. So later on, when ____cpu_up
> > calls _____raw_notifier_call_chain at the second time to report CPU_UP_CANCELED,
> > because this cpu is already cleared from ___cpu_possible_map, get_cpu_sysdev returns
> > NULL.
> >
> > Many resources are related to ___cpu_possible_map, so it's better not to change it.
> >
> > Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in ___cpu_possible_map.
> >
> > Vegard, would you like to help test it?
> >
> > _________Signed-off-by: Zhang Yanmin ___<[email protected]>
> >
> > ---
> >
> > diff -Nraup linux-2.6.26-rc7/arch/x86/kernel/smpboot.c linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c
> > --- linux-2.6.26-rc7/arch/x86/kernel/smpboot.c 2008-06-24 09:03:54.000000000 +0800
> > +++ linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c 2008-06-24 09:04:45.000000000 +0800
> > @@ -996,7 +996,6 @@ do_rest:
> > #endif
> > cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
> > cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
> > - cpu_clear(cpu, cpu_possible_map);
> > cpu_clear(cpu, cpu_present_map);
> > per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
> > }
> >
> Andrew,
>
> Would you like to pick up this patch? ___Rusty Russell <[email protected]> acked it.
>

Could. But arch/x86/kernel/smpboot.c is an x86-tree file. I'd expect
the x86 maintainers would like a usable changelog and a Tested-by: (if
indeed Vegard tested it).

2008-06-26 09:00:25

by Vegard Nossum

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Thu, Jun 26, 2008 at 4:15 AM, Andrew Morton
<[email protected]> wrote:
> Could. But arch/x86/kernel/smpboot.c is an x86-tree file. I'd expect
> the x86 maintainers would like a usable changelog and a Tested-by: (if
> indeed Vegard tested it).

Hi,

I had the patch (on top of v2.6.26-rc8) in testing for a couple of
hours now, but like I expected, the APIC error is really hard to
reproduce. So I fear that we never hit the failure case in the first
place. On the other hand... I hit a number of (other?) problems:

1. Spontaneous reboot. This could have been the APIC error thing, I
have no way to know. I've never seen this before.
2. NULL pointer dereference in pick_next_task_fair(). Not new.
3. NULL pointer dereference in page_cgroup_zoneinfo. Never seen this
before (see screenshot #1).
4. Page fault in I'm guessing configure_kgdbts() during boot. Never
seen this before either (screenshot #2).

So even though I'm inclined to believe the patch is correct, I will
not ack it or claim successful test coverage.

The screenshots:

#1: http://www.kernel.org/pub/linux/kernel/people/vegard/kernels/20080626/DSCF3017.JPG
#2: http://www.kernel.org/pub/linux/kernel/people/vegard/kernels/20080626/DSCF3018.JPG


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-26 13:05:45

by Jason Wessel

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

Vegard Nossum wrote:
> On Thu, Jun 26, 2008 at 4:15 AM, Andrew Morton
> <[email protected]> wrote:
>
>> Could. But arch/x86/kernel/smpboot.c is an x86-tree file. I'd expect
>> the x86 maintainers would like a usable changelog and a Tested-by: (if
>> indeed Vegard tested it).
>>
>
> Hi,
>
> I had the patch (on top of v2.6.26-rc8) in testing for a couple of
> hours now, but like I expected, the APIC error is really hard to
> reproduce. So I fear that we never hit the failure case in the first
> place. On the other hand... I hit a number of (other?) problems:
>
> 1. Spontaneous reboot. This could have been the APIC error thing, I
> have no way to know. I've never seen this before.
> 2. NULL pointer dereference in pick_next_task_fair(). Not new.
> 3. NULL pointer dereference in page_cgroup_zoneinfo. Never seen this
> before (see screenshot #1).
> 4. Page fault in I'm guessing configure_kgdbts() during boot. Never
> seen this before either (screenshot #2).
>

How often does #4 occur?

Perhaps you can email me directly the .config and kernel boot arguments
you used? It would be good to determine if the #4 is a victim or
culprit of the crash. If it is a culprit I would like to try and fix it.

Thanks,
Jason.

Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Tue, Jun 24, 2008 at 11:14:51PM +1000, Rusty Russell wrote:
> On Tuesday 24 June 2008 18:06:23 Zhang, Yanmin wrote:
> > On Tue, 2008-06-24 at 11:36 +1000, Rusty Russell wrote:
> > > On Tuesday 24 June 2008 02:58:44 Mike Travis wrote:
> > > > Rusty Russell wrote:
> > > > > On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> > > > >> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> > > > >> offlined (or failed to initialize, but it's the same thing), while
> > > > >> NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
> > > > >> former check will always be true).
> > > > >>
> > > > >> I don't think it is valid to ask for a per_cpu() variable on a CPU
> > > > >> which does not exist, though
> > > > >
> > > > > Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.
> > > > >
> > > > > The number check should be removed: checking cpu_possible() is
> > > > > sufficient.
> > > > >
> > > > > Hope that helps,
> > > > > Rusty.
> > > >
> > > > I don't see a check for index being out of range in cpu_possible().
> > >
> > > You're right. It assumes cpu is < NR_CPUS. Hmm, I have no idea what's
> > > going on. nr_cpu_ids (ignore that it's a horrible name for a bad idea)
> > > should be fine to test against.
> > >
> > > Vegard's analysis is flawed: just because cpu is offline, it still must
> > > be < nr_cpu_ids, which is based on possible cpus. Unless something crazy
> > > is happening, but a quick grep doesn't reveal anyone manipulating
> > > nr_cpu_ids.
> > >
> > > If changing this fixes the bug, something else is badly wrong...
> > > Rusty.
> >
> > In function _cpu_up, the panic happens when calling
> > __raw_notifier_call_chain at the second time. Kernel doesn't panic when
> > calling it at the first time. If just say because of nr_cpu_ids, that's
> > not right.
> >
> > By checking source codes, I find function do_boot_cpu is the culprit.
> > Consider below call chain:
> > _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
> >
> > So do_boot_cpu is called in the end. In do_boot_cpu, if boot_error==true,
> > cpu_clear(cpu, cpu_possible_map) is executed. So later on, when _cpu_up
> > calls __raw_notifier_call_chain at the second time to report
> > CPU_UP_CANCELED, because this cpu is already cleared from
> > cpu_possible_map, get_cpu_sysdev returns NULL.
> >
> > Many resources are related to cpu_possible_map, so it's better not to
> > change it.
> >
> > Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in
> > cpu_possible_map.
> >
> > Vegard, would you like to help test it?
> >
> > Signed-off-by: Zhang Yanmin <[email protected]>
> >
> > ---
> >
> > diff -Nraup linux-2.6.26-rc7/arch/x86/kernel/smpboot.c
> > linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c ---
> > linux-2.6.26-rc7/arch/x86/kernel/smpboot.c 2008-06-24 09:03:54.000000000
> > +0800 +++ linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c 2008-06-24
> > 09:04:45.000000000 +0800 @@ -996,7 +996,6 @@ do_rest:
> > #endif
> > cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
> > cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
> > - cpu_clear(cpu, cpu_possible_map);
> > cpu_clear(cpu, cpu_present_map);

Nice catch.

While we're at it, is the clearing of cpu from the cpu_present_map
necessary if cpu_up failed for 'cpu' ?

> > per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
> > }
>
> Nice catch. Basically, cpu_possible_map should only be cleared at boot, and
> probably not even then.
>
> Acked-by: Rusty Russell <[email protected]>
>
> Thanks,
> Rusty.

--
Thanks and Regards
gautham

2008-06-26 13:59:34

by Vegard Nossum

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Thu, Jun 26, 2008 at 2:40 PM, Jason Wessel
<[email protected]> wrote:
>> 4. Page fault in I'm guessing configure_kgdbts() during boot. Never
>> seen this before either (screenshot #2).
>>
>
> How often does #4 occur?

Only this once yet.

>
> Perhaps you can email me directly the .config and kernel boot arguments
> you used? It would be good to determine if the #4 is a victim or
> culprit of the crash. If it is a culprit I would like to try and fix it.

I don't think this really has anything to do with the CPU hotplug
crashing. It seems to be completely spurious, as this was during boot
before userspace had even started. It's not even certain that it has
anything to do with kgdb at all. I mean, that stack trace is not very
reliable...

Command line was: kernel /vmlinuz-2.6.26-rc8-dirty ro
root=/dev/VolGroup00/LogVol00 rhgb debug 3

You can find vmlinux and config here:

http://www.kernel.org/pub/linux/kernel/people/vegard/kernels/20080626/

..but it probably takes a bit for the mirrors to pick up. In the
meantime, I can provide you with the relevant line numbers:

# configure_kgdbts
$ addr2line -e vmlinux.20080626 -i c030ab18 | sed 's/.*linux-2.6\///'
drivers/misc/kgdbts.c:911
drivers/misc/kgdbts.c:1015

# mtrr_del
$ addr2line -e vmlinux.20080626 -i c011007b | sed 's/.*linux-2.6\///'
arch/x86/kernel/cpu/mtrr/main.c:548

# loop_init
$ addr2line -e vmlinux.20080626 -i c078111a | sed 's/.*linux-2.6\///'
drivers/block/loop.c:1550

# init_kgdbts
$ addr2line -e vmlinux.20080626 -i c0781163 | sed 's/.*linux-2.6\///'
drivers/misc/kgdbts.c:1033

Thanks.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-27 03:17:55

by Rusty Russell

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Thursday 26 June 2008 22:58:20 Gautham R Shenoy wrote:
> On Tue, Jun 24, 2008 at 11:14:51PM +1000, Rusty Russell wrote:
> > On Tuesday 24 June 2008 18:06:23 Zhang, Yanmin wrote:
> > > On Tue, 2008-06-24 at 11:36 +1000, Rusty Russell wrote:
> > > > On Tuesday 24 June 2008 02:58:44 Mike Travis wrote:
> > > > > Rusty Russell wrote:
> > > > > > On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> > > > > >> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> > > > > >> offlined (or failed to initialize, but it's the same thing),
> > > > > >> while NR_CPUS is the value that was compiled in as
> > > > > >> CONFIG_NR_CPUS (so the former check will always be true).
> > > > > >>
> > > > > >> I don't think it is valid to ask for a per_cpu() variable on a
> > > > > >> CPU which does not exist, though
> > > > > >
> > > > > > Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.
> > > > > >
> > > > > > The number check should be removed: checking cpu_possible() is
> > > > > > sufficient.
> > > > > >
> > > > > > Hope that helps,
> > > > > > Rusty.
> > > > >
> > > > > I don't see a check for index being out of range in cpu_possible().
> > > >
> > > > You're right. It assumes cpu is < NR_CPUS. Hmm, I have no idea
> > > > what's going on. nr_cpu_ids (ignore that it's a horrible name for a
> > > > bad idea) should be fine to test against.
> > > >
> > > > Vegard's analysis is flawed: just because cpu is offline, it still
> > > > must be < nr_cpu_ids, which is based on possible cpus. Unless
> > > > something crazy is happening, but a quick grep doesn't reveal anyone
> > > > manipulating nr_cpu_ids.
> > > >
> > > > If changing this fixes the bug, something else is badly wrong...
> > > > Rusty.
> > >
> > > In function _cpu_up, the panic happens when calling
> > > __raw_notifier_call_chain at the second time. Kernel doesn't panic when
> > > calling it at the first time. If just say because of nr_cpu_ids,
> > > that's not right.
> > >
> > > By checking source codes, I find function do_boot_cpu is the culprit.
> > > Consider below call chain:
> > > _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
> > >
> > > So do_boot_cpu is called in the end. In do_boot_cpu, if
> > > boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So
> > > later on, when _cpu_up calls __raw_notifier_call_chain at the second
> > > time to report
> > > CPU_UP_CANCELED, because this cpu is already cleared from
> > > cpu_possible_map, get_cpu_sysdev returns NULL.
> > >
> > > Many resources are related to cpu_possible_map, so it's better not to
> > > change it.
> > >
> > > Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in
> > > cpu_possible_map.
> > >
> > > Vegard, would you like to help test it?
> > >
> > > Signed-off-by: Zhang Yanmin <[email protected]>
> > >
> > > ---
> > >
> > > diff -Nraup linux-2.6.26-rc7/arch/x86/kernel/smpboot.c
> > > linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c ---
> > > linux-2.6.26-rc7/arch/x86/kernel/smpboot.c 2008-06-24
> > > 09:03:54.000000000 +0800 +++
> > > linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c 2008-06-24
> > > 09:04:45.000000000 +0800 @@ -996,7 +996,6 @@ do_rest:
> > > #endif
> > > cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
> > > cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
> > > - cpu_clear(cpu, cpu_possible_map);
> > > cpu_clear(cpu, cpu_present_map);
>
> Nice catch.
>
> While we're at it, is the clearing of cpu from the cpu_present_map
> necessary if cpu_up failed for 'cpu' ?

It's never necessary, but there there are not many places which cpu_present is
examined. It just prevents it from being hot added again, AFAICT.

Rusty.

2008-06-30 11:25:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference


* Rusty Russell <[email protected]> wrote:

> On Tuesday 24 June 2008 18:06:23 Zhang, Yanmin wrote:

> > In function _cpu_up, the panic happens when calling
> > __raw_notifier_call_chain at the second time. Kernel doesn't panic
> > when calling it at the first time. If just say because of
> > nr_cpu_ids, that's not right.
> >
> > By checking source codes, I find function do_boot_cpu is the
> > culprit. Consider below call chain:
> > _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
> >
> > So do_boot_cpu is called in the end. In do_boot_cpu, if
> > boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So
> > later on, when _cpu_up calls __raw_notifier_call_chain at the second
> > time to report CPU_UP_CANCELED, because this cpu is already cleared
> > from cpu_possible_map, get_cpu_sysdev returns NULL.
> >
> > Many resources are related to cpu_possible_map, so it's better not to
> > change it.
> >
> > Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in
> > cpu_possible_map.
> >
> > Vegard, would you like to help test it?
> >
> > Signed-off-by: Zhang Yanmin <[email protected]>
[...]

> Nice catch. Basically, cpu_possible_map should only be cleared at
> boot, and probably not even then.
>
> Acked-by: Rusty Russell <[email protected]>

applied to tip/x86/urgent for v2.6.26 merging - thanks everyone!

Ingo

2008-07-10 19:10:52

by Vegard Nossum

[permalink] [raw]
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference

On Tue, Jun 24, 2008 at 10:06 AM, Zhang, Yanmin<[email protected]> wrote:> In function _cpu_up, the panic happens when calling __raw_notifier_call_chain> at the second time. Kernel doesn't panic when calling it at the first time. If> just say because of nr_cpu_ids, that's not right.>> By checking source codes, I find function do_boot_cpu is the culprit.> Consider below call chain:> _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.>> So do_boot_cpu is called in the end. In do_boot_cpu, if boot_error==true,> cpu_clear(cpu, cpu_possible_map) is executed. So later on, when _cpu_up> calls __raw_notifier_call_chain at the second time to report CPU_UP_CANCELED,> because this cpu is already cleared from cpu_possible_map, get_cpu_sysdev returns> NULL.>> Many resources are related to cpu_possible_map, so it's better not to change it.>> Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in cpu_possible_map.>> Vegard, would you like to help test it?
Yay! I just hit this again with your patch applied
Inquiring remote APIC #1...... APIC #1 ID: failed... APIC #1 VERSION: failed... APIC #1 SPIV: failed
and it works correctly, no NULL pointer error this time :-)
I know it's applied to mainline already, actually with this tag too,even though I wasn't able to confirm it before:
Tested-by: Vegard Nossum <[email protected]>
Thanks :-)

Vegard
-- "The animistic metaphor of the bug that maliciously sneaked in whilethe programmer was not looking is intellectually dishonest as itdisguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?