2008-07-27 09:55:27

by Vegard Nossum

[permalink] [raw]
Subject: -git: BUG: unable to handle kernel paging request at c0100248 [CPU hotplug]

Hi,

Something broke CPU-hotplug _again_, here with a freshly built latest -git:

BUG: unable to handle kernel paging request at c0100248
IP: [<c067c3db>] native_cpu_up+0x26b/0x880
*pde = 00cc9067 *pte = 00100161
Oops: 0003 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3849, comm: bash Not tainted (2.6.26-07588-g5d1201a #5)
EIP: 0060:[<c067c3db>] EFLAGS: 00210282 CPU: 0
EIP is at native_cpu_up+0x26b/0x880
EAX: c08f9000 EBX: c08fa028 ECX: 01669000 EDX: c08f7004
ESI: 00000001 EDI: 00000000 EBP: f5bd9ed8 ESP: f5bd9dfc
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3849, ti=f5bd8000 task=f480bfc0 task.ti=f5bd8000)
Stack: 00000000 c01e2153 f5bd9e3c f5992bb0 00000001 00000001 c01e2193 f5bd9e30
c01e2223 f59929a0 f5bd9e3c f5992bb0 00000000 00000000 dead4ead ffffffff
ffffffff c0889bc0 ffffffe0 f5bd9eec f5bd9ec4 f5bd9ea8 00000000 f5bd9e88
Call Trace:
[<c01e2153>] ? __sysfs_add_one+0x63/0x90
[<c01e2193>] ? sysfs_add_one+0x13/0x50
[<c01e2223>] ? sysfs_addrm_start+0x53/0xa0
[<c01e3811>] ? internal_create_group+0xd1/0x180
[<c067bdc0>] ? do_fork_idle+0x0/0x20
[<c014db69>] ? __raw_notifier_call_chain+0x19/0x20
[<c067dbc3>] ? _cpu_up+0x83/0x100
[<c067dc89>] ? cpu_up+0x49/0x80
[<c065eb58>] ? store_online+0x58/0x80
[<c065eb00>] ? store_online+0x0/0x80
[<c03f443c>] ? sysdev_store+0x2c/0x40
[<c01e16d2>] ? sysfs_write_file+0xa2/0x100
[<c01a35a6>] ? vfs_write+0x96/0x130
[<c01e1630>] ? sysfs_write_file+0x0/0x100
[<c01a3aed>] ? sys_write+0x3d/0x70
[<c0103f5b>] ? sysenter_do_call+0x12/0x3f
=======================
Code: 34 ff ff ff b8 00 70 8f c0 03 04 95 80 8e 89 c0 8b 55 a8 89 10 8b 85 34 ff
ff ff e8 f0 0a 00 00 8b b5 34 ff ff ff b8 00 90 8f c0 <c7> 05 48 02 10 c0 f0 c9
67 c0 03 04 b5 80 8e 89 c0 a3 2e 31 81

c067c3db: c7 05 48 02 10 c0 f0 movl $0xc067c9f0,0xc0100248
c067c3e2: c9 67 c0

$ addr2line -e vmlinux -i c067c3db
arch/x86/kernel/smpboot.c:859
arch/x86/kernel/smpboot.c:993

Hm... that doesn't really make sense, does it? It's this line:

initial_code = (unsigned long)start_secondary;

..but I don't see where the memory dereference comes from.


I can reproduce it every time, simply by bringing CPU 1 down and up.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036


2008-07-27 10:08:09

by Vegard Nossum

[permalink] [raw]
Subject: Re: -git: BUG: unable to handle kernel paging request at c0100248 [CPU hotplug]

On Sun, Jul 27, 2008 at 11:55 AM, Vegard Nossum <[email protected]> wrote:
> Hi,
>
> Something broke CPU-hotplug _again_, here with a freshly built latest -git:
>
> BUG: unable to handle kernel paging request at c0100248

[snip]

> c067c3db: c7 05 48 02 10 c0 f0 movl $0xc067c9f0,0xc0100248
> c067c3e2: c9 67 c0
>
> $ addr2line -e vmlinux -i c067c3db
> arch/x86/kernel/smpboot.c:859
> arch/x86/kernel/smpboot.c:993
>
> Hm... that doesn't really make sense, does it? It's this line:
>
> initial_code = (unsigned long)start_secondary;
>
> ..but I don't see where the memory dereference comes from.

"initial_code" (0xc0100248) is in a .text section:

Idx Name Size VMA LMA File off Algn
0 .text.head 000003a1 c0100000 00100000 00001000 2**2
1 .text 00583b8d c01003b0 001003b0 000013b0 2**4

And I have CONFIG_DEBUG_RODATA=y. Maybe this was introduced in

commit e3f77edfc1d0beb7b10f9f31d9e39206f7dbef7b
Author: Glauber Costa <[email protected]>
Date: Wed May 28 12:57:02 2008 -0300

x86: use initial_code for i386

x86_64 jumps to whatever is written in "initial_code" symbol,
instead of a fixed address. Do it for i386 too. It will allow us
to integrate more of the smp boot code.

Signed-off-by: Glauber Costa <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

?


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-07-27 17:14:32

by Dmitry Adamushko

[permalink] [raw]
Subject: Re: -git: BUG: unable to handle kernel paging request at c0100248 [CPU hotplug]

2008/7/27 Vegard Nossum <[email protected]>:
> Hi,
>
> Something broke CPU-hotplug _again_, here with a freshly built latest -git:
>
> BUG: unable to handle kernel paging request at c0100248
> IP: [<c067c3db>] native_cpu_up+0x26b/0x880
> *pde = 00cc9067 *pte = 00100161
> Oops: 0003 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3849, comm: bash Not tainted (2.6.26-07588-g5d1201a #5)
> EIP: 0060:[<c067c3db>] EFLAGS: 00210282 CPU: 0
> EIP is at native_cpu_up+0x26b/0x880
> EAX: c08f9000 EBX: c08fa028 ECX: 01669000 EDX: c08f7004
> ESI: 00000001 EDI: 00000000 EBP: f5bd9ed8 ESP: f5bd9dfc
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process bash (pid: 3849, ti=f5bd8000 task=f480bfc0 task.ti=f5bd8000)
> Stack: 00000000 c01e2153 f5bd9e3c f5992bb0 00000001 00000001 c01e2193 f5bd9e30
> c01e2223 f59929a0 f5bd9e3c f5992bb0 00000000 00000000 dead4ead ffffffff
> ffffffff c0889bc0 ffffffe0 f5bd9eec f5bd9ec4 f5bd9ea8 00000000 f5bd9e88
> Call Trace:
> [<c01e2153>] ? __sysfs_add_one+0x63/0x90
> [<c01e2193>] ? sysfs_add_one+0x13/0x50
> [<c01e2223>] ? sysfs_addrm_start+0x53/0xa0
> [<c01e3811>] ? internal_create_group+0xd1/0x180
> [<c067bdc0>] ? do_fork_idle+0x0/0x20
> [<c014db69>] ? __raw_notifier_call_chain+0x19/0x20
> [<c067dbc3>] ? _cpu_up+0x83/0x100
> [<c067dc89>] ? cpu_up+0x49/0x80
> [<c065eb58>] ? store_online+0x58/0x80
> [<c065eb00>] ? store_online+0x0/0x80
> [<c03f443c>] ? sysdev_store+0x2c/0x40
> [<c01e16d2>] ? sysfs_write_file+0xa2/0x100
> [<c01a35a6>] ? vfs_write+0x96/0x130
> [<c01e1630>] ? sysfs_write_file+0x0/0x100
> [<c01a3aed>] ? sys_write+0x3d/0x70
> [<c0103f5b>] ? sysenter_do_call+0x12/0x3f
> =======================
> Code: 34 ff ff ff b8 00 70 8f c0 03 04 95 80 8e 89 c0 8b 55 a8 89 10 8b 85 34 ff
> ff ff e8 f0 0a 00 00 8b b5 34 ff ff ff b8 00 90 8f c0 <c7> 05 48 02 10 c0 f0 c9
> 67 c0 03 04 b5 80 8e 89 c0 a3 2e 31 81
>
> c067c3db: c7 05 48 02 10 c0 f0 movl $0xc067c9f0,0xc0100248
> c067c3e2: c9 67 c0
>
> $ addr2line -e vmlinux -i c067c3db
> arch/x86/kernel/smpboot.c:859
> arch/x86/kernel/smpboot.c:993
>
> Hm... that doesn't really make sense, does it? It's this line:
>
> initial_code = (unsigned long)start_secondary;
>
> ..but I don't see where the memory dereference comes from.
>
>
> I can reproduce it every time, simply by bringing CPU 1 down and up.

cpu_up() is also executed during an initial boot-up of your system
(for non boot-up cpus).
You should see it with "dmesg | grep "Booting processor".

So it has not caused any problems at that time... maybe something to
do with __initdata.
[ I'm a bit short of time so just speculating here :-) ]


>
> Vegard
>

--
Best regards,
Dmitry Adamushko

2008-07-27 17:19:30

by Vegard Nossum

[permalink] [raw]
Subject: Re: -git: BUG: unable to handle kernel paging request at c0100248 [CPU hotplug]

On Sun, Jul 27, 2008 at 7:14 PM, Dmitry Adamushko
<[email protected]> wrote:
> 2008/7/27 Vegard Nossum <[email protected]>:
>> Hi,
>>
>> Something broke CPU-hotplug _again_, here with a freshly built latest -git:
>>
>> BUG: unable to handle kernel paging request at c0100248
>> IP: [<c067c3db>] native_cpu_up+0x26b/0x880
>> *pde = 00cc9067 *pte = 00100161
>> Oops: 0003 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> Pid: 3849, comm: bash Not tainted (2.6.26-07588-g5d1201a #5)
>> EIP: 0060:[<c067c3db>] EFLAGS: 00210282 CPU: 0
>> EIP is at native_cpu_up+0x26b/0x880
>> EAX: c08f9000 EBX: c08fa028 ECX: 01669000 EDX: c08f7004
>> ESI: 00000001 EDI: 00000000 EBP: f5bd9ed8 ESP: f5bd9dfc
>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>> Process bash (pid: 3849, ti=f5bd8000 task=f480bfc0 task.ti=f5bd8000)
>> Stack: 00000000 c01e2153 f5bd9e3c f5992bb0 00000001 00000001 c01e2193 f5bd9e30
>> c01e2223 f59929a0 f5bd9e3c f5992bb0 00000000 00000000 dead4ead ffffffff
>> ffffffff c0889bc0 ffffffe0 f5bd9eec f5bd9ec4 f5bd9ea8 00000000 f5bd9e88
>> Call Trace:
>> [<c01e2153>] ? __sysfs_add_one+0x63/0x90
>> [<c01e2193>] ? sysfs_add_one+0x13/0x50
>> [<c01e2223>] ? sysfs_addrm_start+0x53/0xa0
>> [<c01e3811>] ? internal_create_group+0xd1/0x180
>> [<c067bdc0>] ? do_fork_idle+0x0/0x20
>> [<c014db69>] ? __raw_notifier_call_chain+0x19/0x20
>> [<c067dbc3>] ? _cpu_up+0x83/0x100
>> [<c067dc89>] ? cpu_up+0x49/0x80
>> [<c065eb58>] ? store_online+0x58/0x80
>> [<c065eb00>] ? store_online+0x0/0x80
>> [<c03f443c>] ? sysdev_store+0x2c/0x40
>> [<c01e16d2>] ? sysfs_write_file+0xa2/0x100
>> [<c01a35a6>] ? vfs_write+0x96/0x130
>> [<c01e1630>] ? sysfs_write_file+0x0/0x100
>> [<c01a3aed>] ? sys_write+0x3d/0x70
>> [<c0103f5b>] ? sysenter_do_call+0x12/0x3f
>> =======================
>> Code: 34 ff ff ff b8 00 70 8f c0 03 04 95 80 8e 89 c0 8b 55 a8 89 10 8b 85 34 ff
>> ff ff e8 f0 0a 00 00 8b b5 34 ff ff ff b8 00 90 8f c0 <c7> 05 48 02 10 c0 f0 c9
>> 67 c0 03 04 b5 80 8e 89 c0 a3 2e 31 81
>>
>> c067c3db: c7 05 48 02 10 c0 f0 movl $0xc067c9f0,0xc0100248
>> c067c3e2: c9 67 c0
>>
>> $ addr2line -e vmlinux -i c067c3db
>> arch/x86/kernel/smpboot.c:859
>> arch/x86/kernel/smpboot.c:993
>>
>> Hm... that doesn't really make sense, does it? It's this line:
>>
>> initial_code = (unsigned long)start_secondary;
>>
>> ..but I don't see where the memory dereference comes from.
>>
>>
>> I can reproduce it every time, simply by bringing CPU 1 down and up.
>
> cpu_up() is also executed during an initial boot-up of your system
> (for non boot-up cpus).
> You should see it with "dmesg | grep "Booting processor".
>
> So it has not caused any problems at that time... maybe something to
> do with __initdata.
> [ I'm a bit short of time so just speculating here :-) ]

Booting processor 1/1 ip 6000
Initializing CPU#1
[...]
Write protecting the kernel text: 5648k

So it succeeds because the protection happens much later :-)


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-07-27 19:29:33

by Thomas Gleixner

[permalink] [raw]
Subject: Re: -git: BUG: unable to handle kernel paging request at c0100248 [CPU hotplug]

On Sun, 27 Jul 2008, Vegard Nossum wrote:
> Booting processor 1/1 ip 6000
> Initializing CPU#1
> [...]
> Write protecting the kernel text: 5648k
>
> So it succeeds because the protection happens much later :-)

Patch below fixes the issue. I get it to Linus ASAP.

Thanks,

tglx
---------
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index f67e934..a7010c3 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -456,9 +456,6 @@ is386: movl $2,%ecx # set MP
1:
#endif /* CONFIG_SMP */
jmp *(initial_code)
-.align 4
-ENTRY(initial_code)
- .long i386_start_kernel

/*
* We depend on ET to be correct. This checks for 287/387.
@@ -601,6 +598,11 @@ ignore_int:
#endif
iret

+.section .cpuinit.data,"wa"
+.align 4
+ENTRY(initial_code)
+ .long i386_start_kernel
+
.section .text
/*
* Real beginning of normal "text" segment

2008-07-27 19:35:36

by Vegard Nossum

[permalink] [raw]
Subject: Re: -git: BUG: unable to handle kernel paging request at c0100248 [CPU hotplug]

On Sun, Jul 27, 2008 at 9:23 PM, Thomas Gleixner <[email protected]> wrote:
> On Sun, 27 Jul 2008, Vegard Nossum wrote:
>> Booting processor 1/1 ip 6000
>> Initializing CPU#1
>> [...]
>> Write protecting the kernel text: 5648k
>>
>> So it succeeds because the protection happens much later :-)
>
> Patch below fixes the issue. I get it to Linus ASAP.

Nice :)

[...]

> +.section .cpuinit.data,"wa"
> +.align 4
> +ENTRY(initial_code)
> + .long i386_start_kernel
> +
> .section .text
> /*
> * Real beginning of normal "text" segment
>

Hm, isn't there some kind of "section stack" that can be used to
restore the previous section? .pushsection/.popsection directives?


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-07-27 19:40:57

by Vegard Nossum

[permalink] [raw]
Subject: Re: -git: BUG: unable to handle kernel paging request at c0100248 [CPU hotplug]

On Sun, Jul 27, 2008 at 9:35 PM, Vegard Nossum <[email protected]> wrote:
> On Sun, Jul 27, 2008 at 9:23 PM, Thomas Gleixner <[email protected]> wrote:
>> On Sun, 27 Jul 2008, Vegard Nossum wrote:
>>> Booting processor 1/1 ip 6000
>>> Initializing CPU#1
>>> [...]
>>> Write protecting the kernel text: 5648k
>>>
>>> So it succeeds because the protection happens much later :-)
>>
>> Patch below fixes the issue. I get it to Linus ASAP.
>
> Nice :)
>
> [...]
>
>> +.section .cpuinit.data,"wa"
>> +.align 4
>> +ENTRY(initial_code)
>> + .long i386_start_kernel
>> +
>> .section .text
>> /*
>> * Real beginning of normal "text" segment
>>
>
> Hm, isn't there some kind of "section stack" that can be used to
> restore the previous section? .pushsection/.popsection directives?

Oh, never mind :-)

(Sorry for the noise.)


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-07-28 18:34:40

by Glauber Costa

[permalink] [raw]
Subject: Re: -git: BUG: unable to handle kernel paging request at c0100248 [CPU hotplug]

On Sun, Jul 27, 2008 at 4:40 PM, Vegard Nossum <[email protected]> wrote:
> On Sun, Jul 27, 2008 at 9:35 PM, Vegard Nossum <[email protected]> wrote:
>> On Sun, Jul 27, 2008 at 9:23 PM, Thomas Gleixner <[email protected]> wrote:
>>> On Sun, 27 Jul 2008, Vegard Nossum wrote:
>>>> Booting processor 1/1 ip 6000
>>>> Initializing CPU#1
>>>> [...]
>>>> Write protecting the kernel text: 5648k
>>>>
>>>> So it succeeds because the protection happens much later :-)
>>>
>>> Patch below fixes the issue. I get it to Linus ASAP.
>>
>> Nice :)

Just saw this one. Thanks for the quick fix! ;-)

>> [...]
>>
>>> +.section .cpuinit.data,"wa"
>>> +.align 4
>>> +ENTRY(initial_code)
>>> + .long i386_start_kernel
>>> +
>>> .section .text
>>> /*
>>> * Real beginning of normal "text" segment
>>>
>>
>> Hm, isn't there some kind of "section stack" that can be used to
>> restore the previous section? .pushsection/.popsection directives?
>
> Oh, never mind :-)
>
> (Sorry for the noise.)
>
>
> Vegard
>
> --
> "The animistic metaphor of the bug that maliciously sneaked in while
> the programmer was not looking is intellectually dishonest as it
> disguises that the error is the programmer's own creation."
> -- E. W. Dijkstra, EWD1036
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>



--
Glauber Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."