2008-07-24 10:48:29

by Vegard Nossum

[permalink] [raw]
Subject: latest -git: kernel BUG at arch/x86/kernel/microcode.c:142!

Hi,

I just got this when doing CPU hotplug:

------------[ cut here ]------------
kernel BUG at arch/x86/kernel/microcode.c:142!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

Pid: 4140, comm: bash Not tainted (2.6.26-06371-g338b9bb-dirty #14)
EIP: 0060:[<c0117f1e>] EFLAGS: 00210202 CPU: 0
EIP is at __mc_sysdev_add+0x1ee/0x200
EAX: 00000000 EBX: c1f61028 ECX: 01798000 EDX: c081ac80
ESI: 00000001 EDI: 00000001 EBP: f5bcbe24 ESP: f5bcbdcc
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 4140, ti=f5bca000 task=f4066f90 task.ti=f5bca000)
Stack: 00000000 f5bcbe24 c028300b 00000001 000000d0 c06d8dc3 f73f77d0 00000000
00000000 00000014 00000000 00000000 c0829254 f4f0fa00 f6e950f0 00200282
f6d5180c 00000002 00000003 00000002 00000001 c1f61028 f5bcbe2c c0117f3a
Call Trace:
[<c028300b>] ? kobject_uevent_env+0xdb/0x380
[<c0117f3a>] ? mc_sysdev_add+0xa/0x10
[<c05875fa>] ? mc_cpu_callback+0x1ea/0x240
[<c014db67>] ? notifier_call_chain+0x37/0x70
[<c014dbd9>] ? __raw_notifier_call_chain+0x19/0x20
[<c014dbfa>] ? raw_notifier_call_chain+0x1a/0x20
[<c0589477>] ? _cpu_up+0xa7/0x100
[<c0589519>] ? cpu_up+0x49/0x80
[<c056a3d8>] ? store_online+0x58/0x80
[<c056a380>] ? store_online+0x0/0x80
[<c02ff57c>] ? sysdev_store+0x2c/0x40
[<c01de412>] ? sysfs_write_file+0xa2/0x100
[<c01a0386>] ? vfs_write+0x96/0x130
[<c01de370>] ? sysfs_write_file+0x0/0x100
[<c01a08cd>] ? sys_write+0x3d/0x70
[<c0103f5b>] ? sysenter_do_call+0x12/0x3f
=======================
Code: 4d d8 c7 01 00 00 00 00 b8 00 1a 6f c0 e8 fb 46 47 00 8d 55 f0
64 a1 00 90 7c c0 e8 0d 75 01 00 8b 45 d4 83 c4 4c 5b 5e 5f 5d c3 <0f>
0b eb fe 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 31 d2
EIP: [<c0117f1e>] __mc_sysdev_add+0x1ee/0x200 SS:ESP 0068:f5bcbdcc
---[ end trace 8c86c730d90bf362 ]---

It's this one:

/* We should bind the task to the CPU */
BUG_ON(raw_smp_processor_id() != cpu_num);

Maybe related to recently merged per-cpu changes? (Yesterday's tests ran fine.)

It seems 100% reproducible, so I'll start bisecting it.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036


2008-07-24 14:03:07

by Vegard Nossum

[permalink] [raw]
Subject: Re: latest -git: kernel BUG at arch/x86/kernel/microcode.c:142!

On Thu, Jul 24, 2008 at 12:48 PM, Vegard Nossum <[email protected]> wrote:
> Hi,
>
> I just got this when doing CPU hotplug:
>
> ------------[ cut here ]------------
> kernel BUG at arch/x86/kernel/microcode.c:142!
> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>
> Pid: 4140, comm: bash Not tainted (2.6.26-06371-g338b9bb-dirty #14)
> EIP: 0060:[<c0117f1e>] EFLAGS: 00210202 CPU: 0
> EIP is at __mc_sysdev_add+0x1ee/0x200
> EAX: 00000000 EBX: c1f61028 ECX: 01798000 EDX: c081ac80
> ESI: 00000001 EDI: 00000001 EBP: f5bcbe24 ESP: f5bcbdcc
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process bash (pid: 4140, ti=f5bca000 task=f4066f90 task.ti=f5bca000)
> Stack: 00000000 f5bcbe24 c028300b 00000001 000000d0 c06d8dc3 f73f77d0 00000000
> 00000000 00000014 00000000 00000000 c0829254 f4f0fa00 f6e950f0 00200282
> f6d5180c 00000002 00000003 00000002 00000001 c1f61028 f5bcbe2c c0117f3a
> Call Trace:
> [<c028300b>] ? kobject_uevent_env+0xdb/0x380
> [<c0117f3a>] ? mc_sysdev_add+0xa/0x10
> [<c05875fa>] ? mc_cpu_callback+0x1ea/0x240
> [<c014db67>] ? notifier_call_chain+0x37/0x70
> [<c014dbd9>] ? __raw_notifier_call_chain+0x19/0x20
> [<c014dbfa>] ? raw_notifier_call_chain+0x1a/0x20
> [<c0589477>] ? _cpu_up+0xa7/0x100
> [<c0589519>] ? cpu_up+0x49/0x80
> [<c056a3d8>] ? store_online+0x58/0x80
> [<c056a380>] ? store_online+0x0/0x80
> [<c02ff57c>] ? sysdev_store+0x2c/0x40
> [<c01de412>] ? sysfs_write_file+0xa2/0x100
> [<c01a0386>] ? vfs_write+0x96/0x130
> [<c01de370>] ? sysfs_write_file+0x0/0x100
> [<c01a08cd>] ? sys_write+0x3d/0x70
> [<c0103f5b>] ? sysenter_do_call+0x12/0x3f
> =======================
> Code: 4d d8 c7 01 00 00 00 00 b8 00 1a 6f c0 e8 fb 46 47 00 8d 55 f0
> 64 a1 00 90 7c c0 e8 0d 75 01 00 8b 45 d4 83 c4 4c 5b 5e 5f 5d c3 <0f>
> 0b eb fe 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 31 d2
> EIP: [<c0117f1e>] __mc_sysdev_add+0x1ee/0x200 SS:ESP 0068:f5bcbdcc
> ---[ end trace 8c86c730d90bf362 ]---
>
> It's this one:
>
> /* We should bind the task to the CPU */
> BUG_ON(raw_smp_processor_id() != cpu_num);
>
> Maybe related to recently merged per-cpu changes? (Yesterday's tests ran fine.)
>
> It seems 100% reproducible, so I'll start bisecting it.

Ahha, after many hours of hitting various unrelated crashes,
miscompiles, etc. I finally arrive at this commit:

commit e761b7725234276a802322549cee5255305a0930
Author: Max Krasnyansky <[email protected]>
Date: Tue Jul 15 04:43:49 2008 -0700

cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment
(take 2)

This is based on Linus' idea of creating cpu_active_map that prevents
scheduler load balancer from migrating tasks to the cpu that is going
down.

It allows us to simplify domain management code and avoid unecessary
domain rebuilds during cpu hotplug event handling.

Please ignore the cpusets part for now. It needs some more work in order
to avoid crazy lock nesting. Although I did simplfy and unify domain
reinitialization logic. We now simply call partition_sched_domains() in
all the cases. This means that we're using exact same code paths as in
cpusets case and hence the test below cover cpusets too.
Cpuset changes to make rebuild_sched_domains() callable from various
contexts are in the separate patch (right next after this one).

This not only boots but also easily handles
while true; do make clean; make -j 8; done
and
while true; do on-off-cpu 1; done
at the same time.
(on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing).

Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
this on right now in gnome-terminal and things are moving just fine.

Also this is running with most of the debug features enabled (lockdep,
mutex, etc) no BUG_ONs or lockdep complaints so far.

I believe I addressed all of the Dmitry's comments for original Linus'
version. I changed both fair and rt balancer to mask out non-active cpus.
And replaced cpu_is_offline() with !cpu_active() in the main scheduler
code where it made sense (to me).

Signed-off-by: Max Krasnyanskiy <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Gregory Haskins <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>

...I added everybody to Cc.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-07-24 14:54:57

by Dmitry Adamushko

[permalink] [raw]
Subject: Re: latest -git: kernel BUG at arch/x86/kernel/microcode.c:142!

2008/7/24 Vegard Nossum <[email protected]>:
> On Thu, Jul 24, 2008 at 12:48 PM, Vegard Nossum <[email protected]> wrote:
>> Hi,
>>
>> I just got this when doing CPU hotplug:
>>
>> ------------[ cut here ]------------
>> kernel BUG at arch/x86/kernel/microcode.c:142!
>> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>
>> Pid: 4140, comm: bash Not tainted (2.6.26-06371-g338b9bb-dirty #14)
>> EIP: 0060:[<c0117f1e>] EFLAGS: 00210202 CPU: 0
>> EIP is at __mc_sysdev_add+0x1ee/0x200
>> EAX: 00000000 EBX: c1f61028 ECX: 01798000 EDX: c081ac80
>> ESI: 00000001 EDI: 00000001 EBP: f5bcbe24 ESP: f5bcbdcc
>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>> Process bash (pid: 4140, ti=f5bca000 task=f4066f90 task.ti=f5bca000)
>> Stack: 00000000 f5bcbe24 c028300b 00000001 000000d0 c06d8dc3 f73f77d0 00000000
>> 00000000 00000014 00000000 00000000 c0829254 f4f0fa00 f6e950f0 00200282
>> f6d5180c 00000002 00000003 00000002 00000001 c1f61028 f5bcbe2c c0117f3a
>> Call Trace:
>> [<c028300b>] ? kobject_uevent_env+0xdb/0x380
>> [<c0117f3a>] ? mc_sysdev_add+0xa/0x10
>> [<c05875fa>] ? mc_cpu_callback+0x1ea/0x240
>> [<c014db67>] ? notifier_call_chain+0x37/0x70
>> [<c014dbd9>] ? __raw_notifier_call_chain+0x19/0x20
>> [<c014dbfa>] ? raw_notifier_call_chain+0x1a/0x20
>> [<c0589477>] ? _cpu_up+0xa7/0x100
>> [<c0589519>] ? cpu_up+0x49/0x80
>> [<c056a3d8>] ? store_online+0x58/0x80
>> [<c056a380>] ? store_online+0x0/0x80
>> [<c02ff57c>] ? sysdev_store+0x2c/0x40
>> [<c01de412>] ? sysfs_write_file+0xa2/0x100
>> [<c01a0386>] ? vfs_write+0x96/0x130
>> [<c01de370>] ? sysfs_write_file+0x0/0x100
>> [<c01a08cd>] ? sys_write+0x3d/0x70
>> [<c0103f5b>] ? sysenter_do_call+0x12/0x3f
>> =======================
>> Code: 4d d8 c7 01 00 00 00 00 b8 00 1a 6f c0 e8 fb 46 47 00 8d 55 f0
>> 64 a1 00 90 7c c0 e8 0d 75 01 00 8b 45 d4 83 c4 4c 5b 5e 5f 5d c3 <0f>
>> 0b eb fe 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 31 d2
>> EIP: [<c0117f1e>] __mc_sysdev_add+0x1ee/0x200 SS:ESP 0068:f5bcbdcc
>> ---[ end trace 8c86c730d90bf362 ]---
>>
>> It's this one:
>>
>> /* We should bind the task to the CPU */
>> BUG_ON(raw_smp_processor_id() != cpu_num);
>>
>> Maybe related to recently merged per-cpu changes? (Yesterday's tests ran fine.)
>>
>> It seems 100% reproducible, so I'll start bisecting it.
>
> Ahha, after many hours of hitting various unrelated crashes,
> miscompiles, etc. I finally arrive at this commit:
>
> commit e761b7725234276a802322549cee5255305a0930
> Author: Max Krasnyansky <[email protected]>
> Date: Tue Jul 15 04:43:49 2008 -0700

Yeah, there seems to be a funny situation here :-) I'd expect it to be
100% reproduceable with CONFIG_MICROCODE=y.

cpu_up() -> raw_notifier_call_chain(CPU_ONLINE, ...) ->

(microcode's part)

mc_cpu_callback() -> mc_sysdev_add() -> microcode_init_cpu()

and here we have:

set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
mutex_lock(&microcode_mutex);
collect_cpu_info(cpu);

this code expects that after set_cpus_allowed_ptr() has been
completed, it will continue running on "cpu"

that's why BUG_ON(raw_smp_processor_id() != cpu_num);

the funny thing is that (1) it doesn't check for an error (otherwise
it would see an error)
and (2) cpu_active_map does _not_ yet have a bit for 'cpu' at this moment.

so migrate_task() will forward a migration request to migration_thread
(because 'current' is on-the-queue/running at this point and we can't
migrate it immediatelly -- current gets blocked inside migrate_task()
waiting for request's completion)

it all will end up in migration_thread() -> __migrate_task()
which does a test for cpu_active(dest_cpu) and bails out.

summary, with cpu_active_map as it's being used now this microcode's
scheme (the fact that it expects to be migrated onto 'cpu' while its
cpu_up(cpu) is not completely finished) doesn't work.

note, I've only taken a quick look so I don't make any judgements,
(good-bad)design-wise. But it's quite a funny use-case of
cpu-hotplug-notifications and CPU_ONLINE in particular :-)


--
Best regards,
Dmitry Adamushko

2008-07-24 16:18:40

by Dmitry Adamushko

[permalink] [raw]
Subject: Re: latest -git: kernel BUG at arch/x86/kernel/microcode.c:142!

2008/7/24 Dmitry Adamushko <[email protected]>:
> 2008/7/24 Vegard Nossum <[email protected]>:
>> On Thu, Jul 24, 2008 at 12:48 PM, Vegard Nossum <[email protected]> wrote:
>>> Hi,
>>>
>>> I just got this when doing CPU hotplug:
>>>
>>> ------------[ cut here ]------------
>>> kernel BUG at arch/x86/kernel/microcode.c:142!
>>> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>>
>>> Pid: 4140, comm: bash Not tainted (2.6.26-06371-g338b9bb-dirty #14)
>>> EIP: 0060:[<c0117f1e>] EFLAGS: 00210202 CPU: 0
>>> EIP is at __mc_sysdev_add+0x1ee/0x200
>>> EAX: 00000000 EBX: c1f61028 ECX: 01798000 EDX: c081ac80
>>> ESI: 00000001 EDI: 00000001 EBP: f5bcbe24 ESP: f5bcbdcc
>>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>>> Process bash (pid: 4140, ti=f5bca000 task=f4066f90 task.ti=f5bca000)
>>> Stack: 00000000 f5bcbe24 c028300b 00000001 000000d0 c06d8dc3 f73f77d0 00000000
>>> 00000000 00000014 00000000 00000000 c0829254 f4f0fa00 f6e950f0 00200282
>>> f6d5180c 00000002 00000003 00000002 00000001 c1f61028 f5bcbe2c c0117f3a
>>> Call Trace:
>>> [<c028300b>] ? kobject_uevent_env+0xdb/0x380
>>> [<c0117f3a>] ? mc_sysdev_add+0xa/0x10
>>> [<c05875fa>] ? mc_cpu_callback+0x1ea/0x240
>>> [<c014db67>] ? notifier_call_chain+0x37/0x70
>>> [<c014dbd9>] ? __raw_notifier_call_chain+0x19/0x20
>>> [<c014dbfa>] ? raw_notifier_call_chain+0x1a/0x20
>>> [<c0589477>] ? _cpu_up+0xa7/0x100
>>> [<c0589519>] ? cpu_up+0x49/0x80
>>> [<c056a3d8>] ? store_online+0x58/0x80
>>> [<c056a380>] ? store_online+0x0/0x80
>>> [<c02ff57c>] ? sysdev_store+0x2c/0x40
>>> [<c01de412>] ? sysfs_write_file+0xa2/0x100
>>> [<c01a0386>] ? vfs_write+0x96/0x130
>>> [<c01de370>] ? sysfs_write_file+0x0/0x100
>>> [<c01a08cd>] ? sys_write+0x3d/0x70
>>> [<c0103f5b>] ? sysenter_do_call+0x12/0x3f
>>> =======================
>>> Code: 4d d8 c7 01 00 00 00 00 b8 00 1a 6f c0 e8 fb 46 47 00 8d 55 f0
>>> 64 a1 00 90 7c c0 e8 0d 75 01 00 8b 45 d4 83 c4 4c 5b 5e 5f 5d c3 <0f>
>>> 0b eb fe 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 31 d2
>>> EIP: [<c0117f1e>] __mc_sysdev_add+0x1ee/0x200 SS:ESP 0068:f5bcbdcc
>>> ---[ end trace 8c86c730d90bf362 ]---
>>>
>>> It's this one:
>>>
>>> /* We should bind the task to the CPU */
>>> BUG_ON(raw_smp_processor_id() != cpu_num);
>>>
>>> Maybe related to recently merged per-cpu changes? (Yesterday's tests ran fine.)
>>>
>>> It seems 100% reproducible, so I'll start bisecting it.
>>
>> Ahha, after many hours of hitting various unrelated crashes,
>> miscompiles, etc. I finally arrive at this commit:
>>
>> commit e761b7725234276a802322549cee5255305a0930
>> Author: Max Krasnyansky <[email protected]>
>> Date: Tue Jul 15 04:43:49 2008 -0700
>
> Yeah, there seems to be a funny situation here :-) I'd expect it to be
> 100% reproduceable with CONFIG_MICROCODE=y.
>
> cpu_up() -> raw_notifier_call_chain(CPU_ONLINE, ...) ->
>
> (microcode's part)
>
> mc_cpu_callback() -> mc_sysdev_add() -> microcode_init_cpu()
>
> and here we have:
>
> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));

btw., this is obviously bad behavior. This code plays with
"cpus_allowed" (changes and then restores it) of pretty arbitrary
tasks in context of which it happens to run. So it may race with
sched_setaffinity() and negate its effect.


--
Best regards,
Dmitry Adamushko

2008-07-24 17:16:17

by Max Krasnyansky

[permalink] [raw]
Subject: Re: latest -git: kernel BUG at arch/x86/kernel/microcode.c:142!

Dmitry Adamushko wrote:
> 2008/7/24 Dmitry Adamushko <[email protected]>:
>> 2008/7/24 Vegard Nossum <[email protected]>:
>>>> It's this one:
>>>>
>>>> /* We should bind the task to the CPU */
>>>> BUG_ON(raw_smp_processor_id() != cpu_num);
>>>>
>>>> Maybe related to recently merged per-cpu changes? (Yesterday's tests ran fine.)
>>>>
>>>> It seems 100% reproducible, so I'll start bisecting it.
>>> Ahha, after many hours of hitting various unrelated crashes,
>>> miscompiles, etc. I finally arrive at this commit:
>>>
>>> commit e761b7725234276a802322549cee5255305a0930
>>> Author: Max Krasnyansky <[email protected]>
>>> Date: Tue Jul 15 04:43:49 2008 -0700
>> Yeah, there seems to be a funny situation here :-) I'd expect it to be
>> 100% reproduceable with CONFIG_MICROCODE=y.
>>
>> cpu_up() -> raw_notifier_call_chain(CPU_ONLINE, ...) ->
>>
>> (microcode's part)
>>
>> mc_cpu_callback() -> mc_sysdev_add() -> microcode_init_cpu()
>>
>> and here we have:
>>
>> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
>
> btw., this is obviously bad behavior. This code plays with
> "cpus_allowed" (changes and then restores it) of pretty arbitrary
> tasks in context of which it happens to run. So it may race with
> sched_setaffinity() and negate its effect.

Agree. I came to the similar conclusion.
The solution is to either convert it to schedule_delayed_work_on() or if it's
important to update the microcode synchronously we can the whole thing to do
something like
smp_call_function_single(cpu, collect_cpu_info);
if (needs_update) {
request_firmware(...);
smp_call_function_single(cpu, update_cpu_microcode);
}

Tigran, do we need sync update inside the hotplug handler or async update via
workqueue is fine ?

Max


Max