2002-01-23 12:08:08

by Zwane Mwaikambo

Subject: 2.5.2-pre2-3 SMP broken on UP boxen

Hi Ingo,
My test box is a single proc machine running an SMP kernel. As of
2.5.2-pre2 it panics on boot. The reason is kinda obvious,
smp_processor_id() will always return the same as global_irq_holder.
How come we do this check now?

kernel/sched.c
asmlinkage void schedule(void)
{
	task_t *prev = current, *next;
	runqueue_t *rq = this_rq();
	prio_array_t *array;
	list_t *queue;
	int idx;

	if (unlikely(in_interrupt()))
		BUG();
	release_kernel_lock(prev, smp_processor_id());	<==
	spin_lock_irq(&rq->lock);
<snip>

include/asm-i386/smplock.h
#define release_kernel_lock(task, cpu) \
do { \
	if (unlikely(task->lock_depth >= 0)) { \
		spin_unlock(&kernel_flag); \
		if (global_irq_holder == (cpu)) \	<== Ugg....
			BUG(); \
	} \
} while (0)

Here is the oops...

<snip>
CPU0: Intel Pentium II (Deschutes) stepping 02
per-CPU timeslice cutoff: 1463.72 usecs.
SMP motherboard not detected.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 350.7822 MHz.
..... host bus clock speed is 100.2233 MHz.
cpu: 0, clocks: 1002233, slice: 501116
CPU0<T0:1002224,T1:501104,D:4,S:501116,C:1002233>
kernel BUG at sched.c:588!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0119fdf>] Not tainted
EFLAGS: 00010082
eax: 0000001b ebx: c0338000 ecx: c031ee64 edx: 00000c43
esi: c037bca0 edi: c0338000 ebp: c0339f84 esp: c0339f58
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c0339000)
Stack: c02da9e4 0000024c c0339f8c c0339fc0 c0105000 0008e000 c0107661 c0338000
c0338000 c0339fc0 c0105000 0008e000 c0109199 00010f00 c0105090 00000000
c0339fc0 c0105000 0008e000 00000001 00100018 00000018 00000078 c010725d
Call Trace: [<c0105000>] [<c0107661>] [<c0105000>] [<c0109199>] [<c0105090>]
[<c0105000>] [<c010725d>] [<c0105019>] [<c0105090>]

Code: 0f 0b 5a 59 fa 81 7e 04 ad 4e ad de 74 33 68 e3 9f 11 c0 68
<0>Kernel panic: Attempted to kill the idle task!
In idle task - not syncing

>>EIP; c0119fdf <schedule+bf/4b0> <=====
Trace; c0105000 <_stext+0/0>
Trace; c0107661 <sys_clone+21/30>
Trace; c0105000 <_stext+0/0>
Trace; c0109199 <reschedule+5/c>
Trace; c0105090 <init+0/1f0>
Trace; c0105000 <_stext+0/0>
Trace; c010725d <kernel_thread+1d/30>
Trace; c0105019 <rest_init+19/90>
Trace; c0105090 <init+0/1f0>
Code; c0119fdf <schedule+bf/4b0>

00000000 <_EIP>:
Code; c0119fdf <schedule+bf/4b0> <=====
0: 0f 0b ud2a <=====
Code; c0119fe1 <schedule+c1/4b0>
2: 5a pop %edx
Code; c0119fe2 <schedule+c2/4b0>
3: 59 pop %ecx
Code; c0119fe3 <schedule+c3/4b0>
4: fa cli
Code; c0119fe4 <schedule+c4/4b0>
5: 81 7e 04 ad 4e ad de cmpl $0xdead4ead,0x4(%esi)
Code; c0119feb <schedule+cb/4b0>
c: 74 33 je 41 <_EIP+0x41> c011a020 <schedule+100/4b0>
Code; c0119fed <schedule+cd/4b0>
e: 68 e3 9f 11 c0 push $0xc0119fe3
Code; c0119ff2 <schedule+d2/4b0>
13: 68 00 00 00 00 push $0x0

0xc0119fbb <schedule+155>: xchg %bl,0xc037a400
0xc0119fc1 <schedule+161>: movzbl 0xc031d120,%eax
0xc0119fc8 <schedule+168>: mov 0xfffffff0(%ebp),%ecx
0xc0119fcb <schedule+171>: cmp 0x20(%ecx),%eax
0xc0119fce <schedule+174>: jne 0xc0119fe3 <schedule+195>
0xc0119fd0 <schedule+176>: push $0x24c
0xc0119fd5 <schedule+181>: push $0xc02da9e4
0xc0119fda <schedule+186>: call 0xc0118570 <do_BUG>
0xc0119fdf <schedule+191>: ud2a
0xc0119fe1 <schedule+193>: pop %edx
0xc0119fe2 <schedule+194>: pop %ecx
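
The two decodes above use different offset notations (ksymoops prints hexadecimal, the gdb-style listing prints decimal), and the value pushed before do_BUG looks like the source line number. A minimal sketch that cross-checks these numbers, using only values copied from the oops; symbol_base() and oops_is_consistent() are hypothetical helpers for illustration, not kernel functions:

```c
#include <stdint.h>

/* Values copied from the oops and disassembly above. */
#define FAULT_EIP        0xc0119fdfu  /* EIP reported in the oops */
#define KSYMOOPS_OFFSET  0xbfu        /* "schedule+bf": hexadecimal */
#define GDB_OFFSET       191u         /* "schedule+191": decimal */
#define PUSHED_LINE      0x24cu       /* "push $0x24c" before do_BUG */
#define REPORTED_LINE    588u         /* "kernel BUG at sched.c:588!" */

/* Hypothetical helper: start address of a symbol, given an address
 * inside it and the offset of that address from the symbol start. */
static uint32_t symbol_base(uint32_t addr, uint32_t offset)
{
	return addr - offset;
}

/* Returns 1 when both listings agree on the faulting instruction
 * and the pushed constant equals the reported line number. */
static int oops_is_consistent(void)
{
	return KSYMOOPS_OFFSET == GDB_OFFSET &&
	       PUSHED_LINE == REPORTED_LINE &&
	       symbol_base(FAULT_EIP, KSYMOOPS_OFFSET) ==
	       symbol_base(0xc0119fbbu, 155u); /* first listed line, schedule+155 */
}
```

Under these assumptions, the jne at schedule+174 skips the BUG path when the two compared values differ, so the trap is reached only when they are equal, which matches the global_irq_holder == (cpu) test in the macro.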



2002-01-23 12:22:09

by Ingo Molnar

Subject: Re: 2.5.2-pre2-3 SMP broken on UP boxen


On Wed, 23 Jan 2002, Zwane Mwaikambo wrote:

> My test box is a single proc machine running an SMP kernel. As
> of 2.5.2-pre2 it panics on boot. [...]

the same on 2.5.3-pre3 as well?

> [...] The reason is kinda obvious, smp_processor_id() will always
> return the same as global_irq_holder. How come we do this check now?

it should only be set when the current CPU has disabled global IRQs.

Ingo

2002-01-23 12:25:20

by Zwane Mwaikambo

Subject: Re: 2.5.2-pre2-3 SMP broken on UP boxen

On Wed, 23 Jan 2002, Ingo Molnar wrote:

>
> On Wed, 23 Jan 2002, Zwane Mwaikambo wrote:
>
> > My test box is a single proc machine running an SMP kernel. As
> > of 2.5.2-pre2 it panics on boot. [...]
>
> the same on 2.5.3-pre3 as well?

Yes, -pre3 too; it started at -pre2. I'm currently testing on -pre3.

> > [...] The reason is kinda obvious, smp_processor_id() will always
> > return the same as global_irq_holder. How come we do this check now?
>
> it should only be set when the current CPU has disabled global IRQs.

Al Viro just asked me to try something, patch coming in...

Regards,
Zwane Mwaikambo


2002-01-23 12:32:52

by Zwane Mwaikambo

Subject: Re: 2.5.2-pre2-3 SMP broken on UP boxen

On Wed, 23 Jan 2002, Zwane Mwaikambo wrote:

> Al Viro just asked me to try something, patch coming in...

Here's Al's suggestion. Unfortunately the box is at home, so I can't test
it right now.


Cheers,
Zwane Mwaikambo

--- linux-2.5.3-pre3/arch/i386/kernel/smpboot.c.orig Wed Jan 23 14:23:49 2002
+++ linux-2.5.3-pre3/arch/i386/kernel/smpboot.c Wed Jan 23 14:24:33 2002
@@ -1018,7 +1018,7 @@
 	boot_cpu_logical_apicid = logical_smp_processor_id();
 	map_cpu_to_boot_apicid(0, boot_cpu_apicid);
 
-	global_irq_holder = 0;
+	global_irq_holder = NO_PROC_ID;
 	current->cpu = 0;
 	smp_tune_scheduling();



2002-01-23 12:35:13

by Alexander Viro

Subject: Re: 2.5.2-pre2-3 SMP broken on UP boxen



On Wed, 23 Jan 2002, Ingo Molnar wrote:

>
> On Wed, 23 Jan 2002, Zwane Mwaikambo wrote:
>
> > My test box is a single proc machine running an SMP kernel. As
> > of 2.5.2-pre2 it panics on boot. [...]
>
> the same on 2.5.3-pre3 as well?
>
> > [...] The reason is kinda obvious, smp_processor_id() will always
> > return the same as global_irq_holder. How come we do this check now?
>
> it should only be set when the current CPU has disabled global IRQs.

in arch/i386/kernel/smpboot.c:

@@ -1017,7 +1017,7 @@
 	boot_cpu_logical_apicid = logical_smp_processor_id();
 	map_cpu_to_boot_apicid(0, boot_cpu_apicid);
 
-	global_irq_holder = 0;
+	global_irq_holder = NO_PROC_ID;
 	current->cpu = 0;
 	smp_tune_scheduling();
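
Why the one-line change matters can be modelled in user space. The sketch below is not kernel code: would_bug() and boot_check() are hypothetical names, the 0xFF value for NO_PROC_ID is an assumption chosen to stand for "no CPU", and global_irq_holder here is a plain variable standing in for the kernel's. It only models the comparison made by release_kernel_lock() on the boot CPU (CPU 0):

```c
/* Assumption: a sentinel meaning "no CPU holds the global IRQ lock";
 * the value 0xFF is illustrative. */
#define NO_PROC_ID 0xFF

/* Hypothetical stand-in for the kernel variable of the same name. */
static unsigned char global_irq_holder;

/* Models the comparison in release_kernel_lock():
 * nonzero means BUG() would fire for this cpu. */
static int would_bug(int cpu)
{
	return global_irq_holder == cpu;
}

/* Initialises the holder to init_value, then runs the check
 * as the boot CPU (CPU 0) would on its first schedule(). */
static int boot_check(unsigned char init_value)
{
	global_irq_holder = init_value;
	return would_bug(0);
}
```

In this model boot_check(0) reports that the check fires, because an initial holder of 0 is indistinguishable from "CPU 0 holds the global IRQ lock", while boot_check(NO_PROC_ID) does not, since the sentinel never equals a valid CPU number.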


2002-01-23 12:59:05

by Ingo Molnar

Subject: Re: 2.5.2-pre2-3 SMP broken on UP boxen


On Wed, 23 Jan 2002, Zwane Mwaikambo wrote:

> My test box is a single proc machine running an SMP kernel. As of
> 2.5.2-pre2 it panics on boot. The reason is kinda obvious,
> smp_processor_id() will always return the same as global_irq_holder.
> How come we do this check now?

Al found the bug, in smpboot.c:

- global_irq_holder = 0;
+ global_irq_holder = NO_PROC_ID;

does this fix it?

Ingo

2002-01-23 13:11:29

by Zwane Mwaikambo

Subject: Re: 2.5.2-pre2-3 SMP broken on UP boxen

On Wed, 23 Jan 2002, Ingo Molnar wrote:

> Al found the bug, in smpboot.c:
>
> - global_irq_holder = 0;
> + global_irq_holder = NO_PROC_ID;
>
> does this fix it?
>

Unfortunately I'd have to go home to try it out. If you don't mind waiting
until 10:00 GMT on Thursday the 24th, I can send you a confirmation.

Thanks,
Zwane Mwaikambo


2002-01-24 06:43:41

by Zwane Mwaikambo

Subject: Re: 2.5.2-pre2-3 SMP broken on UP boxen

On Wed, 23 Jan 2002, Zwane Mwaikambo wrote:

> On Wed, 23 Jan 2002, Ingo Molnar wrote:
>
> > Al found the bug, in smpboot.c:
> >
> > - global_irq_holder = 0;
> > + global_irq_holder = NO_PROC_ID;
> >
> > does this fix it?
> >
>
> Unfortunately I'd have to go home to try it out. If you don't mind waiting
> until 10:00 GMT on Thursday the 24th, I can send you a confirmation.

I can confirm that it works (but you both knew that ;).

Thanks,
Zwane Mwaikambo