2006-02-19 23:55:35

by Nathan Lynch

Subject: i386 cpu hotplug bug - instant reboot when onlining secondary

Hi-

On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
box instantly reboot. I've been seeing this throughout the 2.6.16-rc
series, but wasn't able to collect more information until now. Not
sure when this last worked, unfortunately.

With the debugging patch below, I get this on serial console:

[17179681.704000] CPU 1 is now offline
[17179686.908000] Booting processor 1/1 eip 3000
[17179686.912000] CPU 1 irqstacks, hard=78383000 soft=7837b000
[17179686.920000] Setting warm reset code and vector.
[17179686.924000] 1.
[17179686.924000] 2.
[17179686.928000] 3.
[17179686.928000] Asserting INIT.
[17179686.932000] Waiting for send to finish...
[17179686.936000] +<7>Deasserting INIT.
[17179686.952000] Waiting for send to finish...
[17179686.956000] +<7>#startup loops: 2.
[17179686.960000] Sending STARTUP #1.
[17179686.960000] After apic_write.
[17179686.964000] Doing apic_write_around for target chip...
[17179686.972000] Doing apic_write_around to kick the second...

Any suggestions?


diff --git a/arch/i386/kernel/smpboot.c b/arch/i386/kernel/smpboot.c
index fb00ab7..85aff00 100644
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -801,10 +801,12 @@ wakeup_secondary_cpu(int phys_apicid, un
*/

/* Target chip */
+ Dprintk("Doing apic_write_around for target chip...\n");
apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(phys_apicid));

/* Boot on the stack */
/* Kick the second */
+ Dprintk("Doing apic_write_around to kick the second...\n");
apic_write_around(APIC_ICR, APIC_DM_STARTUP
| (start_eip >> 12));

diff --git a/include/asm-i386/apic.h b/include/asm-i386/apic.h
index d30b857..2c8dcfa 100644
--- a/include/asm-i386/apic.h
+++ b/include/asm-i386/apic.h
@@ -8,7 +8,7 @@
#include <asm/processor.h>
#include <asm/system.h>

-#define Dprintk(x...)
+#define Dprintk(fmt,arg...) printk(KERN_DEBUG fmt,##arg)

/*
* Debugging macros
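
For reference, the two writes instrumented above can be worked through by hand. The following is a minimal userspace sketch, not kernel code: the constants mirror APIC_DM_STARTUP and SET_APIC_DEST_FIELD from the i386 asm/apicdef.h of that era, while the helper names icr2_for() and icr_startup_for() are made up for illustration. It assumes the "eip 3000" in the log is hexadecimal and that the target APIC ID is 1, as the "Booting processor 1/1" line suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of the kernel definitions; reproduced here so the sketch
 * builds in userspace. */
#define APIC_DM_STARTUP         0x00600u
#define SET_APIC_DEST_FIELD(x)  ((uint32_t)(x) << 24)

/* Hypothetical helper: value written to APIC_ICR2 ("Target chip"). */
static uint32_t icr2_for(unsigned int phys_apicid)
{
    return SET_APIC_DEST_FIELD(phys_apicid);
}

/* Hypothetical helper: value written to APIC_ICR ("Kick the second").
 * The low byte of a STARTUP IPI carries the 4K page number of the
 * trampoline, so start_eip must be page aligned and below 1MB. */
static uint32_t icr_startup_for(unsigned long start_eip)
{
    assert((start_eip & 0xfffUL) == 0);
    assert(start_eip < 0x100000UL);
    return APIC_DM_STARTUP | (uint32_t)(start_eip >> 12);
}
```

With the values from the log, icr2_for(1) gives 0x01000000 and icr_startup_for(0x3000) gives 0x603, i.e. a STARTUP IPI with vector 3 aimed at APIC ID 1.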


2006-02-21 16:15:41

by Zwane Mwaikambo

Subject: Re: i386 cpu hotplug bug - instant reboot when onlining secondary

On Sun, 19 Feb 2006, Nathan Lynch wrote:

> On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
> box instantly reboot. I've been seeing this throughout the 2.6.16-rc
> series, but wasn't able to collect more information until now. Not
> sure when this last worked, unfortunately.
>
> With the debugging patch below, I get this on serial console:

Does 2.6.14 work? Also, I wonder whether it gets out of the trampoline...

Index: linux-2.6.16-rc2/arch/i386/kernel/smpboot.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.16-rc2/arch/i386/kernel/smpboot.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smpboot.c
--- linux-2.6.16-rc2/arch/i386/kernel/smpboot.c 11 Feb 2006 18:55:06 -0000 1.1.1.1
+++ linux-2.6.16-rc2/arch/i386/kernel/smpboot.c 21 Feb 2006 16:19:22 -0000
@@ -514,6 +514,7 @@ static void __devinit start_secondary(vo
cpu_init();
preempt_disable();
smp_callin();
+ Dprintk("startup_secondary\n");
while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
rep_nop();
setup_secondary_APIC_clock();
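
The loop following the new Dprintk is the point where a secondary CPU parks until the boot CPU sets its bit in smp_commenced_mask. A rough userspace model of that handshake, assuming a single unsigned long stands in for cpumask_t and with every function name below invented for the sketch (the poll limit is also an addition, so the example cannot hang):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for smp_commenced_mask: one bit per CPU. */
static atomic_ulong commenced_mask;

/* Rough analogue of cpu_isset(cpu, smp_commenced_mask). */
static bool cpu_commenced(unsigned int cpu)
{
    return (atomic_load(&commenced_mask) >> cpu) & 1UL;
}

/* What the boot CPU does to release a waiting secondary. */
static void commence_cpu(unsigned int cpu)
{
    atomic_fetch_or(&commenced_mask, 1UL << cpu);
}

/* Secondary side: poll until released. Returns the number of polls
 * that found the bit clear, or -1 if max_polls was reached. The real
 * loop has no limit and calls rep_nop() between polls. */
static long wait_for_commence(unsigned int cpu, long max_polls)
{
    long polls = 0;

    while (!cpu_commenced(cpu)) {
        if (++polls >= max_polls)
            return -1;
    }
    return polls;
}
```

A message printed just before this loop only shows up if the secondary got through cpu_init() and smp_callin(), which is what makes it a useful probe here.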

2006-02-27 07:44:43

by Nathan Lynch

Subject: Re: i386 cpu hotplug bug - instant reboot when onlining secondary

Zwane Mwaikambo wrote:
> On Sun, 19 Feb 2006, Nathan Lynch wrote:
>
> > On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
> > box instantly reboot. I've been seeing this throughout the 2.6.16-rc
> > series, but wasn't able to collect more information until now. Not
> > sure when this last worked, unfortunately.
> >
> > With the debugging patch below, I get this on serial console:
>
> Does 2.6.14 work? Also i wonder if it gets out of the trampoline...

2.6.14 works (albeit with an APIC error reported). When retesting
2.6.16-rc4 with your patch on top of my debugging patch, I don't see the
"startup_secondary" line:

[17179709.100000] CPU 1 is now offline
[17179714.636000] Booting processor 1/1 eip 3000
[17179714.688000] CPU 1 irqstacks, hard=7837f000 soft=78377000
[17179714.756000] Setting warm reset code and vector.
[17179714.812000] 1.
[17179714.836000] 2.
[17179714.860000] 3.
[17179714.880000] Asserting INIT.
[17179714.916000] Waiting for send to finish...
[17179714.968000] +<7>Deasserting INIT.
[17179715.024000] Waiting for send to finish...
[17179715.072000] +<7>#startup loops: 2.
[17179715.116000] Sending STARTUP #1.
[17179715.160000] After apic_write.
[17179715.196000] Doing apic_write_around for target chip...
[17179715.260000] Doing apic_write_around to kick the second...


2006-02-28 15:35:43

by Zwane Mwaikambo

Subject: Re: i386 cpu hotplug bug - instant reboot when onlining secondary

On Mon, 27 Feb 2006, Nathan Lynch wrote:

> Zwane Mwaikambo wrote:
> > On Sun, 19 Feb 2006, Nathan Lynch wrote:
> >
> > > On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
> > > box instantly reboot. I've been seeing this throughout the 2.6.16-rc
> > > series, but wasn't able to collect more information until now. Not
> > > sure when this last worked, unfortunately.
> > >
> > > With the debugging patch below, I get this on serial console:
> >
> > Does 2.6.14 work? Also i wonder if it gets out of the trampoline...
>
> 2.6.14 works (albeit with an APIC error reported). When retesting
> 2.6.16-rc4 with your patch on top of my debugging patch, I don't see the
> "startup_secondary" line:

Hi Nathan,

Can you try the following patch? We can start moving the WARM_BOOT_HLT
down until it triple faults (I'm assuming it at least gets this far).

Index: linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S
===================================================================
RCS file: /home/cvsroot/linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 head.S
--- linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S 11 Feb 2006 16:55:14 -0000 1.1.1.1
+++ linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S 28 Feb 2006 15:34:34 -0000
@@ -146,6 +146,12 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
* we know the trampoline has already loaded the boot_gdt_table GDT
* for us.
*/
+#define warm_boot tsc_sync_disabled-__PAGE_OFFSET
+#define WARM_BOOT_HLT \
+ cmpl $0, warm_boot; \
+10: \
+ jne 10b
+
ENTRY(startup_32_smp)
cld
movl $(__BOOT_DS),%eax
@@ -168,6 +174,8 @@ ENTRY(startup_32_smp)
* NOTE! We have to correct for the fact that we're
* not yet offset PAGE_OFFSET..
*/
+ WARM_BOOT_HLT
+
#define cr4_bits mmu_cr4_features-__PAGE_OFFSET
movl cr4_bits,%edx
andl %edx,%edx
Index: linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smpboot.c
--- linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c 11 Feb 2006 16:55:14 -0000 1.1.1.1
+++ linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c 28 Feb 2006 15:34:42 -0000
@@ -102,7 +102,7 @@ static cpumask_t smp_commenced_mask;
* is no way to resync one AP against BP. TBD: for prescott and above, we
* should use IA64's algorithm
*/
-static int __devinitdata tsc_sync_disabled;
+int __devinitdata tsc_sync_disabled;

/* Per CPU bogomips and other parameters */
struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
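
A note on the "tsc_sync_disabled-__PAGE_OFFSET" expression: at this point in head.S the secondary is not yet running at its linked virtual addresses (see the "not yet offset PAGE_OFFSET" comment in the hunk), so the variable has to be referenced by physical address. The macro then compares that word against zero and, if it is nonzero, branches to itself forever. The arithmetic is sketched below in userspace C. The helper name is hypothetical, and 0xC0000000 is only the common i386 default, used here as an assumption; the irqstack addresses in the logs (78383000 and similar) suggest the reporter's kernel may have been built with a different split.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: common i386 default; the real value is a build-time
 * configuration choice and may differ on the machine in this thread. */
#define EXAMPLE_PAGE_OFFSET 0xC0000000u

/* Hypothetical helper: physical address of a kernel symbol as seen
 * before the kernel runs at its virtual addresses. */
static uint32_t early_phys_addr(uint32_t vaddr, uint32_t page_offset)
{
    assert(vaddr >= page_offset);   /* must be a kernel virtual address */
    return vaddr - page_offset;
}
```

For example, a symbol linked at 0xC0400000 would be read from physical 0x00400000 under the assumed default, while an address such as 0x78383000 under a 0x78000000 offset would map to 0x00383000.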

2006-02-28 21:34:18

by Nathan Lynch

Subject: Re: i386 cpu hotplug bug - instant reboot when onlining secondary

Zwane Mwaikambo wrote:
> On Mon, 27 Feb 2006, Nathan Lynch wrote:
>
> > Zwane Mwaikambo wrote:
> > > On Sun, 19 Feb 2006, Nathan Lynch wrote:
> > >
> > > > On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
> > > > box instantly reboot. I've been seeing this throughout the 2.6.16-rc
> > > > series, but wasn't able to collect more information until now. Not
> > > > sure when this last worked, unfortunately.
> > > >
> > > > With the debugging patch below, I get this on serial console:
> > >
> > > Does 2.6.14 work? Also i wonder if it gets out of the trampoline...
> >
> > 2.6.14 works (albeit with an APIC error reported). When retesting
> > 2.6.16-rc4 with your patch on top of my debugging patch, I don't see the
> > "startup_secondary" line:
>
> Hi Nathan,
>
> Can you try the following patch? We can start moving the WARM_BOOT_HLT
> down until it triple faults (i'm assuming it at least gets this far).

Here's what I got with this one on top of a day-old -git (all
debugging patches still applied):

[17179725.020000] CPU 1 is now offline
[17179730.900000] Booting processor 1/1 eip 3000
[17179730.952000] CPU 1 irqstacks, hard=7837f000 soft=78377000
[17179731.020000] Setting warm reset code and vector.
[17179731.076000] 1.
[17179731.100000] 2.
[17179731.120000] 3.
[17179731.144000] Asserting INIT.
[17179731.180000] Waiting for send to finish...
[17179731.232000] +<7>Deasserting INIT.
[17179731.284000] Waiting for send to finish...
[17179731.336000] +<7>#startup loops: 2.
[17179731.380000] Sending STARTUP #1.
[17179731.420000] After apic_write.
[17179731.460000] Doing apic_write_around for target chip...
[17179731.524000] Doing apic_write_around to kick the second...
[17179731.592000] Startup point 1.
[17179731.632000] Waiting for send to finish...
[17179731.680000] +<7>Sending STARTUP #2.
[17179731.728000] After apic_write.
[17179731.768000] Doing apic_write_around for target chip...
[17179731.832000] Doing apic_write_around to kick the second...
[17179731.900000] Startup point 1.
[17179731.936000] Waiting for send to finish...
[17179731.988000] +<7>After Startup.
[17179732.028000] Before Callout 1.
[17179732.068000] After Callout 1.
[17179737.080000] Stuck ??
[17179737.108000] Inquiring remote APIC #1...
[17179737.156000] ... APIC #1 ID: 01000000
[17179737.204000] ... APIC #1 VERSION: 00040011
[17179737.256000] ... APIC #1 SPIV: 000000ff
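
The three remote-read values can be decoded by hand. The sketch below uses the local APIC register layout from the Intel manuals (ID in bits 31:24, version in bits 7:0, max LVT entry in bits 23:16, software-enable in bit 8 of the spurious vector register); the function names are invented for illustration, and the 8-bit ID mask is an assumption that does not affect the result for this value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* APIC ID register: ID lives in the top byte (mask width assumed). */
static unsigned int apic_id(uint32_t id_reg)
{
    return (id_reg >> 24) & 0xFFu;
}

/* Version register: low byte is the version number. */
static unsigned int apic_version(uint32_t ver_reg)
{
    return ver_reg & 0xFFu;
}

/* Version register: bits 23:16 hold the highest LVT entry index. */
static unsigned int apic_maxlvt(uint32_t ver_reg)
{
    return (ver_reg >> 16) & 0xFFu;
}

/* Spurious vector register: bit 8 is the software-enable bit. */
static bool apic_sw_enabled(uint32_t spiv)
{
    return (spiv & (1u << 8)) != 0;
}
```

Applied to the log: ID 01000000 is APIC ID 1, VERSION 00040011 is version 0x11 with max LVT 4, and SPIV 000000ff has the enable bit clear. 0x000000FF is the documented reset value of that register, so this is at least consistent with a CPU that took the INIT and then stopped before reprogramming its APIC.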



2006-02-28 22:09:04

by Zwane Mwaikambo

Subject: Re: i386 cpu hotplug bug - instant reboot when onlining secondary

On Tue, 28 Feb 2006, Nathan Lynch wrote:

> Zwane Mwaikambo wrote:
> > On Mon, 27 Feb 2006, Nathan Lynch wrote:
> >
> > > Zwane Mwaikambo wrote:
> > > > On Sun, 19 Feb 2006, Nathan Lynch wrote:
> > > >
> > > > > On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
> > > > > box instantly reboot. I've been seeing this throughout the 2.6.16-rc
> > > > > series, but wasn't able to collect more information until now. Not
> > > > > sure when this last worked, unfortunately.
> > > > >
> > > > > With the debugging patch below, I get this on serial console:
> > > >
> > > > Does 2.6.14 work? Also i wonder if it gets out of the trampoline...
> > >
> > > 2.6.14 works (albeit with an APIC error reported). When retesting
> > > 2.6.16-rc4 with your patch on top of my debugging patch, I don't see the
> > > "startup_secondary" line:
> >
> > Hi Nathan,
> >
> > Can you try the following patch? We can start moving the WARM_BOOT_HLT
> > down until it triple faults (i'm assuming it at least gets this far).
>
> Here's what I got with this one on top of a day-old -git (all
> debugging patches still applied):

Looks good. How about the following?

Index: linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S
===================================================================
RCS file: /home/cvsroot/linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 head.S
--- linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S 11 Feb 2006 16:55:14 -0000 1.1.1.1
+++ linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S 28 Feb 2006 22:12:25 -0000
@@ -146,6 +146,12 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
* we know the trampoline has already loaded the boot_gdt_table GDT
* for us.
*/
+#define warm_boot tsc_sync_disabled-__PAGE_OFFSET
+#define WARM_BOOT_HLT \
+ cmpl $0, warm_boot; \
+10: \
+ jne 10b
+
ENTRY(startup_32_smp)
cld
movl $(__BOOT_DS),%eax
@@ -324,6 +330,7 @@ is386: movl $2,%ecx # set MP
cmpb $0,%cl
je 1f # the first CPU calls start_kernel
# all other CPUs call initialize_secondary
+ WARM_BOOT_HLT
call initialize_secondary
jmp L6
1:
Index: linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smpboot.c
--- linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c 11 Feb 2006 16:55:14 -0000 1.1.1.1
+++ linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c 28 Feb 2006 15:34:42 -0000
@@ -102,7 +102,7 @@ static cpumask_t smp_commenced_mask;
* is no way to resync one AP against BP. TBD: for prescott and above, we
* should use IA64's algorithm
*/
-static int __devinitdata tsc_sync_disabled;
+int __devinitdata tsc_sync_disabled;

/* Per CPU bogomips and other parameters */
struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;

2006-03-01 03:28:25

by Nathan Lynch

Subject: Re: i386 cpu hotplug bug - instant reboot when onlining secondary

Zwane Mwaikambo wrote:
> On Tue, 28 Feb 2006, Nathan Lynch wrote:
>
> > Zwane Mwaikambo wrote:
> > > On Mon, 27 Feb 2006, Nathan Lynch wrote:
> > >
> > > > Zwane Mwaikambo wrote:
> > > > > On Sun, 19 Feb 2006, Nathan Lynch wrote:
> > > > >
> > > > > > On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
> > > > > > box instantly reboot. I've been seeing this throughout the 2.6.16-rc
> > > > > > series, but wasn't able to collect more information until now. Not
> > > > > > sure when this last worked, unfortunately.
> > > > > >
> > > > > > With the debugging patch below, I get this on serial console:
> > > > >
> > > > > Does 2.6.14 work? Also i wonder if it gets out of the trampoline...
> > > >
> > > > 2.6.14 works (albeit with an APIC error reported). When retesting
> > > > 2.6.16-rc4 with your patch on top of my debugging patch, I don't see the
> > > > "startup_secondary" line:
> > >
> > > Hi Nathan,
> > >
> > > Can you try the following patch? We can start moving the WARM_BOOT_HLT
> > > down until it triple faults (i'm assuming it at least gets this far).
> >
> > Here's what I got with this one on top of a day-old -git (all
> > debugging patches still applied):
>
> Looks good, how about the following

I now get:

[17179687.244000] CPU 1 is now offline
[17179693.164000] Booting processor 1/1 eip 3000
[17179693.216000] CPU 1 irqstacks, hard=7837f000 soft=78377000
[17179693.284000] Setting warm reset code and vector.
[17179693.340000] 1.
[17179693.364000] 2.
[17179693.388000] 3.
[17179693.408000] Asserting INIT.
[17179693.448000] Waiting for send to finish...
[17179693.496000] +<7>Deasserting INIT.
[17179693.552000] Waiting for send to finish...
[17179693.600000] +<7>#startup loops: 2.
[17179693.644000] Sending STARTUP #1.
[17179693.688000] After apic_write.
[17179693.724000] Doing apic_write_around for target chip...
[17179693.788000] Doing apic_write_around to kick the second...



2006-03-01 06:27:19

by Zwane Mwaikambo

Subject: Re: i386 cpu hotplug bug - instant reboot when onlining secondary

On Tue, 28 Feb 2006, Nathan Lynch wrote:

>
> [17179687.244000] CPU 1 is now offline
> [17179693.164000] Booting processor 1/1 eip 3000
> [17179693.216000] CPU 1 irqstacks, hard=7837f000 soft=78377000
> [17179693.284000] Setting warm reset code and vector.
> [17179693.340000] 1.
> [17179693.364000] 2.
> [17179693.388000] 3.
> [17179693.408000] Asserting INIT.
> [17179693.448000] Waiting for send to finish...
> [17179693.496000] +<7>Deasserting INIT.
> [17179693.552000] Waiting for send to finish...
> [17179693.600000] +<7>#startup loops: 2.
> [17179693.644000] Sending STARTUP #1.
> [17179693.688000] After apic_write.
> [17179693.724000] Doing apic_write_around for target chip...
> [17179693.788000] Doing apic_write_around to kick the second...

Ok, could you apply only the following patch?

Index: linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smpboot.c
--- linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c 11 Feb 2006 16:55:14 -0000 1.1.1.1
+++ linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c 1 Mar 2006 06:30:06 -0000
@@ -535,9 +535,14 @@ static void __devinit start_secondary(vo
* booting is too fragile that we want to limit the
* things done here to the most necessary things.
*/
+ Dprintk("S1\n");
cpu_init();
+ Dprintk("S2\n");
preempt_disable();
+ Dprintk("S3\n");
smp_callin();
+ Dprintk("S4\n");
+
while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
rep_nop();
setup_secondary_APIC_clock();
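
These S1..S4 markers only print if Dprintk expands to something, which is what the one-line apic.h change from the first message does. Below is a small userspace model of that macro, assuming a GNU-compatible compiler for the named variadic form. log_printf() and log_buf are stand-ins for printk and the console, invented for this sketch, and the "<7>" string matches KERN_DEBUG in kernels of that era.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define KERN_DEBUG "<7>"

static char log_buf[256];   /* stand-in for the console */
static size_t log_len;

/* Stand-in for printk: appends formatted text to log_buf. */
static int log_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(log_buf + log_len, sizeof(log_buf) - log_len, fmt, ap);
    va_end(ap);
    if (n > 0) {
        size_t room = sizeof(log_buf) - log_len - 1;

        log_len += ((size_t)n < room) ? (size_t)n : room;
    }
    return n;
}

/* Same shape as the debugging one-liner: prepend the log level by
 * string-literal concatenation, pass any arguments through. */
#define Dprintk(fmt, arg...) log_printf(KERN_DEBUG fmt, ##arg)
```

In this model, printing "+" without a newline and then calling Dprintk("Deasserting INIT.\n") leaves "+<7>Deasserting INIT." in the buffer, which is likely how the literal "<7>" ends up mid-line in the console logs in this thread: the level prefix is only meaningful at the start of a line.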

2006-03-06 13:25:56

by Nathan Lynch

Subject: Re: i386 cpu hotplug bug - instant reboot when onlining secondary

Zwane Mwaikambo wrote:
> On Tue, 28 Feb 2006, Nathan Lynch wrote:
>
> >
> > [17179687.244000] CPU 1 is now offline
> > [17179693.164000] Booting processor 1/1 eip 3000
> > [17179693.216000] CPU 1 irqstacks, hard=7837f000 soft=78377000
> > [17179693.284000] Setting warm reset code and vector.
> > [17179693.340000] 1.
> > [17179693.364000] 2.
> > [17179693.388000] 3.
> > [17179693.408000] Asserting INIT.
> > [17179693.448000] Waiting for send to finish...
> > [17179693.496000] +<7>Deasserting INIT.
> > [17179693.552000] Waiting for send to finish...
> > [17179693.600000] +<7>#startup loops: 2.
> > [17179693.644000] Sending STARTUP #1.
> > [17179693.688000] After apic_write.
> > [17179693.724000] Doing apic_write_around for target chip...
> > [17179693.788000] Doing apic_write_around to kick the second...
>
> Ok, could you apply only the following patch?

Sorry for the delay in getting back to you.

Applied your latest patch (plus a one-liner to make Dprintk actually
print) -- I don't see any of the new print statements:

[17179687.744000] CPU 1 is now offline
[17179693.032000] Booting processor 1/1 eip 3000
[17179693.084000] CPU 1 irqstacks, hard=783da000 soft=783d2000
[17179693.152000] Setting warm reset code and vector.
[17179693.208000] 1.
[17179693.232000] 2.
[17179693.256000] 3.
[17179693.276000] Asserting INIT.
[17179693.316000] Waiting for send to finish...
[17179693.364000] +<7>Deasserting INIT.
[17179693.420000] Waiting for send to finish...
[17179693.468000] +<7>#startup loops: 2.
[17179693.512000] Sending STARTUP #1.
[17179693.556000] After apic_write.


