2003-05-15 08:45:02

by David Howells

Subject: 2.5 kernels fail to start second CPU


I've got a computer here with a pair of Pentium Pro CPUs in it. 2.4 kernels
have no problem starting both CPUs, but later 2.5 kernels fail to start the
second one (with or without noapic passed on the kernel cmdline).

Can anyone suggest what might need to be done to fix the problem?

The motherboard sports a fairly standard Intel chipset and there's a Matrox
graphics card plugged in:

[root@host135 root]# lspci
00:00.0 Host bridge: Intel Corp. 440FX - 82441FX PMC [Natoma] (rev 02)
00:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 01)
00:07.0 ISA bridge: Intel Corp. 82371SB PIIX3 ISA [Natoma/Triton II] (rev 01)
00:07.1 IDE interface: Intel Corp. 82371SB PIIX3 IDE [Natoma/Triton II]
00:09.0 SCSI storage controller: Adaptec AIC-7880U
00:0b.0 VGA compatible controller: Matrox Graphics, Inc. MGA 2064W [Millennium] (rev 01)

The kernel bootup messages:

Linux version 2.5.69 ([email protected]) (gcc version 3.2.1 20021207 (Red Hat Linux 8.0 3.2.1-2)) #3 SMP Wed May 14 15:49:05 BST 2003
Video mode to be used for restore is ffff
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 0000000000100000 - 0000000008000000 (usable)
BIOS-e820: 00000000fec00000 - 00000000fec09000 (reserved)
BIOS-e820: 00000000ffe80000 - 0000000100000000 (reserved)
128MB LOWMEM available.
found SMP MP-table at 000f8120
hm, page 000f8000 reserved twice.
hm, page 000f9000 reserved twice.
hm, page 000f8000 reserved twice.
hm, page 000f9000 reserved twice.
On node 0 totalpages: 32768
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 28672 pages, LIFO batch:7
HighMem zone: 0 pages, LIFO batch:1
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: INTEL Product ID: PR440FX APIC at: 0xFEC08000
Processor #0 6:1 APIC version 17
Processor #12 6:1 APIC version 17
I/O APIC #13 Version 17 at 0xFEC00000.
Enabling APIC mode: Flat. Using 1 I/O APICs
Processors: 2
Building zonelist for node : 0
Kernel command line: ro root=/dev/hda2 console=ttyS0,115200 console=tty0 nmi_watchdog=2
Initializing CPU#0
PID hash table entries: 1024 (order 10: 8192 bytes)
Detected 198.745 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 389.12 BogoMIPS
Memory: 126236k/131072k available (1876k kernel code, 4280k reserved, 521k data, 320k init, 0k highmem)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
-> /dev
-> /dev/console
-> /root
CPU: L1 I cache: 8K, L1 D cache: 8K
CPU: L2 cache: 256K
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
CPU0: Intel Pentium Pro stepping 09
per-CPU timeslice cutoff: 733.84 usecs.
task migration cache decay timeout: 1 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000040
ESR value after enabling vector: 00000000
Error: only one processor found.
ENABLING IO-APIC IRQs
Setting 13 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 13 ... ok.
..TIMER: vector=0x31 pin1=-1 pin2=0
...trying to set up timer (IRQ0) through the 8259A ...
..... (found pin 0) ...works.
testing the IO APIC.......................

.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 198.0643 MHz.
..... host bus clock speed is 66.0214 MHz.
Starting migration thread for cpu 0
CPUS done 2
PCI: PCI BIOS revision 2.10 entry at 0xfda11, last bus=0
PCI: Using configuration type 1
BIO: pool of 256 setup, 15Kb (60 bytes/bio)
biovec pool[0]: 1 bvecs: 246 entries (12 bytes)
biovec pool[1]: 4 bvecs: 246 entries (48 bytes)
biovec pool[2]: 16 bvecs: 246 entries (192 bytes)
biovec pool[3]: 64 bvecs: 246 entries (768 bytes)
biovec pool[4]: 128 bvecs: 123 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 61 entries (3072 bytes)
block request queues:
128 requests per read queue
128 requests per write queue
8 requests per batch
enter congestion at 15
exit congestion at 17
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI->APIC IRQ transform: (B0,I6,P0) -> 18
PCI->APIC IRQ transform: (B0,I9,P0) -> 17
PCI->APIC IRQ transform: (B0,I11,P0) -> 16
Initializing RT netlink socket
Enabling SEP on CPU 0
Journalled Block Device driver loaded
Installing knfsd (copyright (C) 1996 [email protected]).
Limiting direct PCI/PCI transfers.
Activating ISA DMA hang workarounds.
pty: 256 Unix98 ptys configured
Serial: 8250/16550 driver $Revision: 1.90 $ IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS2 at I/O 0x3e8 (irq = 4) is a 16550A
ttyS3 at I/O 0x2e8 (irq = 3) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
loop: loaded (max 8 devices)
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <[email protected]> and others
eth0: Intel Corp. 82557/8/9 [Ethernet , 00:A0:C9:49:5D:58, IRQ 18.
Receiver lock-up bug exists -- enabling work-around.
Board assembly 645520-034, Physical connectors present: RJ45
Primary interface chip DP83840 PHY #1.
DP83840 specific setup, setting register 23 to 8462.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x49caa8d6).
Receiver lock-up workaround activated.
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX3: IDE controller at PCI slot 00:07.1
PIIX3: chipset revision 0
PIIX3: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio
hda: IBM-DTLA-307045, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide2: I/O resource 0x3EE-0x3EE not free.
ide2: ports already in use, skipping probe
hda: host protected area => 1
hda: 90069840 sectors (46116 MB) w/1916KiB Cache, CHS=89355/16/63, (U)DMA
hda: hda1 hda2 hda3 hda4 < hda5 hda6 >
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 512 buckets, 8Kbytes
TCP: Hash tables configured (established 4096 bind 5461)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 320k freed


And the active config options:

CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_EXPERIMENTAL=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
CONFIG_KMOD=y
CONFIG_X86_PC=y
CONFIG_M686=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_PPRO_FENCE=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_NOHIGHMEM=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_ISA=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_PARPORT_PC_CML1=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_LBD=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_PIIX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IPV6_SCTP__=y
CONFIG_NETDEVICES=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_NET_PCI=y
CONFIG_EEPRO100=y
CONFIG_INPUT=y
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_I2C=y
CONFIG_I2C_ALGOBIT=y
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_JBD=y
CONFIG_FS_MBCACHE=y
CONFIG_AUTOFS4_FS=y
CONFIG_ISO9660_FS=y
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_SUNRPC=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_KALLSYMS=y
CONFIG_X86_EXTRA_IRQS=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y


David


2003-05-15 09:13:18

by David Howells

Subject: Re: 2.5 kernels fail to start second CPU


> I've got a computer here with a pair of Pentium Pro CPUs in it. 2.4 kernels
> have no problem starting both CPUs, but later 2.5 kernels fail to start the
> second one (with or without noapic passed on the kernel cmdline).

Here's the 2.4 boot messages for comparison:

Linux version 2.4.18-5smp ([email protected]) (gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)) #1 SMP Mon Jun 10 15:19:40 EDT 2002
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 0000000000100000 - 0000000008000000 (usable)
BIOS-e820: 00000000fec00000 - 00000000fec09000 (reserved)
BIOS-e820: 00000000ffe80000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
128MB LOWMEM available.
found SMP MP-table at 000f8120
hm, page 000f8000 reserved twice.
hm, page 000f9000 reserved twice.
hm, page 000f8000 reserved twice.
hm, page 000f9000 reserved twice.
On node 0 totalpages: 32768
zone(0): 4096 pages.
zone(1): 28672 pages.
zone(2): 0 pages.
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: INTEL Product ID: PR440FX APIC at: 0xFEC08000
Processor #0 Pentium(tm) Pro APIC version 17
Processor #12 Pentium(tm) Pro APIC version 17
I/O APIC #13 Version 17 at 0xFEC00000.
Processors: 2
Kernel command line: ro root=/dev/hda2 console=ttyS0,115200 console=tty0 noapic
Initializing CPU#0
Detected 198.668 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 396.49 BogoMIPS
Memory: 125600k/131072k available (1232k kernel code, 5084k reserved, 853k data, 316k init, 0k highmem)
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount cache hash table entries: 2048 (order: 2, 16384 bytes)
Buffer cache hash table entries: 8192 (order: 3, 32768 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
CPU: L1 I cache: 8K, L1 D cache: 8K
CPU: L2 cache: 256K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch ([email protected])
mtrr: detected mtrr type: Intel
CPU: L1 I cache: 8K, L1 D cache: 8K
CPU: L2 cache: 256K
Intel machine check reporting enabled on CPU#0.
CPU0: Intel Pentium Pro stepping 09
per-CPU timeslice cutoff: 733.84 usecs.
task migration cache decay timeout: 10 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000040
ESR value after enabling vector: 00000000
Booting processor 1/12 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 396.49 BogoMIPS
CPU: L1 I cache: 8K, L1 D cache: 8K
CPU: L2 cache: 256K
Intel machine check reporting enabled on CPU#1.
CPU1: Intel Pentium Pro stepping 09
Total of 2 processors activated (792.98 BogoMIPS).
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 198.6715 MHz.
..... host bus clock speed is 66.2235 MHz.
cpu: 0, clocks: 662235, slice: 220745
CPU0<T0:662224,T1:441472,D:7,S:220745,C:662235>
cpu: 1, clocks: 662235, slice: 220745
CPU1<T0:662224,T1:220720,D:14,S:220745,C:662235>
checking TSC synchronization across CPUs: passed.
migration_task 0 on cpu=0
migration_task 1 on cpu=1
PCI: PCI BIOS revision 2.10 entry at 0xfda11, last bus=0
PCI: Using configuration type 1
PCI: Probing PCI hardware
Limiting direct PCI/PCI transfers.
Activating ISA DMA hang workarounds.
isapnp: Scanning for PnP cards...
isapnp: Card 'CS4236 Audio'
isapnp: 1 Plug & Play card detected total
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS not found.
Starting kswapd
VFS: Diskquotas version dquot_6.5.0 initialized
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
ttyS02 at 0x03e8 (irq = 4) is a 16550A
ttyS03 at 0x02e8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
block: 240 slots per queue, batch=60
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX3: IDE controller on PCI bus 00 dev 39
PIIX3: chipset revision 0
PIIX3: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio
hda: IBM-DTLA-307045, ATA DISK drive
ide2: ports already in use, skipping probe
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
blk: queue c03d0e84, I/O limit 4095Mb (mask 0xffffffff)
hda: 90069840 sectors (46116 MB) w/1916KiB Cache, CHS=5606/255/63, (U)DMA
ide-floppy driver 0.99.newide
Partition check:
hda: hda1 hda2 hda3 hda4 < hda5 hda6 >
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
pci_hotplug: PCI Hot Plug PCI Core version: 0.4
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 1024 buckets, 8Kbytes
TCP: Hash tables configured (established 8192 bind 8192)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 244k freed
VFS: Mounted root (ext2 filesystem).
SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.6
<Adaptec aic7880 Ultra SCSI adapter>
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs

Journalled Block Device driver loaded
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 316k freed


David

2003-05-15 09:36:34

by William Lee Irwin III

Subject: Re: 2.5 kernels fail to start second CPU

On Thu, May 15, 2003 at 09:57:42AM +0100, David Howells wrote:
> I've got a computer here with a pair of Pentium Pro CPUs in it. 2.4 kernels
> have no problem starting both CPUs, but later 2.5 kernels fail to start the
> second one (with or without noapic passed on the kernel cmdline).
> Can anyone suggest what might need to be done to fix the problem?
> The motherboard sports a fairly standard Intel chipset and there's a Matrox
> graphics card plugged in:

Sparse physical APIC ID's are not handled properly. This should correct
them.


-- wli


diff -prauN linux-2.5.69-1/arch/i386/kernel/smpboot.c dhowells-2.5.69-1/arch/i386/kernel/smpboot.c
--- linux-2.5.69-1/arch/i386/kernel/smpboot.c Mon May 12 11:09:21 2003
+++ dhowells-2.5.69-1/arch/i386/kernel/smpboot.c Thu May 15 02:46:59 2003
@@ -935,7 +935,7 @@
 
 static void __init smp_boot_cpus(unsigned int max_cpus)
 {
-	int apicid, cpu, bit;
+	int apicid, cpu, bit, kicked;
 
 	/*
 	 * Setup boot CPU information
@@ -1018,7 +1018,8 @@
 	 */
 	Dprintk("CPU present map: %lx\n", phys_cpu_present_map);
 
-	for (bit = 0; bit < NR_CPUS; bit++) {
+	kicked = 0;
+	for (bit = 0; kicked < NR_CPUS && bit < BITS_PER_LONG; bit++) {
 		apicid = cpu_present_to_apicid(bit);
 		/*
 		 * Don't even attempt to start the boot CPU!
@@ -1034,6 +1035,8 @@
 		if (do_boot_cpu(apicid))
 			printk("CPU #%d not responding - cannot use it.\n",
 								apicid);
+		else
+			++kicked;
 	}
 
 	/*

2003-05-15 09:59:03

by William Lee Irwin III

Subject: Re: 2.5 kernels fail to start second CPU

On Thu, May 15, 2003 at 02:48:34AM -0700, William Lee Irwin III wrote:
> Sparse physical APIC ID's are not handled properly. This should correct
> them.

I forgot to count the BSP in the initial count of the number of kicked
cpus. This patch does it correctly.

To handle sparse physical APIC ID's properly the phys_cpu_present_map
must be scanned beyond bit NR_CPUS while ensuring no more than NR_CPUS
are woken in order not to attempt to wake non-addressable cpus.

The following patch adds that logic to smp_boot_cpus() and corrects the
failure to wake secondaries reported by dhowells, with successful
wakeup, runtime, reboot, and halting reported after it was applied.


-- wli


diff -prauN linux-2.5.69-1/arch/i386/kernel/smpboot.c dhowells-2.5.69-1/arch/i386/kernel/smpboot.c
--- linux-2.5.69-1/arch/i386/kernel/smpboot.c Mon May 12 11:09:21 2003
+++ dhowells-2.5.69-1/arch/i386/kernel/smpboot.c Thu May 15 02:46:59 2003
@@ -935,7 +935,7 @@
 
 static void __init smp_boot_cpus(unsigned int max_cpus)
 {
-	int apicid, cpu, bit;
+	int apicid, cpu, bit, kicked;
 
 	/*
 	 * Setup boot CPU information
@@ -1018,7 +1018,8 @@
 	 */
 	Dprintk("CPU present map: %lx\n", phys_cpu_present_map);
 
-	for (bit = 0; bit < NR_CPUS; bit++) {
+	kicked = 1;
+	for (bit = 0; kicked < NR_CPUS && bit < BITS_PER_LONG; bit++) {
 		apicid = cpu_present_to_apicid(bit);
 		/*
 		 * Don't even attempt to start the boot CPU!
@@ -1034,6 +1035,8 @@
 		if (do_boot_cpu(apicid))
 			printk("CPU #%d not responding - cannot use it.\n",
 								apicid);
+		else
+			++kicked;
 	}
 
 	/*

2003-05-15 18:14:57

by Bill Davidsen

Subject: Re: 2.5 kernels fail to start second CPU

On Thu, 15 May 2003, William Lee Irwin III wrote:

> On Thu, May 15, 2003 at 02:48:34AM -0700, William Lee Irwin III wrote:
> > Sparse physical APIC ID's are not handled properly. This should correct
> > them.
>
> I forgot to count the BSP in the initial count of the number of kicked
> cpus. This patch does it correctly.
>
> To handle sparse physical APIC ID's properly the phys_cpu_present_map
> must be scanned beyond bit NR_CPUS while ensuring no more than NR_CPUS
> are woken in order not to attempt to wake non-addressable cpus.
>
> The following patch adds that logic to smp_boot_cpus() and corrects the
> failure to wake secondaries reported by dhowells, with successful
> wakeup, runtime, reboot, and halting reported after it was applied.

While you are (somewhat) on the topic of starting processors, I want to
benchmark an application on a dual Xeon system. I want to try these
configurations, preferably without opening the box, since it's in
another time zone.

  2 cpu w/ ht     normal boot
  2 cpu w/o ht    noht
  1 cpu w/o ht    nosmp noht
  1 cpu w/ ht     ???

It looks as if maxcpus=2 counts physical units? I can't try it until Monday.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-05-15 18:36:03

by William Lee Irwin III

Subject: Re: 2.5 kernels fail to start second CPU

On Thu, May 15, 2003 at 02:21:10PM -0400, Bill Davidsen wrote:
> While you are (somewhat) on the topic of starting processors, I want to
> benchmark an application on a dual Xeon system. I want to try these
> configurations, preferably without opening the box, since it's in
> another time zone.
>   2 cpu w/ ht     normal boot
>   2 cpu w/o ht    noht
>   1 cpu w/o ht    nosmp noht
>   1 cpu w/ ht     ???
> It looks as if maxcpus=2 counts physical units? I can't try it until Monday.

What on earth are you getting on about?

ia32 is utter crap with respect to power management, virtualization,
and generalized firmware.

If you don't have remote power management, buy it in whatever form
possible.


-- wli

2003-05-15 19:20:13

by Bill Davidsen

Subject: Re: 2.5 kernels fail to start second CPU

On Thu, 15 May 2003, William Lee Irwin III wrote:

> On Thu, May 15, 2003 at 02:21:10PM -0400, Bill Davidsen wrote:
> > While you are (somewhat) on the topic of starting processors, I want to
> > benchmark an application on a dual Xeon system. I want to try these
> > configurations, preferably without opening the box, since it's in
> > another time zone.
> >   2 cpu w/ ht     normal boot
> >   2 cpu w/o ht    noht
> >   1 cpu w/o ht    nosmp noht
> >   1 cpu w/ ht     ???
> > It looks as if maxcpus=2 counts physical units? I can't try it until Monday.
>
> What on earth are you getting on about?

I want to benchmark the box using one or two CPUs, with and without
hyperthreading, as listed in the configurations above. To do this I want
to use the boot options also listed in the original post above. I can
reboot the box remotely but I can't physically remove a cpu to get the
single cpu+ht config, so I'm looking for boot line options to provide
that.

> ia32 is utter crap with respect to power management, virtualization,
> and generalized firmware.
>
> If you don't have remote power management, buy it in whatever form
> possible.

I'm not trying to manage the power, it's not a laptop, I'm trying to run
benchmarks as noted in the first sentence of my question. I don't see
how you got from how to start (or not) cpus to virtualization.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-05-15 20:17:43

by William Lee Irwin III

Subject: Re: 2.5 kernels fail to start second CPU

On Thu, 15 May 2003, William Lee Irwin III wrote:
>> What on earth are you getting on about?

On Thu, May 15, 2003 at 03:26:36PM -0400, Bill Davidsen wrote:
> I want to benchmark the box using one or two CPUs, with and without
> hyperthreading, as listed in the configurations above. To do this I want
> to use the boot options also listed in the original post above. I can
> reboot the box remotely but I can't physically remove a cpu to get the
> single cpu+ht config, so I'm looking for boot line options to provide
> that.

Please describe the following:
(1) what options you passed (.config if it differs between boots)
(2) how many cpus you expected
(3) how many cpus you got

for whatever you're doing that appears to go wrong.


On Thu, 15 May 2003, William Lee Irwin III wrote:
>> ia32 is utter crap with respect to power management, virtualization,
>> and generalized firmware.
>> If you don't have remote power management, buy it in whatever form
>> possible.

On Thu, May 15, 2003 at 03:26:36PM -0400, Bill Davidsen wrote:
> I'm not trying to manage the power, it's not a laptop, I'm trying to run
> benchmarks as noted in the first sentence of my question. I don't see
> how you got from how to start (or not) cpus to virtualization.

It sounded like what you were talking about. Maybe I get too many
people pushing feature requests as bugs.


-- wli