2008-08-06 03:15:41

by Jeff Chua

Subject: 2.6.27rc1 cannot boot more than 8CPUs

I've a Dell R900 with 4 quad-core Xeon processors (16 CPUs in total), but
can only manage to boot with CONFIG_NR_CPUS=8. Setting
CONFIG_NR_CPUS=16 causes the kernel to hang while booting.


Here's the dmesg with CONFIG_NR_CPUS=8 ...

CPU6: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#6]: passed.
CPU 7 irqstacks, hard=c0526000 soft=c051e000
Booting processor 7/26 ip 6000
Initializing CPU#7
Calibrating delay using timer specific routine.. 3723.85 BogoMIPS (lpj=7447700)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 6
CPU: Processor Core ID: 2
x86 PAT enabled: cpu 7, old 0x7040600070406, new 0x7010600070106
CPU7: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#7]: passed.
Brought up 8 CPUs
Total of 8 processors activated (29790.71 BogoMIPS).
net_namespace: 596 bytes
Booting paravirtualized kernel on bare hardware
NET: Registered protocol family 16


Here's the dmesg with CONFIG_NR_CPUS=16 ...
Booting processor 8/1 ip 6000
Initializing CPU#8
Calibrating delay using timer specific routine.. 3723.85 BogoMIPS (lpj=7447793)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
x86 PAT enabled: cpu 7, old 0x7040600070406, new 0x7010600070106
CPU8: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#8]: passed.
*** Hangs here ***


How can I debug this further? I'm using the latest linux git pull.


Thanks,
Jeff.


2008-08-06 03:31:51

by Max Krasnyansky

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

Jeff Chua wrote:
> I've a Dell R900 with 4 quad-core Xeon processors (16 CPUs in total), but
> can only manage to boot with CONFIG_NR_CPUS=8. Setting
> CONFIG_NR_CPUS=16 causes the kernel to hang while booting.

You could try booting CONFIG_NR_CPUS=16 with maxcpus=8 (kernel command line
option).

If it boots, you can then try bringing the rest of the CPUs online manually:
echo 1 > /sys/devices/system/cpu/cpu8/online
...
echo 1 > /sys/devices/system/cpu/cpu15/online

Might get a better OOPS/BUG_ON/etc report.

Max
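
The manual hotplug step suggested above can be sketched as a small loop. This is only a sketch under assumptions: the function name online_cpus and the CPU_SYSFS override are made up for illustration (the override just lets the loop be pointed at a scratch directory), and against the real sysfs tree it has to run as root. CPUs without a sysfs entry are reported rather than treated as an error.

```shell
#!/bin/sh
# Sketch: write 1 to cpuN/online for every N in [first, last] and report the result.
# CPU_SYSFS is a hypothetical override; by default the real sysfs path is used.
online_cpus() {
    first=$1
    last=$2
    root=${CPU_SYSFS:-/sys/devices/system/cpu}
    n=$first
    while [ "$n" -le "$last" ]; do
        ctl="$root/cpu$n/online"
        if [ ! -e "$ctl" ]; then
            # No cpuN directory: the kernel does not consider this CPU present.
            echo "cpu$n: no sysfs entry (CPU not present)"
        elif { echo 1 > "$ctl"; } 2>/dev/null; then
            echo "cpu$n: online"
        else
            echo "cpu$n: failed, check dmesg"
        fi
        n=$((n + 1))
    done
}
```

Usage would be "online_cpus 8 15" on the machine in question. If bringing up a CPU hangs, the last line printed names the last CPU that came up cleanly.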

2008-08-06 03:52:29

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 11:31 AM, Max Krasnyansky <[email protected]> wrote:
> You could try booting CONFIG_NR_CPUS=16 with maxcpus=8 (kernel command line option).
> If it boots you can then try bringing the rest of the cpus online manually

Ok, booted with CONFIG_NR_CPUS=16 with maxcpus=8, but can't find the
rest of the CPUs.

I can only find cpu0 to cpu7 ...


drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu0
drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu1
drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu2
drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu3
drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu4
drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu5
drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu6
drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu7
-r--r--r-- 1 root root 4096 Aug 6 19:41 online
-r--r--r-- 1 root root 4096 Aug 6 19:39 possible
-r--r--r-- 1 root root 4096 Aug 6 19:38 present
-rw-r--r-- 1 root root 4096 Aug 6 19:38 sched_mc_power_savings

# cat online
0-7
# cat possible
0-23
# cat present
0-7

Thanks,
Jeff.
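
The three sysfs files above use the kernel's CPU list notation (ranges such as 0-7, optionally comma-separated). A minimal sketch for comparing two of them follows; the function names expand_cpulist and cpus_missing are made up for illustration and are not part of any kernel tool.

```shell
#!/bin/sh
# Expand a CPU list such as "0-2,5" into "0 1 2 5".
expand_cpulist() {
    out=""
    for part in $(echo "$1" | tr ',' ' '); do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;  # a range "lo-hi"
            *)   lo=$part; hi=$part ;;            # a single CPU number
        esac
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="$out $i"
            i=$((i + 1))
        done
    done
    echo "${out# }"
}

# Print the CPUs that appear in the first list but not in the second.
cpus_missing() {
    have=" $(expand_cpulist "$2") "
    miss=""
    for c in $(expand_cpulist "$1"); do
        case $have in
            *" $c "*) ;;               # also in the second list
            *) miss="$miss $c" ;;
        esac
    done
    echo "${miss# }"
}
```

Called as cpus_missing "$(cat possible)" "$(cat present)" in /sys/devices/system/cpu, with the values shown above (possible 0-23, present 0-7), this would list CPUs 8 through 23: the kernel has room for them but does not see them as present, which is why there is no cpu8 directory to write to.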

2008-08-06 03:54:56

by Max Krasnyansky

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

Jeff Chua wrote:
> On Wed, Aug 6, 2008 at 11:31 AM, Max Krasnyansky <[email protected]> wrote:
>> You could try booting CONFIG_NR_CPUS=16 with maxcpus=8 (kernel command line option).
>> If it boots you can then try bringing the rest of the cpus online manually
>
> Ok, booted with CONFIG_NR_CPUS=16 with maxcpus=8, but can't find the
> rest of the CPUs.
>
> I can only find cpu0 to cpu7 ...
>
>
> drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu0
> drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu1
> drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu2
> drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu3
> drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu4
> drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu5
> drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu6
> drwxr-xr-x 4 root root 0 Aug 6 19:38 cpu7
> -r--r--r-- 1 root root 4096 Aug 6 19:41 online
> -r--r--r-- 1 root root 4096 Aug 6 19:39 possible
> -r--r--r-- 1 root root 4096 Aug 6 19:38 present
> -rw-r--r-- 1 root root 4096 Aug 6 19:38 sched_mc_power_savings
>
> # cat online
> 0-7
> # cat possible
> 0-23
> # cat present
> 0-7

Are you running a 32-bit kernel?

Max

2008-08-06 04:06:08

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 11:50 AM, Yinghai Lu <[email protected]> wrote:

> 32bit or 64 bit?

32bit.

High Memory Support (64GB) --->
-*- PAE (Physical Address Extension) Support
Memory model (Sparse Memory) --->
-*- 64 bit Memory and IO resources (EXPERIMENTAL)
[*] Allocate 3rd-level pagetables from highmem


> can you post full dmesg?

The following is cut and pasted (gmail is not a good way to do
this) for the 8 CPUs that I managed to boot.

routine.. 3723.90 BogoMIPS (lpj=7447810)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 6
CPU: Processor Core ID: 0
x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
CPU3: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#3]: passed.
CPU 4 irqstacks, hard=c053f000 soft=c051f000
Booting processor 4/2 ip 6000
Initializing CPU#4
Calibrating delay using timer specific routine.. 3723.87 BogoMIPS (lpj=7447748)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 2
x86 PAT enabled: cpu 4, old 0x7040600070406, new 0x7010600070106
CPU4: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#4]: passed.
CPU 5 irqstacks, hard=c0540000 soft=c0520000
Booting processor 5/10 ip 6000
Initializing CPU#5
Calibrating delay using timer specific routine.. 3723.89 BogoMIPS (lpj=7447794)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 2
CPU: Processor Core ID: 2
x86 PAT enabled: cpu 5, old 0x7040600070406, new 0x7010600070106
CPU5: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#5]: passed.
CPU 6 irqstacks, hard=c0541000 soft=c0521000
Booting processor 6/18 ip 6000
Initializing CPU#6
Calibrating delay using timer specific routine.. 3723.89 BogoMIPS (lpj=7447791)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 4
CPU: Processor Core ID: 2
x86 PAT enabled: cpu 6, old 0x7040600070406, new 0x7010600070106
CPU6: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#6]: passed.
CPU 7 irqstacks, hard=c0542000 soft=c0522000
Booting processor 7/26 ip 6000
Initializing CPU#7
Calibrating delay using timer specific routine.. 3723.88 BogoMIPS (lpj=7447769)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 6
CPU: Processor Core ID: 2
x86 PAT enabled: cpu 7, old 0x7040600070406, new 0x7010600070106
CPU7: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#7]: passed.
Brought up 8 CPUs
Total of 8 processors activated (29791.07 BogoMIPS).
net_namespace: 596 bytes
Booting paravirtualized kernel on bare hardware
NET: Registered protocol family 16
No dock devices found.
ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
ACPI: bus type pci registered
PCI: Using configuration type 1 for base access
PCI: Dell PowerEdge R900 detected, enabling pci=bfsort.
ACPI: EC: Look up EC in DSDT
ACPI: BIOS _OSI(Linux) query ignored
ACPI: DMI System Vendor: Dell Inc.
ACPI: DMI Product Name: PowerEdge R900
ACPI: DMI Product Version:
ACPI: DMI Board Name: 0C764H
ACPI: DMI BIOS Vendor: Dell Inc.
ACPI: DMI BIOS Date: 03/23/2008
ACPI: Please send DMI info above to [email protected]
ACPI: If "acpi_osi=Linux" works better, please notify [email protected]
ACPI: Interpreter enabled
ACPI: (supports S0 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:00.0: PME# supported from D0 D3hot D3cold
pci 0000:00:00.0: PME# disabled
pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
pci 0000:00:01.0: PME# disabled
pci 0000:00:02.0: PME# supported from D0 D3hot D3cold
pci 0000:00:02.0: PME# disabled
pci 0000:00:03.0: PME# supported from D0 D3hot D3cold
pci 0000:00:03.0: PME# disabled
pci 0000:00:04.0: PME# supported from D0 D3hot D3cold
pci 0000:00:04.0: PME# disabled
pci 0000:00:06.0: PME# supported from D0 D3hot D3cold
pci 0000:00:06.0: PME# disabled
pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.0: PME# disabled
pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
pci 0000:00:1d.7: PME# disabled
pci 0000:00:1f.2: PME# supported from D3hot
pci 0000:00:1f.2: PME# disabled
pci 0000:16:00.0: PME# supported from D0 D3hot D3cold
pci 0000:16:00.0: PME# disabled
pci 0000:16:00.3: PME# supported from D0 D3hot D3cold
pci 0000:16:00.3: PME# disabled
pci 0000:17:00.0: PME# supported from D0 D3hot D3cold
pci 0000:17:00.0: PME# disabled
pci 0000:17:01.0: PME# supported from D0 D3hot D3cold
pci 0000:17:01.0: PME# disabled
pci 0000:19:00.0: supports D1
pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
pci 0000:01:00.0: PME# disabled
pci 0000:02:01.0: PME# supported from D0 D3hot D3cold
pci 0000:02:01.0: PME# disabled
pci 0000:02:02.0: PME# supported from D0 D3hot D3cold
pci 0000:02:02.0: PME# disabled
pci 0000:02:03.0: PME# supported from D0 D3hot D3cold
pci 0000:02:03.0: PME# disabled
pci 0000:02:04.0: PME# supported from D0 D3hot D3cold
pci 0000:02:04.0: PME# disabled
pci 0000:02:05.0: PME# supported from D0 D3hot D3cold
pci 0000:02:05.0: PME# disabled
pci 0000:0a:00.0: PME# supported from D0 D3hot D3cold
pci 0000:0a:00.0: PME# disabled
pci 0000:0b:00.0: PME# supported from D3hot D3cold
pci 0000:0b:00.0: PME# disabled
pci 0000:08:00.0: PME# supported from D0 D3hot D3cold
pci 0000:08:00.0: PME# disabled
pci 0000:09:00.0: PME# supported from D3hot D3cold
pci 0000:09:00.0: PME# disabled
pci 0000:06:00.0: PME# supported from D0 D3hot D3cold
pci 0000:06:00.0: PME# disabled
pci 0000:07:00.0: PME# supported from D3hot D3cold
pci 0000:07:00.0: PME# disabled
pci 0000:04:00.0: PME# supported from D0 D3hot D3cold
pci 0000:04:00.0: PME# disabled
pci 0000:05:00.0: PME# supported from D3hot D3cold
pci 0000:05:00.0: PME# disabled
pci 0000:0c:00.0: PME# supported from D0 D3hot D3cold
pci 0000:0d:02.0: PME# disabled
pci 0000:0d:04.0: PME# supported from D0 D3hot D3cold
pci 0000:0d:04.0: PME# disabled
pci 0000:10:00.0: PME# supported from D0 D3hot D3cold
pci 0000:10:00.0: PME# disabled
pci 0000:11:02.0: PME# supported from D0 D3hot D3cold
pci 0000:11:02.0: PME# disabled
pci 0000:11:04.0: PME# supported from D0 D3hot D3cold
pci 0000:11:04.0: PME# disabled
pci 0000:1b:0c.0: supports D1
pci 0000:1b:0c.0: supports D2
pci 0000:00:1e.0: transparent bridge
bus 00 -> node 0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX2._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX2.UPST._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX2.UPST.DWN1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX2.UPST.DWN2._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX3.UPST.DWN2.BCOM._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX3.UPST.DWN3.BCOM._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX3.UPST.DWN4.BCOM._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX3.UPST.DWN5.BCOM._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX4.UPST.DWNB._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX4.UPST.DWNC._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX6.UPST.DWNB._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX6.UPST.DWNC._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.SBEX._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.COMP._PRT]
ACPI: PCI Interrupt Link [LK00] (IRQs 3 4 5 6 7 10 *11 12)
ACPI: PCI Interrupt Link [LK01] (IRQs 3 4 5 6 7 *10 11 12)
ACPI: PCI Interrupt Link [LK02] (IRQs 3 4 5 *6 7 10 11 12)
ACPI: PCI Interrupt Link [LK03] (IRQs 3 4 5 6 7 10 *11 12)
ACPI: PCI Interrupt Link [LK04] (IRQs 3 4 *5 6 7 10 11 12)
ACPI: PCI Interrupt Link [LK05] (IRQs 3 4 5 6 7 10 *11 12)
ACPI: PCI Interrupt Link [LK06] (IRQs 3 4 5 6 7 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LK07] (IRQs 3 4 5 6 7 10 11 12) *0, disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 11 devices
ACPI: ACPI bus type pnp unregistered
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 5
Switched to high resolution mode on CPU 6
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 4
Switched to high resolution mode on CPU 7
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 3
system 00:07: ioport range 0x800-0x87f has been reserved
system 00:07: ioport range 0x880-0x8bf has been reserved
system 00:07: ioport range 0x8c0-0x8df has been reserved
system 00:07: ioport range 0x8e0-0x8e3 has been reserved
system 00:07: ioport range 0x8f0-0x8f1 has been reserved
system 00:07: ioport range 0x900-0x900 has been reserved
system 00:07: ioport range 0xc00-0xc7f has been reserved
system 00:07: ioport range 0xca0-0xca7 has been reserved
system 00:07: ioport range 0xca9-0xcab has been reserved
system 00:07: ioport range 0xcad-0xcaf has been reserved
system 00:08: ioport range 0xca8-0xca8 has been reserved
system 00:08: ioport range 0xcac-0xcac has been reserved
system 00:09: iomem range 0xe0000000-0xefffffff could not be reserved
pci 0000:00:01.0: PCI bridge, secondary bus 0000:14
pci 0000:00:01.0: IO window: disabled
pci 0000:00:01.0: MEM window: disabled
pci 0000:00:01.0: PREFETCH window: disabled
pci 0000:17:00.0: PCI bridge, secondary bus 0000:19
pci 0000:17:00.0: IO window: 0xe000-0xefff
pci 0000:17:00.0: MEM window: 0xde200000-0xde3fffff
pci 0000:17:00.0: PREFETCH window: disabled
pci 0000:17:01.0: PCI bridge, secondary bus 0000:18
pci 0000:17:01.0: IO window: disabled
pci 0000:17:01.0: MEM window: disabled
pci 0000:17:01.0: PREFETCH window: disabled
pci 0000:16:00.0: PCI bridge, secondary bus 0000:17
pci 0000:16:00.0: IO window: 0xe000-0xefff
pci 0000:16:00.0: MEM window: 0xde200000-0xde3fffff
pci 0000:16:00.0: PREFETCH window: disabled
pci 0000:16:00.3: PCI bridge, secondary bus 0000:1a
pci 0000:16:00.3: IO window: disabled
pci 0000:16:00.3: MEM window: disabled
pci 0000:16:00.3: PREFETCH window: disabled
pci 0000:00:02.0: PCI bridge, secondary bus 0000:16
pci 0000:00:02.0: IO window: 0xe000-0xefff
pci 0000:00:02.0: MEM window: 0xde100000-0xde3fffff
pci 0000:00:02.0: PREFETCH window: disabled
pci 0000:02:01.0: PCI bridge, secondary bus 0000:03
pci 0000:02:01.0: IO window: disabled
pci 0000:02:01.0: MEM window: disabled
pci 0000:02:01.0: PREFETCH window: disabled
pci 0000:0a:00.0: PCI bridge, secondary bus 0000:0b
pci 0000:0a:00.0: IO window: disabled
pci 0000:0a:00.0: MEM window: 0xd6000000-0xd7ffffff
pci 0000:0a:00.0: PREFETCH window: disabled
pci 0000:02:02.0: PCI bridge, secondary bus 0000:0a
pci 0000:02:02.0: IO window: disabled
pci 0000:02:02.0: MEM window: 0xd6000000-0xd7ffffff
pci 0000:02:02.0: PREFETCH window: disabled
pci 0000:08:00.0: PCI bridge, secondary bus 0000:09
pci 0000:08:00.0: IO window: disabled
pci 0000:08:00.0: MEM window: 0xd8000000-0xd9ffffff
pci 0000:08:00.0: PREFETCH window: disabled
pci 0000:02:03.0: PCI bridge, secondary bus 0000:08
pci 0000:02:03.0: IO window: disabled
pci 0000:02:03.0: MEM window: 0xd8000000-0xd9ffffff
pci 0000:02:03.0: PREFETCH window: disabled
pci 0000:06:00.0: PCI bridge, secondary bus 0000:07
pci 0000:06:00.0: IO window: disabled
pci 0000:06:00.0: MEM window: 0xda000000-0xdbffffff
pci 0000:06:00.0: PREFETCH window: disabled
pci 0000:02:04.0: PCI bridge, secondary bus 0000:06
pci 0000:02:04.0: IO window: disabled
pci 0000:02:04.0: MEM window: 0xda000000-0xdbffffff
pci 0000:02:04.0: PREFETCH window: disabled
pci 0000:04:00.0: PCI bridge, secondary bus 0000:05
pci 0000:04:00.0: IO window: disabled
pci 0000:04:00.0: MEM window: 0xdc000000-0xddffffff
pci 0000:04:00.0: PREFETCH window: disabled
pci 0000:02:05.0: PCI bridge, secondary bus 0000:04
pci 0000:02:05.0: IO window: disabled
pci 0000:02:05.0: MEM window: 0xdc000000-0xddffffff
pci 0000:02:05.0: PREFETCH window: disabled
pci 0000:01:00.0: PCI bridge, secondary bus 0000:02
pci 0000:01:00.0: IO window: disabled
pci 0000:01:00.0: MEM window: 0xd6000000-0xddffffff
pci 0000:01:00.0: PREFETCH window: disabled
pci 0000:00:03.0: PCI bridge, secondary bus 0000:01
pci 0000:00:03.0: IO window: disabled
pci 0000:00:03.0: MEM window: 0xd6000000-0xddffffff
pci 0000:00:03.0: PREFETCH window: disabled
pci 0000:0d:02.0: PCI bridge, secondary bus 0000:0f
pci 0000:0d:02.0: IO window: disabled
pci 0000:0d:02.0: MEM window: disabled
pci 0000:0d:02.0: PREFETCH window: disabled
pci 0000:0d:04.0: PCI bridge, secondary bus 0000:0e
pci 0000:0d:04.0: IO window: disabled
pci 0000:0d:04.0: MEM window: disabled
pci 0000:0d:04.0: PREFETCH window: disabled
pci 0000:0c:00.0: PCI bridge, secondary bus 0000:0d
pci 0000:0c:00.0: IO window: disabled
pci 0000:0c:00.0: MEM window: disabled
pci 0000:0c:00.0: PREFETCH window: disabled
pci 0000:00:04.0: PCI bridge, secondary bus 0000:0c
pci 0000:00:04.0: IO window: disabled
pci 0000:00:04.0: MEM window: disabled
pci 0000:00:04.0: PREFETCH window: disabled
pci 0000:11:02.0: PCI bridge, secondary bus 0000:13
pci 0000:11:02.0: IO window: disabled
pci 0000:11:02.0: MEM window: disabled
pci 0000:11:02.0: PREFETCH window: disabled
pci 0000:11:04.0: PCI bridge, secondary bus 0000:12
pci 0000:11:04.0: IO window: disabled
pci 0000:11:04.0: MEM window: disabled
pci 0000:11:04.0: PREFETCH window: disabled
pci 0000:10:00.0: PCI bridge, secondary bus 0000:11
pci 0000:10:00.0: IO window: disabled
pci 0000:00:06.0: MEM window: disabled
pci 0000:00:06.0: PREFETCH window: disabled
pci 0000:00:1c.0: PCI bridge, secondary bus 0000:15
pci 0000:00:1c.0: IO window: disabled
pci 0000:00:1c.0: MEM window: disabled
pci 0000:00:1c.0: PREFETCH window: disabled
pci 0000:00:1e.0: PCI bridge, secondary bus 0000:1b
pci 0000:00:1e.0: IO window: 0xd000-0xdfff
pci 0000:00:1e.0: MEM window: 0xde400000-0xde4fffff
pci 0000:00:1e.0: PREFETCH window: 0x000000c8000000-0x000000cfffffff
pci 0000:00:01.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
pci 0000:00:01.0: setting latency timer to 64
pci 0000:00:02.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
pci 0000:00:02.0: setting latency timer to 64
pci 0000:16:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
pci 0000:16:00.0: setting latency timer to 64
pci 0000:17:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
pci 0000:17:00.0: setting latency timer to 64
pci 0000:17:01.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
pci 0000:17:01.0: setting latency timer to 64
pci 0000:16:00.3: setting latency timer to 64
pci 0000:00:03.0: PCI INT A -> GSI 37 (level, low) -> IRQ 37
pci 0000:00:03.0: setting latency timer to 64
pci 0000:01:00.0: setting latency timer to 64
pci 0000:02:01.0: setting latency timer to 64
pci 0000:02:02.0: setting latency timer to 64
pci 0000:0a:00.0: setting latency timer to 64
pci 0000:02:03.0: setting latency timer to 64
pci 0000:08:00.0: setting latency timer to 64
pci 0000:02:04.0: setting latency timer to 64
pci 0000:06:00.0: setting latency timer to 64
pci 0000:02:05.0: setting latency timer to 64
pci 0000:04:00.0: setting latency timer to 64
pci 0000:04:00.0: setting latency timer to 64
pci 0000:00:04.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
pci 0000:00:04.0: setting latency timer to 64
pci 0000:0c:00.0: setting latency timer to 64
pci 0000:0d:02.0: setting latency timer to 64
pci 0000:0d:04.0: setting latency timer to 64
pci 0000:00:06.0: PCI INT A -> GSI 33 (level, low) -> IRQ 33
pci 0000:00:06.0: setting latency timer to 64
pci 0000:10:00.0: setting latency timer to 64
pci 0000:11:02.0: setting latency timer to 64
pci 0000:11:04.0: setting latency timer to 64
pci 0000:00:1c.0: setting latency timer to 64
pci 0000:00:1e.0: setting latency timer to 64
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
highmem bounce pool size: 64 pages
HugeTLB registered 2 MB page size, pre-allocated 0 pages
msgmni has been set to 1234
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler anticipatory registered (default)
pci 0000:1b:0c.0: Boot video device
pcieport-driver 0000:00:01.0: setting latency timer to 64
pcieport-driver 0000:00:01.0: found MSI capability
pci_express 0000:00:01.0:pcie00: allocate port service
pcieport-driver 0000:00:02.0: setting latency timer to 64
pcieport-driver 0000:00:02.0: found MSI capability
pci_express 0000:00:02.0:pcie00: allocate port service
pci_express 0000:00:02.0:pcie00: allocate port service
pcieport-driver 0000:00:03.0: setting latency timer to 64
pcieport-driver 0000:00:03.0: found MSI capability
pci_express 0000:00:03.0:pcie00: allocate port service
pcieport-driver 0000:00:04.0: setting latency timer to 64
pcieport-driver 0000:00:04.0: found MSI capability
pci_express 0000:00:04.0:pcie00: allocate port service
pcieport-driver 0000:00:06.0: setting latency timer to 64
pcieport-driver 0000:00:06.0: found MSI capability
pci_express 0000:00:06.0:pcie00: allocate port service
pcieport-driver 0000:00:1c.0: setting latency timer to 64
pcieport-driver 0000:00:1c.0: found MSI capability
pci_express 0000:00:1c.0:pcie00: allocate port service
pci_express 0000:00:1c.0:pcie02: allocate port service
pcieport-driver 0000:01:00.0: setting latency timer to 64
pcieport-driver 0000:02:01.0: setting latency timer to 64
pcieport-driver 0000:02:01.0: found MSI capability
pcieport-driver 0000:02:02.0: setting latency timer to 64
pcieport-driver 0000:02:02.0: found MSI capability
pcieport-driver 0000:02:03.0: setting latency timer to 64
pcieport-driver 0000:02:03.0: found MSI capability
pcieport-driver 0000:02:04.0: setting latency timer to 64
pcieport-driver 0000:02:04.0: found MSI capability
pcieport-driver 0000:02:05.0: setting latency timer to 64
pcieport-driver 0000:02:05.0: found MSI capability
pcieport-driver 0000:0c:00.0: setting latency timer to 64
pcieport-driver 0000:0d:02.0: setting latency timer to 64
pcieport-driver 0000:0d:02.0: found MSI capability
pcieport-driver 0000:0d:04.0: setting latency timer to 64
pcieport-driver 0000:0d:04.0: found MSI capability
pcieport-driver 0000:10:00.0: setting latency timer to 64
pcieport-driver 0000:11:02.0: setting latency timer to 64
pcieport-driver 0000:11:02.0: found MSI capability
ACPI: Power Button (FF) [PWRF]
processor ACPI0007:00: registered as cooling_device0
processor ACPI0007:01: registered as cooling_device1
processor ACPI0007:02: registered as cooling_device2
processor ACPI0007:03: registered as cooling_device3
processor ACPI0007:04: registered as cooling_device4
processor ACPI0007:05: registered as cooling_device5
processor ACPI0007:06: registered as cooling_device6
processor ACPI0007:07: registered as cooling_device7
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.103
[drm] Initialized drm 1.1.0 20060810
brd: module loaded
loop: module loaded
pcnet32.c:v1.35 21.Apr.2008 [email protected]
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.7.9 (July 18, 2008)
bnx2 0000:05:00.0: PCI INT A -> GSI 38 (level, low) -> IRQ 38
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem dc000000, IRQ 38, node addr 00:1e:c9:d9:84:57
bnx2 0000:07:00.0: PCI INT A -> GSI 37 (level, low) -> IRQ 37
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem da000000, IRQ 37, node addr 00:1e:c9:d9:84:59
bnx2 0000:09:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
eth2: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem d8000000, IRQ 19, node addr 00:1e:c9:d9:84:5b
bnx2 0000:0b:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
eth3: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem d6000000, IRQ 18, node addr 00:1e:c9:d9:84:5d
PPP generic driver version 2.4.2
PPP Deflate Compression module registered
PPP BSD Compression module registered
tun: Universal TUN/TAP device driver, 1.6
Uniform Multi-Platform E-IDE driver
ide_generic: please use "probe_mask=0x3f" module parameter for probing all legacy ISA IDE ports
Probing IDE interface ide0...
hda: TSSTcorpDVD-ROM TS-L333A, ATAPI CD/DVD-ROM drive
Probing IDE interface ide1...
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
megasas: 00.00.03.20-rc1 Mon. March 10 11:02:31 PDT 2008
megasas: 0x1000:0x0060:0x1028:0x1f0c: bus 25:slot 0:func 0
megaraid_sas 0000:19:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
megaraid_sas 0000:19:00.0: setting latency timer to 64
megasas: FW now in Ready state
scsi0 : LSI SAS based MegaRAID driver
scsi 0:0:0:0: Direct-Access SEAGATE ST973451SS SM04 PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access SEAGATE ST973451SS SM04 PQ: 0 ANSI: 5
scsi 0:0:2:0: Direct-Access SEAGATE ST973451SS SM04 PQ: 0 ANSI: 5
scsi 0:0:3:0: Direct-Access SEAGATE ST973451SS SM04 PQ: 0 ANSI: 5
scsi 0:0:4:0: Direct-Access SEAGATE ST973451SS SM04 PQ: 0 ANSI: 5
scsi 0:0:5:0: Direct-Access SEAGATE ST973451SS SM04 PQ: 0 ANSI: 5
scsi 0:0:6:0: Direct-Access SEAGATE ST973451SS SM04 PQ: 0 ANSI: 5
scsi 0:0:7:0: Direct-Access SEAGATE ST973451SS SM04 PQ: 0 ANSI: 5
scsi 0:0:32:0: Enclosure DP BACKPLANE 1.06 PQ: 0 ANSI: 5
scsi 0:2:0:0: Direct-Access DELL PERC 6/i 1.11 PQ: 0 ANSI: 5
Driver 'sd' needs updating - please use bus_type methods
sd 0:2:0:0: [sda] 710410240 512-byte hardware sectors (363730 MB)
sd 0:2:0:0: [sda] Write Protect is off
sd 0:2:0:0: [sda] Mode Sense: 1f 00 00 08
sd 0:2:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:2:0:0: [sda] 710410240 512-byte hardware sectors (363730 MB)
sd 0:2:0:0: [sda] Mode Sense: 1f 00 00 08
sd 0:2:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:2:0:0: [sda] 710410240 512-byte hardware sectors (363730 MB)
sd 0:2:0:0: [sda] Write Protect is off
sd 0:2:0:0: [sda] Mode Sense: 1f 00 00 08
sd 0:2:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15 >
sd 0:2:0:0: [sda] Attached SCSI disk
ata_piix 0000:00:1f.2: version 2.12
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ata_piix 0000:00:1f.2: BAR 0: can't reserve I/O region [0x1f0-0x1f7]
ata_piix 0000:00:1f.2: failed to request/iomap BARs for port 0 (errno=-16)
ata_piix 0000:00:1f.2: BAR 2: can't reserve I/O region [0x170-0x177]
ata_piix 0000:00:1f.2: failed to request/iomap BARs for port 1 (errno=-16)
ata_piix 0000:00:1f.2: no available native port
Fusion MPT base driver 3.04.07
Copyright (c) 1999-2008 LSI Corporation
Fusion MPT SPI Host driver 3.04.07
ehci_hcd 0000:00:1d.7: PCI INT B -> GSI 21 (level, low) -> IRQ 21
ehci_hcd 0000:00:1d.7: setting latency timer to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: debug port 1
ehci_hcd 0000:00:1d.7: cache line size of 32 is not supported
ehci_hcd 0000:00:1d.7: irq 21, io mem 0xde0ffc00
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
...

2008-08-06 04:06:41

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 11:54 AM, Max Krasnyansky <[email protected]> wrote:

> Are you running a 32-bit kernel?

Yes. But, does it matter?

Thanks,
Jeff.

2008-08-06 04:49:55

by Max Krasnyansky

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

Jeff Chua wrote:
> On Wed, Aug 6, 2008 at 11:54 AM, Max Krasnyansky <[email protected]> wrote:
>
>> Are you running a 32-bit kernel?
>
> Yes. But, does it matter?

It used to. The 64-bit kernel used to handle the maxcpus option as documented
in Documentation/cpu-hotplug.txt, while the 32-bit one was broken.
I just looked at the latest code and realized that both are now broken: they
ignore CPUs beyond maxcpus entirely instead of just not booting them.

I'll send a patch that fixes that tomorrow.

Max

2008-08-06 05:00:20

by Li Zefan

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

Max Krasnyansky wrote:
> Jeff Chua wrote:
>> On Wed, Aug 6, 2008 at 11:54 AM, Max Krasnyansky <[email protected]>
>> wrote:
>>
>>> Are you running a 32-bit kernel?
>>
>> Yes. But, does it matter?
>
> It used to. 64-bit kernel used to handle maxcpus option as documented in
> the Documentation/cpu-hotplug.txt and 32-bit one was broken.
> I just looked at the latest code and realized that both are now broken.
> They ignore cpu id > maxcpus instead of not-booting them.
>

Yes. I have an x86_64 box with 4 cpus, but yesterday when I booted up with maxcpus=2,
I didn't see the other 2 cpus.

> I'll send a patch that fixes that tomorrow.
>

great :)

2008-08-06 05:19:24

by David Miller

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

From: "Jeff Chua" <[email protected]>
Date: Wed, 6 Aug 2008 11:15:30 +0800

> I've a Dell R900 with 4 quad-core Xeon processors (16 CPUs in total), but
> can only manage to boot with CONFIG_NR_CPUS=8. Setting
> CONFIG_NR_CPUS=16 causes the kernel to hang while booting.
>
>
> Here's the dmesg with CONFIG_NR_CPUS=8 ...

Do you have lockdep enabled? If so, try turning that off.

2008-08-06 06:01:30

by Linus Torvalds

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs



On Wed, 6 Aug 2008, Jeff Chua wrote:
>
> How can I debug this further? I'm using the latest linux git pull.

One trivial thing to try would be to just bisect it. I assume 2.6.26 is
fine, so while it will take a few boots to try it out (there's 8111
commits in between, so 13 reboots should do it), the advantage of
bisection is that it's fairly straightforward to do even if you don't have
any clue where the problem might lurk.

And with your machine, recompiling the kernel 13 times shouldn't take that
long ;)

Linus
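
The arithmetic behind "13 reboots" and the bisection itself can be sketched as follows. The helper names bisect_steps and start_bisect are made up for illustration, and the tags passed to start_bisect in the usage note are an assumption about the reporter's tree; git bisect start, git bisect good and git bisect bad are the standard git commands.

```shell
#!/bin/sh
# Approximate number of test boots needed to bisect n commits:
# the smallest k such that 2^k >= n.
bisect_steps() {
    n=$1
    k=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        k=$((k + 1))
    done
    echo "$k"
}

# Start a bisection between a known-good and a known-bad revision.
# Run this inside the kernel git tree; nothing is executed until it is called.
start_bisect() {
    good=$1
    bad=$2
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}
```

For example, "bisect_steps 8111" prints 13, since 2^12 = 4096 is too few and 2^13 = 8192 covers the range. A run might start with "start_bisect v2.6.26 HEAD"; after each build and boot test, "git bisect good" or "git bisect bad" records the result, and "git bisect reset" returns to the original checkout once the offending commit is found.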

2008-08-06 06:42:22

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 1:19 PM, David Miller <[email protected]> wrote:
> From: "Jeff Chua" <[email protected]>
> Date: Wed, 6 Aug 2008 11:15:30 +0800
> Do you have lockdep enabled? If so, try turning that off.

It's enabled by default, and I can't seem to disable it: even if I
comment it out or delete it, it comes back after running "make".

CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y


Thanks,
Jeff.

2008-08-06 06:44:11

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 2:01 PM, Linus Torvalds
<[email protected]> wrote:

> One trivial thing to try would be to just bisect it. I assume 2.6.26 is
> fine, so while it will take a few boots to try it out (there's 8111
> commits in between, so 13 reboots should do it), the advantage of
> bisection is that it's fairly straightforward to do even if you don't have
> any clue where the problem might lurk.

Bisecting now.

Jeff.

2008-08-06 07:18:42

by David Miller

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

From: "Jeff Chua" <[email protected]>
Date: Wed, 6 Aug 2008 14:42:11 +0800

> On Wed, Aug 6, 2008 at 1:19 PM, David Miller <[email protected]> wrote:
> > From: "Jeff Chua" <[email protected]>
> > Date: Wed, 6 Aug 2008 11:15:30 +0800
> > Do you have lockdep enabled? If so, try turning that off.
>
> It's enabled by default, and I can't seem to disable it even if I
> commented it out or delete it, it comes back after running "make".

You have to turn off CONFIG_PROVE_LOCKING, in fact just turn off
everything in the lock debugging section:

# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set

2008-08-06 08:56:54

by Yinghai Lu

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Tue, Aug 5, 2008 at 11:42 PM, Jeff Chua <[email protected]> wrote:
> On Wed, Aug 6, 2008 at 1:19 PM, David Miller <[email protected]> wrote:
>> From: "Jeff Chua" <[email protected]>
>> Date: Wed, 6 Aug 2008 11:15:30 +0800
>> Do you have lockdep enabled? If so, try turning that off.
>
> It's enabled by default, and I can't seem to disable it even if I
> commented it out or delete it, it comes back after running "make".
>
> CONFIG_X86_32=y
> # CONFIG_X86_64 is not set
> CONFIG_X86=y
> CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
> # CONFIG_GENERIC_LOCKBREAK is not set
> CONFIG_GENERIC_TIME=y
> CONFIG_GENERIC_CMOS_UPDATE=y
> CONFIG_CLOCKSOURCE_WATCHDOG=y
> CONFIG_GENERIC_CLOCKEVENTS=y
> CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_LATENCYTOP_SUPPORT=y
> CONFIG_FAST_CMPXCHG_LOCAL=y
> CONFIG_MMU=y
> CONFIG_ZONE_DMA=y
> CONFIG_GENERIC_ISA_DMA=y
> CONFIG_GENERIC_IOMAP=y
> CONFIG_GENERIC_BUG=y
> CONFIG_GENERIC_HWEIGHT=y
>
>

do you have

CONFIG_X86_GENERICARCH=y
CONFIG_X86_BIGSMP=y

More than 8 CPUs need bigsmp mode.

YH
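As background on why more than 8 CPUs need bigsmp: the thread does not spell this out, but the usual explanation is that the default 32-bit PC subarchitecture addresses CPUs in the APIC's flat logical mode, where each CPU owns one bit of an 8-bit destination mask. A minimal sketch under that assumption (the function and constant names are illustrative, not kernel code):

```c
#include <stdint.h>

#define FLAT_LOGICAL_BITS 8 /* assumed width of the flat-mode destination mask */

/*
 * Illustrative only: return the one-bit logical ID a CPU would get in
 * flat logical mode, or 0 if the CPU number does not fit in the mask.
 */
uint8_t flat_logical_id(unsigned int cpu)
{
    if (cpu >= FLAT_LOGICAL_BITS)
        return 0; /* no bit left: this CPU cannot be addressed */
    return (uint8_t)(1u << cpu);
}
```

In this model CPUs 0 to 7 each get a distinct bit, while CPU 8 and above have no bit left, which is consistent with the hang appearing right at CPU#8.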

2008-08-06 09:33:21

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 3:18 PM, David Miller <[email protected]> wrote:

> You have to turn off CONFIG_PROVE_LOCKING, in fact just turn off
> everything in the lock debugging section:
>
> # CONFIG_DEBUG_RT_MUTEXES is not set
> # CONFIG_RT_MUTEX_TESTER is not set
> # CONFIG_DEBUG_SPINLOCK is not set
> # CONFIG_DEBUG_MUTEXES is not set
> # CONFIG_DEBUG_LOCK_ALLOC is not set
> # CONFIG_PROVE_LOCKING is not set
> # CONFIG_LOCK_STAT is not set
> # CONFIG_DEBUG_SPINLOCK_SLEEP is not set
> # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
>

I don't see any option to turn these off. Still searching.

Jeff.

2008-08-06 09:36:18

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 4:49 PM, Yinghai Lu <[email protected]> wrote:
> do you have
>
> CONFIG_X86_GENERICARCH=y
> CONFIG_X86_BIGSMP=y
>
> 8 more cpu need bigsmp mode.


Are these the ones that are supposed to be set? Anyway, I can't find a
place to set these using menuconfig.

# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_VSMP is not set

Thanks
Jeff.

2008-08-06 09:36:49

by David Miller

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

From: "Jeff Chua" <[email protected]>
Date: Wed, 6 Aug 2008 17:33:07 +0800

> On Wed, Aug 6, 2008 at 3:18 PM, David Miller <[email protected]> wrote:
>
> > You have to turn off CONFIG_PROVE_LOCKING, in fact just turn off
> > everything in the lock debugging section:
> >
> > # CONFIG_DEBUG_RT_MUTEXES is not set
> > # CONFIG_RT_MUTEX_TESTER is not set
> > # CONFIG_DEBUG_SPINLOCK is not set
> > # CONFIG_DEBUG_MUTEXES is not set
> > # CONFIG_DEBUG_LOCK_ALLOC is not set
> > # CONFIG_PROVE_LOCKING is not set
> > # CONFIG_LOCK_STAT is not set
> > # CONFIG_DEBUG_SPINLOCK_SLEEP is not set
> > # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
> >
>
> I don't see any option to turn these off. Still searching.

Maybe edit the ".config" file at the top level of the kernel
sources and then type "make oldconfig" ?!?!?!

2008-08-06 09:42:20

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 5:35 PM, Jeff Chua <[email protected]> wrote:
> On Wed, Aug 6, 2008 at 4:49 PM, Yinghai Lu <[email protected]> wrote:
>> do you have
>> CONFIG_X86_GENERICARCH=y
>> CONFIG_X86_BIGSMP=y
>> 8 more cpu need bigsmp mode.
> Are these the ones that are supposed to be set? Anyway, I can't find a
> place to set these using menuconfig.
>
> # CONFIG_X86_GENERICARCH is not set
> # CONFIG_X86_VSMP is not set

Sorry, found it. These are not obvious. I had selected
"Subarchitecture Type (PC-compatible)" and couldn't find a place to set
CONFIG_X86_GENERICARCH.

Just found it under " Subarchitecture Type (Generic architecture)",
and then it shows the CONFIG_X86_BIGSMP option.

Ok, compiling and testing now.

Jeff.

2008-08-06 10:06:48

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 5:36 PM, David Miller <[email protected]> wrote:
> From: "Jeff Chua" <[email protected]>
> Date: Wed, 6 Aug 2008 17:33:07 +0800
>
>> On Wed, Aug 6, 2008 at 3:18 PM, David Miller <[email protected]> wrote:
>>
>> > You have to turn off CONFIG_PROVE_LOCKING, in fact just turn off
>> > everything in the lock debugging section:
> Maybe edit the ".config" file at the top level of the kernel
> sources and then type "make oldconfig" ?!?!?!

Ok, maybe it's not as bad as I thought. These are not in .config, meaning
they are not set, so it should be ok. I'll test out these two first.

> CONFIG_X86_GENERICARCH=y
> CONFIG_X86_BIGSMP=y

Thanks,
Jeff.

2008-08-06 11:10:22

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


On Wed, Aug 6, 2008 at 5:42 PM, Jeff Chua <[email protected]>
wrote:
>>> CONFIG_X86_GENERICARCH=y
>>> CONFIG_X86_BIGSMP=y
>>> 8 more cpu need bigsmp mode.

> Just found it under " Subarchitecture Type (Generic architecture)",
> and then it shows the CONFIG_X86_BIGSMP option.

It works. Booted with 16CPUs. 32GB RAM.

CPU0 L7345 1.86GHz 0C
CPU1 L7345 1.86GHz 0C
CPU2 L7345 1.86GHz 0C
CPU3 L7345 1.86GHz 0C
CPU4 L7345 1.86GHz 0C
CPU5 L7345 1.86GHz 0C
CPU6 L7345 1.86GHz 0C
CPU7 L7345 1.86GHz 0C
CPU8 L7345 1.86GHz 0C
CPU9 L7345 1.86GHz 0C
CPU10 L7345 1.86GHz 0C
CPU11 L7345 1.86GHz 0C
CPU12 L7345 1.86GHz 0C
CPU13 L7345 1.86GHz 0C
CPU14 L7345 1.86GHz 0C
CPU15 L7345 1.86GHz 0C


So it works, but setting the config is not obvious. And should
CONFIG_X86_PC be considered as well as CONFIG_X86_GENERICARCH?

With CONFIG_X86_PC, I can set CONFIG_SPARSEMEM=y.

With CONFIG_X86_GENERICARCH, CONFIG_SPARSEMEM depends on CONFIG_NUMA.

I'm using the patch below to enable sparsemem instead of flatmem, but
don't know what impact it has. System booted and running.

It would be nice to automatically default CONFIG_X86_BIGSMP with CPUs > 8.
But I don't know how to do that.


Thanks,
Jeff.


--- linux/arch/x86/Kconfig.org 2008-08-06 18:41:08 +0800
+++ linux/arch/x86/Kconfig 2008-08-06 18:48:13 +0800
@@ -1035,7 +1035,7 @@

 config ARCH_FLATMEM_ENABLE
 	def_bool y
-	depends on X86_32 && ARCH_SELECT_MEMORY_MODEL && X86_PC && !NUMA
+	depends on X86_32 && ARCH_SELECT_MEMORY_MODEL && !NUMA

 config ARCH_DISCONTIGMEM_ENABLE
 	def_bool y
@@ -1051,7 +1051,7 @@

 config ARCH_SPARSEMEM_ENABLE
 	def_bool y
-	depends on X86_64 || NUMA || (EXPERIMENTAL && X86_PC)
+	depends on X86_64 || NUMA || (EXPERIMENTAL && X86_PC) || X86_GENERICARCH
 	select SPARSEMEM_STATIC if X86_32
 	select SPARSEMEM_VMEMMAP_ENABLE if X86_64

2008-08-06 15:33:44

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 2:42 PM, Jeff Chua <[email protected]> wrote:
> On Wed, Aug 6, 2008 at 2:01 PM, Linus Torvalds
> <[email protected]> wrote:
>> One trivial thing to try would be to just bisect it. I assume 2.6.26 is
> Bisecting now.

Thanks to everyone for all the helpful suggestions. It turns out that I
just need to enable the following switches, so I didn't bisect further.
Since this is the first machine that I've tried with more than 8 CPUs,
I wasn't sure whether 2.6.16 has the same problem, but if you wish, I
could give 2.6.16 a try.

> CONFIG_X86_GENERICARCH=y
> CONFIG_X86_BIGSMP=y

Thank you all for the great linux kernel!

Jeff.

2008-08-06 16:14:05

by Yinghai Lu

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 6, 2008 at 4:09 AM, Jeff Chua <[email protected]> wrote:
>
> On Wed, Aug 6, 2008 at 5:42 PM, Jeff Chua <[email protected]> wrote:
>>>>
>>>> CONFIG_X86_GENERICARCH=y
>>>> CONFIG_X86_BIGSMP=y
>>>> 8 more cpu need bigsmp mode.
>
>> Just found it under " Subarchitecture Type (Generic architecture)",
>> and then it shows the CONFIG_X86_BIGSMP option.
>
> It works. Booted with 16CPUs. 32GB RAM.
>
> CPU0 L7345 1.86GHz 0C
> CPU1 L7345 1.86GHz 0C
> CPU2 L7345 1.86GHz 0C
> CPU3 L7345 1.86GHz 0C
> CPU4 L7345 1.86GHz 0C
> CPU5 L7345 1.86GHz 0C
> CPU6 L7345 1.86GHz 0C
> CPU7 L7345 1.86GHz 0C
> CPU8 L7345 1.86GHz 0C
> CPU9 L7345 1.86GHz 0C
> CPU10 L7345 1.86GHz 0C
> CPU11 L7345 1.86GHz 0C
> CPU12 L7345 1.86GHz 0C
> CPU13 L7345 1.86GHz 0C
> CPU14 L7345 1.86GHz 0C
> CPU15 L7345 1.86GHz 0C
>
>
> So, but setting the config not obvious. And should CONFIG_X86_PC be
> considered as well as CONFIG_X86_GENERICARCH?
>
> With CONFIG_X86_PC, I can set CONFIG_SPARSEMEM=y.
>
> With CONFIG_X86_GENERICARCH, CONFIG_SPARSEMEM depends on CONFIG_NUMA.
>
> I'm using the patch below to enable sparsemem instead of flatmem, but don't
> know what impact it has. System booted and running.
>
> It would be nice to automatically default CONFIG_X86_BIGSMP with CPUs > 8.
> But I don't know to do that.
>
>

Actually x86_pc is one mode of genericarch; genericarch can already
detect pc, bigsmp, numaq, es7000, visew...

Hope later we can change mach_default to default, but the embedded guys
may want to keep it as a separate one.

In the dmesg when booting x86_pc only, we already have a warning telling
you to set bigsmp if you have more than 8 cpus.

YH

2008-08-06 16:35:35

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Thu, Aug 7, 2008 at 12:13 AM, Yinghai Lu <[email protected]> wrote:

> actually x86_pc is one mode of genericarch..., genericarch already
> could detect pc, bigsmp, and numaq, es7000, bigsmp, visew..

It seems to get "sparse mem", NUMA must be set first, but this is not
required for X86_PC.


> in the dmesg when booting x86_pc only, we already have warning to let
> you set bigsmp if you have 8 more cpus.

With more than 8 CPUs the boot hangs and Shift+PgUp does not work, so
it's not possible to view console messages except those on the current
page. I guess that's how I missed that hint.

Jeff.

2008-08-06 20:12:14

by Max Krasnyansky

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

Li Zefan wrote:
> Max Krasnyansky wrote:
>> Jeff Chua wrote:
>>> On Wed, Aug 6, 2008 at 11:54 AM, Max Krasnyansky <[email protected]>
>>> wrote:
>>>
>>>> Are you running 32-bit kernel ?
>>> Yes. But, does it matter?
>> It used to. 64-bit kernel used to handle maxcpus option as documented in
>> the Documentation/cpu-hotplug.txt and 32-bit one was broken.
>> I just looked at the latest code and realized that both are now broken.
>> They ignore cpu id > maxcpus instead of not-booting them.
>>
>
> Yes. I have an x86_64 box with 4 cpus, but yesterday when I booted up with maxcpus=2,
> I didn't see the other 2 cpus.
>
>> I'll send a patch that fixes that tomorrow.
>>
>
> great :)

I just sent it and CC'ed both of you guys.
[PATCH] Resurect proper handling of maxcpus= kernel option

Jeff, maybe you can try again booting with maxcpus=8 and then bringing
them online one by one to see where/what fails.

Max

2008-08-11 19:54:42

by Ingo Molnar

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


* Jeff Chua <[email protected]> wrote:

>
> On Wed, Aug 6, 2008 at 5:42 PM, Jeff Chua <[email protected]>
> wrote:
>>>> CONFIG_X86_GENERICARCH=y
>>>> CONFIG_X86_BIGSMP=y
>>>> 8 more cpu need bigsmp mode.
>
>> Just found it under " Subarchitecture Type (Generic architecture)",
>> and then it shows the CONFIG_X86_BIGSMP option.
>
> It works. Booted with 16CPUs. 32GB RAM.
>
> CPU0 L7345 1.86GHz 0C
> CPU1 L7345 1.86GHz 0C
> CPU2 L7345 1.86GHz 0C
> CPU3 L7345 1.86GHz 0C
> CPU4 L7345 1.86GHz 0C
> CPU5 L7345 1.86GHz 0C
> CPU6 L7345 1.86GHz 0C
> CPU7 L7345 1.86GHz 0C
> CPU8 L7345 1.86GHz 0C
> CPU9 L7345 1.86GHz 0C
> CPU10 L7345 1.86GHz 0C
> CPU11 L7345 1.86GHz 0C
> CPU12 L7345 1.86GHz 0C
> CPU13 L7345 1.86GHz 0C
> CPU14 L7345 1.86GHz 0C
> CPU15 L7345 1.86GHz 0C
>
>
> So, but setting the config not obvious. And should CONFIG_X86_PC be
> considered as well as CONFIG_X86_GENERICARCH?
>
> With CONFIG_X86_PC, I can set CONFIG_SPARSEMEM=y.
>
> With CONFIG_X86_GENERICARCH, CONFIG_SPARSEMEM depends on CONFIG_NUMA.
>
> I'm using the patch below to enable sparsemem instead of flatmem, but
> don't know what impact it has. System booted and running.
>
> It would be nice to automatically default CONFIG_X86_BIGSMP with CPUs
> > 8. But I don't know to do that.

thanks, applied.

i'm wondering: with that patch applied, does a working 2.6.26 .config
boot fine on your box now if put through 'make oldconfig'? Any make
oldconfig breakage is a regression we want to fix. We want upgrades
between kernel versions to be seamless and complete.

Ingo

2008-08-11 20:00:15

by Ingo Molnar

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


* Jeff Chua <[email protected]> wrote:

> On Wed, Aug 6, 2008 at 2:42 PM, Jeff Chua <[email protected]> wrote:
> > On Wed, Aug 6, 2008 at 2:01 PM, Linus Torvalds
> > <[email protected]> wrote:
> >> One trivial thing to try would be to just bisect it. I assume 2.6.26 is
> > Bisecting now.
>
> Thanks to all the great helpful suggestions from everyone, and this
> turns out that I just need to enable the following switches, so I
> didn't bisect further, and since it's first machine that I've tried
> with more than 8 CPUs so I wasn't sure whether 2.6.16 has the same
> problem, but if you wish, I could give 2.6.16 a try.
>
> > CONFIG_X86_GENERICARCH=y
> > CONFIG_X86_BIGSMP=y
>
> Thank you all for the great linux kernel!

i still consider a silent boot hang a bug we need to fix.

bigsmp might be required to have all cpus available on your box, but the
kernel is still supposed to transparently fall back to less CPUs (and
print a warning) if it cannot do that.

Ingo

2008-08-11 20:04:16

by Yinghai Lu

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Mon, Aug 11, 2008 at 12:59 PM, Ingo Molnar <[email protected]> wrote:
>
> * Jeff Chua <[email protected]> wrote:
>
>> On Wed, Aug 6, 2008 at 2:42 PM, Jeff Chua <[email protected]> wrote:
>> > On Wed, Aug 6, 2008 at 2:01 PM, Linus Torvalds
>> > <[email protected]> wrote:
>> >> One trivial thing to try would be to just bisect it. I assume 2.6.26 is
>> > Bisecting now.
>>
>> Thanks to all the great helpful suggestions from everyone, and this
>> turns out that I just need to enable the following switches, so I
>> didn't bisect further, and since it's first machine that I've tried
>> with more than 8 CPUs so I wasn't sure whether 2.6.16 has the same
>> problem, but if you wish, I could give 2.6.16 a try.
>>
>> > CONFIG_X86_GENERICARCH=y
>> > CONFIG_X86_BIGSMP=y
>>
>> Thank you all for the great linux kernel!
>
> i still consider a silent boot hang a bug we need to fix.
>
> bigsmp might be required to have all cpus available on your box, but the
> kernel is still supposed to transparently fall back to less CPUs (and
> print a warning) if it cannot do that.
>
In setup.c::setup_arch(), after going over the MADT or mptable, we have:

#if defined(CONFIG_SMP) && defined(CONFIG_X86_PC) && defined(CONFIG_X86_32)
	if (def_to_bigsmp)
		printk(KERN_WARNING "More than 8 CPUs detected and "
			"CONFIG_X86_PC cannot handle it.\nUse "
			"CONFIG_X86_GENERICARCH or CONFIG_X86_BIGSMP.\n");
#endif

===> here we need to change "or" to "and"

Or just panic here? The message scrolls off the screen, and the user
will not notice it...

YH

2008-08-11 20:08:37

by Ingo Molnar

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


* Yinghai Lu <[email protected]> wrote:

> On Mon, Aug 11, 2008 at 12:59 PM, Ingo Molnar <[email protected]> wrote:
> >
> > * Jeff Chua <[email protected]> wrote:
> >
> >> On Wed, Aug 6, 2008 at 2:42 PM, Jeff Chua <[email protected]> wrote:
> >> > On Wed, Aug 6, 2008 at 2:01 PM, Linus Torvalds
> >> > <[email protected]> wrote:
> >> >> One trivial thing to try would be to just bisect it. I assume 2.6.26 is
> >> > Bisecting now.
> >>
> >> Thanks to all the great helpful suggestions from everyone, and this
> >> turns out that I just need to enable the following switches, so I
> >> didn't bisect further, and since it's first machine that I've tried
> >> with more than 8 CPUs so I wasn't sure whether 2.6.16 has the same
> >> problem, but if you wish, I could give 2.6.16 a try.
> >>
> >> > CONFIG_X86_GENERICARCH=y
> >> > CONFIG_X86_BIGSMP=y
> >>
> >> Thank you all for the great linux kernel!
> >
> > i still consider a silent boot hang a bug we need to fix.
> >
> > bigsmp might be required to have all cpus available on your box, but the
> > kernel is still supposed to transparently fall back to less CPUs (and
> > print a warning) if it cannot do that.
> >
> in setup.c::setup_arch() after go over with madt or mptable
>
> #if defined(CONFIG_SMP) && defined(CONFIG_X86_PC) && defined(CONFIG_X86_32)
> if (def_to_bigsmp)
> printk(KERN_WARNING "More than 8 CPUs detected and "
> "CONFIG_X86_PC cannot handle it.\nUse "
> "CONFIG_X86_GENERICARCH or
> CONFIG_X86_BIGSMP.\n"); ===> here need to change "or" to "and"
> #endif
>
> or just panic here? because screen scroll to pass it, and user will
> not notice that...

a panic is better but still quite rude and doesnt give a user a system
under which he can build an even greater kernel [after having discovered
the warning in the syslog] ;-)

best would be to use as many CPUs as we can support, and skip the rest
and boot up fine. (and print the warning prominently - the user does not
make maximum use of available physical resources)

Ingo

2008-08-11 20:12:28

by Yinghai Lu

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Mon, Aug 11, 2008 at 1:08 PM, Ingo Molnar <[email protected]> wrote:
>
> * Yinghai Lu <[email protected]> wrote:
>
>> On Mon, Aug 11, 2008 at 12:59 PM, Ingo Molnar <[email protected]> wrote:
>> >
>> > * Jeff Chua <[email protected]> wrote:
>> >
>> >> On Wed, Aug 6, 2008 at 2:42 PM, Jeff Chua <[email protected]> wrote:
>> >> > On Wed, Aug 6, 2008 at 2:01 PM, Linus Torvalds
>> >> > <[email protected]> wrote:
>> >> >> One trivial thing to try would be to just bisect it. I assume 2.6.26 is
>> >> > Bisecting now.
>> >>
>> >> Thanks to all the great helpful suggestions from everyone, and this
>> >> turns out that I just need to enable the following switches, so I
>> >> didn't bisect further, and since it's first machine that I've tried
>> >> with more than 8 CPUs so I wasn't sure whether 2.6.16 has the same
>> >> problem, but if you wish, I could give 2.6.16 a try.
>> >>
>> >> > CONFIG_X86_GENERICARCH=y
>> >> > CONFIG_X86_BIGSMP=y
>> >>
>> >> Thank you all for the great linux kernel!
>> >
>> > i still consider a silent boot hang a bug we need to fix.
>> >
>> > bigsmp might be required to have all cpus available on your box, but the
>> > kernel is still supposed to transparently fall back to less CPUs (and
>> > print a warning) if it cannot do that.
>> >
>> in setup.c::setup_arch() after go over with madt or mptable
>>
>> #if defined(CONFIG_SMP) && defined(CONFIG_X86_PC) && defined(CONFIG_X86_32)
>> if (def_to_bigsmp)
>> printk(KERN_WARNING "More than 8 CPUs detected and "
>> "CONFIG_X86_PC cannot handle it.\nUse "
>> "CONFIG_X86_GENERICARCH or
>> CONFIG_X86_BIGSMP.\n"); ===> here need to change "or" to "and"
>> #endif
>>
>> or just panic here? because screen scroll to pass it, and user will
>> not notice that...
>
> a panic is better but still quite rude and doesnt give a user a system
> under which he can build an even greater kernel [after having discovered
> the warning in the syslog] ;-)
>
> best would be to use as many CPUs as we can support, and skip the rest
> and boot up fine. (and print the warning prominently - the user does not
> make maximum use of available physical resources)

Then the code that starts the AP CPUs could, in some cases, check for
apic id >= 8 etc. before trying to start them.

YH

2008-08-11 20:36:20

by Yinghai Lu

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Mon, Aug 11, 2008 at 1:12 PM, Yinghai Lu <[email protected]> wrote:
> On Mon, Aug 11, 2008 at 1:08 PM, Ingo Molnar <[email protected]> wrote:
>>
>> * Yinghai Lu <[email protected]> wrote:
>>
>>> On Mon, Aug 11, 2008 at 12:59 PM, Ingo Molnar <[email protected]> wrote:
>>> >
>>> > * Jeff Chua <[email protected]> wrote:
>>> >
>>> >> On Wed, Aug 6, 2008 at 2:42 PM, Jeff Chua <[email protected]> wrote:
>>> >> > On Wed, Aug 6, 2008 at 2:01 PM, Linus Torvalds
>>> >> > <[email protected]> wrote:
>>> >> >> One trivial thing to try would be to just bisect it. I assume 2.6.26 is
>>> >> > Bisecting now.
>>> >>
>>> >> Thanks to all the great helpful suggestions from everyone, and this
>>> >> turns out that I just need to enable the following switches, so I
>>> >> didn't bisect further, and since it's first machine that I've tried
>>> >> with more than 8 CPUs so I wasn't sure whether 2.6.16 has the same
>>> >> problem, but if you wish, I could give 2.6.16 a try.
>>> >>
>>> >> > CONFIG_X86_GENERICARCH=y
>>> >> > CONFIG_X86_BIGSMP=y
>>> >>
>>> >> Thank you all for the great linux kernel!
>>> >
>>> > i still consider a silent boot hang a bug we need to fix.
>>> >
>>> > bigsmp might be required to have all cpus available on your box, but the
>>> > kernel is still supposed to transparently fall back to less CPUs (and
>>> > print a warning) if it cannot do that.
>>> >
>>> in setup.c::setup_arch() after go over with madt or mptable
>>>
>>> #if defined(CONFIG_SMP) && defined(CONFIG_X86_PC) && defined(CONFIG_X86_32)
>>> if (def_to_bigsmp)
>>> printk(KERN_WARNING "More than 8 CPUs detected and "
>>> "CONFIG_X86_PC cannot handle it.\nUse "
>>> "CONFIG_X86_GENERICARCH or
>>> CONFIG_X86_BIGSMP.\n"); ===> here need to change "or" to "and"
>>> #endif
>>>
>>> or just panic here? because screen scroll to pass it, and user will
>>> not notice that...
>>
>> a panic is better but still quite rude and doesnt give a user a system
>> under which he can build an even greater kernel [after having discovered
>> the warning in the syslog] ;-)
>>
>> best would be to use as many CPUs as we can support, and skip the rest
>> and boot up fine. (and print the warning prominently - the user does not
>> make maximum use of available physical resources)
>
> then smp start AP cpu could check the apic id >=8 etc before try to
> start it.in some cases

Please check the attached patches.

YH


Attachments:
(No filename) (2.42 kB)
def_big_smp.patch (1.37 kB)

2008-08-11 20:44:44

by Ingo Molnar

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


* Yinghai Lu <[email protected]> wrote:

> [PATCH] x86: move def_to_bigsmp check later
>
> and skip the apicid id > 8

applied to tip/x86/urgent - thanks Yinghai. While we are touching this
code i cleaned up the printk a bit: the line breaking was way too ugly,
and the message not very informative about the effects of this problem.
See the full commit below.

Ingo

--------------->
From b74548e76a0eab1f29546e7c5a589429c069a680 Mon Sep 17 00:00:00 2001
From: Yinghai Lu <[email protected]>
Date: Mon, 11 Aug 2008 13:36:04 -0700
Subject: [PATCH] x86: fix 2.6.27rc1 cannot boot more than 8CPUs

Jeff Chua reported that booting a !bigsmp kernel on a 16-way box
hangs silently.

this is a long-standing issue, smp start AP cpu could check the
apic id >=8 etc before trying to start it.

achieve this by moving the def_to_bigsmp check later and skip the
apicid id > 8

[ [email protected]: clean up the message that is printed. ]

Reported-by: "Jeff Chua" <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

---
arch/x86/kernel/setup.c | 6 ------
arch/x86/kernel/smpboot.c | 10 ++++++++++
2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6e5823b..68b48e3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -861,12 +861,6 @@ void __init setup_arch(char **cmdline_p)
 	init_apic_mappings();
 	ioapic_init_mappings();

-#if defined(CONFIG_SMP) && defined(CONFIG_X86_PC) && defined(CONFIG_X86_32)
-	if (def_to_bigsmp)
-		printk(KERN_WARNING "More than 8 CPUs detected and "
-			"CONFIG_X86_PC cannot handle it.\nUse "
-			"CONFIG_X86_GENERICARCH or CONFIG_X86_BIGSMP.\n");
-#endif
 	kvm_guest_init();

 	e820_reserve_resources();
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index da10f07..91055d7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -994,7 +994,17 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 	flush_tlb_all();
 	low_mappings = 1;

+#ifdef CONFIG_X86_PC
+	if (def_to_bigsmp && apicid > 8) {
+		printk(KERN_WARNING
+			"More than 8 CPUs detected - skipping them.\n"
+			"Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.\n");
+		err = -1;
+	} else
+		err = do_boot_cpu(apicid, cpu);
+#else
 	err = do_boot_cpu(apicid, cpu);
+#endif

 	zap_low_mappings();
 	low_mappings = 0;

2008-08-13 14:18:01

by Ingo Molnar

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


* Jeff Chua <[email protected]> wrote:

> On Wed, Aug 6, 2008 at 5:42 PM, Jeff Chua <[email protected]>
> wrote:
>>>> CONFIG_X86_GENERICARCH=y
>>>> CONFIG_X86_BIGSMP=y
>>>> 8 more cpu need bigsmp mode.
>
>> Just found it under " Subarchitecture Type (Generic architecture)",
>> and then it shows the CONFIG_X86_BIGSMP option.
>
> It works. Booted with 16CPUs. 32GB RAM.

btw., could you please check that v2.6.27-rc3 (or later) kernels boot
fine (with about 8 cpus) even if you have genericarch/bigsmp disabled,
and do not silently hang as it happened on your box before?

Ingo

2008-08-13 17:11:00

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 13, 2008 at 10:16 PM, Ingo Molnar <[email protected]> wrote:

> btw., could you please check that v2.6.27-rc3 (or later) kernels boot
> fine (with about 8 cpus) even if you have genericarch/bigsmp disabled,
> and do not silently hang as it happened on your box before?

With 16 CPUs, it still hangs, but now the console is showing the
errors as intended.
... but is it supposed to hang?

More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
Booting processor 8/1 ip 6000
Initializing CPU#8
Calibrating delay using timer specific routine.. 3723.88 BogoMIPS (lpj=7447763)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU8: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#8]: passed.
*** HANGS HERE ***

Thanks,
Jeff.

2008-08-13 17:34:08

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Thu, Aug 14, 2008 at 1:10 AM, Jeff Chua <[email protected]> wrote:
> On Wed, Aug 13, 2008 at 10:16 PM, Ingo Molnar <[email protected]> wrote:
>
>> btw., could you please check that v2.6.27-rc3 (or later) kernels boot
>> fine (with about 8 cpus) even if you have genericarch/bigsmp disabled,
>> and do not silently hang as it happened on your box before?
>
> With 16 CPUs, it still hangs, but now the console is showing the
> errors as intended.
> ... but it is supposed to hang?

I tried with just CONFIG_NR_CPUS=8 and this time it booted, but the strange
thing is I only see 2 CPUs! To be more precise, it's without both
CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.

And when I tried to enable the CPUs, it complained about:

# cat cpu6/online
0
# echo 1 > cpu6/online
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
-bash: echo: write error: Input/output error

Prior to the patch, the system booted with all 8 CPUs.

Again, if I enable both CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP,
I get all 16 CPUs.

Thanks,
Jeff.

2008-08-13 17:40:16

by Ingo Molnar

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


* Jeff Chua <[email protected]> wrote:

> On Thu, Aug 14, 2008 at 1:10 AM, Jeff Chua <[email protected]> wrote:
> > On Wed, Aug 13, 2008 at 10:16 PM, Ingo Molnar <[email protected]> wrote:
> >
> >> btw., could you please check that v2.6.27-rc3 (or later) kernels boot
> >> fine (with about 8 cpus) even if you have genericarch/bigsmp disabled,
> >> and do not silently hang as it happened on your box before?
> >
> > With 16 CPUs, it still hangs, but now the console is showing the
> > errors as intended.
> > ... but it is supposed to hang?
>
> I tried with just CONFIG_NR_CPUS=8 and this time it booted, but the strange
> thing is I only see 2 CPUs! To be more precise, it's without both
> CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
>
> And when I tried to enable the CPUs, it complained about:
>
> # cat cpu6/online
> 0
> # echo 1 > cpu6/online
> More than 8 CPUs detected - skipping them.
> Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
> -bash: echo: write error: Input/output error
>
> Prior to the patch, the system booted with all 8 CPUs.
>
> Again, if I enable both CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP,
> I get all 16 CPUs.

Yinghai, could the APIC ID enumeration be nonsequential and we skip CPUs
starting at the third one already? I think we should accept all CPUs
that are within our support range.

Ingo
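Ingo's question can be illustrated with a small sketch. Assuming, hypothetically, that the firmware hands out sparse APIC IDs (the table below is made up for illustration and was not read from Jeff's machine), a check of the form `apicid > 8` rejects CPUs by their ID rather than by how many CPUs are already up. The second function shows one possible reading of "accept all CPUs within our support range", namely accepting by CPU number instead; all names here are illustrative, not kernel code.

```c
#include <stddef.h>

#define MAX_FLAT_CPUS 8

/* Hypothetical, non-sequential APIC IDs for an 8-CPU configuration. */
const unsigned int example_apicids[8] = { 0, 16, 4, 20, 10, 26, 12, 28 };

/* Count CPUs that survive a check on the raw APIC ID, as in `apicid > 8`. */
size_t count_by_apicid(const unsigned int *apicids, size_t n)
{
    size_t up = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (apicids[i] > MAX_FLAT_CPUS)
            continue; /* skipped even though fewer than 8 CPUs are up */
        up++;
    }
    return up;
}

/* Count CPUs when the first MAX_FLAT_CPUS are accepted regardless of ID. */
size_t count_by_position(size_t n)
{
    return n < MAX_FLAT_CPUS ? n : MAX_FLAT_CPUS;
}
```

With the made-up table, count_by_apicid(example_apicids, 8) brings up only 2 CPUs, while count_by_position(8) brings up all 8 and count_by_position(16) caps at 8.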

2008-08-13 17:46:53

by Yinghai Lu

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 13, 2008 at 10:39 AM, Ingo Molnar <[email protected]> wrote:
>
> * Jeff Chua <[email protected]> wrote:
>
>> On Thu, Aug 14, 2008 at 1:10 AM, Jeff Chua <[email protected]> wrote:
>> > On Wed, Aug 13, 2008 at 10:16 PM, Ingo Molnar <[email protected]> wrote:
>> >
>> >> btw., could you please check that v2.6.27-rc3 (or later) kernels boot
>> >> fine (with about 8 cpus) even if you have genericarch/bigsmp disabled,
>> >> and do not silently hang as it happened on your box before?
>> >
>> > With 16 CPUs, it still hangs, but now the console is showing the
>> > errors as intended.
>> > ... but it is supposed to hang?
>>
>> I tried with just CONFIG_NR_CPUS=8 and this time it booted, but the strange
>> thing is I only see 2 CPUs! To be more precise, it's without both
>> CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
>>
>> And when I tried to enable the CPUs, it complained about:
>>
>> # cat cpu6/online
>> 0
>> # echo 1 > cpu6/online
>> More than 8 CPUs detected - skipping them.
>> Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
>> -bash: echo: write error: Input/output error
>>
>> Prior to the patch, the system booted with all 8 CPUs.

that is a new regression...

>>
>> Again, if I enable both CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP,
>> I get all 16 CPUs.
>
> Yinghai, could the APIC ID enumeration be nonsequential and we skip CPUs
> starting at the third one already? I think we should accept all CPUs
> that are within our support range.

will try to clear those bits on smp_sanity_check...

YH

2008-08-13 18:33:24

by Yinghai Lu

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Wed, Aug 13, 2008 at 10:46 AM, Yinghai Lu <[email protected]> wrote:
> On Wed, Aug 13, 2008 at 10:39 AM, Ingo Molnar <[email protected]> wrote:
>>
>> * Jeff Chua <[email protected]> wrote:
>>
>>> On Thu, Aug 14, 2008 at 1:10 AM, Jeff Chua <[email protected]> wrote:
>>> > On Wed, Aug 13, 2008 at 10:16 PM, Ingo Molnar <[email protected]> wrote:
>>> >
>>> >> btw., could you please check that v2.6.27-rc3 (or later) kernels boot
>>> >> fine (with about 8 cpus) even if you have genericarch/bigsmp disabled,
>>> >> and do not silently hang as it happened on your box before?
>>> >
>>> > With 16 CPUs, it still hangs, but now the console is showing the
>>> > errors as intended.
>>> > ... but it is supposed to hang?
>>>
>>> I tried with just CONFIG_NR_CPUS=8 and this time it booted, but the strange
>>> thing is I only see 2 CPUs! To be more precise, it's without both
>>> CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
>>>
>>> And when I tried to enable the CPUs, it complained about:
>>>
>>> # cat cpu6/online
>>> 0
>>> # echo 1 > cpu6/online
>>> More than 8 CPUs detected - skipping them.
>>> Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
>>> -bash: echo: write error: Input/output error
>>>
>>> Prior to the patch, the system booted with all 8 CPUs.
>
> that is a new regression...

jeff,

please check the attached patch. it should fix the new regression and
will not hang.

YH


Attachments:
(No filename) (1.36 kB)
big_smp_check.patch (1.80 kB)

2008-08-14 07:17:15

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Thu, Aug 14, 2008 at 2:33 AM, Yinghai Lu <[email protected]> wrote:

>>>> > With 16 CPUs, it still hangs, but now the console is showing the
>>>> > errors as intended.
>>>> > ... but it is supposed to hang?

> please check the attached patch. it should fix the new regression and
> will not hang.

OK, it booted and did not hang, but the messages below don't show up
anywhere. I tested with CONFIG_NR_CPUS=16 and 8 as well, and got just 8
CPUs.

More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.

# cat /sys/devices/system/cpu/possible
0-7

CONFIG_X86_32=y
CONFIG_X86_PC=y


Looks like it's not going into this condition
+ if (def_to_bigsmp && nr_cpu_ids > 8) {


Shall this be put back so that it'll show the message?
- if (def_to_bigsmp && apicid > 8) {
- printk(KERN_WARNING
- "More than 8 CPUs detected - skipping them.\n"
- "Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.\n");
- }


Thanks,
Jeff.
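
The `possible` file Jeff quotes uses the kernel's cpulist notation (comma-separated IDs and ranges, e.g. "0-7"). Below is a minimal sketch of counting the CPUs such a string describes, which makes it easy to compare runs with different configs. `cpulist_count` and `count_cpus_in_file` are hypothetical helper names, not an existing API.

```c
#include <stdio.h>

/* Count the CPUs named by a cpulist string such as "0-7" or "0,2-5".
 * A trailing newline is tolerated. Returns -1 on malformed input. */
int cpulist_count(const char *s)
{
    int total = 0;

    while (*s && *s != '\n') {
        int lo, hi, used;

        if (sscanf(s, "%d%n", &lo, &used) != 1)
            return -1;
        s += used;
        hi = lo;                      /* single ID unless a range follows */
        if (*s == '-') {
            s++;
            if (sscanf(s, "%d%n", &hi, &used) != 1)
                return -1;
            s += used;
        }
        if (lo < 0 || hi < lo)
            return -1;
        total += hi - lo + 1;
        if (*s == ',')
            s++;                      /* move on to the next entry */
        else if (*s && *s != '\n')
            return -1;                /* unexpected character */
    }
    return total;
}

/* Read the first line of a sysfs cpulist file and count its CPUs.
 * Returns -1 if the file cannot be read or parsed. */
int count_cpus_in_file(const char *path)
{
    char buf[256];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return cpulist_count(buf);
}
```

Calling count_cpus_in_file("/sys/devices/system/cpu/possible") on the system above would be expected to return 8 for the "0-7" content shown.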

2008-08-14 09:00:00

by Yinghai Lu

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Thu, Aug 14, 2008 at 12:16 AM, Jeff Chua <[email protected]> wrote:
> On Thu, Aug 14, 2008 at 2:33 AM, Yinghai Lu <[email protected]> wrote:
>
>>>>> > With 16 CPUs, it still hangs, but now the console is showing the
>>>>> > errors as intended.
>>>>> > ... but it is supposed to hang?
>
>> please check the attached patch. it should fix the new regression and
>> will not hang.
>
> OK, it booted and did not hang, but the messages below don't show up
> anywhere. I tested with CONFIG_NR_CPUS=16 and 8 as well, and got just 8
> CPUs.
>
> More than 8 CPUs detected - skipping them.
> Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
>
> # cat /sys/devices/system/cpu/possible
> 0-7
>
> CONFIG_X86_32=y
> CONFIG_X86_PC=y
>
>

Double-checked on a 16-core system and got:

CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 0(4) -> Core 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
using C1E aware idle routine
Checking 'hlt' instruction... OK.
ACPI: Core revision 20080609
Parsing all Control Methods:
Table [DSDT](id 0001) - 1289 Objects with 114 Devices 462 Methods 26 Regions
Parsing all Control Methods:
Table [SSDT](id 0002) - 80 Objects with 0 Devices 0 Methods 0 Regions
tbxface-0596 [00] tb_load_namespace : ACPI Tables successfully acquired
evxfevnt-0091 [00] enable : Transition to ACPI mode successful
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
enabled ExtINT on CPU#0

YH

2008-08-14 09:07:59

by Ingo Molnar

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


* Jeff Chua <[email protected]> wrote:

> On Thu, Aug 14, 2008 at 2:33 AM, Yinghai Lu <[email protected]> wrote:
>
> >>>> > With 16 CPUs, it still hangs, but now the console is showing the
> >>>> > errors as intended.
> >>>> > ... but it is supposed to hang?
>
> > please check the attached patch. it should fix the new regression and
> > will not hang.
>
> OK, it booted and did not hang, but the messages below don't show up
> anywhere. I tested with CONFIG_NR_CPUS=16 and 8 as well, and got just 8
> CPUs.
>
> More than 8 CPUs detected - skipping them.
> Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
>
> # cat /sys/devices/system/cpu/possible
> 0-7
>
> CONFIG_X86_32=y
> CONFIG_X86_PC=y
>
>
> Looks like it's not going into this condition
> + if (def_to_bigsmp && nr_cpu_ids > 8) {
>
>
> Shall this be put back so that it'll show the message?
> - if (def_to_bigsmp && apicid > 8) {
> - printk(KERN_WARNING
> - "More than 8 CPUs detected - skipping them.\n"
> - "Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.\n");
> - }

could you post the full dmesg? And the modified patch that you've tested
to both have 8 CPUs without bigsmp and which also shows the printk?

Ingo

2008-08-15 10:35:31

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs



On Thu, Aug 14, 2008 at 5:07 PM, Ingo Molnar <[email protected]> wrote:
>
> * Jeff Chua <[email protected]> wrote:
>
>> On Thu, Aug 14, 2008 at 2:33 AM, Yinghai Lu <[email protected]> wrote:
>>
>> >>>> > With 16 CPUs, it still hangs, but now the console is showing the
>> >>>> > errors as intended.
>> >>>> > ... but it is supposed to hang?
>>
>> > please check the attached patch. it should fix the new regression and
>> > will not hang.
>>
>> OK, it booted and did not hang, but the messages below don't show up
>> anywhere. I tested with CONFIG_NR_CPUS=16 and 8 as well, and got just 8
>> CPUs.
>>
>> More than 8 CPUs detected - skipping them.
>> Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
>>
>> # cat /sys/devices/system/cpu/possible
>> 0-7
>>
>> CONFIG_X86_32=y
>> CONFIG_X86_PC=y
>>
>>
>> Looks like it's not going into this condition
>> + if (def_to_bigsmp && nr_cpu_ids > 8) {
>>
>>
>> Shall this be put back so that it'll show the message?
>> - if (def_to_bigsmp && apicid > 8) {
>> - printk(KERN_WARNING
>> - "More than 8 CPUs detected - skipping them.\n"
>> - "Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.\n");
>> - }
>
> could you post the full dmesg? And the modified patch that you've tested
> to both have 8 CPUs without bigsmp and which also shows the printk?
>
> Ingo
>

Attached. cpu.8
CONFIG_X86_32=y
CONFIG_X86_PC=y
CONFIG_X86=y
CONFIG_NR_CPUS=8

Attached. cpu.16
CONFIG_X86_32=y
CONFIG_X86_PC=y
CONFIG_X86=y
CONFIG_NR_CPUS=16

Attached cpu.16.big
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_X86_GENERICARCH=y
CONFIG_X86_BIGSMP=y
# CONFIG_X86_PC is not set
CONFIG_NR_CPUS=16

Attached cpu.16.nobig (same as cpu.16 except with modified patch)
CONFIG_X86_32=y
CONFIG_X86_PC=y
CONFIG_X86=y
CONFIG_NR_CPUS=16

Attached is the modified patch to make the warning appear. It's Yinghai's
patch modified to just display the error and continue to boot.


Thanks,
Jeff

--- linux.16/arch/x86/kernel/smpboot.c.org 2008-08-15 18:15:37 +0800
+++ linux.16/arch/x86/kernel/smpboot.c 2008-08-15 18:13:42 +0800
@@ -999,9 +999,12 @@
 		printk(KERN_WARNING
 			"More than 8 CPUs detected - skipping them.\n"
 			"Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.\n");
-	}
-#endif
+		err = -1;
+	} else
+		err = do_boot_cpu(apicid, cpu);
+#else
 	err = do_boot_cpu(apicid, cpu);
+#endif

 	zap_low_mappings();
 	low_mappings = 0;
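
The control flow this hunk introduces can be sketched in isolation. The condition guarding the printk lies outside the hunk, so it is modeled here as a plain `over_limit` flag, and only the branch before `#else` is shown. `cpu_up_sketch`, `do_boot_cpu_stub` and `boot_attempts` are hypothetical stand-ins, not kernel symbols.

```c
#include <stdio.h>

/* Counts how often the stub below was asked to boot a CPU. */
int boot_attempts = 0;

/* Hypothetical stand-in for the kernel's do_boot_cpu(); always succeeds. */
int do_boot_cpu_stub(int apicid, int cpu)
{
    (void)apicid;
    (void)cpu;
    boot_attempts++;
    return 0;
}

/* Mirrors the patched flow: when the (unknown) limit check fires,
 * warn and return an error without attempting to boot the CPU;
 * otherwise boot it as before. */
int cpu_up_sketch(int over_limit, int apicid, int cpu)
{
    int err;

    if (over_limit) {
        fprintf(stderr,
                "More than 8 CPUs detected - skipping them.\n"
                "Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.\n");
        err = -1;
    } else
        err = do_boot_cpu_stub(apicid, cpu);

    return err;
}
```

In this model a CPU over the limit is reported and skipped while the caller carries on, which is consistent with Jeff's description of displaying the error and continuing to boot.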



Attachments:
cpu.8 (30.77 kB)
cpu.16 (30.77 kB)
cpu.16.big (30.78 kB)
cpu.16.nobig (30.78 kB)

2008-08-15 14:07:27

by Ingo Molnar

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


* Jeff Chua <[email protected]> wrote:

>>> Shall this be put back so that it'll show the message?
>>> - if (def_to_bigsmp && apicid > 8) {
>>> - printk(KERN_WARNING
>>> - "More than 8 CPUs detected - skipping them.\n"
>>> - "Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.\n");
>>> - }
>>
>> could you post the full dmesg? And the modified patch that you've tested
>> to both have 8 CPUs without bigsmp and which also shows the printk?
>>
>> Ingo
>>
>
> Attached. cpu.8
> CONFIG_X86_32=y
> CONFIG_X86_PC=y
> CONFIG_X86=y
> CONFIG_NR_CPUS=8
>
> Attached. cpu.16
> CONFIG_X86_32=y
> CONFIG_X86_PC=y
> CONFIG_X86=y
> CONFIG_NR_CPUS=16
>
> Attached cpu.16.big
> CONFIG_X86_32=y
> CONFIG_X86=y
> CONFIG_X86_GENERICARCH=y
> CONFIG_X86_BIGSMP=y
> # CONFIG_X86_PC is not set
> CONFIG_NR_CPUS=16
>
> Attached cpu.16.nobig (same as cpu.16 except with modified patch)
> CONFIG_X86_32=y
> CONFIG_X86_PC=y
> CONFIG_X86=y
> CONFIG_NR_CPUS=16
>
> Attached is the modified patch to make the warning appear. It's
> Yinghai's patch modified to just display the error and continue to
> boot.

thanks. To make sure it's all sorted out you might want to boot today's
tip/master and check whether it just does the right thing by default.
(it really should)

Ingo

2008-08-18 03:07:25

by Jeff Chua

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs

On Fri, Aug 15, 2008 at 10:07 PM, Ingo Molnar <[email protected]> wrote:

> thanks. To make sure it's all sorted out you might want to boot today's
> tip/master and check whether it just does the right thing by default.
> (it really should)

Yes, verified and it's working now. The warnings show up.

Short dmesg here. Detailed dmesg attached.

Checking 'hlt' instruction... OK.
ACPI: Core revision 20080609
More than 8 CPUs detected - skipping them.
Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU L7345 @ 1.86GHz stepping 0b
CPU 1 irqstacks, hard=c0527000 soft=c0517000
Booting processor 1/8 ip 6000
Initializing CPU#1
Calibrating delay using timer specific routine.. 3723.84 BogoMIPS (lpj=7447688)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 2
CPU: Processor Core ID: 0

Thank you Yinghai and Ingo for your great help! Sorry it took a while to reply.

Jeff.


Attachments:
(No filename) (0.99 kB)
dmesg.txt (38.39 kB)

2008-08-18 08:01:19

by Ingo Molnar

Subject: Re: 2.6.27rc1 cannot boot more than 8CPUs


* Jeff Chua <[email protected]> wrote:

> On Fri, Aug 15, 2008 at 10:07 PM, Ingo Molnar <[email protected]> wrote:
>
> > thanks. To make sure it's all sorted out you might want to boot
> > today's tip/master and check whether it just does the right thing by
> > default. (it really should)
>
> Yes, verified and it's working now. The warnings show up.

and the system is up with ~8 cores active [not just 1 or 2], right?

ah, indeed:

> checking TSC synchronization [CPU#0 -> CPU#7]: passed.
> Brought up 8 CPUs
> Total of 8 processors activated (29790.66 BogoMIPS).

great - thanks for testing!

Ingo