2005-11-17 06:33:46

by Maneesh Soni

[permalink] [raw]
Subject: maxcpus=1 broken, ACPI bug?

Hi,

Using maxcpus=1 boot option, hangs the system while booting. It was
working till 2.6.13-rc2. After git bisect I found that after backing
out this ACPI patch it works again, though I had to manually sort the
reject while backing out.

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=acf05f4b7f558051ea0028e8e617144123650272

This option is useful in case of kexec/kdump if only one cpu is
desired to be up during second kernel boot. I tried this on a 4-way
Xeon 2.5 MHz SMP system.

Following is the boot log with 2.6.15-rc1 with maxcpus=1 option. .config
is also attached.

Thanks
Maneesh

--------------------------------------------------------------------
Booting 'test kernel'

root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /t ro root=/dev/sda2 rhgb console=tty0 console=ttyS0,38400 maxcpus=1
[Linux-bzImage, setup=0x1400, size=0x218992]
module /initrd-2.6.9-5.ELsmp.img
[Linux-initrd @ 0x37f6f000, 0x80f5d bytes]

Linux version 2.6.15-rc1 ([email protected]) (gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)) #1 SMP PREEMPT Thu Nov 17 11:51:29 IST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000e97f5f40 (usable)
BIOS-e820: 00000000e97f5f40 - 00000000e97ff800 (ACPI data)
BIOS-e820: 00000000e97ff800 - 00000000e9800000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
4224MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 0009dd40
DMI 2.3 present.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
Processor #2 15:2 APIC version 20
WARNING: maxcpus limit of 1 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
Processor #4 15:2 APIC version 20
WARNING: maxcpus limit of 1 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
Processor #6 15:2 APIC version 20
WARNING: maxcpus limit of 1 reached. Processor ignored.
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 14, version 17, address 0xfec00000, GSI 0-15
ACPI: IOAPIC (id[0x0d] address[0xfec01000] gsi_base[16])
IOAPIC[1]: apic_id 13, version 17, address 0xfec01000, GSI 16-31
ACPI: IOAPIC (id[0x0c] address[0xfec02000] gsi_base[32])
IOAPIC[2]: apic_id 12, version 17, address 0xfec02000, GSI 32-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Enabling APIC mode: Flat. Using 3 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at ea000000 (gap: e9800000:15400000)
Built 1 zonelists
Kernel command line: ro root=/dev/sda2 rhgb console=tty0 console=ttyS0,38400 maxcpus=1
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2488.816 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 4819576k/5242880k available (3171k kernel code, 52464k reserved, 1007k data, 244k init, 3956692k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 4985.70 BogoMIPS (lpj=9971404)
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: L3 cache: 1024K
CPU: Hyper-Threading is disabled
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
tbxface-0109 [02] load_tables : ACPI Tables successfully acquired
Parsing all Control Methods:..........................................................................................................................................................................................................................................................................................................................................................................................................................................
Table [DSDT](id 0005) - 986 Objects with 113 Devices 426 Methods 13 Regions
ACPI Namespace successfully loaded at root c0576818
evxfevnt-0091 [03] enable : Transition to ACPI mode successful
CPU0: Intel(R) Xeon(TM) MP CPU 2.50GHz stepping 05
Total of 1 processors activated (4985.70 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
Brought up 1 CPUs
checking if image is initramfs... it is
Freeing initrd memory: 515k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfd74c, last bus=12
PCI: Using configuration type 1
usbcore: registered new driver usbfs
usbcore: registered new driver hub
ACPI: Subsystem revision 20050902
evgpeblk-0988 [06] ev_create_gpe_block : GPE 00 to 1F [_GPE] 4 regs on int 0x7evgpeblk-0996 [06] ev_create_gpe_block : Found 1 Wake, Enabled 0 Runtime GPEs
in this block
evgpeblk-0988 [06] ev_create_gpe_block : GPE 20 to 3F [_GPE] 4 regs on int 0x7evgpeblk-0996 [06] ev_create_gpe_block : Found 0 Wake, Enabled 1 Runtime GPEs
in this block
Completing Region/Field/Buffer/Package initialization:......................................
Initialized 13/13 Regions 0/0 Fields 8/8 Buffers 17/18 Packages (995 nodes)
Executing all Device _STA and_INI methods:.......................................................................................................................
119 Devices found containing: 119 _STA, 1 _INI methods
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
ACPI: PCI Root Bridge [PCI1] (0000:01)
PCI: Probing PCI hardware (bus 01)
ACPI: PCI Root Bridge [PCI2] (0000:05)
PCI: Probing PCI hardware (bus 05)
ACPI: PCI Root Bridge [PCI3] (0000:08)
PCI: Probing PCI hardware (bus 08)
ACPI: PCI Root Bridge [PCI4] (0000:09)
PCI: Probing PCI hardware (bus 09)
ACPI: PCI Interrupt Link [LP00] (IRQs *10)
pci_link-0119 [09] pci_link_check_possibl: Blank IRQ resource
ACPI: PCI Interrupt Link [LP01] (IRQs) *0, disabled.
pci_link-0119 [09] pci_link_check_possibl: Blank IRQ resource
ACPI: PCI Interrupt Link [LP02] (IRQs) *0, disabled.
pci_link-0119 [09] pci_link_check_possibl: Blank IRQ resource
ACPI: PCI Interrupt Link [LP03] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP04] (IRQs *5)
ACPI: PCI Interrupt Link [LP05] (IRQs *5)
ACPI: PCI Interrupt Link [LP06] (IRQs *5)
ACPI: PCI Interrupt Link [LP07] (IRQs *5)
ACPI: PCI Interrupt Link [LP08] (IRQs *5)
ACPI: PCI Interrupt Link [LP09] (IRQs *5)
ACPI: PCI Interrupt Link [LP0A] (IRQs *5)
ACPI: PCI Interrupt Link [LP0B] (IRQs *5)
ACPI: PCI Interrupt Link [LP0C] (IRQs *5)
ACPI: PCI Interrupt Link [LP0D] (IRQs *5)
ACPI: PCI Interrupt Link [LP0E] (IRQs *5)
ACPI: PCI Interrupt Link [LP0F] (IRQs *5)
ACPI: PCI Interrupt Link [LP10] (IRQs *5)
ACPI: PCI Interrupt Link [LP11] (IRQs *5)
ACPI: PCI Interrupt Link [LP12] (IRQs *5)
ACPI: PCI Interrupt Link [LP13] (IRQs *5)
ACPI: PCI Interrupt Link [LP14] (IRQs *5)
ACPI: PCI Interrupt Link [LP15] (IRQs *5)
ACPI: PCI Interrupt Link [LP16] (IRQs *5)
ACPI: PCI Interrupt Link [LP17] (IRQs *5)
ACPI: PCI Interrupt Link [LP18] (IRQs *5)
ACPI: PCI Interrupt Link [LP19] (IRQs *5)
ACPI: PCI Interrupt Link [LP1A] (IRQs *5)
ACPI: PCI Interrupt Link [LP1B] (IRQs *5)
ACPI: PCI Interrupt Link [LP1C] (IRQs *9)
ACPI: PCI Interrupt Link [LP1D] (IRQs *9)
pci_link-0119 [09] pci_link_check_possibl: Blank IRQ resource
ACPI: PCI Interrupt Link [LP1E] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1F] (IRQs *3)
ACPI: PCI Interrupt Link [LPUS] (IRQs *11)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 13 devices
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
pnp: 00:00: ioport range 0x900-0x93f has been reserved
pnp: 00:00: ioport range 0x510-0x517 could not be reserved
pnp: 00:00: ioport range 0x504-0x507 could not be reserved
pnp: 00:00: ioport range 0x500-0x503 could not be reserved
pnp: 00:00: ioport range 0x520-0x53f has been reserved
pnp: 00:00: ioport range 0x420-0x427 has been reserved
pnp: 00:00: ioport range 0x460-0x461 has been reserved
pnp: 00:0a: ioport range 0x1ec-0x1ef has been reserved
pnp: 00:0a: ioport range 0x400-0x4fe could not be reserved
pnp: 00:0a: ioport range 0x600-0x600 has been reserved
pnp: 00:0a: ioport range 0x800-0x80f has been reserved
pnp: 00:0a: ioport range 0xc00-0xcfe could not be reserved
pnp: 00:0a: ioport range 0xf50-0xf58 has been reserved
Machine check exception polling timer started.
audit: initializing netlink socket (disabled)
audit(1132228762.256:1): initialized
highmem bounce pool size: 64 pages
Installing knfsd (copyright (C) 1996 [email protected]).
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
usbmon: debugfs is not available
USB Universal Host Controller Interface driver v2.3
usbcore: registered new driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
usbcore: registered new driver usb-storage
USB Mass Storage support registered.
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
usbcore: registered new driver touchkitusb
usbcore: registered new driver cytherm
drivers/usb/misc/cytherm.c: v1.0:Cypress USB Thermometer driver
usbcore: registered new driver phidgetservo
ACPI: Power Button (FF) [PWRF]
ACPI: CPU0 (power states: C1[C1])
acpi_processor-0507 [06] processor_get_info : Error getting cpuindex for acpiid 0x2
acpi_processor-0507 [06] processor_get_info : Error getting cpuindex for acpiid 0x1
ACPI: CPU0 (power states: C1[C1])
lp: driver loaded but no devices found
Linux agpgart interface v0.101 (c) Dave Jones
[drm] Initialized drm 1.0.0 20040925
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x64,0x60 irq 1,12
PNP: PS/2 controller has invalid data port 0x64; using default 0x60
PNP: PS/2 controller has invalid command port 0x60; using default 0x64
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:04: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
tg3.c:v3.43 (Oct 24, 2005)
acpi_bus-0200 [01] bus_set_power : Device is not power manageable
ACPI: PCI Interrupt 0000:09:08.0[A] -> GSI 47 (level, low) -> IRQ 16
eth0: Tigon3 [partno(BCM95703) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:02:55:df:b4:13
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[769f4000]
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx

<<<< hangs here >>>>


--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990


Attachments:
(No filename) (11.61 kB)
2.6.15-rc1-maxcpus.config (32.21 kB)
Download all attachments

2005-11-17 14:32:04

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: RE: maxcpus=1 broken, ACPI bug?


Hi,

I am not yet able to see how this patch can cause a hang like that. My
initial guess is that it has got something to with idle routine and
preempt. I will look more into this and try to reproduce it locally. Can
you please try out disable preemption in your config and try 2.6.15-rc1
and let me know how it goes.

Thanks,
Venki

>-----Original Message-----
>From: Maneesh Soni [mailto:[email protected]]
>Sent: Wednesday, November 16, 2005 10:31 PM
>To: LKML
>Cc: Pallipadi, Venkatesh; Brown, Len; Andrew Morton; Linus Torvalds
>Subject: maxcpus=1 broken, ACPI bug?
>
>Hi,
>
>Using maxcpus=1 boot option, hangs the system while booting. It was
>working till 2.6.13-rc2. After git bisect I found that after backing
>out this ACPI patch it works again, though I had to manually sort the
>reject while backing out.
>
>http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.
6.git;a=commitdiff;h=acf05f4b7f558051ea0028e8e617144123650272
>
>This option is useful in case of kexec/kdump if only one cpu is
>desired to be up during second kernel boot. I tried this on a 4-way
>Xeon 2.5 MHz SMP system.
>
>Following is the boot log with 2.6.15-rc1 with maxcpus=1
>option. .config
>is also attached.
>
>Thanks
>Maneesh
>
>--------------------------------------------------------------------
> Booting 'test kernel'
>
>root (hd0,0)
> Filesystem type is ext2fs, partition type 0x83
>kernel /t ro root=/dev/sda2 rhgb console=tty0
>console=ttyS0,38400 maxcpus=1
> [Linux-bzImage, setup=0x1400, size=0x218992]
>module /initrd-2.6.9-5.ELsmp.img
> [Linux-initrd @ 0x37f6f000, 0x80f5d bytes]
>
>Linux version 2.6.15-rc1 ([email protected]) (gcc version
>3.4.3 20041212 (Red Hat 3.4.3-9.EL4)) #1 SMP PREEMPT Thu Nov
>17 11:51:29 IST 2005
>BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
> BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 00000000e97f5f40 (usable)
> BIOS-e820: 00000000e97f5f40 - 00000000e97ff800 (ACPI data)
> BIOS-e820: 00000000e97ff800 - 00000000e9800000 (reserved)
> BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
>4224MB HIGHMEM available.
>896MB LOWMEM available.
>found SMP MP-table at 0009dd40
>DMI 2.3 present.
>ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
>Processor #0 15:2 APIC version 20
>ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
>Processor #2 15:2 APIC version 20
>WARNING: maxcpus limit of 1 reached. Processor ignored.
>ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
>Processor #4 15:2 APIC version 20
>WARNING: maxcpus limit of 1 reached. Processor ignored.
>ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
>Processor #6 15:2 APIC version 20
>WARNING: maxcpus limit of 1 reached. Processor ignored.
>ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
>ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
>ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
>ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
>ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
>IOAPIC[0]: apic_id 14, version 17, address 0xfec00000, GSI 0-15
>ACPI: IOAPIC (id[0x0d] address[0xfec01000] gsi_base[16])
>IOAPIC[1]: apic_id 13, version 17, address 0xfec01000, GSI 16-31
>ACPI: IOAPIC (id[0x0c] address[0xfec02000] gsi_base[32])
>IOAPIC[2]: apic_id 12, version 17, address 0xfec02000, GSI 32-47
>ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>Enabling APIC mode: Flat. Using 3 I/O APICs
>Using ACPI (MADT) for SMP configuration information
>Allocating PCI resources starting at ea000000 (gap: e9800000:15400000)
>Built 1 zonelists
>Kernel command line: ro root=/dev/sda2 rhgb console=tty0
>console=ttyS0,38400 maxcpus=1
>Initializing CPU#0
>PID hash table entries: 4096 (order: 12, 65536 bytes)
>Detected 2488.816 MHz processor.
>Using tsc for high-res timesource
>Console: colour VGA+ 80x25
>Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
>Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
>Memory: 4819576k/5242880k available (3171k kernel code, 52464k
>reserved, 1007k data, 244k init, 3956692k highmem)
>Checking if this processor honours the WP bit even in
>supervisor mode... Ok.
>Calibrating delay using timer specific routine.. 4985.70
>BogoMIPS (lpj=9971404)
>Mount-cache hash table entries: 512
>CPU: Trace cache: 12K uops, L1 D cache: 8K
>CPU: L2 cache: 512K
>CPU: L3 cache: 1024K
>CPU: Hyper-Threading is disabled
>Intel machine check architecture supported.
>Intel machine check reporting enabled on CPU#0.
>CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
>CPU0: Thermal monitoring enabled
>mtrr: v2.0 (20020519)
>Enabling fast FPU save and restore... done.
>Enabling unmasked SIMD FPU exception support... done.
>Checking 'hlt' instruction... OK.
> tbxface-0109 [02] load_tables : ACPI Tables
>successfully acquired
>Parsing all Control
>Methods:.......................................................
>...............................................................
>...............................................................
>...............................................................
>...............................................................
>...............................................................
>........................................................
>Table [DSDT](id 0005) - 986 Objects with 113 Devices 426
>Methods 13 Regions
>ACPI Namespace successfully loaded at root c0576818
>evxfevnt-0091 [03] enable : Transition to ACPI
>mode successful
>CPU0: Intel(R) Xeon(TM) MP CPU 2.50GHz stepping 05
>Total of 1 processors activated (4985.70 BogoMIPS).
>ENABLING IO-APIC IRQs
>..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
>Brought up 1 CPUs
>checking if image is initramfs... it is
>Freeing initrd memory: 515k freed
>NET: Registered protocol family 16
>ACPI: bus type pci registered
>PCI: PCI BIOS revision 2.10 entry at 0xfd74c, last bus=12
>PCI: Using configuration type 1
>usbcore: registered new driver usbfs
>usbcore: registered new driver hub
>ACPI: Subsystem revision 20050902
>evgpeblk-0988 [06] ev_create_gpe_block : GPE 00 to 1F [_GPE]
>4 regs on int 0x7evgpeblk-0996 [06] ev_create_gpe_block :
>Found 1 Wake, Enabled 0 Runtime GPEs
>in this block
>evgpeblk-0988 [06] ev_create_gpe_block : GPE 20 to 3F [_GPE]
>4 regs on int 0x7evgpeblk-0996 [06] ev_create_gpe_block :
>Found 0 Wake, Enabled 1 Runtime GPEs
>in this block
>Completing Region/Field/Buffer/Package
>initialization:......................................
>Initialized 13/13 Regions 0/0 Fields 8/8 Buffers 17/18
>Packages (995 nodes)
>Executing all Device _STA and_INI
>methods:.......................................................
>................................................................
>119 Devices found containing: 119 _STA, 1 _INI methods
>ACPI: Interpreter enabled
>ACPI: Using IOAPIC for interrupt routing
>ACPI: PCI Root Bridge [PCI0] (0000:00)
>PCI: Probing PCI hardware (bus 00)
>PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
>ACPI: PCI Root Bridge [PCI1] (0000:01)
>PCI: Probing PCI hardware (bus 01)
>ACPI: PCI Root Bridge [PCI2] (0000:05)
>PCI: Probing PCI hardware (bus 05)
>ACPI: PCI Root Bridge [PCI3] (0000:08)
>PCI: Probing PCI hardware (bus 08)
>ACPI: PCI Root Bridge [PCI4] (0000:09)
>PCI: Probing PCI hardware (bus 09)
>ACPI: PCI Interrupt Link [LP00] (IRQs *10)
>pci_link-0119 [09] pci_link_check_possibl: Blank IRQ resource
>ACPI: PCI Interrupt Link [LP01] (IRQs) *0, disabled.
>pci_link-0119 [09] pci_link_check_possibl: Blank IRQ resource
>ACPI: PCI Interrupt Link [LP02] (IRQs) *0, disabled.
>pci_link-0119 [09] pci_link_check_possibl: Blank IRQ resource
>ACPI: PCI Interrupt Link [LP03] (IRQs) *0, disabled.
>ACPI: PCI Interrupt Link [LP04] (IRQs *5)
>ACPI: PCI Interrupt Link [LP05] (IRQs *5)
>ACPI: PCI Interrupt Link [LP06] (IRQs *5)
>ACPI: PCI Interrupt Link [LP07] (IRQs *5)
>ACPI: PCI Interrupt Link [LP08] (IRQs *5)
>ACPI: PCI Interrupt Link [LP09] (IRQs *5)
>ACPI: PCI Interrupt Link [LP0A] (IRQs *5)
>ACPI: PCI Interrupt Link [LP0B] (IRQs *5)
>ACPI: PCI Interrupt Link [LP0C] (IRQs *5)
>ACPI: PCI Interrupt Link [LP0D] (IRQs *5)
>ACPI: PCI Interrupt Link [LP0E] (IRQs *5)
>ACPI: PCI Interrupt Link [LP0F] (IRQs *5)
>ACPI: PCI Interrupt Link [LP10] (IRQs *5)
>ACPI: PCI Interrupt Link [LP11] (IRQs *5)
>ACPI: PCI Interrupt Link [LP12] (IRQs *5)
>ACPI: PCI Interrupt Link [LP13] (IRQs *5)
>ACPI: PCI Interrupt Link [LP14] (IRQs *5)
>ACPI: PCI Interrupt Link [LP15] (IRQs *5)
>ACPI: PCI Interrupt Link [LP16] (IRQs *5)
>ACPI: PCI Interrupt Link [LP17] (IRQs *5)
>ACPI: PCI Interrupt Link [LP18] (IRQs *5)
>ACPI: PCI Interrupt Link [LP19] (IRQs *5)
>ACPI: PCI Interrupt Link [LP1A] (IRQs *5)
>ACPI: PCI Interrupt Link [LP1B] (IRQs *5)
>ACPI: PCI Interrupt Link [LP1C] (IRQs *9)
>ACPI: PCI Interrupt Link [LP1D] (IRQs *9)
>pci_link-0119 [09] pci_link_check_possibl: Blank IRQ resource
>ACPI: PCI Interrupt Link [LP1E] (IRQs) *0, disabled.
>ACPI: PCI Interrupt Link [LP1F] (IRQs *3)
>ACPI: PCI Interrupt Link [LPUS] (IRQs *11)
>Linux Plug and Play Support v0.97 (c) Adam Belay
>pnp: PnP ACPI init
>pnp: PnP ACPI: found 13 devices
>SCSI subsystem initialized
>PCI: Using ACPI for IRQ routing
>PCI: If a device doesn't work, try "pci=routeirq". If it
>helps, post a report
>pnp: 00:00: ioport range 0x900-0x93f has been reserved
>pnp: 00:00: ioport range 0x510-0x517 could not be reserved
>pnp: 00:00: ioport range 0x504-0x507 could not be reserved
>pnp: 00:00: ioport range 0x500-0x503 could not be reserved
>pnp: 00:00: ioport range 0x520-0x53f has been reserved
>pnp: 00:00: ioport range 0x420-0x427 has been reserved
>pnp: 00:00: ioport range 0x460-0x461 has been reserved
>pnp: 00:0a: ioport range 0x1ec-0x1ef has been reserved
>pnp: 00:0a: ioport range 0x400-0x4fe could not be reserved
>pnp: 00:0a: ioport range 0x600-0x600 has been reserved
>pnp: 00:0a: ioport range 0x800-0x80f has been reserved
>pnp: 00:0a: ioport range 0xc00-0xcfe could not be reserved
>pnp: 00:0a: ioport range 0xf50-0xf58 has been reserved
>Machine check exception polling timer started.
>audit: initializing netlink socket (disabled)
>audit(1132228762.256:1): initialized
>highmem bounce pool size: 64 pages
>Installing knfsd (copyright (C) 1996 [email protected]).
>io scheduler noop registered
>io scheduler anticipatory registered
>io scheduler deadline registered
>io scheduler cfq registered
>usbmon: debugfs is not available
>USB Universal Host Controller Interface driver v2.3
>usbcore: registered new driver usblp
>drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
>Initializing USB Mass Storage driver...
>usbcore: registered new driver usb-storage
>USB Mass Storage support registered.
>usbcore: registered new driver usbhid
>drivers/usb/input/hid-core.c: v2.6:USB HID core driver
>usbcore: registered new driver touchkitusb
>usbcore: registered new driver cytherm
>drivers/usb/misc/cytherm.c: v1.0:Cypress USB Thermometer driver
>usbcore: registered new driver phidgetservo
>ACPI: Power Button (FF) [PWRF]
>ACPI: CPU0 (power states: C1[C1])
>acpi_processor-0507 [06] processor_get_info : Error getting
>cpuindex for acpiid 0x2
>acpi_processor-0507 [06] processor_get_info : Error getting
>cpuindex for acpiid 0x1
>ACPI: CPU0 (power states: C1[C1])
>lp: driver loaded but no devices found
>Linux agpgart interface v0.101 (c) Dave Jones
>[drm] Initialized drm 1.0.0 20040925
>PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x64,0x60 irq 1,12
>PNP: PS/2 controller has invalid data port 0x64; using default 0x60
>PNP: PS/2 controller has invalid command port 0x60; using default 0x64
>serio: i8042 AUX port at 0x60,0x64 irq 12
>serio: i8042 KBD port at 0x60,0x64 irq 1
>Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ
>sharing disabled
>serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>00:04: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
>tg3.c:v3.43 (Oct 24, 2005)
>acpi_bus-0200 [01] bus_set_power : Device is not power
>manageable
>ACPI: PCI Interrupt 0000:09:08.0[A] -> GSI 47 (level, low) -> IRQ 16
>eth0: Tigon3 [partno(BCM95703) rev 1002 PHY(5703)]
>(PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:02:55:df:b4:13
>eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0]
>WireSpeed[1] TSOcap[1]
>eth0: dma_rwctrl[769f4000]
>Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
>ide: Assuming 33MHz system bus speed for PIO modes; override
>with idebus=xx
>
><<<< hangs here >>>>
>
>
>--
>Maneesh Soni
>Linux Technology Center,
>IBM India Software Labs,
>Bangalore, India
>email: [email protected]
>Phone: 91-80-25044990
>

2005-11-17 19:01:33

by Maneesh Soni

[permalink] [raw]
Subject: RE: maxcpus=1 broken, ACPI bug?

Quoting "Pallipadi, Venkatesh" <[email protected]>:

>
> Hi,
>
> I am not yet able to see how this patch can cause a hang like that. My
> initial guess is that it has got something to with idle routine and
> preempt. I will look more into this and try to reproduce it locally. Can
> you please try out disable preemption in your config and try 2.6.15-rc1
> and let me know how it goes.

Hi Venki,

Thanks for looking into this. I tried 2.6.15-rc1 with CONFIG_PREEMPT_NONE=y
but still it hangs in the same way.

Thanks
Maneesh

--
Maneesh Soni
IBM Linux Technology Center
IBM India Software Labs,
Bangalore, India
Ph. 91-80-25044990
email: [email protected]

2005-11-17 19:54:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: maxcpus=1 broken, ACPI bug?



On Thu, 17 Nov 2005, Maneesh Soni wrote:
>
> Using maxcpus=1 boot option, hangs the system while booting. It was
> working till 2.6.13-rc2. After git bisect I found that after backing
> out this ACPI patch it works again, though I had to manually sort the
> reject while backing out.
>
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=acf05f4b7f558051ea0028e8e617144123650272

Hmm. That patch had a totally idiotic thinko in it (look at the for-loop
in acpi_processor_get_power_info_default() and notice how it doesn't
actually change anything in the loop).

That thinko was later fixed (albeit in a really stupid way, and the same
cut-and-paste bug still exists in acpi_processor_get_power_info_fadt()).

Anyway, can you test this diff? It

(a) removes the insane (and in one case incorrect) memset loop
(b) makes the code that sets "pr->flags.power = 1" match the comment and
the previous behaviour.

Does that make a difference?

Linus

---
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 573b6a9..2445828 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -524,8 +524,7 @@ static int acpi_processor_get_power_info
if (!pr->pblk)
return_VALUE(-ENODEV);

- for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
- memset(pr->power.states, 0, sizeof(struct acpi_processor_cx));
+ memset(pr->power.states, 0, sizeof(pr->power.states));

/* if info is obtained from pblk/fadt, type equals state */
pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
@@ -559,9 +558,7 @@ static int acpi_processor_get_power_info

ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_default_c1");

- for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
- memset(&(pr->power.states[i]), 0,
- sizeof(struct acpi_processor_cx));
+ memset(pr->power.states, 0, sizeof(pr->power.states));

/* if info is obtained from pblk/fadt, type equals state */
pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
@@ -873,7 +870,8 @@ static int acpi_processor_get_power_info
for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
if (pr->power.states[i].valid) {
pr->power.count = i;
- pr->flags.power = 1;
+ if (pr->power.states[i].type >= ACPI_STATE_C2)
+ pr->flags.power = 1;
}
}

2005-11-17 22:49:55

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: RE: maxcpus=1 broken, ACPI bug?




>-----Original Message-----
>From: Linus Torvalds [mailto:[email protected]]
>Sent: Thursday, November 17, 2005 11:55 AM
>To: Maneesh Soni
>Cc: LKML; Pallipadi, Venkatesh; Brown, Len; Andrew Morton
>Subject: Re: maxcpus=1 broken, ACPI bug?
>
>
>
>On Thu, 17 Nov 2005, Maneesh Soni wrote:
>>
>> Using maxcpus=1 boot option, hangs the system while booting. It was
>> working till 2.6.13-rc2. After git bisect I found that after backing
>> out this ACPI patch it works again, though I had to manually sort the
>> reject while backing out.
>>
>>
>http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.
6.git;a=commitdiff;h=acf05f4b7f558051ea0028e8e617144123650272
>
>Hmm. That patch had a totally idiotic thinko in it (look at
>the for-loop
>in acpi_processor_get_power_info_default() and notice how it doesn't
>actually change anything in the loop).
>
>That thinko was later fixed (albeit in a really stupid way,
>and the same
>cut-and-paste bug still exists in
>acpi_processor_get_power_info_fadt()).
>

Oops. I had missed the memset in ..info_fadt() in my stupid bug fixing
patch.

There is another patch here http://bugme.osdl.org/show_bug.cgi?id=4485
That touches the same code and doing memset in a loop turned out to be
useful
as we want to initialize only some of those states.

>Anyway, can you test this diff? It
>
> (a) removes the insane (and in one case incorrect) memset loop
> (b) makes the code that sets "pr->flags.power = 1" match the
>comment and
> the previous behaviour.

Regarding the last hunk in the patch below..
Actually, we want to have acpi_processor_idle() used even when only C1
is supported and that C1 has nothing to do with ACPI. The reason being
that there is no generic interface that can show the cstate usage and
time spent in cstate etc. We want to reuse the cstate statistics in
/proc interface in acpi_processor_idle(). However, in long term, this
needs a cleanup and a generic cstate handler is required (similar to
cpufreq for P-state) and acpi should just plugin the handler and latency
and similar details for each cstate into this generic cstate handler.

And I tried on couple of Xeons here and haven't had luck in reproducing
this hang yet.

Thanks,
Venki

>
>Does that make a difference?
>
> Linus
>
>---
>diff --git a/drivers/acpi/processor_idle.c
>b/drivers/acpi/processor_idle.c
>index 573b6a9..2445828 100644
>--- a/drivers/acpi/processor_idle.c
>+++ b/drivers/acpi/processor_idle.c
>@@ -524,8 +524,7 @@ static int acpi_processor_get_power_info
> if (!pr->pblk)
> return_VALUE(-ENODEV);
>
>- for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
>- memset(pr->power.states, 0, sizeof(struct
>acpi_processor_cx));
>+ memset(pr->power.states, 0, sizeof(pr->power.states));
>
> /* if info is obtained from pblk/fadt, type equals state */
> pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
>@@ -559,9 +558,7 @@ static int acpi_processor_get_power_info
>
> ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_default_c1");
>
>- for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
>- memset(&(pr->power.states[i]), 0,
>- sizeof(struct acpi_processor_cx));
>+ memset(pr->power.states, 0, sizeof(pr->power.states));
>
> /* if info is obtained from pblk/fadt, type equals state */
> pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
>@@ -873,7 +870,8 @@ static int acpi_processor_get_power_info
> for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
> if (pr->power.states[i].valid) {
> pr->power.count = i;
>- pr->flags.power = 1;
>+ if (pr->power.states[i].type >= ACPI_STATE_C2)
>+ pr->flags.power = 1;
> }
> }
>
>

2005-11-18 10:32:05

by Maneesh Soni

[permalink] [raw]
Subject: Re: maxcpus=1 broken, ACPI bug?

On Thu, Nov 17, 2005 at 11:54:32AM -0800, Linus Torvalds wrote:
>
>
> On Thu, 17 Nov 2005, Maneesh Soni wrote:
> >
> > Using maxcpus=1 boot option, hangs the system while booting. It was
> > working till 2.6.13-rc2. After git bisect I found that after backing
> > out this ACPI patch it works again, though I had to manually sort the
> > reject while backing out.
> >
> > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=acf05f4b7f558051ea0028e8e617144123650272
>
> Hmm. That patch had a totally idiotic thinko in it (look at the for-loop
> in acpi_processor_get_power_info_default() and notice how it doesn't
> actually change anything in the loop).
>
> That thinko was later fixed (albeit in a really stupid way, and the same
> cut-and-paste bug still exists in acpi_processor_get_power_info_fadt()).
>
> Anyway, can you test this diff? It
>
> (a) removes the insane (and in one case incorrect) memset loop
> (b) makes the code that sets "pr->flags.power = 1" match the comment and
> the previous behaviour.
>
> Does that make a difference?
>

Yes, it works now. I just have to remove the declaration of "i" at
both the places to avoid compiler warnings.

Thanks a lot..
Maneesh

--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990