2006-10-05 21:22:27

by Muli Ben-Yehuda

Subject: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

My x366 no longer boots with 2.6.19-rc1. The boot either hangs in
uhci_hcd_init or dies with "do_IRQ: cannot handle IRQ -1". Bisection
says this one is bad:

[PATCH] genirq: x86_64 irq: make vector_irq per cpu
author Eric W. Biederman <[email protected]>
committer Linus Torvalds <[email protected]>
commit 550f2299ac8ffaba943cf211380d3a8d3fa75301
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=550f2299ac8ffaba943cf211380d3a8d3fa75301

and this one is fine:

[PATCH] genirq: x86_64 irq: Make the external irq handlers report their vector, not ...
author Eric W. Biederman <[email protected]>
committer Linus Torvalds <[email protected]>
commit e500f57436b9056a245216c53113613928155eba
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e500f57436b9056a245216c53113613928155eba

Boot logs, lspci -vvv and .config attached.
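The bad commit's title says it makes vector_irq per CPU, which suggests the vector-to-IRQ lookup in do_IRQ now consults a table belonging to whichever CPU took the interrupt. The small user-space C model below sketches how such a change could surface as "cannot handle IRQ -1" if a vector is bound on some CPUs but delivered to another. This is an assumption about the mechanism, not a diagnosis of this bug and not the kernel's code: the table, the helper names (model_init, model_bind, model_do_irq) and the CPU masks are hypothetical; only the four CPUs, vector 0xA9 / IRQ 16 and the message text come from the logs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sizes: 4 CPUs as in "Brought up 4 CPUs", 256 x86 vectors. */
#define MODEL_NR_CPUS    4
#define MODEL_NR_VECTORS 256

/* Hypothetical per-CPU vector -> IRQ table; -1 means "nothing bound". */
static int model_vector_irq[MODEL_NR_CPUS][MODEL_NR_VECTORS];

/* Start with every vector unbound on every CPU. */
void model_init(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        for (int v = 0; v < MODEL_NR_VECTORS; v++)
            model_vector_irq[cpu][v] = -1;
}

/* Bind vector -> irq, but only in the tables of CPUs set in cpu_mask. */
void model_bind(unsigned cpu_mask, int vector, int irq)
{
    assert(vector >= 0 && vector < MODEL_NR_VECTORS);
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (cpu_mask & (1u << cpu))
            model_vector_irq[cpu][vector] = irq;
}

/* Look the vector up in the table of the CPU that took the interrupt. */
int model_do_irq(int cpu, int vector)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    assert(vector >= 0 && vector < MODEL_NR_VECTORS);
    int irq = model_vector_irq[cpu][vector];
    if (irq < 0)
        printf("do_IRQ: cannot handle IRQ %d\n", irq);
    return irq;
}
```

For example, after model_init(), binding vector 0xA9 to IRQ 16 on CPU 0 only (model_bind(0x1, 0xA9, 16)) and then delivering 0xA9 on CPU 1, the CPU named in the oops, makes model_do_irq(1, 0xA9) print the message and return -1, while model_do_irq(0, 0xA9) returns 16. With flat APIC routing, as the log reports, an interrupt can plausibly land on any CPU in its destination set, so every such CPU's table would need the entry.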

Boot log with hang in uhci_hcd_init:

kernel (hd0,1)/boot/calgary/bzImage root=/dev/sda2 console=tty0 console=ttyS1,19200
[Linux-bzImage, setup=0x1c00, size=0x2e3ad8]
initrd (hd0,1)/boot/calgary/aic94xxfw.initramfs.gz
[Linux-initrd @ 0x37e3f000, 0x1b01e2 bytes]
savedefault

[ 0.000000] Linux version 2.6.18mx (muli@rhun) (gcc version 3.4.1) #147 SMP Thu Oct 5 21:36:48 IST 2006
[ 0.000000] Command line: root=/dev/sda2 console=tty0 console=ttyS1,19200
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 0000000000099000 (usable)
[ 0.000000] BIOS-e820: 0000000000099000 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000e7f9c640 (usable)
[ 0.000000] BIOS-e820: 00000000e7f9c640 - 00000000e7fa6a40 (ACPI data)
[ 0.000000] BIOS-e820: 00000000e7fa6a40 - 00000000e8000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000198000000 (usable)
[ 0.000000] end_pfn_map = 1671168
[ 0.000000] DMI 2.3 present.
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] DMA32 4096 -> 1048576
[ 0.000000] Normal 1048576 -> 1671168
[ 0.000000] early_node_map[3] active PFN ranges
[ 0.000000] 0: 0 -> 153
[ 0.000000] 0: 256 -> 950172
[ 0.000000] 0: 1048576 -> 1671168
[ 0.000000] ACPI: PM-Timer IO Port: 0x9c
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] Processor #1
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
[ 0.000000] Processor #C_NMI (acpi_id[0x02] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 15, address 0xfec00000, GSI 0-35
[ 0.000000] ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[35])
[ 0.000000] IOAPIC[1]: apic_id)
[ 0.000000] Setting APIC routing to flat
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] Nosave address range: 0000000000099000 - 00000000000a0000
[ 0.000000] Nosave address range: 00000000000a0000 - 00000000000e0000
[ 0.000000] Nosave address range: 00000000000e0000 - 0000000000100000
[ 0.000000] Nosave address range: 00000000e7f9c000 - 00000000e7f9d000
[ 0.000000] Nosave address range: 00000000e7f9d000 - 00000000e7fa6000
[ 0.000000] Nosave address range: 00000000e7fa6000 - 00000000e7fa7000
[ 0.000000] Nosave address range: 00000000e7fa7000 - 00000000e8000000
[ 0.000000] Nosave address range: 00000000e8000000 - 00000000fec00000
[ 0.000000] Nosave address range: 00000000fec00000 - 0000000100000000
[ 0.000000] Allocating PCI resources starting at ea000000 (gap: e8000000:16c00000)
[ 0.000000] PERCPU: Allocating 34304 bytes of per cpu data
[ 0.000000] Built 1 zonelists. Total pages: 1534048
[ 0.000000] Kernel command line: root=/dev/sda2 console=tty0 console=ttyS1,19200
[ 0.000000] Initializing CPU#0
[ 0.000000] PID hash table entries: 4096 (order: 5705] ... MAX_LOCKDEP_KEYS: 2048
[ 168.121904] ... CLASSHASH_SIZE: 1024
[ 168.148620] ... MAX_LOCKDEP_ENTRIES: 8192
[ 168.174819] ... MAX_LOCKDEP_CHAINS: 8192
[ 168.201018] ... CHAINHASH_SIZE: 4096
[ 168.227212] memory used by lock dependency info: 1328 kB
[ 168.259653] per task-struct memory footprint: 1680 bytes
[ 168.299288] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[ 168.354233] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 168.400617] Checking aperture...
[ 168.442967] PCI-DMA: Calgary IOMMU detected.
[ 168.468663] PCI-DMA: Calgary TCE table spec is 7, CONFIG_IOMMU_DEBUG is enabled.
[ 168.631217] Memory: 6096428k/6684672k available (3788k kernel code, 193716k reserved, 2726k data, 276k init)
[ 168.769306] Calibrating delay using timer specific routine.. 6346.46 BogoMIPS (lpj=12692924)
[ 168.820297] Mount-cache hash table entries: 256
[ 168.849151] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 168.880653] CPU: L2 cache: 1024K
[ 168.900049] using mwait in idle threads.
[ 168.923651] CPU: Physical Processor ID: 0
[ 16[ 169.171779] Using local APIC timer interrupts.
[ 169.230014] result 10425811
[ 169.246796] Detected 10.425 MHz APIC timer.
[ 169.274318] lockdep: not fixing up alternatives.
[ 169.302624] Booting processor 1/4 APIC 0x1
[ 169.337672] Initializing CPU#1
[ 169.417131] Calibrating delay using timer specific routine.. 6339.06 BogoMIPS (lpj=12678127)
[ 169.417148] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 169.417151] CPU: L2 cache: 1024K
[ 169.417155] CPU: Physical Processor ID: 0
[ 169.417157] CPU: Processor Core ID: 0
[ 169.417170] CPU1: Thermal monitoring enabled (TM1)
[ 169.417449] Intel(R) Xeon(TM) MP CPU 3.16GHz stepping 01
[ 169.421461] lockdep: not fixing up alternatives.
[ 169.683506] Booting processor 2/4 APIC 0x6
[ 169.718522] Initializing CPU#2
[ 169.797034] Calibrating delay using timer specific routine.. 6339.24 BogoMIPS (lpj=12677306] Intel(R) Pentium(R) 4 CPU 3.16GHz stepping 09
[ 169.801358] lockdep: not fixing up alternatives.
[ 170.063376] Booting processor 3/4 APIC 0x7
[ 170.098394] Initializing CPU#3
[ 170.176937] Calibrating delay using timer specific routine.. 6339.33 BogoMIPS (lpj=12678668)
[ 170.176953] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 170.176956] CPU: L2 cache: 1024K
[ 170.176959] CPU: Physical Processor ID: 3
[ 170.176961] CPU: Processor Core ID: 0
[ 170.176971] CPU3: Thermal monitoring enabled (TM1)
[ 170.177211] Intel(R) Pentium(R) 4 CPU 3.16GHz stepping 09
[ 170.180968] Brought up 4 CPUs
[ 170.433147] testing NMI watchdog ... OK.
[ 170.496859] time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
[ 170.533946] time.c: Detected 3169.444 MHz processor.
[ 170.732881] migration_cost=1,683
[ 170.753371] checking if image is initramfs... it is
[ 170.945317] Freeing initrd memory: 1728k freed
[ 170.975037] NET: Registered protocol family 16
[ 171.012196] ACPI: bus type pci registered
[ 171.036300] PCI: Using configuration type 1
[ 171.190476] ACPI: Interpreter enabled
[ 171.212520] ACPI: Using IOAPIC for interrupt routing
[ 171.249022] ACPI: PCI Root Bridge [VP00] (0000:00)
[ 171.281222] PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
[ 171.337334] ACPI: PCI Root Bridge [VP01] (0000:01)
[ 171.373147] ACPI: PCI Root Bridge [VP02] (0000:02)
[ 171.411105] ACPI: PCI Root Bridge [VP03] (0000:04)
[ 171.448703] ACPI: PCI Root Bridge [VP04] (0000:06)
[ 171.486345] ACPI: PCI Root Bridge [VP05] (0000:08)
[ 171.524153] ACPI: PCI Root Bridge [VP06] (0000:0a)
[ 171.561037] ACPI: PCI Root Bridge [VP07] (0000:0c)
[ 171.598952] SCSI subsystem initialized
[ 171.621709] usbcore: registered new interface driver usbfs
[ 171.654797] usbcore: registered new interface driver hub
[ 171.686886] usbcore: registered new device driver usb
[ 171.717651] PCI: Using ACPI for IRQ routing
[ 171.742820] PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
[ 171.792715] PCI-DMA: Using Calgary IOMMU
[ 172.171973] Calgary: enabling translation on PHB 0
[ 172.200751] Calgary: errant DMAs will now be prevented on this bus.
[ 172.593458] Calgary: enabling translation on PHB 1
[ 172.622214] Calgary: errant DMAs will now be prevented on this bus.
[ 173.015214] Calgary: enabling translation on PHB 2
[ 173.043975] Calgary: errant DMAs will now be prevented on this bus.
[ 173.081667] PCI-GART: No AMD northbridge found.
[ 173.118125] NET: Registered protocol family 2
[ 173.200415 tables configured (established 65536 bind 32768)
[ 173.381286] TCP reno registered
[ 173.422487] Total HugeTLB memory allocated, 0
[ 173.450396] Installing knfsd (copyright (C) 1996 [email protected]).
[ 173.489481] io scheduler noop registered
[ 173.513181] io scheduler anticipatory registered (default)
[ 173.546334] io scheduler deadline registered
[ 173.572153] io scheduler cfq registered
[ 173.602544] GSI 16 sharing vector 0xA9 and IRQ 16
[ 173.630881] ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16
[ 173.675669] radeonfb: Found Intel x86 BIOS ROM Image
[ 173.720028] radeonfb: Retrieved PLL infos from BIOS
[ 173.749346] radeonfb: Reference=27.00 MHz (RefDiv=60) Memory=143.00 Mhz, System=143.00 MHz
[ 173.798982] radeonfb: PLL min 12000 max 35000
[ 173.929494] i2c_adapter i2c-1: unable to read EDID block.
[ 174.121356] i2c_adapter i2c-1: unable to read EDID block.
[ 174.313307] i2c_adapter i2c-1: unable to read EDID block.
[ 174.777182] i2c_adapter i2c-2: unable to read EDID block.
[ 174.969129] i2c_adapter i2c-2: unable to read EDID block.
[ 175.161078] i2c_adapter i2c-2: unable to read EDID block.
[ 175.315579] radeonfb: Monitor 1 type DFP found
[ 175.342278] radeonfb: EDID probed
[ 175.362230] radeonfb: Monitor 2 type CRT found
[ 176.420926] Console: switching to colour frame buffer device 128x48
[ 177.132991] radeonfb (0000:00:01.0): ATI Radeon QY
[ 177.164714] tridentfb: Trident framebuffer 0.7.8-NEWAPI initializing
[ 177.204778] hgafb: HGA card not detected.
[ 177.229089] hgafb: probe of hgafb.0 failed with error -22
[ 177.265581] vga16fb: mapped to 0xffff8100000a0000
[ 177.294281] fb1: VGA16 VGA frame buffer device
[ 177.323286] fb2: Virtual frame buffer device, using 1024K of video memory
[ 177.364610] ACPI: Power Button (FF) [PWRF]
[ 177.390051] ibm_acpi: ec object not found
[ 177.799265] Linux agpgart interface v0.101 (c) Dave Jones
[ 177.832901] ipmi message handler version 39.0
[ 177.859197] ipmi device interface
[ 177.879561] Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 seconds).
[ 177.933667] Hangcheck[ 178.096225] RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
[ 178.146934] loop: loaded (max 8 devices)
[ 178.170967] ibmasm: IBM ASM Service Processor Driver version 1.0 loaded
[ 178.211074] GSI 17 sharing vector 0xB1 and IRQ 17
[ 178.239566] ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 18 (level, low) -> IRQ 17
[ 178.284872] 3c59x: Donald Becker and others. http://www.scyld.com/network/vortex.html
[ 178.284885] 0000:02:01.0: 3Com PCI 3c905C Tornado at ffffc20000042000.
[ 178.311834] tg3.c:v3.66 (September 23, 2006)
[ 178.311869] GSI 18 sharing vector 0xB9 and IRQ 18
[ 178.311879] ACPI: PCI Interrupt 0000:01:01.0[A] -> GSI 24 (level, low) -> IRQ 18
[ 178.452374] eth1: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:22
[ 178.452390] eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
[ 178.452438] eth1: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 178.453100] GSI 19 sharing vector 0xC1 and IRQ 19
[ 178.453115] ACPI: PCI Interrupt 0000:01:01.1[B] -> GSI 28 (level, low) -> IRQ 19
[ 178.619694] eth2: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:23
[ 178.619705] eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
[ 178.619709] eth2: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 178.620480] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 178.620487] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 178.620586] SvrWks CSB6: IDE controller at PCI slot 0000:00:0f.1
[ 178.620612] SvrWks CSB6: chipset revision 160
[ 178.620615] SvrWks CSB6: not 100% native mode: will probe irqs later
[ 178.620640] ide0: BM-DMA at 0x0700-0x0707, BIOS settings: hda:DMA, hdb:DMA
[ 178.620661] SvrWks CSB6: simplex device: DMA disabled
[ 178.620664] ide1: SvrWks CSB6 Bus-Master DMA disabled (BIOS)
[ 179.370943] hda: HL-DT-STDVD-ROM GDR8082N, ATAPI CD/DVD-ROM drive
[ 179.715251] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 180.285934] hda: ATAPI 24X DVD-ROM drive, 256kB Cache
[ 180.306029] Uniform CD-ROM driver Revision: 3.20
[ 180.434132] usbmon: debugfs is not available
[ 180.502630] GSI 20 sharing vector 0xC9 and IRQ 20
[ 180.572766] ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 20 (level, low) -> IRQ 20
[ 180.660206] ohci_hcd 0000:00:03.0: OHCI Host Controller
[ 180.735266] ohci_hcd 0000:00:03.0: new USB bus registered, assigned bus number 1
[ 180.822372] ohci_hcd 0000:00:03.0: irq 20, io mem 0xf2c10000
[ 180.988387] usb usb1: Product: OHCI Host Controller
[ 181.060155] usb usb1: Manufacturer: Linux 2.6.18mx ohci_hcd
[ 181.136015] usb usb1: SerialNumber: 0000:00:03.0
[ 181.138365] usb usb1: configuration #1 chosen from 1 choice
[ 181.139014] hub 1-0:1.0: USB hub found
[ 181.139358] hub 1-0:1.0: 2 ports detected
[ 181.259512] ACPI: PCI Interrupt 0000:00:03.1[B] -> GSI 20 (level, low) -> IRQ 20
[ 181.414004] ohci_hcd 0000:00:03.1: OHCI Host Controller
[ 181.415193] ohci_hcd 0000:00:03.1: new USB bus registered, assigned bus number 2
[ 181.415233] ohci_hcd 0000:00:03.1: irq 20, io mem 0xf2c11000
[ 181.5092] usb usb2: Product: OHCI Host Controller
[ 181.504194] usb usb2: Manufacturer: Linux 2.6.18mx ohci_hcd
[ 181.504196] usb usb2: SerialNumber: 0000:00:03.1
[ 181.504232] usb usb2: uevent
[ 181.505151] usb usb2: usb_probe_device
[ 181.505960] usb usb2: configuration #1 chosen from 1 choice
[ 181.505980] usb usb2: adding 2-0:1.0 (config #1, interface 0)
[ 181.506026] usb 2-0:1.0: uevent
[ 181.506261] hub 2-0:1.0: usb_probe_interface
[ 181.506267] hub 2-0:1.0: usb_probe_interface - got id
[ 181.506271] hub 2-0:1.0: USB hub found
[ 181.506300] hub 2-0:1.0: 2 ports detected
[ 181.506304] hub 2-0:1.0: standalone hub
[ 181.506307] hub 2-0:1.0: no power switching (usb 1.0)
[ 181.506310] hub 2-0:1.0: global over-current protection
[ 181.506314] hub 2-0:1.0: power on to power good time: 30ms
[ 181.506333] hub 2-0:1.0: local power source is good
[ 181.506337] hub 2-0:1.0: no over-current condition exists
[ 181.506342] hub 2-0:1.0: trying to enable port power on non-switchable hub
[ 181.613892] hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000
[ 181.618715] USB Universal Host Controller Interface driver v3.0

Boot log with "do_IRQ: cannot handle IRQ -1":

kernel (hd0,1)/boot/calgary/bzImage root=/dev/sda2 console=tty0 console=ttyS1,19200
[Linux-bzImage, setup=0x1c00, size=0x2e3ad8]
initrd (hd0,1)/boot/calgary/aic94xxfw.initramfs.gz
[Linux-initrd @ 0x37e3f000, 0x1b01da bytes]
savedefault

[ 0.000000] Linux version 2.6.18mx (muli@rhun) (gcc version 3.4.1) #149 SMP Thu Oct 5 22:52:09 IST 2006
[ 0.000000] Command line: root=/dev/sda2 console=tty0 console=ttyS1,19200
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 0000000000099000 (usable)
[ 0.000000] BIOS-e820: 0000000000099000 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000e7f9c640 (usable)
[ 0.000000] BIOS-e820: 00000000e7f9c640 - 00000000e7fa6a40 (ACPI data)
[ 0.000000] BIOS-e820: 00000000e7fa6a40 - 00000000e8000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000198000000 (usable)
[ 0.000000] end_pfn_map = 1671168
[ 0.000000] DMI 2.3 present.
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] DMA32 4096 -> 1048576
[ 0.000000] Normal 1048576 -> 1671168
[ 0.000000] early_node_map[3] active PFN ranges
[ 0.000000] 0: 0 -> 153
[ 0.000000] 0: 256 -> 950172
[ 0.000000] 0: 1048576 -> 1671168
[ 0.000000] ACPI: PM-Timer IO Port: 0x9c
[ 0.000000] ACPI: LAPIC (acpirocessor #6
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
[ 0.000000] Processor #7
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 15, address 0xfec00000, GSI 0-35
[ 0.000000] ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[35])
[ 0.000000] IOAPIC[1]: apic_id 14, address 0xfec01000, GSI 35-70
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 8 low edge)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 low edge)
[ 0.000000] Setting APIC routing to flat
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] Nosave address range: 0000000000099000 - 00000000000a0000
[ 0.000000] Nosave address range: 00000000000a0000 - 00000000000e0000
[ 0.000000] Nosave address range: 00000000000e0000 - 0000000000100000
[ 0.000000] Nosave address range: 00000000e7f9c000 - 00000000e7f9d000
[ 0.000000] Nosave address range: 00000000e7f9d000 - 00000000e7fa6000
[ 0.000000] Nosave address range: 00000000e7fa6000 - 00000000e7fa7000
[ 0.000000] Nosave address range: 00000000e7fa7000 - 00000000e8000000
[ 0.000000] No4 bytes of per cpu data
[ 0.000000] Built 1 zonelists. Total pages: 1534048
[ 0.000000] Kernel command line: root=/dev/sda2 console=tty0 console=ttyS1,19200
[ 0.000000] Initializing CPU#0
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[ 162.254679] Console: colour VGA+ 80x25
[ 164.190495] Lock dependency validator:752] ... MAX_LOCKDEP_ENTRIES: 8192
[ 164.366947] ... MAX_LOCKDEP_CHAINS: 8192
[ 164.393142] ... CHAINHASH_SIZE: 4096
[ 164.419339] memory used by lock dependency info: 1328 kB
[ 164.451778] per task-struct memory footprint: 1680 bytes
[ 164.491419] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[ 164.546385] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 164.592804] Checking aperture...
[ 164.635185] PCI-DMA: Calgary IOMMU detected.
[ 164.660858] PCI-DMA: Calgary TCE table spec is 7, CONFIG_IOMMU_DEBUG is enabled.
[ 164.823121] Memory: 6096428k/6684672k available (3788k kernel code, 193716k reserved, 2726k data, 276k init)
[ 164.961480] Calibrating delay using timer specific routine.. 6346.50 BogoMIPS (lpj=12693003)
[ 165.012500] Mount-cache hash table entries: 256
[ 165.041360] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 165.072858] CPU: L2 cache: 1024K
[ 165.092254] using mwait in idle threads.
[ 165.115856] CPU: Physical Processor ID: 0
[ 165.139973] CPU: Processor Core ID: 0
[ 165.162031] CPU0: Thermal monitoring enabled (TM1)
[ 165.190818] Freeing SMP alternatives: 32k freed
[ 165.218068] ACPI: Core revision 20060707
[ 165.288939] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 165.363991] Using local APIC timer interrupts.
[ 165.422216] result 10425729
[ 165.439044] Detected 10.425 MHz APIC timer.
[ 165.466518] lockdep: not fixing up alternatives.
[ 165.494795] Booting processor 1/4 APIC 0x1
[ 165.529811] Initializing CPU#1
[ 165.609305] Calibrating delay using timer specific routine.. 6339.08 BogoMIPS (lpj=12678178)
[ 165.609322] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 165.609325] CPU: L2 cache: 1024K
[ 165.609329] CPU: Physical Processor ID: 0
[ 165.609331] CPU: Processor Core ID: 0
[ 165.609344] CPU1: Thermal monitoring enabled (TM1)
[ 165.609624] Intel(R) Xeon(TM) MP CPU 3.16GHz stepping 01
[ 165.613632] lockdep: not fixing up alternatives.
[ 165.875688] Booting processor 2/4 APIC 0x6
[ 165.910707] Initializing CPU#2
[ 165.989208] Calibrating delay using timer specific routine.. 6339.23 BogoMIPS (lpj=12678478)
[ 165.989221] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 165.989224] CPU: L2 cache: 1024K
[ 165.989227] CPU: Physical Processor ID: 3
[ 165.989229] CP[ 166.290617] Initializing CPU#3
[ 166.369111] Calibrating delay using timer specific routine.. 6339.30 BogoMIPS (lpj=12678612)
[ 166.369125] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 166.369128] CPU: L2 cache: 1024K
[ 166.369132] CPU: Physical Processor ID: 3
[ 166.369133] CPU: Processor Core ID: 0
[ 166.369144] CPU3: Thermal monitoring enabled (TM1)
[ 166.369384] Intel(R) Pentium(R) 4 CPU 3.16GHz stepping 09
[ 166.373137] Brought up 4 CPUs
[ 166.625258] testing NMI watchdog ... OK.
[ 166.688965] time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
[ 166.726055] time.c: Detected 3169.440 MHz processor.
[ 167.084243] migration_cost=73,703
[ 167.105254] checking if image is initramfs... it is
[ 167.296623] Freeing initrd memory: 1728k freed
[ 167.326347] NET: Registered protocol family 16
[ 167.363464] ACPI: bus type pci registered
[ 167.387553] PCI: Using configuration type 1
[ 167.542434] ACPI: Interpreter enabled
[ 167.564437] ACPI: Using IOAPIC for interrupt routing
[ 167.600989] ACPI: PCI Root Bridge [VP00] (0000:00)
[ 167.633174] PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
[ 167.688862] ACPI: PCI Root Bridge [VP01] (0000:01)
[ 167.723077] ACPI: PCI Root Bridge [VP02] (0000:02)
[ 167.762081] ACPI: PCI Root Bridge [VP03] (0000:04)
[ 167.799356] ACPI: PCI Root Bridge [VP04] (0000:06)
[ 167.838090] ACPI: PCI Root Bridge [VP05] (0000:08)
[ 167.877162] ACPI: PCI Root Bridge [VP06] (0000:0a)
[ 167.915790] ACPI: PCI Root Bridge [VP07] (0000:0c)
[ 167.954746] SCSI subsystem initialized
[ 167.977515] usb, try "pci=routeirq". If it helps, post a report
[ 168.148438] PCI-DMA: Using Calgary IOMMU
[ 168.527814] Calgary: enabling translation on PHB 0
[ 168.556590] Calgary: errant DMAs will now be prevented on this bus.
[ 168.949497] Calgary: enabling translation on PHB 1
[ 168.978270] Calgary: errant DMAs will now be prevented on this bus.
[ 169.371449] Calgary: enabling translation on PHB 2
[ 169.400237] Calgary: errant DMAs will now be prevented on this bus.
[ 169.437927] PCI-GART: No AMD northbridge found.
[ 169.475527] NET: Registered protocol family 2
[ 169.556442] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 169.602331] TCP established hash table entries: 65536 (order: 9, 3670016 bytes)
[ 169.654323] TCP bind hash table entries: 32768 (order: 8, 1835008 bytes)
[ 169.697894] TCP: Hash tables configured (established 65536 bind 32768)
[ 169.737140] TCP reno registered
[ 169.778387] Total HugeTLB memory allocated, 0
[ 169.806340] Installing knfsd (copyright (C) 1996 [email protected]).
[ 169.845456] io scheduler noop registered
[ 169.869143] io scheduler anticip16
[ 170.031922] radeonfb: Found Intel x86 BIOS ROM Image
[ 170.076069] radeonfb: Retrieved PLL infos from BIOS
[ 170.105379] radeonfb: Reference=27.00 MHz (RefDiv=60) Memory=143.00 Mhz, System=143.00 MHz
[ 170.155008] radeonfb: PLL min 12000 max 35000
[ 170.289514] i2c_adapter i2c-1: unable to read EDID block.
[ 170.481376] i2c_adapter i2c-1: unable to read EDID block.
[ 170.673320] i2c_adapter i2c-1: unable to read EDID block.
[ 171.137184] i2c_adapter i2c-2: unable to read EDID block.
[ 171.329128] i2c_adapter i2c-2: unable to read EDID block.
[ 171.521072] i2c_adapter i2c-2: unable to read EDID block.
[ 171.675567] radeonfb: Monitor 1 type DFP found
[ 171.702256] radeonfb: EDID probed
[ 171.722209] radeonfb: Monitor 2 type CRT found
[ 172.780899] Console: switching to colour frame buffer device 128x48
[ 173.493268] radeonfb (0000:00:01.0): ATI Radeon QY
[ 173.526192] tridentfb: Trident framebuffer 0.7.8-NEWAPI initializing
[ 173.566008] hgafb: HGA card not detected.
[ 173.590340] hgafb: probe of hgafb.0 failed with error -22
[ 173.625574] vga16fb: mapped to 0xffff8100000a0000
[ 173.654233] fb1: VGA16 VGA frame buffer device
[ 173.682430] fb2: Virtual frame buffer device, using 1024K of video memory
[ 173.723852] ACPI: Power Button (FF) [PWRF]
[ 173.749277] ibm_acpi: ec object not found
[ 174.153281] Linux agpgart interface v0.101 (c) Dave Jones
[ 174.186181] ipmi message handler version 39.0
[ 174.212487] ipmi device interface
[ 174.212704] Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 seconds).
[ 174.212708] Hangcheck: Using monotonic_clock().
[ 174.212714] Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
[ 174.215329] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 174.232727] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[ 174.248340] RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
[ 174.253979] loop: loaded (max 8 devices)
[ 174.254880] ibmasm: IBM ASM Service Processor Driver version 1.0 loaded
[ 174.255602] GSI 17 sharing vector 0xB1 and IRQ 17
[ 174.255614] ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 18 (level, low) -> IRQ 17
[ 174.434473] 3c59x: Donald Becker and others. http://www.scyld.com/network/vortex.html
[ 174.434486] 0000:02:01.0: 3Com PCI 3c905C Tornado at ffffc20000042000.
[ 174.458336] tg3.c:v3.66 (September 23, 2006)
[ 174.458371] GSI 18 sharing vector 0xB9 and IRQ 18
[ 174.458382] ACPI: PCI Interrupt 0000:01:01.0[A] -> GSI 24 (level, low) -> IRQ 18
[ 174.651645] eth1: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:22
[ 174.651657] eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
[ 174.651661] eth1: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 174.651742] GSI 19 sharing vector 0xC1 and IRQ 19
[ 174.651751] ACPI: PCI Interrupt 0000:01:01.1[B] -> GSI 28 (level, low) -> IRQ 19
[ 174.947630] eth2: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:23
[ 174.947641] eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
[ 174.947645] eth2: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 174.948361] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 174.948368] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 174.948462] SvrWks CSB6: IDE controller at PCI slot 0000:00:0f.1
[ 174.948490] SvrWks CSB6: chipset revision 160
[ 174.948492] SvrWks CSB6: not 100% native mode: will probe irqs later
[ 174.948521] ide0: BM-DMA at 0x0700-0x0707, BIOS settings: hda:DMA, hdb:DMA
[ 174.948547] SvrWks CSB6: simplex device: DMA disabled
[ 174.948549] ide1: SvrWks CSB6 Bus-Master DMA disabled (BIOS)
[ 175.686861] hda: HL-DT-STDVD-ROM GDR8082N, ATAPI CD/DVD-ROM drive
[ 176.027298] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 176.598782] hda: ATAPI 24X DVD-ROM drive, 256kB Cache
[ 176.614367] Uniform CD-ROM driver Revision: 3.20
[ 176.742904] usbmon: debugfs is not available
[ 176.810874] GSI 20 sharing vector 0xC9 and IRQ 20
[ 176.880745] ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 20 (level, low) -> IRQ 20
[ 176.967527] ohci_hcd 0000:00:03.0: OHCI Host Controller
[ 177.042503] ohci_hcd 0000:00:03.0: new USB bus registered, assigned bus number 1
[ 177.129180] ohci_hcd 0000:00:03.0: irq 20, io mem 0xf2c10000
[ 177.292284] usb usb1: Product: OHCI Host Controller
[ 177.363518] usb usb1: Manufacturer: Linux 2.6.18mx ohci_hcd
[ 177.438777] usb usb1: SerialNumber: 0000:00:03.0
[ 177.508765] usb usb1: configuration #1 chosen from 1 choice
[ 177.584638] hub 1-0:1.0: USB hub found
[ 177.648464] hub 1-0:1.0: 2 ports detected
[ 177.819436] ACPI: PCI Interrupt 0000:00:03.1[B] -> GSI 20 (level, low) -> IRQ 20
[ 177.905381] ohci_hcd 0000:00:03.1: OHCI Host Controller
[ 177.977460] ohci_hcd 0000:00:03.1: new USB bus registered, assigned bus number 2
[ 178.062963] ohci_hcd 0000:00:03.1: irq 20, io mem 0xf2c11000
[ 178.223789] usb usb2: Product: OHCI Host Controller
[ 178.292928] usb usb2: Manufacturer: Linux 2.6.18mx ohci_hcd
[ 178.366239] usb usb2: SerialNumber: 0000:00:03.1
[ 178.367676] usb usb2: configuration #1 chosen from 1 choice
[ 178.368738] hub 2-0:1.0: USB hub found
[ 178.368756] hub 2-0:1.0: 2 ports detected
[ 178.7147] USB Universal Host Controller Interface driver v3.0
[ 178.639733] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 178.639821] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 178.685603] mice: PS/2 mouse device common for all mice
[ 178.705241] input: PC Speaker as /class/input/input0
[ 178.716390] input: AT Translated Set 2 keyboard as /class/input/input1
[ 178.725625] i2c /dev entries driver
[ 178.738285] i2c-parport: adapter type unspecified
[ 178.943983] i2c_adapter i2c-9191: Driver w83781d-isa failed to attach adapter, unregistering
[ 178.953019] i2c_adapter i2c-9191: Driver lm78-isa failed to attach adapter, unregistering
[ 178.959829] md: linear personality registered for level -1
[ 178.959836] md: raid0 personality registered for level 0
[ 178.959842] md: raid1 personality registered for level 1
[ 178.959847] md: multipath personality registered for level -4
[ 179.008855] IBM TrackPoint firmware: 0x0b, buttons: 3/3
[ 179.024974] input: TPPS/2 IBM TrackPoint as /class/input/input2
[ 179.789256] device-mapper: ioctl: 4.10.0-ioctl (2006-09-14) initialised: [email protected]
[ 179.874553] device-mapper: multipath: version 1.0.5 loaded
[ 179.941863] device-mapper: multipath round-robin: version 1.0.0 loaded
[ 180.014917] device-mapper: multipath emc: version 0.0.3 loaded
[ 180.083798] EDAC MC: Ver: 2.0.1 Oct 5 2006
[ 180.143574] pktgen v2.68: Packet Generator for packet performance testing.
[ 180.219216] u32 classifier
[ 180.269850] OLD policer on
[ 180.323042] IPv4 over IPv4 tunneling driver
[ 180.382659] GRE over IPv4 tunneling driver
[ 180.441335] TCP cubic registered
[ 180.494834] Initializing XFRM netlink socket
[ 180.554246] NET: Registered protocol family 1
[ 180.554260] NET: Registered protocol family 17
[ 180.554309] NET: Registered protocol family 15
[ 180.730291] 802.1Q VLAN Support v1.8 Ben Greear <[email protected]>
[ 180.802601] All bugs added by David S. Miller <[email protected]>
[ 180.905733] SCTP: Hash tables configured (established 37449 bind 37449)
[ 180.978800] Freeing unused kernel memory: 276k freed
running (1:0) /
[ 181.050177] do_IRQ: cannot handle IRQ -1
[ 181.105984] ----------- [cut here ] --------- [please bite here ] ---------
[ 181.180286] Kernel BUG at ...uli/w/iommu/calgary/linux/arch/x86_64/kernel/irq.c:118
[ 181.258051] invalid opcode: 0000 [1] SMP
[ 181.279092] CPU 1
[ 181.279094] Modules linked in:
[ 181.279098] Pid: 0, comm: swapper Not tainted 2.6.18mx #149
[ 181.279101] RIP: 0010:[<ffffffff8020c792>]

lspci:

0000:00:00.0 Host bridge: IBM: Unknown device 02a1 (rev 01)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 240
	Capabilities: [60] PCI-X non-bridge device.
		Command: DPERE- ERO- RBC=0 OST=1
		Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

0000:00:01.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE] (prog-if 00 [VGA])
	Subsystem: IBM: Unknown device 02c8
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 240 (2000ns min), cache line size 10
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at e8000000 (32-bit, prefetchable) [size=128M]
	Region 1: I/O ports at 1800 [size=256]
	Region 2: Memory at f2c00000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:03.0 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
	Subsystem: NEC Corporation USB
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 240 (250ns min, 10500ns max), cache line size 10
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at f2c10000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:03.1 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
	Subsystem: NEC Corporation USB
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 240 (250ns min, 10500ns max), cache line size 10
	Interrupt: pin B routed to IRQ 20
	Region 0: Memory at f2c11000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:03.2 USB Controller: NEC Corporation USB 2.0 (rev 04) (prog-if 20 [EHCI])
	Subsystem: NEC Corporation USB 2.0
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 240 (4000ns min, 8500ns max), cache line size 20
	Interrupt: pin C routed to IRQ 20
	Region 0: Memory at f2c12000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:0f.0 Host bridge: ServerWorks CSB6 South Bridge (rev a0)
	Subsystem: ServerWorks: Unknown device 0201
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 240

0000:00:0f.1 IDE interface: ServerWorks CSB6 RAID/IDE Controller (rev a0) (prog-if 82 [Master PriP])
Subsystem: ServerWorks: Unknown device 0212
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Region 0: I/O ports at <ignored>
Region 1: I/O ports at <ignored>
Region 2: I/O ports at <ignored>
Region 3: I/O ports at <ignored>
Region 4: I/O ports at 0700 [size=16]

0000:00:0f.3 ISA bridge: ServerWorks GCLE-2 Host Bridge
Subsystem: ServerWorks: Unknown device 0230
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0

0000:01:00.0 Host bridge: IBM: Unknown device 02a1 (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 240
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=1
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

0000:01:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
Subsystem: IBM: Unknown device 02e7
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 240 (16000ns min), cache line size 10
Interrupt: pin A routed to IRQ 24
Region 0: Memory at f2dc0000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
Address: 8e9e3fb7523479b4 Data: b19f

0000:01:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
Subsystem: IBM: Unknown device 02e7
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 240 (16000ns min), cache line size 10
Interrupt: pin B routed to IRQ 28
Region 0: Memory at f2dd0000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
Address: cffdfff793f7fddc Data: d6f7

0000:01:02.0 SCSI storage controller: Adaptec: Unknown device 041e (rev 08)
Subsystem: IBM: Unknown device 02e7
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 240 (32000ns min, 26750ns max), cache line size 10
Interrupt: pin A routed to IRQ 25
Region 0: Memory at f2d80000 (64-bit, non-prefetchable) [size=256K]
Region 2: Memory at f1400000 (64-bit, prefetchable) [size=128K]
Region 4: I/O ports at 2000 [size=256]
Capabilities: [40] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=3 OST=4
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
Capabilities: [58] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [e0] Message Signalled Interrupts: 64bit+ Queue=0/2 Enable-
Address: 0000000000000000 Data: 0000

0000:02:00.0 Host bridge: IBM: Unknown device 02a1 (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 240
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=1
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

0000:02:01.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 6c)
Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 240 (2500ns min, 2500ns max), cache line size 10
Interrupt: pin A routed to IRQ 18
Region 0: I/O ports at 2800 [size=128]
Region 1: Memory at f2e20000 (32-bit, non-prefetchable) [size=128]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-

0000:04:00.0 Host bridge: IBM: Unknown device 02a1 (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 240
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=1
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

0000:06:00.0 Host bridge: IBM: Unknown device 02a1 (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 240
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=1
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

0000:08:00.0 Host bridge: IBM: Unknown device 02a1 (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 240
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=1
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

0000:0a:00.0 Host bridge: IBM: Unknown device 02a1 (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 240
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=1
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

0000:0c:00.0 Host bridge: IBM: Unknown device 02a1 (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 240
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=1
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-


Cheers,
Muli


Attachments:
(No filename) (41.73 kB)
config.gz (8.36 kB)

2006-10-06 15:17:23

by Eric W. Biederman

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Muli Ben-Yehuda <[email protected]> writes:

> My x366 no longer boots with 2.6.19-rc1. The boot either hangs in
> uhci_hcd_init or dies with 'do_IRQ: cannot handle IRQ -1". Bisection
> says this one is bad:

Ok. So at least the second case is because some irq is being delivered
to a cpu that was not expecting it.

The hang case is weird because the kernel does not get told about
the irqs on your second ioapic.

When it gets the 'do_IRQ: cannot handle IRQ -1' how long
has the system been in user space? (It doesn't look like
init got started but that is hard to tell, shutting off irqbalanced
for testing purposes would be interesting)

Seeing the failure case is really weird because this early in boot
everything should be routed to cpu 0.

What happens if you boot with maxcpus=1?

The change the patch introduced was that we are now always
pointing irqs towards individual cpus, and not accepting an irq
if it comes into the wrong cpu.

The only hypothesis I have so far is that there may be an issue
with the x366 chipset ioapics that this patch reveals.

I would suspect a wider issue but in several months of testing
this is the first bug report I have seen.

If simple tests don't reveal what is going on then we will
have to instrument up that BUG and print out the per
cpu vector to irq tables, the cpu number, and the vector
the unexpected irq came in on.
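To make the failure mode concrete, here is a minimal user-space sketch in C (an illustration, not the kernel's implementation) of the per-cpu lookup described above: each cpu owns a 256-entry vector-to-irq table, and a vector arriving on a cpu whose table has no entry for it yields -1, the value in the "cannot handle IRQ -1" message. SKETCH_NR_CPUS and the helper names are hypothetical.

```c
#define NR_VECTORS     256
#define SKETCH_NR_CPUS 4   /* hypothetical cpu count for this sketch */

/* One vector->irq table per cpu; -1 means "no irq expected here". */
static int vector_irq[SKETCH_NR_CPUS][NR_VECTORS];

static void init_vector_tables(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		for (int vec = 0; vec < NR_VECTORS; vec++)
			vector_irq[cpu][vec] = -1;
}

/* Record that @irq is routed to @cpu and arrives there on @vector. */
static void bind_irq(int irq, int cpu, int vector)
{
	vector_irq[cpu][vector] = irq;
}

/* The dispatch-time lookup: the irq for (cpu, vector), or -1 if none. */
static int lookup_irq(int cpu, int vector)
{
	return vector_irq[cpu][vector];
}
```

If irq 20 is bound to cpu 0 on vector 0x41 but the hardware delivers it to cpu 1, lookup_irq(1, 0x41) returns -1.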

Eric

2006-10-06 15:50:26

by Muli Ben-Yehuda

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, Oct 06, 2006 at 09:14:53AM -0600, Eric W. Biederman wrote:

> Muli Ben-Yehuda <[email protected]> writes:
>
> > My x366 no longer boots with 2.6.19-rc1. The boot either hangs in
> > uhci_hcd_init or dies with 'do_IRQ: cannot handle IRQ -1". Bisection
> > says this one is bad:
>
> Ok. So at least the second case is because some irq is being delivered
> to a cpu that was not expecting it.
>
> The hang case is weird because the kernel does not get told about
> the irqs on your second ioapic.
>
> When it gets the 'do_IRQ: cannot handle IRQ -1' how long
> has the system been in user space? (It doesn't look like
> init got started but that is hard to tell, shutting off irqbalanced
> for testing purposes would be interesting)

In some cases we haven't made it to userspace at all. In others, we're
in the initrd.

> Seeing the failure case is really weird because this early in boot
> everything should be routed to cpu 0.
>
> What happens if you boot with maxcpus=1?

Trying it now... woohoo, it boots all the way and stays up!

> The change the patch introduced was that we are now always
> pointing irqs towards individual cpus, and not accepting an irq
> if it comes into the wrong cpu.
>
> The only hypothesis I have so far is that there may be an issue
> with the x366 chipset ioapics that this patch reveals.
>
> I would suspect a wider issue but in several months of testing
> this is the first bug report I have seen.

I'm trying to find out if other x366's are also seeing it.

> If simple tests don't reveal what is going on then we will
> have to instrument up that BUG and print out the per
> cpu vector to irq tables, the cpu number, and the vector
> the unexpected irq came in on.

I'm certainly game for any debugging you have in mind - this is my
main Calgary development machine so getting it booting is a pretty
high priority :-)

Cheers,
Muli

2006-10-06 16:03:27

by Linus Torvalds

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"



On Fri, 6 Oct 2006, Eric W. Biederman wrote:
>
> The change the patch introduced was that we are now always
> pointing irqs towards individual cpus, and not accepting an irq
> if it comes into the wrong cpu.

I think we should just revert that thing. I don't think there is any real
reason to force irq's to specific cpu's: the vectors haven't been _that_
problematic a resource, and being limited to just 200+ possible vectors
globally really hasn't been a real problem once we started giving out the
vectors more sanely.

And the new code clearly causes problems, and it seems to limit our use of
irq's in fairly arbitrary ways. It also would seem to depend on the irq
routing being both sane and reliable, something I'm not sure we should
rely on.

Also, I suspect the whole notion of only accepting an irq on one
particular CPU is fundamentally fragile. The irq delivery tends to be a
multi-phase process that we can't even _control_ from software (ie the irq
may be pending inside an APIC or a bridge chip or other system logic, so
things may be happening _while_ we possibly try to change the cpu
delivery).

So how about just reverting that change?

Linus

2006-10-06 16:21:00

by Muli Ben-Yehuda

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, Oct 06, 2006 at 05:50:21PM +0200, Muli Ben-Yehuda wrote:

> > What happens if you boot with maxcpus=1?
>
> Trying it now... woohoo, it boots all the way and stays up!

Ok, after verifying that maxcpus=1 causes the problematic changeset to
boot, I also tried maxcpus=1 with the tip of the tree. I hit this NULL
pointer dereference in profile_tick, with and without
maxcpus=1. Disassembly says that get_irq_regs() is returning NULL,
which may or may not be related to the genirq issue.

kernel (hd0,1)/boot/calgary/bzImage root=/dev/sda2 console=tty0 console=ttyS1,19200 maxcpus=1
[Linux-bzImage, setup=0x1c00, size=0x2e44df]
initrd (hd0,1)/boot/calgary/aic94xxfw.initramfs.gz
[Linux-initrd @ 0x37e3f000, 0x1b0188 bytes]
savedefault

[ 0.000000] Linux version 2.6.19-rc1mx (muli@rhun) (gcc version 3.4.1) #154 SMP Fri Oct 6 17:57:51 IST 2006
[ 0.000000] Command line: root=/dev/sda2 console=tty0 console=ttyS1,19200 maxcpus=1
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 0000000000099000 (usable)
[ 0.000000] BIOS-e820: 00 (ACPI data)
[ 0.000000] BIOS-e820: 00000000e7fa6a40 - 00000000e8000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000198000000 (usable)
[ 0.000000] end_pfn_map = 1671168
[ 0.000000] DMI 2.3 present.
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] DMA32 4096 -> 1048576
[ 0.000000] Normal 1048576 -> 1671168
[ 0.000000] early_node_map[3] active PFN ranges
[ 0.000000] 0: 0 -> 153
[ 0.000000] 0: 256 -> 950172
[ 0.000000] 0: 1048576 -> 1671168
[ 0.000000] ACPI: PM-Timer IO Port: 0x9c
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] Processor #1
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
[ 0.000000] Processor #6
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
[ 0.000000] Processor #7
[ 0.000000] ACPI: LAPIC_NMI APIC (id[0x0f] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 15, address 0xfec00000, GSI 0-35
[ 0.000000] ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[35])
[ 0.000000] IOAPIC[1]: apic_id 14, address 0xfec01000, GSI 35-70
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 8 low edge)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 low edge)
[ 0.000000] Setting APIC routing to flat
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] Nosave address range: 0000000000099000 - 00000000000a0000
[ 0.000000] Nosave address range: 00000000000a0000 - 00000000000e0000
[ 0.000000] Nosave address range: 00000000000e0000 - 0000000000100000
[ 0.000000] Nosave address range: 00000000e7f9c000 - 00000000e7f9d000
[ 0.000000] Nosave address range: 00000000e7f9d000 - 00000000e7fa6000
[ 0.000000] Nosave address range: 00000000e7fa6000 - 00000000e7fa7000
[ 0.000000] Nosave address range: 00000000e7fa7000 - 00000000e8000000
[ 0.000000] Nosave address range: 00000000e8000000 - 00000000fec00000
[ 0.000000] Nosave address range: 00000000fec00000 - 0000000100000000
[ 0.000000] Allocating PCI resources starting at ea000000 (gap: e8000000:16c00000)
[ 0.000000] PERCPU: Allocating 34432 bytes of per cpu data
[ 0.000000] Built 1 zonelists. Total pages: 1534050
[ 0.000000] Kernel command line: root=/dev/sda2 console=tty0 console=ttyS1,19200 maxcpus=1
[ 0.000000] Initializing CPU#0
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[ 166.530409] Console: colour VGA+ 80x25
[ 168.479402] L 1024
[ 168.629696] ... MAX_LOCKDEP_ENTRIES: 8192
[ 168.655898] ... MAX_LOCKDEP_CHAINS: 8192
[ 168.682097] ... CHAINHASH_SIZE: 4096
[ 168.708299] memory used by lock dependency info: 1328 kB
[ 168.74
[ 168.924070] PCI-DMA: Calgary IOMMU detected.
[ 168.949728] PCI-DMA: Calgary TCE table spec is 7, CONFIG_IOMMU_DEBUG is enabled.
[ 169.111284] Memory: 6096436k/6684672k available (3788k kernel code, 193708k reserved, 2727k data, 276k init)
[ 169.249201] Calibrating delay using timer specific routine.. 6346.40 BogoMIPS (lpj=12692802)
[ 169.300193] Mount-cache hash table entries: 256
[ 169.329043] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 169.360565] CPU: L2 cache: 1024K
[ 169.379968] using mwait in idle threads.
[ 169.403574] CPU: Physical Processor ID: 0
[ 169.427697] CPU: Processor Core ID: 0
[ 169.449745] CPU0: Thermal monitoring enabled (TM1)
[ 169.478556] Freeing SMP alternatives: 32k freed
[ 169.505811] ACPI: Core revision 20060707
[ 169.576566] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 169.651600] Using local APIC timer interrupts.
[ 169.709847] result 10425453
[ 169.726643] Detected 10.425 MHz APIC timer.
[ 169.753344] Brought up 1 CPUs
[ 169.771342] Unable to handle kernel NULL pointer dereference at 0000000000000088 RIP:
[ 169.804240] [<ffffffff8022de57>] profile_tick+0x34/0x6a
[ 169.851061] PGD 0
[ 169.863259] Oops: 0000 [1] SMP
[ 169.882391] CPU 0
[ 169.894607] Modules linked in:
[ 169.913117] Pid: 1, comm: swapper Not tainted 2.6.19-rc1mx #154
[ 169.948655] RIP: 0010:[<ffffffff8022de57>] [<ffffffff8022de57>] profile_tick+0x34/0x6a
[ 169.996876] RSP: 0000:ffffffff808d8f78 EFLAGS: 00010046
[ 170.028766] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 170.071615] RDX: ffff8100893f5f00 RSI: 0000000000000000 RDI: 0000000000000001
[ 170.114451] RBP: ffffffff808d8f88 R08: 0000000000000002 R09: ffffffff8022d24a
[ 170.157290] R10: ffffffff8022d24a R11: ffffffff80732780 R12: 0000000000000001
[ 170.200134] R13: ffffffff808e8d75 R14: 0000000000000246 R15: 0000000000000012
[ 170.242975] FS: 0000000000000000(0000) GS:ffffffff8085e000(0000) knlGS:0000000000000000
[ 170.291576] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 170.326102] CR2: 0000000000000088 CR3: 0000000000201000 CR4: 00000000000006e0
[ 170.368947] Process swapper (pid: 1, threadinfo ffff810197c7c000, task ffff810197c67040)
[ 170.417545] Stack: ffffffff807327c0 0000000000000012 ffffffff808d8f98 ffffffff8021507e
[ 170.466126] upt+0xe/0x54
[ 170.616246] [<ffffffff80215108>] smp_apic_timer_interrupt+0x44/0x4b
[ 170.654411] [<ffffffff8020a4fb>] apic_timer_interrupt+0x6b/0x70
[ 170.690486] <EOI> [<ffffffff8022d24a>] release_console_sem+0x47/0x200
[ 170.730351] [<ffffffff8022d24a>] release_console_sem+0x47/0x200
[ 18a65>] atomic_notifier_chain_register+0x33/0x3e
[ 170.937284] [<ffffffff808a3359>] spawn_softlockup_task+0x6a/0x6f
[ 170.973876] [<ffffffff80207116>] init+0xce/0x30c
[ 171.002152] [<ffffffff805afd00>] trace_hardirqs_on_thunk+0x35/0x37
[ 171.039796] [<ffffffff80244952>] trace_hardirqs_on+0xf6/0x11a
[ 171.074833] [<ffffffff8020a6e5>] child_rip+0xa/0x15
[ 171.104669] [<ffffffff805b0484>] _spin_unlock_irq+0x29/0x2f
[ 171.138667] [<ffffffff80209e5d>] restore_args+0x0/0x30
[ 171.170065] [<ffffffff80207048>] init+0x0/0x30c
[ 171.197821] [<ffffffff8020a6db>] child_rip+0x0/0x15
[ 171.227660]
[ 171.236634]
[ 171.236635] Code: f6 83 88 00 00 00 03 75 28 65 8b 04 25 24 00 00 00 0f a3 05
[ 171.291160] RIP [<ffffffff8022de57>] profile_tick+0x34/0x6a
[ 171.325282] RSP <ffffffff808d8f78>
[ 171.346230] CR2: 0000000000000088
[ 171.366141] <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 171.408520] <1>Unable to handle kernel NULL pointer d tainted 2.6.19-rc1mx #154
[ 171.587988] RIP: 0010:[<ffffffff8022de57>] [<ffffffff8022de57>] profile_tick+0x34/0x6a
[ 171.636213] RSP: 0000:ffffffff808d8ba8 EFLAGS: 00010046
[ 171.668098] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 171.710945] RDX: ffff8100893f5f00 RSI: 0000000000000001 RDI: 0000000000000001
[ 171.753781] RBP: ffffffff808d8bb8 R08: 0000000000000002 R09: ffffffff80214925
[ 171.796617] R10: ffffffff80732780 R11: ffffffff807307e0 R12: 0000000000000001
[ 171.839455] R13: ffffffff808d8ec8 R14: 0000000000000000 R15: 0000000000000009
[ 171.882291] FS: 0000000000000000(0000) GS:ffffffff8085e000(0000) knlGS:0000000000000000
[ 171.930894] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 171.965414] CR2: 0000000000000088 CR3: 0000000000201000 CR4: 00000000000006e0
[ 172.008261] Process swapper (pid: 1, threadinfo ffff810197c7c000, task ffff810197c67040)
[ 172.056859] Stack: ffffffff805eec32 0000000000000000 ffffffff808d8bc8 ffffffff8021507e
[ 172.105444] ffffffff808d8bd8 ffffffff80215108 ffffffff808d8bf0 ffffffff8020a4fb
[ 172.150309] ffffffff808d8bf0 ffffffff808d8c78 ffffffff807307e0 ffffffff80732780
[ 172.193985] Call Trace:
[ 172.209895] <IRQ> [<ffffffff8021507e>] smp_local_timer_interrupt+0xe/0x54
[ 172.251805] [<ffffffff80215108>] smp_apic_timer_interrupt+0x44/0x4b
[ 172.289972] [<ffffffff8020a4fb>] apic_timer_interrupt+0x6b/0x70
[ 172.326049] [<ffffffff80214925>] smp_send_stop+0x24/0x62
[ 172.358484] [<ffffffff8021495f>] smp_send_stop+0x5e/0x62
[ 172.390927] [<ffffffff8022c938>] panic+0xe0/0x195
[ 172.419721] [<ffffffff8022d24a>] release_console_sem+0x47/0x200
[ 172.455808] [<ffffffff8022d24a>] release_console_sem+0x47/0x200
[ 172.491892] [<ffffffff8022d24a>] release_console_sem+0x47/0x200
[ 172.527973] [<ffffffff8022f1b8>] do_exit+0x96/0x87e
[ 172.557810] [<ffffffff802f8022de57>] profile_tick+0x34/0x6a
[ 172.725730] [<ffffffff8022d24a>] release_console_sem+0x47/0x200
[ 172.761811] [<ffem+0x47/0x200
[ 172.914073] [<ffffffff8022d24a>] release_console_sem+0x47/0x200
[ 172.950159] [<ffffffff8022d72e>] vprintk+0x316/0x32e
[ 172.980515] [<ffffffff8022d731>] vprintk+0x319/0x32e
[ 173.010877] [<ffffffff8024426d>] mark_lock+0x8a/0x55d
[ 173.041758] [<ffffffff8022d7e8>] printk+0xa2/0xa4
[ 173.070560] [<ffffffff80238a65>] atomic_notifier_chain_register+0x33/0x3e
[ 173.111850] [<ffffffff808a3359>] spawn_softlockup_task+0x6a/0x6f
[ 173.148440] [<ffffffff80207116>] init+0xce/0x30c
[ 173.176714] [<ffffffff805afd00>] trace_hardirqs_on_thunk+0x35/0x37
[ 173.214309] [<ffffffff80244952>] trace_hardirqs_on+0xf6/0x11a
[ 173.249347] [<ffffffff8020a6e5>] child_rip+0xa/0x15
[ 173.279184] [<ffffffff805b0484>] _spin_unlock_irq+0x29/0x2f
[ 173.313189] [<ffffffff80209e5d>] restore_args+0x0/0x30
[ 173.344587] [<ffffffff80207048>] init+0x0/0x30c
[ 173.372347] [<ffffffff8020a6db>] child_rip+0x0/0x15
[ 173.402138]
[ 173.411114]
[ 173.411114] Code: f6 83 88 00 00 00 03 75 28 65 8b 04 25 24 00 0

2006-10-06 17:25:15

by Eric W. Biederman

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Linus Torvalds <[email protected]> writes:

> On Fri, 6 Oct 2006, Eric W. Biederman wrote:
>>
>> The change the patch introduced was that we are now always
>> pointing irqs towards individual cpus, and not accepting an irq
>> if it comes into the wrong cpu.
>
> I think we should just revert that thing. I don't think there is any real
> reason to force irq's to specific cpu's: the vectors haven't been _that_
> problematic a resource, and being limited to just 200+ possible vectors
> globally really hasn't been a real problem once we started giving out the
> vectors more sanely.

Forcing irqs to specific cpus is not something this patch adds. That
is the way the ioapic routes irqs. In certain rare circumstances today,
with fewer than 8 cpus and cpu hotplug disabled, you can route an irq
at multiple cpus, but those cases are rare and rarely used.
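The "fewer than 8 cpus" condition lines up with the flat logical APIC destination mode reported in the maxcpus=1 boot log earlier in this thread ("Setting APIC routing to flat"): there the destination is an 8-bit mask with one bit per cpu, so one irq can name several cpus, but only cpus 0 to 7 can be represented. A sketch of building such a mask follows; flat_dest_mask is a hypothetical helper, and tying it to the hotplug remark is an assumption, not something stated in the thread.

```c
#include <stdint.h>

/*
 * Flat logical destination mode: the destination field is an 8-bit
 * mask with one bit per cpu.  Returns the mask covering every cpu in
 * @cpus, or 0 if some cpu number cannot be represented (>= 8).
 */
static uint8_t flat_dest_mask(const int *cpus, int n)
{
	uint8_t mask = 0;

	for (int i = 0; i < n; i++) {
		if (cpus[i] < 0 || cpus[i] >= 8)
			return 0;
		mask |= (uint8_t)(1u << cpus[i]);
	}
	return mask;
}
```

For cpus {0, 1} this yields 0x03; a set containing cpu 9 has no flat-mode encoding and yields 0.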

What this patch does is allows us to take advantage of the way the hardware
works.

The benefit comes when we start assigning irq numbers, which we have been
artificially limiting to the number of vectors in the system. Currently
on x86_64 we have two levels of irq number translation and compression just
to get the irq number below 256; that code is fragile, hard to understand,
hard to test, and hard to maintain. This code you can test on practically
every x86_64 box out there.

The other benefit is that, in the form of MSI, hardware devices now have their
own irq controllers. Since those irqs don't have to go through individual
traces on the motherboard, we can start expecting to see systems that can
take advantage of a lot more irqs. On the bigger x86_64 systems today
we are just below the 224 vector limit.
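A rough sketch of the capacity argument, under simplifying assumptions: vectors 0-31 are reserved for CPU exceptions (leaving the 224 mentioned above), system vectors such as the timer and IPIs are ignored, and a hypothetical allocator simply fills one cpu's vectors before moving on to the next. With one global table the ceiling is 224 irqs; with per-cpu tables it grows with the number of cpus.

```c
#define FIRST_DEVICE_VECTOR 32    /* 0-31: CPU exceptions */
#define NR_VECTORS          256

struct irq_target {
	int cpu;
	int vector;
};

/*
 * Hypothetical allocator: hand out (cpu, vector) pairs, filling each
 * cpu's free vectors in turn.  *next_cpu / *next_vec hold the cursor
 * and must start at 0 / FIRST_DEVICE_VECTOR.  Returns 0 on success or
 * -1 once every cpu's vector space is used up.
 */
static int alloc_irq_vector(int ncpus, int *next_cpu, int *next_vec,
			    struct irq_target *out)
{
	if (*next_cpu >= ncpus)
		return -1;
	out->cpu = *next_cpu;
	out->vector = *next_vec;
	if (++*next_vec == NR_VECTORS) {
		*next_vec = FIRST_DEVICE_VECTOR;
		++*next_cpu;
	}
	return 0;
}
```

On a 4-cpu box this hands out 4 * 224 = 896 (cpu, vector) pairs before failing, versus 224 with a single global table.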

> And the new code clearly causes problems, and it seems to limit our use of
> irq's in fairly arbitrary ways. It also would seem to depend on the irq
> routing being both sane and reliable, something I'm not sure we should
> rely on.

Yes. A single problem over several months of testing has been found.

I'm not magic; I can't test all of the hardware out there, and hardware
invariably shows variation. All I ask is a chance to root-cause
this failure before we reject this code out of hand.

I have personally tested this on AMD and Intel small SMP systems, as
well as arranging some time on a big Unisys machine with 64 or 128
processors and testing it there. The patches were simple and self-contained
so people could use git bisect to find the problems. Plus the code has
been in -mm since about 2.6.18, since before kernel summit and OLS. I
did all that I could personally think of to make certain this code
would work. Is there something more I should have done?

As for the fairly arbitrary restrictions, I assume you mean the
one-cpu-per-irq thing. The infrastructure of this patch does not
fundamentally have this limitation. The per-cpu 256-entry
vector_irq table would have no problem describing an irq that
could show up on one of several cpus simultaneously. The vector
allocator currently doesn't support that because it seemed
pointless to implement a practically unused case.
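As a sketch of that point (an illustration under assumptions, not code from the patch): nothing in a per-cpu table layout prevents recording the same irq under the same vector on several cpus; only the allocator would need to find a vector that is free on all of them. MAX_CPUS_SKETCH and bind_irq_to_cpus are hypothetical.

```c
#define NR_VECTORS      256
#define MAX_CPUS_SKETCH 8     /* hypothetical */

static int vector_irq[MAX_CPUS_SKETCH][NR_VECTORS];

static void clear_vector_tables(void)
{
	for (int cpu = 0; cpu < MAX_CPUS_SKETCH; cpu++)
		for (int vec = 0; vec < NR_VECTORS; vec++)
			vector_irq[cpu][vec] = -1;
}

/*
 * Claim @vector for @irq on every cpu in @cpumask.  Fails with -1,
 * changing nothing, if the vector is already in use on any of them.
 */
static int bind_irq_to_cpus(int irq, unsigned int cpumask, int vector)
{
	for (int cpu = 0; cpu < MAX_CPUS_SKETCH; cpu++)
		if ((cpumask & (1u << cpu)) && vector_irq[cpu][vector] != -1)
			return -1;
	for (int cpu = 0; cpu < MAX_CPUS_SKETCH; cpu++)
		if (cpumask & (1u << cpu))
			vector_irq[cpu][vector] = irq;
	return 0;
}
```

Binding irq 20 to cpus 0 and 1 (mask 0x3) on vector 0x41 fills both tables; a later request for vector 0x41 on cpu 1 is refused because that slot is taken.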

It probably makes sense to have some kind of spurious irq handler
instead of a BUG if an irq comes in on the wrong cpu. I know I was
thinking about that at one time, but for some reason I never got around
to it. Even dropping the irq would likely have been better than
calling BUG, as the system would have booted.

I think I left the BUG there so that if there were problems in -mm,
the source of the problems would be immediately obvious and I would
get a bug report. But it never triggered, so I completely forgot
the BUG was there.
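A user-space sketch of the "drop it instead of BUG" alternative, with hypothetical names: count the unexpected vector, report it, and return without calling any handler. A real handler would still have to acknowledge the interrupt at the local APIC, since an unacknowledged interrupt keeps blocking interrupts of the same and lower priority.

```c
#include <stdio.h>

static unsigned long unexpected_irqs;   /* how many deliveries were dropped */

/*
 * Hypothetical dispatch step.  @irq is the result of the per-cpu
 * vector lookup (-1 when the cpu did not expect @vector).  Returns 1
 * if the interrupt would be handed to its handler, 0 if it was dropped.
 */
static int dispatch_vector(int irq, int vector, int cpu)
{
	if (irq < 0) {
		unexpected_irqs++;
		fprintf(stderr, "unexpected vector %d on cpu %d, dropped\n",
			vector, cpu);
		return 0;   /* the caller still acknowledges the APIC */
	}
	/* ... look up and run the handler for @irq here ... */
	return 1;
}
```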

> Also, I suspect the whole notion of only accepting an irq on one
> particular CPU is fundamentally fragile. The irq delivery tends to be a
> multi-phase process that we can't even _control_ from software (ie the irq
> may be pending inside an APIC or a bridge chip or other system logic, so
> things may be happening _while_ we possibly try to change the cpu
> delivery).

So this is fairly fundamentally an irq migration problem. If you
never change which cpu an irq is pointed at you don't have problems,
as there are no races.

I had a box going for several days just moving its timer irq and
several others from one cpu to another about once a second, to prove
to myself I had fixed all of the bugs and this code was safe.

The current irq migration logic does everything in the irq handler
after an irq has been received so we can avoid various kinds of races.

For level-triggered interrupts we disable the interrupt, acknowledge
it, migrate it, and then enable it.

For edge-triggered interrupts we disable the interrupt,
migrate it, reenable it, and then acknowledge it (which
clears the local APIC's state and allows the irq to be
received again).
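The two orderings above, written down as data so they are easy to compare. This is only a sketch: the step names are hypothetical labels, not kernel functions, and the real code performs each step against the IO-APIC and the local APIC.

```c
enum migrate_step { STEP_MASK, STEP_ACK, STEP_RETARGET, STEP_UNMASK };

/*
 * Fill @out with the order in which an irq is migrated from within
 * its own handler and return the number of steps.
 */
static int irq_migration_order(int level_triggered, enum migrate_step out[4])
{
	int n = 0;

	out[n++] = STEP_MASK;
	if (level_triggered) {
		out[n++] = STEP_ACK;        /* acknowledge while masked */
		out[n++] = STEP_RETARGET;
		out[n++] = STEP_UNMASK;
	} else {
		out[n++] = STEP_RETARGET;
		out[n++] = STEP_UNMASK;
		out[n++] = STEP_ACK;        /* last: lets the edge be received again */
	}
	return n;
}
```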

As far as I can tell the code is sane and works for good fundamental
reasons. But it is the code that actually touches something,
so I suspect it the most.

Eric

2006-10-06 17:49:23

by Eric W. Biederman

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Muli Ben-Yehuda <[email protected]> writes:

> On Fri, Oct 06, 2006 at 09:14:53AM -0600, Eric W. Biederman wrote:
>
>> Muli Ben-Yehuda <[email protected]> writes:
>
> In some cases we haven't made it to userspace at all. In others, we're
> in the initrd.

Ok. So no irqbalanced?
Is there any non-standard firmware on this box, like a hypervisor or weird APM
code, that could be causing problems?

I'm just trying to think of things that might trip over a change in
irq handling, besides a chipset.

I want to suspect the irq migration code but it doesn't look like
irqbalanced has started at all so irq migration doesn't appear to be
happening.


>> Seeing the failure case is really weird because this early in boot
>> everything should be routed to cpu 0.
>>
>> What happens if you boot with maxcpus=1?
>
> Trying it now... woohoo, it boots all the way and stays up!

Cool. So this is clearly about irqs being delivered to multiple
cpus, and getting the delivery messed up for some reason.

>> If simple tests don't reveal what is going on then we will
>> have to instrument up that BUG and print out the per
>> cpu vector to irq tables, the cpu number, and the vector
>> the unexpected irq came in on.
>
> I'm certainly game for any debugging you have in mind - this is my
> main Calgary development machine so getting it booting is a pretty
> high priority :-)

Sure. Anything that breaks irqs for 2.6.19 is clearly a big problem.

Can you try the debug patch below and tell me what it reports?
As long as the problem irq is not for something important this
should allow you to boot, and just collect the information.

What I am hoping is that we will see which irq or irqs are having
problems. Then we can check out how the irq controllers for those
irqs are programmed.

Eric


diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 506f27c..0bd4281 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -113,9 +113,20 @@ asmlinkage unsigned int do_IRQ(struct pt
 	irq = __get_cpu_var(vector_irq)[vector];
 
 	if (unlikely(irq >= NR_IRQS)) {
-		printk(KERN_EMERG "%s: cannot handle IRQ %d\n",
-					__FUNCTION__, irq);
-		BUG();
+		if (printk_ratelimit()) {
+			int cpu, vec;
+			printk(KERN_EMERG "%s: cannot handle IRQ %d vector: %d cpu: %d\n",
+				__FUNCTION__, irq, vector, smp_processor_id());
+			for_each_online_cpu(cpu) {
+				for (vec = 0; vec < NR_VECTORS; vec++) {
+					irq = per_cpu(vector_irq, cpu)[vec];
+					printk(KERN_DEBUG "vector_irq[%d][%d] -> %d\n",
+						cpu, vec, irq);
+				}
+			}
+		}
+		irq_exit();
+		return 1;
 	}
 
 #ifdef CONFIG_DEBUG_STACKOVERFLOW

2006-10-06 18:12:21

by Linus Torvalds

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"



On Fri, 6 Oct 2006, Eric W. Biederman wrote:
>
> Forcing irqs to specific cpus is not something this patch adds. That
> is the way the ioapic routes irqs.

What that patch adds is to make it an ERROR if some irq goes to an
unexpected cpu.

And that very much is wrong.

> Yes. A single problem over several months of testing has been found.

Umm. It got found the moment it became part of the standard tree.

The fact is, "months of testing" is not actually very much, if it's the
-mm tree. That's at best a "good vetting", but it really doesn't prove
anything.

> So this is fairly fundamentally an irq migration problem. If you
> never change which cpu an irq is pointed at you don't have problems,
> as there are no races.

So? Does that change the issue that this new model seems inherently racy?

> The current irq migration logic does everything in the irq handler
> after an irq has been received so we can avoid various kinds of races.

No. You don't understand, or you refuse to face the issue.

The races are in _hardware_, outside the CPU. The fact that we do things
in an irq handler doesn't seem to change a lot.

And what do you intend to do if it turns out that the reason it doesn't
work on x366 is that the _hardware_ just is incompatible with your model?

I'm not saying that's the case, and maybe there's some stupid bug that has
been overlooked, and maybe it can all work fine. But the new model _does_
seem to be at least _potentially_ fundamentally broken.

Linus

2006-10-06 18:51:34

by Eric W. Biederman

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Linus Torvalds <[email protected]> writes:

> On Fri, 6 Oct 2006, Eric W. Biederman wrote:
>>
>> Forcing irqs to specific cpus is not something this patch adds. That
>> is the way the ioapic routes irqs.
>
> What that patch adds is to make it an ERROR if some irq goes to an
> unexpected cpu.
>
> And that very much is wrong.

Agreed. Not recovering from an irq that hits the wrong cpu if we
can recover from it is a problem. That part must be fixed.

>> Yes. A single problem over several months of testing has been found.
>
> Umm. It got found the moment it became part of the standard tree.
>
> The fact is, "months of testing" is not actually very much, if it's the
> -mm tree. That's at best a "good vetting", but it really doesn't prove
> anything.

I'm not trying to prove anything, just saying that I tried.
All it shows is that there is an interesting subset of systems that
work.

The fact that the system that failed has a comparatively low-volume
chipset from IBM lets me entertain an atypical-hardware hypothesis.

>> So this is fairly fundamentally an irq migration problem. If you
>> never change which cpu an irq is pointed at you don't have problems,
>> as there are no races.
>
> So? Does that change the issue that this new model seems inherently racy?

If it is inherently racy (i.e. it cannot be fixed), I don't have a
problem removing the code.

>> The current irq migration logic does everything in the irq handler
>> after an irq has been received so we can avoid various kinds of races.
>
> No. You don't understand, or you refuse to face the issue.
>
> The races are in _hardware_, outside the CPU. The fact that we do things
> in an irq handler doesn't seem to change a lot.

(As an aside, the problem does not appear to be on the irq migration
path, because the kernel has not made it far enough for that to be
possible.)

I think I don't understand the race you see. I believe the premise
the irq migration code works under is that while an irq is pending,
a second irq will not be sent from the ioapic.

If that premise is true, and we disable that irq on the ioapic
while the irq is still pending, that should successfully prevent
the hardware from sending any further instances of that irq while we
manipulate its routing.

There are a few more details, but that is why I think that path is
safe.
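A toy model of the premise described above, with hypothetical names (struct ioapic_pin, pin_raise(), pin_migrate_in_handler(), pin_eoi()) and a deliberately simplified pin that refuses to send while masked or while a previous instance is still pending. Under that assumption, retargeting from inside the handler cannot deliver a new instance to the old cpu. Linus's objection is precisely that real hardware might not honor the premise (for example, a message already in flight), which this model does not capture.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 2

/* Hypothetical model of one IO-APIC pin; not the real register interface. */
struct ioapic_pin {
    bool masked;
    bool pending;                     /* delivered but not yet EOI'd */
    int  dest_cpu;
    int  deliveries[SKETCH_NR_CPUS];  /* interrupts delivered per CPU */
};

/* The premise: nothing is sent while the pin is masked or an irq is pending. */
static bool pin_raise(struct ioapic_pin *p)
{
    if (p->masked || p->pending)
        return false;
    p->pending = true;
    p->deliveries[p->dest_cpu]++;
    return true;
}

/* Migration done from the handler, i.e. while the current irq is still pending. */
static void pin_migrate_in_handler(struct ioapic_pin *p, int new_cpu)
{
    assert(p->pending);
    assert(new_cpu >= 0 && new_cpu < SKETCH_NR_CPUS);
    p->masked = true;       /* no further instances can be sent ...   */
    p->dest_cpu = new_cpu;  /* ... while the routing is being rewritten */
}

/* End of interrupt: clear the pending state and unmask with the new routing. */
static void pin_eoi(struct ioapic_pin *p)
{
    p->pending = false;
    p->masked = false;
}
```

Raising the pin, migrating it to cpu 1 from the handler, and then sending the EOI yields exactly one delivery per cpu; a raise attempted during the migration is refused.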

> And what do you intend to do if it turns out that the reason it doesn't
> work on x366 is that the _hardware_ just is incompatible with your
> model?

If the code is fundamentally unfixable the code must go.

> I'm not saying that's the case, and maybe there's some stupid bug that has
> been overlooked, and maybe it can all work fine. But the new model _does_
> seem to be at least _potentially_ fundamentally broken.

The BUG() certainly is, and I will work up a patch to get rid of it.
I'm hoping to understand how it could possibly happen before I fix
it, now that I have a reproducer of that condition, because that may
influence the fix. But dropping an irq on the floor is certainly
better than crashing the entire system.
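A sketch of the "drop it instead of crashing" behavior, using hypothetical names (dispatch_irq(), ratelimit_ok(), SKETCH_NR_IRQS) and a crude message budget in place of printk_ratelimit(). It only shows the control flow: an irq number outside the valid range (such as the -1 above) is counted and reported a limited number of times, and the function returns instead of calling BUG(). A real handler would presumably still need to acknowledge the local APIC; that step is left out here.

```c
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_NR_IRQS 224

/* Hypothetical counters standing in for real handler work. */
static unsigned long handled[SKETCH_NR_IRQS];
static unsigned long spurious_dropped;

/* Crude stand-in for printk_ratelimit(): allow only the first few messages. */
static bool ratelimit_ok(void)
{
    static int budget = 10;
    if (budget > 0) {
        budget--;
        return true;
    }
    return false;
}

/* Returns true if the irq was handled, false if it was dropped. */
static bool dispatch_irq(int cpu, int vector, int irq)
{
    if (irq < 0 || irq >= SKETCH_NR_IRQS) {
        if (ratelimit_ok())
            fprintf(stderr, "dispatch sketch: no irq for vector %d on cpu %d, dropping\n",
                    vector, cpu);
        spurious_dropped++;
        return false;  /* carry on instead of BUG() */
    }
    handled[irq]++;
    return true;
}
```

The translation unit has no main(); a caller would feed it the irq number looked up for the incoming vector.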

Eric

2006-10-06 19:01:19

by Andrew Vasquez

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, 06 Oct 2006, Muli Ben-Yehuda wrote:

> On Fri, Oct 06, 2006 at 05:50:21PM +0200, Muli Ben-Yehuda wrote:
>
> > > What happens if you boot with maxcpus=1?
> >
> > Trying it now... woohoo, it boots all the way and stays up!
>
> Ok, after verifying that maxcpus=1 causes the problematic changeset to
> boot, I also tried maxcpus=1 with the tip of the tree. I hit this NULL
> pointer dereference in profile_tick, with and without
> maxcpus=1. Disassembly says that get_irq_regs() is returning NULL,
> which may or may not be related to the genirq issue.
>
> kernel (hd0,1)/boot/calgary/bzImage root=/dev/sda2 console=tty0 console=ttyS1,19200 maxcpus=1
> [Linux-bzImage, setup=0x1c00, size=0x2e44df]
> initrd (hd0,1)/boot/calgary/aic94xxfw.initramfs.gz
> [Linux-initrd @ 0x37e3f000, 0x1b0188 bytes] savedefault
>
> [ 0.000000] Linux version 2.6.19-rc1mx (muli@rhun) (gcc version 3.4.1) #154 SMP Fri Oct 6 17:57:51 IST 2006
> [ 0.000000] Command line: root=/dev/sda2 console=tty0 console=ttyS1,19200 maxcpus=1
> [ 0.000000] BIOS-provided physical RAM map:
...
> [ 169.111284] Memory: 6096436k/6684672k available (3788k kernel code, 193708k reserved, 2727k data, 276k init)
> [ 169.249201] Calibrating delay using timer specific routine.. 6346.40 BogoMIPS (lpj=12692802)
> [ 169.300193] Mount-cache hash table entries: 256
> [ 169.329043] CPU: Trace cache: 12K uops, L1 D cache: 16K
> [ 169.360565] CPU: L2 cache: 1024K
> [ 169.379968] using mwait in idle threads.
> [ 169.403574] CPU: Physical Processor ID: 0
> [ 169.427697] CPU: Processor Core ID: 0
> [ 169.449745] CPU0: Thermal monitoring enabled (TM1)
> [ 169.478556] Freeing SMP alternatives: 32k freed
> [ 169.505811] ACPI: Core revision 20060707
> [ 169.576566] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> [ 169.651600] Using local APIC timer interrupts.
> [ 169.709847] result 10425453
> [ 169.726643] Detected 10.425 MHz APIC timer.
> [ 169.753344] Brought up 1 CPUs
> [ 169.771342] Unable to handle kernel NULL pointer dereference at 0000000000000088 RIP:
> [ 169.804240] [<ffffffff8022de57>] profile_tick+0x34/0x6a
> [ 169.851061] PGD 0
> [ 169.863259] Oops: 0000 [1] SMP
> [ 169.882391] CPU 0
> [ 169.894607] Modules linked in:
> [ 169.913117] Pid: 1, comm: swapper Not tainted 2.6.19-rc1mx #154
> [ 169.948655] RIP: 0010:[<ffffffff8022de57>] [<ffffffff8022de57>] profile_tick+0x34/0x6a
> [ 169.996876] RSP: 0000:ffffffff808d8f78 EFLAGS: 00010046
> [ 170.028766] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 170.071615] RDX: ffff8100893f5f00 RSI: 0000000000000000 RDI: 0000000000000001
> [ 170.114451] RBP: ffffffff808d8f88 R08: 0000000000000002 R09: ffffffff8022d24a
> [ 170.157290] R10: ffffffff8022d24a R11: ffffffff80732780 R12: 0000000000000001
> [ 170.200134] R13: ffffffff808e8d75 R14: 0000000000000246 R15: 0000000000000012
> [ 170.242975] FS: 0000000000000000(0000) GS:ffffffff8085e000(0000) knlGS:0000000000000000
> [ 170.291576] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 170.326102] CR2: 0000000000000088 CR3: 0000000000201000 CR4: 00000000000006e0
> [ 170.368947] Process swapper (pid: 1, threadinfo ffff810197c7c000, task ffff810197c67040)
> [ 170.417545] Stack: ffffffff807327c0 0000000000000012 ffffffff808d8f98 ffffffff8021507e
> [ 170.466126] upt+0xe/0x54
> [ 170.616246] [<ffffffff80215108>] smp_apic_timer_interrupt+0x44/0x4b
> [ 170.654411] [<ffffffff8020a4fb>] apic_timer_interrupt+0x6b/0x70
> [ 170.690486] <EOI> [<ffffffff8022d24a>] release_console_sem+0x47/0x200
> [ 170.730351] [<ffffffff8022d24a>] release_console_sem+0x47/0x200
> [ 18a65>] atomic_notifier_chain_register+0x33/0x3e
> [ 170.937284] [<ffffffff808a3359>] spawn_softlockup_task+0x6a/0x6f
> [ 170.973876] [<ffffffff80207116>] init+0xce/0x30c
> [ 171.002152] [<ffffffff805afd00>] trace_hardirqs_on_thunk+0x35/0x37
> [ 171.039796] [<ffffffff80244952>] trace_hardirqs_on+0xf6/0x11a
> [ 171.074833] [<ffffffff8020a6e5>] child_rip+0xa/0x15
> [ 171.104669] [<ffffffff805b0484>] _spin_unlock_irq+0x29/0x2f
> [ 171.138667] [<ffffffff80209e5d>] restore_args+0x0/0x30
> [ 171.170065] [<ffffffff80207048>] init+0x0/0x30c
> [ 171.197821] [<ffffffff8020a6db>] child_rip+0x0/0x15

Hmm, I'm seeing a similar boot-up panic on my x86_64 box. Here's the
boot-output (.config is attached).

[ 0.000000] Linux version 2.6.19-rc1 (root@spe) (gcc version 4.1.0 (SUSE Linux)) #5 SMP Fri Oct 6 09:49:14 PDT 2006
[ 0.000000] Command line: root=/dev/sda2 vga=1 resume=/dev/sda1 console=ttyS0,115200 console=tty0 nmi_watchdog=1
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009b800 (usable)
[ 0.000000] BIOS-e820: 000000000009b800 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000d0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000003ff10000 (usable)
[ 0.000000] BIOS-e820: 000000003ff10000 - 000000003ff17000 (ACPI data)
[ 0.000000] BIOS-e820: 000000003ff17000 - 000000003ff80000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000003ff80000 - 0000000040000000 (reserved)
[ 0.000000] BIOS-e820: 00000000e0000000 - 00000000e8000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
[ 0.000000] BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[ 0.000000] BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
[ 0.000000] end_pfn_map = 1048576
[ 0.000000] DMI present.
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] DMA32 4096 -> 1048576
[ 0.000000] Normal 1048576 -> 1048576
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0 -> 155
[ 0.000000] 0: 256 -> 261904
[ 0.000000] Nvidia board detected. Ignoring ACPI timer override.
[ 0.000000] ACPI: PM-Timer IO Port: 0x8008
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] Processor #1
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x03] address[0xdf300000] gsi_base[24])
[ 0.000000] IOAPIC[1]: apic_id 3, address 0xdf300000, GSI 24-27
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xdf301000] gsi_base[28])
[ 0.000000] IOAPIC[2]: apic_id 4, address 0xdf301000, GSI 28-31
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
[ 0.000000] ACPI: BIOS IRQ0 pin2 override ignored.
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.000000] Setting APIC routing to flat
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] Nosave address range: 000000000009b000 - 000000000009c000
[ 0.000000] Nosave address range: 000000000009c000 - 00000000000a0000
[ 0.000000] Nosave address range: 00000000000a0000 - 00000000000d0000
[ 0.000000] Nosave address range: 00000000000d0000 - 0000000000100000
[ 0.000000] Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
[ 0.000000] PERCPU: Allocating 32512 bytes of per cpu data
[ 0.000000] Built 1 zonelists. Total pages: 256711
[ 0.000000] Kernel command line: root=/dev/sda2 vga=1 resume=/dev/sda1 console=ttyS0,115200 console=tty0 nmi_watchdog=1
[ 0.000000] Initializing CPU#0
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[ 26.923435] Console: colour VGA+ 80x50
[ 27.230350] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 27.238291] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 27.245305] Checking aperture...
[ 27.248584] CPU 0: aperture @ 0 size 32 MB
[ 27.255250] No AGP bridge found
[ 27.276817] Memory: 1024988k/1047616k available (2405k kernel code, 22060k reserved, 955k data, 224k init)
[ 27.366087] Calibrating delay using timer specific routine.. 4423.17 BogoMIPS (lpj=8846344)
[ 27.374753] Mount-cache hash table entries: 256
[ 27.379624] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 27.386804] CPU: L2 Cache: 1024K (64 bytes/line)
[ 27.391486] Freeing SMP alternatives: 24k freed
[ 27.396100] ACPI: Core revision 20060707
[ 27.445859] activating NMI Watchdog ... done.
[ 27.450314] Using local APIC timer interrupts.
[ 27.500022] result 12557931
[ 27.502863] Detected 12.557 MHz APIC timer.
[ 27.510539] Booting processor 1/2 APIC 0x1
[ 27.514684] Unable to handle kernel NULL pointer dereference at 0000000000000088 RIP:
[ 27.520204] [<ffffffff80225fb0>] profile_tick+0x40/0x90
[ 27.528118] PGD 0
[ 27.530222] Oops: 0000 [1] SMP
[ 27.533505] CPU 0
[ 27.535610] Modules linked in:
[ 27.538755] Pid: 1, comm: swapper Not tainted 2.6.19-rc1 #5
[ 27.544367] RIP: 0010:[<ffffffff80225fb0>] [<ffffffff80225fb0>] profile_tick+0x40/0x90
[ 27.552483] RSP: 0000:ffffffff8059ff78 EFLAGS: 00010046
[ 27.557842] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 27.565024] RDX: ffff810081a77f40 RSI: 0000000000000046 RDI: 0000000000000001
[ 27.572203] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000007
[ 27.579383] R10: 0000000000000002 R11: ffffffff8032c8a0 R12: ffffffff805ae6a1
[ 27.586563] R13: 0000000000000012 R14: 0000000000000031 R15: 0000000000000246
[ 27.593744] FS: 0000000000000000(0000) GS:ffffffff80549000(0000) knlGS:0000000000000000
[ 27.601892] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 27.607678] CR2: 0000000000000088 CR3: 0000000000201000 CR4: 00000000000006e0
[ 27.614859] Process swapper (pid: 1, threadinfo ffff8100021fe000, task ffff8100021ee740)
[ 27.623008] Stack: ffffffff8059a6e0 ffffffff804e2700 ffff8100021ffb30 ffffffff80215c7e
[ 27.631313] 0000000000bf9e6b ffffffff802161b5 0000000000000000 ffffffff8020a6f6
[ 27.638953] ffff8100021ffb30 <EOI> 0000000000000000 ffffffff8032c8a0 0000000000000002
[ 27.647016] Call Trace:
[ 27.649745] <IRQ> [<ffffffff80215c7e>] smp_local_timer_interrupt+0xe/0x60
[ 27.656803] [<ffffffff802161b5>] smp_apic_timer_interrupt+0x35/0x40
[ 27.663205] [<ffffffff8020a6f6>] apic_timer_interrupt+0x66/0x70
[ 27.669256] <EOI> [<ffffffff8032c8a0>] vgacon_cursor+0x0/0x1c8
[ 27.675364] [<ffffffff802252fd>] vprintk+0x2fd/0x350
[ 27.680462] [<ffffffff805755a5>] init_idle+0x95/0xb0
[ 27.685558] [<ffffffff8022539e>] printk+0x4e/0x60
[ 27.690391] [<ffffffff8021ce7f>] complete+0x3f/0x60
[ 27.695396] [<ffffffff8056ed8f>] __cpu_up+0x40f/0x7d0
[ 27.700575] [<ffffffff80215a70>] do_fork_idle+0x0/0x20
[ 27.705843] [<ffffffff804573cf>] __mutex_lock_slowpath+0x1df/0x1f0
[ 27.712157] [<ffffffff802407f2>] cpu_up+0xa2/0x120
[ 27.717082] [<ffffffff802070bb>] init+0x9b/0x330
[ 27.721830] [<ffffffff804585d9>] _spin_unlock_irq+0x9/0x10
[ 27.727451] [<ffffffff802209bc>] schedule_tail+0x4c/0xc0
[ 27.732897] [<ffffffff8020a8e5>] child_rip+0xa/0x15
[ 27.737906] [<ffffffff8033171e>] acpi_ds_init_one_object+0x0/0x82
[ 27.744131] [<ffffffff80207020>] init+0x0/0x330
[ 27.748791] [<ffffffff8020a8db>] child_rip+0x0/0x15
[ 27.753795]
[ 27.755327]
[ 27.755328] Code: f6 83 88 00 00 00 03 75 37 65 8b 04 25 24 00 00 00 0f a3 05
[ 27.765157] RIP [<ffffffff80225fb0>] profile_tick+0x40/0x90
[ 27.770908] RSP <ffffffff8059ff78>
[ 27.774442] CR2: 0000000000000088
[ 27.777803] <1>Unable to handle kernel NULL pointer dereference at 0000000000000088 RIP:
[ 27.783718] [<ffffffff80225fb0>] profile_tick+0x40/0x90

Going to bisect now... Again, not sure if it's related to the irq
code.
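A worked check of the faulting address, assuming an LP64 build and that the 2.6.19 x86_64 struct pt_regs lists r15 first and ss last as mirrored below (struct pt_regs_sketch and user_mode_sketch() are hypothetical stand-ins). The cs field then sits at offset 17 * 8 = 0x88, which is the CR2 value in both oopses. The Code: bytes f6 83 88 00 00 00 03 decode to testb $0x3,0x88(%rbx), and RBX is 0 in both register dumps, which is consistent with profile_tick() doing a user_mode()-style test on regs->cs through the NULL pointer that get_irq_regs() returned.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the field order of the 2.6.19 x86_64 struct pt_regs. */
struct pt_regs_sketch {
    unsigned long r15, r14, r13, r12, rbp, rbx;
    unsigned long r11, r10, r9, r8, rax, rcx, rdx, rsi, rdi;
    unsigned long orig_rax;
    unsigned long rip, cs, eflags, rsp, ss;
};

/* user_mode()-style check: a nonzero privilege level in the low two bits of cs. */
static int user_mode_sketch(const struct pt_regs_sketch *regs)
{
    return (regs->cs & 3) != 0;
}

/* With regs == NULL, reading regs->cs is a load from address 0x88 (assumes 8-byte longs). */
static_assert(offsetof(struct pt_regs_sketch, cs) == 0x88, "cs expected at offset 0x88");
```

With the kernel code segment 0x0010 shown in the dumps, user_mode_sketch() returns 0; with a user code segment such as 0x33 it returns 1.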


Attachments:
(No filename) (11.71 kB)
conf2618 (24.26 kB)

2006-10-06 19:43:01

by Andrew Morton

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, 6 Oct 2006 12:00:39 -0700
Andrew Vasquez <[email protected]> wrote:

> [ 27.510539] Booting processor 1/2 APIC 0x1
> [ 27.514684] Unable to handle kernel NULL pointer dereference at 0000000000000088 RIP:
> [ 27.520204] [<ffffffff80225fb0>] profile_tick+0x40/0x90
> [ 27.528118] PGD 0
> [ 27.530222] Oops: 0000 [1] SMP
> [ 27.533505] CPU 0
> [ 27.535610] Modules linked in:
> [ 27.538755] Pid: 1, comm: swapper Not tainted 2.6.19-rc1 #5
> [ 27.544367] RIP: 0010:[<ffffffff80225fb0>] [<ffffffff80225fb0>] profile_tick+0x40/0x90
> [ 27.552483] RSP: 0000:ffffffff8059ff78 EFLAGS: 00010046
> [ 27.557842] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> [ 27.565024] RDX: ffff810081a77f40 RSI: 0000000000000046 RDI: 0000000000000001
> [ 27.572203] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000007


hm, we seem to have broken x86_64 completely.

smp_apic_timer_interrupt() needs to do

struct pt_regs *old_regs = set_irq_regs(regs);

on entry and

set_irq_regs(old_regs);

on exit.

But it doesn't get passed the pt_regs*

From my reading of `macro apicinterrupt' in arch/x86_64/kernel/entry.S,
smp_apic_timer_interrupt() actually _does_ get passed the pt_regs*, only it
doesn't declare it. I think - Andi would need to confirm.

If I'm right...


diff -puN arch/x86_64/kernel/apic.c~x86_64-irq_regs-fix arch/x86_64/kernel/apic.c
--- a/arch/x86_64/kernel/apic.c~x86_64-irq_regs-fix
+++ a/arch/x86_64/kernel/apic.c
@@ -913,8 +913,10 @@ void smp_local_timer_interrupt(void)
* [ if a single-CPU system runs an SMP kernel then we call the local
* interrupt as well. Thus we cannot inline the local irq ... ]
*/
-void smp_apic_timer_interrupt(void)
+void smp_apic_timer_interrupt(struct pt_regs *regs)
{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+
/*
* the NMI deadlock-detector uses this.
*/
@@ -934,6 +936,7 @@ void smp_apic_timer_interrupt(void)
irq_enter();
smp_local_timer_interrupt();
irq_exit();
+ set_irq_regs(old_regs);
}

/*
_
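A userspace sketch of the save/restore pattern in the patch above. The stand-ins for set_irq_regs() and get_irq_regs() follow the kernel's names and return types but keep a single global pointer instead of a per-CPU one, struct pt_regs is reduced to one field, and timer_interrupt() is a hypothetical model of the patched smp_apic_timer_interrupt(). Returning the previous pointer from set_irq_regs() and restoring it on exit keeps a nested interrupt from leaving a stale pointer behind for the handler it interrupted; a handler that never calls set_irq_regs() just sees whatever was there before, which early in boot is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the real register frame. */
struct pt_regs { unsigned long rip; };

/* Single "CPU" in this sketch; the kernel keeps one such pointer per CPU. */
static struct pt_regs *irq_regs;

static struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
    struct pt_regs *old_regs = irq_regs;
    irq_regs = new_regs;
    return old_regs;
}

static struct pt_regs *get_irq_regs(void)
{
    return irq_regs;
}

/* Records what a profile_tick()-like consumer would have dereferenced. */
static struct pt_regs *seen_by_profile_tick;

/* Models the patched handler: publish regs, do the work, restore the old pointer. */
static void timer_interrupt(struct pt_regs *regs)
{
    struct pt_regs *old_regs = set_irq_regs(regs);
    seen_by_profile_tick = get_irq_regs();
    set_irq_regs(old_regs);
}
```

If an outer handler has published its own frame, a nested timer_interrupt() sees the inner frame and, on return, the outer frame is visible again.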


2006-10-06 19:53:32

by Benjamin LaHaise

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, Oct 06, 2006 at 06:20:54PM +0200, Muli Ben-Yehuda wrote:
> On Fri, Oct 06, 2006 at 05:50:21PM +0200, Muli Ben-Yehuda wrote:
>
> > > What happens if you boot with maxcpus=1?
> >
> > Trying it now... woohoo, it boots all the way and stays up!
>
> Ok, after verifying that maxcpus=1 causes the problematic changeset to
> boot, I also tried maxcpus=1 with the tip of the tree. I hit this NULL
> pointer dereference in profile_tick, with and without
> maxcpus=1. Disassembly says that get_irq_regs() is returning NULL,
> which may or may not be related to the genirq issue.

I ran into this as well and managed to bisect it to the following commit:

7d12e780e003f93433d49ce78cfedf4b4c52adc5 is first bad commit
commit 7d12e780e003f93433d49ce78cfedf4b4c52adc5
Author: David Howells <[email protected]>
Date: Thu Oct 5 14:55:46 2006 +0100

IRQ: Maintain regs pointer globally rather than passing to IRQ handlers
...

-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[email protected]>.

2006-10-06 20:02:27

by Andrew Vasquez

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, 06 Oct 2006, Andrew Morton wrote:

> On Fri, 6 Oct 2006 12:00:39 -0700
> Andrew Vasquez <[email protected]> wrote:
>
> > [ 27.510539] Booting processor 1/2 APIC 0x1
> > [ 27.514684] Unable to handle kernel NULL pointer dereference at 0000000000000088 RIP:
> > [ 27.520204] [<ffffffff80225fb0>] profile_tick+0x40/0x90
> > [ 27.528118] PGD 0
> > [ 27.530222] Oops: 0000 [1] SMP
> > [ 27.533505] CPU 0
> > [ 27.535610] Modules linked in:
> > [ 27.538755] Pid: 1, comm: swapper Not tainted 2.6.19-rc1 #5
> > [ 27.544367] RIP: 0010:[<ffffffff80225fb0>] [<ffffffff80225fb0>] profile_tick+0x40/0x90
> > [ 27.552483] RSP: 0000:ffffffff8059ff78 EFLAGS: 00010046
> > [ 27.557842] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> > [ 27.565024] RDX: ffff810081a77f40 RSI: 0000000000000046 RDI: 0000000000000001
> > [ 27.572203] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000007
>
>
> hm, we seem to have broken x86_64 completely.
>
> smp_apic_timer_interrupt() needs to do
>
> struct pt_regs *old_regs = set_irq_regs(regs);
>
> on entry and
>
> set_irq_regs(old_regs);
>
> on exit.
>
> But it doesn't get passed the pt_regs*
>
> From my reading of `macro apicinterrupt' in arch/x86_64/kernel/entry.S,
> smp_apic_timer_interrupt() actually _does_ get passed the pt_regs*, only it
> doesn't declare it. I think - Andi would need to confirm.
>
> If I'm right...
>
>
> diff -puN arch/x86_64/kernel/apic.c~x86_64-irq_regs-fix arch/x86_64/kernel/apic.c
> --- a/arch/x86_64/kernel/apic.c~x86_64-irq_regs-fix
> +++ a/arch/x86_64/kernel/apic.c
> @@ -913,8 +913,10 @@ void smp_local_timer_interrupt(void)
> * [ if a single-CPU system runs an SMP kernel then we call the local
> * interrupt as well. Thus we cannot inline the local irq ... ]
> */
> -void smp_apic_timer_interrupt(void)
> +void smp_apic_timer_interrupt(struct pt_regs *regs)
> {
> + struct pt_regs *old_regs = set_irq_regs(regs);
> +
> /*
> * the NMI deadlock-detector uses this.
> */
> @@ -934,6 +936,7 @@ void smp_apic_timer_interrupt(void)
> irq_enter();
> smp_local_timer_interrupt();
> irq_exit();
> + set_irq_regs(old_regs);
> }
>
> /*

Patch appears to work.

At least I can now boot my x86_64 box.

2006-10-06 20:15:50

by Linus Torvalds

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"



On Fri, 6 Oct 2006, Andrew Morton wrote:
>
> From my reading of `macro apicinterrupt' in arch/x86_64/kernel/entry.S,
> smp_apic_timer_interrupt() actually _does_ get passed the pt_regs*, only it
> doesn't declare it. I think - Andi would need to confirm.

Yeah, I think you're right.

Anybody want to test Andrew's patch?

Linus

2006-10-06 20:17:11

by Linus Torvalds

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"



On Fri, 6 Oct 2006, Andrew Vasquez wrote:
>
> Patch appears to work.

Ahh, replied to Andrew too early.

Andrew, can you send that over with sign-off, and I'll apply it asap.

Linus

2006-10-06 20:20:03

by Andrew Morton

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, 6 Oct 2006 13:02:23 -0700
Andrew Vasquez <[email protected]> wrote:

> Patch appears to work.

OK, thanks - if Andi can confirm that this:

> -void smp_apic_timer_interrupt(void)
> +void smp_apic_timer_interrupt(struct pt_regs *regs)

really reflects reality then we're good to go.

2006-10-06 20:23:33

by Muli Ben-Yehuda

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, Oct 06, 2006 at 11:47:12AM -0600, Eric W. Biederman wrote:
> Muli Ben-Yehuda <[email protected]> writes:
>
> > On Fri, Oct 06, 2006 at 09:14:53AM -0600, Eric W. Biederman wrote:
> >
> >> Muli Ben-Yehuda <[email protected]> writes:
> >
> > In some cases we haven't made it to userspace at all. In other, we're
> > in the initrd.
>
> Ok. So no irqbalanced?

Nope.

> Any non-standard firmware on this box like a hypervisor or weird APM
> code that could be causing problems.

BIOS is bog standard and has been working fine for at least a
year. The only firmware I updated recently was the aic94xx firmware
when aic94xx was merged into mainline.

> I'm just trying to think of things that might trip over a change in
> irq handling, besides a chipset.

Looking at the code below, aic94xx is certainly suspect.

> Can you try the debug patch below and tell me what it reports.
> As long as the problem irq is not for something important this
> should allow you to boot, and just collect the information.

Unfortunately aic94xx is pretty important, but we do get a lot
further.

> What I am hoping is that we will see which irq or irqs are having
> problems. Then we can check out how the irq controller for those
> irq are programmed.

I had to slightly redo your patch to cut down on the verbosity (and
get the per-CPU vector arrays correctly). This is over Serial-Over-LAN,
which is painful beyond words and also tends to lose the most
interesting bits of the log. Sorry. Hopefully there's enough in here
to make progress.

patch I used (note: does not print vectors where IRQ is '-1'!):

diff -r fe0dbfd19a52 arch/x86_64/kernel/irq.c
--- a/arch/x86_64/kernel/irq.c Wed Oct 04 21:55:29 2006 +0700
+++ b/arch/x86_64/kernel/irq.c Fri Oct 06 22:02:45 2006 +0200
@@ -113,9 +113,21 @@ asmlinkage unsigned int do_IRQ(struct pt
irq = __get_cpu_var(vector_irq)[vector];

if (unlikely(irq >= NR_IRQS)) {
- printk(KERN_EMERG "%s: cannot handle IRQ %d\n",
- __FUNCTION__, irq);
- BUG();
+ if (printk_ratelimit()) {
+ int cpu, vec;
+ printk(KERN_EMERG "%s: cannot handle IRQ %d vector: %d cpu: %d\n",
+ __FUNCTION__, irq, vector, smp_processor_id());
+ for_each_online_cpu(cpu) {
+ for (vec = 0; vec < NR_VECTORS; vec++) {
+ irq = per_cpu(vector_irq, cpu)[vec];
+ if (irq != -1)
+ printk("v[%d][%d] -> %d\n",
+ cpu, vec, irq);
+ }
+ }
+ }
+ irq_exit();
+ return 1;
}

Boot log:

kernel (hd0,1)/boot/calgary/bzImage root=/dev/sda2 console=tty0 console=ttyS1,19200
[Linux-bzImage, setup=0x1c00, size=0x2e3a9e]
initrd (hd0,1)/boot/calgary/aic94xxfw.initramfs.gz
[Linux-initrd @ 0x37e3f000, 0x1b01ca bytes]
savedefault

[ 0.000000] Linux version 2.6.18mx (muli@rhun) (gcc version 3.4.1) #159 SMP Fri Oct 6 22:03:10 IST 2006
[ 0.000000] Command line: root=/dev/sda2 console=tty0 console=ttyS1,19200
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 0000000000099000 (usable)
[ 0.000000] BIOS-e820: 0000000000099000 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000e7f9c640 (usable)
[ 0.000000] BIOS-e820: 00000000e7f9c640 - 00000000e7fa6a40 (ACPI data)
[ 0.000000] BIOS-e820: 00000000e7fa6a40 - 00000000e8000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000198000000 (usable)
[ 0.000000] end_pfn_map = 1671168
[ 0.000000] DMI 2.3 present.
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] DMA32 4096 -> 1048576
[ 0.000000] Normal 1048576 -> 1671168
[ 0.000000] early_node_map[3] active PFN ranges
[ 0.000000] 0: 0 -> 153
[ 0.000000] 0: 256 -> 950172
[ 0.000000] 0: 1048576 -> 1671168
[ 0.000000] ACPI: PM-Timer IO Port: 0x9c
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] Processor #1
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
[ 0.000000] Processor #6
[ 0.000000] ACPI: LAPIC (acpix1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 15, address 0xfec00RC_OVR (bus 0 bus_irq 8 global_irq 8 low edge)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 low edge)
[ 0.000000] Setting APIC routing to flat
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] Nosave address range: 0000000000099000 - 00000000000a0000
[ 0.000000] Nosave address range: 00000000000a0000 - 00000000000e0000
[ 0.000000] Nosave address range: 00000000000e0000 - 0000000000100000
[ 0.000000] Nosave address range: 00000000e7f9c000 - 00000000e7f9d000
[ 0.000000] Nosave address range: 00000000e7f9d000 - 00000000e7fa6000
[ 0.000000] Nosave address range: 00000000e7fa6000 - 00000000e7fa7000
[ 0.000000] Nosave address range: 00000000e7fa7000 - 00000000e8000000
[ 0.000000] Nosave address range: 00000000e8000000 - 00000000fec00000
[ 0.000000] Nosave address range: 00000000fec00000 - 0000000100000000
[ 0.000000] Allocating PCI resources starting at ea000000 (gap: e8000000:16c00000)
78634] Console: colour VGA+ 80x25 34304 bytes of per cpu data
[ 145.314411] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 145.360930] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 145.386603] ... MAX_LOCK_DEPTH: 30
[ 145.411759] ... MAX_LOCKDEP_KEYS: 2048
[ 145.437952] ... CLASSHASH_SIZE: 1024
[ 145.464683] ... MAX_LOCKDEP_ENTRIES: 8192
[ 145.490864] ... MAX_LOCKDEP_CHAINS: 8192
[ 145.517060] ... CHAINHASH_SIZE: 4096
[ 145.543257] memory used by lock dependency info: 1328 kB
[ 145.575696] per task-struct memory footprint: 1680 bytes
[ 145.615363] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[ 145.670335] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 145.716748] Checking aperture...
[ 145.759143] PCI-DMA: Calgary IOMMU detected.
[ 145.784792] PCI-DMA: Calgary TCE table spec is 7, CONFIG_IOMMU_DEBUG is enabled.
[ 145.946885] Memory: 6096428k/6684672k available (3789k kernel code, 193716k reserved, 2726k data, 276k init)
[ 146.085394] Calibrating delay using timer specific routine.. 6346.33 BogoMIPS (lpj=12692676)
[ 146.136398] Mount-cache hash table entries: 256
[ 146.165244] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 146.196746] CPU: L2 cache: 1024K
[ 146.216144] using mwait in
[ 146.412642] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 146.487676] Using local APIC timer interrupts.
[ 146.545914] result 10425790
[ 146.562697] Detected 10.425 MHz APIC timer.
[ 146.590401] lockdep: not fixing up alternatives.
[ 146.618683] Booting processor 1/4 APIC 0x1
[ 146.653732] Initializing CPU#1
[ 146.733219] Calibrating delay using timer specific routine.. 6339.05 BogoMIPS (lpj=12678102)
[ 146.733236] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 146.733240] CPU: L2 cache: 1024K
[ 146.733244] CPU: Physical Processor ID: 0
[ 146.733246] CPU: Processor Core ID: 0
[ 146.733258] CPU1: Thermal monitoring enabled (TM1)
[ 146.733546] Intel(R) Xeon(TM) MP CPU 3.16GHz stepping 01
[ 146.737545] lockdep: not fixing up alternatives.
[ 146.999581] Booting processor 2/4 APIC 0x6
[ 147.034599] Initializing CPU#2
[ 147.113122] Calibrating delay using timer specific routine.. 6339.23 BogoMIPS (lpj=12678471)
[ 147.113135] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 147.113138] CPU: L2 cache: 1024K
[ 147.113141] CPU: Physical Processor ID: 3
[ 147.113143] CPU: Processor Core ID: 0
[ 147.113154] CPU2: Thermal monitoring enabled (TM1)
[ 147.113401] Intel(R) Pentium(R) 4 CPU 3.16GHz stepping 09
[ 147.117438] lockdep: not fixing up alternatives.
[ 147.379484] Booting processor 3/4 APIC 0x7
[ 147.414498] Initializing CPU#3
[ 147.493025] Calibrating delay using timer specific routine.. 6339.30 BogoMIPS (lpj=12678616)
[ 147.493039] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 147.493042] CPU: L2 cache: 1024K
[ 147.493045] CPU: Physical Processor ID: 3
[ 147.493047] CPU: Processor Core ID: 0
[ 147.493057] CPU3: Thermal monitoring enabled (TM1)
[ 147.493304] Intel(R) Pentium(R) 4 CPU 3.16GHz stepping 09
[ 147.497060] Brought up 4 CPUs
[ 147.749269] testing NMI watchdog ... OK.
[ 147.812984] time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
[ 147.850065] time.c: Detected 3169.464 MHz processor.
[ 148.098097] migration_cost=8,697
[ 148.118549] checking if image is initramfs... it is
[ 148.310397] Freeing initrd memory: 1728k freed
[ 148.340100] NET: Registered protocol family 16
[ 148.377174] ACPI: bus type pci registered
[ 148.401279] PCI: Using configuration type 1
[ 148.555966] ACPI: Interpreter enabled
[ 148.577976] ACPI: Using IOAPIC for interrupt routing
[ 148.614688] ACPI: PCI Root Bridge [VP00] (0000:00)
[ 148.646922] PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
[ 148.699359] ACPI: PCI Root Bridge [VP01] (0000:01)
[ 148.734053] ACPI: PCI Root Bridge [VP02] (0000:02)
[ 148.771953] ACPI: PCI Root Bridge [VP03] (0000:04)
[ 148.809918] ACPI: PCI Root Bridge [VP04] (0000:06)
[ 148.847903] ACPI: PCI Root Bridge [VP05] (0000:08)
[ 148.886043] ACPI: PCI Root Bridge [VP06] (0000:0a)
[ 148.923887] ACPI: PCI Root Bridge [VP07] (0000:0c)
[ 148.962138] SCSI subsystem initialized
[ 148.984887] usbcore: registered new interface driver usbfs
[ 149.018005] usbcore: registered new interface driver hub
[ 149.050080] usbcore: registered new device driver usb
[ 149.080874] PCI: Using ACPI for IRQ routing
[ 149.106052] PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
[ 149.155947] PCI-DMA: Using Calgary IOMMU
[ 149.535125] Calgary: enabling translation on PHB 0
[ 149.563893] Calgary: errant DMAs will now be prevented on this bus.
[ 149.956626] Calgary: enabling translation on PHB 1
[ 149.985385] Calgary: errant DMAs will now be prevented on this bus.
[ 150.378420] Calgary: enabling translation on PHB 2
[ 150.407200] Calgary: errant DMAs will now be prevented on this bus.
[ 150.444887] PCI-GART: No AMD northbridge found.
[ 150.481504] NET: Registered protocol family 2
[ 150.564490] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 150.610394] TCP established hash table entries: 65536 (order: 9, 3670016 bytes)
[ 150.662161] TCP bind hash table entries: 32768 (order: 8, 1835008 bytes)
[ 150.705585] TCP: Hash tables configured (established 65536 bind 32768)
[ 150.744845] TCP reno registered
[ 150.788016] Total HugeTLB memory allocated, 0
[ 150.815888] Installing knfsd (copyright (C) 1996 [email protected]).
[ 150.854967] io scheduler noop registered
[ 150.878661] io scheduler anticipatory registered (default)
[ 150.911820] io scheduler deadline registered
[ 150.937631] io scheduler cfq registered
[ 150.968969] GSI 16 sharing vector 0xA9 and IRQ 16
[ 150.997256] ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16
[ 151.042013] radeonfb: Found Intel x86 BIOS ROM Image
[ 151.084087] radeonfb: Retrieved PLL infos from BIOS
[ 151.113402] radeonfb: Reference=27.00 MHz (RefDiv=60) Memory=143.00 Mhz, System=143.00 MHz
[ 151.163038] radeonfb: PLL min 12000 max 35000
[ 151.293564] i2c_adapter i2c-1: unable to read EDID block.
[ 151.485429] i2c_adapter i2c-1: unable to read EDID block.
[ 151.677378] i2c_adapter i2c-1: unable to read EDID block.
[ 152.141253] i2c_adapter i2c-2: unable to read EDID block.
[ 152.333202] i2c_adapter i2c-2: unable to read EDID block.
[ 152.525151] i2c_adapter i2c-2: unable to read EDID block.
[ 152.679651] radeonfb: Monitor 1 type DFP found
[ 152.706339] radeonfb: EDID probed
[ 152.726291] radeonfb: Monitor 2 type CRT found
[ 153.789033] Console: switching to colour frame buffer device 128x48
[ 154.501204] radeonfb (0000:00:01.0): ATI Radeon QY
[ 154.533124] tridentfb: Trident framebuffer 0.7.8-NEWAPI initializing
[ 154.573158] hgafb: HGA card not detected.
[ 154.597485] hgafb: probe of hgafb.0 failed with error -22
[ 154.632874] vga16fb: mapped to 0xffff8100000a0000
[ 154.661533] fb1: VGA16 VGA frame buffer device
[ 154.690014] fb2: Virtual frame buffer device, using 1024K of video memory
[ 154.731323] ACPI: Power Button (FF) [PWRF]
[ 154.756904] ibm_acpi: ec object not found
[ 155.168952] Linux agpgart interface v0.101 (c) Dave Jones
[ 155.201821] ipmi message handler version 39.0
[ 155.228099] ipmi device interface
[ 155.248427] Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 seconds).
[ 155.302532] Hangcheck: Using monotonic_clock().
[ 155.329922] Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
[ 155.377566] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 155.414588] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[ 155.463482] RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
[ 155.515187] loop: loaded (max 8 devices)
[ 155.539151] ibmasm: IBM ASM Service Processor Driver version 1.0 loaded
[ 155.579273] GSI 17 sharing vector 0xB1 and IRQ 17
[ 155.607733] ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 18 (level, low) -> IRQ 17
[ 155.652707] 3c59x: Donald Becker and others. http://www.scyld.com/network/vortex.html
[ 155.652720] 0000:02:01.0: 3Com PCI 3c905C Tornado at ffffc20000042000.
[ 155.679897] tg3.c:v3.66 (September 23, 2006)
[ 155.679934] GSI 18 sharing vector 0xB9 and IRQ 18
[ 155.679943] ACPI: PCI Interrupt 0000:01:01.0[A] -> GSI 24 (level, low) -> IRQ 18
[ 155.820675] eth1: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:22
[ 155.820710] eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
[ 155.820737] eth1: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 155.821520] GSI 19 sharing vector 0xC1 and IRQ 19
[ 155.821530] ACPI: PCI Interrupt 0000:01:01.1[B] -> GSI 28 (level, low) -> IRQ 19
[ 155.987822] eth2: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:23
[ 155.987833] eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
[ 155.987837] eth2: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 155.988518] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 155.988524] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 155.988627] SvrWks CSB6: IDE controller at PCI slot 0000:00:0f.1
[ 155.988651] SvrWks CSB6: chipset revision 160
[ 155.988654] SvrWks CSB6: not 100% native mode: will probe irqs later
[ 155.988682] ide0: BM-DMA at 0x0700-0x0707, BIOS settings: hda:DMA, hdb:DMA
[ 155.988705] SvrWks CSB6: simplex device: DMA disabled
[ 155.988708] ide1: SvrWks CSB6 Bus-Master DMA disabled (BIOS)
[ 156.731022] hda: HL-DT-STDVD-ROM GDR8082N, ATAPI CD/DVD-ROM drive
[ 157.075385] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 157.650961] hda: ATAPI 24X DVD-ROM drive, 256kB Cache
[ 157.666573] Uniform CD-ROM driver Revision: 3.20
[ 157.795737] usbmon: debugfs is not available
[ 157.864486] GSI 20 sharing vector 0xC9 and IRQ 20
[ 157.934987] ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 20 (level, low) -> IRQ 20
[ 158.022162] ohci_hcd 0000:00:03.0: OHCI Host Controller
[ 158.097479] ohci_hcd 0000:00:03.0: new USB bus registered, assigned bus number 1
[ 158.184780] ohci_hcd 0000:00:03.0: irq 20, io mem 0xf2c10000
[ 158.348499] usb usb1: Product: OHCI Host Controller
[ 158.420345] usb usb1: Manufacturer: Linux 2.6.18mx ohci_hcd
[ 158.496076] usb usb1: SerialNumber: 0000:00:03.0
[ 158.566466] usb usb1: configuration #1 chosen from 1 choice
[ 158.642874] hub 1-0:1.0: USB hub found
[ 158.707016] hub 1-0:1.0: 2 ports detected
[ 158.879585] ACPI: PCI Interrupt 0000:00:03.1[B] -> GSI 20 (level, low) -> IRQ 20
[ 158.965660] ohci_hcd 0000:00:03.1: OHCI Host Controller
[ 159.037917] ohci_hcd 0000:00:03.1: new USB bus registered, assigned bus number 2
[ 159.123547] ohci_hcd 0000:00:03.1: irq 20, io mem 0xf2c11000
[ 159.288015] usb usb2: Product: OHCI Host Controller
[ 159.357481] usb usb2: Manufacturer: Linux 2.6.18mx ohci_hcd
[ 159.430967] usb usb2: SerialNumber: 0000:00:03.1
[ 159.432415] usb usb2: configuration #1 chosen from 1 choice
[ 159.433482] hub 2-0:1.0: USB hub found
[ 159.433500] hub 2-0:1.0: 2 ports detected
[ 159.2833] USB Universal Host Controller Interface driver v3.0
[ 159.705996] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 159.706135] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 159.750228] mice: PS/2 mouse device common for all mice
[ 159.769058] input: PC Speaker as /class/input/input0
[ 159.781128] input: AT Translated Set 2 keyboard as /class/input/input1
[ 159.791200] i2c /dev entries driver
[ 159.798292] do_IRQ: cannot handle IRQ -1 vector: 137 cpu: 1
[ 159.798299] v[0][32] -> 0
[ 159.798302] v[0][33] -> 1
[ 159.798308] v[0][34] -> 2
[ 159.798313] v[0][35] -> 3
[ 159.798318] v[0][36] -> 4
[ 159.798323] v[0][37] -> 5
[ 159.798328] v[0][38] -> 6
[ 159.798333] v[0][39] -> 7
[ 159.798338] v[0][40] -> 8
[ 159.798341] v[0][41] -> 9
[ 159.798345] v[0][42] -> 10
[ 159.798351] v[0][43] -> 11
[ 159.798356] v[0][44] -> 12
[ 159.798361] v[0][45] -> 13
[ 159.798366] v[0][46] -> 14
[ 159.798371] v[0][47] -> 15
[ 159.798376] v[0][49] -> 0
[ 159.798382] v[0][57] -> 1
[ 159.798387] v[0][65] -> 3
[ 159.798392] v[0][73] -> 4
[ 159.798397] v[0][81] -> 5
[ 159.798402] v[0][89] -> 6
[ 159.798407] v[0][97] -> 7
[ 159.798412] v[0][105] -> 8
[ 159.798417] v[0][113] -> 9
[ 159.798422] v[0][121] -> 10
[ 159.798427] v[0][129] -> 11
[ 159.798431] v[0][137] -> 12
[ 159.798436] v[0][145] -> 13
[ 159.798441] v[0][153] -> 14
[ 159.798446] v[0][161] -> 15
[ 159.798451] v[0][169] -> 16
[ 159.798456] v[0][177] -> 17
[ 159.798461] v[0][185] -> 18
[ 159.798465] v[0][193] -> 19
[ 159.798469] v[0][201] -> 20
[ 159.798475] v[1][32] -> 0
[ 159.798478] v[1][33] -> 1
[ 159.798482] v[1][34] -> 2
[ 159.798485] v[1][35] -> 3
[ 159.798489] v[1][36] -> 4
[ 159.798494] v[1][37] -> 5
[ 159.798497] v[1][38] -> 6
[ 159.798500] v[1][39] -> 7
[ 159.798503] v[1][40] -> 8
[ 159.798507] v[1][41] -> 9
[ 159.798512] v[1][42] -> 10
[ 159.798517] v[1][43] -> 11
[ 159.798522] v[1][44] -> 12
[ 159.798527] v[1][45] -> 13
[ 159.798532] v[1][46] -> 14
[ 159.798537] v[1][47] -> 15
[ 159.798544] v[2][32] -> 0
[ 159.798548] v[2][33] -> 1
[ 159.798553] v[2][34] -> 2
[ 159.798558] v[2][35] -> 3
[ 159.798564] v[2][36] -> 4
[ 159.798567] v[2][37] -> 5
[ 159.798571] v[2][38] -> 6
[ 159.798575] v[2][39] -> 7
[ 159.798581] v[2][40] -> 8
[ 159.798586] v[2][41] -> 9
[ 159.798589] v[2][[ 159.798632] v[3][35] -> 3
[ 159.798637] v[3][36] -> 4
[ 159.798642] v[3][37] -> 5
[ 159.798647] v[3][38] -> 6
[ 159.798653] v[3][39] -> 7
[ 159.798657] v[3][40] -> 8
[ 159.798663] v[3][41] -> 9
[ 159.798667] v[3][42] -> 10
[ 159.798670] v[3][43] -> 11
[ 159.798674] v[3][44] -> 12
[ 159.798679] v[3][45] -> 13
[ 159.798683] v[3][46] -> 14
[ 159.798688] v[3][47] -> 15
[ 159.804460] i2c-parport: adapter type unspecified
[ 160.009819] i2c_adapter i2c-9191: Driver w83781d-isa failed to attach adapter, unregistering
[ 160.018828] i2c_adapter i2c-9191: Driver lm78-isa failed to attach adapter, unregistering
[ 160.025448] md: linear personality registered for level -1
[ 160.025457] md: raid0 personality registered for level 0
[ 160.025461] md: raid1 personality registered for level 1
[ 160.025466] md: multipath personality registered for level -4
[ 163.205667] device-mapper: ioctl: 4.10.0-ioctl (2006-09-14) initialised: [email protected]
[ 163.268868] device-mapper: multipath: version 1.0.5 loaded
[ 163.314732] device-mapper: multipath round-robin: version 1.0.0 loaded
[ 163.366960] device-mapper: multipath emc: version 0.0.3 loaded
[ 163.415568] EDAC MC: Ver: 2.0.1 Oct 6 2006
[ 163.455463] pktgen v2.68: Packet Generator for packet performance testing.
[ 163.512367] u32 classifier
[ 163.544695] OLD policer on
[ 163.579860] IPv4 over IPv4 tunneling driver
[ 163.621848] GRE over IPv4 tunneling driver
[ 163.663831] TCP cubic registered
[ 163.700685] Initializing XFRM netlink socket
[ 163.744394] NET: Registered protocol family 1
[ 163.788355] NET: Registered protocol family 17
[ 163.833095] NET: Registered protocol family 15
[ 163.878044] 802.1Q VLAN Support v1.8 Ben Greear <[email protected]>
[ 163.937434] All bugs added by David S. Miller <[email protected]>
[ 164.033637] SCTP: Hash tables configured (established 37449 bind 37449)
[ 164.095781] Freeing unused kernel memory: 276k freed
running (1:0) /init
hello worl[ 164.161678] aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.2 loaded
d from the initrd1!


[ 164.359782] aic94xx: found Adaptec AIC-9410W SAS/SATA Host Adapter, device 0000:01:02.0
[ 164.431025] scsi0 : aic94xx
[ 164.474953] aic94xx: BIOS present (1,1), 1323
[ 164.525536] aic94xx: ue num:2, ue size:88
[ 164.592941] aic94xx: manuf sect SAS_ADDR 5005076a0112df00
[ 164.650431] aic94xx: manuf sect PCBA SN
[ 164.699353] aic94xx: ms: num_phy_desc: 8
[ 164.748439] aic94xx: ms: phy0: ENEBLEABLE
[ 164.798368] aic94xx: ms: phy1: ENEBLEABLE
[ 164.848359] aic94xx: ms: phy2: ENEBLEABLE
[ 164.898376] aic94xx: ms: phy3: ENEBLEABLE
[ 164.948424] aic94xx: ms: phy4: ENEBLEABLE
[ 164.998583] aic94xx: ms: phy5: ENEBLEABLE
[ 165.048668] aic94xx: ms: phy6: ENEBLEABLE
[ 165.098863] aic94xx: ms: phy7: ENEBLEABLE
[ 165.149062] aic94xx: ms: max_phys:0x8, num_phys:0x8
[ 165.204777] aic94xx: ms: enabled_phys:0xff
[ 165.268987] aic94xx: ctrla: phy0: sas_addr: 5005076a0112df00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[ 165.355790] aic94xx: ctrla: phy1: sas_addr: 5005076a0112df00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[ 165.442336] aic94xx: ctrla: phy2: sas_addr: 5005076a0112df00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[ 165.527401] aic94xx: ctrla: phy3: sas_addr: 5005076a0112df00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[ 165.611685] aic94xx: ctrla: phy4: sas_addr: 5005076a0112df00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[ 165.695125] aic94xx: ctrla: phy5: sas_addr: 5005076a0112df00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[ 165.778164] aic94xx: ctrla: phy6: sas_addr: 5005076a0112df00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[ 165.861369] aic94xx: ctrla: phy7: sas_addr: 5005076a0112df00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[ 165.944102] aic94xx: max_scbs:512, max_ddbs:128
[ 165.995003] aic94xx: setting phy0 addr to 5005076a0112df00
[ 166.051079] aic94xx: setting phy1 addr to 5005076a0112df00
[ 166.107028] aic94xx: setting phy2 addr to 5005076a0112df00
[ 166.162587] aic94xx: setting phy3 addr to 5005076a0112df00
[ 166.217456] aic94xx: setting phy4 addr to 5005076a0112df00
[ 166.271516] aic94xx: setting phy5 addr to 5005076a0112df00
[ 166.324662] aic94xx: setting phy6 addr to 5005076a0112df00
[ 166.377457] aic94xx: setting phy7 addr to 5005076a0112df00
[ 166.430105] aic94xx: num_edbs:21
[ 166.469378] aic94xx: num_escbs:3
[ 166.513157] aic94xx: using sequencer V17/10c6
[ 166.558549] aic94xx: downloading CSEQ...
[ 166.601389] aic94xx: dma-ing 8192 bytes
[ 166.647675] aic94xx: verified 8192 bytes, passed
[ 166.695042] aic94xx: downloading LSEQs...
[ 166.738834] aic94xx: dma-ing 14336 bytes
[ 166.788238] aic94xx: LSEQ0 verified 14336 bytes, passed
[ 166.844965] aic94xx: LSEQ1 verified 14336 bytes, passed
[ 166.901410] aic94xx: LSEQ2 verified 14336 bytes, passed
[ 166.957139] aic94xx: LSEQ3 verified 14336 bytes, passed
[ 167.011865] aic94xx: LSEQ4 verified 14336 bytes, passed
[ 167.065769] aic94xx: LSEQ5 verified 14336 bytes, passed
[ 167.119471] aic94xx: LSEQ6 verified 14336 bytes, passed
[ 167.172184] aic94xx: LSEQ7 verified 14336 bytes, passed
[ 167.241724] aic94xx: max_scbs:446
[ 167.276569] aic94xx: first_scb_site_no:0x20
[ 167.316600] aic94xx: last_scb_site_no:0x1fe
[ 167.356463] aic94xx: First SCB dma_handle: 0xd000
[ 167.400345] aic94xx: device 0000:01:02.0: SAS addr 5005076a0112df00, PCBA SN , 8 phys, 8 enabled phys, flash present, BIOS build 1323
[ 167.506200] aic94xx: posting 3 escbs
[ 167.546503] aic94xx: escbs posted
[ 167.591039] aic94xx: posting 8 control phy scbs
[ 167.637137] aic94xx: enabled phys
[ 167.640100] aic94xx: control_phy_tasklet_complete: phy0, lrate:0x9, proto:0xe
[ 167.640188] aic94xx: escb_tasklet_complete: phy0: BYTES_DMAED
[ 167.640383] aic94xx: SAS proto IDENTIFY:
[ 167.640386] aic94xx: 00: 10 00 00 08
[ 167.640388] aic94xx: 04: 00 00 00 00
[ 167.640390] aic94xx: 08: 00 00 00 00
[ 167.640392] aic94xx: 0c: 50 00 c5 00
[ 167.640394] aic94xx: 10: 00 32 f3 95
[ 167.640396] aic94xx: 14: 00 00 00 00
[ 167.640398] aic94xx: 18: 00 00 00 00
[ 167.640581] aic94xx: control_phy_tasklet_complete: phy4, lrate:0x9, proto:0xe
[ 167.640585] aic94xx: escb_tasklet_complete: phy4: BYTES_DMAED
[ 167.640588] aic94xx: SAS proto IDENTIFY:
[ 167.640590] aic94xx: 00: 10 00 00 08
[ 167.640592] aic94xx: 04: 00 00 00 00
[ 167.640594] aic94xx: 08: 00 00 00 00
[ 167.640596] aic94xx: 0c: 50 00 c5 00
[ 167.640598] aic94xx: 10: 00 32 f5 25
[ 167.640599] aic94xx: 14: 00 00 00 00
[ 167.640601] aic94xx: 18: 00 00 00 00
[ 167.640725] sas: phy0 added to port0, phy_mask:0x1
[ 167.641150] sas: phy4 added to port1, phy_mask:0x10
[ 167.647276] aic94xx: control_phy_tasklet_complete: phy1: no device present: oob_status:0x0
[ 167.647292] aic94xx: control_phy_tasklet_complete: phy2: no device present: oob_status:0x0
[ 167.647306] aic94xx: control_phy_tasklet_complete: phy3: no device present: oob_status:0x0
[ 167.647320] aic94xx: control_phy_tasklet_complete: phy5: no device present: oob_status:0x0
[ 167.647334] aic94xx: control_phy_tasklet_complete: phy6: no device present: oob_status:0x0
[ 167.647347] aic94xx: control_phy_tasklet_complete: phy7: no device present: oob_status:0x0
[ 167.647904] sas: DOING DISCOVERY on port 0, pid:1091
[ 167.660165] scsi 0:0:0:0: Direct-Access IBM-ESXS ST936701SS F B512 PQ: 0 ANSI: 4
[ 167.673935] SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
[ 167.675156] sda: Write Protect is off
[ 167.676755] SCSI device sda: drive cache: write through w/ FUA
[ 167.680184] SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
[ 167.681399] sda: Write Protect is off
[ 167.682899] SCSI device sda: drive cache: write through w/ FUA
[ 167.683098] sda: sda1 sda2
[ 169.583450] sd 0:0:0:0: Attached scsi disk sda
[ 169.636424] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 169.694797] sas: DONE DISCOVERY on port 0, pid:1091, result:0
[ 169.755353] sas: DOING DISCOVERY on port 1, pid:1091
[ 169.813547] scsi 0:0:1:0: Direct-Access IBM-ESXS ST936701SS F B512 PQ: 0 ANSI: 4
[ 169.889504] SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
[ 169.954627] sdb: Write Protect is off
[ 170.003087] SCSI device sdb: drive cache: write through w/ FUA
[ 170.003785] SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
[ 170.004997] sdb: Write Protect is off
[ 170.006496] SCSI device sdb: drive cache: write through w/ FUA
[ 170.006500] sdb: sdb1 sdb2
[ 170.066820] sd 0:0:1:0: Attached scsi disk sdb
[ 170.068203] sd 0:0:1:0: Attached scsi generic sg1 type 0
[ 170.068551] do_IRQ: cannot handle IRQ -1 vector: 209 cpu: 1
[ 170.068556] v[0][32] -> 0
[ 170.068559] v[0][33] -> 1
[ 170.068562] v[0][34] -> 2
[ 170.068564] v[0][35] -> 3
[ 170.068566] v[0][36] -> 4
[ 170.068569] v[0][37] -> 5
[ 170.068571] v[0][38] -> 6
[ 170.068573] v[0][39] -> 7
[ 170.068576] v[0][40] -> 8
[ 170.068578] v[0][41] -> 9
[ 170.068581] v[0][42] -> 10
[ 170.068583] v[0][43] -> 11
[ 170.068586] v[0][44] -> 12
[ 170.068588] v[0][45] -> 13
[ 170.068591] v[0][46] -> 14
[ 170.068594] v[0][47] -> 15
[ 170.068597] v[0][49] -> 0
[ 170.068600] v[0][57] -> 1
[ 170.068602] v[0][65] -> 3
[ 170.068605] v[0][73] -> 4
[ 170.068607] v[0][81] -> 5
[ 170.068610] v[0][89] -> 6
[ 170.068613] v[0][97] -> 7
[ 170.068615] v[0][105] -> 8
[ 170.068619] v[0][113] -> 9
[ 170.068622] v[0][121] -> 10
[ 170.068624] v[0][129] -> 11
[ 170.068627] v[0][137] -> 12
[ 170.068630] v[0][145] -> 13
[ 170.068633] v[0][153] -> 14
[ 170.068635] v[0][161] -> 15
[ 170.068638] v[0][169] -> 16
[ 170.068641] v[0][177] -> 17
[ 170.068644] v[0][185] -> 18
[ 170.068647] v[0][193] -> 19
[ 170.068650] v[0][201] -> 20
[ 170.068653] v[0][209] -> 21
[ 170.068656] v[1][32] -> 0
[ 170.068659] v[1][33] -> 1
[ 170.068662] v[1][34] -> 2
[ 170.068664] v[1][35] -> 3
[ 170.068667] v[1][36] -> 4
[ 170.068670] v[1][37] -> 5
[ 170.068673] v[1][38] -> 6
[ 170.068675] v[1][39] -> 7
[ 170.068678] v[1][40] -> 8
[ 170.068681] v[1][41] -> 9
[ 170.068684] v[1][42] -> 10
[ 170.068686] v[1][ 170.068713] v[2][36] -> 4
[ 170.068716] v[2][37] -> 5
[ 170.068719] v[2][38] -> 6
[ 170.068721] v[2][39] -> 7
[ 170.068724] v[2][40] -> 8
[ 170.068727] v[2][41] -> 9
[ 170.068729] v[2][42] -> 10
[ 170.068732] v[2][43] -> 11
[ 170.068735] v[2][44] -> 12
[ 170.068737] v[2][45] -> 13
[ 170.068740] v[2][46] -> 14
[ 170.068743] v[2][47] -> 15
[ 170.068748] v[3][32] -> 0
[ 170.068750] v[3][33] -> 1
[ 170.068753] v[3][34] -> 2
[ 170.068756] v[3][35] -> 3
[ 170.068758] v[3][36] -> 4
[ 170.068761] v[3][37] -> 5
[ 170.068763] v[3][38] -> 6
[ 170.068766] v[3][39] -> 7
[ 170.068768] v[3][40] -> 8
[ 170.068771] v[3][41] -> 9
[ 170.068773] v[3][42] -> 10
[ 170.068776] v[3][43] -> 11
[ 170.068779] v[3][44] -> 12
[ 170.068781] v[3][45] -> 13
[ 170.068784] v[3][46] -> 14
[ 170.068786] v[3][47] -> 15
[ 176.069363] sas: command 0xffff810196bc5e00, task 0xffff810196bc0c80, timed out: EH_NOT_HANDLED
[ 176.129298] sas: Enter sas_scsi_recover_host
[ 176.163414] sas: going over list...
[ 176.163417] sas: trying to find task 0xffff810196bc0c80
[ 176.163421] sas: sas_scsi_find_task: aborting task 0xffff810196bc0c80
[ 181.163990] aic94xx: tmf timed out
[ 181.194452] aic94xx: tmf came back
[ 181.225064] aic94xx: task not done, clearing nexus
[ 181.264643] aic94xx: asd_clear_nexus_index: PRE
[ 181.302787] aic94xx: asd_clear_nexus_index: POST
[ 181.341540] aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
[ 186.342601] aic94xx: asd_clear_nexus_timedout: here
[ 191.385250] aic94xx: came back from clear nexus
[ 191.425072] aic94xx: task not done, clearing nexus
[ 191.467109] aic94xx: asd_clear_nexus_index: PRE
[ 191.507704] aic94xx: asd_clear_nexus_index: POST
[ 191.548862] aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
[ 196.547861] aic94xx: asd_clear_nexus_timedout: here
[ 201.590511] aic94xx: came back from clear nexus
[ 201.632681] aic94xx: task 0xffff810196bc0c80 aborted, res: 0x5
[ 201.683243] sas: sas_scsi_find_task: querying task 0xffff810196bc0c80
[ 206.737126] aic94xx: tmf timed out
[ 206.774250] aic94xx: asd_initiate_ssp_tmf: converting result 0x5 to TMF_RESP_FUNC_FAILED
[ 206.840183] sas: sas_scsi_find_task: aborting task 0xffff810196bc0c80
[ 211.839757] aic94xx: tmf timed out
[ 211.878727] aic94xx: tmf came back
[ 211.917837] aic94xx: task not done, clearing nexus
[ 211.965846] aic94xx: asd_clear_nexus_index: PRE
[ 212.012452] aic94xx: asd_clear_nexus_index: POST
[ 212.059737] aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
[ 217.058357] aic94xx: asd_clear_nexus_timedout: here
[ 222.109005] aic94xx: came back from clear nexus
[ 222.157294] aic94xx: task not done, clearing nexus
[ 222.207692] aic94xx: asd_clear_nexus_index: PRE
[ 222.256718] aic94xx: asd_clear_nexus_index: POST
[ 222.306425] aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
[ 227.307607] aic94xx: asd_clear_nexus_timedout: here
[ 232.358255] aic94xx: came back from clear nexus
[ 232.409012] aic94xx: task 0xffff810196bc0c80 aborted, res: 0x5
[ 232.468159] sas: sas_scsi_find_task: querying task 0xffff810196bc0c80
[ 237.532862] aic94xx: tmf timed out
[ 237.578545] aic94xx: asd_initiate_ssp_tmf: converting result 0x5 to TMF_RESP_FUNC_FAILED
[ 237.652894] sas: sas_scsi_find_task: aborting task 0xffff810196bc0c80
[ 242.719470] aic94xx: tmf timed out
[ 242.766948] aic94xx: tmf came back
[ 242.766951] aic94xx: task not done, clearing nexus
[ 242.766953] aic94xx: asd_clear_nexus_index: PRE
[ 242.766962] aic94xx: asd_clear_nexus_index: POST
[ 242.766982] aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...

Cheers,
Muli

2006-10-06 20:42:42

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, Oct 06, 2006 at 01:02:23PM -0700, Andrew Vasquez wrote:

> > diff -puN arch/x86_64/kernel/apic.c~x86_64-irq_regs-fix arch/x86_64/kernel/apic.c
> > --- a/arch/x86_64/kernel/apic.c~x86_64-irq_regs-fix
> > +++ a/arch/x86_64/kernel/apic.c
> > @@ -913,8 +913,10 @@ void smp_local_timer_interrupt(void)
> > * [ if a single-CPU system runs an SMP kernel then we call the local
> > * interrupt as well. Thus we cannot inline the local irq ... ]
> > */
> > -void smp_apic_timer_interrupt(void)
> > +void smp_apic_timer_interrupt(struct pt_regs *regs)
> > {
> > + struct pt_regs *old_regs = set_irq_regs(regs);
> > +
> > /*
> > * the NMI deadlock-detector uses this.
> > */
> > @@ -934,6 +936,7 @@ void smp_apic_timer_interrupt(void)
> > irq_enter();
> > smp_local_timer_interrupt();
> > irq_exit();
> > + set_irq_regs(old_regs);
> > }
> >
> > /*
>
> Patch appears to work.
>
> At least I can now boot my x86_64 box.

Patch fixes the profile_tick() problem for me too. I can boot the tip
of the tree now provided I use maxcpus=1 to work around the genirq
bug.
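The quoted apic.c fix follows a save/restore pattern: the timer handler publishes its register frame through set_irq_regs() so that code it calls (the thread points at profile_tick()) can fetch it, and it restores the previous pointer on exit so that an interrupted outer handler sees its own frame again. The userspace sketch below models that pattern with a single global pointer standing in for the per-CPU variable; the struct, the helper names and `last_profiled_id` are illustrative stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for struct pt_regs; only an id for the demo. */
struct pt_regs { int id; };

/* Hypothetical model of the per-CPU "current irq regs" pointer (one CPU). */
static struct pt_regs *irq_regs_ptr;

/* Publish new regs and hand back the old pointer so the caller can restore it. */
static struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
	struct pt_regs *old = irq_regs_ptr;
	irq_regs_ptr = new_regs;
	return old;
}

static struct pt_regs *get_irq_regs(void)
{
	return irq_regs_ptr;
}

/* Records which frame the modelled profiling tick observed (-1 = none). */
static int last_profiled_id = -1;

static void profile_tick(void)
{
	/* Without set_irq_regs() in the handler this sees NULL or stale regs. */
	struct pt_regs *regs = get_irq_regs();
	last_profiled_id = regs ? regs->id : -1;
}

static void apic_timer_interrupt(struct pt_regs *regs)
{
	struct pt_regs *old_regs = set_irq_regs(regs);

	profile_tick();
	set_irq_regs(old_regs);	/* restore for any interrupted outer handler */
}
```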

Cheers,
Muli

2006-10-06 23:45:27

by Eric W. Biederman

[permalink] [raw]
Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Muli Ben-Yehuda <[email protected]> writes:

> On Fri, Oct 06, 2006 at 11:47:12AM -0600, Eric W. Biederman wrote:
>> Muli Ben-Yehuda <[email protected]> writes:
>>
>> > On Fri, Oct 06, 2006 at 09:14:53AM -0600, Eric W. Biederman wrote:
>> >
>> >> Muli Ben-Yehuda <[email protected]> writes:
>> >
>> > In some cases we haven't made it to userspace at all. In others, we're
>> > in the initrd.
>>
>> Ok. So no irqbalanced?
>
> Nope.
>
>> Any non-standard firmware on this box, like a hypervisor or weird APM
>> code, that could be causing problems?
>
> BIOS is bog standard and has been working fine for at least a
> year. The only firmware I updated recently was the aic94xx firmware
> when aic94xx was merged into mainline.
>
>> I'm just trying to think of things that might trip over a change in
>> irq handling, besides a chipset.
>
> Looking at the code below, aic94xx is certainly suspect.
>
>> Can you try the debug patch below and tell me what it reports.
>> As long as the problem irq is not for something important this
>> should allow you to boot, and just collect the information.
>
> Unfortunately aic94xx is pretty important, but we do get a lot
> further.

Yes.

>> What I am hoping is that we will see which irq or irqs are having
>> problems. Then we can check out how the irq controller for those
>> irq are programmed.
>
> I had to slightly redo your patch to cut down on the verbosity (and
> get the per-CPU vector arrays correctly). This is over Serial-Over-LAN
> which is painful beyond words and also tends to lose the most
> interesting bits of the log. Sorry. Hopefully there's enough in here
> to make progress.

Ok. A couple of interesting tidbits here.
The first is that simply returning is not enough to avoid this.
That may be because we are not acknowledging the irq.
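A toy model of why an early return may not be enough: in this sketch a local APIC delivers a vector only if its priority class (vector >> 4) is higher than that of the highest vector still marked in service, and only an EOI clears that mark. It assumes a task priority of 0, ignores the IRR and other hardware details, and every name in it is hypothetical; it only illustrates how one unacknowledged vector (137 from the log, class 8) can hold off later interrupts of the same or a lower class.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model of one local APIC's in-service register (ISR). */
struct lapic_model {
	bool in_service[256];
};

static int highest_in_service(const struct lapic_model *l)
{
	for (int v = 255; v >= 0; v--)
		if (l->in_service[v])
			return v;
	return -1;
}

/*
 * Accept a vector only if its priority class beats the class of the
 * highest in-service vector (TPR assumed to be 0 in this model).
 */
static bool lapic_deliver(struct lapic_model *l, int vector)
{
	int isrv = highest_in_service(l);

	if (isrv >= 0 && (vector >> 4) <= (isrv >> 4))
		return false;	/* real hardware would leave it pending */
	l->in_service[vector] = true;
	return true;
}

/* EOI clears the highest-priority in-service bit. */
static void lapic_eoi(struct lapic_model *l)
{
	int isrv = highest_in_service(l);

	if (isrv >= 0)
		l->in_service[isrv] = false;
}
```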

If I read your boot log right, you have four logical cpus but only two
sockets, and I think only two cores, the other two logical cpus
being hyperthreads.

The irq routing is behaving as I would expect: only cpu 0 is being
set up.
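The per-CPU lookup that produces `IRQ -1` can be sketched in userspace as below. It assumes, following the dumps in the log, that the legacy vectors 32..47 are mapped on every CPU while a device vector such as 137 (IRQ 12 in the CPU 0 table) is installed only on CPU 0; -1 marks an empty slot. The array and function names are illustrative, not the kernel's.

```c
#include <assert.h>

#define NR_CPUS_MODEL    4	/* four logical CPUs, as in the boot log */
#define NR_VECTORS_MODEL 256

/* Per-CPU vector -> irq table; -1 means no irq is bound on that CPU. */
static int vector_irq[NR_CPUS_MODEL][NR_VECTORS_MODEL];

static void init_tables(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		for (int v = 0; v < NR_VECTORS_MODEL; v++)
			vector_irq[cpu][v] = -1;
		/* Legacy vectors 32..47 -> IRQs 0..15 exist on every CPU. */
		for (int v = 32; v < 48; v++)
			vector_irq[cpu][v] = v - 32;
	}
}

/* Bind an irq to a vector on one CPU only, as if its domain were {cpu}. */
static void bind_vector(int cpu, int vector, int irq)
{
	vector_irq[cpu][vector] = irq;
}

/* The lookup do_IRQ() does: whatever the receiving CPU's table holds. */
static int lookup_irq(int cpu, int vector)
{
	return vector_irq[cpu][vector];
}
```

If vector 137 is delivered to CPU 1, as in the first `do_IRQ` message, the lookup in this model yields -1 even though the CPU 0 table maps it to IRQ 12.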

So what I need to inspect, and didn't provide the code for,
is how the ioapics are being programmed.

It looks like either we are programming them wrong, or
we can't actually control which logical cpu within a socket
gets an irq.

Here is a quick debug patch that, while overkill,
will show how we are programming the ioapic for these problem
interrupts.

Once I know how we have programmed the ioapics, I will know whether
this is a weird irq delivery condition or a bug in our ioapic programming.
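For reading the print_IO_APIC() output, it helps to recall the layout of a 64-bit IO-APIC redirection table entry as documented for the Intel 82093AA: vector in bits 0-7, delivery mode in bits 8-10, destination mode in bit 11, delivery status in bit 12, polarity in bit 13, remote IRR in bit 14, trigger mode in bit 15, mask in bit 16 and the destination in bits 56-63. The decoder below is a sketch based on that layout; the struct and function names are hypothetical, and the sample entry used to exercise it is made up rather than read from this machine. In logical destination mode the destination byte is matched against each local APIC's logical ID, so one entry can name more than one CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of one redirection table entry (hypothetical helper). */
struct rte_fields {
	unsigned vector;          /* bits 0-7 */
	unsigned delivery_mode;   /* bits 8-10: 0 fixed, 1 lowest priority, ... */
	unsigned dest_logical;    /* bit 11: 0 physical, 1 logical */
	unsigned delivery_status; /* bit 12: 1 = send pending */
	unsigned active_low;      /* bit 13: 1 = low-active input */
	unsigned remote_irr;      /* bit 14: level-triggered irq accepted, no EOI yet */
	unsigned level;           /* bit 15: 1 = level, 0 = edge */
	unsigned masked;          /* bit 16: 1 = masked */
	unsigned dest;            /* bits 56-63: APIC id or logical destination */
};

static struct rte_fields decode_rte(uint64_t e)
{
	struct rte_fields f;

	f.vector          = (unsigned)(e & 0xff);
	f.delivery_mode   = (unsigned)((e >> 8) & 0x7);
	f.dest_logical    = (unsigned)((e >> 11) & 0x1);
	f.delivery_status = (unsigned)((e >> 12) & 0x1);
	f.active_low      = (unsigned)((e >> 13) & 0x1);
	f.remote_irr      = (unsigned)((e >> 14) & 0x1);
	f.level           = (unsigned)((e >> 15) & 0x1);
	f.masked          = (unsigned)((e >> 16) & 0x1);
	f.dest            = (unsigned)((e >> 56) & 0xff);
	return f;
}
```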

I guess I need to start digging through the cpu documentation and errata
and see if I can find any hint of what I am seeing here.

Thanks for your help,
Eric





diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 91728d9..738cc97 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -778,7 +778,7 @@ void __init UNEXPECTED_IO_APIC(void)
{
}

-void __apicdebuginit print_IO_APIC(void)
+void print_IO_APIC(void)
{
int apic, i;
union IO_APIC_reg_00 reg_00;
diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index b8a407f..9dd0793 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -18,6 +18,7 @@ #include <linux/delay.h>
#include <asm/uaccess.h>
#include <asm/io_apic.h>
#include <asm/idle.h>
+#include <asm/hw_irq.h>

atomic_t irq_err_count;

@@ -115,9 +116,17 @@ asmlinkage unsigned int do_IRQ(struct pt
irq = __get_cpu_var(vector_irq)[vector];

if (unlikely(irq >= NR_IRQS)) {
- printk(KERN_EMERG "%s: cannot handle IRQ %d\n",
- __FUNCTION__, irq);
- BUG();
+ if (printk_ratelimit()) {
+ printk(KERN_EMERG "%s: cannot handle IRQ %d vector: %d cpu: %d\n",
+ __FUNCTION__, irq, vector, smp_processor_id());
+ irq = per_cpu(vector_irq, 0)[vector];
+ printk("v[0][%d] -> %d\n", vector, irq);
+ print_IO_APIC();
+ }
+ irq_exit();
+
+ set_irq_regs(old_regs);
+ return 1;
}

#ifdef CONFIG_DEBUG_STACKOVERFLOW

2006-10-07 08:03:28

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Fri, Oct 06, 2006 at 05:42:40PM -0600, Eric W. Biederman wrote:

> If I read your bootlog right. You have logical cpus, but only two
> sockets, and I think only two cores. The other two logical cpus
> being hyperthreaded.

Yes, 2 sockets, each of which is HT. Here's a /proc/cpuinfo from a
distro kernel:

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) MP CPU 3.16GHz
stepping : 1
cpu MHz : 3169.572
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl tm2 est cid cmpxchg16b
bogomips : 6225.92
clflush size : 64
cache_alignment : 128
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) MP CPU 3.16GHz
stepping : 1
cpu MHz : 3169.572
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl tm2 est cid cmpxchg16b
bogomips : 6324.22
clflush size : 64
cache_alignment : 128
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.16GHz
stepping : 9
cpu MHz : 3169.572
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 3
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl tm2 est cid cmpxchg16b
bogomips : 6324.22
clflush size : 64
cache_alignment : 128
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.16GHz
stepping : 9
cpu MHz : 3169.572
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 3
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl tm2 est cid cmpxchg16b
bogomips : 6324.22
clflush size : 64
cache_alignment : 128
address sizes : 40 bits physical, 48 bits virtual
power management:

> Thanks for your help,

Thank you!

Here's the slightly modified patch I used:

diff -r 7e996b460ee5 arch/x86_64/kernel/io_apic.c
--- a/arch/x86_64/kernel/io_apic.c Sat Oct 07 08:58:19 2006 +0200
+++ b/arch/x86_64/kernel/io_apic.c Sat Oct 07 09:22:50 2006 +0200
@@ -46,6 +46,9 @@
#include <asm/nmi.h>
#include <asm/msidef.h>
#include <asm/hypertransport.h>
+
+#undef KERN_DEBUG
+#define KERN_DEBUG ""

static int assign_irq_vector(int irq, cpumask_t mask);

@@ -778,7 +781,7 @@ void __init UNEXPECTED_IO_APIC(void)
{
}

-void __apicdebuginit print_IO_APIC(void)
+void print_IO_APIC(void)
{
int apic, i;
union IO_APIC_reg_00 reg_00;
@@ -786,8 +789,10 @@ void __apicdebuginit print_IO_APIC(void)
union IO_APIC_reg_02 reg_02;
unsigned long flags;

+#if 0
if (apic_verbosity == APIC_QUIET)
return;
+#endif /* 0 */

printk(KERN_DEBUG "number of MP IRQ sources: %d.\n", mp_irq_entries);
for (i = 0; i < nr_ioapics; i++)
diff -r 7e996b460ee5 arch/x86_64/kernel/irq.c
--- a/arch/x86_64/kernel/irq.c Sat Oct 07 08:58:19 2006 +0200
+++ b/arch/x86_64/kernel/irq.c Sat Oct 07 09:00:18 2006 +0200
@@ -18,6 +18,7 @@
#include <asm/uaccess.h>
#include <asm/io_apic.h>
#include <asm/idle.h>
+#include <asm/hw_irq.h>

atomic_t irq_err_count;

@@ -115,9 +116,17 @@ asmlinkage unsigned int do_IRQ(struct pt
irq = __get_cpu_var(vector_irq)[vector];

if (unlikely(irq >= NR_IRQS)) {
- printk(KERN_EMERG "%s: cannot handle IRQ %d\n",
- __FUNCTION__, irq);
- BUG();
+ if (printk_ratelimit()) {
+ printk(KERN_EMERG "%s: cannot handle IRQ %d vector: %d cpu: %d\n",
+ __FUNCTION__, irq, vector, smp_processor_id());
+ irq = per_cpu(vector_irq, 0)[vector];
+ printk("v[0][%d] -> %d\n", vector, irq);
+ print_IO_APIC();
+ }
+ irq_exit();
+
+ set_irq_regs(old_regs);
+ return 1;
}

#ifdef CONFIG_DEBUG_STACKOVERFLOW

And here are two logs from booting with the above patch applied to the tip
of the repository (each one is missing different interesting bits; thank
you, SOL!)

kernel (hd0,1)/boot/calgary/bzImage root=/dev/sda2 console=tty0 console=ttyS1,19200
[Linux-bzImage, setup=0x1c00, size=0x2e4590]
initrd (hd0,1)/boot/calgary/aic94xxfw.initramfs.gz
[Linux-initrd @ 0x37e3f000, 0x1b0794 bytes]
savedefault

[ 0.000000] Linux version 2.6.19-rc1mx (muli@rhun) (gcc version 3.4.1) #164 SMP Sat Oct 7 09:23:16 IST 2006
[ 0.000000] Command line: root=/dev/sda2 console=tty0 console=ttyS1,19200
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 0000000000099000 (usable)
[ 0.000000] BIOS-e820: 0000000000099000 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000e7f9c640 (usable)
[ 0.000000] BIOS-e820: 00000000e7f9c640 - 00000000e7fa6a40 (ACPI data)
[ 0.000000] BIOS-e820: 00000000e7fa6a40 - 00000000e8000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000198000000 (usable)
[ 0.000000] end_pfn_map = 1671168
[ 0.000000] DMI 2.3 present.
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] DMA32 4096 -> 1048576
[ 0.000000] Normal 1048576 -> 1671168
[ 0.000000] early_node_map[3] active PFN ranges
[ 0.000000] 0: 0 -> 153
[ 0.000000] 0: 256 -> 950172
[ 0.000000] 0: 1048576 -> 1671168
[ 0.000000] ACPI: PM-Timer IO Port: 0x9c
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] Processor #1
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
[ 0.000000] Processor #6
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
[ 0.000000] Processor #7
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl linec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 15, address 0xfec00000, GSI 0-35
[ 0.000000] ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[35])
[ 0.000000] IOAPIC[1]: apic_id 14, address 0xfec01000, GSI 35-70
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_iror SMP configuration information
[ 0.000000] Nosave address range: 0000000000099000 - 00000000000a0000
[ 0.000000] Nosave address range: 00000000000a0000 - 00000000000e0000
[ 0.000000] Nosave address range: 00000000000e0000 - 0000000000100000
[ 0.000000] Nosave address range: 00000000e7f9c000 - 00000000e7f9d000
[ 0.000000] Nosave address range: 00000000e7f9d000 - 00000000e7fa6000
[ 0.000000] Nosave address range: 00000000e7fa6000 - 00000000e7fa7000
[ 0.000000] Nosave address range: 00000000e7fa7000 - 00000000e8000000
[ 0.000000] Nosave address range: 00000000e8000000 - 00000000fec00000
[ 0.000000] Nosave address range: 00000000fec00000 - 0000000100000000
[ 0.000000] Allocating PCI resources starting at ea000000 (gap: e8000000:16c00000)
[ 0.000000] PERCPU: Allocating 34432 bytes of per cpu data
[ 0.000000] Built 1 zonelists. Total pages: 1534050
[ 0.000000] Kernel command line: root=/dev/sda2 console=tty0 console=ttyS1,192... MAX_LOCKDEP_SUBCLASSES: 8
[ 159.715176] ... MAX_LOCK_DEPTH: 30
[ 159.740333] ... MAX_LOCKDEP_KEYS: 2048
[ 159.766528] ... CLASSHASH_SIZE: 1024
[ 159.793244] ... MAX_LOCKDEP_ENTRIES: 8192
[ 159.819436] ... MAX_LOCKDEP_CHAINS: 8192
[ 159.845631] ... CHAINHASH_SIZE: 4096
[ 159.871823] memory used by lock dependency info: 1328 kB
[ 159.904261] per task-struct memory footprint: 1680 bytes
[ 159.943893] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[ 159.998800] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 160.045221] Checking aperture...
[ 160.087607] PCI-DMA: Calgary IOMMU detected.
[ 160.113316] PCI-DMA: Calgary TCE table spec is 7, CONFIG_IOMMU_DEBUG is enabled.
[ 160.274406] Memory: 6096436k/6684672k available (3789k kernel code, 193708k reserved, 2726k data, 276k init)
[ 160.411584] Calibrating delay using timer specific routine.. 6346.37 BogoMIPS (lpj=12692759)
[ 160.462605] Mount-cache hash table entries: 256
[ 160.491448] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 160.522964] CPU: L2 cache: 1024K
[ 160.542362] using mwait in idle threads.
[ 160.565962] CPU: Physical Processor ID: 0
[ 160.590081] CPU: Processor Core ID: 0
[ 160.612141] CPU0: Thermal monitoring enabled (TM1)
[ 160.640932] Freeing SMP alternatives: 32k freed
[ 160.668182] ACPI: Core revision 20060707
[ 160.738903] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 160.813953] Using local APIC timer interrupts.
[ 160.872189] result 10425635
[ 160.888969] Detected 10.425 MHz APIC timer.
[ 160.916601] lockdep: not fixing up alternatives.
[ 160.944889] Booting processor 1/4 APIC 0x1
[ 160.979898] Initializing CPU#1
[ 161.059409] Calibrating delay using timer specific routine.. 6339.03 BogoMIPS (lpj=12678074)
[ 161.059426] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 161.059430] CPU: L2 cache: 1024K
[ 161.059434] CPU: Physical Processor ID: 0
[ 161.059436] CPU: Processor Core ID: 0
[ 161.059448] CPU1: Thermal monitoring enabled (TM1)
[ 161.059723] Intel(R) Xeon(TM) MP CPU 3.16GHz stepping 01
[ 161.063730] lockdep: not fixing up alternatives.
[ 161.325811] Booting processor 2/4 APIC 0x6
[ 161.360829] Initializing CPU#2
[ 161.439312] Calibrating delay using timer specific routine.. 6339.24 BogoMIPS (lpj=12678480)
[ 161.439325] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 161.439328] CPU: L2 cache: 1024K
[ 161.439331] CPU: Physical Processor ID: 3
[ 161.439333] CP[ 161.740761] Initializing CPU#3
[ 161.819217] Calibrating delay using timer specific routine.. 6339.71 BogoMIPS (lpj=12679423)
[ 161.819231] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 161.819234] CPU: L2 cache: 1024K
[ 161.819238] CPU: Physical Processor ID: 3
[ 161.819239] CPU: Processor Core ID: 0
[ 161.819250] CPU3: Thermal monitoring enabled (TM1)
[ 161.819488] Intel(R) Pentium(R) 4 CPU 3.16GHz stepping 09
[ 161.823246] Brought up 4 CPUs
[ 162.075472] testing NMI watchdog ... OK.
[ 162.139199] time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
[ 162.176289] time.c: Detected 3169.430 MHz processor.
[ 162.388152] migration_cost=13,712
[ 162.409150] checking if image is initramfs... it is
[ 162.600804] Freeing initrd memory: 1729k freed
[ 162.630531] NET: Registered protocol family 16
[ 162.667595] ACPI: bus type pci registered
[ 162.691693] PCI: Using configuration type 1
[ 162.847444] ACPI: Interpreter enabled
[ 162.869451] ACPI: Using IOAPIC for interrupt routing
[ 162.905945] ACPI: PCI Root Bridge [VP00] (0000:00)
[ 162.938198] PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
[ 162.993446] ACPI: PCI Root Bridge [VP01] (0000:01)
[ 163.029072] ACPI: PCI Root Bridge [VP02] (0000:02)
[ 163.068114] ACPI: PCI Root Bridge [VP03] (0000:04)
[ 163.106797] ACPI: PCI Root Bridge [VP04] (0000:06)
[ 163.145750] ACPI: PCI Root Bridge [VP05] (0000:08)
[ 163.184685] ACPI: PCI Root Bridge [VP06] (0000:0a)
[ 163.223525] ACPI: PCI Root Bridge [VP07] (0000:0c)
[ 163.262317] SCSI subsystem initialized
[ 163.285134] usbcore: registered new interface driver usbfs
[ 163.318234] usbcore: registered new interface driver hub
[ 163.351294] usbcore: registered new device driver usb
[ 163.382771] PCI: Using ACPI for IRQ routing
[ 163.407931] PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
[ 163.457579] number of MP IRQ sources: 15.
[ 163.481692] number of IO-APIC #15 registers: 36.
[ 163.509453] number of IO-APIC #14 registers: 36.
[ 163.537210] testing the IO APIC.......................
[ 163.568090]
[ 163.577112emented: 0
[ 163.724332] ....... : IO APIC version: 0011
[ 163.751568] .... register #02: 00000000
[ 163.774648] ....... : arbitration: 00
[ 163.798768] .... IRQ redirection table:
[ 163.821847] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 163.858458] 00 000 00 1 0 0 0 0 0 0 00
[ 163.892078] 01 001 01 0 0 0 0 0 1 1 39
[ 163.925659] 02 001 01 1 0 0 0 0 0 0 20
[ 163.959243] 03 001 01 0 0 0 0 0 1 1 41
[ 163.992872] 04 001 01 0 0 0 0 0 1 1 49
[ 164.026458] 05 001 01 0 0 0 0 0 1 1 51
[ 164.060040] 06 001 01 0 0 0 0 0 1 1 59
[ 164.093622] 07 001 01 0 0 0 0 0 1 1 61
[ 164.127205] 08 001 01 0 0 0 1 0 1 1 69
[ 164.160786] 09 001 01 0 1 0 1 0 1 1 71
[ 164.194369] 0a 001 01 0 0 0 0 0 1 1 79
[ 164.227998] 0b 001 01 0 0 0 0 0 1 1 81
[ 164.261579] 0c 001 01 0 0 0 0 0 1 1 89
[ 164.295156] 0d 001 01 0 0 0 0 0 1 1 91
[ 164.328737] 0e 001 01 0 0 0 1 0 1 1 99
[ 164.362314] 0f 001 01 0 0 0 0 0 1 1 A1
[ 164.395893] 10 000 00 1 0 0 0 0 0 0 00
[ 164.429471] 11 000 00 1 0 0 0 0 0 0 00
[ 164.463053] 12 000 00 1 0 0 0 0 0 0 00
[ 164.496636] 13 000 00 1 0 0 0 0 0 0 00
[ 164.530213] 14 000 00 1 0 0 0 0 0 0 00
[ 164.563795] 15 000 00 1 0 0 0 0 0 0 00
[ 164.597377] 16 000 00 1 0 0 0 0 0 0 00
[ 164.765276] 1b 000 00 1 0 0 0 0 0 0 00
[ 164.798855] 1c 000 00 1 0 0 0 0 0 0 00
[ 164.832433] 1d 000 00 1 0 0 0 0 0 0 00
[ 164.866013] 1e 000 00 1 0 0 0 0 0 0 00
[ 164.899504] 23 000 00 1 0 0 0 0 0 0 00
[ 165.067483]
[ 165.076463] IO APIC #14......
[ 165.094301] .... register #00: 0E000000
[ 165.117332] ....... : physical APIC id: 0E
[ 165.143479] .... register #01: 00230011
[ 165.166557] ....... : max redirection entries: 0023
[ 165.197960] ....... : PRQ implemented: 0
[ 165.223633] ....... : IO APIC version: 0011
[ 165.250871] .... register #02: 00000000
[ 165.273945] ....... : arbitration: 00
[ 165.298014] .... IRQ redirection table:
[ 165.321092] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 165.357648] 00 000 00 1 0 0 0 0 0 0 00
[ 165.391269] 01 000 00 1 0 0 0 0 0 0 00
[ 165.424849] 02 000 00 1 0 0 0 0 0 0 00
[ 165.458428] 03 000 00 1 0 0 0 0 0 0 00
[ 165.492012] 04 000 00 1 0 0 0 0 0 0 00
[ 165.525591] 05 000 00 1 0 0 0 0 0 0 00
[ 165.559171] 06 000 00 1 0 0 0 0 0 0 00
[ 165.592749] 07 000 00 1 0 0 0 0 0 0 00
[ 165.626329] 08 000 00 1 0 0 0 0 0 0 00
[ 165.794220] 0d 000 00 1 0 0 0 0 0 0 00
[ 165.827796] 0e 000 00 1 0 0 0 0 0 0 00
[ 165.861376] 0f 000 00 1 0 0 0 0 0 0 00
[ 165.894954] 10 000 00 1 0 0 0 0 0 0 00
[ 165.928532] 11 000 00 1 0 0 0 0 0 0 00
[ 165.962110] 12 000 00 1 0 0 0 0 0 0 00
[ 165.995688] 13 000 00 1 0 0 0 0 0 0 00
[ 166.029266] 14 000 00 1 0 0 0 0 0 0 00
[ 166.062846] 15 000 00 1 0 0 0 0 0 0 00
[ 166.096425] 16 000 00 1 0 0 0 0 0 0 00
[ 166.130004] 17 000 00 1 0 0 0 0 0 0 00
[ 166.163587] 18 000 00 1 0 0 0 0 0 0 00
[ 166.197164] 19 000 00 1 0 0 0 0 0 0 00
[ 166.230745] 1a 000 00 1 0 0 0 0 0 0 00
[ 166.264325] 1b 000 00 1 0 0 0 0 0 0 00
[ 166.297910] 1c 000 00 1 0 0 0 0 0 0 00
[ 166.331541] 1d 000 00 1 0 0 0 0 0 0 00
[ 166.365124] 1e 000 00 1 0 0 0 0 0 0 00
[ 166.398704] 1f 000 00 1 0 0 0 0 0 0 00
[ 166.432284] 20 000 00 1 0 0 0 0 0 0 00
[ 166.465864] 21 000 00 1 0 0 0 0 0 0 00
[ 166.499446] 22 000 00 1 0 0 0 0 0 0 00
[ 166.533029] 23 000 00 1 0 0 0 0 0 0 00
[ 166.566608] IRQ to pin mappings:
[ 166.586000] IRQ0 -> 0:2
[ 166.600951] IRQ1 -> 0:1
[ 166.615924] IRQ3 -> 0:3
[ 166.630900] IRQ4 -> 0:4
[ 166.645870] IRQ5 -> 0:5
[ 166.660834] IRQ6 -> 0:6
[ 166.6758066.816774] .................................... done.
[ 166.847916] PCI-DMA: Using Calgary IOMMU
[ 167.226948] Calgary: enabling translation on PHB 0x0
[ 167.256752] Calgary: errant DMAs will now be prevented on this bus.
[ 167.649298] Calgary: enabling translation on PHB 0x1
[ 167.679145] Calgary: errant DMAs will now be prevented on this bus.
[ 168.072000] Calgary: enabling translation on PHB 0x2
[ 168.101803] Calgary: errant DMAs will now be prevented on this bus.
[ 168.139495] PCI-GART: No AMD northbridge found.
[ 168.177257] NET: Registered protocol family 2
[ 168.253624] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 168.299544] TCP established hash table entries: 65536 (order: 9, 3670016 bytes)
[ 168.352200] TCP bind hash table entries: 32768 (order: 8, 1835008 bytes)
[ 168.395982] TCP: Hash tables configured (established 65536 bind 32768)
[ 168.435240] TCP reno registered
[ 168.477799] Total HugeTLB memory allocated, 0
[ 168.505707] Installing knfsd (copyright (C) 199pt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16
[ 168.703200] radeonfb: Found Intel x86 BIOS ROM Image
[ 168.745203] radeonfb: Retrieved PLL infos from BIOS
[ 168.774491] radeonfb: Reference=27.00 MHz (RefDiv=60) Memory=143.00 Mhz, System=143.00 MHz
[ 168.824155] radeonfb: PLL min 12000 max 35000
[ 168.954703] i2c_adapter i2c-1: unable to read EDID block.
[ 169.146556] i2c_adapter i2c-1: unable to read EDID block.
[ 169.338499] i2c_adapter i2c-1: unable to read EDID block.
[ 169.802364] i2c_adapter i2c-2: unable to read EDID block.
[ 169.994307] i2c_adapter i2c-2: unable to read EDID block.
[ 170.186252] i2c_adapter i2c-2: unable to read EDID block.
[ 170.340745] radeonfb: Monitor 1 type DFP found
[ 170.367444] radeonfb: EDID probed
[ 170.387399] radeonfb: Monitor 2 type CRT found
[ 171.445892] Console: switching to colour frame buffer device 128x48
[ 172.157807] radeonfb (0000:00:01.0): ATI Radeon QY
[ 172.190527] tridentfb: Trident framebuffer 0.7.8-NEWAPI initializing
[ 172.230329] hgafb: HGA card not detected.
[ 172.254639] hgafb: probe of hgafb.0 failed with error -22
[ 172.289812] vga16fb: mapped to 0xffff8100000a0000
[ 172.319392] fb1: VGA16 VGA frame buffer device
[ 172.347627] fb2: Virtual frame buffer device, using 1024K of video memory
[ 172.389093] ACPI: Power Button (FF) [PWRF]
[ 172.414559] ibm_acpi: ec object not found
[ 172.826196] Linux agpgart interface v0.101 (c) Dave Jones
[ 172.859065] ipmi message handler version 39.0
[ 172.885336] ipmi device interface
[ 172.905675] Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 seconds).
[ 172.959801] Hangcheck: Using monotonic_clock().
[ 172.987184] Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
[ 173.034835] do_IRQ: cannot handle IRQ -1 vector: 73 cpu: 1
[ 173.034848] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 173.067882] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[ 173.144126] v[0][73] -> 4
[ 173.144131] number of MP IRQ sources: 15.
[ 173.185670] number of IO-APIC #15 registers: 36.
[ 173.214801] number of IO-APIC #14 registers: 36.
[ 173.243853] testing the IO APIC.......................
[ 173.310563]
[ 173.355937] IO APIC #15......
[ 173.411003] .... register #00: 0F000000
[ 173.471712] ....... : physical APIC id: 0F
[ 173.537310] .... register #01: 00230011
[ 173.600372] ....... : max redirection entries: 0023
[ 173.673337] ....... : PRQ implemented: 0
[ 173.741310] ....... : IO APIC version: 0011
[ 173.812025] .... register #02: 00000000
[ 173.878873] ....... : arbitration: 00
[ 173.947147] .... IRQ redirection table:
[ 174.015401] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 174.098994] 00 000 00 1 0 0 0 0 0 0 00
[ 174.182973] 01 001 01 0 0 0 0 0 1 1 39
[ 174.265977] 02 001 01 1 0 0 0 0 0 0 20
[ 174.349387] 03 001 01 0 0 0 0 0 1 1 41
[ 174.432753] 04 001 01 0 0 0 0 0 1 1 49
[ 174.516545] 05 001 01 0 0 0 0 0 1 1 51
[ 174.601131] 06 001 01 0 0 0 0 0 1 1 59
[ 174.685489] 07 001 01 0 0 0 0 0 1 1 61
[ 174.770828] 08 001 01 0 0 0 1 0 1 1 69
[ 174.856338] 09 001 01 0 1 0 1 0 1 1 71
[ 174.942583] 0a 001 01 0 0 0 0 0 1 1 79
[ 175.028737] 0b 001 01 0 0 0 0 0 1 1 81
[ 175.113592] 0c 001 01 0 0 0 0 0 1 1 89
[ 175.196746] 0d 001 01 0 0 0 0 0 1 1 91
[ 175.278042] 0e 001 01 0 0 0 1 0 1 1 99
[ 175.358284] 0f 001 01 0 0 0 0 0 1 1 A1
[ 175.438277] 10 001 01 1 1 0 1 0 1 1 A9
[ 175.516525] 11 000 00 1 0 0 0 0 0 0 00
[ 175.594250] 12 000 00 1 0 0 0 0 0 0 00
[ 175.670566] 13 000 00 1 0 0 0 0 0 0 00
[ 175.744982] 14 000 00 1 0 0 0 0 0 0 00
[ 175.819438] 15 000 00 1 0 0 0 0 0 0 00
[ 175.891995] 16 000 00 1 0 0 0 0 0 0 00
[ 175.963970] 17 000 00 1 0 0 0 0 0 0 00
[ 176.035218] 18 000 00 1 0 0 0 0 0 0 00
[ 176.103913] 19 000 00 1 0 0 0 0 0 0 00
[ 176.171391] 1a 000 00 1 0 0 0 0 0 0 00
[ 176.236302] 1b 000 00 1 0 0 0 0 0 0 00
[ 176.299290] 1c 000 00 1 0 0 0 0 0 0 00
[ 176.361328] 1d 000 00 1 0 0 0 0 0 0 00
[ 176.423472] 1e 000 00 1 0 0 0 0 0 0 00
[ 176.484175] 1f 000 00 1 0 0 0 0 0 0 00
[ 176.544404] 20 000 00 1 0 0 0 0 0 0 00
[ 176.603905] 21 000 00 1 0 0 0 0 0 0 00
[ 176.662658] 22 000 00 1 0 0 0 0 0 0 00
[ 176.721041] 23 000 00 1 0 0 0 0 0 0 00
[ 176.778326]
[ 176.809970] IO APIC #14......
[ 176.850818] .... register #00: 0E000000
[ 176.896838] ....... : physical APIC id: 0E
[ 176.945618] .... register #01: 00230011
[ 176.991279] ....... : max redirection entries: 0023
[ 177.046099] ....... : PRQ implemented: 0
[ 177.094945] ....... : IO APIC version: 0011
[ 177.145511] .... register #02: 00000000
[ 177.191741] ....... : arbitration: 00
[ 177.238924] .... IRQ redirection table:
[ 177.284087] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 177.343564] 00 000 00 1 0 0 0 0 0 0 00
[ 177.401238] 01 000 00 1 0 0 0 0 0 0 00
[ 177.458475] 02 000 00 1 0 0 0 0 0 0 00
[ 177.515601] 03 000 00 1 0 0 0 0 0 0 00
[ 177.572714] 04 000 00 1 0 0 0 0 0 0 00
[ 177.629406] 05 000 00 1 0 0 0 0 0 0 00
[ 177.686218] 06 000 00 1 0 0 0 0 0 0 00
[ 177.742974] 07 000 00 1 0 0 0 0 0 0 00
[ 177.799598] 08 000 00 1 0 0 0 0 0 0 00
[ 177.856157] 09 000 00 1 0 0 0 0 0 0 00
[ 177.912395] 0a 000 00 1 0 0 0 0 0 0 00
[ 177.968615] 0b 000 00 1 0 0 0 0 0 0 00
[ 178.024711] 0c 000 00 1 0 0 0 0 0 0 00
[ 178.080957] 0d 000 00 1 0 0 0 0 0 0 00
[ 178.137205] 0e 000 00 1 0 0 0 0 0 0 00
[ 178.193008] 0f 000 00 1 0 0 0 0 0 0 00
[ 178.248833] 10 000 00 1 0 0 0 0 0 0 00
[ 178.304384] 11 000 00 1 0 0 0 0 0 0 00
[ 178.359429] 12 000 00 1 0 0 0 0 0 0 00
[ 178.415344] 13 000 00 1 0 0 0 0 0 0 00
[ 178.471047] 14 000 00 1 0 0 0 0 0 0 00
[ 178.526538] 15 000 00 1 0 0 0 0 0 0 00
[ 178.581924] 16 000 00 1 0 0 0 0 0 0 00
[ 178.637489] 17 000 00 1 0 0 0 0 0 0 00
[ 178.693260] 18 000 00 1 0 0 0 0 0 0 00
[ 178.748506] 19 000 00 1 0 0 0 0 0 0 00
[ 178.804012] 1a 000 00 1 0 0 0 0 0 0 00
[ 178.859381] 1b 000 00 1 0 0 0 0 0 0 00
[ 178.914721] 1c 000 00 1 0 0 0 0 0 0 00
[ 178.970304] 1d 000 00 1 0 0 0 0 0 0 00
[ 179.025934] 1e 000 00 1 0 0 0 0 0 0 00
[ 179.081632] 1f 000 00 1 0 0 0 0 0 0 00
[ 179.137372] 20 000 00 1 0 0 0 0 0 0 00
[ 179.193054] 21 000 00 1 0 0 0 0 0 0 00
[ 179.248435] 22 000 00 1 0 0 0 0 0 0 00
[ 179.303584] 23 000 00 1 0 0 0 0 0 0 00
[ 179.358129] IRQ to pin mappings:
[ 179.398163] IRQ0 -> 0:2
[ 179.433497] IRQ1 -> 0:1
[ 179.468578] IRQ3 -> 0:3
[ 179.502723] IRQ4 -> 0:4
[ 179.535751] IRQ5 -> 0:5
[ 179.567898] IRQ6 -> 0:6
[ 179.599575] IRQ7 -> 0:7
[ 179.630636] IRQ8 -> 0:8
[ 179.661388] IRQ9 -> 0:9
[ 179.691365] IRQ10 -> 0:10
[ 179.721289] IRQ11 -> 0:11
[ 179.750107] IRQ12 -> 0:12
[ 179.779057] IRQ13 -> 0:13
[ 179.807867] IRQ14 -> 0:14
[ 179.836816] IRQ15 -> 0:15
[ 179.865827] IRQ16 -> 0:16
[ 179.894889] .................................... done.
[ 179.954564] RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
[ 180.016728] loop: loaded (max 8 devices)
[ 180.052659] ibmasm: IBM ASM Service Processor Driver version 1.0 loaded
[ 180.105233] ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 18 (level, low) -> IRQ 18
[ 180.163772] 3c59x: Donald Becker and others. http://www.scyld.com/network/vortex.html
[ 180.220788] 0000:02:01.0: 3Com PCI 3c905C Tornado at ffffc20000042000.
[ 180.297869] tg3.c:v3.66 (September 23, 2006)
[ 180.338200] ACPI: PCI Interrupt 0000:01:01.0[A] -> GSI 24 (level, low) -> IRQ 24
[ 180.434904] eth1: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:22
[ 180.539771] eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
[ 180.609267] eth1: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 180.660741] ACPI: PCI Interrupt 0000:01:01.1[B] -> GSI 28 (level, low) -> IRQ 28
[ 180.762926] eth2: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:23
[ 180.879099] eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
[ 180.954434] eth2: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 181.012309] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 181.077149] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 181.152505] SvrWks CSB6: IDE controller at PCI slot 0000:00:0f.1
[ 181.216463] SvrWks CSB6: chipset revision 160
[ 181.271080] SvrWks CSB6: not 100% native mode: will probe irqs later
[ 181.338471] ide0: BM-DMA at 0x0700-0x0707, BIOS settings: hda:DMA, hdb:DMA
[ 181.412045] SvrWks CSB6: simplex device: DMA disabled
[ 181.472639] ide1: SvrWks CSB6 Bus-Master DMA disabled (BIOS)
[ 182.277658] hda: HL-DT-STDVD-ROM GDR8082N, ATAPI CD/DVD-ROM drive
[ 182.685216] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 183.312085] do_IRQ: cannot handle IRQ -1 vector: 153 cpu: 1
[ 183.378484] v[0][153] -> 14
[ 183.428886] number of MP IRQ sources: 15.
[ 183.486190] number of IO-APIC #15 registers: 36.
[ 183.486197] number of IO-APIC #14 registers: 36.
[ 183.486199] testing the IO APIC.......................
[ 183.486206]
[ 183.486207] IO APIC #15......
[ 183.486209] .... register #00: 0F000000
[ 183.486211] ....... : physical APIC id: 0F
[ 183.486213] .... register #01: 00230011
[ 183.486214] ....... : max redirection entries: 0023
[ 183.486217] ....... : PRQ implemented: 0
[ 183.486219] ....... : IO APIC version: 0011
[ 183.486221] .... register #02: 00000000
[ 183.486228] ....... : arbitration: 00
[ 183.486230] .... IRQ redirection table:
[ 183.486231] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 183.486248] 00 000 00 1 0 0 0 0 0 0 00
[ 183.486254] 01 001 01 0 0 0 0 0 1 1 39
[ 183.486260] 02 001 01 1 0 0 0 0 0 0 20
[ 183.486265] 03 001 01 0 0 0 0 0 1 1 41
[ 183.486270] 04 001 01 0 0 0 0 0 1 1 49
[ 183.486275] 05 001 01 0 0 0 0 0 1 1 51
[ 183.486280] 06 001 01 0 0 0 0 0 1 1 59
[ 183.486285] 07 001 01 0 0 0 0 0 1 1 61
[ 183.486291] 08 001 01 0 0 0 1 0 1 1 69
[ 183.486296] 09 001 01 0 1 0 1 0 1 1 71
[ 183.486301] 0a 001 01 0 0 0 0 0 1 1 79
[ 183.486306] 0b 001 01 0 0 0 0 0 1 1 81
[ 183.486311] 0c 001 01 0 0 0 0 0 1 1 89
[ 183.486316] 0d 001 01 0 0 0 0 0 1 1 91
[ 183.486322] 0e 001 01 0 0 0 1 0 1 1 99
[ 183.486327] 0f 001 01 0 0 0 0 0 1 1 A1
[ 183.486332] 10 001 01 1 1 0 1 0 1 1 A9
[ 183.486337] 11 000 00 1 0 0 0 0 0 0 00
[ 183.486342] 12 001 01 1 1 0 1 0 1 1 B1
[ 183.486348] 13 000 00 1 0 0 0 0 0 0 00
[ 183.486353] 14 000 00 1 0 0 0 0 0 0 00
[ 183.486358] 15 000 00 1 0 0 0 0 0 0 00
[ 183.486363] 16 000 00 1 0 0 0 0 0 0 00
[ 183.486368] 17 000 00 1 0 0 0 0 0 0 00
[ 183.486373] 18 001 01 1 1 0 1 0 1 1 B9
[ 183.486378] 19 000 00 1 0 0 0 0 0 0 00
[ 183.486383] 1a 000 00 1 0 0 0 0 0 0 00
[ 183.486388] 1b 000 00 1 0 0 0 0 0 0 00
[ 183.486394] 1c 001 01 1 1 0 1 0 1 1 C1
[ 183.486399] 1d 000 00 1 0 0 0 0 0 0 00
[ 183.486404] 1e 000 00 1 0 0 0 0 0 0 00
[ 183.486409] 1f 000 00 1 0 0 0 0 0 0 00
[ 183.486414] 20 000 00 1 0 0 0 0 0 0 00
[ 183.486419] 21 000 00 1 0 0 0 0 0 0 00
[ 183.486424] 22 000 00 1 0 0 0 0 0 0 00
[ 183.486429] 23 000 00 1 0 0 0 0 0 0 00
[ 183.486435]
[ 183.486436] IO APIC #14......
[ 183.486438] .... register #00: 0E000000
[ 183.486439] ....... : physical APIC id: 0E
[ 183.486441] .... register #01: 00230011
[ 183.486443] ....... : max redirection entries: 0023
[ 183.486445] ....... : PRQ implemented: 0
[ 183.486446] ....... : IO APIC version: 0011
[ 183.486448] .... register #02: 00000000
[ 183.486449] ....... : arbitration: 00
[ 183.486451] .... IRQ redirection table:
[ 183.486452] NR Log Phy Mask 1 0 0 0 0 0 0 00
[ 183.486477] 04 000 00 1 0 0 0 0 0 0 00
[ 183.486482] 05 000 00 1 0 0 0 0 0 0 00
[ 183.486487] 06 000 00 1 0 0 0 0 0 0 00
[ 183.486492] 07 000 00 1 0 0 0 0 0 0 00
[ 183.486497] 08 000 00 1 0 0 0 0 0 0 00
[ 183.486502] 09 000 00 1 0 0 0 0 0 0 00
[ 183.486506] 0a 000 00 1 0 0 0 0 0 0 00
[ 183.486511] 0b 000 00 1 0 0 0 0 0 0 00
[ 183.486516] 0c 000 00 1 0 0 0 0 0 0 00
[ 183.486521] 0d 000 00 1 0 0 0 0 0 0 00
[ 183.486526] 0e 000 00 1 0 0 0 0 0 0 00
[ 183.486531] 0f 000 00 1 0 0 0 0 0 0 00
[ 183.486536] 10 000 00 1 0 0 0 0 0 0 00
[ 183.486541] 11 000 00 1 0 0 0 0 0 0 00
[ 183.486546] 12 000 00 1 0 0 0 0 0 0 00
[ 183.486551] 13 000 00 1 0 0 0 0 0 0 00
[ 183.486556] 14 000 00 1 0 0 0 0 0 0 00
[ 183.486561] 15 000 00 1 0 0 0 0 0 0 00
[ 183.486566] 16 000 00 1 0 0 0 0 0 0 00
[ 183.486571] 17 000 00 1 0 0 0 0 0 0 00
[ 183.486576] 18 000 00 1 0 0 0 0 0 0 00
[ 183.486581] 19 000 00 1 0 0 0 0 0 0 00
[ 183.486586] 1a 000 00 1 0 0 0 0 0 0 00
[ 183.486591] 1b 000 00 1 0 0 0 0 0 0 00
[ 183.486596] 1c 000 00 1 0 0 0 0 0 0 00
[ 183.486602] 1d 000 00 1 0 0 0 0 0 0 00
[ 183.486607] 1e 000 00 1 0 0 0 0 0 0 00
[ 183.486612] 1f 000 00 1 0 0 0 0 0 0 00
[ 183.486617] 20 000 00 1 0 0 0 0 0 0 00
[ 183.486622] 21 000 00 1 0 0 0 0 0 0 00
[ 183.486627] 22 000 00 1 0 0 0 0 0 0 00
[ 183.486632] 23 000 00 1 0 0 0 0 0 0 00
[ 183.486634] IRQ to pin mappings:
[ 183.486636] IRQ0 -> 0:2
[ 183.486638] IRQ1 -> 0:1
[ 183.486640] IRQ3 -> 0:3
[ 183.486643] IRQ4 -> 0:4
[ 183.486645] IRQ5 -> 0:5
[ 183.486647] IRQ6 -> 0:6
[ 183.486649] IRQ7 -> 0:7
[ 183.486651] IRQ8 -> 0:8
[ 183.486654] IRQ9 -> 0:9
[ 183.486656] IRQ10 -> 0:10
[ 183.486658] I 183.486680] .................................... done.
[ 189.048897] hda: lost interrupt
[ 194.049815] hda: lost interrupt
[ 194.075698] hda: ATAPI 24X DVD-ROM drive, 256kB Cache
[ 194.114939] Uniform CD-ROM driver Revision: 3.20

kernel (hd0,1)/boot/calgary/bzImage root=/dev/sda2 console=tty0 console=ttyS1,19200
[Linux-bzImage, setup=0x1c00, size=0x2e4590]
initrd (hd0,1)/boot/calgary/aic94xxfw.initramfs.gz
[Linux-initrd @ 0x37e3f000, 0x1b0794 bytes]
savedefault

[ 0.000000] Linux version 2.6.19-rc1mx (muli@rhun) (gcc version 3.4.1) #164 SMP Sat Oct 7 09:23:16 IST 2006
[ 0.000000] Command line: root=/dev/sda2 console=tty0 console=ttyS1,19200
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 0000000000099000 (usable)
[ 0.000000] BIOS-e820: 0000000000099000 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000e7f9c640 (usable)
[ 0.000000] BIOS-e820: 00000000e7f9c640 - 00000000e7fa6a40 (ACPI data)
[ 0.000000] BIOS-e820: 00000000e7fa6a40 - 00000000e8000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000198000000 (usable)
[ 0.000000] end_pfn_map = 1671168
[ 0.000000] DMI 2.3 present.
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] DMA32 4096 -> 1048576
[ 0.000000] Normal 1048576 -> 1671168
[ 0.000000] early_node_map[3] active PFN ranges
[ 0.000000] 0: 0 -> 153
[ 0.000000] 0: 256 -> 950172
[ 0.000000] 0: 1048576 -> 1671168
[ 0.000000] ACPI: PM-Timer IO Port: 0x9c
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] Processor #7
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 15, address 0xfec00000, GSI 0-35
[ 0.000000] ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[35])
[ 0.000000] IOAPIC[1]: apic_id 14, address 0xfec01000, GSI 35-70
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 8 low edge)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 low edge)
[ 0.000000] Setting APIC routing to flat
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] Nosave address range: 0000000000099000 - 00000000000a0000
[ 0.000000] Nosave address range: 00000000000a0000 - 00000000000e0000
[ 0.000000] Nosave address range: 00000000000e0000 - 0000000000100000
[ 0.000000] Nosave address range: 00000000e7f9c000 - 00000000e7f9d000
[ 0.000000] Nosave address range: 00000000e7f9d000 - 00000000e7fa6000
[ 0.000000] Nosave address range: 00000000e7fa6000 - 00000000e7fa7000
[ 0.000000] Nosave address range: 00000000e7fa7000 - 00000000e8000000
[ 0.000000] Nosave address range: 00000000e8000000 - 00000000fec00000
[ 0.000000] Nosave address range: 00000000fec00000 - 0000000100000000
[ 0.le=tty0 console=ttyS1,19200
[ 0.000000] Initializing CPU#0
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[ 172.269585] Console: colour VGA+ 80x25
[ 174.207832] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 174.254348X_LOCKDEP_CHAINS: 8192
[ 174.410495] ... CHAINHASH_SIZE: 4096
[ 174.436695] memory used by lock dependency info: 1328 kB
[ 174.469139] per task-struct memory footprint: 1680 bytes
[ 174.508784] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[ 174.563704] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 174.610140] Checking aperture...
[ 174.652498] PCI-DMA: Calgary IOMMU detected.
[ 174.678165] PCI-DMA: Calgary TCE table spec is 7, CONFIG_IOMMU_DEBUG is enabled.
[ 174.839187] Memory: 6096436k/6684672k available (3789k kernel code, 193708k reserved, 2726k data, 276k init)
[ 174.976352] Calibrating delay using timer specific rohysical Processor ID: 0
[ 175.154816] CPU: Processor Core ID: 0
[ 175.176863] CPU0: Thermal monitoring enabled (TM1)
[ 175.205667] Freeing SMP alternatives: 32k freed
[ 175.232919] ACPI: Core revision 20060707
[ 175.303638] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 175.378690] Using local APIC timer interrupts.
[ 175.436932] result 10425543
[ 175.453717] Detected 10.425 MHz APIC timer.
[ 175.481377] lockdep: not fixing up alternatives.
[ 175.509644] Booting processor 1/4 APIC 0x1
[ 175.544653] Initializing CPU#1
[ 175.624177] Calibrating delay using timer specific routine.. 6339.04 BogoMIPS (lpj=12678092)
[ 175.624194] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 175.624197] CPU: L2 cache: 1024K
[ 175.624201] CPU: Physical Processor ID: 0
[ 175.624203] CPU: Processor Core ID: 0
[ 175.624216] CPU1: Thermal monitoring enabled (TM1)
[ 175.624491] Intel(R) Xeon(TM) MP CPU 3.16GHz stepping 01
[ 175.628500] lockdep: not fixing up alternatives.
[ 175.890549] Booting processor 2/4 APIC 0x6
[ 175.925565] Initializing CPU#2
[ 176.004081] Calibrating delay using timer specific routine.. 6339.19 BogoMIPS (lpj=12678389)
[ 176.004094] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 176.004097] CPU: L2 cache: 1024K
[ lternatives.
[ 176.270481] Booting processor 3/4 APIC 0x7
[ 176.305497] Initializing CPU#3
[ 176.383983] Calibrating delay using timer specific routine.. 6339.28 BogoMIPS (lpj=12678572)
[ 176.383998] CPU: Trace cache: 12K uops, L1 D cache: 16K
[ 176.384000] CPU: L2 cache: 1024K
[ 176.384004] CPU: Physical Processor ID: 3
[ 176.384006] CPU: Processor Core ID: 0
[ 176.384017] CPU3: Thermal monitoring enabled (TM1)
[ 176.384255] Intel(R) Pentium(R) 4 CPU 3.16GHz stepping 09
[ 176.388010] Brought up 4 CPUs
[ 176.640222] testing NMI watchdog ... OK.
[ 176.703947] time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
[ 176.741029] time.c: Detected 3169.383 MHz processor.
[ 176.946524] migration_cost=4,717
[ 176.967015] checking if image is initramfs... it is
[ 177.158953] Freeing initrd memory: 1729k freed
[ 177.188710] NET: Registered protocol family 16
[ 177.225760] ACPI: bus type pci registered
[ 177.249871] PCI: Using configuration type 1
[ 177.406437] ACPI: Interpreter enabled
[ 177.428447] ACPI: Using IOAPIC for interrupt routing
[ 177.465159] ACPI: PCI Root Bridge [VP00] (0000:00)
[ 177.497384] PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
[ 177.553272] ACPI: PCI Root Bridge [VP01] (0000:01)
[ 177.587765] ACPI: PCI Root Bridge [VP02] (0000:02)
[ 177.626676] ACPI: PCI Root Bridge [VP03] (0000:04)
[ 177.665479] ACPI: PCI Root Bridge [VP04] (0000:06)
[ 177.704132] ACPI: PCI Root Bridge [VP05] (0000:08)
[ 177.743342] ACPI: PCI Root Bridge [VP06] (0000:0a)
[ 177.781137] ACPI: PCI Root Bridge [VP07] (0000:0c)
[ 177.820249] SCSI subsystem initialized
[ 177.843003] usbcore: registered new interface driver usbfs
[ 177.876123] usbcore: registered new interface driver hub
[ 177.908180] usbcore: registered new device driver usb
[ 177.938909] PCI: Using ACPI for IRQ routing
[ 177.964045] PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
[ 178.013694] number of MP IRQ sources: 15.
[ 178.037795] number of IO-APIC #15 registers: 36.
[ 178.065551] number of IO-APIC #14 registers: 36.
[ 178.093308] testing the IO APIC........... 0023
[ 178.254698] ....... : PRQ implemented: 0
[ 178.280368] ....... : IO APIC version: 0011
[ 178.307606] 178.447982] 01 001 01 0 0 0 0 0 1 1 39
[ 178.481610] 02 001 01 1 0 0 0 0 0 0 20
[ 178.515184] 03 001 01 0 0 0 0 0 1 1 41
[ 178.548762] 04 001 01 0 0 0 0 0 1 1 49
[ 178.582336] 05 001 01 0 0 0 0 0 1 1 51
[ 178.615909] 06 001 01 0 0 0 0 0 1 1 59
[ 178.649534] 07 001 01 0 0 0 0 0 1 1 61
[ 178.683109] 08 001 01 0 0 0 1 0 1 1 69
[ 178.716687] 09 001 01 0 1 0 1 0 1 1 71
[ 178.750260] 0a 001 01 0 0 0 0 0 1 1 79
[ 178.783838] 0b 001 01 0 0 0 0 0 1 1 81
[ 178.817465] 0c 001 01 0 0 0 0 0 1 1 89
[ 178.851038] 0d 001 01 0 0 0 0 0 1 1 91
[ 178.884613] 0e 001 01 0 0 0 1 0 1 1 99
[ 178.918187] 0f 001 01 0 0 0 0 0 1 1 A1
[ 178.951760] 10 000 00 1 0 0 0 0 0 0 00
[ 179.119642] 15 000 00 1 0 0 0 0 0 0 00
[ 179.153272] 16 000 00 1 0 0 0 0 0 0 00
[ 179.186848] 17 000 00 1 0 0 0 0 0 0 00
[ 179.220426] 18 000 00 1 0 0 0 0 0 0 00
[ 179.254005] 19 000 00 1 0 0 0 0 0 0 00
[ 179.287585] 1a 000 00 1 0 0 0 0 0 0 00
[ 179.321164] 1b 000 00 1 0 0 0 0 0 0 00
[ 179.354743] 1c 000 00 1 0 0 0 0 0 0 00
[ 179.388322] 1d 000 00 1 0 0 0 0 0 0 00
[ 179.421896] 1e 000 00 1 0 0 0 0 0 0 00
[ 179.455474] 1f 000 00 1 0 0 0 0 0 0 00
[ 179.489050] 20 000 00 1 0 0 0 0 0 0 00
[ 179.522626] 21 000 00 1 0 0 0 0 0 0 00
[ 179.556203] 22 000 00 1 0 0 0 0 0 0 00
[ 179.589782] 23 000 00 1 0 0 0 0 0 0 00
[ 179.623362]
[ 179.632394] IO APIC #14......
[ 179.650230] .... register #00: 0E000000
[ 179.673261] ....... : physical APIC id: 0E
[ 179.699408] .... register #01: 00230011
[ 179.722484] ....... : max redirection entries: 0023
[ 179.753885] ....... : PRQ implemented: 0
[ 179.779560] ....... : IO APIC version: 0011
[ 179.806797] .... register #02: 00000000
[ 179.829872] ....... : arbitration: 00
[ 179.853939] .... IRQ redirection table:
[ 179.877016] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 179.913571] 00 000 00 1 0 0 0 0 0 0 00
[ 179.947189] 01 000 00 1 0 0 0 0 0 0 00
[ 179.980764] 02 000 00 1 0 0 0 0 0 0 00
[ 180.014344] 03 000 00 1 0 0 0 0 0 0 00
[ 180.047921] 08 000 00 1 0 0 0 0 0 0 00
[ 180.215808] 09 000 00 1 0 0 0 0 0 0 00
[ 180.249384] 0a 000 00 1 0 0 0 0 0 0 00
[ 180.282960] 0b 000 00 1 0 0 0 0 0 0 00
[ 180.450890] 10 000 00 1 0 0 0 0 0 0 00
[ 180.484467] 11 000 00 1 0 0 0 0 0 0 00
[ 180.518046] 12 000 00 1 0 0 0 0 0 0 00
[ 180.551625] 13 000 00 1 0 0 0 0 0 0 00
[ 180.585198] 14 000 00 1 0 0 0 0 0 0 00
[ 180.618771] 15 000 00 1 0 0 0 0 0 0 00
[ 180.652345] 16 000 00 1 0 0 0 0 0 0 00
[ 180.685919] 17 000 00 1 0 0 0 0 0 0 00
[ 180.719492] 18 000 00 1 0 0 0 0 0 0 00
[ 180.753068] 19 000 00 1 0 0 0 0 0 0 00
[ 180.786644] 1a 000 00 1 0 0 0 0 0 0 00
[ 180.820218] 1b 000 00 1 0 0 0 0 0 0 00
[ 180.853788] 1c 000 00 1 0 0 0 0 0 0 00
[ 180.887362] 1d 000 00 1 0 0 0 0 0 0 00
[ 180.920935] 1e 000 00 1 0 0 0 0 0 0 00
[ 180.954508] 1f 000 00 1 0 0 0 0 0 0 00
[ 180.988083] 2Q to pin mappings:
[ 181.141747] IRQ0 -> 0:2
[ 181.156693] IRQ1 -> 0:1
[ 181.171663] IRQ3 -> 0:3
[ 181.186631] IRQ4 -> 0:4
[ 181.201593] IRQ5 -> 0:5
[ 181.216562] IRQ6 -> 0:6
[ 181.231534] IRQ7 -> 0:7
[ 181.246499] IRQ8 -> 0:8
[ 181.261464] IRQ9 -> 0:9
[ 181.276432] IRQ10 -> 0:10
[ 181.292438] IRQ11 -> 0:11
[ 181.308442] IRQ12 -> 0:12
[ 181.324449] IRQ13 -> 0:13
[ 181.340456] IRQ14 -> 0:14
[ 181.356464] IRQ15 -> 0:15
[ 181.372473] .................................... done.
[ 181.403593] PCI-DMA: Using Calgary IOMMU
[ 181.782763] Calgary: enabling translation on PHB 0x0
[ 181.812583] Calgary: errant DMAs will now be prevented on this bus.
[ 182.205279] Calgary: enabling translation on PHB 0x1
[ 182.235103] Calgary: errant DMAs will now be prevented on this bus.
[ 182.628096] Calgary: enabling translation on PHB 0x2
[ 182.657919] Calgary: errant DMAs will now be prevented on this bus.
[ 182.695604] PCI-GART: No AMD northbridge found.
[ 182.733315] NET: Registered protocol family 2
[ 182.822361] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 182.868576] TCP established hash table entries: 65536 (order: 9, 3670016 bytes)
[ 182.920726] TCP bind hash table entries: 32768 (order: 8, 1835008 io scheduler noop registered
[ 183.135880] io scheduler anticipatory registered (default)
[ 183.169040] io scheduler deadline registered
[ 183.194855] io scheduler cfq registered
[ 183.225523] ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16
[ 183.270354] radeonfb: Found Intel x86 BIOS ROM Image
[ 183.314007] radeonfb: Retrieved PLL infos from BIOS
[ 183.343298] radeonfb: Reference=27.00 MHz (RefDiv=60) Memory=143.00 Mhz, System=143.00 MHz
[ 183.392936] radeonfb: PLL min 12000 max 35000
[ 183.523460] i2c_adapter i2c-1: unable to read EDID block.
[ 183.715306] i2c_adapter i2c-1: unable to read EDID block.
[ 183.907251] i2c_adapter i2c-1: unable to read EDID block.
[ 184.371115] i2c_adapter i2c-2: unable to read EDID block.
[ 184.563059] i2c_adapter i2c-2: unable to read EDID block.
[ 184.755003] i2c_adapter i2c-2: unable to read EDID block.
[ 184.909498] radeonfb: Monitor 1 type DFP found
[ 184.936196] radeonfb: EDID probed
[ 184.956150] radeonfb: Monitor 2 type CRT found
[ 186.014806] Console: switching to colour frame buffer device 128x48
[ 186.727128] radeonfb (0000:00:01.0): ATI Radeon QY
[ 186.758984] tridentfb: Trident framebuffer 0.7.8-NEWAPI initializing
[ 186.799024] hgafb: HGA card not detected.
[ 186.823298] hgafb: probe of hgafb.0 failed with error -22
[ 186.859121] vga16fb: mapped to 0xffff8100000a0000
[ 186.887837] fb1: VGA16 VGA frame buffer device
[ 186.916353] fb2: Virtual frame buffer device, using 1024K of video memory
[ 186.957679] ACPI: Power Button (FF) [PWRF]
[ 186.983088] ibm_acpi: ec object not found
[ 187.393897] Linux agpgart interface v0.101 (c) Dave Jones
[ 187.427426] ipmi message handler version 39.0
[ 187.453736] ipmi device interface
[ 187.474098] Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 seconds).
[ 187.528219] Hangcheck: Using monotonic_clock().
[ 187.555620] Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
[ 187.603262] do_IRQ: cannot handle IRQ -1 vector: 73 cpu: 1
[ 187.603275] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 187.636664] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[ 187.690829] RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
[ 187.697456] loop: loaded (max 8 devices)
[ 187.698825] ibmasm: IBM ASM Service Processor Driver version 1.0 loaded
[ 187.699070] ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 18 (level, low) -> IRQ 18
[ 187.863222] v[0][73] -> 4
[ 187.863228] number of MP IRQ sources: 15.
[ 187.863232] number of IO-APIC #15 registers: 36.
[ 187.863234] number of IO-APIC #14 registers: 36.
[ 187.863235] testing the IO APIC.......................
[ 187.863283]
[ 187.863285] IO APIC #15......
[ 187.863287] .... register #00: 0F000000
[ 187.863289] ....... : physical APIC id: 0F
[ 187.863291] .... register #01: 00230011
[ 187.863293] ....... : max redirection entries: 0023
[ 187.863294] ....... : PRQ implemented: 0
[ 187.863296] ....... : IO APIC version: 0011
[ 187.863298] .... register #02: 00000000
[ 187.863299] ....... : arbitration: 00
[ 187.863301] .... IRQ redirection table:
[ 187.863302] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 187.863307] 00 000 00 1 0 0 0 0 0 0 00
[ 187.863313] 01 001 01 0 0 0 0 0 1 1 39
[ 187.863319] 02 001 01 1 0 0 0 0 0 0 20
[ 187.863324] 03 001 01 0 0 0 0 0 1 1 41
[ 187.863329] 04 001 01 0 0 0 0 0 1 1 49
[ 187.863334] 05 001 01 0 0 0 0 0 1 1 51
[ 187.863340] 06 001 01 0 0 0 0 0 1 1 59
[ 187.863345] 07 001 01 0 0 0 0 0 1 1 61
[ 187.863350] 08 001 01 0 0 0 1 0 1 1 69
[ 187.863355] 09 001 01 0 1 0 1 0 1 1 71
[ 187.863361] 0a 001 01 0 0 0 0 0 1 1 79
[ 187.863366] 0b 001 01 0 0 0 0 0 1 1 81
[ 187.863371] 0c 001 01 0 0 0 0 0 1 1 89
[ 187.863376] 0d 001 01 0 0 0 0 0 1 1 91
[ 187.863381] 0e 001 01 0 0 0 1 0 1 1 99
[ 187.863387] 0f 001 01 0 0 0 0 0 1 1 A1
[ 187.863392] 10 001 01 1 1 0 1 0 1 1 A9
[ 187.863397] 11 000 00 1 0 0 0 0 0 0 00
[ 187.863403] 12 001 01 1 1 0 1 0 1 1 B1
[ 187.863408] 13 000 00 1 0 0 0 0 0 0 00
[ 187.863413] 14 000 00 1 0 0 0 0 0 0 00
[ 187.863418] 15 000 00 1 0 0 0 0 0 0 00
[ 187.863423] 16 000 00 1 0 0 0 0 0 0 00
[ 187.863428] 17 000 00 1 0 0 0 0 0 0 00
[ 187.863434] 18 000 00 1 0 0 0 0 0 0 00
[ 187.863439] 19 000 00 1 0 0 0 0 0 0 00
[ 187.863464] 1e 000 00 1 0 0 0 0 0 0 00
[ 187.863469] 1f 000 00 1 0 0 0 0 0 0 00
[ 187.863474] 20 000 00 1 0 0 0 0 0 0 00
[ 187.863479] 21 000 00 1 0 0 0 0 0 0 00
[ 187.863484] 22 000 00 1 0 0 0 0 0 0 00
[ 187.863489] 23 000 00 1 0 0 0 0 0 0 00
[ 187.863495]
[ 187.863497] IO APIC #14......
[ 187.863499] .... register #00: 0E000000
[ 187.863500] ....... : physical APIC id: 0E
[ 187.863502] .... register #01: 00230011
[ 187.863504] ....... : max redirection entries: 0023
[ 187.863505] ....... : PRQ implemented: 0
[ 187.863507] ....... : IO APIC version: 0011
[ 187.863509] .... register #02: 00000000
[ 187.863510] ....... : arbitration: 00
[ 187.863512] .... IRQ redirection table:
[ 187.863513] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 187.863518] 00 000 00 1 0 0 0 0 0 0 00
[ 187.863523] 01 000 00 1 0 0 0 0 0 0 00
[ 187.863528] 02 000 00 1 0 0 0 0 0 0 00
[ 187.863533] 03 000 00 1 0 0 0 0 0 0 00
[ 187.863538] 04 000 00 1 0 0 0 0 0 0 00
[ 187.863543] 05 000 00 1 0 0 0 0 0 0 00
[ 187.863548] 06 000 00 1 0 0 0 0 0 0 00
[ 187.863553] 07 000 00 1 0 0 0 0 0 0 00
[ 187.863558] 08 000 00 1 0 0 0 0 0 0 00
[ 187.863563] 09 000 00 1 0 0 0 0 0 0 00
[ 187.863568] 0a 000 00 1 0 0 0 0 0 0 00
[ 187.863573] 0b 000 00 1 0 0 0 0 0 0 00
[ 187.863578] 0c 000 00 1 0 0 0 0 0 0 00
[ 187.863584] 0d 000 00 1 0 0 0 0 0 0 00
[ 187.863589] 0e 000 00 1 0 0 0 0 0 0 00
[ 187.863594] 0f 000 00 1 0 0 0 0 0 0 00
[ 187.863599] 10 000 00 1 0 0 0 0 0 0 00
[ 187.863604] 11 000 00 1 0 0 0 0 0 0 00
[ 187.863609] 12 000 00 1 0 0 0 0 0 0 00
[ 187.863614] 13 000 00 1 0 0 0 0 0 0 00
[ 187.863640] 18 000 00 1 0 0 0 0 0 0 00
[ 187.863645] 19 000 00 1 0 0 0 0 0 0 00
[ 187.863650] 1a 000 00 1 0 0 0 0 0 0 00
[ 187.863655] 1b 000 00 1 0 0 0 0 0 0 00
[ 187.863660] 1c 000 00 1 0 0 0 0 0 0 00
[ 187.863665] 1d 000 00 1 0 0 0 0 0 0 00
[ 187.863670] 1e 000 00 1 0 0 0 0 0 0 00
[ 187.863675] 1f 000 00 1 0 0 0 0 0 0 00
[ 187.863680] 20 000 00 1 0 0 0 0 0 0 00
[ 187.863685] 21 000 00 1 0 0 0 0 0 0 00
[ 187.863690] 22 000 00 1 0 0 0 0 0 0 00
[ 187.863696] 23 000 00 1 0 0 0 0 0 0 00
[ 187.863698] IRQ to pin mappings:
[ 187.863700] IRQ0 -> 0:2
[ 187.863703] IRQ1 -> 0:1
[ 187.863705] IRQ3 -> 0:3
[ 187.863707] IRQ4 -> 0:4
[ 187.863709] IRQ5 -> 0:5
[ 187.863712] IRQ6 -> 0:6
[ 187.863714] IRQ7 -> 0:7
[ 187.863716] IRQ8 -> 0:8
[ 187.863719] IRQ9 -> 0:9
[ 187.863721] IRQ10 -> 0:10
[ 187.863723] IRQ11 -> 0:11
[ 187.863726] IRQ12 -> 0:12
[ 187.863728] IRQ13 -> 0:13
[ 187.863730] IRQ14 -> 0:14
[ 187.863733] IRQ15 -> 0:15
[ 187.863735] IRQ16 -> 0:16
[ 187.863738] IRQ18 -> 0:18
[ 187.863741] .................................... done.
[ 187.864068] 3c59x: Donald Becker and others. http://www.scyld.com/network/vortex.html
[ 187.864082] 0000:02:01.0: 3Com PCI 3c905C Tornado at ffffc20000042000.
[ 187.903216] tg3.c:v3.66 (September 23, 2006)
[ 187.903383] ACPI: PCI Interrupt 0000:01:01.0[A] -> GSI 24 (level, low) -> IRQ 24
[ 188.146236] eth1: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:22
[ 188.146247] eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
[ 188.146252] eth1: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 188.146345] ACPI: PCI Interrupt 0000:01:01.1[B] -> GSI 28 (level, low) -> IRQ 28
[ 193.455989] eth2: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:0d:60:98:74:23
[ 193.456001] eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
[ 193.456005] eth2: dma_rwctrl[769f0000] dma_mask[64-bit]
[ 193.456770] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 193.456777] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 193.456883] SvrWks CSB6: IDE controller at PCI slot 0000:00:0f.1
[ 193.456908] SvrWks CSB6: chipset revision 160
[ 193.456911] SvrWks CSB6: not 100% native mode: will probe irqs later
[ 193.456937] ide0: BM-DMA at 0x0700-0x0707, BIOS settings: hda:DMA, hdb:DMA
[ 193.456959] SvrWks CSB6: simplex device: DMA disabled
[ 193.456962] ide1: SvrWks CSB6 Bus-Master DMA disabled (BIOS)
[ 194.196864] hda: HL-DT-STDVD-ROM GDR8082N, ATAPI CD/DVD-ROM drive
[ 194.534776] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 195.156832] do_IRQ: cannot handle IRQ -1 vector: 153 cpu: 1
[ 195.217657] v[0][153] -> 14
[ 195.217664] number of MP IRQ sources: 15.
[ 195.217667] number of IO-APIC #15 registers: 36.
[ 195.217669] number of IO-APIC #14 registers: 36.
[ 195.217670] testing the IO APIC.......................
[ 195.217676]
[ 195.217677] IO APIC #15......
[ 195.217679] .... register #00: 0F000000
[ 195.217681] ....... : physical APIC id: 0F
[ 195.217683] .... register #01: 00230011
[ 195.217685] ....... : max redirection entries: 0023
[ 195.217686] ....... : PRQ implemented: 0
[ 195.217688] ....... : IO APIC version: 0011
[ 195.217696] .... register #02: 00000000
[ 195.217697] ....... : arbitration: 00
[ 195.217699] .... IRQ redirection table:
[ 195.217700] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 195.217716] 00 000 00 1 0 0 0 0 0 0 00
[ 195.217722] 01 001 01 0 0 0 0 0 1 1 39
[ 195.217727] 02 001 01 1 0 0 0 0 0 0 20
[ 195.217732] 03 001 01 0 0 0 0 0 1 1 41
[ 195.217737] 04 001 01 0 0 0 0 0 1 1 49
[ 195.217742] 05 001 01 0 0 0 0 0 1 1 51
[ 195.217747] 06 001 01 0 0 0 0 0 1 1 59
[ 195.217752] 07 001 01 0 0 0 0 0 1 1 61
[ 195.217757] 08 001 01 0 0 0 1 0 1 1 69
[ 195.217762] 09 001 01 0 1 0 1 0 1 1 71
[ 195.217768] 0a 001 01 0 0 0 0 0 1 1 79
[ 195.217773] 0b 001 01 0 0 0 0 0 1 1 81
[ 195.217778] 0c 001 01 0 0 0 0 0 1 1 89
[ 195.217783] 0d 001 01 0 0 0 0 0 1 1 91
[ 195.217788] 0e 001 01 0 0 0 1 0 1 1 99
[ 195.217793] 0f 001 01 0 0 0 0 0 1 1 A1
[ 195.217798] 10 001 01 1 1 0 1 0 1 1 A9
[ 195.217803] 11 000 00 1 0 0 0 0 0 0 00
[ 195.217808] 12 001 01 1 1 0 1 0 1 1 B1
[ 195.217814] 13 000 00 1 0 0 0 0 0 0 00
[ 195.217819] 14 000 00 1 0 0 0 0 0 0 00
[ 195.217823] 15 000 00 1 0 0 0 0 0 0 00
[ 195.217828] 16 000 00 1 0 0 0 0 0 0 00
[ 195.217833] 17 000 00 1 0 0 0 0 0 0 00
[ 195.217838] 18 001 01 1 1 0 1 0 1 1 B9
[ 195.217844] 19 000 00 1 0 0 0 0 0 0 00
[ 195.217848] 1a 000 00 1 0 0 0 0 0 0 00
[ 195.217853] 1b 000 00 1 0 0 0 0 0 0 00
[ 195.217858] 1c 001 01 1 1 0 1 0 1 1 C1
[ 195.217864] 1d 000 00 1 0 0 0 0 0 0 00
[ 195.217869] 1e 000 00 1 0 0 0 0 0 0 00
[ 195.217873] 1f 000 00 1 0 0 0 0 0 0 00
[ 195.217878] 20 000 00 1 0 0 0 0 ..
[ 195.217902] .... register #00: 0E000000
[ 195.217903] ....... : physical APIC id: 0E
[ 195.217905] .... register #01: 00230011
[ 195.217907] ....... : max redirection entries: 0023
[ 195.217909] ....... : PRQ implemented: 0
[ 195.217910] ....... : IO APIC version: 0011
[ 195.217912] .... register #02: 00000000
[ 195.217913] ....... : arbitration: 00
[ 195.217915] .... IRQ redirection table:
[ 195.217916] NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[ 195.217920] 00 000 00 1 0 0 0 0 0 0 00
[ 195.217925] 01 000 00 1 0 0 0 0 0 0 00
[ 195.217930] 02 000 00 1 0 0 0 0 0 0 00
[ 195.217935] 03 000 00 1 0 0 0 0 0 0 00
[ 195.217940] 04 000 00 1 0 0 0 0 0 0 00
[ 195.217945] 05 000 00 1 0 0 0 0 0 0 00
[ 195.217950] 06 000 00 1 0 0 0 0 0 0 00
[ 195.217955] 07 000 00 1 0 0 0 0 0 0 00
[ 195.217960] 08 000 00 1 0 0 0 0 0 0 00
[ 195.217965] 09 000 00 1 0 0 0 0 0 0 00
[ 195.217970] 0a 000 00 1 0 0 0 0 0 0 00
[ 195.217974] 0b 000 00 1 0 0 0 0 0 0 00
[ 195.217979] 0c 000 00 1 0 0 0 0 0 0 00
[ 195.217984] 0d 000 00 1 0 0 0 0 0 0 00
[ 195.217989] 0e 000 00 1 0 0 0 0 0 0 00
[ 195.217994] 0f 000 00 1 0 0 0 0 0 0 00
[ 195.217999] 10 000 00 1 0 0 0 0 0 0 00
[ 195.218004] 11 000 00 1 0 0 0 0 0 0 00
[ 195.218009] 12 000 00 1 0 0 0 0 0 0 00
[ 195.218014] 13 000 00 1 0 0 0 0 0 0 00
[ 195.218019] 14 000 00 1 0 0 0 0 0 0 00
[ 195.218024] 15 000 00 1 0 0 0 0 0 0 00
[ 195.218028] 16 000 00 1 0 0 0 0 0 0 00
[ 195.218033] 17 000 00 1 0 0 0 0 0 0 00
[ 195.218038] 18 000 00 1 0 0 0 0 0 0 00
[ 195.218043] 19 000 00 1 0 0 0 0 0 0 00
[ 195.218048] 1a 000 00 1 0 0 0 0 0 0 00
[ 195.218053] 1b 000 00 1 0 0 0 0 0 0 00
[ 195.218058] 1c 000 00 1 0 0 0 0 0 0 00
[ 195.218063] 1d 000 00 1 0 0 0 0 0 0 00
[ 195.218068] 1e 000 00 1 0 0 0 0 0 0 00
[ 195.218092] 23 000 00 1 0 0 0 0 0 0 00
[ 195.218095] IRQ to pin mappings:
[ 195.218097] IRQ0 -> 0:2
[ 195.218099] IRQ1 -> 0:1
[ 195.218101] IRQ3 -> 0:3
[ 195.218103] IRQ4 -> 0:4
[ 195.218106] IRQ5 -> 0:5
[ 195.218108] IRQ6 -> 0:6
[ 195.218110] IRQ7 -> 0:7
[ 195.218112] IRQ8 -> 0:8
[ 195.218114] IRQ9 -> 0:9
[ 195.218117] IRQ10 -> 0:10
[ 195.218119] IRQ11 -> 0:11
[ 195.218121] IRQ12 -> 0:12
[ 195.218123] IRQ13 -> 0:13
[ 195.218126] IRQ14 -> 0:14
[ 195.218128] IRQ15 -> 0:15
[ 195.218130] IRQ16 -> 0:16
[ 195.218133] IRQ18 -> 0:18
[ 195.218135] IRQ24 -> 0:24
[ 195.218137] IRQ28 -> 0:28
[ 195.218141] .................................... done.
[ 200.775517] hda: lost interrupt

2006-10-07 16:55:10

by Eric W. Biederman

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Muli Ben-Yehuda <[email protected]> writes:

> On Fri, Oct 06, 2006 at 05:42:40PM -0600, Eric W. Biederman wrote:
>
>> If I read your bootlog right, you have four logical cpus but only two
>> sockets, and I think only two cores, the other two logical cpus
>> being hyperthreads.
>
> Yes, 2 sockets each of which is HT. Here's a /proc/cpuinfo from a
> distro kernel:

Ok. From looking at an individual case, the ioapic is programmed
correctly and I don't see a reason the local apic would be programmed
incorrectly. However, logical delivery mode and lowest priority
delivery mode are enabled. So we are asking the interrupt delivery
subsystem to choose a cpu to deliver the interrupt to, and then not
giving it any choice, since the destination mask names a single cpu.
So we may be confusing things.

Can you try CONFIG_CPU_HOTPLUG? That will force genapic to be set
to genapic_physflat instead of genapic_flat.

I am hoping that by running the apics in a different delivery mode
that explicitly says just deliver this interrupt to this cpu we
will avoid the problem you are seeing.

If genapic_physflat works we will have to decide what to do about
genapic_flat.

Thanks,

Eric

2006-10-07 17:59:35

by Muli Ben-Yehuda

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Sat, Oct 07, 2006 at 10:52:24AM -0600, Eric W. Biederman wrote:

> Can you try CONFIG_CPU_HOTPLUG? That will force genapic to be set
> to genapic_physflat instead of genapic_flat.

Yep, it boots with CONFIG_CPU_HOTPLUG!

> If genapic_physflat works we will have to decide what to do about
> genapic_flat.

I'm happy to test any follow-on patches to make it work without
CONFIG_CPU_HOTPLUG.

Cheers,
Muli

2006-10-07 19:03:41

by Linus Torvalds

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"



On Sat, 7 Oct 2006, Eric W. Biederman wrote:
>
> I am hoping that by running the apics in a different delivery mode
> that explicitly says just deliver this interrupt to this cpu we
> will avoid the problem you are seeing.

Note that having too strict delivery modes could be a major pain in the
future, with things like multicore CPU's a lot more actively doing power
management on their own, and effectively going into sleep-states with
reasonably long latencies.

Especially with schedulers that are aware of things like that (and we
_try_, at least to some degree, and people are interested in more of it),
you can easily be in the situation that one of the cores is being fairly
actively kept in a low-power state, and can have millisecond latencies
(not to mention no L1 cache contents etc).

So I really do think that the belief that we should force irqs to a
particular core is fundamentally flawed.

We used to do lowest-priority stuff in hw, and then Intel broke it, but I
always told them that they were _stupid_ to break it. The fact is,
especially with multi-core, it actually makes a lot of sense to have
hardware decide which core to interrupt, because hardware simply
potentially knows better.

This is one of those age-old questions: in _theory_ you can do a better
job in software, but in _practice_ it's just too damn expensive and
complicated to do a perfect job especially with dynamic decisions, so in
_practice_ it tends to be better to let hardware make some of the
decisions.

We can see the same thing in instruction scheduling: in _theory_ a
compiler can do a better job of scheduling, since it can spend inordinate
amounts of resources on doing things once, and then the hardware can be
simpler and faster and never worry about it. In _practice_, however, the
biggest scheduling decisions are all dynamic at run-time, and depend on
things like cache misses etc, and only total idiots (or embedded people)
will do static scheduling these days.

I think it's a huge mistake to do static interrupt routing for the same
reason.

Linus

2006-10-07 19:33:38

by Arjan van de Ven

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"


> This is one of those age-old questions: in _theory_ you can do a better
> job in software, but in _practice_ it's just too damn expensive and
> complicated to do a perfect job especially with dynamic decisions, so in
> _practice_ it tends to be better to let hardware make some of the
> decisions.

it seems the right mix at this time is to have the software select the
package, and the hardware pick the core within the package.

Or rather, the software picks which cache domain (and I only count the
largest cache, not L1) and the hardware then has the freedom to do the
right thing inside that. Binding interrupts to a cache domain still seems
to be the right strategy (at least for frequent interrupts like
networking), but doing that right needs more high-level information than
the hw generally has. Within the package... it's the opposite
ballgame.



--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com

2006-10-07 19:58:28

by Linus Torvalds

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"



On Sat, 7 Oct 2006, Arjan van de Ven wrote:
>
> it seems the right mix at this time is to have the software select the
> package, and the hardware pick the core within the package.

I think that sounds like a fairly good approach.

Software obviously can make the "rough" selections, it's the fine-grained
ones that are harder (and might need to be done at a frequency that just
makes it impractical).

So yes, having software say "We want to steer this particular interrupt to
this L3 cache domain" sounds eminently sane.

Having software specify which L1 cache domain it wants to pollute is
likely just crazy micro-management.

Linus

2006-10-07 20:26:43

by Eric W. Biederman

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Linus Torvalds <[email protected]> writes:

> On Sat, 7 Oct 2006, Eric W. Biederman wrote:
>>
>> I am hoping that by running the apics in a different delivery mode
>> that explicitly says just deliver this interrupt to this cpu we
>> will avoid the problem you are seeing.
>
> Note that having too strict delivery modes could be a major pain in the
> future, with things like multicore CPU's a lot more actively doing power
> management on their own, and effectively going into sleep-states with
> reasonably long latencies.

Sure.

> Especially with schedulers that are aware of things like that (and we
> _try_, at least to some degree, and people are interested in more of it),
> you can easily be in the situation that one of the cores is being fairly
> actively kept in a low-power state, and can have millisecond latencies
> (not to mention no L1 cache contents etc).
>
> So I really do think that the belief that we should force irqs to a
> particular core is fundamentally flawed.

For me this isn't about forcing an irq to a particular cpu. It
is about not having global vector allocation, because that simply
cannot scale.

Being able to allocate a vector for just a subset of the cpus means we can
support arbitrarily large systems. Making the size of the pool a single
cpu was the simplest implementation of that idea.

> We used to do lowest-priority stuff in hw, and then Intel broke it, but I
> always told them that they were _stupid_ to break it. The fact is,
> especially with multi-core, it actually makes a lot of sense to have
> hardware decide which core to interrupt, because hardware simply
> potentially knows better.
>
> This is one of those age-old questions: in _theory_ you can do a better
> job in software, but in _practice_ it's just too damn expensive and
> complicated to do a perfect job especially with dynamic decisions, so in
> _practice_ it tends to be better to let hardware make some of the
> decisions.
>
> We can see the same thing in instruction scheduling: in _theory_ a
> compiler can do a better job of scheduling, since it can spend inordinate
> amounts of resources on doing things once, and then the hardware can be
> simpler and faster and never worry about it. In _practice_, however, the
> biggest scheduling decisions are all dynamic at run-time, and depend on
> things like cache misses etc, and only total idiots (or embedded people)
> will do static scheduling these days.
>
> I think it's a huge mistake to do static interrupt routing for the same
> reason.

I have no problem with that. The only place where I changed behavior
on x86_64 is genapic_flat, which does this, and I figured it was
not a big deal simply because CONFIG_CPU_HOTPLUG is the default, so
genapic_flat is rarely used. I figured that if my implementation was too
simple someone would scream, and I could then add the complexity to the
vector allocator to enable lowest priority interrupt delivery.

Well, someone has screamed :)

Eric

2006-10-08 13:41:47

by Eric W. Biederman

Subject: [PATCH 0/3] x86_64 irq fixes

Muli Ben-Yehuda <[email protected]> writes:

> On Sat, Oct 07, 2006 at 10:52:24AM -0600, Eric W. Biederman wrote:
>
>> Can you try CONFIG_CPU_HOTPLUG? That will force genapic to be set
>> to genapic_physflat instead of genapic_flat.
>
> Yep, it boots with CONFIG_CPU_HOTPLUG!
>
>> If genapic_physflat works we will have to decide what to do about
>> genapic_flat.
>
> I'm happy to test any follow-on patches to make it work without
> CONFIG_CPU_HOTPLUG.

Ok. I have found a fairly clean way to structure the code that
should restore the previous behavior of genapic_flat, allowing
lowest priority interrupt delivery to work and, with some luck,
avoiding your hardware that does not do what the software
tells it to :)

I still need to dig in and remove the BUG_ON in the interrupt
reception path, but that is a separate problem.

I also found another small bug in pci_enable_irq, caused by some
code I failed to remove earlier, and since the patches overlap
I have made this a small patch series.

I have tested the code as best I can, and confirmation that this
fixes the original problem would be great. But I don't see how
it could fail to fix the problem, as it restores genapic_flat to
global vector allocation.

Eric

2006-10-08 13:43:18

by Eric W. Biederman

Subject: [PATCH 1/3] i386/x86_64: FIX pci_enable_irq to set dev->irq to the irq number


In commit ace80ab796ae30d2c9ee8a84ab6f608a61f8b87b I removed
the weird logic that used the vector number as the irq number
when CONFIG_PCI_MSI was defined. However, pci_enable_irq was using a
different test in the io_apic_assign_irqs path and I missed it :(

This patch removes the wrong code so no one hits this problem.

This code is only active when a specific set of boot command line
parameters is specified, which likely explains why no one has
noticed this earlier.

Signed-off-by: Eric W. Biederman <[email protected]>
---
arch/i386/pci/irq.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
index 47f02af..dbc4aae 100644
--- a/arch/i386/pci/irq.c
+++ b/arch/i386/pci/irq.c
@@ -1141,10 +1141,6 @@ static int pirq_enable_irq(struct pci_de
}
dev = temp_dev;
if (irq >= 0) {
-#ifdef CONFIG_PCI_MSI
- if (!platform_legacy_irq(irq))
- irq = IO_APIC_VECTOR(irq);
-#endif
printk(KERN_INFO "PCI->APIC IRQ transform: %s[%c] -> IRQ %d\n",
pci_name(dev), 'A' + pin, irq);
dev->irq = irq;
--
1.4.2.rc3.g7e18e-dirty

2006-10-08 13:45:52

by Eric W. Biederman

Subject: [PATCH 2/3] i386/x86_64: Remove global IO_APIC_VECTOR


Which vector an irq is assigned to now varies dynamically and is
not needed outside of io_apic.c. So remove the possibility
of accessing the information outside of io_apic.c and remove
the silly macro that makes looking for users of irq_vector
difficult.

The fact that this compiles ensures there aren't any more pieces
of the old CONFIG_PCI_MSI weirdness that I failed to remove.

Signed-off-by: Eric W. Biederman <[email protected]>
---
arch/i386/kernel/io_apic.c | 12 ++++++------
arch/x86_64/kernel/io_apic.c | 8 ++++----
include/asm-i386/hw_irq.h | 3 ---
include/asm-x86_64/hw_irq.h | 2 --
4 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index b7287fb..cd082c3 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -1184,8 +1184,8 @@ static int __assign_irq_vector(int irq)

BUG_ON((unsigned)irq >= NR_IRQ_VECTORS);

- if (IO_APIC_VECTOR(irq) > 0)
- return IO_APIC_VECTOR(irq);
+ if (irq_vector[irq] > 0)
+ return irq_vector[irq];

current_vector += 8;
if (current_vector == SYSCALL_VECTOR)
@@ -1199,7 +1199,7 @@ static int __assign_irq_vector(int irq)
}

vector = current_vector;
- IO_APIC_VECTOR(irq) = vector;
+ irq_vector[irq] = vector;

return vector;
}
@@ -1967,7 +1967,7 @@ static void ack_ioapic_quirk_irq(unsigne
* operation to prevent an edge-triggered interrupt escaping meanwhile.
* The idea is from Manfred Spraul. --macro
*/
- i = IO_APIC_VECTOR(irq);
+ i = irq_vector[irq];

v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));

@@ -1984,7 +1984,7 @@ static void ack_ioapic_quirk_irq(unsigne

static int ioapic_retrigger_irq(unsigned int irq)
{
- send_IPI_self(IO_APIC_VECTOR(irq));
+ send_IPI_self(irq_vector[irq]);

return 1;
}
@@ -2020,7 +2020,7 @@ static inline void init_IO_APIC_traps(vo
*/
for (irq = 0; irq < NR_IRQS ; irq++) {
int tmp = irq;
- if (IO_APIC_IRQ(tmp) && !IO_APIC_VECTOR(tmp)) {
+ if (IO_APIC_IRQ(tmp) && !irq_vector[tmp]) {
/*
* Hmm.. We don't have an entry for this,
* so default to an old-fashioned 8259
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 91728d9..9c3b9b1 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -587,8 +587,8 @@ static int __assign_irq_vector(int irq,

BUG_ON((unsigned)irq >= NR_IRQ_VECTORS);

- if (IO_APIC_VECTOR(irq) > 0)
- old_vector = IO_APIC_VECTOR(irq);
+ if (irq_vector[irq] > 0)
+ old_vector = irq_vector[irq];
if ((old_vector > 0) && cpu_isset(old_vector >> 8, mask)) {
return old_vector;
}
@@ -620,7 +620,7 @@ next:
}
per_cpu(vector_irq, cpu)[vector] = irq;
vector |= cpu << 8;
- IO_APIC_VECTOR(irq) = vector;
+ irq_vector[irq] = vector;
return vector;
}
return -ENOSPC;
@@ -1289,7 +1289,7 @@ static inline void init_IO_APIC_traps(vo
*/
for (irq = 0; irq < NR_IRQS ; irq++) {
int tmp = irq;
- if (IO_APIC_IRQ(tmp) && !IO_APIC_VECTOR(tmp)) {
+ if (IO_APIC_IRQ(tmp) && !irq_vector[tmp]) {
/*
* Hmm.. We don't have an entry for this,
* so default to an old-fashioned 8259
diff --git a/include/asm-i386/hw_irq.h b/include/asm-i386/hw_irq.h
index 8806c7e..0bedbdf 100644
--- a/include/asm-i386/hw_irq.h
+++ b/include/asm-i386/hw_irq.h
@@ -26,9 +26,6 @@ #define NMI_VECTOR 0x02
* Interrupt entry/exit code at both C and assembly level
*/

-extern u8 irq_vector[NR_IRQ_VECTORS];
-#define IO_APIC_VECTOR(irq) (irq_vector[irq])
-
extern void (*interrupt[NR_IRQS])(void);

#ifdef CONFIG_SMP
diff --git a/include/asm-x86_64/hw_irq.h b/include/asm-x86_64/hw_irq.h
index 53d0d9f..792dd52 100644
--- a/include/asm-x86_64/hw_irq.h
+++ b/include/asm-x86_64/hw_irq.h
@@ -74,10 +74,8 @@ #define FIRST_SYSTEM_VECTOR 0xef /* du


#ifndef __ASSEMBLY__
-extern unsigned int irq_vector[NR_IRQ_VECTORS];
typedef int vector_irq_t[NR_VECTORS];
DECLARE_PER_CPU(vector_irq_t, vector_irq);
-#define IO_APIC_VECTOR(irq) (irq_vector[irq])

/*
* Various low-level irq details needed by irq.c, process.c,
--
1.4.2.rc3.g7e18e-dirty

2006-10-08 13:50:08

by Eric W. Biederman

Subject: [PATCH 3/3] x86_64 irq: Allocate a vector across all cpus for genapic_flat.


The problem is that we can't take advantage of lowest priority
delivery mode if the vectors are allocated for only one cpu at a
time. Nor can we work around hardware that assumes lowest priority
delivery mode is always used with several cpus.

So this patch introduces the concept of a vector_allocation_domain:
a set of cpus that will receive an irq on the same vector. Currently
the code implementing this is placed in the genapic structure, so we
can vary it depending on how we are using the io_apics.

This allows us to restore the previous behaviour of genapic_flat
without removing the benefits of having separate vector allocation
for large machines.

This should also fix the reported problem where a hyperthreaded
cpu was receiving the irq on the wrong hyperthread in logical
delivery mode, because the previous behaviour is restored.

This patch properly records our allocation of the first 16 irqs
to the first 16 available vectors on all cpus. This should be
fine, but it may run into problems with multiple interrupts at
the same interrupt level. Apart from some badly maintained comments
in the code and the behaviour of the interrupt allocator, I have
no real understanding of that problem.

Signed-off-by: Eric W. Biederman <[email protected]>
---
arch/x86_64/kernel/genapic_cluster.c | 8 ++
arch/x86_64/kernel/genapic_flat.c | 24 ++++++
arch/x86_64/kernel/io_apic.c | 131 ++++++++++++++++++++++------------
include/asm-x86_64/genapic.h | 1
include/asm-x86_64/mach_apic.h | 1
5 files changed, 117 insertions(+), 48 deletions(-)

diff --git a/arch/x86_64/kernel/genapic_cluster.c b/arch/x86_64/kernel/genapic_cluster.c
index cdb90e6..73d7630 100644
--- a/arch/x86_64/kernel/genapic_cluster.c
+++ b/arch/x86_64/kernel/genapic_cluster.c
@@ -63,6 +63,13 @@ static cpumask_t cluster_target_cpus(voi
return cpumask_of_cpu(0);
}

+static cpumask_t cluster_vector_allocation_domain(int cpu)
+{
+ cpumask_t domain = CPU_MASK_NONE;
+ cpu_set(cpu, domain);
+ return domain;
+}
+
static void cluster_send_IPI_mask(cpumask_t mask, int vector)
{
send_IPI_mask_sequence(mask, vector);
@@ -119,6 +126,7 @@ struct genapic apic_cluster = {
.int_delivery_mode = dest_Fixed,
.int_dest_mode = (APIC_DEST_PHYSICAL != 0),
.target_cpus = cluster_target_cpus,
+ .vector_allocation_domain = cluster_vector_allocation_domain,
.apic_id_registered = cluster_apic_id_registered,
.init_apic_ldr = cluster_init_apic_ldr,
.send_IPI_all = cluster_send_IPI_all,
diff --git a/arch/x86_64/kernel/genapic_flat.c b/arch/x86_64/kernel/genapic_flat.c
index 50ad153..0dfc223 100644
--- a/arch/x86_64/kernel/genapic_flat.c
+++ b/arch/x86_64/kernel/genapic_flat.c
@@ -22,6 +22,20 @@ static cpumask_t flat_target_cpus(void)
return cpu_online_map;
}

+static cpumask_t flat_vector_allocation_domain(int cpu)
+{
+ /* Careful. Some cpus do not strictly honor the set of cpus
+ * specified in the interrupt destination when using lowest
+ * priority interrupt delivery mode.
+ *
+ * In particular there was a hyperthreading cpu observed to
+ * deliver interrupts to the wrong hyperthread when only one
+ * hyperthread was specified in the interrupt desitination.
+ */
+ cpumask_t domain = { { [0] = APIC_ALL_CPUS, } };
+ return domain;
+}
+
/*
* Set up the logical destination ID.
*
@@ -121,6 +135,7 @@ struct genapic apic_flat = {
.int_delivery_mode = dest_LowestPrio,
.int_dest_mode = (APIC_DEST_LOGICAL != 0),
.target_cpus = flat_target_cpus,
+ .vector_allocation_domain = flat_vector_allocation_domain,
.apic_id_registered = flat_apic_id_registered,
.init_apic_ldr = flat_init_apic_ldr,
.send_IPI_all = flat_send_IPI_all,
@@ -141,6 +156,14 @@ static cpumask_t physflat_target_cpus(vo
return cpumask_of_cpu(0);
}

+static cpumask_t physflat_vector_allocation_domain(int cpu)
+{
+ cpumask_t domain = CPU_MASK_NONE;
+ cpu_set(cpu, domain);
+ return domain;
+}
+
+
static void physflat_send_IPI_mask(cpumask_t cpumask, int vector)
{
send_IPI_mask_sequence(cpumask, vector);
@@ -179,6 +202,7 @@ struct genapic apic_physflat = {
.int_delivery_mode = dest_Fixed,
.int_dest_mode = (APIC_DEST_PHYSICAL != 0),
.target_cpus = physflat_target_cpus,
+ .vector_allocation_domain = physflat_vector_allocation_domain,
.apic_id_registered = flat_apic_id_registered,
.init_apic_ldr = flat_init_apic_ldr,/*not needed, but shouldn't hurt*/
.send_IPI_all = physflat_send_IPI_all,
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 9c3b9b1..771bcf7 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -47,7 +47,7 @@ #include <asm/nmi.h>
#include <asm/msidef.h>
#include <asm/hypertransport.h>

-static int assign_irq_vector(int irq, cpumask_t mask);
+static int assign_irq_vector(int irq, cpumask_t mask, cpumask_t *result);

#define __apicdebuginit __init

@@ -174,12 +174,10 @@ static void set_ioapic_affinity_irq(unsi

cpus_and(mask, tmp, CPU_MASK_ALL);

- vector = assign_irq_vector(irq, mask);
+ vector = assign_irq_vector(irq, mask, &tmp);
if (vector < 0)
return;

- cpus_clear(tmp);
- cpu_set(vector >> 8, tmp);
dest = cpu_mask_to_apicid(tmp);

/*
@@ -188,7 +186,7 @@ static void set_ioapic_affinity_irq(unsi
dest = SET_APIC_LOGICAL_ID(dest);

spin_lock_irqsave(&ioapic_lock, flags);
- __target_IO_APIC_irq(irq, dest, vector & 0xff);
+ __target_IO_APIC_irq(irq, dest, vector);
set_native_irq_info(irq, mask);
spin_unlock_irqrestore(&ioapic_lock, flags);
}
@@ -563,9 +561,45 @@ static inline int IO_APIC_irq_trigger(in
}

/* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. */
-unsigned int irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_EXTERNAL_VECTOR, 0 };
+static u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = {
+ [0] = FIRST_EXTERNAL_VECTOR + 0,
+ [1] = FIRST_EXTERNAL_VECTOR + 1,
+ [2] = FIRST_EXTERNAL_VECTOR + 2,
+ [3] = FIRST_EXTERNAL_VECTOR + 3,
+ [4] = FIRST_EXTERNAL_VECTOR + 4,
+ [5] = FIRST_EXTERNAL_VECTOR + 5,
+ [6] = FIRST_EXTERNAL_VECTOR + 6,
+ [7] = FIRST_EXTERNAL_VECTOR + 7,
+ [8] = FIRST_EXTERNAL_VECTOR + 8,
+ [9] = FIRST_EXTERNAL_VECTOR + 9,
+ [10] = FIRST_EXTERNAL_VECTOR + 10,
+ [11] = FIRST_EXTERNAL_VECTOR + 11,
+ [12] = FIRST_EXTERNAL_VECTOR + 12,
+ [13] = FIRST_EXTERNAL_VECTOR + 13,
+ [14] = FIRST_EXTERNAL_VECTOR + 14,
+ [15] = FIRST_EXTERNAL_VECTOR + 15,
+};
+
+static cpumask_t irq_domain[NR_IRQ_VECTORS] __read_mostly = {
+ [0] = CPU_MASK_ALL,
+ [1] = CPU_MASK_ALL,
+ [2] = CPU_MASK_ALL,
+ [3] = CPU_MASK_ALL,
+ [4] = CPU_MASK_ALL,
+ [5] = CPU_MASK_ALL,
+ [6] = CPU_MASK_ALL,
+ [7] = CPU_MASK_ALL,
+ [8] = CPU_MASK_ALL,
+ [9] = CPU_MASK_ALL,
+ [10] = CPU_MASK_ALL,
+ [11] = CPU_MASK_ALL,
+ [12] = CPU_MASK_ALL,
+ [13] = CPU_MASK_ALL,
+ [14] = CPU_MASK_ALL,
+ [15] = CPU_MASK_ALL,
+};

-static int __assign_irq_vector(int irq, cpumask_t mask)
+static int __assign_irq_vector(int irq, cpumask_t mask, cpumask_t *result)
{
/*
* NOTE! The local APIC isn't very good at handling
@@ -589,14 +623,22 @@ static int __assign_irq_vector(int irq,

if (irq_vector[irq] > 0)
old_vector = irq_vector[irq];
- if ((old_vector > 0) && cpu_isset(old_vector >> 8, mask)) {
- return old_vector;
+ if (old_vector > 0) {
+ cpus_and(*result, irq_domain[irq], mask);
+ if (!cpus_empty(*result))
+ return old_vector;
}

for_each_cpu_mask(cpu, mask) {
+ cpumask_t domain;
+ int first, new_cpu;
int vector, offset;
- vector = pos[cpu].vector;
- offset = pos[cpu].offset;
+
+ domain = vector_allocation_domain(cpu);
+ first = first_cpu(domain);
+
+ vector = pos[first].vector;
+ offset = pos[first].offset;
next:
vector += 8;
if (vector >= FIRST_SYSTEM_VECTOR) {
@@ -604,35 +646,40 @@ next:
offset = (offset + 1) % 8;
vector = FIRST_DEVICE_VECTOR + offset;
}
- if (unlikely(pos[cpu].vector == vector))
+ if (unlikely(pos[first].vector == vector))
continue;
if (vector == IA32_SYSCALL_VECTOR)
goto next;
- if (per_cpu(vector_irq, cpu)[vector] != -1)
- goto next;
+ for_each_cpu_mask(new_cpu, domain)
+ if (per_cpu(vector_irq, cpu)[vector] != -1)
+ goto next;
/* Found one! */
- pos[cpu].vector = vector;
- pos[cpu].offset = offset;
+ for_each_cpu_mask(new_cpu, domain) {
+ pos[cpu].vector = vector;
+ pos[cpu].offset = offset;
+ }
if (old_vector >= 0) {
- int old_cpu = old_vector >> 8;
- old_vector &= 0xff;
- per_cpu(vector_irq, old_cpu)[old_vector] = -1;
+ int old_cpu;
+ for_each_cpu_mask(old_cpu, domain)
+ per_cpu(vector_irq, old_cpu)[old_vector] = -1;
}
- per_cpu(vector_irq, cpu)[vector] = irq;
- vector |= cpu << 8;
+ for_each_cpu_mask(new_cpu, domain)
+ per_cpu(vector_irq, new_cpu)[vector] = irq;
irq_vector[irq] = vector;
+ irq_domain[irq] = domain;
+ cpus_and(*result, domain, mask);
return vector;
}
return -ENOSPC;
}

-static int assign_irq_vector(int irq, cpumask_t mask)
+static int assign_irq_vector(int irq, cpumask_t mask, cpumask_t *result)
{
int vector;
unsigned long flags;

spin_lock_irqsave(&vector_lock, flags);
- vector = __assign_irq_vector(irq, mask);
+ vector = __assign_irq_vector(irq, mask, result);
spin_unlock_irqrestore(&vector_lock, flags);
return vector;
}
@@ -704,14 +751,12 @@ static void __init setup_IO_APIC_irqs(vo

if (IO_APIC_IRQ(irq)) {
cpumask_t mask;
- vector = assign_irq_vector(irq, TARGET_CPUS);
+ vector = assign_irq_vector(irq, TARGET_CPUS, &mask);
if (vector < 0)
continue;

- cpus_clear(mask);
- cpu_set(vector >> 8, mask);
entry.dest.logical.logical_dest = cpu_mask_to_apicid(mask);
- entry.vector = vector & 0xff;
+ entry.vector = vector;

ioapic_register_intr(irq, vector, IOAPIC_AUTO);
if (!apic && (irq < 16))
@@ -1430,12 +1475,13 @@ static inline void check_timer(void)
{
int apic1, pin1, apic2, pin2;
int vector;
+ cpumask_t mask;

/*
* get/set the timer IRQ vector:
*/
disable_8259A_irq(0);
- vector = assign_irq_vector(0, TARGET_CPUS);
+ vector = assign_irq_vector(0, TARGET_CPUS, &mask);

/*
* Subtle, code in do_timer_interrupt() expects an AEOI
@@ -1667,6 +1713,7 @@ int create_irq(void)
int new;
int vector = 0;
unsigned long flags;
+ cpumask_t mask;

irq = -ENOSPC;
spin_lock_irqsave(&vector_lock, flags);
@@ -1675,7 +1722,7 @@ int create_irq(void)
continue;
if (irq_vector[new] != 0)
continue;
- vector = __assign_irq_vector(new, TARGET_CPUS);
+ vector = __assign_irq_vector(new, TARGET_CPUS, &mask);
if (likely(vector > 0))
irq = new;
break;
@@ -1707,13 +1754,10 @@ static int msi_compose_msg(struct pci_de
{
int vector;
unsigned dest;
+ cpumask_t tmp;

- vector = assign_irq_vector(irq, TARGET_CPUS);
+ vector = assign_irq_vector(irq, TARGET_CPUS, &tmp);
if (vector >= 0) {
- cpumask_t tmp;
-
- cpus_clear(tmp);
- cpu_set(vector >> 8, tmp);
dest = cpu_mask_to_apicid(tmp);

msg->address_hi = MSI_ADDR_BASE_HI;
@@ -1752,12 +1796,10 @@ static void set_msi_irq_affinity(unsigne

cpus_and(mask, tmp, CPU_MASK_ALL);

- vector = assign_irq_vector(irq, mask);
+ vector = assign_irq_vector(irq, mask, &tmp);
if (vector < 0)
return;

- cpus_clear(tmp);
- cpu_set(vector >> 8, tmp);
dest = cpu_mask_to_apicid(tmp);

read_msi_msg(irq, &msg);
@@ -1844,12 +1886,10 @@ static void set_ht_irq_affinity(unsigned

cpus_and(mask, tmp, CPU_MASK_ALL);

- vector = assign_irq_vector(irq, mask);
+ vector = assign_irq_vector(irq, mask, &tmp);
if (vector < 0)
return;

- cpus_clear(tmp);
- cpu_set(vector >> 8, tmp);
dest = cpu_mask_to_apicid(tmp);

target_ht_irq(irq, dest, vector & 0xff);
@@ -1871,15 +1911,13 @@ #endif
int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
{
int vector;
+ cpumask_t tmp;

- vector = assign_irq_vector(irq, TARGET_CPUS);
+ vector = assign_irq_vector(irq, TARGET_CPUS, &tmp);
if (vector >= 0) {
u32 low, high;
unsigned dest;
- cpumask_t tmp;

- cpus_clear(tmp);
- cpu_set(vector >> 8, tmp);
dest = cpu_mask_to_apicid(tmp);

high = HT_IRQ_HIGH_DEST_ID(dest);
@@ -1945,13 +1983,10 @@ int io_apic_set_pci_routing (int ioapic,
add_pin_to_irq(irq, ioapic, pin);


- vector = assign_irq_vector(irq, TARGET_CPUS);
+ vector = assign_irq_vector(irq, TARGET_CPUS, &mask);
if (vector < 0)
return vector;

- cpus_clear(mask);
- cpu_set(vector >> 8, mask);
-
/*
* Generate a PCI IRQ routing entry and program the IOAPIC accordingly.
* Note that we mask (disable) IRQs now -- these get enabled when the
diff --git a/include/asm-x86_64/genapic.h b/include/asm-x86_64/genapic.h
index 81e7146..a0e9a4b 100644
--- a/include/asm-x86_64/genapic.h
+++ b/include/asm-x86_64/genapic.h
@@ -18,6 +18,7 @@ struct genapic {
u32 int_dest_mode;
int (*apic_id_registered)(void);
cpumask_t (*target_cpus)(void);
+ cpumask_t (*vector_allocation_domain)(int cpu);
void (*init_apic_ldr)(void);
/* ipi */
void (*send_IPI_mask)(cpumask_t mask, int vector);
diff --git a/include/asm-x86_64/mach_apic.h b/include/asm-x86_64/mach_apic.h
index d334224..7b7115a 100644
--- a/include/asm-x86_64/mach_apic.h
+++ b/include/asm-x86_64/mach_apic.h
@@ -17,6 +17,7 @@ #include <asm/genapic.h>
#define INT_DELIVERY_MODE (genapic->int_delivery_mode)
#define INT_DEST_MODE (genapic->int_dest_mode)
#define TARGET_CPUS (genapic->target_cpus())
+#define vector_allocation_domain (genapic->vector_allocation_domain)
#define apic_id_registered (genapic->apic_id_registered)
#define init_apic_ldr (genapic->init_apic_ldr)
#define send_IPI_mask (genapic->send_IPI_mask)
--
1.4.2.rc3.g7e18e-dirty

2006-10-08 18:59:37

by Muli Ben-Yehuda

Subject: Re: [PATCH 0/3] x86_64 irq fixes

On Sun, Oct 08, 2006 at 07:39:38AM -0600, Eric W. Biederman wrote:

> I have tested the code as best I can, and confirmation that this
> fixes the original problem would be great. But I don't see how
> it could fail to fix the problem, as it restores genapic_flat to
> global vector allocation.

Works for me. Thanks, Eric.

Acked-by: Muli Ben-Yehuda <[email protected]>

Cheers,
Muli

2006-10-08 19:01:49

by Muli Ben-Yehuda

Subject: Re: [PATCH 3/3] x86_64 irq: Allocate a vector across all cpus for genapic_flat.

On Sun, Oct 08, 2006 at 07:47:55AM -0600, Eric W. Biederman wrote:

> This should also fix the problem report where a hyperthreaded
> cpu was receiving the irq on the wrong hyperthread when in
> logical delivery mode because the previous behaviour is restored.
>
> This patch properly records our allocation of the first 16 irqs
> to the first 16 available vectors on all cpus. This should be
> fine but it may run into problems with multiple interrupts at
> the same interrupt level. Except for some badly maintained comments
> in the code and the behaviour of the interrupt allocator I have
> no real understanding of that problem.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Muli Ben-Yehuda <[email protected]>

Cheers,
Muli

2006-10-09 05:44:43

by Eric W. Biederman

Subject: [PATCH 1/1] x86_64 irq: Scream but don't die if we receive an unexpected irq


Due to code bugs or misbehaving hardware, it is possible that we
receive an interrupt that we have not mapped into a linux irq.
Calling BUG when that happens is very rude: even if the problem
is mild, it prevents anything else from getting done.

So instead of calling BUG just scream loudly about the problem
and continue running. We don't have enough knowledge to know
which interrupt triggered this behavior so we don't acknowledge it.
This will likely prevent a recurrence of the problem by jamming
up the works with an unacknowledged interrupt.

If the interrupt was something important it is quite possible
that nothing productive will happen past this point. But
it is now at least possible to keep working if the kernel
can survive without the interrupt we dropped on the floor.

Solutions like irqpoll should generally make dropped irqs non-fatal.

Signed-off-by: Eric W. Biederman <[email protected]>
---
arch/x86_64/kernel/irq.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index b8a407f..dff68eb 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -114,16 +114,16 @@ asmlinkage unsigned int do_IRQ(struct pt
irq_enter();
irq = __get_cpu_var(vector_irq)[vector];

- if (unlikely(irq >= NR_IRQS)) {
- printk(KERN_EMERG "%s: cannot handle IRQ %d\n",
- __FUNCTION__, irq);
- BUG();
- }
-
#ifdef CONFIG_DEBUG_STACKOVERFLOW
stack_overflow_check(regs);
#endif
- generic_handle_irq(irq);
+
+ if (likely(irq < NR_IRQS))
+ generic_handle_irq(irq);
+ else
+ printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
+ __func__, smp_processor_id(), vector);
+
irq_exit();

set_irq_regs(old_regs);
--
1.4.2.rc3.g7e18e-dirty

2006-10-09 06:11:32

by Eric W. Biederman

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Linus Torvalds <[email protected]> writes:

> On Sat, 7 Oct 2006, Arjan van de Ven wrote:
>>
>> it seems the right mix at this time is to have the software select the
>> package, and the hardware pick the core within the package.
>
> I think that sounds like a fairly good approach.
>
> Software obviously can make the "rough" selections, it's the fine-grained
> ones that are harder (and might need to be done at a frequency that just
> makes it impractical).
>
> So yes, having software say "We want to steer this particular interrupt to
> this L3 cache domain" sounds eminently sane.
>
> Having software specify which L1 cache domain it wants to pollute is
> likely just crazy micro-management.

The current interrupt delivery abstraction in the kernel is a
set of cpus an interrupt can be delivered to, which seems sufficient
for aiming at a cache domain. Frequently the lower levels of
interrupt delivery map this to a single cpu because of hardware
limitations, but in certain cases we can honor a multiple cpu
request.

I believe the scheduler has knowledge about different locality domains
for NUMA and everything else. So what is wanting on our side is some
architecture? work to do the broad steering by default.

Our current default policies on x86_64 are much less enlightened.
If we have < 8 cpus and CONFIG_HOTPLUG_CPU is not defined, we let
the hardware pick the cpu. Otherwise we send the interrupt to the
first cpu in the set, which in practice means the first cpu. Beyond
that, everything is left up to the user space irqbalanced.

My patches were about keeping us from artificially merging multiple
irq sources into the same linux irq when we ran short of vectors,
so that we have a chance to aim at and observe all irq sources individually.

Now it is possible to do all of this fine policy work that has been
discussed in this thread. But since I don't see that problem yet I'm
probably not the man for that job.

The truly challenging corollary to my work and this discussion is
handling the upcoming network adapters that can demux large
network pipes with a different irq for each cache domain in the
system, where the details of how many distinct irq sources you need
from the hardware are left up to the software to decide at run time.

Eric

2006-10-09 07:40:42

by Arjan van de Ven

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

> > So yes, having software say "We want to steer this particular interrupt to
> > this L3 cache domain" sounds eminently sane.
> >
> > Having software specify which L1 cache domain it wants to pollute is
> > likely just crazy micro-management.
>
> The current interrupt delivery abstraction in the kernel is a
> set of cpus an interrupt can be delivered to. Which seem sufficient
> to the cause of aiming at a cache domain. Frequently the lower
> levels of interrupt delivery map this to a single cpu because of
> hardware limitations but in certain cases we can honor a multiple cpu
> request.
>
> I believe the scheduler has knowledge about different locality domains
> for NUMA and everything else. So what is wanting on our side is some
> architecture? work to do the broad steering by default.


well normally this is the job of the userspace IRQ balancer to get
right; it is undergoing a redesign right now to be smarter and
deal better with dual/quad core, numa etc etc, but as a matter of
principle this is best done in userspace (simply because there's higher
level information there, like "is this interrupt for a network device",
so that policy can take that into account)


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com

2006-10-09 14:48:56

by Eric W. Biederman

Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

Arjan van de Ven <[email protected]> writes:

>> > So yes, having software say "We want to steer this particular interrupt to
>> > this L3 cache domain" sounds eminently sane.
>> >
>> > Having software specify which L1 cache domain it wants to pollute is
>> > likely just crazy micro-management.
>>
>> The current interrupt delivery abstraction in the kernel is a
>> set of cpus an interrupt can be delivered to. Which seem sufficient
>> to the cause of aiming at a cache domain. Frequently the lower
>> levels of interrupt delivery map this to a single cpu because of
>> hardware limitations but in certain cases we can honor a multiple cpu
>> request.
>>
>> I believe the scheduler has knowledge about different locality domains
>> for NUMA and everything else. So what is wanting on our side is some
>> architecture? work to do the broad steering by default.
>
>
> well normally this is the job of the userspace IRQ balancer to get
> right; the thing is undergoing a redesign right now to be smarter and
> deal better with dual/quad core, numa etc etc, but as a principle thing
> this is best done in userspace (simply because there's higher level
> information there, like "is this interrupt for a network device", so
> that policy can take that into account)

So far I have seen all of that higher level information in the kernel,
and it has to export it to user space for the user space daemon to
do anything about it.

The only time I have seen user space control being useful is when
there isn't a proper default policy, so that at least you can distribute
things between the cache domains properly. So far I have been
more tempted to turn it off, as it will routinely change which
NUMA node an irq is pointing at, which is not at all ideal
for performance, and I haven't seen a way (short of replacing it)
to tell the user space irq balancer that it got its default policy
wrong.

It is quite possible I don't have the whole story, but so far it
just feels like we are making a questionable decision by pushing
things out to user space. If nothing else it seems to make changes
more difficult in the irq handling infrastructure because we
have to maintain stable interfaces. Things like per cpu counters
for every irq start to look really questionable when you scale
the system up in size.

Eric

2006-10-09 15:28:29

by Protasevich, Natalie

Subject: RE: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"



> -----Original Message-----
> From: Eric W. Biederman [mailto:[email protected]]
> Sent: Monday, October 09, 2006 8:46 AM
> To: Arjan van de Ven
> Cc: Linus Torvalds; Muli Ben-Yehuda; Ingo Molnar; Thomas
> Gleixner; Benjamin Herrenschmidt; Rajesh Shah; Andi Kleen;
> Protasevich, Natalie; Luck, Tony; Andrew Morton;
> Linux-Kernel; Badari Pulavarty; Roland Dreier
> Subject: Re: 2.6.19-rc1 genirq causes either boot hang or
> "do_IRQ: cannot handle IRQ -1"
>
> Arjan van de Ven <[email protected]> writes:
>
> >> > So yes, having software say "We want to steer this particular
> >> > interrupt to this L3 cache domain" sounds eminently sane.
> >> >
> >> > Having software specify which L1 cache domain it wants to pollute
> >> > is likely just crazy micro-management.
> >>
> >> The current interrupt delivery abstraction in the kernel is a set of
> >> cpus an interrupt can be delivered to. Which seem sufficient to the
> >> cause of aiming at a cache domain. Frequently the lower levels of
> >> interrupt delivery map this to a single cpu because of hardware
> >> limitations but in certain cases we can honor a multiple cpu request.
> >>
> >> I believe the scheduler has knowledge about different locality
> >> domains for NUMA and everything else. So what is wanting on our side
> >> is some architecture? work to do the broad steering by default.
> >
> >
> > well normally this is the job of the userspace IRQ balancer to get
> > right; the thing is undergoing a redesign right now to be smarter and
> > deal better with dual/quad core, numa etc etc, but as a principle
> > thing this is best done in userspace (simply because there's higher
> > level information there, like "is this interrupt for a network
> > device", so that policy can take that into account)
>
> So far I have seen all of that higher level information in
> the kernel, and it has to export it to user space for the
> user space daemon to do anything about it.
>
> The only time I have seen user space control being useful is
> when there isn't a proper default policy so at least you can
> distribute things between the cache domains properly. So far
> I have been more tempted to turn it off as it will routinely
> change which NUMA node an irq is pointing at, which is not at
> all ideal for performance, and I haven't seen a way (short of
> replacing it) to tell the user space irq balancer that it got
> it's default policy wrong.
>
> It is quite possible I don't have the whole story, but so far
> it just feels like we are making a questionable decision by
> pushing things out to user space. If nothing else it seems
> to make changes more difficult in the irq handling
> infrastructure because we have to maintain stable interfaces.
> Things like per cpu counters for every irq start to look
> really questionable when you scale the system up in size.

I'd also like to question the current policies of user space irqbalanced.
It seems to just go round-robin without many heuristics involved. We are
seeing loss of timer interrupts on our systems, and the more processors,
the more noticeable it is, but it starts even on 8x partitions; on a 48x
system I see about 50% loss, on both ia32 and x86_64 (haven't checked on
ia64 yet). With, say, 16 threads it is unsettling to see 70% overall idle
time and still only 40-50% of interrupts go through. The system's time is
not affected, so the problem is on the back burner for now :) It's not
clear yet whether this is a software or hardware fault, and how much
damage it does (performance-wise etc.). I will provide more information
as it becomes available; I just wanted to raise a red flag in case others
also see such phenomena.
Thanks,
--Natalie

2006-10-09 15:39:53

by Arjan van de Ven

Subject: RE: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

On Mon, 2006-10-09 at 10:28 -0500, Protasevich, Natalie wrote:

> I'd like also to question current policies of user space irqbalanced. It
> seems to just go round-robin without much heuristics involved.

only for the timer interrupt and only because "people" didn't want to
see it bound to a specific CPU. For all others there are quite a few
heuristics, actually

> We are
> seeing loss of timer interrupts on our systems - and the more processors
> the more noticeable it is, but it starts even on 8x partitions; on 48x
> system I see about 50% loss, on both ia32 and x86_64 (haven't checked on
> ia64 yet). With say 16 threads it is unsettling to see 70% overall idle
> time, and still only 40-50% of interrupts go through. System's time is
> not affected, so the problem is on the back burner for now :) It's not
> clear yet whether this is software or hardware fault,

I'd call it a hardware fault. But then I'm biased.


2006-10-09 16:03:23

by Protasevich, Natalie

Subject: RE: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot handle IRQ -1"

> On Mon, 2006-10-09 at 10:28 -0500, Protasevich, Natalie wrote:
>
> > I'd like also to question current policies of user space irqbalanced.
> > It seems to just go round-robin without much heuristics involved.
>
> only for the timer interrupt and only because "people" didn't
> want to see it bound to a specific CPU. For all others
> there's quite some heuristics actually

Ah, this explains a lot. I was planning to try binding the timer to a
CPU or a node (as soon as I get a system for testing).

>
> > We are seeing loss of timer interrupts on our systems - and the more
> > processors the more noticeable it is, but it starts even on 8x
> > partitions; on 48x system I see about 50% loss, on both ia32 and
> > x86_64 (haven't checked on ia64 yet). With say 16 threads it is
> > unsettling to see 70% overall idle time, and still only 40-50% of
> > interrupts go through. System's time is not affected, so the problem
> > is on the back burner for now :)
> > It's not clear yet whether this is software or hardware fault,
>
> I'd call it a hardware fault. But them I'm biased.

It is the main suspect for now, yes (I tend to be biased this way too :)
Those are NUMA machines that sometimes run as non-NUMA, and I still need
to sort out whether it happens in both cases or only one, and all the
aspects that may come into play.