2007-12-09 07:51:58

by Borislav Petkov

Subject: 2.6.24-rc4-mm1: acpi reboots machine

Hi Andrew,
Hi Len,

after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
fine) on my asus laptop, the machine reboots after claiming that
"Critical temperature reached (255 C)." However, the degrees number
hints at an all-ones 0xff field. I will try dump_stack in
acpi_thermal_critical() to check out the call path. For now, here's the netconsole bootlog:

[ 0.000000] Linux version 2.6.24-rc4-mm1 (boris@gollum) (gcc version 4.2.3 20071123 (prerelease) (Debian 4.2.2-4)) #7 SMP PREEMPT Sun Dec 9 08:27:26 CET 2007
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000001ff40000 (usable)
[ 0.000000] BIOS-e820: 000000001ff40000 - 000000001ff50000 (ACPI data)
[ 0.000000] BIOS-e820: 000000001ff50000 - 0000000020000000 (ACPI NVS)
[ 0.000000] 511MB LOWMEM available.
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] Normal 4096 -> 130880
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[1] active PFN ranges
[ 0.000000] 0: 0 -> 130880
[ 0.000000] DMI 2.3 present.
[ 0.000000] ACPI: RSDP 000F5DF0, 0014 (r0 ACPIAM)
[ 0.000000] ACPI: RSDT 1FF40000, 002C (r1 A M I OEMRSDT 6000423 MSFT 97)
[ 0.000000] ACPI: FACP 1FF40200, 0081 (r1 A M I OEMFACP 6000423 MSFT 97)
[ 0.000000] ACPI: DSDT 1FF40400, 628D (r1 1ABSP 1ABSP001 1 MSFT 2000001)
[ 0.000000] ACPI: FACS 1FF50000, 0040
[ 0.000000] ACPI: OEMB 1FF50040, 0053 (r1 A M I OEMBIOS 6000423 MSFT 97)
[ 0.000000] ACPI: PM-Timer IO Port: 0x408
[ 0.000000] Allocating PCI resources starting at 30000000 (gap: 20000000:e0000000)
[ 0.000000] swsusp: Registered nosave memory region: 000000000009f000 - 00000000000a0000
[ 0.000000] swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000e0000
[ 0.000000] swsusp: Registered nosave memory region: 00000000000e0000 - 0000000000100000
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129475
[ 0.000000] Kernel command line: root=/dev/hda1 vga=0 nmi_watchdog=1 [email protected]/,@192.168.45.26/
[ 0.000000] Found and enabled local APIC!
[ 0.000000] Enabling fast FPU save and restore... done.
[ 0.000000] Enabling unmasked SIMD FPU exception support... done.
[ 0.000000] Initializing CPU#0
[ 0.000000] CPU 0 irqstacks, hard=c0451000 soft=c0449000
[ 0.000000] PID hash table entries: 2048 (order: 11, 8192 bytes)
[ 0.000000] Detected 1500.114 MHz processor.
[ 50.138075] Console: colour VGA+ 80x25
[ 50.138080] console [tty0] enabled
[ 50.140479] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 50.140882] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 50.160065] Memory: 513364k/523520k available (2049k kernel code, 9712k reserved, 1113k data, 172k init, 0k highmem)
[ 50.160147] virtual kernel memory layout:
[ 50.160148] fixmap : 0xfffb5000 - 0xfffff000 ( 296 kB)
[ 50.160150] vmalloc : 0xe0800000 - 0xfffb3000 ( 503 MB)
[ 50.160151] lowmem : 0xc0000000 - 0xdff40000 ( 511 MB)
[ 50.160153] .init : 0xc041b000 - 0xc0446000 ( 172 kB)
[ 50.160154] .data : 0xc030067f - 0xc0416ca8 (1113 kB)
[ 50.160156] .text : 0xc0100000 - 0xc030067f (2049 kB)
[ 50.160549] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[ 50.160705] SLUB: Genslabs=11, HWalign=64, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
[ 50.220728] Calibrating delay using timer specific routine.. 3003.73 BogoMIPS (lpj=1501865)
[ 50.220857] Security Framework initialized
[ 50.220934] Mount-cache hash table entries: 512
[ 50.221174] CPU: L1 I cache: 32K, L1 D cache: 32K
[ 50.221273] CPU: L2 cache: 1024K
[ 50.221338] Intel machine check architecture supported.
[ 50.221398] Intel machine check reporting enabled on CPU#0.
[ 50.221459] Compat vDSO mapped to ffffe000.
[ 50.221524] Checking 'hlt' instruction... OK.
[ 50.225022] SMP alternatives: switching to UP code
[ 50.225766] Freeing SMP alternatives: 11k freed
[ 50.225823] ACPI: Core revision 20070126
[ 50.229623] ACPI: setting ELCR to 0200 (from 0c30)
[ 50.734915] CPU0: Intel(R) Pentium(R) M processor 1500MHz stepping 05
[ 50.735059] SMP motherboard not detected.
[ 50.836119] Brought up 1 CPUs
[ 50.836305] khelper used greatest stack depth: 3352 bytes left
[ 50.836463] net_namespace: 108 bytes
[ 50.837167] NET: Registered protocol family 16
[ 50.837466] ACPI: bus type pci registered
[ 50.838812] PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=2
[ 50.838872] PCI: Using configuration type 1
[ 50.838928] Setting up standard PCI resources
[ 50.850451] khelper used greatest stack depth: 3280 bytes left
[ 50.851263] khelper used greatest stack depth: 3112 bytes left
[ 50.857186] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 50.858465] ACPI: Interpreter enabled
[ 50.858524] ACPI: (supports S0 S3 S4 S5)
[ 50.858760] ACPI: Using PIC for interrupt routing
[ 50.868377] ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62
[ 50.868445] ACPI: EC: driver started in interrupt mode
[ 50.868555] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 50.869056] PCI: Enabled i801 SMBus device
[ 50.869117] PCI quirk: region 0400-047f claimed by ICH4 ACPI/GPIO/TCO
[ 50.869179] PCI quirk: region 0500-053f claimed by ICH4 GPIO
[ 50.869806] PCI: Transparent bridge - 0000:00:1e.0
[ 50.873140] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12)
[ 50.873851] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *11 12)
[ 50.874535] ACPI: PCI Interrupt Link [LNKC] (IRQs *4 10 12)
[ 50.875034] ACPI: PCI Interrupt Link [LNKD] (IRQs *5 6 10)
[ 50.875537] ACPI: PCI Interrupt Link [LNKE] (IRQs 6 11) *0, disabled.
[ 50.876080] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 7) *0, disabled.
[ 50.876623] ACPI: PCI Interrupt Link [LNKG] (IRQs 4 7) *0, disabled.
[ 50.877169] ACPI: PCI Interrupt Link [LNKH] (IRQs 4 6 *10 12)
[ 50.877640] Linux Plug and Play Support v0.97 (c) Adam Belay
[ 50.877738] pnp: PnP ACPI init
[ 50.877803] ACPI: bus type pnp registered
[ 50.883974] pnp: PnP ACPI: found 15 devices
[ 50.884035] ACPI: ACPI bus type pnp unregistered
[ 50.884094] PnPBIOS: Disabled by ACPI PNP
[ 50.884329] usbcore: registered new interface driver usbfs
[ 50.884441] usbcore: registered new interface driver hub
[ 50.884552] usbcore: registered new device driver usb
[ 50.884777] PCI: Using ACPI for IRQ routing
[ 50.884834] PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
[ 50.884927] PCI: Cannot allocate resource region 4 of device 0000:00:1f.3
[ 50.893004] Time: tsc clocksource has been installed.
[ 50.895050] system 00:09: ioport range 0x480-0x48f has been reserved
[ 50.895112] system 00:09: ioport range 0x5c0-0x5cf has been reserved
[ 50.895176] system 00:0a: ioport range 0x4d0-0x4d1 has been reserved
[ 50.895242] system 00:0c: iomem range 0xffc00000-0xffefffff has been reserved
[ 50.895309] system 00:0d: ioport range 0x540-0x55f has been reserved
[ 50.895370] system 00:0d: ioport range 0x400-0x47f has been reserved
[ 50.895435] system 00:0d: ioport range 0x480-0x48f has been reserved
[ 50.895496] system 00:0d: ioport range 0x500-0x53f has been reserved
[ 50.895557] system 00:0d: ioport range 0x4c0-0x4cf has been reserved
[ 50.895617] system 00:0d: iomem range 0xfec00000-0xfec00fff has been reserved
[ 50.895680] system 00:0d: iomem range 0xfee00000-0xfee00fff has been reserved
[ 50.895746] system 00:0e: iomem range 0x0-0x9ffff could not be reserved
[ 50.895807] system 00:0e: iomem range 0xc0000-0xdffff could not be reserved
[ 50.895869] system 00:0e: iomem range 0xe0000-0xfffff could not be reserved
[ 50.895931] system 00:0e: iomem range 0x100000-0x1fffffff could not be reserved
[ 50.896004] system 00:0e: iomem range 0x0-0x0 could not be reserved
[ 50.926541] PCI: Bridge: 0000:00:01.0
[ 50.926598] IO window: d000-dfff
[ 50.926655] MEM window: ff800000-ff8fffff
[ 50.926713] PREFETCH window: ce900000-de9fffff
[ 50.926786] PCI: Bus 3, cardbus bridge: 0000:02:01.0
[ 50.926843] IO window: 00001400-000014ff
[ 50.926901] IO window: 00001800-000018ff
[ 50.926960] PREFETCH window: 34000000-37ffffff
[ 50.927020] MEM window: 38000000-3bffffff
[ 50.927079] PCI: Bus 7, cardbus bridge: 0000:02:01.1
[ 50.927136] IO window: 00001c00-00001cff
[ 50.927194] IO window: 00002000-000020ff
[ 50.927252] PREFETCH window: 3c000000-3fffffff
[ 50.927313] MEM window: 40000000-43ffffff
[ 50.927375] PCI: Bridge: 0000:00:1e.0
[ 50.927430] IO window: disabled.
[ 50.927488] MEM window: ff900000-ff9fffff
[ 50.927547] PREFETCH window: dea00000-deafffff
[ 50.927631] PCI: Enabling device 0000:02:01.0 (0000 -> 0003)
[ 50.927999] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[ 50.928064] ACPI: PCI Interrupt 0000:02:01.0[A] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
[ 50.928527] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
[ 50.928587] ACPI: PCI Interrupt 0000:02:01.1[B] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[ 50.928756] NET: Registered protocol family 2
[ 50.937028] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
[ 50.937394] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[ 50.937603] TCP bind hash table entries: 16384 (order: 6, 393216 bytes)
[ 50.938109] TCP: Hash tables configured (established 16384 bind 16384)
[ 50.938171] TCP reno registered
[ 50.943908] khelper used greatest stack depth: 3100 bytes left
[ 50.945348] Installing knfsd (copyright (C) 1996 [email protected]).
[ 50.945635] io scheduler noop registered (default)
[ 50.946705] ACPI: AC Adapter [AC] (on-line)
[ 50.947312] ACPI: Battery Slot [BAT0] (battery absent)
[ 50.947441] ACPI: Battery Slot [BAT1] (battery absent)
[ 50.947808] input: Power Button (FF) as /class/input/input0
[ 50.947869] ACPI: Power Button (FF) [PWRF]
[ 50.948032] input: Lid Switch as /class/input/input1
[ 50.948146] ACPI: Lid Switch [LID]
[ 50.948294] input: Sleep Button (CM) as /class/input/input2
[ 50.948359] ACPI: Sleep Button (CM) [SLPB]
[ 50.948511] input: Power Button (CM) as /class/input/input3
[ 50.948571] ACPI: Power Button (CM) [PWRB]
[ 50.948749] ACPI: Invalid PBLK length [7]
[ 50.949640] ACPI: CPU0 (power states: C1[C1] C2[C2])
[ 50.962860] ACPI: Critical trip point
[ 50.962918] Critical temperature reached (255 C), shutting down.
[ 50.962988] ACPI: Thermal Zone [THRM] (255 C)
[ 50.963137] Asus Laptop ACPI Extras version 0.30
[ 50.963335] M6N model detected, supported
[ 50.964168] isapnp: Scanning for PnP cards...
[ 51.320980] isapnp: No Plug & Play device found
[ 51.358289] Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
[ 51.358586] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a NS16550A
[ 51.359209] ACPI: PCI Interrupt 0000:00:1f.6[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
[ 51.359375] ACPI: PCI interrupt for device 0000:00:1f.6 disabled
[ 51.359565] tg3.c:v3.86 (November 9, 2007)
[ 51.359635] ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
[ 51.388900] eth0: Tigon3 [partno(BCM95788A50) rev 3003 PHY(5705)] (PCI:33MHz:32-bit) 10/100/1000Base-T Ethernet 00:11:2f:00:71:33
[ 51.389234] eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1]
[ 51.389304] eth0: dma_rwctrl[763f0000] dma_mask[32-bit]
[ 51.389386] netconsole: local port 6665
[ 51.389442] netconsole: local IP 192.168.45.67
[ 51.389500] netconsole: interface eth0
[ 51.389555] netconsole: remote port 6666
[ 51.389611] netconsole: remote IP 192.168.45.26
[ 51.389669] netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[ 51.389729] netconsole: device eth0 not up yet, forcing it
[ 53.278349] tg3: eth0: Link is up at 100 Mbps, full duplex.
[ 53.278409] tg3: eth0: Flow control is on for TX and on for RX.
[ 53.281188] console [netcon0] enabled
[ 53.293588] netconsole: network logging started
[ 53.293647] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 53.293720] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 53.293894] ICH4: IDE controller (0x8086:0x24ca rev 0x03) at PCI slot 0000:00:1f.1
[ 53.293971] PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
[ 53.294419] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 4
[ 53.294486] ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 4 (level, low) -> IRQ 4
[ 53.294670] ICH4: not 100% native mode: will probe irqs later
[ 53.294750] ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
[ 53.294924] ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
[ 54.169666] hda: IC25N060ATMR04-0, ATA DISK drive
[ 55.503664] hdc: TOSHIBA ODD-DVD SD-R6372, ATAPI CD/DVD-ROM drive
[ 55.504886] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 55.527065] ide1 at 0x170-0x177,0x376 on irq 15
[ 55.538103] hda: max request size: 512KiB
[ 55.538432] hda: 117210240 sectors (60011 MB) w/7884KiB Cache, CHS=16383/255/63, UDMA(100)
[ 55.539013] hda: cache flushes supported
[ 55.539119] hda: hda1 hda2 hda3
[ 55.553931] hdc: ATAPI 24X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33)
[ 55.554381] Uniform CD-ROM driver Revision: 3.20
[ 55.560575] ACPI: PCI Interrupt Link [LNKH] enabled at IRQ 10
[ 55.560642] ACPI: PCI Interrupt 0000:00:1d.7[D] -> Link [LNKH] -> GSI 10 (level, low) -> IRQ 10
[ 55.560836] ehci_hcd 0000:00:1d.7: EHCI Host Controller
[ 55.560974] ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
[ 55.564994] ehci_hcd 0000:00:1d.7: debug port 1
[ 55.565074] ehci_hcd 0000:00:1d.7: irq 10, io mem 0xffaffc00
[ 55.574449] ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
[ 55.574763] usb usb1: configuration #1 chosen from 1 choice
[ 55.574888] hub 1-0:1.0: USB hub found
[ 55.574951] hub 1-0:1.0: 6 ports detected
[ 55.675460] usb usb1: New USB device found, idVendor=0000, idProduct=0000
[ 55.675523] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 55.675606] usb usb1: Product: EHCI Host Controller
[ 55.675664] usb usb1: Manufacturer: Linux 2.6.24-rc4-mm1 ehci_hcd
[ 55.675735] usb usb1: SerialNumber: 0000:00:1d.7
[ 55.830303] usbcore: registered new interface driver usbserial
[ 55.830365] drivers/usb/serial/usb-serial.c: USB Serial Driver core
[ 55.830476] drivers/usb/serial/usb-serial.c: USB Serial support registered for Keyspan - (without firmware)
[ 55.830595] drivers/usb/serial/usb-serial.c: USB Serial support registered for Keyspan 1 port adapter
[ 55.830714] drivers/usb/serial/usb-serial.c: USB Serial support registered for Keyspan 2 port adapter
[ 55.830835] drivers/usb/serial/usb-serial.c: USB Serial support registered for Keyspan 4 port adapter
[ 55.830961] usbcore: registered new interface driver keyspan
[ 55.831022] drivers/usb/serial/keyspan.c: v1.1.5:Keyspan USB to Serial Converter Driver
[ 55.831212] PNP: PS/2 Controller [PNP0303:PS2K,PNP0f12:PS2M] at 0x60,0x64 irq 1,12
[ 55.833451] i8042.c: Detected active multiplexing controller, rev 1.1.
[ 55.835365] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 55.835428] serio: i8042 AUX0 port at 0x60,0x64 irq 12
[ 55.835498] serio: i8042 AUX1 port at 0x60,0x64 irq 12
[ 55.835559] serio: i8042 AUX2 port at 0x60,0x64 irq 12
[ 55.835629] serio: i8042 AUX3 port at 0x60,0x64 irq 12
[ 55.835866] mice: PS/2 mouse device common for all mice
[ 55.836284] cpuidle: using governor ladder
[ 55.951618] input: AT Translated Set 2 keyboard as /class/input/input4
[ 56.836635] cpuidle: using governor menu
[ 56.836763] Advanced Linux Sound Architecture Driver Version 1.0.15 (Tue Nov 20 19:16:42 2007 UTC).
[ 56.836955] ACPI: PCI Interrupt 0000:00:1f.5[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
[ 56.838502] Marking TSC unstable due to: TSC halts in idle.
[ 56.838572] Time: acpi_pm clocksource has been installed.
[ 57.646804] intel8x0_measure_ac97_clock: measured 50509 usecs
[ 57.646867] intel8x0: clocking to 48000
[ 57.647364] ALSA device list:
[ 57.647421] #0: Intel 82801DB-ICH4 with STAC9750,51 at irq 11
[ 57.647552] TCP cubic registered
[ 57.647648] NET: Registered protocol family 1
[ 57.647738] NET: Registered protocol family 17
[ 57.647988] RPC: Registered udp transport module.
[ 57.648048] RPC: Registered tcp transport module.
[ 57.648200] Testing NMI watchdog ... OK.
[ 57.668313] Using IPI No-Shortcut mode
[ 57.733562] kjournald starting. Commit interval 5 seconds
[ 57.733643] EXT3-fs: mounted filesystem with ordered data mode.
[ 57.733729] VFS: Mounted root (ext3 filesystem) readonly.
[ 57.733969] Freeing unused kernel memory: 172k freed
[ 57.800235] khelper used greatest stack depth: 2648 bytes left
[ 58.980087] stty used greatest stack depth: 2640 bytes left
[ 59.600742] mount used greatest stack depth: 2140 bytes left
[ 60.959463] ACPI: Critical trip point
[ 60.959542] Critical temperature reached (255 C), shutting down.
[ 62.925991] Real Time Clock Driver v1.12ac
[ 63.082557] Linux agpgart interface v0.103
[ 63.128504] agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
[ 63.128508] on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
[ 63.128718] agpgart: Detected an Intel 855PM Chipset.
[ 63.143091] agpgart: AGP aperture is 256M @ 0xe0000000
[ 63.166109] USB Universal Host Controller Interface driver v3.0
[ 63.166282] ACPI: PCI Interrupt 0000:00:1d.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[ 63.166490] uhci_hcd 0000:00:1d.0: UHCI Host Controller
[ 63.166607] uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
[ 63.167403] uhci_hcd 0000:00:1d.0: irq 11, io base 0x0000e800
[ 63.167652] usb usb2: configuration #1 chosen from 1 choice
[ 63.167757] hub 2-0:1.0: USB hub found
[ 63.167820] hub 2-0:1.0: 2 ports detected
[ 63.237339] Shutdown: hda
[ 63.268462] usb usb2: New USB device found, idVendor=0000, idProduct=0000
[ 63.268537] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 63.268629] usb usb2: Product: UHCI Host Controller
[ 63.268687] usb usb2: Manufacturer: Linux 2.6.24-rc4-mm1 uhci_hcd
[ 63.268760] usb usb2: SerialNumber: 0000:00:1d.0
[ 63.269225] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 5
[ 63.269292] ACPI: PCI Interrupt 0000:00:1d.1[B] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5
[ 63.269496] uhci_hcd 0000:00:1d.1: UHCI Host Controller
[ 63.269603] uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
[ 63.269709] uhci_hcd 0000:00:1d.1: irq 5, io base 0x0000e880
[ 63.269944] usb usb3: configuration #1 chosen from 1 choice
[ 63.270049] hub 3-0:1.0: USB hub found
[ 63.270111] hub 3-0:1.0: 2 ports detected
[ 63.370377] usb usb3: New USB device found, idVendor=0000, idProduct=0000
[ 63.370450] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 63.370536] usb usb3: Product: UHCI Host Controller
[ 63.370595] usb usb3: Manufacturer: Linux 2.6.24-rc4-mm1 uhci_hcd
[ 63.370667] usb usb3: SerialNumber: 0000:00:1d.1
[ 63.370772] ACPI: PCI Interrupt 0000:00:1d.2[C] -> Link [LNKC] -> GSI 4 (level, low) -> IRQ 4
[ 63.370967] uhci_hcd 0000:00:1d.2: UHCI Host Controller
[ 63.371072] uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
[ 63.371177] uhci_hcd 0000:00:1d.2: irq 4, io base 0x0000ec00
[ 63.371413] usb usb4: configuration #1 chosen from 1 choice
[ 63.371517] hub 4-0:1.0: USB hub found
[ 63.371579] hub 4-0:1.0: 2 ports detected
[ 63.472261] usb usb4: New USB device found, idVendor=0000, idProduct=0000
[ 63.472325] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 63.472409] usb usb4: Product: UHCI Host Controller
[ 63.472467] usb usb4: Manufacturer: Linux 2.6.24-rc4-mm1 uhci_hcd
[ 63.472540] usb usb4: SerialNumber: 0000:00:1d.2
[ 63.475203] usb 2-2: new low speed USB device using uhci_hcd and address 2
[ 63.639154] usb 2-2: configuration #1 chosen from 1 choice
[ 63.642112] usb 2-2: New USB device found, idVendor=046d, idProduct=c016
[ 63.642176] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 63.642253] usb 2-2: Product: Optical USB Mouse
[ 63.642311] usb 2-2: Manufacturer: Logitech
[ 63.709809] Disabling non-boot CPUs ...
[ 63.709884] Power down.
[ 63.709944] acpi_power_off called
--
Regards/Gruß,
Boris.


Attachments:
(No filename) (20.49 kB)
config-2.6.24-rc4-mm1.bz2 (8.82 kB)

2007-12-09 09:21:17

by Borislav Petkov

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine

On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote:
> Hi Andrew,
> Hi Len,
>
> after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
> fine) on my asus laptop, the machine reboots after claiming that
> "Critical temperature reached (255 C)." However, the degrees number
> is kinda hinting at 0xff all-ones field. Will try dump_stack in
> acpi_thermal_critical() to checkout the call path. For now here's the netconsole bootlog:

Here's what i got so far:

[ 50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
[ 50.287999] [<c0104b65>] show_trace_log_lvl+0x12/0x25
[ 50.288103] [<c01053e7>] show_trace+0xd/0x10
[ 50.288202] [<c0105a6c>] dump_stack+0x57/0x5f
[ 50.288303] [<c021c991>] acpi_thermal_check+0x150/0x3bb
[ 50.288415] [<c021d4b3>] acpi_thermal_add+0x261/0x2cf
[ 50.288515] [<c0213549>] acpi_device_probe+0x3e/0xdb
[ 50.288615] [<c023f8f5>] driver_probe_device+0xaf/0x12a
[ 50.288717] [<c023fa88>] __driver_attach+0x6c/0xa5
[ 50.288817] [<c023ee5a>] bus_for_each_dev+0x3e/0x60
[ 50.288916] [<c023f77d>] driver_attach+0x14/0x16
[ 50.289015] [<c023f5a6>] bus_add_driver+0xa6/0x1a8
[ 50.289114] [<c023fc53>] driver_register+0x42/0x47
[ 50.289214] [<c02138c2>] acpi_bus_register_driver+0x3a/0x3c
[ 50.289316] [<c044306b>] acpi_thermal_init+0x57/0x76
[ 50.289424] [<c04344a7>] kernel_init+0x138/0x280
[ 50.289525] [<c01047df>] kernel_thread_helper+0x7/0x10
[ 50.289625] =======================
[ 50.289680] ACPI: Critical trip point
[ 50.289736] Critical temperature reached (255 C), shutting down.

So in acpi_thermal_get_temperature(), called from acpi_thermal_add(), the
tz->temperature field is not set properly (printks added):

[ 50.276607] Old temp: 4294967023
[ 50.281890] Got temp: 255
[ 50.282567] Old temp: 255
[ 50.287882] Got temp: 255

What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and
there's still garbage in it after reading it in acpi_thermal_get_temperature()
for the first time. Debugging continues...
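
A side note on the "Old temp" value, as a sketch under assumptions: the added printk is not shown, so the following assumes it converts tz->temperature from tenths of a kelvin to whole degrees Celsius (rounded) and prints the result as an unsigned 32-bit number. Under that assumption, a field still zeroed by kzalloc would print as 4294967023, which is -273 C, rather than garbage. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed conversion: tenths of a kelvin to whole degrees Celsius,
 * rounded to nearest, in 32-bit signed arithmetic as on i386. */
static int32_t decikelvin_to_celsius(int32_t dk)
{
    int32_t d = dk - 2732;              /* 273.2 K corresponds to 0 C */

    return d >= 0 ? (d + 5) / 10 : (d - 5) / 10;
}

/* What an unsigned 32-bit print of that Celsius value would show. */
static uint32_t shown_as_unsigned(int32_t celsius)
{
    return (uint32_t)celsius;
}

/* Worked examples matching the printk output quoted above. */
static void worked_examples(void)
{
    /* A zeroed field is 0 tenths of a kelvin, i.e. about -273 C ... */
    assert(decikelvin_to_celsius(0) == -273);
    /* ... which wraps to 2^32 - 273 when printed unsigned. */
    assert(shown_as_unsigned(decikelvin_to_celsius(0)) == 4294967023u);
    /* A reading of 5282 tenths of a kelvin is displayed as 255 C. */
    assert(decikelvin_to_celsius(5282) == 255);
}
```

Under the same assumption, any reading from 5277 to 5286 tenths of a kelvin is displayed as 255 C.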
--
Regards/Gruß,
Boris.

2007-12-11 17:46:19

by Borislav Petkov

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote:
> Here's what i got so far:
>
> [ 50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
> [ 50.287999] [<c0104b65>] show_trace_log_lvl+0x12/0x25
> [ 50.288103] [<c01053e7>] show_trace+0xd/0x10
> [ 50.288202] [<c0105a6c>] dump_stack+0x57/0x5f
> [ 50.288303] [<c021c991>] acpi_thermal_check+0x150/0x3bb
> [ 50.288415] [<c021d4b3>] acpi_thermal_add+0x261/0x2cf
> [ 50.288515] [<c0213549>] acpi_device_probe+0x3e/0xdb
> [ 50.288615] [<c023f8f5>] driver_probe_device+0xaf/0x12a
> [ 50.288717] [<c023fa88>] __driver_attach+0x6c/0xa5
> [ 50.288817] [<c023ee5a>] bus_for_each_dev+0x3e/0x60
> [ 50.288916] [<c023f77d>] driver_attach+0x14/0x16
> [ 50.289015] [<c023f5a6>] bus_add_driver+0xa6/0x1a8
> [ 50.289114] [<c023fc53>] driver_register+0x42/0x47
> [ 50.289214] [<c02138c2>] acpi_bus_register_driver+0x3a/0x3c
> [ 50.289316] [<c044306b>] acpi_thermal_init+0x57/0x76
> [ 50.289424] [<c04344a7>] kernel_init+0x138/0x280
> [ 50.289525] [<c01047df>] kernel_thread_helper+0x7/0x10
> [ 50.289625] =======================
> [ 50.289680] ACPI: Critical trip point
> [ 50.289736] Critical temperature reached (255 C), shutting down.
>
> so in acpi_thermal_get_temperature() called in acpi_thermal_add() the
> tz->temperature thingy is not set properly (printk's added):
>
> [ 50.276607] Old temp: 4294967023
> [ 50.281890] Got temp: 255
> [ 50.282567] Old temp: 255
> [ 50.287882] Got temp: 255
>
> What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and
> there's still garbage in it after reading it in acpi_thermal_get_temperature()
> for the first time. Debugging continues...

(I almost suspected that the problem might be something completely different.)
Well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer
turned out to be

broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch.

After backing this one out, mm1 boots just fine here.
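
The bisection over the broken-out series can be pictured as a binary search on how many patches are applied. This is a minimal sketch, not the procedure actually used: it assumes the series is ordered, that the tree with no patches applied is good, that the full series is bad, and that the bug stays present once the culprit is applied. The callback type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns nonzero if a kernel built with the
 * first `applied` patches of the series shows the bug. */
typedef int (*is_bad_fn)(size_t applied, void *ctx);

/* Returns the 1-based position of the first patch that makes the tree bad.
 * Invariant: `good` patches applied is known good, `bad` is known bad. */
static size_t first_bad_patch(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0, bad = n;

    assert(n > 0);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;      /* culprit is at or before mid */
        else
            good = mid;     /* culprit is after mid */
    }
    return bad;
}

/* Demo predicate: the culprit position is passed in through ctx. */
static int bad_from(size_t applied, void *ctx)
{
    return applied >= *(const size_t *)ctx;
}
```

Each call to the predicate stands for one build-and-boot cycle, so a series of n patches needs roughly log2(n) test boots.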
--
Regards/Gruß,
Boris.

2007-12-11 20:01:15

by Bjorn Helgaas

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tuesday 11 December 2007 10:44:43 am Borislav Petkov wrote:
> (i almost suspected that the problem might be something completely different.)
> well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer
> turned out to be
>
> broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch.
>
> After backing this one out, mm1 boots just fine here.

Thanks for tracking this down. I'll look into your logs and see if I
can figure out what's going on. There's another report related to that
patch here: http://lkml.org/lkml/2007/11/22/110 . Looks like a different
symptom though, so probably a different fix.

Bjorn

2007-12-11 21:29:37

by Borislav Petkov

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tue, Dec 11, 2007 at 01:00:24PM -0700, Bjorn Helgaas wrote:
> Thanks for tracking this down. I'll look into your logs and see if I
> can figure out what's going on. There's another report related to that
> patch here: http://lkml.org/lkml/2007/11/22/110 . Looks like a different
> symptom though, so probably a different fix.

From what I can roughly tell so far, it looks like a resource conflict between acpi and
the pnp requested regions in your patch, which results in the acpi_thermal code
reading the wrong (0xff) temperature value and halting the machine, but I might be
wrong on the details since acpi is such a big code chunk to swallow. Anyway, this is a
different issue than the one you quote above.

--
Regards/Gruß,
Boris.

2007-12-12 00:09:18

by Bjorn Helgaas

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> From what i can roughly tell so far it seems like an resource conflict between acpi and
> the pnp requested regions in your patch which result in the acpi_thermal code
> to read the wrong (0xff) temperature value and halt the machine, but i might be
> wrong on the details since acpi is such a big code chunk to swallow.

I don't see any obvious conflict from the log you posted. For the sake
of comparison, can you post the corresponding dmesg log after you removed
the patch?

acpi_thermal_get_temperature() only evaluates _TMP, which isn't very
interesting. I wonder if there's some conflict between that AML method
and the EC driver or something.

If you can also collect the DSDT, maybe I can poke around in there and
see what _TMP is really doing.

Thanks,
Bjorn

2007-12-12 10:13:12

by Borislav Petkov

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > From what i can roughly tell so far it seems like an resource conflict between acpi and
> > the pnp requested regions in your patch which result in the acpi_thermal code
> > to read the wrong (0xff) temperature value and halt the machine, but i might be
> > wrong on the details since acpi is such a big code chunk to swallow.
>
> I don't see any obvious conflict from the log you posted. For the sake
> of comparison, can you post the corresponding dmesg log after you removed
> the patch?

The only difference I see is that ACPI finds the EC in the DSDT on the working kernel,
while in the broken case something silently fails. Please find attached the two bootlogs
and a disassembled DSDT.

--
Regards/Gruß,
Boris.


Attachments:
bootlogs.tar.bz2 (8.67 kB)
dsdt.dsl.bz2 (13.79 kB)

2007-12-12 10:37:32

by Alexey Starikovskiy

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

Borislav Petkov wrote:
> On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
>
>> On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
>>
>>> From what i can roughly tell so far it seems like an resource conflict between acpi and
>>> the pnp requested regions in your patch which result in the acpi_thermal code
>>> to read the wrong (0xff) temperature value and halt the machine, but i might be
>>> wrong on the details since acpi is such a big code chunk to swallow.
>>>
>> I don't see any obvious conflict from the log you posted. For the sake
>> of comparison, can you post the corresponding dmesg log after you removed
>> the patch?
>>
>
> The only difference i see is that ACPI finds EC in DSDT in the working kernel
> and in the broken case something silently fails. Please find attached the 2 bootlogs
> and a disassembled DSDT.
>
>
This seems to be the start of trouble...
PCI: Cannot allocate resource region 4 of device 0000:00:1f.3

2007-12-12 16:21:57

by Bjorn Helgaas

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > From what i can roughly tell so far it seems like an resource conflict between acpi and
> > > the pnp requested regions in your patch which result in the acpi_thermal code
> > > to read the wrong (0xff) temperature value and halt the machine, but i might be
> > > wrong on the details since acpi is such a big code chunk to swallow.
> >
> > I don't see any obvious conflict from the log you posted. For the sake
> > of comparison, can you post the corresponding dmesg log after you removed
> > the patch?
>
> The only difference i see is that ACPI finds EC in DSDT in the working kernel
> and in the broken case something silently fails. Please find attached the 2 bootlogs
> and a disassembled DSDT.

Thanks very much!

"ACPI: EC: Look up EC in DSDT" appears in the working log, but not
in the broken one. But I think we *do* find the EC in both cases,
because we see "ACPI: EC: non-query interrupt received" even before
acpi_ec_add() (which prints the "ACPI: EC: GPE = 0x1c, ..." line). Maybe
the logs were collected with different log levels?

I think Alexey is on the right track with the PCI resource allocation
failure. On your working kernel, can you collect this:

lspci -vv > lspci
cat /proc/ioports > ioports
cat /proc/iomem > iomem
grep . /sys/devices/pnp*/*/resources > pnp
tar -jcf resources.tar.bz2 lspci ioports iomem pnp

Bjorn

2007-12-13 07:22:49

by Borislav Petkov

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Wed, Dec 12, 2007 at 09:21:41AM -0700, Bjorn Helgaas wrote:
> On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> > On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > > From what i can roughly tell so far it seems like an resource conflict between acpi and
> > > > the pnp requested regions in your patch which result in the acpi_thermal code
> > > > to read the wrong (0xff) temperature value and halt the machine, but i might be
> > > > wrong on the details since acpi is such a big code chunk to swallow.
> > >
> > > I don't see any obvious conflict from the log you posted. For the sake
> > > of comparison, can you post the corresponding dmesg log after you removed
> > > the patch?
> >
> > The only difference i see is that ACPI finds EC in DSDT in the working kernel
> > and in the broken case something silently fails. Please find attached the 2 bootlogs
> > and a disassembled DSDT.
>
> Thanks very much!
>
> "ACPI: EC: Look up EC in DSDT" appears in the working log, but not
> in the broken one. But I think we *do* find the EC in both cases,
> because we see "ACPI: EC: non-query interrupt received" even before
> acpi_ec_add() (which prints the "ACPI: EC: GPE = 0x1c, ...". Maybe
> the logs were collected with different log levels?

Well, hm, actually no. The only difference is that the broken log was taken over
netconsole, so the lines might appear in a different order. I'll capture that
log again on the weekend to see whether something is missing.

> I think Alexey is on the right track with the PCI resource allocation
> failure.

Then it should be the SMBus controller, PCI device 00:1f.3, which is having problems
registering its I/O port region 4, AFAICT.

> On your working kernel, can you collect this:
>
> lspci -vv > lspci
> cat /proc/ioports > ioports
> cat /proc/iomem > iomem
> grep . /sys/devices/pnp*/*/resources > pnp
> tar -jcf resources.tar.bz2 lspci ioports iomem pnp

attached.

--
Regards/Gruß,
Boris.


Attachments:
resources.tar.bz2 (3.85 kB)

2007-12-13 16:17:33

by Bjorn Helgaas

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Thursday 13 December 2007 12:09:23 am Borislav Petkov wrote:
> On Wed, Dec 12, 2007 at 09:21:41AM -0700, Bjorn Helgaas wrote:
> > On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> > > On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > > > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > > > From what i can roughly tell so far it seems like an resource conflict between acpi and
> > > > > the pnp requested regions in your patch which result in the acpi_thermal code
> > > > > to read the wrong (0xff) temperature value and halt the machine, but i might be
> > > > > wrong on the details since acpi is such a big code chunk to swallow.
> > > >
> > I think Alexey is on the right track with the PCI resource allocation
> > failure.
>
> Then it should be the SMBus controller, PCI id 00:1f:3, which is having problems
> registering its io ports region 4, AFAICT.

Yes, it looks like the ioport region 0x540-0x55f is described both in
PNP and ACPI:

/sys/devices/pnp0/00:0d/resources:state = active
/sys/devices/pnp0/00:0d/resources:io 0x540-0x55f
/sys/devices/pnp0/00:0d/resources:io 0x400-0x47f

00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
Subsystem: ASUSTeK Computer Inc. Unknown device 1869
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin B routed to IRQ 0
Region 4: I/O ports at 0540 [size=32]
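
To make the collision concrete: PCI region 4 starts at 0x540 with size 32, so it ends at 0x55f, exactly the range the PNP device 00:0d already lists. The C sketch below is a toy first-come-first-served claim table, not the kernel's resource tree (which also allows nested child resources). All names in it are hypothetical. It only illustrates why the second claimant of an identical range gets refused.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive I/O port range, as printed in /proc/ioports. */
struct io_range {
    uint32_t start;
    uint32_t end;
};

/* Build an inclusive range from a base and a size, as lspci reports them. */
static struct io_range range_from_base_size(uint32_t base, uint32_t size)
{
    struct io_range r = { base, base + size - 1 };
    return r;
}

static bool ranges_overlap(struct io_range a, struct io_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

#define MAX_CLAIMS 8

/* Toy claim table: whoever asks first owns the range. */
struct claim_table {
    struct io_range claimed[MAX_CLAIMS];
    int count;
};

/* Returns 0 on success, -1 if the range overlaps an earlier claim or the table is full. */
static int claim(struct claim_table *t, struct io_range r)
{
    for (int i = 0; i < t->count; i++)
        if (ranges_overlap(t->claimed[i], r))
            return -1;
    if (t->count == MAX_CLAIMS)
        return -1;
    t->claimed[t->count++] = r;
    return 0;
}
```

In this model, claiming { 0x540, 0x55f } for PNP first and then range_from_base_size(0x540, 32) for the SMBus BAR makes the second claim() return -1, which mirrors the "Cannot allocate resource region 4" message.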

The PCI SMBus device was enabled by a quirk, asus_hides_smbus_lpc().

This quirk seems dangerous to me, and the comments above asus_hides_smbus
allude to problems similar to what you're seeing. It's obvious that a
lot of blood, sweat, and tears have gone into this quirk, so I'm not
suggesting that it's time to revert it, but I would be interested in
knowing whether the critical temperature problem goes away if we leave
the PCI device hidden, e.g., with the following patch:

Index: linux-mm/drivers/pci/quirks.c
===================================================================
--- linux-mm.orig/drivers/pci/quirks.c 2007-12-13 09:11:31.000000000 -0700
+++ linux-mm/drivers/pci/quirks.c 2007-12-13 09:12:27.000000000 -0700
@@ -1073,12 +1073,7 @@

 	pci_read_config_word(dev, 0xF2, &val);
 	if (val & 0x8) {
-		pci_write_config_word(dev, 0xF2, val & (~0x8));
-		pci_read_config_word(dev, 0xF2, &val);
-		if (val & 0x8)
-			printk(KERN_INFO "PCI: i801 SMBus device continues to play 'hide and seek'! 0x%x\n", val);
-		else
-			printk(KERN_INFO "PCI: Enabled i801 SMBus device\n");
+		printk(KERN_INFO "PCI: Leaving i801 SMBus device hidden\n");
 	}
 }
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801AA_0, asus_hides_smbus_lpc);
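
For readers who do not have quirks.c open, the logic the patch disables can be sketched in user space. The C below models the config word at offset 0xF2 as a plain variable. The struct, the write_sticks flag, and the function names are hypothetical stand-ins for illustration, not kernel API. Bit 3 set means the SMBus device is hidden, and the quirk clears it and reads back to check whether the write took effect.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMBUS_HIDE_BIT 0x8      /* bit 3 of the word at 0xF2, per the quirk */

/* Hypothetical stand-in for the LPC bridge's 16-bit config register. */
struct fake_lpc {
    uint16_t reg_f2;
    bool write_sticks;          /* false models hardware that ignores the write */
};

static uint16_t fake_read(const struct fake_lpc *d)
{
    return d->reg_f2;
}

static void fake_write(struct fake_lpc *d, uint16_t v)
{
    if (d->write_sticks)
        d->reg_f2 = v;
}

/* Mirrors the read, clear bit, read-back sequence; true if the device ends up visible. */
static bool unhide_smbus(struct fake_lpc *d)
{
    uint16_t val = fake_read(d);

    if (val & SMBUS_HIDE_BIT) {
        fake_write(d, val & (uint16_t)~SMBUS_HIDE_BIT);
        val = fake_read(d);
    }
    return !(val & SMBUS_HIDE_BIT);
}
```

The test patch skips the write entirely, so on this model the register keeps bit 3 set and the PCI SMBus function never appears to claim 0x540-0x55f.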

2007-12-13 21:32:36

by Borislav Petkov

Subject: Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Thu, Dec 13, 2007 at 09:17:18AM -0700, Bjorn Helgaas wrote:
> On Thursday 13 December 2007 12:09:23 am Borislav Petkov wrote:
> > On Wed, Dec 12, 2007 at 09:21:41AM -0700, Bjorn Helgaas wrote:
> > > On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> > > > On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > > > > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > > > > From what i can roughly tell so far it seems like an resource conflict between acpi and
> > > > > > the pnp requested regions in your patch which result in the acpi_thermal code
> > > > > > to read the wrong (0xff) temperature value and halt the machine, but i might be
> > > > > > wrong on the details since acpi is such a big code chunk to swallow.
> > > > >
> > > I think Alexey is on the right track with the PCI resource allocation
> > > failure.
> >
> > Then it should be the SMBus controller, PCI id 00:1f:3, which is having problems
> > registering its io ports region 4, AFAICT.
>
> Yes, it looks like the ioport region 0x540-0x55f is described both in
> PNP and ACPI:
>
> /sys/devices/pnp0/00:0d/resources:state = active
> /sys/devices/pnp0/00:0d/resources:io 0x540-0x55f
> /sys/devices/pnp0/00:0d/resources:io 0x400-0x47f
>
> 00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
> Subsystem: ASUSTeK Computer Inc. Unknown device 1869
> Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
> Interrupt: pin B routed to IRQ 0
> Region 4: I/O ports at 0540 [size=32]
>
> The PCI SMBus device was enabled by a quirk, asus_hides_smbus_lpc().
>
> This quirk seems dangerous to me, and the comments above asus_hides_smbus
> allude to problems similar to what you're seeing. It's obvious that a
> lot of blood, sweat, and tears have gone into this quirk, so I'm not
> suggesting that it's time to revert it, but I would be interested in
> knowing whether the critical temperature problem goes away if we leave
> the PCI device hidden, e.g., with the following patch:
>
> Index: linux-mm/drivers/pci/quirks.c
> ===================================================================
> --- linux-mm.orig/drivers/pci/quirks.c 2007-12-13 09:11:31.000000000 -0700
> +++ linux-mm/drivers/pci/quirks.c 2007-12-13 09:12:27.000000000 -0700
> @@ -1073,12 +1073,7 @@
>
> pci_read_config_word(dev, 0xF2, &val);
> if (val & 0x8) {
> - pci_write_config_word(dev, 0xF2, val & (~0x8));
> - pci_read_config_word(dev, 0xF2, &val);
> - if (val & 0x8)
> - printk(KERN_INFO "PCI: i801 SMBus device continues to play 'hide and seek'! 0x%x\n", val);
> - else
> - printk(KERN_INFO "PCI: Enabled i801 SMBus device\n");
> + printk(KERN_INFO "PCI: Leaving i801 SMBus device hidden\n");
> }
> }
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801AA_0, asus_hides_smbus_lpc);

Yep, this fixes it. Bootlog attached.

--
Regards/Gruß,
Boris.


Attachments:
bootlog-smbus-hidden.bz2 (8.80 kB)