Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752313AbZDMW2z (ORCPT ); Mon, 13 Apr 2009 18:28:55 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750966AbZDMW2l (ORCPT ); Mon, 13 Apr 2009 18:28:41 -0400
Received: from g4t0015.houston.hp.com ([15.201.24.18]:33160 "EHLO g4t0015.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750750AbZDMW2j (ORCPT ); Mon, 13 Apr 2009 18:28:39 -0400
From: Bjorn Helgaas
To: Alan Jenkins
Subject: Re: [BISECTED] EEE PC hangs when booting off battery
Date: Mon, 13 Apr 2009 16:28:34 -0600
User-Agent: KMail/1.9.10
Cc: linux-acpi@vger.kernel.org, "linux-kernel" , Kernel Testers List , Venkatesh Pallipadi
References: <49E065CF.6040408@tuffmail.co.uk> <200904131315.55519.bjorn.helgaas@hp.com> <49E3990C.6040303@tuffmail.co.uk>
In-Reply-To: <49E3990C.6040303@tuffmail.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200904131628.35407.bjorn.helgaas@hp.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 8664
Lines: 184

On Monday 13 April 2009 01:57:00 pm Alan Jenkins wrote:
> Bjorn Helgaas wrote:
> > On Sunday 12 April 2009 07:11:57 am Alan Jenkins wrote:
> >
> > You mention that this occurs when booting off battery.  So I
> > assume everything works fine when the EEE is plugged in to the
> > wall socket?
>
> When I tested it before, that was what I found.
>
> However, I now find that's not quite right.  It only works (i.e. doesn't
> hang) if I remove the battery as well as plugging it into the wall.  If
> I have the battery in, it hangs.
>
> >>>>> Magic SysRQ keys work though. ...
> >>>>>
> >>> I was able to run SysRq-P, and found the following backtrace -
> >>>
> >>> Pid: 0
> >>> EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor]
> >
> > Can you figure out where this is in acpi_idle_enter_bm() or
> > maybe just email me your processor.ko module?
> >
> > Does it always happen at the same point?
>
> Yes, it always happens at the same point.
>
> It turns out I can read the runes, but I don't understand what they're
> saying :-).

I'm not much good with x86 assembly either :-)  I think that in both
cases below, you're right after enabling interrupts and about to exit
the idle routine.  My guess is the system is not really hung; it just
doesn't think it has anything to do and is spending all its time in
the idle loop.

> 00001bd0 :
>
> ...
> 00001bd0 + 0x1df = 00001daf
> ...
>     1d70:  b8 03 00 00 00        mov    $0x3,%eax
>     1d75:  e8 90 f3 ff ff        call   110a
>     1d7a:  85 c0                 test   %eax,%eax
>     1d7c:  74 0a                 je     1d88
>     1d7e:  b8 0e 09 00 00        mov    $0x90e,%eax
>                 1d7f: R_386_32 .rodata.str1.1
>     1d83:  e8 fc ff ff ff        call   1d84
>                 1d84: R_386_PC32 mark_tsc_unstable
>     1d88:  8b 45 e8              mov    -0x18(%ebp),%eax
>     1d8b:  8b 55 ec              mov    -0x14(%ebp),%edx
>     1d8e:  e8 ab fd ff ff        call   1b3e
>     1d93:  89 c3                 mov    %eax,%ebx
>     1d95:  b8 17 01 00 00        mov    $0x117,%eax
>     1d9a:  69 ca 17 01 00 00     imul   $0x117,%edx,%ecx
>     1da0:  89 d6                 mov    %edx,%esi
>     1da2:  f7 e3                 mul    %ebx
>     1da4:  8d 14 11              lea    (%ecx,%edx,1),%edx
>     1da7:  e8 fc ff ff ff        call   1da8
>                 1da8: R_386_PC32 sched_clock_idle_wakeup_event
>     1dac:  fb                    sti
>     1dad:  89 e0                 mov    %esp,%eax
> ->  1daf:  31 c9                 xor    %ecx,%ecx   <---------
>     1db1:  25 00 e0 ff ff        and    $0xffffe000,%eax
>     1db6:  89 fa                 mov    %edi,%edx
>     1db8:  83 48 0c 04           orl    $0x4,0xc(%eax)
>     1dbc:  ff 47 18              incl   0x18(%edi)
>     1dbf:  8b 45 e4              mov    -0x1c(%ebp),%eax
>     1dc2:  e8 a4 f5 ff ff        call   136b
>     1dc7:  01 5f 1c              add    %ebx,0x1c(%edi)
>     1dca:  11 77 20              adc    %esi,0x20(%edi)
>     1dcd:  8b 45 e8              mov    -0x18(%ebp),%eax
>     1dd0:  83 c4 10              add    $0x10,%esp
>     1dd3:  5b                    pop    %ebx
>     1dd4:  5e                    pop    %esi
>     1dd5:  5f                    pop    %edi
>     1dd6:  5d                    pop    %ebp
>     1dd7:  c3                    ret
>
> > If you blacklist or rename the processor module to prevent it
> > from loading, does that keep the hang from occurring?
>
> No.  In that case I get the hang in default_idle+0x59/0x95
>
> 0000007a :
>       7a:  55                    push   %ebp
>       7b:  89 e5                 mov    %esp,%ebp
>       7d:  56                    push   %esi
>       7e:  53                    push   %ebx
>       7f:  83 ec 18              sub    $0x18,%esp
>       82:  83 3d 18 00 00 00 00  cmpl   $0x0,0x18
>                 84: R_386_32 .bss
>       89:  75 7a                 jne    105
>       8b:  80 3d 05 00 00 00 00  cmpb   $0x0,0x5
>                 8d: R_386_32 boot_cpu_data
>       92:  74 71                 je     105
>       94:  83 3d 04 00 00 00 00  cmpl   $0x0,0x4
>                 96: R_386_32 __tracepoint_power_start
>       9b:  74 23                 je     c0
>       9d:  8b 1d 08 00 00 00     mov    0x8,%ebx
>                 9f: R_386_32 __tracepoint_power_start
>       a3:  85 db                 test   %ebx,%ebx
>       a5:  74 19                 je     c0
>       a7:  8d 75 e0              lea    -0x20(%ebp),%esi
>       aa:  b9 01 00 00 00        mov    $0x1,%ecx
>       af:  ba 01 00 00 00        mov    $0x1,%edx
>       b4:  89 f0                 mov    %esi,%eax
>       b6:  ff 13                 call   *(%ebx)
>       b8:  83 c3 04              add    $0x4,%ebx
>       bb:  83 3b 00              cmpl   $0x0,(%ebx)
>       be:  75 ea                 jne    aa
>       c0:  89 e0                 mov    %esp,%eax
>       c2:  25 00 e0 ff ff        and    $0xffffe000,%eax
>       c7:  83 60 0c fb           andl   $0xfffffffb,0xc(%eax)
>       cb:  f6 40 08 08           testb  $0x8,0x8(%eax)
>       cf:  75 04                 jne    d5
>       d1:  fb                    sti
>       d2:  f4                    hlt
> -->   d3:  eb 01                 jmp    d6   <--------
>       d5:  fb                    sti
>       d6:  89 e0                 mov    %esp,%eax
>       d8:  25 00 e0 ff ff        and    $0xffffe000,%eax
>       dd:  83 48 0c 04           orl    $0x4,0xc(%eax)
>       e1:  83 3d 04 00 00 00 00  cmpl   $0x0,0x4
>                 e3: R_386_32 __tracepoint_power_end
>       e8:  74 1e                 je     108
>
>
> >> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit
> >> commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f
> >> Author: Bjorn Helgaas
> >> Date: Mon Mar 30 17:48:24 2009 +0000
> >
> > Ouch, sorry about that.  Thanks for doing all the bisection work.
> >
> >> ACPI: processor: use .notify method instead of installing handler
> >> directly
> >>
> >> This patch adds a .notify() method.  The presence of .notify() causes
> >> Linux/ACPI to manage event handlers and notify handlers on our behalf,
> >> so we don't have to install and remove them ourselves.
> >>
> >> Signed-off-by: Bjorn Helgaas
> >> CC: Zhang Rui
> >> CC: Zhao Yakui
> >> CC: Venki Pallipadi
> >> CC: Anil S Keshavamurthy
> >> Signed-off-by: Len Brown
> >>
> >> However, reverting this commit from v2.6.30-rc1 doesn't solve the hang.
> >
> > I don't see the problem in that commit yet, and if there is a problem
> > with it, I would think that reverting it from 2.6.30-rc1 would solve
> > it.  But maybe it'd be useful to revert the whole .notify series to
> > make sure.  From 2.6.30-rc1, you should be able to revert these:
> >
> > 7ec0a7290797f57b780f792d12f4bcc19c83aa4f processor
> > 373cfc360ec773be2f7615e59a19f3313255db7c button
> > 46ec8598fde74ba59703575c22a6fb0b6b151bb6 Linux/ACPI infrastructure
> >
> > What happens with those commits reverted?
>
> I'll find out tomorrow.

The fact that it still hangs even when you don't load the processor
driver at all suggests that the 7ec0a729079 commit identified by the
bisection is not the real problem.  That commit only touches
drivers/acpi/processor_core.c.

I think it's more likely some kind of race or missed wakeup.  Since it
seems to be sensitive to whether the battery is present, I guess you
could try blacklisting the battery.ko driver.  There have been a few
changes to it since 2.6.29-rc8.  If things work without battery.ko,
we can look through those changes.

Bjorn
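
For reference, the "symbol+0xOFFSET/0xSIZE" strings in the two backtraces
can be checked against the quoted listings mechanically: the instruction
address is the function's start address plus the offset, and the function
ends at start plus size.  The following is only a minimal sketch of that
arithmetic in Python.  The helper names parse_eip and resolve are made up
for illustration, and the start addresses (0x1bd0 and 0x7a) are simply
copied from the objdump output quoted above.

```python
import re

def parse_eip(text):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", text)
    if m is None:
        raise ValueError(f"unrecognized EIP string: {text!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)

def resolve(start, text):
    """Return (instruction address, one-past-end address of the function)."""
    _, offset, size = parse_eip(text)
    return start + offset, start + size

# Start addresses as shown in the quoted objdump listings (assumed correct).
bm_addr, bm_end = resolve(0x1bd0, "acpi_idle_enter_bm+0x1df/0x208")
idle_addr, idle_end = resolve(0x7a, "default_idle+0x59/0x95")

print(f"acpi_idle_enter_bm: EIP at {bm_addr:#x}, function ends at {bm_end:#x}")
print(f"default_idle:       EIP at {idle_addr:#x}, function ends at {idle_end:#x}")
```

The two computed EIP addresses, 0x1daf and 0xd3, are the instructions
marked with arrows in the listings.  The computed end of
acpi_idle_enter_bm, 0x1dd8, is one byte past the single-byte ret at 1dd7,
which is consistent with the offset/size reading of the backtrace line.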