From: Bjorn Helgaas
To: Alan Jenkins
Subject: Re: [BISECTED] EEE PC hangs when booting off battery
Date: Tue, 14 Apr 2009 08:59:01 -0600
Cc: linux-acpi@vger.kernel.org, "linux-kernel", Kernel Testers List, Venkatesh Pallipadi, Arjan van de Ven
References: <49E065CF.6040408@tuffmail.co.uk> <49E44415.1040500@tuffmail.co.uk> <49E456B5.6030006@tuffmail.co.uk>
In-Reply-To: <49E456B5.6030006@tuffmail.co.uk>
Message-Id: <200904140859.02188.bjorn.helgaas@hp.com>

On Tuesday 14 April 2009 03:26:13 am Alan Jenkins wrote:
> Alan Jenkins wrote:
> > Bjorn Helgaas wrote:
> >> On Monday 13 April 2009 01:57:00 pm Alan Jenkins wrote:
> >>> Bjorn Helgaas wrote:
> >>>> On Sunday 12 April 2009 07:11:57 am Alan Jenkins wrote:
> >>>>
> >>>> You mention that this occurs when booting off battery. So I
> >>>> assume everything works fine when the EEE is plugged in to the
> >>>> wall socket?
> >>>>
> >>> When I tested it before, that was what I found.
> >>>
> >>> However, I now find that's not quite right. It only works (i.e. doesn't
> >>> hang) if I remove the battery as well as plugging it into the wall. If
> >>> I have the battery in, it hangs.
> >
> > ... and right now, I can only reproduce it by booting with it plugged
> > into the wall and the battery present. If I unplug it from the wall, it
> > boots fine.
> >
> > It must be affected by something else as well, maybe battery level or
> > charging / discharging status.
> >
> >>>>>>>> Magic SysRQ keys work though. ...
> >>>>>> I was able to run SysRq-P, and found the following backtrace -
> >>>>>>
> >>>>>> Pid: 0
> >>>>>> EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor]
> >>>>>>
> >>>> Can you figure out where this is in acpi_idle_enter_bm() or
> >>>> maybe just email me your processor.ko module?
> >>>>
> >>>> Does it always happen at the same point?
> >>>>
> >>> Yes, it always happens at the same point.
> >>>
> >>> It turns out I can read the runes, but I don't understand what they're
> >>> saying :-).
> >>>
> >> I'm not much good with x86 assembly either :-)
> >>
> >> I think that in both cases below, you're right after enabling
> >> interrupts and about to exit the idle routine. My guess is the
> >> system is not really hung; it just doesn't think it has anything
> >> to do and is spending all its time in the idle loop.
> >>
> >>> 00001bd0 <acpi_idle_enter_bm>:
> >>> ...
> >>> 00001bd0 + 0x1df = 00001daf
> >>> ...
> >>>     1d70:  b8 03 00 00 00        mov    $0x3,%eax
> >>>     1d75:  e8 90 f3 ff ff        call   110a
> >>>     1d7a:  85 c0                 test   %eax,%eax
> >>>     1d7c:  74 0a                 je     1d88
> >>>     1d7e:  b8 0e 09 00 00        mov    $0x90e,%eax
> >>>                 1d7f: R_386_32 .rodata.str1.1
> >>>     1d83:  e8 fc ff ff ff        call   1d84
> >>>                 1d84: R_386_PC32 mark_tsc_unstable
> >>>     1d88:  8b 45 e8              mov    -0x18(%ebp),%eax
> >>>     1d8b:  8b 55 ec              mov    -0x14(%ebp),%edx
> >>>     1d8e:  e8 ab fd ff ff        call   1b3e
> >>>     1d93:  89 c3                 mov    %eax,%ebx
> >>>     1d95:  b8 17 01 00 00        mov    $0x117,%eax
> >>>     1d9a:  69 ca 17 01 00 00     imul   $0x117,%edx,%ecx
> >>>     1da0:  89 d6                 mov    %edx,%esi
> >>>     1da2:  f7 e3                 mul    %ebx
> >>>     1da4:  8d 14 11              lea    (%ecx,%edx,1),%edx
> >>>     1da7:  e8 fc ff ff ff        call   1da8
> >>>                 1da8: R_386_PC32 sched_clock_idle_wakeup_event
> >>>     1dac:  fb                    sti
> >>>     1dad:  89 e0                 mov    %esp,%eax
> >>> ->  1daf:  31 c9                 xor    %ecx,%ecx   <---------
> >>>     1db1:  25 00 e0 ff ff        and    $0xffffe000,%eax
> >>>     1db6:  89 fa                 mov    %edi,%edx
> >>>     1db8:  83 48 0c 04           orl    $0x4,0xc(%eax)
> >>>     1dbc:  ff 47 18              incl   0x18(%edi)
> >>>     1dbf:  8b 45 e4              mov    -0x1c(%ebp),%eax
> >>>     1dc2:  e8 a4 f5 ff ff        call   136b
> >>>     1dc7:  01 5f 1c              add    %ebx,0x1c(%edi)
> >>>     1dca:  11 77 20              adc    %esi,0x20(%edi)
> >>>     1dcd:  8b 45 e8              mov    -0x18(%ebp),%eax
> >>>     1dd0:  83 c4 10              add    $0x10,%esp
> >>>     1dd3:  5b                    pop    %ebx
> >>>     1dd4:  5e                    pop    %esi
> >>>     1dd5:  5f                    pop    %edi
> >>>     1dd6:  5d                    pop    %ebp
> >>>     1dd7:  c3                    ret
> >>>
> >>>> If you blacklist or rename the processor module to prevent it
> >>>> from loading, does that keep the hang from occurring?
> >>>>
> >>> No. In that case I get the hang in default_idle+0x59/0x95
> >>>
> >>> 0000007a <default_idle>:
> >>>     7a:  55                    push   %ebp
> >>>     7b:  89 e5                 mov    %esp,%ebp
> >>>     7d:  56                    push   %esi
> >>>     7e:  53                    push   %ebx
> >>>     7f:  83 ec 18              sub    $0x18,%esp
> >>>     82:  83 3d 18 00 00 00 00  cmpl   $0x0,0x18
> >>>                 84: R_386_32 .bss
> >>>     89:  75 7a                 jne    105
> >>>     8b:  80 3d 05 00 00 00 00  cmpb   $0x0,0x5
> >>>                 8d: R_386_32 boot_cpu_data
> >>>     92:  74 71                 je     105
> >>>     94:  83 3d 04 00 00 00 00  cmpl   $0x0,0x4
> >>>                 96: R_386_32 __tracepoint_power_start
> >>>     9b:  74 23                 je     c0
> >>>     9d:  8b 1d 08 00 00 00     mov    0x8,%ebx
> >>>                 9f: R_386_32 __tracepoint_power_start
> >>>     a3:  85 db                 test   %ebx,%ebx
> >>>     a5:  74 19                 je     c0
> >>>     a7:  8d 75 e0              lea    -0x20(%ebp),%esi
> >>>     aa:  b9 01 00 00 00        mov    $0x1,%ecx
> >>>     af:  ba 01 00 00 00        mov    $0x1,%edx
> >>>     b4:  89 f0                 mov    %esi,%eax
> >>>     b6:  ff 13                 call   *(%ebx)
> >>>     b8:  83 c3 04              add    $0x4,%ebx
> >>>     bb:  83 3b 00              cmpl   $0x0,(%ebx)
> >>>     be:  75 ea                 jne    aa
> >>>     c0:  89 e0                 mov    %esp,%eax
> >>>     c2:  25 00 e0 ff ff        and    $0xffffe000,%eax
> >>>     c7:  83 60 0c fb           andl   $0xfffffffb,0xc(%eax)
> >>>     cb:  f6 40 08 08           testb  $0x8,0x8(%eax)
> >>>     cf:  75 04                 jne    d5
> >>>     d1:  fb                    sti
> >>>     d2:  f4                    hlt
> >>> --> d3:  eb 01                 jmp    d6   <--------
> >>>     d5:  fb                    sti
> >>>     d6:  89 e0                 mov    %esp,%eax
> >>>     d8:  25 00 e0 ff ff        and    $0xffffe000,%eax
> >>>     dd:  83 48 0c 04           orl    $0x4,0xc(%eax)
> >>>     e1:  83 3d 04 00 00 00 00  cmpl   $0x0,0x4
> >>>                 e3: R_386_32 __tracepoint_power_end
> >>>     e8:  74 1e                 je     108
> >>>
> >>>>> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit
> >>>>> commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f
> >>>>> Author: Bjorn Helgaas
> >>>>> Date: Mon Mar 30 17:48:24 2009 +0000
> >>>>>
> >>>> Ouch, sorry about that. Thanks for doing all the bisection work.
> >>>>
> >>>>> ACPI: processor: use .notify method instead of installing handler
> >>>>> directly
> >>>>>
> >>>>> This patch adds a .notify() method.
> >>>>> The presence of .notify() causes
> >>>>> Linux/ACPI to manage event handlers and notify handlers on our behalf,
> >>>>> so we don't have to install and remove them ourselves.
> >>>>>
> >>>>> Signed-off-by: Bjorn Helgaas
> >>>>> CC: Zhang Rui
> >>>>> CC: Zhao Yakui
> >>>>> CC: Venki Pallipadi
> >>>>> CC: Anil S Keshavamurthy
> >>>>> Signed-off-by: Len Brown
> >>>>>
> >>>>> However, reverting this commit from v2.6.30-rc1 doesn't solve the hang.
> >>>>>
> >>>> I don't see the problem in that commit yet, and if there is a problem
> >>>> with it, I would think that reverting it from 2.6.30-rc1 would solve
> >>>> it. But maybe it'd be useful to revert the whole .notify series to
> >>>> make sure. From 2.6.30-rc1, you should be able to revert these:
> >>>>
> >>>> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f processor
> >>>> 373cfc360ec773be2f7615e59a19f3313255db7c button
> >>>> 46ec8598fde74ba59703575c22a6fb0b6b151bb6 Linux/ACPI infrastructure
> >>>>
> >>>> What happens with those commits reverted?
> >>>>
> >>> I'll find out tomorrow.
> >>>
> >> The fact that it still hangs even when you don't load the processor
> >> driver at all suggests that the 7ec0a729079 commit identified by the
> >> bisection is not the real problem. That commit only touches
> >> drivers/acpi/processor_core.c.
> >
> > Yah.
> >
> >> I think it's more likely some kind of race or missed wakeup.
> >>
> >> Since it seems to be sensitive to whether the battery is present,
> >> I guess you could try blacklisting the battery.ko driver. There
> >> have been a few changes to it since 2.6.29-rc8. If things work
> >> without battery.ko, we can look through those changes.
> >
> > Good guess :-). I tried a couple of times either way, and blacklisting
> > "battery" definitely avoids the hang.
>
> Ok, I tried reverting
>
> 0f66af530116e9f4dd97f328d91718b56a6fc5a4 "ACPI: battery: asynchronous init"
>
> and that fixed it.

I can't help with the real problem of why the asynchronous battery
init causes the hang. But I do object to the magic makefile ordering
change in that commit. Nobody reading the makefile can tell why
battery is down at the end, and moving it apparently slows down
boot significantly. So the ordering change just feels like a band-aid
that covers up a place where ACPI could be improved.

I don't see anything unusual in what the battery init is doing, so
it's probably just some ACPI methods that take a long time to execute.
Other drivers could easily have similar problems.

Bjorn