Date: Thu, 3 Jul 2008 02:25:44 +0530
From: Dhaval Giani
To: Ingo Molnar , Thomas Gleixner
Cc: Arun Bharadwaj , lkml
Subject: Re: [x86-tip] panic during cpu_up
Message-ID: <20080702205544.GB13252@linux.vnet.ibm.com>
In-Reply-To: <20080702190651.GA13252@linux.vnet.ibm.com>

[Missed cc'ing the LKML last time around]

On Thu, Jul 03, 2008 at 12:36:51AM +0530, Dhaval Giani wrote:
> Hi Ingo, Thomas,
>
> I am hitting this on -tip. With 200a86b5d435a217c3d77f3b53cd32cb78c1fde8
> as the top level commit. Wondering if it is known?
>
> I am trying to fix it atm.
>
> Thanks,
>
> Red Hat Enterprise Linux AS release 4 (Nahant Update 2)
> Kernel 2.6.26-rc8-tip on an i686
>
> llm11.in.ibm.com login: root
> Password:
> Last login: Thu Jul 3 00:30:47 on ttyS0
> You have new mail.
> cd[root@llm11 ~]# cd /sys/devices/system/cpu/cpu1/
> [root@llm11 cpu1]# echo 0 > online
> Breaking affinity for irq 45
> [root@llm11 cpu1]# echo 1 > online
> lockdep: fixing up alternatives.
> BUG: unable to handle kernel <1>BUG: unable to handle kernel NULL
> pointer dereference at 00000000
> IP: [<00000000>]
> *pde = 00000000
> Oops: 0000 [#1] SMP
> Modules linked in:
>
> Pid: 0, comm: swapper Not tainted (2.6.26-rc8-tip #1)
> EIP: 0060:[<00000000>] EFLAGS: 00010002 CPU: 2
> EIP is at 0x0
> EAX: c0614c00 EBX: 00000002 ECX: 0799d000 EDX: ffff3bdf
> BUG: unable to handle kernel NULL pointer dereference at 00000000
> IP:ESI: 00000000 EDI: 00000000 EBP: f7cadfac ESP: f7cadfa4
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [<00000000>]
> BUG: unable to handle kernel NULL pointer dereference*pde = 00000000
> <1>BUG: unable to handle kernel at 00000000
> IP:<0>Process swapper (pid: 0, ti=f7cac000 task=f7caaee0
> task.ti=f7cac000)NULL pointer dereference [<00000000>]
> *pde = 00000000 <1>BUG: unable to handle kernel
> Stack:
> at 00000000
> IP:NULL pointer dereference [<00000000>]
> at 00000000
> c01020c7 *pde = 00000000 <1>IP:
>
> [<00000000>]
> 0402080c *pde = 00000000 f7cadfb4
> c040b6cf 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 000000d8
> 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> Call Trace:
> [] ? cpu_idle+0x8a/0x9e
> [] ? start_secondary+0xbb/0xbd
> =======================
> Code: Bad EIP value.
> EIP: [<00000000>] 0x0 SS:ESP 0068:f7cadfa4
> Kernel panic - not syncing: Fatal exception
> Oops: 0000 [#2] SMP
> Pid: 0, comm: swapper Tainted: G D 2.6.26-rc8-tip #1
> Modules linked in: []
>
> panic+0x38/0xe0
> Pid: 0, comm: swapper Tainted: G D (2.6.26-rc8-tip #1)
> EIP: 0060:[<00000000>] EFLAGS: 00010046 CPU: 3
> [] EIP is at 0x0
> EAX: c0614c00 EBX: 00000003 ECX: 079a5000 EDX: ffff3bdf
> ESI: 00000000 EDI: 00000000 EBP: f7cbbfac ESP: f7cbbfa4
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=f7cba000 task=f7cb8fe0 task.ti=f7cba000)
> Stack: die+0x130/0x147
> c01020c7 0602080c f7cbbfb4 c040b6cf 00000000 [] 00000000
> 00000000 00000000
> 00000000 do_page_fault+0x3bd/0x482
> 00000000 [] ? 00000000 00000000 00000000 00000000
> do_page_fault+0x0/0x482
> 000000d8 00000000 []
> 00000000 error_code+0x72/0x78
> 00000000 00000000 [] ? 00000000 00000000 cpu_idle+0x8a/0x9e
> 00000000 00000000
> Call Trace:
> [] <0> [] start_secondary+0xbb/0xbd
> ? =======================
> cpu_idle+0x8a/0x9e
> [] ? start_secondary+0xbb/0xbd
> =======================
> Code: Bad EIP value.
> EIP: [<00000000>] 0x0 SS:ESP 0068:f7cbbfa4
> Oops: 0000 [#3] <0>Kernel panic - not syncing: Fatal exception
> SMP Pid: 0, comm: swapper Tainted: G D 2.6.26-rc8-tip #1
>
> [] Modules linked in:
>
> panic+0x38/0xe0
> Pid: 0, comm: swapper Tainted: G D (2.6.26-rc8-tip #1)
> [] EIP: 0060:[<00000000>] EFLAGS: 00010046 CPU: 7
> die+0x130/0x147
> EIP is at 0x0
> [] EAX: c0614c00 EBX: 00000007 ECX: 079c5000 EDX: ffff3bdf
> ESI: 00000000 EDI: 00000000 EBP: f7d1bfac ESP: f7d1bfa4
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=f7d1a000 task=f7d18be0
> task.ti=f7d1a000)do_page_fault+0x3bd/0x482
>
> Stack: [] c01020c7 ? 0702080c f7d1bfb4
> do_page_fault+0x0/0x482
> c040b6cf [] 00000000 00000000 error_code+0x72/0x78
> 00000000 00000000
> [] ? 00000000 00000000 cpu_idle+0x8a/0x9e
> 00000000 00000000 [] 00000000 00000000 000000d8
> start_secondary+0xbb/0xbd
> 00000000 =======================
>
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> Call Trace:
> [] ? cpu_idle+0x8a/0x9e
> [] ? start_secondary+0xbb/0xbd
> =======================
> Code: Bad EIP value.
> EIP: [<00000000>] 0x0 SS:ESP 0068:f7d1bfa4
> Kernel panic - not syncing: Fatal exception
> Oops: 0000 [#4] SMP
> Pid: 0, comm: swapper Tainted: G D 2.6.26-rc8-tip #1
> Modules linked in: []
> panic+0x38/0xe0
>
> [] Pid: 0, comm: swapper Tainted: G D (2.6.26-rc8-tip
> #1)
> EIP: 0060:[<00000000>] EFLAGS: 00010046 CPU: 4
> die+0x130/0x147
> EIP is at 0x0
> EAX: c0614c00 EBX: 00000004 ECX: 079ad000 EDX: ffff3bdf
> ESI: 00000000 EDI: 00000000 EBP: f7ccbfac ESP: f7ccbfa4
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=f7cca000 task=f7cc90e0 task.ti=f7cca000)
> []
> Stack: c01020c7 do_page_fault+0x3bd/0x482
> 0102080c [] f7ccbfb4 c040b6cf 00000000 00000000 00000000
> 00000000 ?
> 00000000 00000000 00000000 00000000 00000000
> do_page_fault+0x0/0x482
> 00000000 000000d8 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> Call Trace:
> [] <0> [] ? error_code+0x72/0x78
> [] ? cpu_idle+0x8a/0x9e
> cpu_idle+0x8a/0x9e
> [] ? [] start_secondary+0xbb/0xbd
> =======================
> start_secondary+0xbb/0xbd
> =======================
> Code: Bad EIP value.
> EIP: [<00000000>] 0x0 SS:ESP 0068:f7ccbfa4
> Oops: 0000 [#5] <0>Kernel panic - not syncing: Fatal exception
> SMP
> Pid: 0, comm: swapper Tainted: G D 2.6.26-rc8-tip #1
> Modules linked in: []
>
> panic+0x38/0xe0
> Pid: 0, comm: swapper Tainted: G D (2.6.26-rc8-tip #1)
> [] EIP: 0060:[<00000000>] EFLAGS: 00010046 CPU: 6
> die+0x130/0x147
> EIP is at 0x0
> EAX: c0614c00 EBX: 00000006 ECX: 079bd000 EDX: ffff3bdf
> ESI: 00000000 EDI: 00000000 EBP: f7d0bfac ESP: f7d0bfa4
> [] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=f7d0a000 task=f7d092e0
> task.ti=f7d0a000)do_page_fault+0x3bd/0x482
>
> Stack: [] ? c01020c7 0502080c do_page_fault+0x0/0x482
> f7d0bfb4 [] c040b6cf error_code+0x72/0x78
> 00000000 [] ? 00000000 cpu_idle+0x8a/0x9e
> 00000000 [] 00000000 start_secondary+0xbb/0xbd
>
> =======================
> 00000000 00000000 00000000 00000000 00000000 00000000 000000d8 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> Call Trace:
> [] ? cpu_idle+0x8a/0x9e
> [] ? start_secondary+0xbb/0xbd
> =======================
> Code: Bad EIP value.
> EIP: [<00000000>] 0x0 SS:ESP 0068:f7d0bfa4
> Kernel panic - not syncing: Fatal exception
> Pid: 0, comm: swapper Tainted: G D 2.6.26-rc8-tip #1
> [] panic+0x38/0xe0
> [] die+0x130/0x147
> [] do_page_fault+0x3bd/0x482
> [] ? do_page_fault+0x0/0x482
> [] error_code+0x72/0x78
> [] ? cpu_idle+0x8a/0x9e
> [] start_secondary+0xbb/0xbd
> =======================
> NULL pointer dereference at 00000000
> IP: [<00000000>]
> *pde = 00000000
> Oops: 0000 [#6] SMP
> Modules linked in:
>
> Pid: 0, comm: swapper Tainted: G D (2.6.26-rc8-tip #1)
> EIP: 0060:[<00000000>] EFLAGS: 00010006 CPU: 1
> EIP is at 0x0
> EAX: c0614c00 EBX: 00000001 ECX: 07995000 EDX: ffff3bdf
> ESI: 00000000 EDI: 00000000 EBP: f7c7ffb4 ESP: f7c7ffac
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=f7c7e000 task=f7c7cde0 task.ti=f7c7e000)
> Stack: c01020c7 0202080c f7c7ffbc c040b6cf 00000000 00000000 00000000
> 00000000
> 00000000 00000000 00000000 00000000 000000d8 00000000 00000000
> 00000000
> 00000000 00000000 00000000 00000000 00000000
> Call Trace:
> [] ? cpu_idle+0x8a/0x9e
> [] ? start_secondary+0xbb/0xbd
> =======================
> Code: Bad EIP value.
> EIP: [<00000000>] 0x0 SS:ESP 0068:f7c7ffac
> Kernel panic - not syncing: Fatal exception
> Pid: 0, comm: swapper Tainted: G D 2.6.26-rc8-tip #1
> [] panic+0x38/0xe0
> [] die+0x130/0x147
> [] do_page_fault+0x3bd/0x482
> [] ? do_page_fault+0x0/0x482
> [] error_code+0x72/0x78
> [] ? cpu_idle+0x8a/0x9e
> [] start_secondary+0xbb/0xbd
> =======================
> BUG: NMI Watchdog detected LOCKUP on CPU6, ip c0111959, registers:
> Modules linked in:
>
> Pid: 0, comm: swapper Tainted: G D (2.6.26-rc8-tip #1)
> EIP: 0060:[] EFLAGS: 00000093 CPU: 6
> EIP is at __smp_call_function+0x5d/0x7a
> EAX: 0000009e EBX: 00000005 ECX: 00000006 EDX: f7d092e0
> ESI: c01047d4 EDI: c0111a5a EBP: f7d0bf00 ESP: f7d0bed0
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=f7d0a000 task=f7d092e0 task.ti=f7d0a000)
> Stack: 00000000 c0111a5a 00000000 00000000 c01047d4 00000000 c029915b
> f7d0bf6c
> 00000006 00000001 00000046 00000000 f7d0bf14 c0111acb 00000000
> f7d0bf6c
> 00000006 f7d0bf20 c01276ee f7d0bf6c f7d0bf3c c0104bcd c04d922f
> c04e06cf
> Call Trace:
> [] ? stop_this_cpu+0x0/0x3a
> [] ? show_trace+0x10/0x12
> [] ? do_unblank_screen+0x2a/0xf9
> [] ? native_smp_send_stop+0x37/0x6a
> [] ? panic+0x4f/0xe0
> [] ? die+0x130/0x147
> [] ? do_page_fault+0x3bd/0x482
> [] ? do_page_fault+0x0/0x482
> [] ? error_code+0x72/0x78
> [] ? cpu_idle+0x8a/0x9e
> [] ? start_secondary+0xbb/0xbd
> =======================
> Code: 85 c0 0f 44 75 e0 89 45 e4 8d 45 d4 a3 d4 33 62 c0 89 75 e0 0f ae
> f0 0f 1f 00 8b 15 e0 69 57 c0 b8 fb 00 00 00 ff 52 78 39 5d dc <74> 04
> f3 90 eb f7
>

So after digging around a bit, it turns out that pm_idle is NULL. For some
reason it is not getting set to default_idle when no other idle routine is
chosen. I am not sure which path is being followed, and it's a bit late for
me to be trying anything serious :). The change below seems to work as a
temporary workaround, but it is obviously not the right fix yet.

---
 arch/x86/kernel/process_32.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6.26-rc8-tip/arch/x86/kernel/process_32.c
===================================================================
--- linux-2.6.26-rc8-tip.orig/arch/x86/kernel/process_32.c
+++ linux-2.6.26-rc8-tip/arch/x86/kernel/process_32.c
@@ -144,7 +144,10 @@ void cpu_idle(void)
 			__get_cpu_var(irq_stat).idle_timestamp = jiffies;
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
-			pm_idle();
+			if (pm_idle)
+				pm_idle();
+			else
+				default_idle();
 			start_critical_timings();
 		}
 		tick_nohz_restart_sched_tick();

--
regards,
Dhaval