Message-ID: <4887B3BA.2050602@qualcomm.com>
Date: Wed, 23 Jul 2008 15:42:02 -0700
From: Max Krasnyansky
To: Vegard Nossum
CC: Suresh Siddha, LKML, the arch/x86 maintainers, "Paul E. McKenney", Dmitry Adamushko
Subject: Re: recent -git: BUG in free_thread_xstate
References: <19f34abd0807231307y191c0ad7tfab4cda57ee88eb@mail.gmail.com> <20080723203109.GH14380@linux-os.sc.intel.com> <19f34abd0807231422m30dcdaf3ice9010aa8260ca50@mail.gmail.com>
In-Reply-To: <19f34abd0807231422m30dcdaf3ice9010aa8260ca50@mail.gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Vegard Nossum wrote:
> On Wed, Jul 23, 2008 at 10:31 PM, Suresh Siddha wrote:
>> On Wed, Jul 23, 2008 at 01:07:04PM -0700, Vegard Nossum wrote:
>>> Hi,
>>>
>>> I just got this on c010b2f76c3032e48097a6eef291d8593d5d79a6 (-git from
>>> yesterday):
>> Do you see this in 2.6.26 aswell? I suspect it is coming from post 2.6.26
>> changes.
>
> Yep.
> Got this on 2.6.26 now:
>
> BUG: unable to handle kernel paging request at 00664381
> IP: [] free_thread_xstate+0x4/0x30
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3796, comm: bash Not tainted (2.6.26 #1)
> EIP: 0060:[] EFLAGS: 00210246 CPU: 0
> EIP is at free_thread_xstate+0x4/0x30
> EAX: 00664001 EBX: f3870000 ECX: 00000004 EDX: f4b544e8
> ESI: f4bdef28 EDI: c07feda0 EBP: f5325bd0 ESP: f5325bcc
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process bash (pid: 3796, ti=f5324000 task=f4b53fc0 task.ti=f5324000)
> Stack: f3870000 f5325bdc c010b8bd f4bddfa0 f5325be8 c0132b89 f4bddfa0 f5325bf4
>        c0133fd1 f4b77e00 f5325bfc c01368a7 f5325c14 c0172b8c 00200282 c0752b40
>        00000001 00000009 f5325c30 c0139cd3 c0803d00 c0803d00 c0803d00 00200046
> Call Trace:
>  [] ? free_thread_info+0xd/0x20
>  [] ? free_task+0x19/0x30
>  [] ? __put_task_struct+0x51/0xa0
>  [] ? delayed_put_task_struct+0x27/0x30
>  [] ? rcu_process_callbacks+0x6c/0xb0
>  [] ? __do_softirq+0x83/0x100
>  [] ? do_softirq+0xa5/0xb0
>  [] ? irq_exit+0x95/0xa0
>  [] ? do_IRQ+0x4d/0xa0
>  [] ? common_interrupt+0x2e/0x34
>  [] ? vprintk+0x1be/0x420
>  [] ? native_sched_clock+0xb5/0x110
>  [] ? native_sched_clock+0xb5/0x110
>  [] ? printk+0x1b/0x20
>  [] ? cpu_attach_domain+0x3ec/0x410
>  [] ? native_sched_clock+0xb5/0x110
>  [] ? check_bytes_and_report+0x21/0xc0
>  [] ? check_object+0xdf/0x1f0
>  [] ? sd_free_ctl_entry+0x37/0x50
>  [] ? mark_held_locks+0x65/0x80
>  [] ? kfree+0xb5/0x120
>  [] ? trace_hardirqs_on+0xd4/0x160
>  [] ? sd_free_ctl_entry+0x37/0x50
>  [] ? sd_free_ctl_entry+0x37/0x50
>  [] ? sd_free_ctl_entry+0x37/0x50
>  [] ? detach_destroy_domains+0x2e/0x50
>  [] ? update_sched_domains+0x3b/0x50
>  [] ? notifier_call_chain+0x37/0x70
>  [] ? __raw_notifier_call_chain+0x19/0x20
>  [] ? _cpu_down+0x78/0x240
>  [] ? cpu_maps_update_begin+0xf/0x20
>  [] ? cpu_down+0x2b/0x40
>  [] ? store_online+0x39/0x80
>  [] ? store_online+0x0/0x80
>  [] ? sysdev_store+0x2b/0x40
>  [] ? sysfs_write_file+0xa2/0x100
>  [] ? vfs_write+0x96/0x130
>  [] ? sysfs_write_file+0x0/0x100
>  [] ? sys_write+0x3d/0x70
>  [] ? sysenter_past_esp+0x78/0xd1
>  =======================
> Code: 04 00 00 00 00 c7 04 24 00 00 04 00 e8 96 f8 08 00 a3 b4 a5 80
> c0 c9 c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 53 <8b>
> 90 80 03 00 00 89 c3 85 d2 74 14 a1 b4 a5 80 c0 e8 d6 e4 08
> EIP: [] free_thread_xstate+0x4/0x30 SS:ESP 0068:f5325bcc
> Kernel panic - not syncing: Fatal exception in interrupt
>
> I'm not sure what to make of this. It looks related to the rebuilding
> of sched domains that we saw earlier. But this reproduces on both
> v2.6.26 and latest -git (though not with that backtrace).

Based on the trace above, it seems that we panic even before calling
into cpusets (i.e., I do not see rebuild_sched_domains() in there),
which means it must be something different.
The problem we had before was that cpusets were screwing up the domain
rebuild sequence during CPU hotplug handling.

Max
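
The check on the call trace (is rebuild_sched_domains() in it or not) can also be done mechanically when a trace is long. The following is only a rough sketch: the helper names `symbols_in_trace` and `trace_mentions` are made up for illustration, the regular expression assumes entries of the form `? symbol+0xOFF/0xSIZE` as they appear in this oops, and the sample is a handful of entries copied from the trace in this thread.

```python
import re

# Assumed entry format, as seen in this oops: "? symbol+0xOFFSET/0xSIZE".
TRACE_ENTRY = re.compile(r"\?\s+([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def symbols_in_trace(oops_text):
    """Return the symbol names of all call-trace entries, in order of appearance."""
    return TRACE_ENTRY.findall(oops_text)


def trace_mentions(oops_text, symbol):
    """True if `symbol` appears as an exact entry name in the call trace."""
    return symbol in symbols_in_trace(oops_text)


# A few entries copied from the trace in this thread (hotplug path only).
SAMPLE = """\
 [] ? detach_destroy_domains+0x2e/0x50
 [] ? update_sched_domains+0x3b/0x50
 [] ? notifier_call_chain+0x37/0x70
 [] ? _cpu_down+0x78/0x240
 [] ? cpu_down+0x2b/0x40
 [] ? store_online+0x39/0x80
"""

if __name__ == "__main__":
    for name in ("update_sched_domains", "rebuild_sched_domains"):
        print(name, trace_mentions(SAMPLE, name))
```

Matching on the exact symbol name matters here, since update_sched_domains() is present in the trace while rebuild_sched_domains() is not, and a loose substring search for "sched_domains" would not tell the two apart.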