Message-ID: <19f34abd0807111207q2ad2011csdb46c6f451fe0f6d@mail.gmail.com>
Date: Fri, 11 Jul 2008 21:07:40 +0200
From: "Vegard Nossum"
To: "Dmitry Adamushko", "Paul Jackson", "Paul Menage"
Subject: current linux-2.6.git: cpusets completely broken
Cc: "Peter Zijlstra", miaox@cn.fujitsu.com, "Linux Kernel"
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

I have now "config-bisected" and found that the difference between a
working and a non-working config is just this:

+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y

The difference between an i386 defconfig base and the non-working config is:

+CONFIG_CGROUPS=y
+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y

(Note that group scheduling is off and has nothing to do with it!)

The result of having CPUSETS enabled as above is a 100% reproducible
BUG on the very first cpu hot-unplug:

------------[ cut here ]------------
kernel BUG at xxx/linux-2.6/kernel/sched.c:5859!
invalid opcode: 0000 [#1] SMP
Modules linked in:

Pid: 3653, comm: bash Not tainted (2.6.26-rc9-00102-ge5a5816 #3)
EIP: 0060:[] EFLAGS: 00210046 CPU: 0
EIP is at migration_call+0x29b/0x3bb
EAX: c1816558 EBX: c1816500 ECX: 01213000 EDX: c0603500
ESI: f7035f80 EDI: c1816500 EBP: 00000001 ESP: f6c17ec4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3653, ti=f6c16000 task=f6cb1c00 task.ti=f6c16000)
Stack: c1816500 c0411b05 c0106fc8 c05b10f8 ffffffff 00000000 c05b11ac c0410375
       00000001 00000007 00000007 00000001 00000000 f71f5500 c012f916 ffffffff
       00000000 c03f42fb 00000000 00000001 fffffffd 00000003 0000001f 00000001
Call Trace:
 [] _etext+0x0/0xb
 [] alternatives_smp_unlock+0x42/0x4f
 [] notifier_call_chain+0x2a/0x47
 [] raw_notifier_call_chain+0x9/0xc
 [] _cpu_down+0x14c/0x1ee
 [] cpu_down+0x20/0x2c
 [] store_online+0x24/0x56
 [] store_online+0x0/0x56
 [] sysdev_store+0x1e/0x22
 [] sysfs_write_file+0xa4/0xd8
 [] sysfs_write_file+0x0/0xd8
 [] vfs_write+0x83/0xf6
 [] sys_write+0x3c/0x63
 [] sysenter_past_esp+0x6a/0x91
 =======================
Code: 18 85 c0 89 c6 75 04 8b 1b eb f0 8b 4e 24 89 f2 8b 04 24 ff 51 1c ba 00 35 60 c0 8b 0c ad 00 d2 5b c0 83 be c0 00 00 00 00 75 04 <0f> 0b eb fe 8b 06 83 f8 40 75 04 0f 0b eb fe 8d 1c 0a 90 ff 46
EIP: [] migration_call+0x29b/0x3bb SS:ESP 0068:f6c17ec4
BUG: NMI Watchdog detected LOCKUP on CPU0, ip c040e76e, registers:
Modules linked in:

Pid: 3653, comm: bash Not tainted (2.6.26-rc9-00102-ge5a5816 #3)
EIP: 0060:[] EFLAGS: 00200097 CPU: 0
EIP is at _spin_lock+0x10/0x15
EAX: c1816500 EBX: c0603500 ECX: 63adf1df EDX: 0000e2e1
ESI: c1816500 EDI: f6c17d6c EBP: f6db5500 ESP: f6c17d50
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3653, ti=f6c16000 task=f6cb1c00 task.ti=f6c16000)
Stack: c01170bb f6db5500 c180c500 00000000 00000000 c011736c 00000001 00200092
       f7353f3c c0566ef8 00000001 00000000 c012c4ad 00000000 f7353f3c c0566ef8
       c0115e44 00000000 00000001 c0566f00 c0566f00 00000000 00000001 00200092
Call Trace:
 [] task_rq_lock+0x28/0x4b
 [] try_to_wake_up+0x65/0xe0
 [] autoremove_wake_function+0xd/0x2d
 [] __wake_up_common+0x2e/0x58
 [] __wake_up+0x29/0x39
 [] wake_up_klogd+0x2b/0x2d
 [] die+0xb1/0x10f
 [] do_invalid_op+0x0/0x6b
 [] do_invalid_op+0x62/0x6b
 [] migration_call+0x29b/0x3bb
 [] kprobe_flush_task+0x4b/0x80
 [] hrtick_set+0x7a/0xd8
 [] schedule+0x5b6/0x5e8
 [] update_curr_rt+0x92/0x339
 [] error_code+0x72/0x78
 [] send_IPI_mask_sequence+0x24/0x91
 [] migration_call+0x29b/0x3bb
 [] _etext+0x0/0xb
 [] alternatives_smp_unlock+0x42/0x4f
 [] notifier_call_chain+0x2a/0x47
 [] raw_notifier_call_chain+0x9/0xc
 [] _cpu_down+0x14c/0x1ee
 [] cpu_down+0x20/0x2c
 [] store_online+0x24/0x56
 [] store_online+0x0/0x56
 [] sysdev_store+0x1e/0x22
 [] sysfs_write_file+0xa4/0xd8
 [] sysfs_write_file+0x0/0xd8
 [] vfs_write+0x83/0xf6
 [] sys_write+0x3c/0x63
 [] sysenter_past_esp+0x6a/0x91
 =======================
Code: 00 00 01 0f 94 c0 84 c0 b9 01 00 00 00 75 09 90 81 02 00 00 00 01 30 c9 89 c8 c3 ba 00 01 00 00 90 66 0f c1 10 38 f2 74 06 f3 90 <8a> 10 eb f6 c3 90 81 28 00 00 00 01 74 05 e8 4f ff ff ff c3 53

Also, this is on the latest linux-2.6.git!

Since we're so close to release, maybe cpusets should simply be marked
BROKEN for now? (Unless we can fix it, of course. The alternative is
to apply Miao Xie's workaround patch temporarily.)

I hope this helps at least a little bit.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
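
For reference, the "config-bisect" comparison described above can be sketched roughly as follows. This is only an illustration, not the procedure actually used for this report: the function names parse_config and config_diff and the two sample config fragments are hypothetical. It lists the options that differ between a working and a non-working config, in the same +/- style as the lists in the message.

```python
def parse_config(text):
    """Return {option: value} for every 'CONFIG_X=value' line of a .config.

    Comment lines such as '# CONFIG_X is not set' are skipped, so an
    unset option is simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def config_diff(working_text, broken_text):
    """List option differences between two configs as '-old' / '+new' lines."""
    working = parse_config(working_text)
    broken = parse_config(broken_text)
    lines = []
    for name in sorted(set(working) | set(broken)):
        old, new = working.get(name), broken.get(name)
        if old == new:
            continue
        if old is not None:
            lines.append(f"-{name}={old}")
        if new is not None:
            lines.append(f"+{name}={new}")
    return lines


# Hypothetical sample fragments, not the real configs from this report.
working = "CONFIG_CGROUPS=y\n# CONFIG_CPUSETS is not set\n"
broken = "CONFIG_CGROUPS=y\nCONFIG_CPUSETS=y\nCONFIG_PROC_PID_CPUSET=y\n"

for line in config_diff(working, broken):
    print(line)
```

With real kernel trees, the two inputs would be the contents of the respective .config files; each candidate option would then be toggled and the hot-unplug test repeated until the smallest failing difference remains.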