Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756058AbYLJH4t (ORCPT ); Wed, 10 Dec 2008 02:56:49 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754803AbYLJH4l (ORCPT ); Wed, 10 Dec 2008 02:56:41 -0500
Received: from TYO201.gate.nec.co.jp ([202.32.8.193]:40260 "EHLO tyo201.gate.nec.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754421AbYLJH4k (ORCPT ); Wed, 10 Dec 2008 02:56:40 -0500
Date: Wed, 10 Dec 2008 16:41:26 +0900
From: Daisuke Nishimura
To: balbir@linux.vnet.ibm.com
Cc: menage@google.com, KAMEZAWA Hiroyuki, Daisuke Miyakawa, YAMAMOTO Takashi, linux-kernel@vger.kernel.org, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp
Subject: Re: [RFC][RFT] memcg fix cgroup_mutex deadlock when cpuset reclaims memory
Message-Id: <20081210164126.8b3be761.nishimura@mxp.nes.nec.co.jp>
In-Reply-To: <20081210151948.9a83f70a.nishimura@mxp.nes.nec.co.jp>
References: <20081210051947.GH7593@balbir.in.ibm.com> <20081210151948.9a83f70a.nishimura@mxp.nes.nec.co.jp>
Organization: NEC Soft, Ltd.
X-Mailer: Sylpheed 2.4.8 (GTK+ 2.10.14; i686-pc-mingw32)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 11566
Lines: 237

On Wed, 10 Dec 2008 15:19:48 +0900, Daisuke Nishimura wrote:
> On Wed, 10 Dec 2008 10:49:47 +0530, Balbir Singh wrote:
> > Hi,
> >
> > Here is a proposed fix for the memory controller cgroup_mutex deadlock
> > reported. It is lightly tested and reviewed. I need help with review
> > and test. Is the reported deadlock reproducible after this patch? A
> > careful review of the cpuset impact will also be highly appreciated.
> >
> > From: Balbir Singh
> >
> > cpuset_migrate_mm() holds cgroup_mutex throughout the duration of
> > do_migrate_pages(). The issue with that is that
> >
> > 1. It can lead to deadlock with memcg, as do_migrate_pages()
> >    enters reclaim
> > 2. It can lead to long latencies, preventing users from creating/
> >    destroying other cgroups anywhere else
> >
> > The patch holds callback_mutex through the duration of cpuset_migrate_mm() and
> > gives up cgroup_mutex while doing so.
> >
> I agree changing cpuset_migrate_mm not to hold cgroup_mutex to fix the dead lock
> is one choice, and it looks good to me at the first impression.
>
> But I'm not sure it's good to change cpuset(other subsystem) code because of memcg.
>
> Anyway, I'll test this patch and report the result tomorrow.
> (Sorry, I don't have enough time today.)
>
Unfortunately, this patch doesn't seem to be enough.

It can fix the deadlock caused by the "circular lock of cgroup_mutex",
but it cannot fix the one caused by the "race between page reclaim and
cpuset_attach (mpol_rebind_mm)".

(The deadlock I fixed in
memcg-avoid-dead-lock-caused-by-race-between-oom-and-cpuset_attach.patch
was caused by the "race between memcg's oom and mpol_rebind_mm", and was
independent of hierarchy.)

I attach the logs I got while testing this patch.


Thanks,
Daisuke Nishimura.

===
INFO: task automount:23438 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
automount D ffff88010ae963c0 0 23438 1
 ffff8803ab8f9300 0000000000000046 0000000000000000 0000000000000000
 ffff88010fb72600 ffff8803ab8f9670 0000000d00000000 0000000100026d3a
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] cgroup_show_options+0x20/0xa3
 [] mutex_lock_nested+0x188/0x2b2
 [] cgroup_show_options+0x20/0xa3
 [] mntput_no_expire+0x1e/0x139
 [] seq_escape+0x3a/0xb8
 [] cgroup_show_options+0x20/0xa3
 [] show_vfsmnt+0xd7/0xf5
 [] seq_read+0x20c/0x2e5
 [] vfs_read+0xaa/0x133
 [] fget_light+0x49/0xe1
 [] sys_read+0x45/0x6e
 [] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task automount:24873 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
automount D ffff88010ae963c0 0 24873 1
 ffff8803ab8fcc00 0000000000000046 0000000000000000 0000000000000000
 ffff8803afbe4c00 ffff8803ab8fcf70 0000000f00000000 0000000100029028
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] cgroup_show_options+0x20/0xa3
 [] mutex_lock_nested+0x188/0x2b2
 [] cgroup_show_options+0x20/0xa3
 [] mntput_no_expire+0x1e/0x139
 [] seq_escape+0x3a/0xb8
 [] cgroup_show_options+0x20/0xa3
 [] show_vfsmnt+0xd7/0xf5
 [] seq_read+0x20c/0x2e5
 [] vfs_read+0xaa/0x133
 [] fget_light+0x49/0xe1
 [] sys_read+0x45/0x6e
 [] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task mmapstress10:21307 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mmapstress10 D ffff88010acb84c0 0 21307 14494
 ffff88010ad8df00 0000000000000046 0000000000000000 0000000000000000
 ffff88010fada600 ffff88010ad8e270 0000000700000000 000000010002983e
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] mem_cgroup_get_first_node+0x29/0x8a
 [] mutex_lock_nested+0x188/0x2b2
 [] mem_cgroup_get_first_node+0x29/0x8a
 [] mem_cgroup_get_first_node+0x29/0x8a
 [] mem_cgroup_hierarchical_reclaim+0x4c/0xc6
 [] __mem_cgroup_try_charge+0x151/0x1d1
 [] __mem_cgroup_try_charge+0x40/0x1d1
 [] mem_cgroup_charge_common+0x46/0x72
 [] do_wp_page+0x45a/0x646
 [] handle_mm_fault+0x6a8/0x737
 [] handle_mm_fault+0x6d9/0x737
 [] do_page_fault+0x3ab/0x753
 [] page_fault+0x1f/0x30
INFO: lockdep is turned off.
INFO: task shmem_test_02:22746 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
shmem_test_02 D ffff88010ac81c80 0 22746 14216
 ffff8800bf1fa600 0000000000000046 0000000000000000 0000000000000000
 ffff88010fa9a600 ffff8800bf1fa970 0000000400000000 000000010002593a
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] mem_cgroup_get_first_node+0x29/0x8a
 [] mutex_lock_nested+0x188/0x2b2
 [] mem_cgroup_get_first_node+0x29/0x8a
 [] mem_cgroup_get_first_node+0x29/0x8a
 [] mem_cgroup_hierarchical_reclaim+0x4c/0xc6
 [] __mem_cgroup_try_charge+0x151/0x1d1
 [] __mem_cgroup_try_charge+0x40/0x1d1
 [] mem_cgroup_charge_common+0x46/0x72
 [] shmem_getpage+0x6ae/0x851
 [] task_rq_lock+0x44/0x78
 [] trace_hardirqs_on_thunk+0x3a/0x3f
 [] trace_hardirqs_on_thunk+0x3a/0x3f
 [] do_IRQ+0x139/0x15d
 [] mnt_want_write+0x6e/0x76
 [] mnt_drop_write+0x25/0xec
 [] shmem_fault+0x3a/0x5f
 [] __do_fault+0x51/0x402
 [] handle_mm_fault+0x1c1/0x737
 [] do_page_fault+0x3ab/0x753
 [] page_fault+0x1f/0x30
INFO: lockdep is turned off.
INFO: task move.sh:22750 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
move.sh D ffff88010b4ab900 0 22750 19661
 ffff8803ab8fdf00 0000000000000046 0000000000000000 0000000000000002
 ffff8803afbe4c00 ffff8803ab8fe270 0000000f00000000 0000000100025624
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] __down_write_nested+0x7e/0x96
 [] down_write+0x64/0x75
 [] mpol_rebind_mm+0x16/0x3f
 [] mpol_rebind_mm+0x16/0x3f
 [] cpuset_attach+0x7d/0xa6
 [] cgroup_attach_task+0x33d/0x397
 [] cgroup_tasks_write+0xfa/0x11e
 [] cgroup_tasks_write+0x39/0x11e
 [] cgroup_file_write+0xed/0x20b
 [] vfs_write+0xad/0x136
 [] sys_write+0x45/0x6e
 [] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task udev_run_devd:22758 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
udev_run_devd D ffff88010eb39c80 0 22758 1
 ffff8801091e8000 0000000000000046 0000000000000000 0000000000000000
 ffff88010fb4cc00 ffff8801091e8370 0000000c00000000 0000000100024ffd
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] cgroup_show_options+0x20/0xa3
 [] mutex_lock_nested+0x188/0x2b2
 [] cgroup_show_options+0x20/0xa3
 [] mntput_no_expire+0x1e/0x139
 [] seq_escape+0x3a/0xb8
 [] cgroup_show_options+0x20/0xa3
 [] show_vfsmnt+0xd7/0xf5
 [] seq_read+0x20c/0x2e5
 [] vfs_read+0xaa/0x133
 [] sys_read+0x45/0x6e
 [] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task ls:22850 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D ffff88010e8897c0 0 22850 8012
 ffff8801091edf00 0000000000000046 0000000000000000 0000000000000000
 ffff88010fb22600 ffff8801091ee270 0000000a00000000 0000000100025a97
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] cgroup_show_options+0x20/0xa3
 [] mutex_lock_nested+0x188/0x2b2
 [] cgroup_show_options+0x20/0xa3
 [] mntput_no_expire+0x1e/0x139
 [] seq_escape+0x3a/0xb8
 [] cgroup_show_options+0x20/0xa3
 [] show_vfsmnt+0xd7/0xf5
 [] seq_read+0x20c/0x2e5
 [] vfs_read+0xaa/0x133
 [] sys_read+0x45/0x6e
 [] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task multipath:27599 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
multipath D ffff88010b4af200 0 27599 1
 ffff8800ba971300 0000000000000046 0000000000000000 0000000000000000
 ffff8803afbe4c00 ffff8800ba971670 0000000f00000000 0000000100051678
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] cgroup_show_options+0x20/0xa3
 [] mutex_lock_nested+0x188/0x2b2
 [] cgroup_show_options+0x20/0xa3
 [] mntput_no_expire+0x1e/0x139
 [] seq_escape+0x3a/0xb8
 [] cgroup_show_options+0x20/0xa3
 [] show_vfsmnt+0xd7/0xf5
 [] seq_read+0x20c/0x2e5
 [] vfs_read+0xaa/0x133
 [] sys_read+0x45/0x6e
 [] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
INFO: task udev_run_hotplu:27601 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
udev_run_hotp D ffff88010e476880 0 27601 1
 ffff8801091ea600 0000000000000046 0000000000000000 0000000000000000
 ffff88010fb48000 ffff8801091ea970 0000000b00000000 0000000100051867
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] cgroup_show_options+0x20/0xa3
 [] mutex_lock_nested+0x188/0x2b2
 [] cgroup_show_options+0x20/0xa3
 [] mntput_no_expire+0x1e/0x139
 [] seq_escape+0x3a/0xb8
 [] cgroup_show_options+0x20/0xa3
 [] show_vfsmnt+0xd7/0xf5
 [] seq_read+0x20c/0x2e5
 [] vfs_read+0xaa/0x133
 [] sys_read+0x45/0x6e
 [] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
===
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
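
As a reading aid for the hung-task traces above, here is a rough sketch of
the wait-for chain they seem to describe. It is only a model, not kernel
code. The "blocked on" entries follow the top frames of each trace, but the
"held by" entries are my assumptions from the call paths (lockdep was turned
off, so the log does not say who owns which lock): move.sh is assumed to
hold cgroup_mutex on the cgroup_tasks_write() path, and mmapstress10 is
assumed to hold mmap_sem for read across its page fault. The helper name
find_cycle is made up for this illustration.

```python
from typing import Dict, List, Optional

# task -> lock it is blocked on (read from the top frames of each trace;
# that the mutex in mem_cgroup_get_first_node() is cgroup_mutex is assumed)
waits_for: Dict[str, str] = {
    "move.sh": "mmap_sem",           # down_write() in mpol_rebind_mm()
    "mmapstress10": "cgroup_mutex",  # mutex_lock in mem_cgroup_get_first_node()
    "automount": "cgroup_mutex",     # mutex_lock in cgroup_show_options()
}

# lock -> task ASSUMED to hold it (not printed in the log)
held_by: Dict[str, str] = {
    "cgroup_mutex": "move.sh",       # assumed: taken before cgroup_attach_task()
    "mmap_sem": "mmapstress10",      # assumed: read-held during the page fault
}


def find_cycle(start: str) -> Optional[List[str]]:
    """Follow task -> wanted lock -> holder until a task repeats or the chain ends."""
    path: List[str] = []
    task = start
    while task not in path:
        path.append(task)
        lock = waits_for.get(task)
        holder = held_by.get(lock) if lock is not None else None
        if holder is None:
            return None  # chain ends, so no cycle is reachable from start
        task = holder
    # Only the tasks from the first repeated one onward form the cycle.
    return path[path.index(task):]


if __name__ == "__main__":
    print(find_cycle("move.sh"))
    print(find_cycle("automount"))
```

Under these assumptions, move.sh and mmapstress10 wait on each other, while
automount (and the other readers stuck in cgroup_show_options) is blocked
behind that cycle without being part of it.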