Date: Wed, 26 Nov 2008 11:14:47 +0900
From: Daisuke Nishimura
To: balbir@linux.vnet.ibm.com
Cc: linux-mm@kvack.org, YAMAMOTO Takashi, Paul Menage, lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org, Nick Piggin, David Rientjes, Pavel Emelianov, Dhaval Giani, Andrew Morton, KAMEZAWA Hiroyuki, nishimura@mxp.nes.nec.co.jp
Subject: Re: [mm] [PATCH 3/4] Memory cgroup hierarchical reclaim (v4)
Message-Id: <20081126111447.106ec275.nishimura@mxp.nes.nec.co.jp>
In-Reply-To: <492C1345.9090201@linux.vnet.ibm.com>

On Tue, 25 Nov 2008 20:31:25 +0530, Balbir Singh wrote:
> Daisuke Nishimura wrote:
> > Hi.
> >
> > Unfortunately, trying to hold cgroup_mutex at reclaim causes a deadlock.
> >
> > For example, when attaching a task to some cpuset directory (memory_migrate=on),
> >
> >   cgroup_tasks_write (hold cgroup_mutex)
> >     attach_task_by_pid
> >       cgroup_attach_task
> >         cpuset_attach
> >           cpuset_migrate_mm
> >             :
> >             unmap_and_move
> >               mem_cgroup_prepare_migration
> >                 mem_cgroup_try_charge
> >                   mem_cgroup_hierarchical_reclaim
> >
>
> Did lockdep complain about it?
>

I don't understand lockdep very well yet, but I got logs like this:

===
INFO: task move.sh:17710 blocked for more than 480 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
move.sh D ffff88010e1c76c0 0 17710 17597
 ffff8800bd9edf00 0000000000000046 0000000000000000 0000000000000000
 ffff8803afbc0000 ffff8800bd9ee270 0000000e00000000 000000010a54459c
 ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
Call Trace:
 [] mem_cgroup_get_first_node+0x29/0x8a
 [] mutex_lock_nested+0x180/0x2a2
 [] mem_cgroup_get_first_node+0x29/0x8a
 [] mem_cgroup_get_first_node+0x29/0x8a
 [] __mem_cgroup_try_charge+0x27a/0x2de
 [] mem_cgroup_prepare_migration+0x6c/0xa5
 [] migrate_pages+0x10c/0x4a0
 [] migrate_pages+0x155/0x4a0
 [] new_node_page+0x0/0x2f
 [] check_range+0x300/0x325
 [] do_migrate_pages+0x1a5/0x1f1
 [] cpuset_migrate_mm+0x30/0x93
 [] cpuset_migrate_mm+0x5a/0x93
 [] cpuset_attach+0x93/0xa6
 [] cgroup_attach_task+0x395/0x3e1
 [] cgroup_tasks_write+0xfa/0x11d
 [] cgroup_tasks_write+0x39/0x11d
 [] cgroup_file_write+0xef/0x216
 [] vfs_write+0xad/0x136
 [] sys_write+0x45/0x6e
 [] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.
===

Other processes trying to take cgroup_mutex are stuck as well.

> 1. We could probably move away from cgroup_mutex to a memory controller specific
>    mutex.
> 2. We could give up cgroup_mutex before migrate_mm, since it seems like we'll
>    hold the cgroup lock for long and holding it during reclaim will definitely be
>    visible to users trying to create/delete nodes.
>
> I prefer to do (2), I'll look at the code more closely
>

I basically agree, but I think we should also consider mpol_rebind_mm.

mpol_rebind_mm, which can be called from cpuset_attach, does
down_write(mm->mmap_sem), which means down_write(mm->mmap_sem) can be
called under cgroup_mutex.
OTOH, the page fault path does down_read(mm->mmap_sem) and can call
mem_cgroup_try_charge, which means mutex_lock(cgroup_mutex) can be
called under down_read(mm->mmap_sem).
That is, the two paths take the two locks in opposite orders.

> > I think similar problem can also happen when removing memcg's directory.
> >
>
> Why removing a directory? memcg (now) marks the directory as obsolete and we
> check for obsolete directories and get/put references.
>

I don't think so.

  mem_cgroup_pre_destroy (make mem->obsolete = 1)
    mem_cgroup_force_empty(mem, **FALSE**)
      mem_cgroup_force_empty_list
        mem_cgroup_move_parent
          __mem_cgroup_try_charge

Hmm, but looking more closely, cgroup_call_pre_destroy is called outside of
cgroup_mutex, so this problem probably doesn't happen at rmdir.

Thanks,
Daisuke Nishimura.
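
The lock-order inversion discussed in this message (cgroup_mutex then mmap_sem on the cpuset attach path, mmap_sem then cgroup_mutex on the page fault path) can be illustrated with a small userspace sketch. This is not kernel code and not how lockdep is implemented; it is a minimal order checker under stated assumptions. The names lock_id, record_nesting(), reset_orders(), attach_path(), fault_path() and count_inversions() are hypothetical and exist only for this illustration, and the sketch assumes that reclaim takes cgroup_mutex as in the v4 patch under discussion.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers; they only mirror the locks named above. */
enum lock_id { CGROUP_MUTEX, MMAP_SEM, NR_LOCKS };

/* order[a][b] is true once some path has taken lock b while holding lock a. */
static bool order[NR_LOCKS][NR_LOCKS];

/* Forget all recorded orderings. */
void reset_orders(void)
{
    memset(order, 0, sizeof(order));
}

/*
 * Record that `inner` is acquired while `outer` is held.
 * Returns false if the opposite nesting was recorded earlier, which
 * means the two paths could deadlock against each other (ABBA).
 * Nothing is recorded in that case.
 */
bool record_nesting(enum lock_id outer, enum lock_id inner)
{
    if (order[inner][outer])
        return false;
    order[outer][inner] = true;
    return true;
}

/* cpuset_attach -> mpol_rebind_mm: down_write(mmap_sem) under cgroup_mutex. */
bool attach_path(void)
{
    return record_nesting(CGROUP_MUTEX, MMAP_SEM);
}

/* page fault -> mem_cgroup_try_charge: cgroup_mutex under down_read(mmap_sem). */
bool fault_path(void)
{
    return record_nesting(MMAP_SEM, CGROUP_MUTEX);
}

/* Run both paths once and count how many nestings conflict with an earlier one. */
int count_inversions(void)
{
    int inversions = 0;

    reset_orders();
    if (!attach_path())
        inversions++;
    if (!fault_path())
        inversions++;
    return inversions;
}
```

Calling attach_path() and then fault_path() makes the second call return false, which corresponds to the inversion described above. A checker like this only reports conflicting orderings; it does not show that a deadlock actually occurred on a given run.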