Date: Tue, 9 Dec 2008 11:59:43 +0900
From: KAMEZAWA Hiroyuki
To: Daisuke Nishimura
Cc: balbir@linux.vnet.ibm.com, linux-mm@kvack.org, YAMAMOTO Takashi, Paul Menage, lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org, Nick Piggin, David Rientjes, Pavel Emelianov, Dhaval Giani, Andrew Morton
Subject: Re: [mm] [PATCH 3/4] Memory cgroup hierarchical reclaim (v4)
Message-Id: <20081209115943.7d6a0ea3.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20081126111447.106ec275.nishimura@mxp.nes.nec.co.jp>
References: <20081116081034.25166.7586.sendpatchset@balbir-laptop> <20081116081055.25166.85066.sendpatchset@balbir-laptop> <20081125205832.38f8c365.nishimura@mxp.nes.nec.co.jp> <492C1345.9090201@linux.vnet.ibm.com> <20081126111447.106ec275.nishimura@mxp.nes.nec.co.jp>
Organization: FUJITSU Co. LTD.

On Wed, 26 Nov 2008 11:14:47 +0900
Daisuke Nishimura wrote:

> On Tue, 25 Nov 2008 20:31:25 +0530, Balbir Singh wrote:
> > Daisuke Nishimura wrote:
> > > Hi.
> > >
> > > Unfortunately, trying to hold cgroup_mutex at reclaim causes dead lock.
> > >
> > > For example, when attaching a task to some cpuset directory(memory_migrate=on),
> > >
> > >   cgroup_tasks_write (hold cgroup_mutex)
> > >     attach_task_by_pid
> > >       cgroup_attach_task
> > >         cpuset_attach
> > >           cpuset_migrate_mm
> > >             :
> > >             unmap_and_move
> > >               mem_cgroup_prepare_migration
> > >                 mem_cgroup_try_charge
> > >                   mem_cgroup_hierarchical_reclaim
> > >
> >
> > Did lockdep complain about it?
> >
> I haven't understood lockdep so well, but I got logs like this:
>
> ===
> INFO: task move.sh:17710 blocked for more than 480 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> move.sh D ffff88010e1c76c0 0 17710 17597
> ffff8800bd9edf00 0000000000000046 0000000000000000 0000000000000000
> ffff8803afbc0000 ffff8800bd9ee270 0000000e00000000 000000010a54459c
> ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
> Call Trace:
> [] mem_cgroup_get_first_node+0x29/0x8a
> [] mutex_lock_nested+0x180/0x2a2
> [] mem_cgroup_get_first_node+0x29/0x8a
> [] mem_cgroup_get_first_node+0x29/0x8a
> [] __mem_cgroup_try_charge+0x27a/0x2de
> [] mem_cgroup_prepare_migration+0x6c/0xa5
> [] migrate_pages+0x10c/0x4a0
> [] migrate_pages+0x155/0x4a0
> [] new_node_page+0x0/0x2f
> [] check_range+0x300/0x325
> [] do_migrate_pages+0x1a5/0x1f1
> [] cpuset_migrate_mm+0x30/0x93
> [] cpuset_migrate_mm+0x5a/0x93
> [] cpuset_attach+0x93/0xa6
> [] cgroup_attach_task+0x395/0x3e1
> [] cgroup_tasks_write+0xfa/0x11d
> [] cgroup_tasks_write+0x39/0x11d
> [] cgroup_file_write+0xef/0x216
> [] vfs_write+0xad/0x136
> [] sys_write+0x45/0x6e
> [] system_call_fastpath+0x16/0x1b
> INFO: lockdep is turned off.
> ===
>
> And other processes trying to hold cgroup_mutex are also stuck.
>
> > 1. We could probably move away from cgroup_mutex to a memory controller specific
> > mutex.
> > 2. We could give up cgroup_mutex before migrate_mm, since it seems like we'll
> > hold the cgroup lock for long and holding it during reclaim will definitely be
> > visible to users trying to create/delete nodes.
> >
> > I prefer to do (2), I'll look at the code more closely
> >
> I basically agree, but I think we should also consider mpol_rebind_mm.
>
> mpol_rebind_mm, which can be called from cpuset_attach, does down_write(mm->mmap_sem),
> which means down_write(mm->mmap_sem) can be called under cgroup_mutex.
> OTOH, page fault path does down_read(mm->mmap_sem) and can call mem_cgroup_try_charge,
> which means mutex_lock(cgroup_mutex) can be called under down_read(mm->mmap_sem).
>

What's the status of this problem? Is it fixed or not yet?
Sorry for failing to track the patches.

Thanks,
-Kame

> > > I think similar problem can also happen when removing memcg's directory.
> > >
> >
> > Why removing a directory? memcg (now) marks the directory as obsolete and we
> > check for obsolete directories and get/put references.
> >
> I don't think so.
>
>   mem_cgroup_pre_destroy (make mem->obsolete = 1)
>     mem_cgroup_force_empty(mem, **FALSE**)
>       mem_cgroup_force_empty_list
>         mem_cgroup_move_parent
>           __mem_cgroup_try_charge
>
> hmm, but looking more closely, cgroup_call_pre_destroy is called
> outside of cgroup_mutex, so this problem doesn't happen at rmdir probably.
>
>
> Thanks,
> Daisuke Nishimura.
>
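
The lock-order inversion described in the quoted text (cpuset_attach reaching down_write(mm->mmap_sem) under cgroup_mutex, while the page fault path can reach mutex_lock(cgroup_mutex) under down_read(mm->mmap_sem)) can be illustrated with a small user-space model. This is only a sketch, not kernel code and not how lockdep is implemented: the class LockOrderChecker and the two path functions are hypothetical names invented for this illustration, and the model ignores the read/write distinction on mmap_sem. It records which lock is held when another is taken and reports any pair that was taken in both orders.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy checker (hypothetical): records 'outer held while taking inner' edges."""

    def __init__(self):
        self.edges = defaultdict(set)  # lock name -> locks taken while it was held
        self.held = []                 # locks held by the path being recorded

    def acquire(self, name):
        # Every lock already held gets an ordering edge to the new lock.
        for outer in self.held:
            self.edges[outer].add(name)
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)

    def find_inversion(self):
        """Return a sorted (a, b) pair seen in both orders, or None."""
        for a, inner in self.edges.items():
            for b in inner:
                if a in self.edges.get(b, ()):
                    return tuple(sorted((a, b)))
        return None


def cpuset_attach_path(chk):
    # cgroup_tasks_write holds cgroup_mutex; mpol_rebind_mm then takes mmap_sem.
    chk.acquire("cgroup_mutex")
    chk.acquire("mmap_sem")
    chk.release("mmap_sem")
    chk.release("cgroup_mutex")


def page_fault_path(chk):
    # Page fault takes mmap_sem; the charge path (with reclaim holding
    # cgroup_mutex, as in the patch under discussion) then takes cgroup_mutex.
    chk.acquire("mmap_sem")
    chk.acquire("cgroup_mutex")
    chk.release("cgroup_mutex")
    chk.release("mmap_sem")


checker = LockOrderChecker()
cpuset_attach_path(checker)
page_fault_path(checker)
inversion = checker.find_inversion()
print(inversion)  # ('cgroup_mutex', 'mmap_sem')
```

With only one of the two paths recorded the checker reports nothing; the inversion appears only once both orders have been seen, which is why either of the two options quoted above (a memcg-specific mutex, or dropping cgroup_mutex before migration) would have to remove one of the two edges.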