Date: Thu, 3 Feb 2011 00:32:51 +0530
From: Balbir Singh
To: Peter Zijlstra
Cc: eranian@google.com, linux-kernel@vger.kernel.org, mingo@elte.hu,
	paulus@samba.org, davem@davemloft.net, fweisbec@gmail.com,
	perfmon2-devel@lists.sf.net, eranian@gmail.com,
	robert.richter@amd.com, acme@redhat.com, lizf@cn.fujitsu.com,
	Paul Menage
Subject: Re: [PATCH 1/2] perf_events: add cgroup support (v8)
Message-ID: <20110202190251.GB16409@balbir.in.ibm.com>
Reply-To: balbir@linux.vnet.ibm.com
References: <4d384700.2308e30a.70bc.ffffd532@mx.google.com>
	<1295534345.28776.175.camel@laptop>
	<1296646160.26581.315.camel@laptop>
	<20110202115012.GA16409@balbir.in.ibm.com>
	<1296650792.26581.319.camel@laptop>
In-Reply-To: <1296650792.26581.319.camel@laptop>

* Peter Zijlstra [2011-02-02 13:46:32]:

> On Wed, 2011-02-02 at 17:20 +0530, Balbir Singh wrote:
> > * Peter Zijlstra [2011-02-02 12:29:20]:
> >
> > > On Thu, 2011-01-20 at 15:39 +0100, Peter Zijlstra wrote:
> > > > On Thu, 2011-01-20 at 15:30 +0200, Stephane Eranian wrote:
> > > > > @@ -4259,8 +4261,20 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
> > > > >
> > > > >  	/* Reassign the task to the init_css_set. */
> > > > >  	task_lock(tsk);
> > > > > +	/*
> > > > > +	 * we mask interrupts to prevent:
> > > > > +	 * - timer tick to cause event rotation which
> > > > > +	 *   could schedule back in cgroup events after
> > > > > +	 *   they were switched out by perf_cgroup_sched_out()
> > > > > +	 *
> > > > > +	 * - preemption which could schedule back in cgroup events
> > > > > +	 */
> > > > > +	local_irq_save(flags);
> > > > > +	perf_cgroup_sched_out(tsk);
> > > > >  	cg = tsk->cgroups;
> > > > >  	tsk->cgroups = &init_css_set;
> > > > > +	perf_cgroup_sched_in(tsk);
> > > > > +	local_irq_restore(flags);
> > > > >  	task_unlock(tsk);
> > > > >  	if (cg)
> > > > >  		put_css_set_taskexit(cg);
> > > >
> > > > So you too need a callback on cgroup change there.. Li, Paul, any chance
> > > > we can fix this cgroup_subsys::exit callback? The scheduler code needs
> > > > to do funny thing because its in the wrong place as well.
> > >
> > > cgroup guys? Shall I just fix this exit thing since the only user seems
> > > to be the scheduler and now perf for both of which its unfortunate at
> > > best?
> >
> > Are you suggesting that the cgroup_exit on task_exit notification should be
> > pulled out?
>
>
> No, just fixed. The callback as it exists isn't useful and leads to
> hacks like the above.
>

OK

>
> > > Balbir, memcontrol.c uses pre_destroy(), I pose that using this method
> > > is broken per definition since it makes the cgroup empty notification
> > > void.
> > >
> >
> > We use pre_destroy() to reclaim, so that delete/rmdir() will be able
> > to clean up the node/group. I am not sure what you mean by it makes
> > the empty notification void and why pre_destroy() is broken?
>
> A quick look at the code looked like it could return -EBUSY (and other
> errors), in that case the rmdir of the empty cgroup will fail.
>
> Therefore it can happen that after the last task is removed, and we get
> the notification that the cgroup is empty, and we attempt the rmdir we
> will fail.
>
> This again means that all such notification handlers must poll state,
> which is ridiculous.

The failure occurs because someone holds an active reference to the
cgroup structure. In the case of memory, every page_cgroup used to hold
one. The only reason a notification handler would have to poll state is
if:

1. A notification is sent that there are no references and that the
   group can be cleaned up
2. A new reference is acquired before the cleanup

1 and 2 happening together is unlikely.

-- 
	Three Cheers,
	Balbir
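
To make the sequence under discussion concrete, here is a small userspace
model of it. This is only a sketch and not kernel code: struct model_cgroup
and the model_*() helpers are hypothetical names invented for illustration,
and the "reclaimable" argument is an assumed stand-in for however many
references a pre_destroy()-style reclaim pass manages to drop. It shows the
"empty" notification firing when the last task leaves while rmdir still
returns -EBUSY because references remain.

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_cgroup {
	int nr_tasks;	/* tasks still attached to the group */
	int nr_refs;	/* other references, e.g. one per charged page */
	bool removed;	/* set once rmdir has succeeded */
};

/*
 * Detach one task. Returns true when the group has just become empty,
 * which is the point where the "empty" notification would be sent.
 */
static bool model_detach_task(struct model_cgroup *cg)
{
	cg->nr_tasks--;
	return cg->nr_tasks == 0;
}

/*
 * pre_destroy()-like step: drop up to 'reclaimable' references.
 * Returns -EBUSY if any reference survives the reclaim attempt.
 */
static int model_pre_destroy(struct model_cgroup *cg, int reclaimable)
{
	if (reclaimable > cg->nr_refs)
		reclaimable = cg->nr_refs;
	cg->nr_refs -= reclaimable;
	return cg->nr_refs ? -EBUSY : 0;
}

/* rmdir-like step: fails while tasks or references remain. */
static int model_rmdir(struct model_cgroup *cg, int reclaimable)
{
	int ret;

	if (cg->nr_tasks)
		return -EBUSY;
	ret = model_pre_destroy(cg, reclaimable);
	if (ret)
		return ret;
	cg->removed = true;
	return 0;
}
```

In this model a handler that receives the empty notification and calls
model_rmdir() once can get -EBUSY back and has to try again later, which is
the polling Peter objects to. How often that happens in practice depends on
how likely it is that a reference is still held, or newly taken, at that
point.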