Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760181AbZD1Hc2 (ORCPT ); Tue, 28 Apr 2009 03:32:28 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753531AbZD1HcT (ORCPT ); Tue, 28 Apr 2009 03:32:19 -0400 Received: from e31.co.us.ibm.com ([32.97.110.149]:41377 "EHLO e31.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753113AbZD1HcS (ORCPT ); Tue, 28 Apr 2009 03:32:18 -0400 Date: Tue, 28 Apr 2009 13:01:51 +0530 From: Bharata B Rao To: KOSAKI Motohiro Cc: LKML , Balaji Rao , Dhaval Giani , KAMEZAWA Hiroyuki , Peter Zijlstra , Balbir Singh , Ingo Molnar , Martin Schwidefsky , seto.hidetoshi@jp.fujitsu.com Subject: Re: [PATCH] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count Message-ID: <20090428073151.GC3825@in.ibm.com> Reply-To: bharata@linux.vnet.ibm.com References: <20090428153611.EBC6.A69D9226@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090428153611.EBC6.A69D9226@jp.fujitsu.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2833 Lines: 74 On Tue, Apr 28, 2009 at 03:53:32PM +0900, KOSAKI Motohiro wrote: > > I'm not cpuacct expert. please give me comment. > > ==================== > Subject: [PATCH] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime caching > > impact: little performance improvement > > cpuacct_update_stats() is called at every tick updating. and it use percpu_counter > for avoiding performance degression. > > Unfortunately, it doesn't works on VIRT_CPU_ACCOUNTING=y environment properly. > if VIRT_CPU_ACCOUNTING=y, every tick update much than 1000 cputime. > Thus every percpu_counter_add() makes spinlock grabbing and update non-percpu-variable. > > This patch change the batch rule. now, every cpu can store "percpu_counter_bach x jiffies" > cputime in percpu cache. > it mean this patch don't have behavior change if VIRT_CPU_ACCOUNTING=n, but > works well on VIRT_CPU_ACCOUNTING=y too. Let me try to understand what you are saying... For archs which define VIRT_CPU_ACCOUNTING, every tick would result in around 1000 units of cputime updates and since this is much much greater than percpu_batch_counter, we end up taking spinlock on every tick. If my above reading of the problem is correct, please look at my below comments. > Index: b/kernel/sched.c > =================================================================== > --- a/kernel/sched.c 2009-04-28 14:18:36.000000000 +0900 > +++ b/kernel/sched.c 2009-04-28 15:18:07.000000000 +0900 > @@ -10117,6 +10117,7 @@ struct cpuacct { > }; > > struct cgroup_subsys cpuacct_subsys; > +static s32 cpuacct_batch; > > /* return cpu accounting group corresponding to this container */ > static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp) > @@ -10146,6 +10147,9 @@ static struct cgroup_subsys_state *cpuac > if (!ca->cpuusage) > goto out_free_ca; > > + if (!cpuacct_batch) > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch); You essentially end up increasing the batch value from the default value of max(32, nr_cpus*2). > + > for (i = 0; i < CPUACCT_STAT_NSTATS; i++) > if (percpu_counter_init(&ca->cpustat[i], 0)) > goto out_free_counters; > @@ -10342,7 +10346,7 @@ static void cpuacct_update_stats(struct > ca = task_ca(tsk); > > do { > - percpu_counter_add(&ca->cpustat[idx], val); > + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch); And you do this unconditionally which will affect all archs ? So you make this behaviour default for archs which have VIRT_CPU_ACCOUNTING=n. BTW, did you observe any real problem with the percpu counter spinlock ? Regards, Bharata. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/