Date: Tue, 28 Apr 2009 13:01:51 +0530
From: Bharata B Rao <bharata@linux.vnet.ibm.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: LKML <linux-kernel@vger.kernel.org>, Balaji Rao <balajirrao@gmail.com>,
       Dhaval Giani <dhaval@linux.vnet.ibm.com>,
       KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
       Peter Zijlstra <a.p.zijlstra@chello.nl>,
       Balbir Singh <balbir@linux.vnet.ibm.com>, Ingo Molnar <mingo@elte.hu>,
       Martin Schwidefsky <schwidefsky@de.ibm.com>,
       seto.hidetoshi@jp.fujitsu.com
Subject: Re: [PATCH] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu
	cputime count
Message-ID: <20090428073151.GC3825@in.ibm.com>
Reply-To: bharata@linux.vnet.ibm.com
References: <20090428153611.EBC6.A69D9226@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090428153611.EBC6.A69D9226@jp.fujitsu.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2833
Lines: 74

On Tue, Apr 28, 2009 at 03:53:32PM +0900, KOSAKI Motohiro wrote:
> 
> I'm not cpuacct expert. please give me comment.
> 
> ====================
> Subject: [PATCH] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime caching
> 
> impact: little performance improvement
> 
> cpuacct_update_stats() is called at every tick updating. and it use percpu_counter
> for avoiding performance degression.
> 
> Unfortunately, it doesn't works on VIRT_CPU_ACCOUNTING=y environment properly.
> if VIRT_CPU_ACCOUNTING=y, every tick update much than 1000 cputime.
> Thus every percpu_counter_add() makes spinlock grabbing and update non-percpu-variable.
> 
> This patch change the batch rule. now, every cpu can store "percpu_counter_bach x jiffies"
> cputime in percpu cache.
> it mean this patch don't have behavior change if VIRT_CPU_ACCOUNTING=n, but
> works well on VIRT_CPU_ACCOUNTING=y too.

Let me try to understand what you are saying...

For archs which define VIRT_CPU_ACCOUNTING, every tick would result
in around 1000 units of cputime updates and since this is much much greater
than percpu_batch_counter, we end up taking spinlock on every tick.

If my above reading of the problem is correct, please look at my below comments.

> Index: b/kernel/sched.c
> ===================================================================
> --- a/kernel/sched.c	2009-04-28 14:18:36.000000000 +0900
> +++ b/kernel/sched.c	2009-04-28 15:18:07.000000000 +0900
> @@ -10117,6 +10117,7 @@ struct cpuacct {
>  };
> 
>  struct cgroup_subsys cpuacct_subsys;
> +static s32 cpuacct_batch;
> 
>  /* return cpu accounting group corresponding to this container */
>  static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
> @@ -10146,6 +10147,9 @@ static struct cgroup_subsys_state *cpuac
>  	if (!ca->cpuusage)
>  		goto out_free_ca;
> 
> +	if (!cpuacct_batch)
> +		cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);

You essentially end up increasing the batch value from the default value
of max(32, nr_cpus*2). 

> +
>  	for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
>  		if (percpu_counter_init(&ca->cpustat[i], 0))
>  			goto out_free_counters;
> @@ -10342,7 +10346,7 @@ static void cpuacct_update_stats(struct 
>  	ca = task_ca(tsk);
> 
>  	do {
> -		percpu_counter_add(&ca->cpustat[idx], val);
> +		__percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);

And you do this unconditionally which will affect all archs ? So you make
this behaviour default for archs which have VIRT_CPU_ACCOUNTING=n.

BTW, did you observe any real problem with the percpu counter spinlock ?

Regards,
Bharata.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/