I'm not a cpuacct expert. Please give me comments.
====================
Subject: [PATCH] cpuacct: don't let VIRT_CPU_ACCOUNTING prevent percpu cputime caching

Impact: small performance improvement

cpuacct_update_stats() is called on every tick update, and it uses a
percpu_counter to avoid performance degradation.

Unfortunately, it doesn't work properly when VIRT_CPU_ACCOUNTING=y: in that
configuration every tick updates the counter by more than 1000 cputime units,
so every percpu_counter_add() exceeds the batch threshold, grabs the spinlock,
and updates the shared (non-percpu) variable.

This patch changes the batch rule: each CPU can now cache up to
percpu_counter_batch jiffies' worth of cputime in its percpu counter.
This means the patch has no behavior change when VIRT_CPU_ACCOUNTING=n, but
also works well when VIRT_CPU_ACCOUNTING=y.
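For reference, the add path this patch tunes looks roughly like the following.
This is a simplified sketch paraphrased from memory of lib/percpu_counter.c of
this era, not the exact source:

	/* Sketch: batched percpu counter add (paraphrased). */
	void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
	{
		s64 count;
		s32 *pcount;
		int cpu = get_cpu();

		pcount = per_cpu_ptr(fbc->counters, cpu);
		count = *pcount + amount;
		if (count >= batch || count <= -batch) {
			/* slow path: take the shared lock, fold into fbc->count */
			spin_lock(&fbc->lock);
			fbc->count += count;
			*pcount = 0;
			spin_unlock(&fbc->lock);
		} else {
			/* fast path: stay in the percpu cache, lock-free */
			*pcount = count;
		}
		put_cpu();
	}

With VIRT_CPU_ACCOUNTING=y the per-tick amount already exceeds the default
batch, so the slow path runs on every tick; scaling the batch by
jiffies_to_cputime() restores the fast path.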
Cc: Bharata B Rao <[email protected]>
Cc: Balaji Rao <[email protected]>
Cc: Dhaval Giani <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
kernel/sched.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
Index: b/kernel/sched.c
===================================================================
--- a/kernel/sched.c 2009-04-28 14:18:36.000000000 +0900
+++ b/kernel/sched.c 2009-04-28 15:18:07.000000000 +0900
@@ -10117,6 +10117,7 @@ struct cpuacct {
};
struct cgroup_subsys cpuacct_subsys;
+static s32 cpuacct_batch;
/* return cpu accounting group corresponding to this container */
static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
@@ -10146,6 +10147,9 @@ static struct cgroup_subsys_state *cpuac
if (!ca->cpuusage)
goto out_free_ca;
+ if (!cpuacct_batch)
+ cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+
for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
if (percpu_counter_init(&ca->cpustat[i], 0))
goto out_free_counters;
@@ -10342,7 +10346,7 @@ static void cpuacct_update_stats(struct
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
ca = ca->parent;
} while (ca);
rcu_read_unlock();
On Tue, Apr 28, 2009 at 03:53:32PM +0900, KOSAKI Motohiro wrote:
>
> I'm not a cpuacct expert. Please give me comments.
>
> ====================
> Subject: [PATCH] cpuacct: don't let VIRT_CPU_ACCOUNTING prevent percpu cputime caching
>
> Impact: small performance improvement
>
> cpuacct_update_stats() is called on every tick update, and it uses a
> percpu_counter to avoid performance degradation.
>
> Unfortunately, it doesn't work properly when VIRT_CPU_ACCOUNTING=y: in that
> configuration every tick updates the counter by more than 1000 cputime units,
> so every percpu_counter_add() exceeds the batch threshold, grabs the spinlock,
> and updates the shared (non-percpu) variable.
>
> This patch changes the batch rule: each CPU can now cache up to
> percpu_counter_batch jiffies' worth of cputime in its percpu counter.
> This means the patch has no behavior change when VIRT_CPU_ACCOUNTING=n, but
> also works well when VIRT_CPU_ACCOUNTING=y.
Let me try to understand what you are saying...
For archs which define VIRT_CPU_ACCOUNTING, every tick results in around
1000 units of cputime updates, and since this is much greater than
percpu_counter_batch, we end up taking the spinlock on every tick.
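(To make that arithmetic concrete, here is a toy userspace simulation; the
per-tick value of 1000 and the batch sizes are illustrative assumptions, not
measurements:)

	#include <stdio.h>

	/* Toy model: count how often the percpu cache overflows into the
	 * locked slow path for a given per-tick increment and batch. */
	static int simulate(long per_tick, long batch, int ticks)
	{
		long cache = 0;
		int slow_hits = 0;
		int i;

		for (i = 0; i < ticks; i++) {
			cache += per_tick;
			if (cache >= batch || cache <= -batch) {
				slow_hits++;	/* would take fbc->lock here */
				cache = 0;
			}
		}
		return slow_hits;
	}

	int main(void)
	{
		/* batch = 32: every tick of ~1000 units hits the slow path */
		printf("batch=32:    %d/1000 ticks take the lock\n",
		       simulate(1000, 32, 1000));
		/* batch = 32 jiffies' worth (~32000 units): ~1 in 32 ticks */
		printf("batch=32000: %d/1000 ticks take the lock\n",
		       simulate(1000, 32000, 1000));
		return 0;
	}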
If my above reading of the problem is correct, please look at my below comments.
> Index: b/kernel/sched.c
> ===================================================================
> --- a/kernel/sched.c 2009-04-28 14:18:36.000000000 +0900
> +++ b/kernel/sched.c 2009-04-28 15:18:07.000000000 +0900
> @@ -10117,6 +10117,7 @@ struct cpuacct {
> };
>
> struct cgroup_subsys cpuacct_subsys;
> +static s32 cpuacct_batch;
>
> /* return cpu accounting group corresponding to this container */
> static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
> @@ -10146,6 +10147,9 @@ static struct cgroup_subsys_state *cpuac
> if (!ca->cpuusage)
> goto out_free_ca;
>
> + if (!cpuacct_batch)
> + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
You essentially end up increasing the batch value from the default value
of max(32, nr_cpus*2).
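(For reference, that default comes from lib/percpu_counter.c's setup code;
sketched here from memory of this period's source, so treat it as a
paraphrase:)

	/* Sketch: the default batch scales with online CPUs, floor of 32. */
	static void compute_batch_value(void)
	{
		int nr = num_online_cpus();

		percpu_counter_batch = max(32, nr * 2);
	}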
> +
> for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
> if (percpu_counter_init(&ca->cpustat[i], 0))
> goto out_free_counters;
> @@ -10342,7 +10346,7 @@ static void cpuacct_update_stats(struct
> ca = task_ca(tsk);
>
> do {
> - percpu_counter_add(&ca->cpustat[idx], val);
> + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
And you do this unconditionally, which will affect all archs? So you make
this behaviour the default even for archs which have VIRT_CPU_ACCOUNTING=n.
BTW, did you observe any real problem with the percpu counter spinlock ?
Regards,
Bharata.
> On Tue, Apr 28, 2009 at 03:53:32PM +0900, KOSAKI Motohiro wrote:
> >
> > I'm not a cpuacct expert. Please give me comments.
> >
> > ====================
> > Subject: [PATCH] cpuacct: don't let VIRT_CPU_ACCOUNTING prevent percpu cputime caching
> >
> > Impact: small performance improvement
> >
> > cpuacct_update_stats() is called on every tick update, and it uses a
> > percpu_counter to avoid performance degradation.
> >
> > Unfortunately, it doesn't work properly when VIRT_CPU_ACCOUNTING=y: in that
> > configuration every tick updates the counter by more than 1000 cputime units,
> > so every percpu_counter_add() exceeds the batch threshold, grabs the spinlock,
> > and updates the shared (non-percpu) variable.
> >
> > This patch changes the batch rule: each CPU can now cache up to
> > percpu_counter_batch jiffies' worth of cputime in its percpu counter.
> > This means the patch has no behavior change when VIRT_CPU_ACCOUNTING=n, but
> > also works well when VIRT_CPU_ACCOUNTING=y.
>
> Let me try to understand what you are saying...
>
> For archs which define VIRT_CPU_ACCOUNTING, every tick results in around
> 1000 units of cputime updates, and since this is much greater than
> percpu_counter_batch, we end up taking the spinlock on every tick.
Correct.
>
> If my above reading of the problem is correct, please look at my below comments.
>
> > Index: b/kernel/sched.c
> > ===================================================================
> > --- a/kernel/sched.c 2009-04-28 14:18:36.000000000 +0900
> > +++ b/kernel/sched.c 2009-04-28 15:18:07.000000000 +0900
> > @@ -10117,6 +10117,7 @@ struct cpuacct {
> > };
> >
> > struct cgroup_subsys cpuacct_subsys;
> > +static s32 cpuacct_batch;
> >
> > /* return cpu accounting group corresponding to this container */
> > static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
> > @@ -10146,6 +10147,9 @@ static struct cgroup_subsys_state *cpuac
> > if (!ca->cpuusage)
> > goto out_free_ca;
> >
> > + if (!cpuacct_batch)
> > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
>
> You essentially end up increasing the batch value from the default value
> of max(32, nr_cpus*2).
>
> > +
> > for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
> > if (percpu_counter_init(&ca->cpustat[i], 0))
> > goto out_free_counters;
> > @@ -10342,7 +10346,7 @@ static void cpuacct_update_stats(struct
> > ca = task_ca(tsk);
> >
> > do {
> > - percpu_counter_add(&ca->cpustat[idx], val);
> > + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
>
> And you do this unconditionally, which will affect all archs? So you make
> this behaviour the default even for archs which have VIRT_CPU_ACCOUNTING=n.
If VIRT_CPU_ACCOUNTING=n, jiffies_to_cputime(n) returns n; IOW,
jiffies_to_cputime() doesn't change the value. And percpu_counter_add()
uses percpu_counter_batch as its batch anyway.
So when VIRT_CPU_ACCOUNTING=n, this patch doesn't change behavior.
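(The relevant definitions, quoted from memory of this era's headers, so
double-check them against your tree:)

	/* include/asm-generic/cputime.h (VIRT_CPU_ACCOUNTING=n): cputime
	 * is just jiffies, so the conversion is the identity. */
	#define jiffies_to_cputime(__hz)	(__hz)

	/* include/linux/percpu_counter.h: the plain add is the batched
	 * add with the default batch. */
	static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
	{
		__percpu_counter_add(fbc, amount, percpu_counter_batch);
	}

So with VIRT_CPU_ACCOUNTING=n, cpuacct_batch equals percpu_counter_batch and
the patched call is equivalent to the old one.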
> BTW, did you observe any real problem with the percpu counter spinlock ?
No.
I reviewed percpu_counter() callers recently, and this usage seemed strange.
On Tue, Apr 28, 2009 at 04:38:51PM +0900, KOSAKI Motohiro wrote:
> > On Tue, Apr 28, 2009 at 03:53:32PM +0900, KOSAKI Motohiro wrote:
>
> > BTW, did you observe any real problem with the percpu counter spinlock ?
>
> No.
> I reviewed percpu_counter() callers recently, and this usage seemed strange.
>
I should have phrased the question better ...
So have you found any performance degradation with any benchmarks/workloads
on archs which define VIRT_CPU_ACCOUNTING due to percpu_counter spinlock
being taken on every tick ? If the answer is no, don't you think we could
wait before making the kind of change you are proposing ?
Regards,
Bharata.
> On Tue, Apr 28, 2009 at 04:38:51PM +0900, KOSAKI Motohiro wrote:
> > > On Tue, Apr 28, 2009 at 03:53:32PM +0900, KOSAKI Motohiro wrote:
> >
> > > BTW, did you observe any real problem with the percpu counter spinlock ?
> >
> > No.
> > I reviewed percpu_counter() callers recently, and this usage seemed strange.
> >
>
> I should have phrased the question better ...
>
> So have you found any performance degradation with any benchmarks/workloads
> on archs which define VIRT_CPU_ACCOUNTING due to percpu_counter spinlock
> being taken on every tick ? If the answer is no, don't you think we could
> wait before making the kind of change you are proposing ?
Maybe I don't understand your point.
I have no objection if you want to wait :)
On Tue, Apr 28, 2009 at 05:10:47PM +0900, KOSAKI Motohiro wrote:
> > On Tue, Apr 28, 2009 at 04:38:51PM +0900, KOSAKI Motohiro wrote:
> > > > On Tue, Apr 28, 2009 at 03:53:32PM +0900, KOSAKI Motohiro wrote:
> > >
> > > > BTW, did you observe any real problem with the percpu counter spinlock ?
> > >
> > > No.
> > > I reviewed percpu_counter() callers recently, and this usage seemed strange.
> > >
> >
> > I should have phrased the question better ...
> >
> > So have you found any performance degradation with any benchmarks/workloads
> > on archs which define VIRT_CPU_ACCOUNTING due to percpu_counter spinlock
> > being taken on every tick ? If the answer is no, don't you think we could
> > wait before making the kind of change you are proposing ?
>
> Maybe I don't understand your point.
My point is, let us not make this change if it is not a real problem that
has been observed on archs which define VIRT_CPU_ACCOUNTING.
Regards,
Bharata.
* KOSAKI Motohiro <[email protected]> [2009-04-28 15:53:32]:
>
> I'm not a cpuacct expert. Please give me comments.
>
> ====================
> Subject: [PATCH] cpuacct: don't let VIRT_CPU_ACCOUNTING prevent percpu cputime caching
>
> Impact: small performance improvement
>
> cpuacct_update_stats() is called on every tick update, and it uses a
> percpu_counter to avoid performance degradation.
>
> Unfortunately, it doesn't work properly when VIRT_CPU_ACCOUNTING=y: in that
> configuration every tick updates the counter by more than 1000 cputime units,
> so every percpu_counter_add() exceeds the batch threshold, grabs the spinlock,
> and updates the shared (non-percpu) variable.
>
> This patch changes the batch rule: each CPU can now cache up to
> percpu_counter_batch jiffies' worth of cputime in its percpu counter.
> This means the patch has no behavior change when VIRT_CPU_ACCOUNTING=n, but
> also works well when VIRT_CPU_ACCOUNTING=y.
>
>
> Cc: Bharata B Rao <[email protected]>
> Cc: Balaji Rao <[email protected]>
> Cc: Dhaval Giani <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Martin Schwidefsky <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> kernel/sched.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> Index: b/kernel/sched.c
> ===================================================================
> --- a/kernel/sched.c 2009-04-28 14:18:36.000000000 +0900
> +++ b/kernel/sched.c 2009-04-28 15:18:07.000000000 +0900
> @@ -10117,6 +10117,7 @@ struct cpuacct {
> };
>
> struct cgroup_subsys cpuacct_subsys;
> +static s32 cpuacct_batch;
>
> /* return cpu accounting group corresponding to this container */
> static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
> @@ -10146,6 +10147,9 @@ static struct cgroup_subsys_state *cpuac
> if (!ca->cpuusage)
> goto out_free_ca;
>
> + if (!cpuacct_batch)
> + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> +
I expect cpuacct_batch to be a large number
> for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
> if (percpu_counter_init(&ca->cpustat[i], 0))
> goto out_free_counters;
> @@ -10342,7 +10346,7 @@ static void cpuacct_update_stats(struct
> ca = task_ca(tsk);
>
> do {
> - percpu_counter_add(&ca->cpustat[idx], val);
> + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
This will make the end result stray far from the real value due to the
large per-cpu batch value. If we are going to go this route, we should
probably consider using __percpu_counter_sum so that the batched value
does not show data that is way off.
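(A rough worst-case bound on that read-side drift, as a runnable toy; the
CPU count and HZ below are illustrative assumptions, not measurements:)

	#include <stdio.h>

	int main(void)
	{
		int nr_cpus = 16;
		long batch_jiffies = 32;	/* max(32, nr_cpus * 2) */
		long ms_per_jiffy = 1;		/* HZ = 1000 */

		/* Each CPU may cache up to one batch of unflushed cputime,
		 * so percpu_counter_read() can lag the true sum by up to: */
		long max_drift_ms = nr_cpus * batch_jiffies * ms_per_jiffy;

		printf("worst-case read drift: ~%ld ms\n", max_drift_ms);
		return 0;
	}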
> ca = ca->parent;
> } while (ca);
> rcu_read_unlock();
>
>
--
Balbir
Hi
Thanks for reviewing.
> > /* return cpu accounting group corresponding to this container */
> > static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
> > @@ -10146,6 +10147,9 @@ static struct cgroup_subsys_state *cpuac
> > if (!ca->cpuusage)
> > goto out_free_ca;
> >
> > + if (!cpuacct_batch)
> > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> > +
>
> I expect cpuacct_batch to be a large number
Yes.
>
> > for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
> > if (percpu_counter_init(&ca->cpustat[i], 0))
> > goto out_free_counters;
> > @@ -10342,7 +10346,7 @@ static void cpuacct_update_stats(struct
> > ca = task_ca(tsk);
> >
> > do {
> > - percpu_counter_add(&ca->cpustat[idx], val);
> > + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
>
> This will make the end result stray far from the real value due to the
> large per-cpu batch value. If we are going to go this route, we should
> probably consider using __percpu_counter_sum so that the batched value
> does not show data that is way off.
No problem.
End users don't see the raw cputime; they see the converted time.
cpuacct_stats_show() uses cputime64_to_clock_t(), which reports in USER_HZ
(typically 100) units, so values below 10 msec are not displayed.
--------------------------------------------------------
static int cpuacct_stats_show(struct cgroup *cgrp, struct cftype *cft,
		struct cgroup_map_cb *cb)
{
	struct cpuacct *ca = cgroup_ca(cgrp);
	int i;

	for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
		s64 val = percpu_counter_read(&ca->cpustat[i]);
		val = cputime64_to_clock_t(val);
		cb->fill(cb, cpuacct_stat_desc[i], val);
	}
	return 0;
}
--------------------------------------------------------
> > > > > BTW, did you observe any real problem with the percpu counter spinlock ?
> > > >
> > > > No.
> > > > I reviewed percpu_counter() callers recently, and this usage seemed strange.
> > > >
> > >
> > > I should have phrased the question better ...
> > >
> > > So have you found any performance degradation with any benchmarks/workloads
> > > on archs which define VIRT_CPU_ACCOUNTING due to percpu_counter spinlock
> > > being taken on every tick ? If the answer is no, don't you think we could
> > > wait before making the kind of change you are proposing ?
> >
> > Maybe I don't understand your point.
>
> My point is, let us not make this change if it is not a real problem that
> has been observed on archs which define VIRT_CPU_ACCOUNTING.
That's a nice joke, but it's not constructive.
Still, another idea or another patch is always welcome.
On Wed, Apr 29, 2009 at 03:38:54AM +0530, Balbir Singh wrote:
> * KOSAKI Motohiro <[email protected]> [2009-04-28 15:53:32]:
> >
> > do {
> > - percpu_counter_add(&ca->cpustat[idx], val);
> > + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
>
> This will make the end result stray far from the real value due to the
> large per-cpu batch value. If we are going to go this route, we should
> probably consider using __percpu_counter_sum so that the batched value
> does not show data that is way off.
But __percpu_counter_sum() takes the fbc->lock spinlock, and that is the
cause of the problem Kosaki mentions, I believe.
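(For reference, a sketch of that sum path, paraphrased from memory of
lib/percpu_counter.c, showing the lock in question:)

	/* Sketch: the accurate sum folds every CPU's cached delta
	 * under fbc->lock. */
	s64 __percpu_counter_sum(struct percpu_counter *fbc)
	{
		s64 ret;
		int cpu;

		spin_lock(&fbc->lock);
		ret = fbc->count;
		for_each_online_cpu(cpu) {
			s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
			ret += *pcount;
		}
		spin_unlock(&fbc->lock);
		return ret;
	}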
Regards,
Bharata.
On Wed, Apr 29, 2009 at 11:31:14AM +0900, KOSAKI Motohiro wrote:
> > > > > > BTW, did you observe any real problem with the percpu counter spinlock ?
> > > > >
> > > > > No.
> > > > > I reviewed percpu_counter() callers recently, and this usage seemed strange.
> > > > >
> > > >
> > > > I should have phrased the question better ...
> > > >
> > > > So have you found any performance degradation with any benchmarks/workloads
> > > > on archs which define VIRT_CPU_ACCOUNTING due to percpu_counter spinlock
> > > > being taken on every tick ? If the answer is no, don't you think we could
> > > > wait before making the kind of change you are proposing ?
> > >
> > > Maybe I don't understand your point.
> >
> > My point is, let us not make this change if it is not a real problem that
> > has been observed on archs which define VIRT_CPU_ACCOUNTING.
>
> That's a nice joke, but it's not constructive.
I was only asking whether you have seen any real-life problem with this,
and you said no. In that context, re-reading my above point, I think I
would need a pretty good sense of humour to consider it a joke :)
Regards,
Bharata.
* KOSAKI Motohiro <[email protected]> [2009-04-29 11:26:30]:
> Hi
>
> Thanks for reviewing.
>
> > > /* return cpu accounting group corresponding to this container */
> > > static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
> > > @@ -10146,6 +10147,9 @@ static struct cgroup_subsys_state *cpuac
> > > if (!ca->cpuusage)
> > > goto out_free_ca;
> > >
> > > + if (!cpuacct_batch)
> > > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> > > +
> >
> > I expect cpuacct_batch to be a large number
>
> Yes.
>
>
> >
> > > for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
> > > if (percpu_counter_init(&ca->cpustat[i], 0))
> > > goto out_free_counters;
> > > @@ -10342,7 +10346,7 @@ static void cpuacct_update_stats(struct
> > > ca = task_ca(tsk);
> > >
> > > do {
> > > - percpu_counter_add(&ca->cpustat[idx], val);
> > > + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
> >
> > This will make the end result stray far from the real value due to the
> > large per-cpu batch value. If we are going to go this route, we should
> > probably consider using __percpu_counter_sum so that the batched value
> > does not show data that is way off.
>
> No problem.
>
> End users don't see the raw cputime; they see the converted time.
> cpuacct_stats_show() uses cputime64_to_clock_t(), which reports in USER_HZ
> (typically 100) units, so values below 10 msec are not displayed.
>
Yes, I know, I reviewed Bharata's patch and suggested converting to
clock_t for consistency with other metrics.
> --------------------------------------------------------
> static int cpuacct_stats_show(struct cgroup *cgrp, struct cftype *cft,
> struct cgroup_map_cb *cb)
> {
> struct cpuacct *ca = cgroup_ca(cgrp);
> int i;
>
> for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
> s64 val = percpu_counter_read(&ca->cpustat[i]);
My point is, this should probably be percpu_counter_sum(), but that can be
expensive, and we were willing to tolerate some inaccuracy due to the batch
value. I think your patch adds to that inaccuracy even more, even though it
fixes a genuine problem.
> val = cputime64_to_clock_t(val);
> cb->fill(cb, cpuacct_stat_desc[i], val);
> }
> return 0;
> }
> --------------------------------------------------------
>
>
>
>
>
--
Balbir
* Bharata B Rao <[email protected]> [2009-04-29 08:51:46]:
> On Wed, Apr 29, 2009 at 03:38:54AM +0530, Balbir Singh wrote:
> > * KOSAKI Motohiro <[email protected]> [2009-04-28 15:53:32]:
> > >
> > > do {
> > > - percpu_counter_add(&ca->cpustat[idx], val);
> > > + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
> >
> > This will make the end result stray far from the real value due to the
> > large per-cpu batch value. If we are going to go this route, we should
> > probably consider using __percpu_counter_sum so that the batched value
> > does not show data that is way off.
>
> But __percpu_counter_sum takes fbc->lock spinlock and that is the cause of
> the problem Kosaki mentions I believe.
But this is on the user-space read side (show stats); I thought Kosaki's
problem was with percpu_counter_add() called from update stats.
--
Balbir
> > > This will make the end result stray far from the real value due to the
> > > large per-cpu batch value. If we are going to go this route, we should
> > > probably consider using __percpu_counter_sum so that the batched value
> > > does not show data that is way off.
> >
> > No problem.
> >
> > End users don't see the raw cputime; they see the converted time.
> > cpuacct_stats_show() uses cputime64_to_clock_t(), which reports in USER_HZ
> > (typically 100) units, so values below 10 msec are not displayed.
> >
>
> Yes, I know, I reviewed Bharata's patch and suggested converting to
> clock_t for consistency with other metrics.
Oh, sorry.
I didn't know this.
> > --------------------------------------------------------
> > static int cpuacct_stats_show(struct cgroup *cgrp, struct cftype *cft,
> > struct cgroup_map_cb *cb)
> > {
> > struct cpuacct *ca = cgroup_ca(cgrp);
> > int i;
> >
> > for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
> > s64 val = percpu_counter_read(&ca->cpustat[i]);
>
> My point is, this should probably be percpu_counter_sum(), but that can be
> expensive, and we were willing to tolerate some inaccuracy due to the batch
> value. I think your patch adds to that inaccuracy even more, even though it
> fixes a genuine problem.
It's not expensive: cpuacct_stats_show() is only called when reading the
stat file, which is definitely a slow path, so we could use
percpu_counter_sum(). However, I doubt it's worth it.

Before my patch:
	VIRT_CPU_ACCOUNTING=y: accurate but slow
	VIRT_CPU_ACCOUNTING=n: inaccurate by a few ticks, but fast

With my patch:
	VIRT_CPU_ACCOUNTING=y: inaccurate by a few ticks, but fast
	VIRT_CPU_ACCOUNTING=n: inaccurate by a few ticks, but fast

If my inaccuracy is unacceptable, then the current code is also broken when
VIRT_CPU_ACCOUNTING=n. I only make the accuracy consistent between
VIRT_CPU_ACCOUNTING=y and =n.

Thoughts?

If you still think percpu_counter_sum() is better, I can do that.
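(If we did go that way, the read-side change would be a one-liner along
these lines; a sketch only, against cpuacct_stats_show() as quoted above:)

-	s64 val = percpu_counter_read(&ca->cpustat[i]);
+	s64 val = percpu_counter_sum(&ca->cpustat[i]);

percpu_counter_sum() pays the fbc->lock cost, but only in this slow
stat-read path rather than on every tick.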
>
>
> > val = cputime64_to_clock_t(val);
> > cb->fill(cb, cpuacct_stat_desc[i], val);
> > }
> > return 0;
> > }
> > --------------------------------------------------------
On Wed, Apr 29, 2009 at 11:34:15AM +0530, Balbir Singh wrote:
> * Bharata B Rao <[email protected]> [2009-04-29 08:51:46]:
>
> > On Wed, Apr 29, 2009 at 03:38:54AM +0530, Balbir Singh wrote:
> > > * KOSAKI Motohiro <[email protected]> [2009-04-28 15:53:32]:
> > > >
> > > > do {
> > > > - percpu_counter_add(&ca->cpustat[idx], val);
> > > > + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
> > >
> > > This will make the end result stray far from the real value due to the
> > > large per-cpu batch value. If we are going to go this route, we should
> > > probably consider using __percpu_counter_sum so that the batched value
> > > does not show data that is way off.
> >
> > But __percpu_counter_sum() takes the fbc->lock spinlock, and that is the
> > cause of the problem Kosaki mentions, I believe.
>
> But this is on the user-space read side (show stats); I thought Kosaki's
> problem was with percpu_counter_add() called from update stats.
Right. Agreed.
Regards,
Bharata.