2013-04-30 00:16:44

by Tim Chen

Subject: [PATCH 1/2] Make the batch size of the percpu_counter configurable

Currently, there is a single global variable (percpu_counter_batch) that
controls the batch size for every 'struct percpu_counter' on the system.

However, there are some applications, e.g. memory accounting, where it is
more appropriate to scale the batch size according to the memory size.
This patch adds the infrastructure to be able to change the batch size
for each individual instance of 'struct percpu_counter'.

I have chosen to implement the added batch field as a pointer
(pointing to percpu_counter_batch by default) instead
of a static value. The reason is that percpu_counter initialization
can be called when only the boot cpu is up and not all cpus are online.
At that point the percpu_counter_batch value has yet to be updated
by a call to the compute_batch_value function.
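
For reference, the global default is not fixed at boot time: a minimal
sketch (simplified, not part of this patch) of the existing logic in
lib/percpu_counter.c that recomputes it as cpus come online:

int percpu_counter_batch = 32;	/* boot-time default */

static void compute_batch_value(void)
{
	int nr = num_online_cpus();

	/* Grows with the online cpu count; any counter whose batch
	 * pointer still targets the global picks this up for free. */
	percpu_counter_batch = max(32, nr * 2);
}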

Thanks to Dave Hansen and Andi Kleen for their comments and suggestions.

Signed-off-by: Tim Chen <[email protected]>
---
include/linux/percpu_counter.h | 20 +++++++++++++++++++-
lib/percpu_counter.c | 23 ++++++++++++++++++++++-
2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index d5dd465..5ca7df5 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -22,6 +22,7 @@ struct percpu_counter {
struct list_head list; /* All percpu_counters are on a list */
#endif
s32 __percpu *counters;
+ int *batch ____cacheline_aligned_in_smp;
};

extern int percpu_counter_batch;
@@ -40,11 +41,22 @@ void percpu_counter_destroy(struct percpu_counter *fbc);
void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch);
s64 __percpu_counter_sum(struct percpu_counter *fbc);
+void __percpu_counter_batch_resize(struct percpu_counter *fbc, int *batch);
int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs);

+static inline int percpu_counter_and_batch_init(struct percpu_counter *fbc,
+ s64 amount, int *batch)
+{
+ int ret = percpu_counter_init(fbc, amount);
+
+ if (batch && !ret)
+ __percpu_counter_batch_resize(fbc, batch);
+ return ret;
+}
+
static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
{
- __percpu_counter_add(fbc, amount, percpu_counter_batch);
+ __percpu_counter_add(fbc, amount, *fbc->batch);
}

static inline s64 percpu_counter_sum_positive(struct percpu_counter *fbc)
@@ -95,6 +107,12 @@ static inline int percpu_counter_init(struct percpu_counter *fbc, s64 amount)
return 0;
}

+static inline int percpu_counter_and_batch_init(struct percpu_counter *fbc,
+ s64 amount, int *batch)
+{
+ return percpu_counter_init(fbc, amount);
+}
+
static inline void percpu_counter_destroy(struct percpu_counter *fbc)
{
}
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index ba6085d..a75951e 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -116,6 +116,7 @@ int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
lockdep_set_class(&fbc->lock, key);
fbc->count = amount;
fbc->counters = alloc_percpu(s32);
+ fbc->batch = &percpu_counter_batch;
if (!fbc->counters)
return -ENOMEM;

@@ -131,6 +132,26 @@ int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
}
EXPORT_SYMBOL(__percpu_counter_init);

+void __percpu_counter_batch_resize(struct percpu_counter *fbc, int *batch)
+{
+ unsigned long flags;
+ int cpu;
+
+ if (!batch)
+ return;
+
+ raw_spin_lock_irqsave(&fbc->lock, flags);
+ for_each_online_cpu(cpu) {
+ s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+ fbc->count += *pcount;
+ *pcount = 0;
+ }
+ *batch = max(*batch, percpu_counter_batch);
+ fbc->batch = batch;
+ raw_spin_unlock_irqrestore(&fbc->lock, flags);
+}
+EXPORT_SYMBOL(__percpu_counter_batch_resize);
+
void percpu_counter_destroy(struct percpu_counter *fbc)
{
if (!fbc->counters)
@@ -196,7 +217,7 @@ int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)

count = percpu_counter_read(fbc);
/* Check to see if rough count will be sufficient for comparison */
- if (abs(count - rhs) > (percpu_counter_batch*num_online_cpus())) {
+ if (abs(count - rhs) > ((*fbc->batch)*num_online_cpus())) {
if (count > rhs)
return 1;
else
--
1.7.11.7


2013-04-30 00:17:09

by Tim Chen

Subject: [PATCH 2/2] Make batch size for memory accounting configured according to size of memory

Currently the per cpu counter's batch size for memory accounting is
configured as twice the number of cpus in the system. However,
for systems with very large memory, it is more appropriate to make it
proportional to the memory size per cpu in the system.

For example, for an x86_64 system with 64 cpus and 128 GB of memory,
the batch size is only 2*64 pages (0.5 MB). So any memory accounting
change of more than 0.5 MB will overflow the per cpu counter into
the global counter. Instead, under the new scheme, the batch size
is configured to be 0.4% of the memory per cpu = 8 MB (128 GB / 64 cpus / 256),
which is more in line with the memory size.
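
To spell out the arithmetic (assuming 4 KB pages, as in the example
above):

	128 GB                   = 33554432 pages
	33554432 pages / 64 cpus = 524288 pages per cpu
	524288 pages / 256       = 2048 pages = 8 MB per-cpu batch

versus 2 * 64 = 128 pages = 0.5 MB under the old global default.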

Signed-off-by: Tim Chen <[email protected]>
---
mm/mmap.c | 13 ++++++++++++-
mm/nommu.c | 13 ++++++++++++-
2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 0db0de1..082836e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -89,6 +89,7 @@ int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
* other variables. It can be updated by several CPUs frequently.
*/
struct percpu_counter vm_committed_as ____cacheline_aligned_in_smp;
+int vm_committed_batchsz ____cacheline_aligned_in_smp;

/*
* The global memory commitment made in the system can be a metric
@@ -3090,10 +3091,20 @@ void mm_drop_all_locks(struct mm_struct *mm)
/*
* initialise the VMA slab
*/
+static inline int mm_compute_batch(void)
+{
+ int nr = num_present_cpus();
+
+ /* batch size set to 0.4% of (total memory/#cpus) */
+ return (int) (totalram_pages/nr) / 256;
+}
+
void __init mmap_init(void)
{
int ret;

- ret = percpu_counter_init(&vm_committed_as, 0);
+ vm_committed_batchsz = mm_compute_batch();
+ ret = percpu_counter_and_batch_init(&vm_committed_as, 0,
+ &vm_committed_batchsz);
VM_BUG_ON(ret);
}
diff --git a/mm/nommu.c b/mm/nommu.c
index 2f3ea74..a87a99c 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -59,6 +59,7 @@ unsigned long max_mapnr;
unsigned long num_physpages;
unsigned long highest_memmap_pfn;
struct percpu_counter vm_committed_as;
+int vm_committed_batchsz;
int sysctl_overcommit_memory = OVERCOMMIT_GUESS; /* heuristic overcommit */
int sysctl_overcommit_ratio = 50; /* default is 50% */
int sysctl_max_map_count = DEFAULT_MAX_MAP_COUNT;
@@ -526,11 +527,21 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
/*
* initialise the VMA and region record slabs
*/
+static inline int mm_compute_batch(void)
+{
+ int nr = num_present_cpus();
+
+ /* batch size set to 0.4% of (total memory/#cpus) */
+ return (int) (totalram_pages/nr) / 256;
+}
+
void __init mmap_init(void)
{
int ret;

- ret = percpu_counter_init(&vm_committed_as, 0);
+ vm_committed_batchsz = mm_compute_batch();
+ ret = percpu_counter_and_batch_init(&vm_committed_as, 0,
+ &vm_committed_batchsz);
VM_BUG_ON(ret);
vm_region_jar = KMEM_CACHE(vm_region, SLAB_PANIC);
}
--
1.7.11.7

by Christoph Lameter

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Mon, 29 Apr 2013, Tim Chen wrote:

> diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> index d5dd465..5ca7df5 100644
> --- a/include/linux/percpu_counter.h
> +++ b/include/linux/percpu_counter.h
> @@ -22,6 +22,7 @@ struct percpu_counter {
> struct list_head list; /* All percpu_counters are on a list */
> #endif
> s32 __percpu *counters;
> + int *batch ____cacheline_aligned_in_smp;
> };

What is this for and why does it have that alignment?

2013-04-30 16:23:53

by Tim Chen

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 2013-04-30 at 13:32 +0000, Christoph Lameter wrote:
> On Mon, 29 Apr 2013, Tim Chen wrote:
>
> > diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> > index d5dd465..5ca7df5 100644
> > --- a/include/linux/percpu_counter.h
> > +++ b/include/linux/percpu_counter.h
> > @@ -22,6 +22,7 @@ struct percpu_counter {
> > struct list_head list; /* All percpu_counters are on a list */
> > #endif
> > s32 __percpu *counters;
> > + int *batch ____cacheline_aligned_in_smp;
> > };
>
> What is this for and why does it have that alignment?

I was assuming that if batch is frequently referenced, it probably
should not share a cache line with the counters field.

Tim

by Christoph Lameter

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 30 Apr 2013, Tim Chen wrote:

> On Tue, 2013-04-30 at 13:32 +0000, Christoph Lameter wrote:
> > On Mon, 29 Apr 2013, Tim Chen wrote:
> >
> > > diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> > > index d5dd465..5ca7df5 100644
> > > --- a/include/linux/percpu_counter.h
> > > +++ b/include/linux/percpu_counter.h
> > > @@ -22,6 +22,7 @@ struct percpu_counter {
> > > struct list_head list; /* All percpu_counters are on a list */
> > > #endif
> > > s32 __percpu *counters;
> > > + int *batch ____cacheline_aligned_in_smp;
> > > };
> >
> > What is this for and why does it have that alignment?
>
> I was assuming that if batch is frequently referenced, it probably
> should not share a cache line with the counters field.

And why is it a pointer?

And the pointer is so frequently changed that it needs its own cache line?

2013-04-30 17:34:48

by Andi Kleen

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

> > What is this for and why does it have that alignment?
>
> I was assuming that if batch is frequently referenced, it probably
> should not share a cache line with the counters field.

As long as they are both read-mostly it should be fine to share
(the cache line will just be in the SHARED state).

Padding would only be useful if one of them gets changed regularly.
I don't think that's the case here?

-Andi

--
[email protected] -- Speaking for myself only

2013-04-30 17:48:38

by Tim Chen

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 2013-04-30 at 17:28 +0000, Christoph Lameter wrote:
> On Tue, 30 Apr 2013, Tim Chen wrote:
>
> > On Tue, 2013-04-30 at 13:32 +0000, Christoph Lameter wrote:
> > > On Mon, 29 Apr 2013, Tim Chen wrote:
> > >
> > > > diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> > > > index d5dd465..5ca7df5 100644
> > > > --- a/include/linux/percpu_counter.h
> > > > +++ b/include/linux/percpu_counter.h
> > > > @@ -22,6 +22,7 @@ struct percpu_counter {
> > > > struct list_head list; /* All percpu_counters are on a list */
> > > > #endif
> > > > s32 __percpu *counters;
> > > > + int *batch ____cacheline_aligned_in_smp;
> > > > };
> > >
> > > What is this for and why does it have that alignment?
> >
> > I was assuming that if batch is frequently referenced, it probably
> > should not share a cache line with the counters field.
>
> And why is it a pointer?

A pointer because the default percpu_counter_batch value could change
later, when cpus come online after we initialize the per cpu counter,
and percpu_counter_batch gets computed again in percpu_counter_startup.
Making it a pointer makes it unnecessary to come back and change the
batch sizes, as we would have to if each counter stored a static copy
of the default batch value.

>
> And the pointer is so frequently changed that it needs its own cache line?
>

On second thought, you're right. It is unnecessary for *batch to have
its own cache line, as the counters pointer and list_head above it will
not change frequently. I'll remove the cache alignment.

Tim

2013-04-30 17:55:45

by Tim Chen

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 2013-04-30 at 17:53 +0000, Christoph Lameter wrote:
> On Tue, 30 Apr 2013, Tim Chen wrote:
>
> > > And why is it a pointer?
> >
> > A pointer because the default percpu_counter_batch value could change
> > later, when cpus come online after we initialize the per cpu counter,
> > and percpu_counter_batch gets computed again in percpu_counter_startup.
> > Making it a pointer makes it unnecessary to come back and change the
> > batch sizes, as we would have to if each counter stored a static copy
> > of the default batch value.
>
> But you will have to dereference the pointer whenever you want the batch
> size from the hot path. Looks like it would be better to put the value
> there directly. You have a list of percpu counters that can be traversed
> to change the batch size.
>

I have considered that. But the list is not available unless we have
CONFIG_HOTPLUG_CPU compiled in.

Tim

by Christoph Lameter

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 30 Apr 2013, Tim Chen wrote:

> > And why is it a pointer?
>
> A pointer because the default percpu_counter_batch value could change
> later, when cpus come online after we initialize the per cpu counter,
> and percpu_counter_batch gets computed again in percpu_counter_startup.
> Making it a pointer makes it unnecessary to come back and change the
> batch sizes, as we would have to if each counter stored a static copy
> of the default batch value.

But you will have to dereference the pointer whenever you want the batch
size from the hot path. Looks like it would be better to put the value
there directly. You have a list of percpu counters that can be traversed
to change the batch size.
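
(A hypothetical sketch of the traversal Christoph is suggesting, using
the percpu_counters list and lock that lib/percpu_counter.c already has
under CONFIG_HOTPLUG_CPU; the helper name and the plain 'int batch'
member are illustrative only:)

static void percpu_counters_set_batch(int batch)
{
	struct percpu_counter *fbc;

	spin_lock(&percpu_counters_lock);
	list_for_each_entry(fbc, &percpu_counters, list)
		fbc->batch = batch;	/* a plain value, no pointer chase */
	spin_unlock(&percpu_counters_lock);
}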

2013-04-30 18:10:09

by Eric Dumazet

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 2013-04-30 at 09:23 -0700, Tim Chen wrote:
> On Tue, 2013-04-30 at 13:32 +0000, Christoph Lameter wrote:
> > On Mon, 29 Apr 2013, Tim Chen wrote:
> >
> > > diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> > > index d5dd465..5ca7df5 100644
> > > --- a/include/linux/percpu_counter.h
> > > +++ b/include/linux/percpu_counter.h
> > > @@ -22,6 +22,7 @@ struct percpu_counter {
> > > struct list_head list; /* All percpu_counters are on a list */
> > > #endif
> > > s32 __percpu *counters;
> > > + int *batch ____cacheline_aligned_in_smp;
> > > };
> >
> > What is this for and why does it have that alignment?
>
> I was assuming that if batch is frequently referenced, it probably
> should not share a cache line with the counters field.

But the 'counters' field has the same requirement. It's supposed to be
a read-only field.

So please remove this '____cacheline_aligned_in_smp', as it makes the
whole struct percpu_counter at least two cache lines wide, for no
obvious reason.
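
(Illustration, assuming 64-byte cache lines on x86_64: the annotation
aligns the member to a cache-line boundary and pads the struct to a
cache-line multiple, so the struct grows to at least two lines:)

struct percpu_counter {
	raw_spinlock_t lock;
	s64 count;
#ifdef CONFIG_HOTPLUG_CPU
	struct list_head list;	/* All percpu_counters are on a list */
#endif
	s32 __percpu *counters;
	/* everything above is padded out to the end of the first line */
	int *batch ____cacheline_aligned_in_smp;
	/* 8 bytes of 'batch' plus tail padding occupy a second line */
};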



by Christoph Lameter

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 30 Apr 2013, Tim Chen wrote:

> > But you will have to dereference the pointer whenever you want the batch
> > size from the hot path. Looks like it would be better to put the value
> > there directly. You have a list of percpu counters that can be traversed
> > to change the batch size.
> >
>
> I have considered that. But the list is not available unless we have
> CONFIG_HOTPLUG_CPU compiled in.

percpu counters are performance sensitive, and with the pointer you
will need to reference another one, increasing the cache footprint. You
are touching an additional cacheline somewhere in memory frequently.
Not good.

2013-04-30 18:28:09

by Tim Chen

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 2013-04-30 at 11:10 -0700, Eric Dumazet wrote:
> On Tue, 2013-04-30 at 09:23 -0700, Tim Chen wrote:
> > On Tue, 2013-04-30 at 13:32 +0000, Christoph Lameter wrote:
> > > On Mon, 29 Apr 2013, Tim Chen wrote:
> > >
> > > > diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> > > > index d5dd465..5ca7df5 100644
> > > > --- a/include/linux/percpu_counter.h
> > > > +++ b/include/linux/percpu_counter.h
> > > > @@ -22,6 +22,7 @@ struct percpu_counter {
> > > > struct list_head list; /* All percpu_counters are on a list */
> > > > #endif
> > > > s32 __percpu *counters;
> > > > + int *batch ____cacheline_aligned_in_smp;
> > > > };
> > >
> > > What is this for and why does it have that alignment?
> >
> > I was assuming that if batch is frequently referenced, it probably
> > should not share a cache line with the counters field.
>
> But the 'counters' field has the same requirement. It's supposed to be
> a read-only field.
>
> So please remove this '____cacheline_aligned_in_smp', as it makes the
> whole struct percpu_counter at least two cache lines wide, for no
> obvious reason.
>

Will do.

Thanks.

Tim

2013-04-30 19:01:07

by Tim Chen

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 2013-04-30 at 18:27 +0000, Christoph Lameter wrote:
> On Tue, 30 Apr 2013, Tim Chen wrote:
>
> > > But you will have to dereference the pointer whenever you want the batch
> > > size from the hot path. Looks like it would be better to put the value
> > > there directly. You have a list of percpu counters that can be traversed
> > > to change the batch size.
> > >
> >
> > I have considered that. But the list is not available unless we have
> > CONFIG_HOTPLUG_CPU compiled in.
>
> percpu counters are performance sensitive, and with the pointer you
> will need to reference another one, increasing the cache footprint.
> You are touching an additional cacheline somewhere in memory
> frequently. Not good.
>

Seems like there are objections to making batch a pointer, so I'll have
to store its value directly in the per cpu counter.

Will it be acceptable if I make the per cpu counter list (currently
compiled only under CONFIG_HOTPLUG_CPU) available by default? I will
need the list to go through all counters to update the batch values.
The alternative would be to make the configurable batch option
available only under CONFIG_HOTPLUG_CPU.

Which is the preferable approach, or are there other clever
alternatives?

Tim

2013-04-30 19:04:46

by Tejun Heo

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

Hello,

On Tue, Apr 30, 2013 at 12:00:57PM -0700, Tim Chen wrote:
> Will it be acceptable if I make the per cpu counter list (currently
> compiled only under CONFIG_HOTPLUG_CPU) available by default? I will
> need the list to go through all counters to update the batch values.
> The alternative would be to make the configurable batch option
> available only under CONFIG_HOTPLUG_CPU.

I haven't looked at the patch, but making the percpu counter list the
default regardless of cpu hotplug should be okay. Most modern
configurations, even most embedded ones, enable CPU hotplug for PM
anyway.

Thanks.

--
tejun

by Christoph Lameter

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Tue, 30 Apr 2013, Tim Chen wrote:

> Will it be acceptable if I make the per cpu counter list (currently
> compiled only under CONFIG_HOTPLUG_CPU) available by default? I will
> need the list to go through all counters to update the batch values.
> The alternative would be to make the configurable batch option
> available only under CONFIG_HOTPLUG_CPU.

Make the list default. Could be useful elsewhere.

2013-05-01 05:00:04

by Simon Jeons

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

Hi Tim,
On 04/30/2013 01:12 AM, Tim Chen wrote:
> Currently, there is a single global variable (percpu_counter_batch) that
> controls the batch size for every 'struct percpu_counter' on the system.
>
> However, there are some applications, e.g. memory accounting, where it is
> more appropriate to scale the batch size according to the memory size.
> This patch adds the infrastructure to be able to change the batch size
> for each individual instance of 'struct percpu_counter'.
>
> I have chosen to implement the added batch field as a pointer
> (pointing to percpu_counter_batch by default) instead
> of a static value. The reason is that percpu_counter initialization
> can be called when only the boot cpu is up and not all cpus are online.

What's the meaning of boot cpu? Do you mean cpu 0?

> At that point the percpu_counter_batch value has yet to be updated
> by a call to the compute_batch_value function.
>
> Thanks to Dave Hansen and Andi Kleen for their comments and suggestions.
>
> Signed-off-by: Tim Chen <[email protected]>
> ---
> include/linux/percpu_counter.h | 20 +++++++++++++++++++-
> lib/percpu_counter.c | 23 ++++++++++++++++++++++-
> 2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> index d5dd465..5ca7df5 100644
> --- a/include/linux/percpu_counter.h
> +++ b/include/linux/percpu_counter.h
> @@ -22,6 +22,7 @@ struct percpu_counter {
> struct list_head list; /* All percpu_counters are on a list */
> #endif
> s32 __percpu *counters;
> + int *batch ____cacheline_aligned_in_smp;
> };
>
> extern int percpu_counter_batch;
> @@ -40,11 +41,22 @@ void percpu_counter_destroy(struct percpu_counter *fbc);
> void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
> void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch);
> s64 __percpu_counter_sum(struct percpu_counter *fbc);
> +void __percpu_counter_batch_resize(struct percpu_counter *fbc, int *batch);
> int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs);
>
> +static inline int percpu_counter_and_batch_init(struct percpu_counter *fbc,
> + s64 amount, int *batch)
> +{
> + int ret = percpu_counter_init(fbc, amount);
> +
> + if (batch && !ret)
> + __percpu_counter_batch_resize(fbc, batch);
> + return ret;
> +}
> +
> static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
> {
> - __percpu_counter_add(fbc, amount, percpu_counter_batch);
> + __percpu_counter_add(fbc, amount, *fbc->batch);
> }
>
> static inline s64 percpu_counter_sum_positive(struct percpu_counter *fbc)
> @@ -95,6 +107,12 @@ static inline int percpu_counter_init(struct percpu_counter *fbc, s64 amount)
> return 0;
> }
>
> +static inline int percpu_counter_and_batch_init(struct percpu_counter *fbc,
> + s64 amount, int *batch)
> +{
> + return percpu_counter_init(fbc, amount);
> +}
> +
> static inline void percpu_counter_destroy(struct percpu_counter *fbc)
> {
> }
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index ba6085d..a75951e 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -116,6 +116,7 @@ int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
> lockdep_set_class(&fbc->lock, key);
> fbc->count = amount;
> fbc->counters = alloc_percpu(s32);
> + fbc->batch = &percpu_counter_batch;
> if (!fbc->counters)
> return -ENOMEM;
>
> @@ -131,6 +132,26 @@ int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
> }
> EXPORT_SYMBOL(__percpu_counter_init);
>
> +void __percpu_counter_batch_resize(struct percpu_counter *fbc, int *batch)
> +{
> + unsigned long flags;
> + int cpu;
> +
> + if (!batch)
> + return;
> +
> + raw_spin_lock_irqsave(&fbc->lock, flags);
> + for_each_online_cpu(cpu) {
> + s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
> + fbc->count += *pcount;
> + *pcount = 0;
> + }
> + *batch = max(*batch, percpu_counter_batch);
> + fbc->batch = batch;
> + raw_spin_unlock_irqrestore(&fbc->lock, flags);
> +}
> +EXPORT_SYMBOL(__percpu_counter_batch_resize);
> +
> void percpu_counter_destroy(struct percpu_counter *fbc)
> {
> if (!fbc->counters)
> @@ -196,7 +217,7 @@ int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
>
> count = percpu_counter_read(fbc);
> /* Check to see if rough count will be sufficient for comparison */
> - if (abs(count - rhs) > (percpu_counter_batch*num_online_cpus())) {
> + if (abs(count - rhs) > ((*fbc->batch)*num_online_cpus())) {
> if (count > rhs)
> return 1;
> else

2013-05-01 05:09:31

by Ric Mason

Subject: Re: [PATCH 2/2] Make batch size for memory accounting configured according to size of memory

Hi Tim,
On 04/30/2013 01:12 AM, Tim Chen wrote:
> Currently the per cpu counter's batch size for memory accounting is
> configured as twice the number of cpus in the system. However,
> for systems with very large memory, it is more appropriate to make it
> proportional to the memory size per cpu in the system.
>
> For example, for an x86_64 system with 64 cpus and 128 GB of memory,
> the batch size is only 2*64 pages (0.5 MB). So any memory accounting
> change of more than 0.5 MB will overflow the per cpu counter into
> the global counter. Instead, under the new scheme, the batch size
> is configured to be 0.4% of the memory per cpu = 8 MB (128 GB / 64 cpus / 256),

Will a large batch size lead to a more inaccurate global counter?

> which is more in line with the memory size.
>
> Signed-off-by: Tim Chen <[email protected]>
> ---
> mm/mmap.c | 13 ++++++++++++-
> mm/nommu.c | 13 ++++++++++++-
> 2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0db0de1..082836e 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -89,6 +89,7 @@ int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
> * other variables. It can be updated by several CPUs frequently.
> */
> struct percpu_counter vm_committed_as ____cacheline_aligned_in_smp;
> +int vm_committed_batchsz ____cacheline_aligned_in_smp;
>
> /*
> * The global memory commitment made in the system can be a metric
> @@ -3090,10 +3091,20 @@ void mm_drop_all_locks(struct mm_struct *mm)
> /*
> * initialise the VMA slab
> */
> +static inline int mm_compute_batch(void)
> +{
> + int nr = num_present_cpus();
> +
> + /* batch size set to 0.4% of (total memory/#cpus) */
> + return (int) (totalram_pages/nr) / 256;
> +}
> +
> void __init mmap_init(void)
> {
> int ret;
>
> - ret = percpu_counter_init(&vm_committed_as, 0);
> + vm_committed_batchsz = mm_compute_batch();
> + ret = percpu_counter_and_batch_init(&vm_committed_as, 0,
> + &vm_committed_batchsz);
> VM_BUG_ON(ret);
> }
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 2f3ea74..a87a99c 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -59,6 +59,7 @@ unsigned long max_mapnr;
> unsigned long num_physpages;
> unsigned long highest_memmap_pfn;
> struct percpu_counter vm_committed_as;
> +int vm_committed_batchsz;
> int sysctl_overcommit_memory = OVERCOMMIT_GUESS; /* heuristic overcommit */
> int sysctl_overcommit_ratio = 50; /* default is 50% */
> int sysctl_max_map_count = DEFAULT_MAX_MAP_COUNT;
> @@ -526,11 +527,21 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
> /*
> * initialise the VMA and region record slabs
> */
> +static inline int mm_compute_batch(void)
> +{
> + int nr = num_present_cpus();
> +
> + /* batch size set to 0.4% of (total memory/#cpus) */
> + return (int) (totalram_pages/nr) / 256;
> +}
> +
> void __init mmap_init(void)
> {
> int ret;
>
> - ret = percpu_counter_init(&vm_committed_as, 0);
> + vm_committed_batchsz = mm_compute_batch();
> + ret = percpu_counter_and_batch_init(&vm_committed_as, 0,
> + &vm_committed_batchsz);
> VM_BUG_ON(ret);
> vm_region_jar = KMEM_CACHE(vm_region, SLAB_PANIC);
> }

2013-05-01 15:53:43

by Tim Chen

Subject: Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

On Wed, 2013-05-01 at 12:52 +0800, Simon Jeons wrote:
> Hi Tim,
> On 04/30/2013 01:12 AM, Tim Chen wrote:
> > Currently, there is a single global variable (percpu_counter_batch) that
> > controls the batch size for every 'struct percpu_counter' on the system.
> >
> > However, there are some applications, e.g. memory accounting, where it is
> > more appropriate to scale the batch size according to the memory size.
> > This patch adds the infrastructure to be able to change the batch size
> > for each individual instance of 'struct percpu_counter'.
> >
> > I have chosen to implement the added batch field as a pointer
> > (pointing to percpu_counter_batch by default) instead
> > of a static value. The reason is that percpu_counter initialization
> > can be called when only the boot cpu is up and not all cpus are online.
>
> What's the meaning of boot cpu? Do you mean cpu 0?
>

Yes.

2013-05-01 16:07:40

by Tim Chen

Subject: Re: [PATCH 2/2] Make batch size for memory accounting configured according to size of memory

On Wed, 2013-05-01 at 13:09 +0800, Ric Mason wrote:
> Hi Tim,
> On 04/30/2013 01:12 AM, Tim Chen wrote:
> > Currently the per cpu counter's batch size for memory accounting is
> > configured as twice the number of cpus in the system. However,
> > for systems with very large memory, it is more appropriate to make it
> > proportional to the memory size per cpu in the system.
> >
> > For example, for an x86_64 system with 64 cpus and 128 GB of memory,
> > the batch size is only 2*64 pages (0.5 MB). So any memory accounting
> > change of more than 0.5 MB will overflow the per cpu counter into
> > the global counter. Instead, under the new scheme, the batch size
> > is configured to be 0.4% of the memory per cpu = 8 MB (128 GB / 64 cpus / 256),
>
> Will a large batch size lead to a more inaccurate global counter?
>

I've kept the error tolerance fairly small (0.4%), so it should not be
an issue.

If this is a concern, we can switch to percpu_counter_compare, which
does a quick compare on the global counter and falls back to an
accurate compare if needed (like the following).

index d1e4124..c78be36 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -187,7 +187,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
if (mm)
allowed -= mm->total_vm / 32;

- if (percpu_counter_read_positive(&vm_committed_as) < allowed)
+ if (percpu_counter_compare(&vm_committed_as, allowed) < 0)
return 0;
error:
vm_unacct_memory(pages);
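
(For reference, a simplified sketch of what percpu_counter_compare()
in lib/percpu_counter.c does -- with patch 1 applied, the
percpu_counter_batch reference becomes *fbc->batch:)

int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
{
	s64 count = percpu_counter_read(fbc);

	/* the rough count is sufficient when far away from rhs */
	if (abs(count - rhs) > (percpu_counter_batch * num_online_cpus()))
		return count > rhs ? 1 : -1;

	/* otherwise pay for an accurate sum of all per-cpu deltas */
	count = percpu_counter_sum(fbc);
	if (count > rhs)
		return 1;
	else if (count < rhs)
		return -1;
	return 0;
}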


Tim