2013-05-13 19:08:46

by Cody P Schafer

Subject: [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch

"Problems" with the current code:
1. there is a lack of synchronization in setting ->high and ->batch in
percpu_pagelist_fraction_sysctl_handler()
2. stop_machine() in zone_pcp_update() is unnecessary.
3. zone_pcp_update() does not consider the case where percpu_pagelist_fraction is non-zero

To fix:
1. add memory barriers, a safe ->batch value, and an update-side mutex when
updating ->high and ->batch, and use ACCESS_ONCE() for ->batch users that
expect a stable value (the combined protocol is sketched after this list).
2. avoid draining pages in zone_pcp_update(), rely upon the memory barriers added to fix #1
3. factor out quite a few functions, and then call the appropriate one.
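
For reference, a minimal sketch of the combined protocol (the pieces are
introduced separately in patches 02-04 below; the read side shown here is the
drain_zone_pages() pattern):

    /* update side: callers serialize on pcp_batch_high_lock */
    static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
                               unsigned long batch)
    {
            pcp->batch = 1;         /* fail-safe value, valid for any high */
            smp_wmb();
            pcp->high = high;       /* readers may briefly see batch == 1 */
            smp_wmb();
            pcp->batch = batch;     /* real batch, only after high is visible */
    }

    /* read side: snapshot ->batch once and reuse the snapshot */
    unsigned long batch = ACCESS_ONCE(pcp->batch);
    if (pcp->count >= batch)
            to_drain = batch;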

Note that this results in a change to the behavior of zone_pcp_update(), which
is used by memory_hotplug. I'm fairly certain that I've discerned (and
preserved) the essential behavior (changing ->high and ->batch) and eliminated
only the unneeded actions (draining the per-cpu pages), but this may not be
the case.

Further note that the draining of pages that previously took place in
zone_pcp_update() occurred after repeated draining when attempting to offline a
page, and after the offline had "succeeded". It appears that the draining was
added to zone_pcp_update() to avoid refactoring setup_pageset() into 2
functions.

--

Since v2: https://lkml.org/lkml/2013/4/9/718

- note ACCESS_ONCE() in fix #1 above.
- consolidate ->batch & ->high update protocol into a single function (Gilad).
- add missing ACCESS_ONCE() on ->batch

Since v1: https://lkml.org/lkml/2013/4/5/444

- instead of using on_each_cpu(), use memory barriers (Gilad) and an update-side mutex.
- add "Problem" #3 above, and fix.
- rename function to match naming style of similar function
- move unrelated comment

--

Cody P Schafer (11):
mm/page_alloc: factor out setting of pcp->high and pcp->batch.
mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high
mm/page_alloc: insert memory barriers to allow async update of pcp
batch and high
mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE
mm/page_alloc: convert zone_pcp_update() to rely on memory barriers
instead of stop_machine()
mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly
recalculate high
mm/page_alloc: factor setup_pageset() into pageset_init() and
pageset_set_batch()
mm/page_alloc: relocate comment to be directly above code it refers
to.
mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset()
mm/page_alloc: in zone_pcp_update(), use zone_pageset_init()
mm/page_alloc: rename setup_pagelist_highmark() to match naming of
pageset_set_batch()

mm/page_alloc.c | 151 +++++++++++++++++++++++++++++++++-----------------------
1 file changed, 90 insertions(+), 61 deletions(-)

--
1.8.2.2


2013-05-13 19:08:57

by Cody P Schafer

Subject: [PATCH RESEND v3 01/11] mm/page_alloc: factor out setting of pcp->high and pcp->batch.

Creates pageset_set_batch() for use in setup_pageset().
pageset_set_batch() imitates the functionality of
setup_pagelist_highmark(), but uses the boot-time
(percpu_pagelist_fraction == 0) calculations for determining ->high
based on ->batch.
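
As a worked example (typical values; the exact numbers depend on zone size
and PAGE_SIZE): on a 4KiB-page system with a large zone, zone_batchsize()
evaluates to 31, so pageset_set_batch() yields ->batch = 31 and
->high = 6 * 31 = 186, the familiar boot-time pageset defaults.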

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 98cbdf6..9e556e6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4030,6 +4030,14 @@ static int __meminit zone_batchsize(struct zone *zone)
#endif
}

+/* a companion to setup_pagelist_highmark() */
+static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
+{
+ struct per_cpu_pages *pcp = &p->pcp;
+ pcp->high = 6 * batch;
+ pcp->batch = max(1UL, 1 * batch);
+}
+
static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
{
struct per_cpu_pages *pcp;
@@ -4039,8 +4047,7 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)

pcp = &p->pcp;
pcp->count = 0;
- pcp->high = 6 * batch;
- pcp->batch = max(1UL, 1 * batch);
+ pageset_set_batch(p, batch);
for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
INIT_LIST_HEAD(&pcp->lists[migratetype]);
}
@@ -4049,7 +4056,6 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
* setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
* to the value high for the pageset p.
*/
-
static void setup_pagelist_highmark(struct per_cpu_pageset *p,
unsigned long high)
{
--
1.8.2.2

2013-05-13 19:09:25

by Cody P Schafer

Subject: [PATCH RESEND v3 09/11] mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset()

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 53c62c5..0c3cdbb6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4102,22 +4102,25 @@ static void setup_pagelist_highmark(struct per_cpu_pageset *p,
pageset_update(&p->pcp, high, batch);
}

+static void __meminit zone_pageset_init(struct zone *zone, int cpu)
+{
+ struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
+
+ pageset_init(pcp);
+ if (percpu_pagelist_fraction)
+ setup_pagelist_highmark(pcp,
+ (zone->managed_pages /
+ percpu_pagelist_fraction));
+ else
+ pageset_set_batch(pcp, zone_batchsize(zone));
+}
+
static void __meminit setup_zone_pageset(struct zone *zone)
{
int cpu;
-
zone->pageset = alloc_percpu(struct per_cpu_pageset);
-
- for_each_possible_cpu(cpu) {
- struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
-
- setup_pageset(pcp, zone_batchsize(zone));
-
- if (percpu_pagelist_fraction)
- setup_pagelist_highmark(pcp,
- (zone->managed_pages /
- percpu_pagelist_fraction));
- }
+ for_each_possible_cpu(cpu)
+ zone_pageset_init(zone, cpu);
}

/*
--
1.8.2.2

2013-05-13 19:09:29

by Cody P Schafer

Subject: [PATCH RESEND v3 08/11] mm/page_alloc: relocate comment to be directly above code it refers to.

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 696ce96..53c62c5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3709,12 +3709,12 @@ void __ref build_all_zonelists(pg_data_t *pgdat, struct zone *zone)
mminit_verify_zonelist();
cpuset_init_current_mems_allowed();
} else {
- /* we have to stop all cpus to guarantee there is no user
- of zonelist */
#ifdef CONFIG_MEMORY_HOTPLUG
if (zone)
setup_zone_pageset(zone);
#endif
+ /* we have to stop all cpus to guarantee there is no user
+ of zonelist */
stop_machine(__build_all_zonelists, pgdat, NULL);
/* cpuset refresh routine should be here */
}
--
1.8.2.2

2013-05-13 19:09:27

by Cody P Schafer

Subject: [PATCH RESEND v3 10/11] mm/page_alloc: in zone_pcp_update(), use zone_pageset_init()

Previously, zone_pcp_update() called pageset_set_batch() directly,
essentially assuming that percpu_pagelist_fraction == 0. Correct this by
calling zone_pageset_init(), which chooses the appropriate ->batch and
->high calculations.

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0c3cdbb6..251fb5f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6096,11 +6096,9 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
void __meminit zone_pcp_update(struct zone *zone)
{
unsigned cpu;
- unsigned long batch;
mutex_lock(&pcp_batch_high_lock);
- batch = zone_batchsize(zone);
for_each_possible_cpu(cpu)
- pageset_set_batch(per_cpu_ptr(zone->pageset, cpu), batch);
+ zone_pageset_init(zone, cpu);
mutex_unlock(&pcp_batch_high_lock);
}
#endif
--
1.8.2.2

2013-05-13 19:09:23

by Cody P Schafer

Subject: [PATCH RESEND v3 07/11] mm/page_alloc: factor setup_pageset() into pageset_init() and pageset_set_batch()

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3583281..696ce96 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4069,7 +4069,7 @@ static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
}

-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+static void pageset_init(struct per_cpu_pageset *p)
{
struct per_cpu_pages *pcp;
int migratetype;
@@ -4078,11 +4078,16 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)

pcp = &p->pcp;
pcp->count = 0;
- pageset_set_batch(p, batch);
for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
INIT_LIST_HEAD(&pcp->lists[migratetype]);
}

+static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+{
+ pageset_init(p);
+ pageset_set_batch(p, batch);
+}
+
/*
* setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
* to the value high for the pageset p.
--
1.8.2.2

2013-05-13 19:09:20

by Cody P Schafer

Subject: [PATCH RESEND v3 05/11] mm/page_alloc: convert zone_pcp_update() to rely on memory barriers instead of stop_machine()

zone_pcp_update()'s goal is to adjust the ->high and ->batch members of a
percpu pageset based on a zone's ->managed_pages. We don't need to drain
the entire percpu pageset just to modify these fields.

This lets us avoid calling setup_pageset() (and the draining required to
call it) and instead allows simply setting the fields' values (with some
attention paid to memory barriers to prevent the relationship between
->batch and ->high from being thrown off).

This does change the behavior of zone_pcp_update(), as the percpu
pagesets will not be drained when zone_pcp_update() is called (they will
instead be shrunk, not completely drained, later, when a 0-order page is
freed in free_hot_cold_page()).

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 33 +++++++++------------------------
1 file changed, 9 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 71d843d..5a00195 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6083,33 +6083,18 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
#endif

#ifdef CONFIG_MEMORY_HOTPLUG
-static int __meminit __zone_pcp_update(void *data)
-{
- struct zone *zone = data;
- int cpu;
- unsigned long batch = zone_batchsize(zone), flags;
-
- for_each_possible_cpu(cpu) {
- struct per_cpu_pageset *pset;
- struct per_cpu_pages *pcp;
-
- pset = per_cpu_ptr(zone->pageset, cpu);
- pcp = &pset->pcp;
-
- local_irq_save(flags);
- if (pcp->count > 0)
- free_pcppages_bulk(zone, pcp->count, pcp);
- drain_zonestat(zone, pset);
- setup_pageset(pset, batch);
- local_irq_restore(flags);
- }
- return 0;
-}
-
+/*
+ * The zone indicated has a new number of managed_pages; batch sizes and percpu
+ * page high values need to be recalculated.
+ */
void __meminit zone_pcp_update(struct zone *zone)
{
+ unsigned cpu;
+ unsigned long batch;
mutex_lock(&pcp_batch_high_lock);
- stop_machine(__zone_pcp_update, zone, NULL);
+ batch = zone_batchsize(zone);
+ for_each_possible_cpu(cpu)
+ pageset_set_batch(per_cpu_ptr(zone->pageset, cpu), batch);
mutex_unlock(&pcp_batch_high_lock);
}
#endif
--
1.8.2.2

2013-05-13 19:09:18

by Cody P Schafer

Subject: [PATCH RESEND v3 02/11] mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high

Because we are going to rely upon a careful transition between old and
new ->high and ->batch values using memory barriers and will remove
stop_machine(), we need to prevent multiple updaters from interweaving
their memory writes.

Add a simple mutex to protect both update loops.
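
Without the mutex, two updaters (for example, a percpu_pagelist_fraction
sysctl write racing with zone_pcp_update() during memory hotplug) could
interleave their stores: updater A writes its ->high, updater B writes both
of its values, then A writes its ->batch, leaving B's ->high paired with
A's ->batch. (A hypothetical interleaving, for illustration.)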

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9e556e6..cea883d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -65,6 +65,9 @@
#include <asm/div64.h>
#include "internal.h"

+/* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
+static DEFINE_MUTEX(pcp_batch_high_lock);
+
#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
DEFINE_PER_CPU(int, numa_node);
EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -5555,6 +5558,8 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
if (!write || (ret < 0))
return ret;
+
+ mutex_lock(&pcp_batch_high_lock);
for_each_populated_zone(zone) {
for_each_possible_cpu(cpu) {
unsigned long high;
@@ -5563,6 +5568,7 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
per_cpu_ptr(zone->pageset, cpu), high);
}
}
+ mutex_unlock(&pcp_batch_high_lock);
return 0;
}

@@ -6076,7 +6082,9 @@ static int __meminit __zone_pcp_update(void *data)

void __meminit zone_pcp_update(struct zone *zone)
{
+ mutex_lock(&pcp_batch_high_lock);
stop_machine(__zone_pcp_update, zone, NULL);
+ mutex_unlock(&pcp_batch_high_lock);
}
#endif

--
1.8.2.2

2013-05-13 19:12:44

by Cody P Schafer

Subject: [PATCH RESEND v3 11/11] mm/page_alloc: rename setup_pagelist_highmark() to match naming of pageset_set_batch()

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 251fb5f..b335c98 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4063,7 +4063,7 @@ static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
pcp->batch = batch;
}

-/* a companion to setup_pagelist_highmark() */
+/* a companion to pageset_set_high() */
static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
{
pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
@@ -4089,10 +4089,10 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
}

/*
- * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
+ * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
* to the value high for the pageset p.
*/
-static void setup_pagelist_highmark(struct per_cpu_pageset *p,
+static void pageset_set_high(struct per_cpu_pageset *p,
unsigned long high)
{
unsigned long batch = max(1UL, high / 4);
@@ -4108,7 +4108,7 @@ static void __meminit zone_pageset_init(struct zone *zone, int cpu)

pageset_init(pcp);
if (percpu_pagelist_fraction)
- setup_pagelist_highmark(pcp,
+ pageset_set_high(pcp,
(zone->managed_pages /
percpu_pagelist_fraction));
else
@@ -5597,8 +5597,8 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
unsigned long high;
high = zone->managed_pages / percpu_pagelist_fraction;
for_each_possible_cpu(cpu)
- setup_pagelist_highmark(
- per_cpu_ptr(zone->pageset, cpu), high);
+ pageset_set_high(per_cpu_ptr(zone->pageset, cpu),
+ high);
}
mutex_unlock(&pcp_batch_high_lock);
return 0;
--
1.8.2.2

2013-05-13 19:14:06

by Cody P Schafer

Subject: [PATCH RESEND v3 06/11] mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly recalculate high

Simply moves the calculation of the new 'high' value outside the
for_each_possible_cpu() loop, as it does not depend on the cpu.

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5a00195..3583281 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5573,7 +5573,6 @@ int lowmem_reserve_ratio_sysctl_handler(ctl_table *table, int write,
* cpu. It is the fraction of total pages in each zone that a hot per cpu pagelist
* can have before it gets flushed back to buddy allocator.
*/
-
int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos)
{
@@ -5587,12 +5586,11 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,

mutex_lock(&pcp_batch_high_lock);
for_each_populated_zone(zone) {
- for_each_possible_cpu(cpu) {
- unsigned long high;
- high = zone->managed_pages / percpu_pagelist_fraction;
+ unsigned long high;
+ high = zone->managed_pages / percpu_pagelist_fraction;
+ for_each_possible_cpu(cpu)
setup_pagelist_highmark(
- per_cpu_ptr(zone->pageset, cpu), high);
- }
+ per_cpu_ptr(zone->pageset, cpu), high);
}
mutex_unlock(&pcp_batch_high_lock);
return 0;
--
1.8.2.2

2013-05-13 19:14:16

by Cody P Schafer

Subject: [PATCH RESEND v3 03/11] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high

Introduce pageset_update() to perform a safe transition from one set of
pcp->{batch,high} to a new set using memory barriers.

This ensures that batch is always set to a safe value (1) prior to
updating high, and ensures that high is fully updated before setting the
real value of batch. It prevents ->batch from ever rising above ->high.
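
A hypothetical interleaving (made-up values, for illustration) shows what
the ordering prevents. If a naive updater lowered ->high from 1000 to 40
and ->batch from 96 to 10 by simply storing ->high first:

    updater                     reader (free_hot_cold_page())
    pcp->high = 40
                                pcp->count (45) >= pcp->high (40)
                                free_pcppages_bulk(zone, pcp->batch, pcp)
                                    uses the old batch of 96, but only 45
                                    pages are queued; pcp->count goes
                                    negative
    pcp->batch = 10

Writing the fail-safe ->batch = 1 first, with smp_wmb() between the
stores, keeps ->batch <= ->high at every intermediate step.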

Suggested by Gilad Ben-Yossef <[email protected]> in these threads:

https://lkml.org/lkml/2013/4/9/23
https://lkml.org/lkml/2013/4/10/49

Also reproduces his proposed comment.

Reviewed-by: Gilad Ben-Yossef <[email protected]>
Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 41 ++++++++++++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cea883d..7e45b91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4033,12 +4033,37 @@ static int __meminit zone_batchsize(struct zone *zone)
#endif
}

+/*
+ * pcp->high and pcp->batch values are related and dependent on one another:
+ * ->batch must never be higher than ->high.
+ * The following function updates them in a safe manner without read side
+ * locking.
+ *
+ * Any new users of pcp->batch and pcp->high should ensure they can cope with
+ * those fields changing asynchronously (according to the above rule).
+ *
+ * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
+ * outside of boot time (or some other assurance that no concurrent updaters
+ * exist).
+ */
+static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
+ unsigned long batch)
+{
+ /* start with a fail safe value for batch */
+ pcp->batch = 1;
+ smp_wmb();
+
+ /* Update high, then batch, in order */
+ pcp->high = high;
+ smp_wmb();
+
+ pcp->batch = batch;
+}
+
/* a companion to setup_pagelist_highmark() */
static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
{
- struct per_cpu_pages *pcp = &p->pcp;
- pcp->high = 6 * batch;
- pcp->batch = max(1UL, 1 * batch);
+ pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
}

static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
@@ -4062,13 +4087,11 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
static void setup_pagelist_highmark(struct per_cpu_pageset *p,
unsigned long high)
{
- struct per_cpu_pages *pcp;
+ unsigned long batch = max(1UL, high / 4);
+ if ((high / 4) > (PAGE_SHIFT * 8))
+ batch = PAGE_SHIFT * 8;

- pcp = &p->pcp;
- pcp->high = high;
- pcp->batch = max(1UL, high/4);
- if ((high/4) > (PAGE_SHIFT * 8))
- pcp->batch = PAGE_SHIFT * 8;
+ pageset_update(&p->pcp, high, batch);
}

static void __meminit setup_zone_pageset(struct zone *zone)
--
1.8.2.2

2013-05-13 19:14:26

by Cody P Schafer

Subject: [PATCH RESEND v3 04/11] mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE

pcp->batch could change at any point; avoid relying on it being a stable value.
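
For illustration (hypothetical values): without ACCESS_ONCE(), the compiler
is free to reload pcp->batch, so a comparison and a later use can see two
different values if an updater runs in between:

    if (pcp->count >= pcp->batch)   /* first read:  batch == 31 */
            to_drain = pcp->batch;  /* reload sees: batch == 96 > count */

Taking one snapshot with ACCESS_ONCE(pcp->batch) and reusing it closes the
window.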

Signed-off-by: Cody P Schafer <[email protected]>
---
mm/page_alloc.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7e45b91..71d843d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1182,10 +1182,12 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
{
unsigned long flags;
int to_drain;
+ unsigned long batch;

local_irq_save(flags);
- if (pcp->count >= pcp->batch)
- to_drain = pcp->batch;
+ batch = ACCESS_ONCE(pcp->batch);
+ if (pcp->count >= batch)
+ to_drain = batch;
else
to_drain = pcp->count;
if (to_drain > 0) {
@@ -1353,8 +1355,9 @@ void free_hot_cold_page(struct page *page, int cold)
list_add(&page->lru, &pcp->lists[migratetype]);
pcp->count++;
if (pcp->count >= pcp->high) {
- free_pcppages_bulk(zone, pcp->batch, pcp);
- pcp->count -= pcp->batch;
+ unsigned long batch = ACCESS_ONCE(pcp->batch);
+ free_pcppages_bulk(zone, batch, pcp);
+ pcp->count -= batch;
}

out:
--
1.8.2.2

2013-05-13 19:20:56

by Pekka Enberg

Subject: Re: [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch

Hi Cody,

On Mon, May 13, 2013 at 10:08 PM, Cody P Schafer
<[email protected]> wrote:
> "Problems" with the current code:
> 1. there is a lack of synchronization in setting ->high and ->batch in
> percpu_pagelist_fraction_sysctl_handler()
> 2. stop_machine() in zone_pcp_update() is unnecessary.
> 3. zone_pcp_update() does not consider the case where percpu_pagelist_fraction is non-zero

Maybe it's just me but I find the above problem description confusing.
How does the problem manifest itself? How did you find out about it?
Why do we need to fix all three problems in the same patch set?

Pekka

2013-05-13 20:48:01

by Cody P Schafer

Subject: Re: [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch

On 05/13/2013 12:20 PM, Pekka Enberg wrote:
> Hi Cody,
>
> On Mon, May 13, 2013 at 10:08 PM, Cody P Schafer
> <[email protected]> wrote:
>> "Problems" with the current code:
>> 1. there is a lack of synchronization in setting ->high and ->batch in
>> percpu_pagelist_fraction_sysctl_handler()
>> 2. stop_machine() in zone_pcp_update() is unnecessary.
>> 3. zone_pcp_update() does not consider the case where percpu_pagelist_fraction is non-zero
>
> Maybe it's just me but I find the above problem description confusing.
> How does the problem manifest itself?

1. I've not reproduced this causing issues.
2. Calling zone_pcp_update() is slow.
3. Not reproduced either, but would cause percpu_pagelist_fraction (set
via sysctl) to be ignored after a call to zone_pcp_update() (for
example, after a memory hotplug).

> How did you find out about it?

I'm writing some code that resizes zones and thus uses
zone_pcp_update(), and fixing broken things along the way.

> Why
> do we need to fix all three problems in the same patch set?

They all affect the same bit of code, and fixing all of them means
restructuring both of the sites where ->high and ->batch are set.

Additionally, splitting it out (if possible) would make it less clear
what the overall goal is, and would mean a few inter-patchset
dependencies, which are undesirable.