2023-02-09 15:34:07

by Marcelo Tosatti

Subject: [PATCH v2 09/11] mm/vmstat: use cmpxchg loop in cpu_vm_stats_fold

In preparation for switching the vmstat shepherd to flush
per-CPU counters remotely, use a cmpxchg loop
instead of a pair of read/write instructions.

Signed-off-by: Marcelo Tosatti <[email protected]>

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c
+++ linux-2.6/mm/vmstat.c
@@ -885,7 +885,7 @@ static int refresh_cpu_vm_stats(void)
 }

 /*
- * Fold the data for an offline cpu into the global array.
+ * Fold the data for a cpu into the global array.
  * There cannot be any access by the offline cpu and therefore
  * synchronization is simplified.
  */
@@ -906,8 +906,9 @@ void cpu_vm_stats_fold(int cpu)
 			if (pzstats->vm_stat_diff[i]) {
 				int v;

-				v = pzstats->vm_stat_diff[i];
-				pzstats->vm_stat_diff[i] = 0;
+				do {
+					v = pzstats->vm_stat_diff[i];
+				} while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));
 				atomic_long_add(v, &zone->vm_stat[i]);
 				global_zone_diff[i] += v;
 			}
@@ -917,8 +918,9 @@ void cpu_vm_stats_fold(int cpu)
 			if (pzstats->vm_numa_event[i]) {
 				unsigned long v;

-				v = pzstats->vm_numa_event[i];
-				pzstats->vm_numa_event[i] = 0;
+				do {
+					v = pzstats->vm_numa_event[i];
+				} while (!try_cmpxchg(&pzstats->vm_numa_event[i], &v, 0));
 				zone_numa_event_add(v, zone, i);
 			}
 		}
@@ -934,8 +936,9 @@ void cpu_vm_stats_fold(int cpu)
 			if (p->vm_node_stat_diff[i]) {
 				int v;

-				v = p->vm_node_stat_diff[i];
-				p->vm_node_stat_diff[i] = 0;
+				do {
+					v = p->vm_node_stat_diff[i];
+				} while (!try_cmpxchg(&p->vm_node_stat_diff[i], &v, 0));
 				atomic_long_add(v, &pgdat->vm_stat[i]);
 				global_node_diff[i] += v;
 			}
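
For readers who want to see why a separate read and write is not safe once another CPU can touch the counter, here is a minimal userspace sketch in C11. It is not kernel code: `diff`, `racing_update()`, `fold_read_write()` and `fold_cmpxchg()` are hypothetical names, C11 `atomic_compare_exchange_strong()` stands in for the kernel's try_cmpxchg(), and the race is injected deterministically through a callback rather than by a real second CPU.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for one per-CPU differential counter. */
static atomic_int diff;

/* Simulates the owning CPU bumping its counter at an unlucky moment. */
void racing_update(void)
{
	atomic_fetch_add(&diff, 1);
}

/* Old pattern: separate read and write; an update in between is lost. */
int fold_read_write(void (*race)(void))
{
	int v = atomic_load(&diff);

	if (race)
		race();			/* lands between the read and the write */
	atomic_store(&diff, 0);		/* overwrites the racing increment */
	return v;
}

/* New pattern: the compare-exchange fails if diff changed, so we retry. */
int fold_cmpxchg(void (*race)(void))
{
	int v = atomic_load(&diff);

	if (race)
		race();
	while (!atomic_compare_exchange_strong(&diff, &v, 0))
		;			/* v was refreshed by the failed attempt */
	return v;
}

/* Usage: start from 5 pending events and inject one racing increment. */
void demo_lost_update(void)
{
	atomic_store(&diff, 5);
	assert(fold_read_write(racing_update) == 5);	/* the 6th event is lost */
	assert(atomic_load(&diff) == 0);

	atomic_store(&diff, 5);
	assert(fold_cmpxchg(racing_update) == 6);	/* the 6th event is kept */
	assert(atomic_load(&diff) == 0);
}
```

In the read/write variant five events are folded although six were counted; the compare-exchange variant notices the change and folds all six.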




2023-03-01 22:58:04

by Peter Xu

Subject: Re: [PATCH v2 09/11] mm/vmstat: use cmpxchg loop in cpu_vm_stats_fold

On Thu, Feb 09, 2023 at 12:01:59PM -0300, Marcelo Tosatti wrote:
> /*
> - * Fold the data for an offline cpu into the global array.
> + * Fold the data for a cpu into the global array.
> * There cannot be any access by the offline cpu and therefore
> * synchronization is simplified.
> */
> @@ -906,8 +906,9 @@ void cpu_vm_stats_fold(int cpu)
>  			if (pzstats->vm_stat_diff[i]) {
>  				int v;
>
> -				v = pzstats->vm_stat_diff[i];
> -				pzstats->vm_stat_diff[i] = 0;
> +				do {
> +					v = pzstats->vm_stat_diff[i];
> +				} while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));

IIUC try_cmpxchg will update "v" already, so I'd assume this'll work the
same:

while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));

Then I figured, maybe it's easier to use xchg()?
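
The three variants under discussion can be compared in a small userspace sketch. This is an illustration under assumptions, not the kernel implementation: C11 `atomic_compare_exchange_strong()` is used in place of try_cmpxchg() (both write the observed value back into the "expected" argument on failure), `atomic_exchange()` stands in for xchg(), and the `drain_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Variant 1: re-read the counter on every iteration, as in the posted patch. */
int drain_reload(atomic_int *ctr)
{
	int v;

	do {
		v = atomic_load(ctr);
	} while (!atomic_compare_exchange_strong(ctr, &v, 0));
	return v;
}

/* Variant 2: read once; a failed compare-exchange refreshes v by itself. */
int drain_no_reload(atomic_int *ctr)
{
	int v = atomic_load(ctr);

	while (!atomic_compare_exchange_strong(ctr, &v, 0))
		;
	return v;
}

/* Variant 3: unconditional swap with zero, no loop needed. */
int drain_xchg(atomic_int *ctr)
{
	return atomic_exchange(ctr, 0);
}

/* Usage: all three return the old value and leave the counter at zero. */
void demo_drain(void)
{
	atomic_int a = 7, b = 7, c = 7;

	assert(drain_reload(&a) == 7 && atomic_load(&a) == 0);
	assert(drain_no_reload(&b) == 7 && atomic_load(&b) == 0);
	assert(drain_xchg(&c) == 7 && atomic_load(&c) == 0);
}
```

Note that variant 2 still needs `v` to hold an initial read before the loop starts. Since the new value is always 0 and does not depend on the old one, the plain exchange in variant 3 expresses the same operation most directly.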

I've no knowledge at all of the cpu offline code, so sorry if this is a
naive question. But from what I understand this should not be touched by
anyone else. Reasons:

(1) cpu_vm_stats_fold() is only called in page_alloc_cpu_dead(), and the
comment says:

	/*
	 * Zero the differential counters of the dead processor
	 * so that the vm statistics are consistent.
	 *
	 * This is only okay since the processor is dead and cannot
	 * race with what we are doing.
	 */
	cpu_vm_stats_fold(cpu);

so.. I think that's what it says..

(2) If someone can modify the dead cpu's vm_stat_diff, what guarantees it
won't be e.g. boosted again right after try_cmpxchg() / xchg()
returns? What to do with the left-overs?

Thanks,

--
Peter Xu


2023-03-02 14:51:52

by Marcelo Tosatti

Subject: Re: [PATCH v2 09/11] mm/vmstat: use cmpxchg loop in cpu_vm_stats_fold

On Wed, Mar 01, 2023 at 05:57:08PM -0500, Peter Xu wrote:
> On Thu, Feb 09, 2023 at 12:01:59PM -0300, Marcelo Tosatti wrote:
> > /*
> > - * Fold the data for an offline cpu into the global array.
> > + * Fold the data for a cpu into the global array.
> > * There cannot be any access by the offline cpu and therefore
> > * synchronization is simplified.
> > */
> > @@ -906,8 +906,9 @@ void cpu_vm_stats_fold(int cpu)
> >  			if (pzstats->vm_stat_diff[i]) {
> >  				int v;
> >
> > -				v = pzstats->vm_stat_diff[i];
> > -				pzstats->vm_stat_diff[i] = 0;
> > +				do {
> > +					v = pzstats->vm_stat_diff[i];
> > +				} while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));
>
> IIUC try_cmpxchg will update "v" already, so I'd assume this'll work the
> same:
>
> while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));
>
> Then I figured, maybe it's easier to use xchg()?

Yes, fixed.

> I've no knowledge at all on cpu offline code, so sorry if this will be a
> naive question. But from what I understand this should not be touched by
> anyone else. Reasons:
>
> (1) cpu_vm_stats_fold() is only called in page_alloc_cpu_dead(), and the
> comment says:
>
> 	/*
> 	 * Zero the differential counters of the dead processor
> 	 * so that the vm statistics are consistent.
> 	 *
> 	 * This is only okay since the processor is dead and cannot
> 	 * race with what we are doing.
> 	 */
> 	cpu_vm_stats_fold(cpu);
>
> so.. I think that's what it says..

This refers to the use of this_cpu operations being performed by the
counter updates.

If both the updater and reader use atomic accesses (which is the case after patch 8:
"mm/vmstat: switch counter modification to cmpxchg"), and
CONFIG_HAVE_CMPXCHG_LOCAL is set, then the comment is stale.

Removed it.

> (2) If someone can modify the dead cpu's vm_stat_diff,

The only contexts that can modify the cpu's vm_stat_diff are:

1) The CPU itself (increases the counter).
2) cpu_vm_stats_fold (from vmstat_shepherd kernel thread), from
x -> 0 only.

So you should not be able to increase the counter after this point.
I suppose this is what this comment refers to.

> what guarantees it
> won't be e.g. boosted again right after try_cmpxchg() / xchg()
> returns? What to do with the left-overs?

If any code runs on the CPU that is being hotunplugged,
after cpu_vm_stats_fold (from page_alloc_cpu_dead), then there will
be left-overs. But such bugs would exist today as well.
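
The two writers listed above, and the left-over case, can be modelled in userspace as follows. This is a simplified sketch with hypothetical names (`cpu_diff`, `global_stat`, `cpu_count()`, `remote_fold()`): it uses a single counter, C11 atomics instead of the kernel primitives, and `atomic_fetch_add()` for the owner side where the series uses a cmpxchg-based update.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int cpu_diff;		/* per-CPU delta, written by its owner */
atomic_long global_stat;	/* global total */

/* Owner side: the CPU adds to its own delta. */
void cpu_count(int n)
{
	atomic_fetch_add(&cpu_diff, n);
}

/* Remote side: move whatever is pending (x -> 0) into the global total. */
void remote_fold(void)
{
	int v = atomic_exchange(&cpu_diff, 0);

	if (v)
		atomic_fetch_add(&global_stat, v);
}

/* Usage: an update after a fold stays pending until the next fold. */
void demo_leftover(void)
{
	cpu_count(3);
	remote_fold();		/* global = 3, delta = 0 */
	cpu_count(2);		/* owner runs again after the fold */
	assert(atomic_load(&global_stat) == 3 && atomic_load(&cpu_diff) == 2);

	remote_fold();		/* a later pass collects the left-over */
	assert(atomic_load(&global_stat) == 5 && atomic_load(&cpu_diff) == 0);
}
```

Nothing is lost as long as some later fold still visits that CPU. In this model, a left-over becomes permanent only when no further fold runs for that counter.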

Or, if that bug exists, you could replace "for_each_online_cpu" with
"for_each_cpu" here:

static void vmstat_shepherd(struct work_struct *w)
{
	int cpu;

	cpus_read_lock();
	/* Check processors whose vmstat worker threads have been disabled */
	for_each_online_cpu(cpu) {


2023-03-02 22:23:19

by Peter Xu

Subject: Re: [PATCH v2 09/11] mm/vmstat: use cmpxchg loop in cpu_vm_stats_fold

On Thu, Mar 02, 2023 at 10:55:09AM -0300, Marcelo Tosatti wrote:
> > (2) If someone can modify the dead cpu's vm_stat_diff,
>
> The only contexts that can modify the cpu's vm_stat_diff are:
>
> 1) The CPU itself (increases the counter).
> 2) cpu_vm_stats_fold (from vmstat_shepherd kernel thread), from
> x -> 0 only.

I hadn't read further when commenting, so I didn't see that
cpu_vm_stats_fold() will be reused, sorry.

Now with a reworked (and SMP-safe) cpu_vm_stats_fold() and vmstats, I'm
wondering about the possibility of merging it with refresh_cpu_vm_stats(),
since they really look similar.

IIUC the new refresh_cpu_vm_stats() logically no longer needs the small
preempt-disabled sections if a cpu_id is passed over to
cpu_vm_stats_fold(), which even seems to be a good side effect. But I'm not
sure whether I missed something.

--
Peter Xu


2023-03-03 16:40:42

by Marcelo Tosatti

Subject: Re: [PATCH v2 09/11] mm/vmstat: use cmpxchg loop in cpu_vm_stats_fold

On Thu, Mar 02, 2023 at 04:19:50PM -0500, Peter Xu wrote:
> On Thu, Mar 02, 2023 at 10:55:09AM -0300, Marcelo Tosatti wrote:
> > > (2) If someone can modify the dead cpu's vm_stat_diff,
> >
> > The only contexts that can modify the cpu's vm_stat_diff are:
> >
> > 1) The CPU itself (increases the counter).
> > 2) cpu_vm_stats_fold (from vmstat_shepherd kernel thread), from
> > x -> 0 only.
>
> I think I didn't continue reading so I didn't see cpu_vm_stats_fold() will
> be reused when commenting, sorry.
>
> Now with a reworked (and SMP-safe) cpu_vm_stats_fold() and vmstats, I'm
> wondering the possibility of merging it with refresh_cpu_vm_stats() since
> they really look similar.

Seems like a possibility. However that might require replacing

v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);

with

pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);

Which would drop the this_cpu optimization described in commit

7340a0b15280c9d902c7dd0608b8e751b5a7c403

Also you would not want the unified function to sync NUMA events
(as it would be called from NOHZ entry and exit).

See f19298b9516c1a031b34b4147773457e3efe743b

> IIUC the new refresh_cpu_vm_stats() logically doesn't need the small
> preempt disabled sections, not anymore,

Which preempt-disabled sections are you referring to?

> if with a cpu_id passed over to
> cpu_vm_stats_fold(), which seems to be even a good side effect. But not
> sure I missed something.
>
> --
> Peter Xu
>
>