LinuxLists.cc - [PATCH v2 0/3] perf bench numa: make possible testing on uneven and/or overlapping CPU ranges

2020-08-10 06:23:09

Subject: [PATCH v2 0/3] perf bench numa: make possible testing on uneven and/or overlapping CPU ranges

This series allows running the tool on some configurations
that do not conform to an assumption each node contains
nr_cpus/nr_nodes CPUs at most. Instead, the actual node-to-
CPU mapping is acquired dynamically.

patch 1 fixes the described issue
patches 2,3 are follow-up fixes

Changes since v1:
- numa01* and numa02* test names left intact;
- "2x3-convergence" fix moved out to separate patch

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>

Alexander Gordeev (3):
perf bench numa: use numa_node_to_cpus() to bind tasks to nodes
perf bench numa: fix number of processes in "2x3-convergence" test
perf bench numa: fix benchmark names

tools/perf/bench/numa.c | 64 ++++++++++++++++++++++++-------------------------
1 file changed, 31 insertions(+), 33 deletions(-)

--
1.8.3.1

2020-08-10 06:23:14

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes

It is currently assumed that each node contains at most
nr_cpus/nr_nodes CPUs and node CPU ranges do not overlap.
That assumption is generally incorrect as there are archs
where a CPU number does not depend on to its node number.

This update removes the described assumption by simply calling
numa_node_to_cpus() interface and using the returned mask for
binding CPUs to nodes. It also tightens a cpumask allocation
failure check a bit.

Cc: Satheesh Rajendran <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Balamuruhan S <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
---
tools/perf/bench/numa.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 5797253..23e224e 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -247,12 +247,13 @@ static int is_node_present(int node)
*/
static bool node_has_cpus(int node)
{
- struct bitmask *cpu = numa_allocate_cpumask();
+ struct bitmask *cpumask = numa_allocate_cpumask();
unsigned int i;

- if (cpu && !numa_node_to_cpus(node, cpu)) {
- for (i = 0; i < cpu->size; i++) {
- if (numa_bitmask_isbitset(cpu, i))
+ BUG_ON(!cpumask);
+ if (!numa_node_to_cpus(node, cpumask)) {
+ for (i = 0; i < cpumask->size; i++) {
+ if (numa_bitmask_isbitset(cpumask, i))
return true;
}
}
@@ -288,14 +289,10 @@ static cpu_set_t bind_to_cpu(int target_cpu)

static cpu_set_t bind_to_node(int target_node)
{
- int cpus_per_node = g->p.nr_cpus / nr_numa_nodes();
cpu_set_t orig_mask, mask;
int cpu;
int ret;

- BUG_ON(cpus_per_node * nr_numa_nodes() != g->p.nr_cpus);
- BUG_ON(!cpus_per_node);
-
ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
BUG_ON(ret);

@@ -305,13 +302,15 @@ static cpu_set_t bind_to_node(int target_node)
for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
CPU_SET(cpu, &mask);
} else {
- int cpu_start = (target_node + 0) * cpus_per_node;
- int cpu_stop = (target_node + 1) * cpus_per_node;
-
- BUG_ON(cpu_stop > g->p.nr_cpus);
+ struct bitmask *cpumask = numa_allocate_cpumask();

- for (cpu = cpu_start; cpu < cpu_stop; cpu++)
- CPU_SET(cpu, &mask);
+ BUG_ON(!cpumask);
+ if (!numa_node_to_cpus(target_node, cpumask)) {
+ for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
+ if (numa_bitmask_isbitset(cpumask, cpu))
+ CPU_SET(cpu, &mask);
+ }
+ }
}

ret = sched_setaffinity(0, sizeof(mask), &mask);
--
1.8.3.1

2020-08-10 06:26:03

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
---
tools/perf/bench/numa.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 23e224e..90639c9 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -1754,7 +1754,7 @@ static int run_bench_numa(const char *name, const char **argv)
{ " 1x3-convergence,", "mem", "-p", "1", "-t", "3", "-P", "512", OPT_CONV },
{ " 1x4-convergence,", "mem", "-p", "1", "-t", "4", "-P", "512", OPT_CONV },
{ " 1x6-convergence,", "mem", "-p", "1", "-t", "6", "-P", "1020", OPT_CONV },
- { " 2x3-convergence,", "mem", "-p", "3", "-t", "3", "-P", "1020", OPT_CONV },
+ { " 2x3-convergence,", "mem", "-p", "2", "-t", "3", "-P", "1020", OPT_CONV },
{ " 3x3-convergence,", "mem", "-p", "3", "-t", "3", "-P", "1020", OPT_CONV },
{ " 4x4-convergence,", "mem", "-p", "4", "-t", "4", "-P", "512", OPT_CONV },
{ " 4x4-convergence-NOTHP,",
--
1.8.3.1

2020-08-10 06:26:44

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH v2 3/3] perf bench numa: fix benchmark names

Standard benchmark names let users know the tests specifics.
For example "2x1-bw-process" name tells that two processes
one thread each are run and the RAM bandwidth is measured.

Several benchmarks names do not correspond to their actual
running configuration. Fix that and also some whitespace
and comment inconsistencies.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
---
tools/perf/bench/numa.c | 35 +++++++++++++++++------------------
1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 90639c9..3b4b63f 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -812,12 +812,12 @@ static u64 do_work(u8 *__data, long bytes, int nr, int nr_max, int loop, u64 val
}
}
} else if (!g->p.data_backwards || (nr + loop) & 1) {
+ /* Process data forwards: */

d0 = data + off;
d = data + off + 1;
d1 = data + words;

- /* Process data forwards: */
for (;;) {
if (unlikely(d >= d1))
d = data;
@@ -835,7 +835,6 @@ static u64 do_work(u8 *__data, long bytes, int nr, int nr_max, int loop, u64 val
d = data + off - 1;
d1 = data + words;

- /* Process data forwards: */
for (;;) {
if (unlikely(d < data))
d = data + words-1;
@@ -1732,12 +1731,12 @@ static int run_bench_numa(const char *name, const char **argv)
*/
static const char *tests[][MAX_ARGS] = {
/* Basic single-stream NUMA bandwidth measurements: */
- { "RAM-bw-local,", "mem", "-p", "1", "-t", "1", "-P", "1024",
+ { "RAM-bw-local,", "mem", "-p", "1", "-t", "1", "-P", "1024",
"-C" , "0", "-M", "0", OPT_BW_RAM },
{ "RAM-bw-local-NOTHP,",
"mem", "-p", "1", "-t", "1", "-P", "1024",
"-C" , "0", "-M", "0", OPT_BW_RAM_NOTHP },
- { "RAM-bw-remote,", "mem", "-p", "1", "-t", "1", "-P", "1024",
+ { "RAM-bw-remote,", "mem", "-p", "1", "-t", "1", "-P", "1024",
"-C" , "0", "-M", "1", OPT_BW_RAM },

/* 2-stream NUMA bandwidth measurements: */
@@ -1779,24 +1778,24 @@ static int run_bench_numa(const char *name, const char **argv)
"mem", "-p", "8", "-t", "1", "-P", " 512", OPT_BW_NOTHP },
{ "16x1-bw-process,", "mem", "-p", "16", "-t", "1", "-P", "256", OPT_BW },

- { " 4x1-bw-thread,", "mem", "-p", "1", "-t", "4", "-T", "256", OPT_BW },
- { " 8x1-bw-thread,", "mem", "-p", "1", "-t", "8", "-T", "256", OPT_BW },
- { "16x1-bw-thread,", "mem", "-p", "1", "-t", "16", "-T", "128", OPT_BW },
- { "32x1-bw-thread,", "mem", "-p", "1", "-t", "32", "-T", "64", OPT_BW },
+ { " 1x4-bw-thread,", "mem", "-p", "1", "-t", "4", "-T", "256", OPT_BW },
+ { " 1x8-bw-thread,", "mem", "-p", "1", "-t", "8", "-T", "256", OPT_BW },
+ { "1x16-bw-thread,", "mem", "-p", "1", "-t", "16", "-T", "128", OPT_BW },
+ { "1x32-bw-thread,", "mem", "-p", "1", "-t", "32", "-T", "64", OPT_BW },

- { " 2x3-bw-thread,", "mem", "-p", "2", "-t", "3", "-P", "512", OPT_BW },
- { " 4x4-bw-thread,", "mem", "-p", "4", "-t", "4", "-P", "512", OPT_BW },
- { " 4x6-bw-thread,", "mem", "-p", "4", "-t", "6", "-P", "512", OPT_BW },
- { " 4x8-bw-thread,", "mem", "-p", "4", "-t", "8", "-P", "512", OPT_BW },
- { " 4x8-bw-thread-NOTHP,",
+ { " 2x3-bw-process,", "mem", "-p", "2", "-t", "3", "-P", "512", OPT_BW },
+ { " 4x4-bw-process,", "mem", "-p", "4", "-t", "4", "-P", "512", OPT_BW },
+ { " 4x6-bw-process,", "mem", "-p", "4", "-t", "6", "-P", "512", OPT_BW },
+ { " 4x8-bw-process,", "mem", "-p", "4", "-t", "8", "-P", "512", OPT_BW },
+ { " 4x8-bw-process-NOTHP,",
"mem", "-p", "4", "-t", "8", "-P", "512", OPT_BW_NOTHP },
- { " 3x3-bw-thread,", "mem", "-p", "3", "-t", "3", "-P", "512", OPT_BW },
- { " 5x5-bw-thread,", "mem", "-p", "5", "-t", "5", "-P", "512", OPT_BW },
+ { " 3x3-bw-process,", "mem", "-p", "3", "-t", "3", "-P", "512", OPT_BW },
+ { " 5x5-bw-process,", "mem", "-p", "5", "-t", "5", "-P", "512", OPT_BW },

- { "2x16-bw-thread,", "mem", "-p", "2", "-t", "16", "-P", "512", OPT_BW },
- { "1x32-bw-thread,", "mem", "-p", "1", "-t", "32", "-P", "2048", OPT_BW },
+ { "2x16-bw-process,", "mem", "-p", "2", "-t", "16", "-P", "512", OPT_BW },
+ { "1x32-bw-process,", "mem", "-p", "1", "-t", "32", "-P", "2048", OPT_BW },

- { "numa02-bw,", "mem", "-p", "1", "-t", "32", "-T", "32", OPT_BW },
+ { "numa02-bw,", "mem", "-p", "1", "-t", "32", "-T", "32", OPT_BW },
{ "numa02-bw-NOTHP,", "mem", "-p", "1", "-t", "32", "-T", "32", OPT_BW_NOTHP },
{ "numa01-bw-thread,", "mem", "-p", "2", "-t", "16", "-T", "192", OPT_BW },
{ "numa01-bw-thread-NOTHP,",
--
1.8.3.1

2020-08-11 07:30:21

by Namhyung Kim

[permalink] [raw]

Subject: Re: [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes

Hello,

On Mon, Aug 10, 2020 at 3:22 PM Alexander Gordeev
<[email protected]> wrote:
>
> It is currently assumed that each node contains at most
> nr_cpus/nr_nodes CPUs and node CPU ranges do not overlap.
> That assumption is generally incorrect as there are archs
> where a CPU number does not depend on to its node number.
>
> This update removes the described assumption by simply calling
> numa_node_to_cpus() interface and using the returned mask for
> binding CPUs to nodes. It also tightens a cpumask allocation
> failure check a bit.
>
> Cc: Satheesh Rajendran <[email protected]>
> Cc: Srikar Dronamraju <[email protected]>
> Cc: Naveen N. Rao <[email protected]>
> Cc: Balamuruhan S <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Signed-off-by: Alexander Gordeev <[email protected]>
> ---
> tools/perf/bench/numa.c | 27 +++++++++++++--------------
> 1 file changed, 13 insertions(+), 14 deletions(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 5797253..23e224e 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -247,12 +247,13 @@ static int is_node_present(int node)
> */
> static bool node_has_cpus(int node)
> {
> - struct bitmask *cpu = numa_allocate_cpumask();
> + struct bitmask *cpumask = numa_allocate_cpumask();
> unsigned int i;
>
> - if (cpu && !numa_node_to_cpus(node, cpu)) {
> - for (i = 0; i < cpu->size; i++) {
> - if (numa_bitmask_isbitset(cpu, i))
> + BUG_ON(!cpumask);
> + if (!numa_node_to_cpus(node, cpumask)) {
> + for (i = 0; i < cpumask->size; i++) {
> + if (numa_bitmask_isbitset(cpumask, i))
> return true;
> }
> }
> @@ -288,14 +289,10 @@ static cpu_set_t bind_to_cpu(int target_cpu)
>
> static cpu_set_t bind_to_node(int target_node)
> {
> - int cpus_per_node = g->p.nr_cpus / nr_numa_nodes();
> cpu_set_t orig_mask, mask;
> int cpu;
> int ret;
>
> - BUG_ON(cpus_per_node * nr_numa_nodes() != g->p.nr_cpus);
> - BUG_ON(!cpus_per_node);
> -
> ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> BUG_ON(ret);
>
> @@ -305,13 +302,15 @@ static cpu_set_t bind_to_node(int target_node)
> for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> CPU_SET(cpu, &mask);
> } else {
> - int cpu_start = (target_node + 0) * cpus_per_node;
> - int cpu_stop = (target_node + 1) * cpus_per_node;
> -
> - BUG_ON(cpu_stop > g->p.nr_cpus);
> + struct bitmask *cpumask = numa_allocate_cpumask();
>
> - for (cpu = cpu_start; cpu < cpu_stop; cpu++)
> - CPU_SET(cpu, &mask);
> + BUG_ON(!cpumask);
> + if (!numa_node_to_cpus(target_node, cpumask)) {
> + for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
> + if (numa_bitmask_isbitset(cpumask, cpu))
> + CPU_SET(cpu, &mask);
> + }
> + }

It seems you need to call numa_free_cpumask() for both functions.

Thanks
Namhyung

> }
>
> ret = sched_setaffinity(0, sizeof(mask), &mask);
> --
> 1.8.3.1
>

2020-08-11 07:30:48

by Namhyung Kim

[permalink] [raw]

Subject: Re: [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test

On Mon, Aug 10, 2020 at 3:22 PM Alexander Gordeev
<[email protected]> wrote:
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Signed-off-by: Alexander Gordeev <[email protected]>

Acked-by: Namhyung Kim <[email protected]>

Thanks
Namhyung

> ---
> tools/perf/bench/numa.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 23e224e..90639c9 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -1754,7 +1754,7 @@ static int run_bench_numa(const char *name, const char **argv)
> { " 1x3-convergence,", "mem", "-p", "1", "-t", "3", "-P", "512", OPT_CONV },
> { " 1x4-convergence,", "mem", "-p", "1", "-t", "4", "-P", "512", OPT_CONV },
> { " 1x6-convergence,", "mem", "-p", "1", "-t", "6", "-P", "1020", OPT_CONV },
> - { " 2x3-convergence,", "mem", "-p", "3", "-t", "3", "-P", "1020", OPT_CONV },
> + { " 2x3-convergence,", "mem", "-p", "2", "-t", "3", "-P", "1020", OPT_CONV },
> { " 3x3-convergence,", "mem", "-p", "3", "-t", "3", "-P", "1020", OPT_CONV },
> { " 4x4-convergence,", "mem", "-p", "4", "-t", "4", "-P", "512", OPT_CONV },
> { " 4x4-convergence-NOTHP,",
> --
> 1.8.3.1
>

2020-08-11 08:12:38

by Alexander Gordeev

[permalink] [raw]

Subject: Re: [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes

On Tue, Aug 11, 2020 at 04:26:50PM +0900, Namhyung Kim wrote:
> Hello,
>
> On Mon, Aug 10, 2020 at 3:22 PM Alexander Gordeev
> <[email protected]> wrote:
> >
> > It is currently assumed that each node contains at most
> > nr_cpus/nr_nodes CPUs and node CPU ranges do not overlap.
> > That assumption is generally incorrect as there are archs
> > where a CPU number does not depend on to its node number.
> >
> > This update removes the described assumption by simply calling
> > numa_node_to_cpus() interface and using the returned mask for
> > binding CPUs to nodes. It also tightens a cpumask allocation
> > failure check a bit.
> >
> > Cc: Satheesh Rajendran <[email protected]>
> > Cc: Srikar Dronamraju <[email protected]>
> > Cc: Naveen N. Rao <[email protected]>
> > Cc: Balamuruhan S <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: Alexander Shishkin <[email protected]>
> > Cc: Jiri Olsa <[email protected]>
> > Cc: Namhyung Kim <[email protected]>
> > Signed-off-by: Alexander Gordeev <[email protected]>
> > ---
> > tools/perf/bench/numa.c | 27 +++++++++++++--------------
> > 1 file changed, 13 insertions(+), 14 deletions(-)
> >
> > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > index 5797253..23e224e 100644
> > --- a/tools/perf/bench/numa.c
> > +++ b/tools/perf/bench/numa.c
> > @@ -247,12 +247,13 @@ static int is_node_present(int node)
> > */
> > static bool node_has_cpus(int node)
> > {
> > - struct bitmask *cpu = numa_allocate_cpumask();
> > + struct bitmask *cpumask = numa_allocate_cpumask();
> > unsigned int i;
> >
> > - if (cpu && !numa_node_to_cpus(node, cpu)) {
> > - for (i = 0; i < cpu->size; i++) {
> > - if (numa_bitmask_isbitset(cpu, i))
> > + BUG_ON(!cpumask);
> > + if (!numa_node_to_cpus(node, cpumask)) {
> > + for (i = 0; i < cpumask->size; i++) {
> > + if (numa_bitmask_isbitset(cpumask, i))
> > return true;
> > }
> > }
> > @@ -288,14 +289,10 @@ static cpu_set_t bind_to_cpu(int target_cpu)
> >
> > static cpu_set_t bind_to_node(int target_node)
> > {
> > - int cpus_per_node = g->p.nr_cpus / nr_numa_nodes();
> > cpu_set_t orig_mask, mask;
> > int cpu;
> > int ret;
> >
> > - BUG_ON(cpus_per_node * nr_numa_nodes() != g->p.nr_cpus);
> > - BUG_ON(!cpus_per_node);
> > -
> > ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> > BUG_ON(ret);
> >
> > @@ -305,13 +302,15 @@ static cpu_set_t bind_to_node(int target_node)
> > for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> > CPU_SET(cpu, &mask);
> > } else {
> > - int cpu_start = (target_node + 0) * cpus_per_node;
> > - int cpu_stop = (target_node + 1) * cpus_per_node;
> > -
> > - BUG_ON(cpu_stop > g->p.nr_cpus);
> > + struct bitmask *cpumask = numa_allocate_cpumask();
> >
> > - for (cpu = cpu_start; cpu < cpu_stop; cpu++)
> > - CPU_SET(cpu, &mask);
> > + BUG_ON(!cpumask);
> > + if (!numa_node_to_cpus(target_node, cpumask)) {
> > + for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
> > + if (numa_bitmask_isbitset(cpumask, cpu))
> > + CPU_SET(cpu, &mask);
> > + }
> > + }
>
> It seems you need to call numa_free_cpumask() for both functions.

Well, the whole approach to memory allocation is rather relaxed
troughout the code. I.e cpumasks do not get deallocated (*),
strdup() return values are not checked etc.

If it worth fixing all that then it would be a separate effort,
as far as I am concerned.

> Thanks
> Namhyung
>
> > }
> >
> > ret = sched_setaffinity(0, sizeof(mask), &mask);
> > --
> > 1.8.3.1
> >

2020-08-12 12:09:07

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test

Em Tue, Aug 11, 2020 at 04:27:35PM +0900, Namhyung Kim escreveu:
> On Mon, Aug 10, 2020 at 3:22 PM Alexander Gordeev
> <[email protected]> wrote:
> >
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: Alexander Shishkin <[email protected]>
> > Cc: Jiri Olsa <[email protected]>
> > Cc: Namhyung Kim <[email protected]>
> > Signed-off-by: Alexander Gordeev <[email protected]>
>
> Acked-by: Namhyung Kim <[email protected]>

Thanks, applied,

- Arnaldo

> Thanks
> Namhyung
>
> > ---
> > tools/perf/bench/numa.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > index 23e224e..90639c9 100644
> > --- a/tools/perf/bench/numa.c
> > +++ b/tools/perf/bench/numa.c
> > @@ -1754,7 +1754,7 @@ static int run_bench_numa(const char *name, const char **argv)
> > { " 1x3-convergence,", "mem", "-p", "1", "-t", "3", "-P", "512", OPT_CONV },
> > { " 1x4-convergence,", "mem", "-p", "1", "-t", "4", "-P", "512", OPT_CONV },
> > { " 1x6-convergence,", "mem", "-p", "1", "-t", "6", "-P", "1020", OPT_CONV },
> > - { " 2x3-convergence,", "mem", "-p", "3", "-t", "3", "-P", "1020", OPT_CONV },
> > + { " 2x3-convergence,", "mem", "-p", "2", "-t", "3", "-P", "1020", OPT_CONV },
> > { " 3x3-convergence,", "mem", "-p", "3", "-t", "3", "-P", "1020", OPT_CONV },
> > { " 4x4-convergence,", "mem", "-p", "4", "-t", "4", "-P", "512", OPT_CONV },
> > { " 4x4-convergence-NOTHP,",
> > --
> > 1.8.3.1
> >

--

- Arnaldo

2020-08-12 12:12:02

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH v2 3/3] perf bench numa: fix benchmark names

Em Mon, Aug 10, 2020 at 08:22:00AM +0200, Alexander Gordeev escreveu:
> Standard benchmark names let users know the tests specifics.
> For example "2x1-bw-process" name tells that two processes
> one thread each are run and the RAM bandwidth is measured.
>
> Several benchmarks names do not correspond to their actual
> running configuration. Fix that and also some whitespace
> and comment inconsistencies.

Looks, ok, applied.

- Arnaldo

> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Signed-off-by: Alexander Gordeev <[email protected]>
> ---
> tools/perf/bench/numa.c | 35 +++++++++++++++++------------------
> 1 file changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 90639c9..3b4b63f 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -812,12 +812,12 @@ static u64 do_work(u8 *__data, long bytes, int nr, int nr_max, int loop, u64 val
> }
> }
> } else if (!g->p.data_backwards || (nr + loop) & 1) {
> + /* Process data forwards: */
>
> d0 = data + off;
> d = data + off + 1;
> d1 = data + words;
>
> - /* Process data forwards: */
> for (;;) {
> if (unlikely(d >= d1))
> d = data;
> @@ -835,7 +835,6 @@ static u64 do_work(u8 *__data, long bytes, int nr, int nr_max, int loop, u64 val
> d = data + off - 1;
> d1 = data + words;
>
> - /* Process data forwards: */
> for (;;) {
> if (unlikely(d < data))
> d = data + words-1;
> @@ -1732,12 +1731,12 @@ static int run_bench_numa(const char *name, const char **argv)
> */
> static const char *tests[][MAX_ARGS] = {
> /* Basic single-stream NUMA bandwidth measurements: */
> - { "RAM-bw-local,", "mem", "-p", "1", "-t", "1", "-P", "1024",
> + { "RAM-bw-local,", "mem", "-p", "1", "-t", "1", "-P", "1024",
> "-C" , "0", "-M", "0", OPT_BW_RAM },
> { "RAM-bw-local-NOTHP,",
> "mem", "-p", "1", "-t", "1", "-P", "1024",
> "-C" , "0", "-M", "0", OPT_BW_RAM_NOTHP },
> - { "RAM-bw-remote,", "mem", "-p", "1", "-t", "1", "-P", "1024",
> + { "RAM-bw-remote,", "mem", "-p", "1", "-t", "1", "-P", "1024",
> "-C" , "0", "-M", "1", OPT_BW_RAM },
>
> /* 2-stream NUMA bandwidth measurements: */
> @@ -1779,24 +1778,24 @@ static int run_bench_numa(const char *name, const char **argv)
> "mem", "-p", "8", "-t", "1", "-P", " 512", OPT_BW_NOTHP },
> { "16x1-bw-process,", "mem", "-p", "16", "-t", "1", "-P", "256", OPT_BW },
>
> - { " 4x1-bw-thread,", "mem", "-p", "1", "-t", "4", "-T", "256", OPT_BW },
> - { " 8x1-bw-thread,", "mem", "-p", "1", "-t", "8", "-T", "256", OPT_BW },
> - { "16x1-bw-thread,", "mem", "-p", "1", "-t", "16", "-T", "128", OPT_BW },
> - { "32x1-bw-thread,", "mem", "-p", "1", "-t", "32", "-T", "64", OPT_BW },
> + { " 1x4-bw-thread,", "mem", "-p", "1", "-t", "4", "-T", "256", OPT_BW },
> + { " 1x8-bw-thread,", "mem", "-p", "1", "-t", "8", "-T", "256", OPT_BW },
> + { "1x16-bw-thread,", "mem", "-p", "1", "-t", "16", "-T", "128", OPT_BW },
> + { "1x32-bw-thread,", "mem", "-p", "1", "-t", "32", "-T", "64", OPT_BW },
>
> - { " 2x3-bw-thread,", "mem", "-p", "2", "-t", "3", "-P", "512", OPT_BW },
> - { " 4x4-bw-thread,", "mem", "-p", "4", "-t", "4", "-P", "512", OPT_BW },
> - { " 4x6-bw-thread,", "mem", "-p", "4", "-t", "6", "-P", "512", OPT_BW },
> - { " 4x8-bw-thread,", "mem", "-p", "4", "-t", "8", "-P", "512", OPT_BW },
> - { " 4x8-bw-thread-NOTHP,",
> + { " 2x3-bw-process,", "mem", "-p", "2", "-t", "3", "-P", "512", OPT_BW },
> + { " 4x4-bw-process,", "mem", "-p", "4", "-t", "4", "-P", "512", OPT_BW },
> + { " 4x6-bw-process,", "mem", "-p", "4", "-t", "6", "-P", "512", OPT_BW },
> + { " 4x8-bw-process,", "mem", "-p", "4", "-t", "8", "-P", "512", OPT_BW },
> + { " 4x8-bw-process-NOTHP,",
> "mem", "-p", "4", "-t", "8", "-P", "512", OPT_BW_NOTHP },
> - { " 3x3-bw-thread,", "mem", "-p", "3", "-t", "3", "-P", "512", OPT_BW },
> - { " 5x5-bw-thread,", "mem", "-p", "5", "-t", "5", "-P", "512", OPT_BW },
> + { " 3x3-bw-process,", "mem", "-p", "3", "-t", "3", "-P", "512", OPT_BW },
> + { " 5x5-bw-process,", "mem", "-p", "5", "-t", "5", "-P", "512", OPT_BW },
>
> - { "2x16-bw-thread,", "mem", "-p", "2", "-t", "16", "-P", "512", OPT_BW },
> - { "1x32-bw-thread,", "mem", "-p", "1", "-t", "32", "-P", "2048", OPT_BW },
> + { "2x16-bw-process,", "mem", "-p", "2", "-t", "16", "-P", "512", OPT_BW },
> + { "1x32-bw-process,", "mem", "-p", "1", "-t", "32", "-P", "2048", OPT_BW },
>
> - { "numa02-bw,", "mem", "-p", "1", "-t", "32", "-T", "32", OPT_BW },
> + { "numa02-bw,", "mem", "-p", "1", "-t", "32", "-T", "32", OPT_BW },
> { "numa02-bw-NOTHP,", "mem", "-p", "1", "-t", "32", "-T", "32", OPT_BW_NOTHP },
> { "numa01-bw-thread,", "mem", "-p", "2", "-t", "16", "-T", "192", OPT_BW },
> { "numa01-bw-thread-NOTHP,",
> --
> 1.8.3.1
>

--

- Arnaldo

2020-08-13 11:32:13

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus()

Couple numa_allocate_cpumask() and numa_free_cpumask() functions

Signed-off-by: Alexander Gordeev <[email protected]>
---
tools/perf/bench/numa.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 31e2601..9066511 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -248,16 +248,21 @@ static int is_node_present(int node)
static bool node_has_cpus(int node)
{
struct bitmask *cpu = numa_allocate_cpumask();
+ bool ret = false; /* fall back to nocpus */
unsigned int i;

- if (cpu && !numa_node_to_cpus(node, cpu)) {
+ BUG_ON(!cpu);
+ if (!numa_node_to_cpus(node, cpu)) {
for (i = 0; i < cpu->size; i++) {
- if (numa_bitmask_isbitset(cpu, i))
- return true;
+ if (numa_bitmask_isbitset(cpu, i)) {
+ ret = true;
+ break;
+ }
}
}
+ numa_free_cpumask(cpu);

- return false; /* lets fall back to nocpus safely */
+ return ret;
}

static cpu_set_t bind_to_cpu(int target_cpu)
--
1.8.3.1

2020-08-13 11:34:02

by Alexander Gordeev

[permalink] [raw]

Subject: [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes

It is currently assumed that each node contains at most
nr_cpus/nr_nodes CPUs and nodes' CPU ranges do not overlap.
That assumption is generally incorrect as there are archs
where a CPU number does not depend on to its node number.

This update removes the described assumption by simply calling
numa_node_to_cpus() interface and using the returned mask for
binding CPUs to nodes.

Also, variable types and names made consistent in functions
using cpumask.

Cc: Satheesh Rajendran <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Balamuruhan S <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
---
tools/perf/bench/numa.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 9066511..6d5c890 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -247,20 +247,20 @@ static int is_node_present(int node)
*/
static bool node_has_cpus(int node)
{
- struct bitmask *cpu = numa_allocate_cpumask();
+ struct bitmask *cpumask = numa_allocate_cpumask();
bool ret = false; /* fall back to nocpus */
- unsigned int i;
+ int cpu;

- BUG_ON(!cpu);
- if (!numa_node_to_cpus(node, cpu)) {
- for (i = 0; i < cpu->size; i++) {
- if (numa_bitmask_isbitset(cpu, i)) {
+ BUG_ON(!cpumask);
+ if (!numa_node_to_cpus(node, cpumask)) {
+ for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
+ if (numa_bitmask_isbitset(cpumask, cpu)) {
ret = true;
break;
}
}
}
- numa_free_cpumask(cpu);
+ numa_free_cpumask(cpumask);

return ret;
}
@@ -293,14 +293,10 @@ static cpu_set_t bind_to_cpu(int target_cpu)

static cpu_set_t bind_to_node(int target_node)
{
- int cpus_per_node = g->p.nr_cpus / nr_numa_nodes();
cpu_set_t orig_mask, mask;
int cpu;
int ret;

- BUG_ON(cpus_per_node * nr_numa_nodes() != g->p.nr_cpus);
- BUG_ON(!cpus_per_node);
-
ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
BUG_ON(ret);

@@ -310,13 +306,16 @@ static cpu_set_t bind_to_node(int target_node)
for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
CPU_SET(cpu, &mask);
} else {
- int cpu_start = (target_node + 0) * cpus_per_node;
- int cpu_stop = (target_node + 1) * cpus_per_node;
-
- BUG_ON(cpu_stop > g->p.nr_cpus);
+ struct bitmask *cpumask = numa_allocate_cpumask();

- for (cpu = cpu_start; cpu < cpu_stop; cpu++)
- CPU_SET(cpu, &mask);
+ BUG_ON(!cpumask);
+ if (!numa_node_to_cpus(target_node, cpumask)) {
+ for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
+ if (numa_bitmask_isbitset(cpumask, cpu))
+ CPU_SET(cpu, &mask);
+ }
+ }
+ numa_free_cpumask(cpumask);
}

ret = sched_setaffinity(0, sizeof(mask), &mask);
--
1.8.3.1

2020-08-13 12:09:17

by Srikar Dronamraju

[permalink] [raw]

Subject: Re: [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes

* Alexander Gordeev <[email protected]> [2020-08-13 13:32:48]:

> It is currently assumed that each node contains at most
> nr_cpus/nr_nodes CPUs and nodes' CPU ranges do not overlap.
> That assumption is generally incorrect as there are archs
> where a CPU number does not depend on to its node number.
>
> This update removes the described assumption by simply calling
> numa_node_to_cpus() interface and using the returned mask for
> binding CPUs to nodes.
>
> Also, variable types and names made consistent in functions
> using cpumask.
>
> Cc: Satheesh Rajendran <[email protected]>
> Cc: Srikar Dronamraju <[email protected]>
> Cc: Naveen N. Rao <[email protected]>
> Cc: Balamuruhan S <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Signed-off-by: Alexander Gordeev <[email protected]>
> ---
> @@ -310,13 +306,16 @@ static cpu_set_t bind_to_node(int target_node)
> for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> CPU_SET(cpu, &mask);
> } else {
> - int cpu_start = (target_node + 0) * cpus_per_node;
> - int cpu_stop = (target_node + 1) * cpus_per_node;
> -
> - BUG_ON(cpu_stop > g->p.nr_cpus);
> + struct bitmask *cpumask = numa_allocate_cpumask();
>
> - for (cpu = cpu_start; cpu < cpu_stop; cpu++)
> - CPU_SET(cpu, &mask);
> + BUG_ON(!cpumask);
> + if (!numa_node_to_cpus(target_node, cpumask)) {
> + for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
> + if (numa_bitmask_isbitset(cpumask, cpu))
> + CPU_SET(cpu, &mask);
> + }
> + }
> + numa_free_cpumask(cpumask);
> }
>
> ret = sched_setaffinity(0, sizeof(mask), &mask);
> --
> 1.8.3.1
>

Looks good to me.

Reviewed-by: Srikar Dronamraju <[email protected]>

--
Thanks and Regards
Srikar Dronamraju

2020-08-13 12:10:21

by Srikar Dronamraju

[permalink] [raw]

Subject: Re: [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus()

* Alexander Gordeev <[email protected]> [2020-08-13 13:30:42]:

> Couple numa_allocate_cpumask() and numa_free_cpumask() functions
>
> Signed-off-by: Alexander Gordeev <[email protected]>
> ---
> tools/perf/bench/numa.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 31e2601..9066511 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -248,16 +248,21 @@ static int is_node_present(int node)
> static bool node_has_cpus(int node)
> {
> struct bitmask *cpu = numa_allocate_cpumask();
> + bool ret = false; /* fall back to nocpus */
> unsigned int i;
>
> - if (cpu && !numa_node_to_cpus(node, cpu)) {
> + BUG_ON(!cpu);
> + if (!numa_node_to_cpus(node, cpu)) {
> for (i = 0; i < cpu->size; i++) {
> - if (numa_bitmask_isbitset(cpu, i))
> - return true;
> + if (numa_bitmask_isbitset(cpu, i)) {
> + ret = true;
> + break;
> + }
> }
> }
> + numa_free_cpumask(cpu);
>
> - return false; /* lets fall back to nocpus safely */
> + return ret;
> }
>
> static cpu_set_t bind_to_cpu(int target_cpu)
> --
> 1.8.3.1
>

Looks good to me.

Reviewed-by: Srikar Dronamraju <[email protected]>

--
Thanks and Regards
Srikar Dronamraju

2020-08-13 13:04:23

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus()

Em Thu, Aug 13, 2020 at 05:36:49PM +0530, Srikar Dronamraju escreveu:
> Looks good to me.
>
> Reviewed-by: Srikar Dronamraju <[email protected]>

Thanks, applied.

- Arnaldo

2020-08-13 13:04:46

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes

Em Thu, Aug 13, 2020 at 05:37:38PM +0530, Srikar Dronamraju escreveu:
>
> Looks good to me.
>
> Reviewed-by: Srikar Dronamraju <[email protected]>

Thanks, applied.

- Arnaldo