LinuxLists.cc - [PATCH v3 0/2] Fixup for discontiguous/sparse numa nodes

2017-08-21 10:17:06

Subject: [PATCH v3 0/2] Fixup for discontiguous/sparse numa nodes

From: Satheesh Rajendran <[email protected]>

Certain systems would have sparse/discontinguous
numa nodes.
perf bench numa doesnt work well on such nodes.
1. It shows wrong values.
2. It can hang.
3. It can show redundant information for non-existant nodes.

#numactl -H
available: 2 nodes (0,8)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 61352 MB
node 0 free: 57168 MB
node 8 cpus: 8 9 10 11 12 13 14 15
node 8 size: 65416 MB
node 8 free: 36593 MB
node distances:
node 0 8
0: 10 40
8: 40 10

Scenario 1:

Before Fix:
# perf bench numa mem --no-data_rand_walk -p 2 -t 20 -G 0 -P 3072 -T 0 -l 50 -c -s 1000
...
...
# 40 tasks will execute (on 9 nodes, 16 CPUs): ----> Wrong number of nodes
...
# 2.0% [0.2 mins] 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 4/1 [ 4/2 ] l: 0-0 ( 0) ----> Shows info on non-existant nodes.

After Fix:
# ./perf bench numa mem --no-data_rand_walk -p 2 -t 20 -G 0 -P 3072 -T 0 -l 50 -c -s 1000
...
...
# 40 tasks will execute (on 2 nodes, 16 CPUs):
...
# 2.0% [0.2 mins] 9/1 0/0 [ 9/1 ] l: 0-0 ( 0)
# 4.0% [0.4 mins] 21/2 19/1 [ 2/3 ] l: 0-1 ( 1) {1-2}

Scenario 2:

Before Fix:
# perf bench numa all
# Running numa/mem benchmark...
....
...
# Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp 1 --no-data_rand_walk"
perf: bench/numa.c:306: bind_to_memnode: Assertion `!(ret)' failed. ------------> Got hung

After Fix:
# ./perf bench numa all
# Running numa/mem benchmark...
....
...
# Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp 1 --no-data_rand_walk"

# NOTE: ignoring bind NODEs starting at NODE#1
# NOTE: 0 tasks mem-bound, 1 tasks unbound
20.017 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.006 secs average thread-runtime
0.043 % difference between max/avg runtime
413.794 GB data processed, per thread
413.794 GB data processed, total
0.048 nsecs/byte/thread runtime
20.672 GB/sec/thread speed
20.672 GB/sec total speed

Changes in v2:
Fixed review comments for function names and alloc failure handle

Changes in v3:
Coding Style fixes.

Satheesh Rajendran (2):
perf/bench/numa: Add functions to detect sparse numa nodes
perf/bench/numa: Handle discontiguous/sparse numa nodes

tools/perf/bench/numa.c | 61 +++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 54 insertions(+), 7 deletions(-)

--
2.7.4

2017-08-21 10:16:47

by Satheesh Rajendran

[permalink] [raw]

Subject: [PATCH v3 1/2] perf/bench/numa: Add functions to detect sparse numa nodes

From: Satheesh Rajendran <[email protected]>

Added functions 1) to get a count of all nodes that are exposed to
userspace. These nodes could be memoryless cpu nodes or cpuless memory
nodes, 2) to check given node is present and 3) to check given
node has cpus

This information can be used to handle sparse/discontiguous nodes.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Satheesh Rajendran <[email protected]>
Signed-off-by: Balamuruhan S <[email protected]>
---
tools/perf/bench/numa.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 469d65b..2483174 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -215,6 +215,50 @@ static const char * const numa_usage[] = {
NULL
};

+/*
+ * To get number of numa nodes present.
+ */
+static int nr_numa_nodes(void)
+{
+ int i, nr_nodes = 0;
+
+ for (i = 0; i < g->p.nr_nodes; i++) {
+ if (numa_bitmask_isbitset(numa_nodes_ptr, i))
+ nr_nodes++;
+ }
+
+ return nr_nodes;
+}
+
+/*
+ * To check if given numa node is present.
+ */
+static int is_node_present(int node)
+{
+ return numa_bitmask_isbitset(numa_nodes_ptr, node);
+}
+
+/*
+ * To check given numa node has cpus.
+ */
+static bool node_has_cpus(int node)
+{
+ struct bitmask *cpu = numa_allocate_cpumask();
+ unsigned int i;
+
+ if (cpu == NULL)
+ return false; /* lets fall back to nocpus safely */
+
+ if (numa_node_to_cpus(node, cpu) == 0) {
+ for (i = 0; i < cpu->size; i++) {
+ if (numa_bitmask_isbitset(cpu, i))
+ return true;
+ }
+ }
+
+ return false;
+}
+
static cpu_set_t bind_to_cpu(int target_cpu)
{
cpu_set_t orig_mask, mask;
--
2.7.4

2017-08-21 10:17:28

by Satheesh Rajendran

[permalink] [raw]

Subject: [PATCH v3 2/2] perf/bench/numa: Handle discontiguous/sparse numa nodes

From: Satheesh Rajendran <[email protected]>

Certain systems are designed to have sparse/discontiguous nodes.
On such systems, perf bench numa hangs, shows wrong number of nodes
and shows values for non-existent nodes. Handle this by only
taking nodes that are exposed by kernel to userspace.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Satheesh Rajendran <[email protected]>
Signed-off-by: Balamuruhan S <[email protected]>
---
tools/perf/bench/numa.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 2483174..d4cccc4 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -287,12 +287,12 @@ static cpu_set_t bind_to_cpu(int target_cpu)

static cpu_set_t bind_to_node(int target_node)
{
- int cpus_per_node = g->p.nr_cpus/g->p.nr_nodes;
+ int cpus_per_node = g->p.nr_cpus/nr_numa_nodes();
cpu_set_t orig_mask, mask;
int cpu;
int ret;

- BUG_ON(cpus_per_node*g->p.nr_nodes != g->p.nr_cpus);
+ BUG_ON(cpus_per_node*nr_numa_nodes() != g->p.nr_cpus);
BUG_ON(!cpus_per_node);

ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
@@ -692,7 +692,7 @@ static int parse_setup_node_list(void)
int i;

for (i = 0; i < mul; i++) {
- if (t >= g->p.nr_tasks) {
+ if (t >= g->p.nr_tasks || !node_has_cpus(bind_node)) {
printf("\n# NOTE: ignoring bind NODEs starting at NODE#%d\n", bind_node);
goto out;
}
@@ -973,6 +973,7 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
int node;
int cpu;
int t;
+ int processes;

if (!g->p.show_convergence && !g->p.measure_convergence)
return;
@@ -1007,13 +1008,14 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
sum = 0;

for (node = 0; node < g->p.nr_nodes; node++) {
+ if (!is_node_present(node))
+ continue;
nr = nodes[node];
nr_min = min(nr, nr_min);
nr_max = max(nr, nr_max);
sum += nr;
}
BUG_ON(nr_min > nr_max);
-
BUG_ON(sum > g->p.nr_tasks);

if (0 && (sum < g->p.nr_tasks))
@@ -1027,8 +1029,9 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
process_groups = 0;

for (node = 0; node < g->p.nr_nodes; node++) {
- int processes = count_node_processes(node);
-
+ if (!is_node_present(node))
+ continue;
+ processes = count_node_processes(node);
nr = nodes[node];
tprintf(" %2d/%-2d", nr, processes);

@@ -1334,7 +1337,7 @@ static void print_summary(void)

printf("\n ###\n");
printf(" # %d %s will execute (on %d nodes, %d CPUs):\n",
- g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" : "tasks", g->p.nr_nodes, g->p.nr_cpus);
+ g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" : "tasks", nr_numa_nodes(), g->p.nr_cpus);
printf(" # %5dx %5ldMB global shared mem operations\n",
g->p.nr_loops, g->p.bytes_global/1024/1024);
printf(" # %5dx %5ldMB process shared mem operations\n",
--
2.7.4

2017-09-19 09:35:01

by Satheesh Rajendran

[permalink] [raw]

Subject: Re: [PATCH v3 0/2] Fixup for discontiguous/sparse numa nodes

Hi Arnaldo,

Please let me know if any further comments.
Thanks in advance :-)

Regards,
-Satheesh.On Mon, 2017-08-21 at 15:45 +0530, [email protected] wrote:
> From: Satheesh Rajendran <[email protected]>
>
> Certain systems would have sparse/discontinguous
> numa nodes.
> perf bench numa doesnt work well on such nodes.
> 1. It shows wrong values.
> 2. It can hang.
> 3. It can show redundant information for non-existant nodes.
>
> #numactl -H
> available: 2 nodes (0,8)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 61352 MB
> node 0 free: 57168 MB
> node 8 cpus: 8 9 10 11 12 13 14 15
> node 8 size: 65416 MB
> node 8 free: 36593 MB
> node distances:
> node   0   8
> 0:  10  40
> 8:  40  10
>
> Scenario 1:
>
> Before Fix:
> # perf bench numa mem --no-data_rand_walk -p 2 -t 20 -G 0 -P 3072 -T
> 0 -l 50 -c -s 1000
> ...
> ...
> # 40 tasks will execute (on 9 nodes, 16 CPUs): ----> Wrong number of
> nodes
> ...
> #    2.0%  [0.2
> mins]  1/1   0/0   0/0   0/0   0/0   0/0   0/0   0/0   4/1  [ 4/2 ]
> l:  0-0   (  0) ----> Shows info on non-existant nodes.
>
> After Fix:
> # ./perf bench numa mem --no-data_rand_walk -p 2 -t 20 -G 0 -P 3072
> -T 0 -l 50 -c -s 1000
> ...
> ...
> # 40 tasks will execute (on 2 nodes, 16 CPUs):
> ...
> #    2.0%  [0.2 mins]  9/1   0/0  [ 9/1 ] l:  0-0   (  0)
> #    4.0%  [0.4 mins] 21/2  19/1  [ 2/3 ] l:  0-1   (  1) {1-2}
>
> Scenario 2:
>
> Before Fix:
> # perf bench numa all
> # Running numa/mem benchmark...
> ....
> ...
> # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0
> -M 1 -s 20 -zZq --thp  1 --no-data_rand_walk"
> perf: bench/numa.c:306: bind_to_memnode: Assertion `!(ret)' failed.
> ------------> Got hung
>
> After Fix:
> # ./perf bench numa all
> # Running numa/mem benchmark...
> ....
> ...
> # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0
> -M 1 -s 20 -zZq --thp  1 --no-data_rand_walk"
>
> # NOTE: ignoring bind NODEs starting at NODE#1
> # NOTE: 0 tasks mem-bound, 1 tasks unbound
>          20.017 secs slowest (max) thread-runtime
>          20.000 secs fastest (min) thread-runtime
>          20.006 secs average thread-runtime
>           0.043 % difference between max/avg runtime
>         413.794 GB data processed, per thread
>         413.794 GB data processed, total
>           0.048 nsecs/byte/thread runtime
>          20.672 GB/sec/thread speed
>          20.672 GB/sec total speed
>
> Changes in v2:
> Fixed review comments for function names and alloc failure handle
>
> Changes in v3:
> Coding Style fixes.
>
>
> Satheesh Rajendran (2):
> perf/bench/numa: Add functions to detect sparse numa nodes
> perf/bench/numa: Handle discontiguous/sparse numa nodes
>
> tools/perf/bench/numa.c | 61
> +++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 54 insertions(+), 7 deletions(-)
>

2017-11-15 15:54:31

by Satheesh Rajendran

[permalink] [raw]

Subject: Re: [PATCH v3 2/2] perf/bench/numa: Handle discontiguous/sparse numa nodes

Hi Arnaldo,Please find my reply inline.

On Tue, 2017-10-31 at 12:26 -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Oct 31, 2017 at 08:46:58PM +0530, Naveen N. Rao escreveu:
> >
> > On 2017/08/21 10:17AM, [email protected] wrote:
> > >
> > > From: Satheesh Rajendran <[email protected]>
> > >
> > > Certain systems are designed to have sparse/discontiguous nodes.
> > > On such systems, perf bench numa hangs, shows wrong number of
> > > nodes
> > > and shows values for non-existent nodes. Handle this by only
> > > taking nodes that are exposed by kernel to userspace.
> > >
> > > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > > Reviewed-by: Srikar Dronamraju <[email protected]>
> > > Signed-off-by: Satheesh Rajendran <[email protected]>
> > > Signed-off-by: Balamuruhan S <[email protected]>
> > > ---
> > > tools/perf/bench/numa.c | 17 ++++++++++-------
> > > 1 file changed, 10 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > > index 2483174..d4cccc4 100644
> > > --- a/tools/perf/bench/numa.c
> > > +++ b/tools/perf/bench/numa.c
> > > @@ -287,12 +287,12 @@ static cpu_set_t bind_to_cpu(int
> > > target_cpu)
> > >
> > > static cpu_set_t bind_to_node(int target_node)
> > > {
> > > - int cpus_per_node = g->p.nr_cpus/g->p.nr_nodes;
> > > + int cpus_per_node = g->p.nr_cpus/nr_numa_nodes();
> > > cpu_set_t orig_mask, mask;
> > > int cpu;
> > > int ret;
> > >
> > > - BUG_ON(cpus_per_node*g->p.nr_nodes != g->p.nr_cpus);
> > > + BUG_ON(cpus_per_node*nr_numa_nodes() != g->p.nr_cpus);
> > > BUG_ON(!cpus_per_node);
> > >
> > > ret = sched_getaffinity(0, sizeof(orig_mask),
> > > &orig_mask);
> > > @@ -692,7 +692,7 @@ static int parse_setup_node_list(void)
> > > int i;
> > >
> > > for (i = 0; i < mul; i++) {
> > > - if (t >= g->p.nr_tasks) {
> > > + if (t >= g->p.nr_tasks ||
> > > !node_has_cpus(bind_node)) {
> > > printf("\n# NOTE:
> > > ignoring bind NODEs starting at NODE#%d\n", bind_node);
> > > goto out;
> > > }
> > > @@ -973,6 +973,7 @@ static void calc_convergence(double
> > > runtime_ns_max, double *convergence)
> > > int node;
> > > int cpu;
> > > int t;
> > > + int processes;
> > >
> > > if (!g->p.show_convergence && !g->p.measure_convergence)
> > > return;
> > > @@ -1007,13 +1008,14 @@ static void calc_convergence(double
> > > runtime_ns_max, double *convergence)
> > > sum = 0;
> > >
> > > for (node = 0; node < g->p.nr_nodes; node++) {
> > > + if (!is_node_present(node))
> > > + continue;
> > > nr = nodes[node];
> > > nr_min = min(nr, nr_min);
> > > nr_max = max(nr, nr_max);
> > > sum += nr;
> > > }
> > > BUG_ON(nr_min > nr_max);
> > > -
> > Looks like an un-necessary change there.
> Right, and I would leave the 'int processes' declaration where it is,
> as
> it is not used outside that loop.
>
I had hit with this compilation error, so had to move the
initialization above.

CC       bench/numa.o
bench/numa.c: In function ‘calc_convergence’:
bench/numa.c:1035:3: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
   int processes = count_node_processes(node);
   ^
cc1: all warnings being treated as errors
mv: cannot stat ‘bench/.numa.o.tmp’: No such file or directory
make[4]: *** [bench/numa.o] Error 1
make[3]: *** [bench] Error 2
make[2]: *** [perf-in.o] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2

> The move of that declaration to the top of the calc_convergence()
> function made me spend some cycles trying to figure out why that was
> done, only to realize that it was an unnecessary change :-\
>

Agree, I would have kept it in the same scope, will keep as below,

@@ -984,8 +1026,11 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
        process_groups = 0;

        for (node = 0; node < g->p.nr_nodes; node++) {
-               int processes = count_node_processes(node);
+               int processes;

+               if (!is_node_present(node))
+                       continue;
+               processes = count_node_processes(node);
                nr = nodes[node];
                tprintf(" %2d/%-2d", nr, processes);

Please advice. Thanks.

Regards,
-Satheesh.
> >
> > - Naveen
> >
> > >
> > > BUG_ON(sum > g->p.nr_tasks);
> > >
> > > if (0 && (sum < g->p.nr_tasks))
> > > @@ -1027,8 +1029,9 @@ static void calc_convergence(double
> > > runtime_ns_max, double *convergence)
> > > process_groups = 0;
> > >
> > > for (node = 0; node < g->p.nr_nodes; node++) {
> > > - int processes = count_node_processes(node);
> > > -
> > > + if (!is_node_present(node))
> > > + continue;
> > > + processes = count_node_processes(node);
> > > nr = nodes[node];
> > > tprintf(" %2d/%-2d", nr, processes);
> > >
> > > @@ -1334,7 +1337,7 @@ static void print_summary(void)
> > >
> > > printf("\n ###\n");
> > > printf(" # %d %s will execute (on %d nodes, %d
> > > CPUs):\n",
> > > - g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" :
> > > "tasks", g->p.nr_nodes, g->p.nr_cpus);
> > > + g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" :
> > > "tasks", nr_numa_nodes(), g->p.nr_cpus);
> > > printf(" #      %5dx %5ldMB global  shared mem
> > > operations\n",
> > > g->p.nr_loops, g-
> > > >p.bytes_global/1024/1024);
> > > printf(" #      %5dx %5ldMB process shared mem
> > > operations\n",

From 1584132756947624638@xxx Wed Nov 15 11:52:04 +0000 2017
X-GM-THRID: 1576332437617451512
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread

2017-11-15 11:52:04

by Naveen N. Rao

[permalink] [raw]

Subject: Re: [PATCH v3 2/2] perf/bench/numa: Handle discontiguous/sparse numa nodes

Hi Satheesh,

Satheesh Rajendran wrote:
> Hi Naveen,
>
>
> On Tue, 2017-10-31 at 20:46 +0530, Naveen N. Rao wrote:
>>
>> > }
>> > BUG_ON(nr_min > nr_max);
>> > -
>> Looks like an un-necessary change there.
>>
>> - Naveen
>>
> I had hit with this compilation error, so had to move the
> initialization above.

I suppose you intended to reply to Arnaldo's comment about the
declaration being moved, since my comment above was about a blank line
being deleted in your patch. It's preferable to not do any un-related
changes/cleanups in a patch.

> Please advice. Thanks.
>
> CC       bench/numa.o
> bench/numa.c: In function ‘calc_convergence’:
> bench/numa.c:1035:3: error: ISO C90 forbids mixed declarations and code
> [-Werror=declaration-after-statement]
>    int processes = count_node_processes(node);
>    ^

Not sure what changes you made, but from the error message, I'm guessing
you placed the above statement _after_ the check for is_node_present().
What Arnaldo recommended is to retain the processes declaration within
the scope where it's used - in this case, the for() loop - rather than
moving it out to the start of the function.

- Naveen

> cc1: all warnings being treated as errors
> mv: cannot stat ‘bench/.numa.o.tmp’: No such file or directory
> make[4]: *** [bench/numa.o] Error 1
> make[3]: *** [bench] Error 2
> make[2]: *** [perf-in.o] Error 2
> make[1]: *** [sub-make] Error 2
> make: *** [all] Error 2
>
> Regards,
> -Satheesh.
>> >
>> > BUG_ON(sum > g->p.nr_tasks);
>> >
>> > if (0 && (sum < g->p.nr_tasks))
> @@ -1027,8 +1029,9 @@ static void calc_convergence(double
>

From 1584045755945603202@xxx Tue Nov 14 12:49:13 +0000 2017
X-GM-THRID: 1576332437617451512
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread

2017-11-14 12:49:13

by Satheesh Rajendran

[permalink] [raw]

Subject: Re: [PATCH v3 2/2] perf/bench/numa: Handle discontiguous/sparse numa nodes

Hi Naveen,

On Tue, 2017-10-31 at 20:46 +0530, Naveen N. Rao wrote:
>
> > }
> > BUG_ON(nr_min > nr_max);
> > -
> Looks like an un-necessary change there.
>
> - Naveen
>
I had hit with this compilation error, so had to move the
initialization above.
Please advice. Thanks.

CC       bench/numa.o
bench/numa.c: In function ‘calc_convergence’:
bench/numa.c:1035:3: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
   int processes = count_node_processes(node);
   ^
cc1: all warnings being treated as errors
mv: cannot stat ‘bench/.numa.o.tmp’: No such file or directory
make[4]: *** [bench/numa.o] Error 1
make[3]: *** [bench] Error 2
make[2]: *** [perf-in.o] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2

Regards,
-Satheesh.
> >
> > BUG_ON(sum > g->p.nr_tasks);
> >
> > if (0 && (sum < g->p.nr_tasks))
@@ -1027,8 +1029,9 @@ static void calc_convergence(double

From 1582787487178857466@xxx Tue Oct 31 15:29:34 +0000 2017
X-GM-THRID: 1576332437617451512
X-Gmail-Labels: Inbox,Category Forums

2017-10-31 15:29:35

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH v3 2/2] perf/bench/numa: Handle discontiguous/sparse numa nodes

Em Tue, Oct 31, 2017 at 08:46:58PM +0530, Naveen N. Rao escreveu:
> On 2017/08/21 10:17AM, [email protected] wrote:
> > From: Satheesh Rajendran <[email protected]>
> >
> > Certain systems are designed to have sparse/discontiguous nodes.
> > On such systems, perf bench numa hangs, shows wrong number of nodes
> > and shows values for non-existent nodes. Handle this by only
> > taking nodes that are exposed by kernel to userspace.
> >
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Reviewed-by: Srikar Dronamraju <[email protected]>
> > Signed-off-by: Satheesh Rajendran <[email protected]>
> > Signed-off-by: Balamuruhan S <[email protected]>
> > ---
> > tools/perf/bench/numa.c | 17 ++++++++++-------
> > 1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > index 2483174..d4cccc4 100644
> > --- a/tools/perf/bench/numa.c
> > +++ b/tools/perf/bench/numa.c
> > @@ -287,12 +287,12 @@ static cpu_set_t bind_to_cpu(int target_cpu)
> >
> > static cpu_set_t bind_to_node(int target_node)
> > {
> > - int cpus_per_node = g->p.nr_cpus/g->p.nr_nodes;
> > + int cpus_per_node = g->p.nr_cpus/nr_numa_nodes();
> > cpu_set_t orig_mask, mask;
> > int cpu;
> > int ret;
> >
> > - BUG_ON(cpus_per_node*g->p.nr_nodes != g->p.nr_cpus);
> > + BUG_ON(cpus_per_node*nr_numa_nodes() != g->p.nr_cpus);
> > BUG_ON(!cpus_per_node);
> >
> > ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> > @@ -692,7 +692,7 @@ static int parse_setup_node_list(void)
> > int i;
> >
> > for (i = 0; i < mul; i++) {
> > - if (t >= g->p.nr_tasks) {
> > + if (t >= g->p.nr_tasks || !node_has_cpus(bind_node)) {
> > printf("\n# NOTE: ignoring bind NODEs starting at NODE#%d\n", bind_node);
> > goto out;
> > }
> > @@ -973,6 +973,7 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
> > int node;
> > int cpu;
> > int t;
> > + int processes;
> >
> > if (!g->p.show_convergence && !g->p.measure_convergence)
> > return;
> > @@ -1007,13 +1008,14 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
> > sum = 0;
> >
> > for (node = 0; node < g->p.nr_nodes; node++) {
> > + if (!is_node_present(node))
> > + continue;
> > nr = nodes[node];
> > nr_min = min(nr, nr_min);
> > nr_max = max(nr, nr_max);
> > sum += nr;
> > }
> > BUG_ON(nr_min > nr_max);
> > -
>
> Looks like an un-necessary change there.

Right, and I would leave the 'int processes' declaration where it is, as
it is not used outside that loop.

The move of that declaration to the top of the calc_convergence()
function made me spend some cycles trying to figure out why that was
done, only to realize that it was an unnecessary change :-\

> - Naveen
>
> > BUG_ON(sum > g->p.nr_tasks);
> >
> > if (0 && (sum < g->p.nr_tasks))
> > @@ -1027,8 +1029,9 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
> > process_groups = 0;
> >
> > for (node = 0; node < g->p.nr_nodes; node++) {
> > - int processes = count_node_processes(node);
> > -
> > + if (!is_node_present(node))
> > + continue;
> > + processes = count_node_processes(node);
> > nr = nodes[node];
> > tprintf(" %2d/%-2d", nr, processes);
> >
> > @@ -1334,7 +1337,7 @@ static void print_summary(void)
> >
> > printf("\n ###\n");
> > printf(" # %d %s will execute (on %d nodes, %d CPUs):\n",
> > - g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" : "tasks", g->p.nr_nodes, g->p.nr_cpus);
> > + g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" : "tasks", nr_numa_nodes(), g->p.nr_cpus);
> > printf(" # %5dx %5ldMB global shared mem operations\n",
> > g->p.nr_loops, g->p.bytes_global/1024/1024);
> > printf(" # %5dx %5ldMB process shared mem operations\n",

From 1582786757574011873@xxx Tue Oct 31 15:17:59 +0000 2017
X-GM-THRID: 1576332437617451512
X-Gmail-Labels: Inbox,Category Forums

2017-10-31 15:17:59

by Naveen N. Rao

[permalink] [raw]

Subject: Re: [PATCH v3 2/2] perf/bench/numa: Handle discontiguous/sparse numa nodes

On 2017/08/21 10:17AM, [email protected] wrote:
> From: Satheesh Rajendran <[email protected]>
>
> Certain systems are designed to have sparse/discontiguous nodes.
> On such systems, perf bench numa hangs, shows wrong number of nodes
> and shows values for non-existent nodes. Handle this by only
> taking nodes that are exposed by kernel to userspace.
>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Reviewed-by: Srikar Dronamraju <[email protected]>
> Signed-off-by: Satheesh Rajendran <[email protected]>
> Signed-off-by: Balamuruhan S <[email protected]>
> ---
> tools/perf/bench/numa.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 2483174..d4cccc4 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -287,12 +287,12 @@ static cpu_set_t bind_to_cpu(int target_cpu)
>
> static cpu_set_t bind_to_node(int target_node)
> {
> - int cpus_per_node = g->p.nr_cpus/g->p.nr_nodes;
> + int cpus_per_node = g->p.nr_cpus/nr_numa_nodes();
> cpu_set_t orig_mask, mask;
> int cpu;
> int ret;
>
> - BUG_ON(cpus_per_node*g->p.nr_nodes != g->p.nr_cpus);
> + BUG_ON(cpus_per_node*nr_numa_nodes() != g->p.nr_cpus);
> BUG_ON(!cpus_per_node);
>
> ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> @@ -692,7 +692,7 @@ static int parse_setup_node_list(void)
> int i;
>
> for (i = 0; i < mul; i++) {
> - if (t >= g->p.nr_tasks) {
> + if (t >= g->p.nr_tasks || !node_has_cpus(bind_node)) {
> printf("\n# NOTE: ignoring bind NODEs starting at NODE#%d\n", bind_node);
> goto out;
> }
> @@ -973,6 +973,7 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
> int node;
> int cpu;
> int t;
> + int processes;
>
> if (!g->p.show_convergence && !g->p.measure_convergence)
> return;
> @@ -1007,13 +1008,14 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
> sum = 0;
>
> for (node = 0; node < g->p.nr_nodes; node++) {
> + if (!is_node_present(node))
> + continue;
> nr = nodes[node];
> nr_min = min(nr, nr_min);
> nr_max = max(nr, nr_max);
> sum += nr;
> }
> BUG_ON(nr_min > nr_max);
> -

Looks like an un-necessary change there.

- Naveen

> BUG_ON(sum > g->p.nr_tasks);
>
> if (0 && (sum < g->p.nr_tasks))
> @@ -1027,8 +1029,9 @@ static void calc_convergence(double runtime_ns_max, double *convergence)
> process_groups = 0;
>
> for (node = 0; node < g->p.nr_nodes; node++) {
> - int processes = count_node_processes(node);
> -
> + if (!is_node_present(node))
> + continue;
> + processes = count_node_processes(node);
> nr = nodes[node];
> tprintf(" %2d/%-2d", nr, processes);
>
> @@ -1334,7 +1337,7 @@ static void print_summary(void)
>
> printf("\n ###\n");
> printf(" # %d %s will execute (on %d nodes, %d CPUs):\n",
> - g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" : "tasks", g->p.nr_nodes, g->p.nr_cpus);
> + g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" : "tasks", nr_numa_nodes(), g->p.nr_cpus);
> printf(" # %5dx %5ldMB global shared mem operations\n",
> g->p.nr_loops, g->p.bytes_global/1024/1024);
> printf(" # %5dx %5ldMB process shared mem operations\n",
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

From 1576335551888433029@xxx Mon Aug 21 10:18:49 +0000 2017
X-GM-THRID: 1576332437617451512
X-Gmail-Labels: Inbox,Category Forums

2017-11-14 13:44:39

by Satheesh Rajendran

[permalink] [raw]

Subject: Re: [PATCH v3 1/2] perf/bench/numa: Add functions to detect sparse numa nodes

Hi Naveen,Thanks for detailed review, my comments inline.

On Tue, 2017-10-31 at 20:44 +0530, Naveen N. Rao wrote:
> Hi Satheesh,
>
> On 2017/08/21 10:15AM, [email protected] wrote:
> >
> > From: Satheesh Rajendran <[email protected]>
> >
> > Added functions 1) to get a count of all nodes that are exposed to
> > userspace. These nodes could be memoryless cpu nodes or cpuless
> > memory
> > nodes, 2) to check given node is present and 3) to check given
> > node has cpus
> >
> > This information can be used to handle sparse/discontiguous nodes.
> >
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Reviewed-by: Srikar Dronamraju <[email protected]>
> > Signed-off-by: Satheesh Rajendran <[email protected]>
> > Signed-off-by: Balamuruhan S <[email protected]>
> > ---
> > tools/perf/bench/numa.c | 44
> > ++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 44 insertions(+)
> >
> > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > index 469d65b..2483174 100644
> > --- a/tools/perf/bench/numa.c
> > +++ b/tools/perf/bench/numa.c
> > @@ -215,6 +215,50 @@ static const char * const numa_usage[] = {
> > NULL
> > };
> >
> > +/*
> > + * To get number of numa nodes present.
> > + */
> > +static int nr_numa_nodes(void)
> > +{
> > + int i, nr_nodes = 0;
> > +
> > + for (i = 0; i < g->p.nr_nodes; i++) {
> > + if (numa_bitmask_isbitset(numa_nodes_ptr, i))
> > + nr_nodes++;
> > + }
> > +
> > + return nr_nodes;
> > +}
> > +
> > +/*
> Please run patches through scripts/checkpatch.pl. There is a
> trailing
> whitespace above...
>
Sure
> >
> > + * To check if given numa node is present.
> > + */
> > +static int is_node_present(int node)
> > +{
> > + return numa_bitmask_isbitset(numa_nodes_ptr, node);
> > +}
> > +
> > +/*
> > + * To check given numa node has cpus.
> > + */
> > +static bool node_has_cpus(int node)
> > +{
> > + struct bitmask *cpu = numa_allocate_cpumask();
> > + unsigned int i;
> > +
> > + if (cpu == NULL)
> > + return false; /* lets fall back to nocpus safely
> > */
> > +
> > + if (numa_node_to_cpus(node, cpu) == 0) {
> This can be simplified to:
> if (cpu && !numa_node_to_cpus(node, cpu)) {
>
> >
> > + for (i = 0; i < cpu->size; i++) {
> > + if (numa_bitmask_isbitset(cpu, i))
> > + return true;
> > + }
> > + }
> The indentation on those brackets look to be wrong.
>
Sure
> >
> > +
> > + return false;
> > +}
> > +
> More importantly, you've introduced few functions in this patch, but
> none of those are being used. This is not a useful way to split your
> patches. In fact, this hurts bisect since trying to build perf with
> just
> this patch applied throws errors.
>
Sure, This can be merged to single patch, will do it in next version.

> You seem to be addressing a few different issues related to perf
> bench
> numa. You might want to split your patch based on the specific
> issue(s)
> each change fixes.
>
>
> - Naveen
>
>
Regards,
-Satheesh.
> >
> > static cpu_set_t bind_to_cpu(int target_cpu)
> > {
> > cpu_set_t orig_mask, mask;

From 1582786709302944039@xxx Tue Oct 31 15:17:13 +0000 2017
X-GM-THRID: 1576332506056657646
X-Gmail-Labels: Inbox,Category Forums

2017-10-31 15:17:13

by Naveen N. Rao

[permalink] [raw]

Subject: Re: [PATCH v3 1/2] perf/bench/numa: Add functions to detect sparse numa nodes

Hi Satheesh,

On 2017/08/21 10:15AM, [email protected] wrote:
> From: Satheesh Rajendran <[email protected]>
>
> Added functions 1) to get a count of all nodes that are exposed to
> userspace. These nodes could be memoryless cpu nodes or cpuless memory
> nodes, 2) to check given node is present and 3) to check given
> node has cpus
>
> This information can be used to handle sparse/discontiguous nodes.
>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Reviewed-by: Srikar Dronamraju <[email protected]>
> Signed-off-by: Satheesh Rajendran <[email protected]>
> Signed-off-by: Balamuruhan S <[email protected]>
> ---
> tools/perf/bench/numa.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 469d65b..2483174 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -215,6 +215,50 @@ static const char * const numa_usage[] = {
> NULL
> };
>
> +/*
> + * To get number of numa nodes present.
> + */
> +static int nr_numa_nodes(void)
> +{
> + int i, nr_nodes = 0;
> +
> + for (i = 0; i < g->p.nr_nodes; i++) {
> + if (numa_bitmask_isbitset(numa_nodes_ptr, i))
> + nr_nodes++;
> + }
> +
> + return nr_nodes;
> +}
> +
> +/*

Please run patches through scripts/checkpatch.pl. There is a trailing
whitespace above...

> + * To check if given numa node is present.
> + */
> +static int is_node_present(int node)
> +{
> + return numa_bitmask_isbitset(numa_nodes_ptr, node);
> +}
> +
> +/*
> + * To check given numa node has cpus.
> + */
> +static bool node_has_cpus(int node)
> +{
> + struct bitmask *cpu = numa_allocate_cpumask();
> + unsigned int i;
> +
> + if (cpu == NULL)
> + return false; /* lets fall back to nocpus safely */
> +
> + if (numa_node_to_cpus(node, cpu) == 0) {

This can be simplified to:
if (cpu && !numa_node_to_cpus(node, cpu)) {

> + for (i = 0; i < cpu->size; i++) {
> + if (numa_bitmask_isbitset(cpu, i))
> + return true;
> + }
> + }

The indentation on those brackets look to be wrong.

> +
> + return false;
> +}
> +

More importantly, you've introduced few functions in this patch, but
none of those are being used. This is not a useful way to split your
patches. In fact, this hurts bisect since trying to build perf with just
this patch applied throws errors.

You seem to be addressing a few different issues related to perf bench
numa. You might want to split your patch based on the specific issue(s)
each change fixes.

- Naveen

> static cpu_set_t bind_to_cpu(int target_cpu)
> {
> cpu_set_t orig_mask, mask;
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

From 1576335468855890653@xxx Mon Aug 21 10:17:30 +0000 2017
X-GM-THRID: 1576332506056657646
X-Gmail-Labels: Inbox,Category Forums