The perf benchmark collections numa, futex and epoll hit failures on
system configurations with more than 1024 CPUs. These benchmarks use
"sched_getaffinity" and "sched_setaffinity" in the code to work with
affinity.
Example snippet from numa benchmark:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>
The bind_to_node function uses "sched_getaffinity" to save the cpumask.
This fails with EINVAL because the default mask size in glibc is 1024.
Similarly, the futex and epoll benchmarks use sched_setaffinity during
pthread_create with affinity. Since that also returns EINVAL on such
system configurations, the benchmarks don't run.
To overcome this 1024 CPUs mask size limitation of cpu_set_t, change
the mask size using the CPU_*_S macros, i.e. use CPU_ALLOC to allocate
the cpumask, CPU_ALLOC_SIZE for the size, and CPU_SET_S to set a mask
bit. Fix all the relevant places in the code to use a mask size that is
large enough to represent the number of possible CPUs in the system.
Fix the parse_setup_cpu_list function in the numa bench to check whether
the input CPU is online before binding a task to that CPU. This fixes
failures where the CPU number is within the maximum CPU count but the
CPU itself is offline; sched_setaffinity then fails when the cpumask has
that CPU's bit set.
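As an illustration only (not an exact hunk from any of the patches), the
dynamically sized mask pattern used throughout the series looks roughly
like this:

	#define _GNU_SOURCE
	#include <sched.h>

	/* Bind the calling task to one CPU using a dynamically sized mask,
	 * so that systems with more than 1024 CPUs are handled correctly. */
	static int bind_one_cpu(int nr_cpus, int target_cpu)
	{
		cpu_set_t *mask = CPU_ALLOC(nr_cpus);
		size_t size = CPU_ALLOC_SIZE(nr_cpus);
		int ret;

		if (!mask)
			return -1;

		CPU_ZERO_S(size, mask);
		CPU_SET_S(target_cpu, size, mask);

		ret = sched_setaffinity(0, size, mask);
		CPU_FREE(mask);
		return ret;
	}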
Patch 1 and Patch 2 address the fix for the perf bench futex and perf
bench epoll benchmarks. Patch 3 and Patch 4 address the fix in the perf
bench numa benchmark.
Athira Rajeev (4):
tools/perf: Fix perf bench futex to correct usage of affinity for
machines with #CPUs > 1K
tools/perf: Fix perf bench epoll to correct usage of affinity for
machines with #CPUs > 1K
tools/perf: Fix perf numa bench to fix usage of affinity for machines
with #CPUs > 1K
tools/perf: Fix perf bench numa testcase to check if CPU used to bind
task is online
Changelog:
From v1 -> v2:
Addressed review comment from Ian Rogers to do
CPU_FREE in a cleaner way.
Added Tested-by from Disha Goel
tools/perf/bench/epoll-ctl.c | 25 ++++--
tools/perf/bench/epoll-wait.c | 25 ++++--
tools/perf/bench/futex-hash.c | 26 ++++--
tools/perf/bench/futex-lock-pi.c | 21 +++--
tools/perf/bench/futex-requeue.c | 21 +++--
tools/perf/bench/futex-wake-parallel.c | 21 +++--
tools/perf/bench/futex-wake.c | 22 ++++--
tools/perf/bench/numa.c | 105 ++++++++++++++++++-------
tools/perf/util/header.c | 43 ++++++++++
tools/perf/util/header.h | 1 +
10 files changed, 242 insertions(+), 68 deletions(-)
--
2.35.1
Perf numa bench test fails with error:
Testcase:
./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
--thp 1 --no-data_rand_walk
Failure snippet:
<<>>
Running 'numa/mem' benchmark:
# Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
-M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
<<>>
The testcase uses CPUs 0 and 8. The function "parse_setup_cpu_list"
checks whether a cpu number is greater than the maximum number of CPUs
possible in the system, i.e. via "if (bind_cpu_0 >= g->p.nr_cpus ||
bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that the system has,
say, 48 CPUs while only CPUs 0-7 are online and the others are offlined.
Since "g->p.nr_cpus" is 48, the function will go ahead and set the bit
for CPU 8 as well in the cpumask (td->bind_cpumask).
The bind_to_cpumask function is then called to set affinity using
sched_setaffinity and that cpumask. Since CPU 8 is not present, setting
affinity fails with EINVAL. Fix this issue by adding a check that the
CPUs provided in the input arguments are online before proceeding
further, and skip the test otherwise. For this, include the new helper
function "is_cpu_online" in "tools/perf/util/header.c".
Since "BIT(x)" definition will get included from header.h, remove
that from bench/numa.c
Tested-by: Disha Goel <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Reported-by: Disha Goel <[email protected]>
---
tools/perf/bench/numa.c | 8 ++++++--
tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
tools/perf/util/header.h | 1 +
3 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 29e41e32bd88..7992d79b3e41 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -34,6 +34,7 @@
#include <linux/numa.h>
#include <linux/zalloc.h>
+#include "../util/header.h"
#include <numa.h>
#include <numaif.h>
@@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
return -1;
}
+ if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
+ printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
+ return -1;
+ }
+
BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
BUG_ON(bind_cpu_0 > bind_cpu_1);
@@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
return parse_node_list(arg);
}
-#define BIT(x) (1ul << x)
-
static inline uint32_t lfsr_32(uint32_t lfsr)
{
const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 6da12e522edc..3f5fcf5d4b3f 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
return do_write(ff, &data->dir.version, sizeof(data->dir.version));
}
+#define SYSFS "/sys/devices/system/cpu/"
+
+/*
+ * Check whether a CPU is online
+ *
+ * Returns:
+ * 1 -> if CPU is online
+ * 0 -> if CPU is offline
+ * -1 -> error case
+ */
+int is_cpu_online(unsigned int cpu)
+{
+ char sysfs_cpu[255];
+ char buf[255];
+ struct stat statbuf;
+ size_t len;
+ int fd;
+
+ snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
+
+ if (stat(sysfs_cpu, &statbuf) != 0)
+ return 0;
+
+ /*
+ * Check if /sys/devices/system/cpu/cpux/online file
+ * exists. In kernels without CONFIG_HOTPLUG_CPU, this
+ * file won't exist.
+ */
+ snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
+ if (stat(sysfs_cpu, &statbuf) != 0)
+ return 1;
+
+ fd = open(sysfs_cpu, O_RDONLY);
+ if (fd == -1)
+ return -1;
+
+ len = read(fd, buf, sizeof(buf) - 1);
+ buf[len] = '\0';
+ close(fd);
+
+ return strtoul(buf, NULL, 16);
+}
+
#ifdef HAVE_LIBBPF_SUPPORT
static int write_bpf_prog_info(struct feat_fd *ff,
struct evlist *evlist __maybe_unused)
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index c9e3265832d9..0eb4bc29a5a4 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
int write_padded(struct feat_fd *fd, const void *bf,
size_t count, size_t count_aligned);
+int is_cpu_online(unsigned int cpu);
/*
* arch specific callback
*/
--
2.35.1
The perf bench numa testcase fails on systems with more than 1K CPUs.
Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1
Failure snippet:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>
The bind_to_node function uses "sched_getaffinity" to save the original
cpumask, and this call returns EINVAL (invalid argument). This happens
because the default mask size in glibc is 1024.
To overcome this 1024 CPUs mask size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros, i.e. use CPU_ALLOC to allocate the
cpumask and CPU_ALLOC_SIZE for the size. Apart from fixing this for
"orig_mask", apply the same logic to "mask" as well, which is used for
setaffinity, so that the mask size is large enough to represent the
number of possible CPUs in the system.
sched_getaffinity is used in one more place in perf numa bench, in the
"bind_to_cpu" function. Apply the same logic there as well. Though no
failure is currently reported from there, it is better to make
getaffinity work with system configurations having more CPUs than the
default mask size supported by glibc.
Also fix "sched_setaffinity" to use a mask size that is large enough to
represent the number of possible CPUs in the system.
Fix all places where "bind_cpumask", which is part of "struct
thread_data", is used so that bind_cpumask works in all configurations.
Tested-by: Disha Goel <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Reported-by: Disha Goel <[email protected]>
---
tools/perf/bench/numa.c | 97 ++++++++++++++++++++++++++++++-----------
1 file changed, 71 insertions(+), 26 deletions(-)
diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index f2640179ada9..29e41e32bd88 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -54,7 +54,7 @@
struct thread_data {
int curr_cpu;
- cpu_set_t bind_cpumask;
+ cpu_set_t *bind_cpumask;
int bind_node;
u8 *process_data;
int process_nr;
@@ -266,46 +266,71 @@ static bool node_has_cpus(int node)
return ret;
}
-static cpu_set_t bind_to_cpu(int target_cpu)
+static cpu_set_t *bind_to_cpu(int target_cpu)
{
- cpu_set_t orig_mask, mask;
+ int nrcpus = numa_num_possible_cpus();
+ cpu_set_t *orig_mask, *mask;
+ size_t size;
int ret;
- ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
- BUG_ON(ret);
+ orig_mask = CPU_ALLOC(nrcpus);
+ BUG_ON(!orig_mask);
+ size = CPU_ALLOC_SIZE(nrcpus);
+ CPU_ZERO_S(size, orig_mask);
+
+ ret = sched_getaffinity(0, size, orig_mask);
+ if (ret) {
+ CPU_FREE(orig_mask);
+ BUG_ON(ret);
+ }
- CPU_ZERO(&mask);
+ mask = CPU_ALLOC(nrcpus);
+ BUG_ON(!mask);
+ CPU_ZERO_S(size, mask);
if (target_cpu == -1) {
int cpu;
for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
- CPU_SET(cpu, &mask);
+ CPU_SET_S(cpu, size, mask);
} else {
BUG_ON(target_cpu < 0 || target_cpu >= g->p.nr_cpus);
- CPU_SET(target_cpu, &mask);
+ CPU_SET_S(target_cpu, size, mask);
}
- ret = sched_setaffinity(0, sizeof(mask), &mask);
+ ret = sched_setaffinity(0, size, mask);
+ CPU_FREE(mask);
BUG_ON(ret);
return orig_mask;
}
-static cpu_set_t bind_to_node(int target_node)
+static cpu_set_t *bind_to_node(int target_node)
{
- cpu_set_t orig_mask, mask;
+ int nrcpus = numa_num_possible_cpus();
+ cpu_set_t *orig_mask, *mask;
+ size_t size;
int cpu;
int ret;
- ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
- BUG_ON(ret);
+ orig_mask = CPU_ALLOC(nrcpus);
+ BUG_ON(!orig_mask);
+ size = CPU_ALLOC_SIZE(nrcpus);
+ CPU_ZERO_S(size, orig_mask);
+
+ ret = sched_getaffinity(0, size, orig_mask);
+ if (ret) {
+ CPU_FREE(orig_mask);
+ BUG_ON(ret);
+ }
- CPU_ZERO(&mask);
+ mask = CPU_ALLOC(nrcpus);
+ BUG_ON(!mask);
+ CPU_ZERO_S(size, mask);
if (target_node == NUMA_NO_NODE) {
for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
- CPU_SET(cpu, &mask);
+ CPU_SET_S(cpu, size, mask);
} else {
struct bitmask *cpumask = numa_allocate_cpumask();
@@ -313,24 +338,29 @@ static cpu_set_t bind_to_node(int target_node)
if (!numa_node_to_cpus(target_node, cpumask)) {
for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
if (numa_bitmask_isbitset(cpumask, cpu))
- CPU_SET(cpu, &mask);
+ CPU_SET_S(cpu, size, mask);
}
}
numa_free_cpumask(cpumask);
}
- ret = sched_setaffinity(0, sizeof(mask), &mask);
+ ret = sched_setaffinity(0, size, mask);
+ CPU_FREE(mask);
BUG_ON(ret);
return orig_mask;
}
-static void bind_to_cpumask(cpu_set_t mask)
+static void bind_to_cpumask(cpu_set_t *mask)
{
int ret;
+ size_t size = CPU_ALLOC_SIZE(numa_num_possible_cpus());
- ret = sched_setaffinity(0, sizeof(mask), &mask);
- BUG_ON(ret);
+ ret = sched_setaffinity(0, size, mask);
+ if (ret) {
+ CPU_FREE(mask);
+ BUG_ON(ret);
+ }
}
static void mempol_restore(void)
@@ -376,7 +406,7 @@ do { \
static u8 *alloc_data(ssize_t bytes0, int map_flags,
int init_zero, int init_cpu0, int thp, int init_random)
{
- cpu_set_t orig_mask;
+ cpu_set_t *orig_mask;
ssize_t bytes;
u8 *buf;
int ret;
@@ -434,6 +464,7 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
/* Restore affinity: */
if (init_cpu0) {
bind_to_cpumask(orig_mask);
+ CPU_FREE(orig_mask);
mempol_restore();
}
@@ -589,6 +620,7 @@ static int parse_setup_cpu_list(void)
BUG_ON(bind_cpu_0 > bind_cpu_1);
for (bind_cpu = bind_cpu_0; bind_cpu <= bind_cpu_1; bind_cpu += step) {
+ size_t size = CPU_ALLOC_SIZE(g->p.nr_cpus);
int i;
for (i = 0; i < mul; i++) {
@@ -608,10 +640,12 @@ static int parse_setup_cpu_list(void)
tprintf("%2d", bind_cpu);
}
- CPU_ZERO(&td->bind_cpumask);
+ td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
+ BUG_ON(!td->bind_cpumask);
+ CPU_ZERO_S(size, td->bind_cpumask);
for (cpu = bind_cpu; cpu < bind_cpu+bind_len; cpu++) {
BUG_ON(cpu < 0 || cpu >= g->p.nr_cpus);
- CPU_SET(cpu, &td->bind_cpumask);
+ CPU_SET_S(cpu, size, td->bind_cpumask);
}
t++;
}
@@ -1241,7 +1275,7 @@ static void *worker_thread(void *__tdata)
* by migrating to CPU#0:
*/
if (first_task && g->p.perturb_secs && (int)(stop.tv_sec - last_perturbance) >= g->p.perturb_secs) {
- cpu_set_t orig_mask;
+ cpu_set_t *orig_mask;
int target_cpu;
int this_cpu;
@@ -1265,6 +1299,7 @@ static void *worker_thread(void *__tdata)
printf(" (injecting perturbalance, moved to CPU#%d)\n", target_cpu);
bind_to_cpumask(orig_mask);
+ CPU_FREE(orig_mask);
}
if (details >= 3) {
@@ -1398,21 +1433,31 @@ static void init_thread_data(void)
for (t = 0; t < g->p.nr_tasks; t++) {
struct thread_data *td = g->threads + t;
+ size_t cpuset_size = CPU_ALLOC_SIZE(g->p.nr_cpus);
int cpu;
/* Allow all nodes by default: */
td->bind_node = NUMA_NO_NODE;
/* Allow all CPUs by default: */
- CPU_ZERO(&td->bind_cpumask);
+ td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
+ BUG_ON(!td->bind_cpumask);
+ CPU_ZERO_S(cpuset_size, td->bind_cpumask);
for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
- CPU_SET(cpu, &td->bind_cpumask);
+ CPU_SET_S(cpu, cpuset_size, td->bind_cpumask);
}
}
static void deinit_thread_data(void)
{
ssize_t size = sizeof(*g->threads)*g->p.nr_tasks;
+ int t;
+
+ /* Free the bind_cpumask allocated for thread_data */
+ for (t = 0; t < g->p.nr_tasks; t++) {
+ struct thread_data *td = g->threads + t;
+ CPU_FREE(td->bind_cpumask);
+ }
free_data(g->threads, size);
}
--
2.35.1
The perf bench epoll testcase fails on systems with more than 1K CPUs.
Testcase: perf bench epoll all
Result snippet:
<<>>
Run summary [PID 106497]: 1399 threads monitoring on 64 file-descriptors for 8 secs.
perf: pthread_create: No such file or directory
<<>>
In the epoll benchmarks (ctl, wait), pthread_create is invoked in
do_threads from the respective bench_epoll_* function. Though the log
shows a direct failure from pthread_create, the actual failure is from
"sched_setaffinity" returning EINVAL (invalid argument). This happens
because the default mask size in glibc is 1024. To overcome this 1024
CPUs mask size limitation of cpu_set_t, change the mask size using the
CPU_*_S macros. The patch addresses this by fixing both epoll benchmarks
to use CPU_ALLOC to allocate the cpumask, CPU_ALLOC_SIZE for the size,
and CPU_SET_S to set the mask bits.
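The per-thread setup then follows this pattern (a simplified sketch of
the change, see the diff below for the exact code):

	nrcpus = perf_cpu_map__nr(cpu);
	cpuset = CPU_ALLOC(nrcpus);
	BUG_ON(!cpuset);
	size = CPU_ALLOC_SIZE(nrcpus);

	for (i = 0; i < nthreads; i++) {
		if (!noaffinity) {
			CPU_ZERO_S(size, cpuset);
			CPU_SET_S(perf_cpu_map__cpu(cpu, i % nrcpus).cpu,
				  size, cpuset);

			ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
			if (ret) {
				CPU_FREE(cpuset);
				err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
			}
		}
		/* ... pthread_create() for this worker ... */
	}
	CPU_FREE(cpuset);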
Tested-by: Disha Goel <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Reported-by: Disha Goel <[email protected]>
---
tools/perf/bench/epoll-ctl.c | 25 +++++++++++++++++++------
tools/perf/bench/epoll-wait.c | 25 +++++++++++++++++++------
2 files changed, 38 insertions(+), 12 deletions(-)
diff --git a/tools/perf/bench/epoll-ctl.c b/tools/perf/bench/epoll-ctl.c
index 1a17ec83d3c4..91c53f6c6d87 100644
--- a/tools/perf/bench/epoll-ctl.c
+++ b/tools/perf/bench/epoll-ctl.c
@@ -222,13 +222,20 @@ static void init_fdmaps(struct worker *w, int pct)
static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
{
pthread_attr_t thread_attr, *attrp = NULL;
- cpu_set_t cpuset;
+ cpu_set_t *cpuset;
unsigned int i, j;
int ret = 0;
+ int nrcpus;
+ size_t size;
if (!noaffinity)
pthread_attr_init(&thread_attr);
+ nrcpus = perf_cpu_map__nr(cpu);
+ cpuset = CPU_ALLOC(nrcpus);
+ BUG_ON(!cpuset);
+ size = CPU_ALLOC_SIZE(nrcpus);
+
for (i = 0; i < nthreads; i++) {
struct worker *w = &worker[i];
@@ -252,22 +259,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
init_fdmaps(w, 50);
if (!noaffinity) {
- CPU_ZERO(&cpuset);
- CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+ CPU_ZERO_S(size, cpuset);
+ CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
+ size, cpuset);
- ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
- if (ret)
+ ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
+ if (ret) {
+ CPU_FREE(cpuset);
err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+ }
attrp = &thread_attr;
}
ret = pthread_create(&w->thread, attrp, workerfn,
(void *)(struct worker *) w);
- if (ret)
+ if (ret) {
+ CPU_FREE(cpuset);
err(EXIT_FAILURE, "pthread_create");
+ }
}
+ CPU_FREE(cpuset);
if (!noaffinity)
pthread_attr_destroy(&thread_attr);
diff --git a/tools/perf/bench/epoll-wait.c b/tools/perf/bench/epoll-wait.c
index 0d1dd8879197..9469a53ffab9 100644
--- a/tools/perf/bench/epoll-wait.c
+++ b/tools/perf/bench/epoll-wait.c
@@ -291,9 +291,11 @@ static void print_summary(void)
static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
{
pthread_attr_t thread_attr, *attrp = NULL;
- cpu_set_t cpuset;
+ cpu_set_t *cpuset;
unsigned int i, j;
int ret = 0, events = EPOLLIN;
+ int nrcpus;
+ size_t size;
if (oneshot)
events |= EPOLLONESHOT;
@@ -306,6 +308,11 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
if (!noaffinity)
pthread_attr_init(&thread_attr);
+ nrcpus = perf_cpu_map__nr(cpu);
+ cpuset = CPU_ALLOC(nrcpus);
+ BUG_ON(!cpuset);
+ size = CPU_ALLOC_SIZE(nrcpus);
+
for (i = 0; i < nthreads; i++) {
struct worker *w = &worker[i];
@@ -341,22 +348,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
}
if (!noaffinity) {
- CPU_ZERO(&cpuset);
- CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+ CPU_ZERO_S(size, cpuset);
+ CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
+ size, cpuset);
- ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
- if (ret)
+ ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
+ if (ret) {
+ CPU_FREE(cpuset);
err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+ }
attrp = &thread_attr;
}
ret = pthread_create(&w->thread, attrp, workerfn,
(void *)(struct worker *) w);
- if (ret)
+ if (ret) {
+ CPU_FREE(cpuset);
err(EXIT_FAILURE, "pthread_create");
+ }
}
+ CPU_FREE(cpuset);
if (!noaffinity)
pthread_attr_destroy(&thread_attr);
--
2.35.1
> On 07-Apr-2022, at 6:05 AM, Ian Rogers <[email protected]> wrote:
>
> On Wed, Apr 6, 2022 at 10:51 AM Athira Rajeev
> <[email protected]> wrote:
>>
>> The perf benchmark for collections: numa, futex and epoll
>> hits failure in system configuration with CPU's more than 1024.
>> These benchmarks uses "sched_getaffinity" and "sched_setaffinity"
>> in the code to work with affinity.
>>
>> Example snippet from numa benchmark:
>> <<>>
>> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
>> Aborted (core dumped)
>> <<>>
>>
>> bind_to_node function uses "sched_getaffinity" to save the cpumask.
>> This fails with EINVAL because the default mask size in glibc is 1024.
>>
>> Similarly in futex and epoll benchmark, uses sched_setaffinity during
>> pthread_create with affinity. And since it returns EINVAL in such system
>> configuration, benchmark doesn't run.
>>
>> To overcome this 1024 CPUs mask size limitation of cpu_set_t,
>> change the mask size using the CPU_*_S macros ie, use CPU_ALLOC to
>> allocate cpumask, CPU_ALLOC_SIZE for size, CPU_SET_S to set mask bit.
>>
>> Fix all the relevant places in the code to use mask size which is large
>> enough to represent number of possible CPU's in the system.
>>
>> Fix parse_setup_cpu_list function in numa bench to check if input CPU
>> is online before binding task to that CPU. This is to fix failures where,
>> though CPU number is within max CPU, it could happen that CPU is offline.
>> Here, sched_setaffinity will result in failure when using cpumask having
>> that cpu bit set in the mask.
>>
>> Patch 1 and Patch 2 address fix for perf bench futex and perf bench
>> epoll benchmark. Patch 3 and Patch 4 address fix in perf bench numa
>> benchmark
>>
>> Athira Rajeev (4):
>> tools/perf: Fix perf bench futex to correct usage of affinity for
>> machines with #CPUs > 1K
>> tools/perf: Fix perf bench epoll to correct usage of affinity for
>> machines with #CPUs > 1K
>> tools/perf: Fix perf numa bench to fix usage of affinity for machines
>> with #CPUs > 1K
>> tools/perf: Fix perf bench numa testcase to check if CPU used to bind
>> task is online
>>
>> Changelog:
>> From v1 -> v2:
>> Addressed review comment from Ian Rogers to do
>> CPU_FREE in a cleaner way.
>> Added Tested-by from Disha Goel
>
>
> The whole set:
> Acked-by: Ian Rogers <[email protected]>
Thanks for checking Ian.
Athira.
>
> Thanks,
> Ian
>
>> tools/perf/bench/epoll-ctl.c | 25 ++++--
>> tools/perf/bench/epoll-wait.c | 25 ++++--
>> tools/perf/bench/futex-hash.c | 26 ++++--
>> tools/perf/bench/futex-lock-pi.c | 21 +++--
>> tools/perf/bench/futex-requeue.c | 21 +++--
>> tools/perf/bench/futex-wake-parallel.c | 21 +++--
>> tools/perf/bench/futex-wake.c | 22 ++++--
>> tools/perf/bench/numa.c | 105 ++++++++++++++++++-------
>> tools/perf/util/header.c | 43 ++++++++++
>> tools/perf/util/header.h | 1 +
>> 10 files changed, 242 insertions(+), 68 deletions(-)
>>
>> --
>> 2.35.1
On Wed, Apr 6, 2022 at 10:51 AM Athira Rajeev
<[email protected]> wrote:
>
> The perf benchmark for collections: numa, futex and epoll
> hits failure in system configuration with CPU's more than 1024.
> These benchmarks uses "sched_getaffinity" and "sched_setaffinity"
> in the code to work with affinity.
>
> Example snippet from numa benchmark:
> <<>>
> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
> Aborted (core dumped)
> <<>>
>
> bind_to_node function uses "sched_getaffinity" to save the cpumask.
> This fails with EINVAL because the default mask size in glibc is 1024.
>
> Similarly in futex and epoll benchmark, uses sched_setaffinity during
> pthread_create with affinity. And since it returns EINVAL in such system
> configuration, benchmark doesn't run.
>
> To overcome this 1024 CPUs mask size limitation of cpu_set_t,
> change the mask size using the CPU_*_S macros ie, use CPU_ALLOC to
> allocate cpumask, CPU_ALLOC_SIZE for size, CPU_SET_S to set mask bit.
>
> Fix all the relevant places in the code to use mask size which is large
> enough to represent number of possible CPU's in the system.
>
> Fix parse_setup_cpu_list function in numa bench to check if input CPU
> is online before binding task to that CPU. This is to fix failures where,
> though CPU number is within max CPU, it could happen that CPU is offline.
> Here, sched_setaffinity will result in failure when using cpumask having
> that cpu bit set in the mask.
>
> Patch 1 and Patch 2 address fix for perf bench futex and perf bench
> epoll benchmark. Patch 3 and Patch 4 address fix in perf bench numa
> benchmark
>
> Athira Rajeev (4):
> tools/perf: Fix perf bench futex to correct usage of affinity for
> machines with #CPUs > 1K
> tools/perf: Fix perf bench epoll to correct usage of affinity for
> machines with #CPUs > 1K
> tools/perf: Fix perf numa bench to fix usage of affinity for machines
> with #CPUs > 1K
> tools/perf: Fix perf bench numa testcase to check if CPU used to bind
> task is online
>
> Changelog:
> From v1 -> v2:
> Addressed review comment from Ian Rogers to do
> CPU_FREE in a cleaner way.
> Added Tested-by from Disha Goel
The whole set:
Acked-by: Ian Rogers <[email protected]>
Thanks,
Ian
> tools/perf/bench/epoll-ctl.c | 25 ++++--
> tools/perf/bench/epoll-wait.c | 25 ++++--
> tools/perf/bench/futex-hash.c | 26 ++++--
> tools/perf/bench/futex-lock-pi.c | 21 +++--
> tools/perf/bench/futex-requeue.c | 21 +++--
> tools/perf/bench/futex-wake-parallel.c | 21 +++--
> tools/perf/bench/futex-wake.c | 22 ++++--
> tools/perf/bench/numa.c | 105 ++++++++++++++++++-------
> tools/perf/util/header.c | 43 ++++++++++
> tools/perf/util/header.h | 1 +
> 10 files changed, 242 insertions(+), 68 deletions(-)
>
> --
> 2.35.1
>
* Athira Rajeev <[email protected]> [2022-04-06 23:21:13]:
> Perf numa bench test fails with error:
>
> Testcase:
> ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
> --thp 1 --no-data_rand_walk
>
> Failure snippet:
> <<>>
> Running 'numa/mem' benchmark:
>
> # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
> -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
>
> perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
> <<>>
>
> The Testcases uses CPU’s 0 and 8. In function "parse_setup_cpu_list",
> There is check to see if cpu number is greater than max cpu’s possible
> in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus ||
> bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that system has
> say 48 CPU’s, but only number of online CPU’s is 0-7. Other CPU’s
> are offlined. Since "g->p.nr_cpus" is 48, so function will go ahead
> and set bit for CPU 8 also in cpumask ( td->bind_cpumask).
>
> bind_to_cpumask function is called to set affinity using
> sched_setaffinity and the cpumask. Since the CPU8 is not present,
> set affinity will fail here with EINVAL. Fix this issue by adding a
> check to make sure that, CPU’s provided in the input argument values
> are online before proceeding further and skip the test. For this,
> include new helper function "is_cpu_online" in
> "tools/perf/util/header.c".
>
> Since "BIT(x)" definition will get included from header.h, remove
> that from bench/numa.c
>
> Tested-by: Disha Goel <[email protected]>
> Signed-off-by: Athira Rajeev <[email protected]>
> Reported-by: Disha Goel <[email protected]>
Looks good to me.
Reviewed-by: Srikar Dronamraju <[email protected]>
> ---
> tools/perf/bench/numa.c | 8 ++++++--
> tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/header.h | 1 +
> 3 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 29e41e32bd88..7992d79b3e41 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -34,6 +34,7 @@
> #include <linux/numa.h>
> #include <linux/zalloc.h>
>
> +#include "../util/header.h"
> #include <numa.h>
> #include <numaif.h>
>
> @@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
> return -1;
> }
>
> + if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
> + printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
> + return -1;
> + }
> +
> BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
> BUG_ON(bind_cpu_0 > bind_cpu_1);
>
> @@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
> return parse_node_list(arg);
> }
>
> -#define BIT(x) (1ul << x)
> -
> static inline uint32_t lfsr_32(uint32_t lfsr)
> {
> const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 6da12e522edc..3f5fcf5d4b3f 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
> return do_write(ff, &data->dir.version, sizeof(data->dir.version));
> }
>
> +#define SYSFS "/sys/devices/system/cpu/"
> +
> +/*
> + * Check whether a CPU is online
> + *
> + * Returns:
> + * 1 -> if CPU is online
> + * 0 -> if CPU is offline
> + * -1 -> error case
> + */
> +int is_cpu_online(unsigned int cpu)
> +{
> + char sysfs_cpu[255];
> + char buf[255];
> + struct stat statbuf;
> + size_t len;
> + int fd;
> +
> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
> +
> + if (stat(sysfs_cpu, &statbuf) != 0)
> + return 0;
> +
> + /*
> + * Check if /sys/devices/system/cpu/cpux/online file
> + * exists. In kernels without CONFIG_HOTPLUG_CPU, this
> + * file won't exist.
> + */
> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
> + if (stat(sysfs_cpu, &statbuf) != 0)
> + return 1;
> +
> + fd = open(sysfs_cpu, O_RDONLY);
> + if (fd == -1)
> + return -1;
> +
> + len = read(fd, buf, sizeof(buf) - 1);
> + buf[len] = '\0';
> + close(fd);
> +
> + return strtoul(buf, NULL, 16);
> +}
> +
> #ifdef HAVE_LIBBPF_SUPPORT
> static int write_bpf_prog_info(struct feat_fd *ff,
> struct evlist *evlist __maybe_unused)
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index c9e3265832d9..0eb4bc29a5a4 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
> int write_padded(struct feat_fd *fd, const void *bf,
> size_t count, size_t count_aligned);
>
> +int is_cpu_online(unsigned int cpu);
> /*
> * arch specific callback
> */
> --
> 2.35.1
>
* Athira Rajeev <[email protected]> [2022-04-06 23:21:12]:
> perf bench numa testcase fails on systems with CPU's
> more than 1K.
>
> Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1
> Snippet of code:
> <<>>
> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
> Aborted (core dumped)
> <<>>
>
> bind_to_node function uses "sched_getaffinity" to save the original
> cpumask and this call is returning EINVAL ((invalid argument).
> This happens because the default mask size in glibc is 1024.
> To overcome this 1024 CPUs mask size limitation of cpu_set_t,
> change the mask size using the CPU_*_S macros ie, use CPU_ALLOC to
> allocate cpumask, CPU_ALLOC_SIZE for size. Apart from fixing this
> for "orig_mask", apply same logic to "mask" as well which is used to
> setaffinity so that mask size is large enough to represent number
> of possible CPU's in the system.
>
> sched_getaffinity is used in one more place in perf numa bench. It
> is in "bind_to_cpu" function. Apply the same logic there also. Though
> currently no failure is reported from there, it is ideal to change
> getaffinity to work with such system configurations having CPU's more
> than default mask size supported by glibc.
>
> Also fix "sched_setaffinity" to use mask size which is large enough
> to represent number of possible CPU's in the system.
>
> Fixed all places where "bind_cpumask" which is part of "struct
> thread_data" is used such that bind_cpumask works in all configuration.
>
> Tested-by: Disha Goel <[email protected]>
> Signed-off-by: Athira Rajeev <[email protected]>
> Reported-by: Disha Goel <[email protected]>
Looks good to me.
Reviewed-by: Srikar Dronamraju <[email protected]>
> ---
> tools/perf/bench/numa.c | 97 ++++++++++++++++++++++++++++++-----------
> 1 file changed, 71 insertions(+), 26 deletions(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index f2640179ada9..29e41e32bd88 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -54,7 +54,7 @@
>
> struct thread_data {
> int curr_cpu;
> - cpu_set_t bind_cpumask;
> + cpu_set_t *bind_cpumask;
> int bind_node;
> u8 *process_data;
> int process_nr;
> @@ -266,46 +266,71 @@ static bool node_has_cpus(int node)
> return ret;
> }
>
> -static cpu_set_t bind_to_cpu(int target_cpu)
> +static cpu_set_t *bind_to_cpu(int target_cpu)
> {
> - cpu_set_t orig_mask, mask;
> + int nrcpus = numa_num_possible_cpus();
> + cpu_set_t *orig_mask, *mask;
> + size_t size;
> int ret;
>
> - ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> - BUG_ON(ret);
> + orig_mask = CPU_ALLOC(nrcpus);
> + BUG_ON(!orig_mask);
> + size = CPU_ALLOC_SIZE(nrcpus);
> + CPU_ZERO_S(size, orig_mask);
> +
> + ret = sched_getaffinity(0, size, orig_mask);
> + if (ret) {
> + CPU_FREE(orig_mask);
> + BUG_ON(ret);
> + }
>
> - CPU_ZERO(&mask);
> + mask = CPU_ALLOC(nrcpus);
> + BUG_ON(!mask);
> + CPU_ZERO_S(size, mask);
>
> if (target_cpu == -1) {
> int cpu;
>
> for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> - CPU_SET(cpu, &mask);
> + CPU_SET_S(cpu, size, mask);
> } else {
> BUG_ON(target_cpu < 0 || target_cpu >= g->p.nr_cpus);
> - CPU_SET(target_cpu, &mask);
> + CPU_SET_S(target_cpu, size, mask);
> }
>
> - ret = sched_setaffinity(0, sizeof(mask), &mask);
> + ret = sched_setaffinity(0, size, mask);
> + CPU_FREE(mask);
> BUG_ON(ret);
>
> return orig_mask;
> }
>
> -static cpu_set_t bind_to_node(int target_node)
> +static cpu_set_t *bind_to_node(int target_node)
> {
> - cpu_set_t orig_mask, mask;
> + int nrcpus = numa_num_possible_cpus();
> + cpu_set_t *orig_mask, *mask;
> + size_t size;
> int cpu;
> int ret;
>
> - ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> - BUG_ON(ret);
> + orig_mask = CPU_ALLOC(nrcpus);
> + BUG_ON(!orig_mask);
> + size = CPU_ALLOC_SIZE(nrcpus);
> + CPU_ZERO_S(size, orig_mask);
> +
> + ret = sched_getaffinity(0, size, orig_mask);
> + if (ret) {
> + CPU_FREE(orig_mask);
> + BUG_ON(ret);
> + }
>
> - CPU_ZERO(&mask);
> + mask = CPU_ALLOC(nrcpus);
> + BUG_ON(!mask);
> + CPU_ZERO_S(size, mask);
>
> if (target_node == NUMA_NO_NODE) {
> for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> - CPU_SET(cpu, &mask);
> + CPU_SET_S(cpu, size, mask);
> } else {
> struct bitmask *cpumask = numa_allocate_cpumask();
>
> @@ -313,24 +338,29 @@ static cpu_set_t bind_to_node(int target_node)
> if (!numa_node_to_cpus(target_node, cpumask)) {
> for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
> if (numa_bitmask_isbitset(cpumask, cpu))
> - CPU_SET(cpu, &mask);
> + CPU_SET_S(cpu, size, mask);
> }
> }
> numa_free_cpumask(cpumask);
> }
>
> - ret = sched_setaffinity(0, sizeof(mask), &mask);
> + ret = sched_setaffinity(0, size, mask);
> + CPU_FREE(mask);
> BUG_ON(ret);
>
> return orig_mask;
> }
>
> -static void bind_to_cpumask(cpu_set_t mask)
> +static void bind_to_cpumask(cpu_set_t *mask)
> {
> int ret;
> + size_t size = CPU_ALLOC_SIZE(numa_num_possible_cpus());
>
> - ret = sched_setaffinity(0, sizeof(mask), &mask);
> - BUG_ON(ret);
> + ret = sched_setaffinity(0, size, mask);
> + if (ret) {
> + CPU_FREE(mask);
> + BUG_ON(ret);
> + }
> }
>
> static void mempol_restore(void)
> @@ -376,7 +406,7 @@ do { \
> static u8 *alloc_data(ssize_t bytes0, int map_flags,
> int init_zero, int init_cpu0, int thp, int init_random)
> {
> - cpu_set_t orig_mask;
> + cpu_set_t *orig_mask;
> ssize_t bytes;
> u8 *buf;
> int ret;
> @@ -434,6 +464,7 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
> /* Restore affinity: */
> if (init_cpu0) {
> bind_to_cpumask(orig_mask);
> + CPU_FREE(orig_mask);
> mempol_restore();
> }
>
> @@ -589,6 +620,7 @@ static int parse_setup_cpu_list(void)
> BUG_ON(bind_cpu_0 > bind_cpu_1);
>
> for (bind_cpu = bind_cpu_0; bind_cpu <= bind_cpu_1; bind_cpu += step) {
> + size_t size = CPU_ALLOC_SIZE(g->p.nr_cpus);
> int i;
>
> for (i = 0; i < mul; i++) {
> @@ -608,10 +640,12 @@ static int parse_setup_cpu_list(void)
> tprintf("%2d", bind_cpu);
> }
>
> - CPU_ZERO(&td->bind_cpumask);
> + td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
> + BUG_ON(!td->bind_cpumask);
> + CPU_ZERO_S(size, td->bind_cpumask);
> for (cpu = bind_cpu; cpu < bind_cpu+bind_len; cpu++) {
> BUG_ON(cpu < 0 || cpu >= g->p.nr_cpus);
> - CPU_SET(cpu, &td->bind_cpumask);
> + CPU_SET_S(cpu, size, td->bind_cpumask);
> }
> t++;
> }
> @@ -1241,7 +1275,7 @@ static void *worker_thread(void *__tdata)
> * by migrating to CPU#0:
> */
> if (first_task && g->p.perturb_secs && (int)(stop.tv_sec - last_perturbance) >= g->p.perturb_secs) {
> - cpu_set_t orig_mask;
> + cpu_set_t *orig_mask;
> int target_cpu;
> int this_cpu;
>
> @@ -1265,6 +1299,7 @@ static void *worker_thread(void *__tdata)
> printf(" (injecting perturbalance, moved to CPU#%d)\n", target_cpu);
>
> bind_to_cpumask(orig_mask);
> + CPU_FREE(orig_mask);
> }
>
> if (details >= 3) {
> @@ -1398,21 +1433,31 @@ static void init_thread_data(void)
>
> for (t = 0; t < g->p.nr_tasks; t++) {
> struct thread_data *td = g->threads + t;
> + size_t cpuset_size = CPU_ALLOC_SIZE(g->p.nr_cpus);
> int cpu;
>
> /* Allow all nodes by default: */
> td->bind_node = NUMA_NO_NODE;
>
> /* Allow all CPUs by default: */
> - CPU_ZERO(&td->bind_cpumask);
> + td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
> + BUG_ON(!td->bind_cpumask);
> + CPU_ZERO_S(cpuset_size, td->bind_cpumask);
> for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> - CPU_SET(cpu, &td->bind_cpumask);
> + CPU_SET_S(cpu, cpuset_size, td->bind_cpumask);
> }
> }
>
> static void deinit_thread_data(void)
> {
> ssize_t size = sizeof(*g->threads)*g->p.nr_tasks;
> + int t;
> +
> + /* Free the bind_cpumask allocated for thread_data */
> + for (t = 0; t < g->p.nr_tasks; t++) {
> + struct thread_data *td = g->threads + t;
> + CPU_FREE(td->bind_cpumask);
> + }
>
> free_data(g->threads, size);
> }
> --
> 2.35.1
>
Em Sat, Apr 09, 2022 at 12:28:01PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Apr 06, 2022 at 11:21:09PM +0530, Athira Rajeev escreveu:
> > The perf benchmark for collections: numa, futex and epoll
> > hits failure in system configuration with CPU's more than 1024.
> > These benchmarks uses "sched_getaffinity" and "sched_setaffinity"
> > in the code to work with affinity.
>
> Applied 1-3, 4 needs some reworking and can wait for v5.19, the first 3
> are fixes, so can go now.
Now trying to fix this:
26 7.89 debian:9 : FAIL gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
bench/numa.c: In function 'alloc_data':
bench/numa.c:359:6: error: 'orig_mask' may be used uninitialized in this function [-Werror=maybe-uninitialized]
ret = sched_setaffinity(0, size, mask);
~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bench/numa.c:409:13: note: 'orig_mask' was declared here
cpu_set_t *orig_mask;
^~~~~~~~~
cc1: all warnings being treated as errors
/git/perf-5.18.0-rc1/tools/build/Makefile.build:139: recipe for target 'bench' failed
make[3]: *** [bench] Error 2
Happened in several distros.
- Arnaldo
> On 08-Apr-2022, at 5:56 PM, Srikar Dronamraju <[email protected]> wrote:
>
> * Athira Rajeev <[email protected]> [2022-04-06 23:21:13]:
>
>> Perf numa bench test fails with error:
>>
>> Testcase:
>> ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
>> --thp 1 --no-data_rand_walk
>>
>> Failure snippet:
>> <<>>
>> Running 'numa/mem' benchmark:
>>
>> # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
>> -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
>>
>> perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
>> <<>>
>>
>> The Testcases uses CPU’s 0 and 8. In function "parse_setup_cpu_list",
>> There is check to see if cpu number is greater than max cpu’s possible
>> in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus ||
>> bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that system has
>> say 48 CPU’s, but only number of online CPU’s is 0-7. Other CPU’s
>> are offlined. Since "g->p.nr_cpus" is 48, so function will go ahead
>> and set bit for CPU 8 also in cpumask ( td->bind_cpumask).
>>
>> bind_to_cpumask function is called to set affinity using
>> sched_setaffinity and the cpumask. Since the CPU8 is not present,
>> set affinity will fail here with EINVAL. Fix this issue by adding a
>> check to make sure that, CPU’s provided in the input argument values
>> are online before proceeding further and skip the test. For this,
>> include new helper function "is_cpu_online" in
>> "tools/perf/util/header.c".
>>
>> Since "BIT(x)" definition will get included from header.h, remove
>> that from bench/numa.c
>>
>> Tested-by: Disha Goel <[email protected]>
>> Signed-off-by: Athira Rajeev <[email protected]>
>> Reported-by: Disha Goel <[email protected]>
>
> Looks good to me.
> Reviewed-by: Srikar Dronamraju <[email protected]>
Hi Srikar,
Thanks for the review
Athira
>
>> ---
>> tools/perf/bench/numa.c | 8 ++++++--
>> tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
>> tools/perf/util/header.h | 1 +
>> 3 files changed, 50 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
>> index 29e41e32bd88..7992d79b3e41 100644
>> --- a/tools/perf/bench/numa.c
>> +++ b/tools/perf/bench/numa.c
>> @@ -34,6 +34,7 @@
>> #include <linux/numa.h>
>> #include <linux/zalloc.h>
>>
>> +#include "../util/header.h"
>> #include <numa.h>
>> #include <numaif.h>
>>
>> @@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
>> return -1;
>> }
>>
>> + if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
>> + printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
>> + return -1;
>> + }
>> +
>> BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
>> BUG_ON(bind_cpu_0 > bind_cpu_1);
>>
>> @@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
>> return parse_node_list(arg);
>> }
>>
>> -#define BIT(x) (1ul << x)
>> -
>> static inline uint32_t lfsr_32(uint32_t lfsr)
>> {
>> const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
>> index 6da12e522edc..3f5fcf5d4b3f 100644
>> --- a/tools/perf/util/header.c
>> +++ b/tools/perf/util/header.c
>> @@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
>> return do_write(ff, &data->dir.version, sizeof(data->dir.version));
>> }
>>
>> +#define SYSFS "/sys/devices/system/cpu/"
>> +
>> +/*
>> + * Check whether a CPU is online
>> + *
>> + * Returns:
>> + * 1 -> if CPU is online
>> + * 0 -> if CPU is offline
>> + * -1 -> error case
>> + */
>> +int is_cpu_online(unsigned int cpu)
>> +{
>> + char sysfs_cpu[255];
>> + char buf[255];
>> + struct stat statbuf;
>> + size_t len;
>> + int fd;
>> +
>> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
>> +
>> + if (stat(sysfs_cpu, &statbuf) != 0)
>> + return 0;
>> +
>> + /*
>> + * Check if /sys/devices/system/cpu/cpux/online file
>> + * exists. In kernels without CONFIG_HOTPLUG_CPU, this
>> + * file won't exist.
>> + */
>> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
>> + if (stat(sysfs_cpu, &statbuf) != 0)
>> + return 1;
>> +
>> + fd = open(sysfs_cpu, O_RDONLY);
>> + if (fd == -1)
>> + return -1;
>> +
>> + len = read(fd, buf, sizeof(buf) - 1);
>> + buf[len] = '\0';
>> + close(fd);
>> +
>> + return strtoul(buf, NULL, 16);
>> +}
>> +
>> #ifdef HAVE_LIBBPF_SUPPORT
>> static int write_bpf_prog_info(struct feat_fd *ff,
>> struct evlist *evlist __maybe_unused)
>> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
>> index c9e3265832d9..0eb4bc29a5a4 100644
>> --- a/tools/perf/util/header.h
>> +++ b/tools/perf/util/header.h
>> @@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
>> int write_padded(struct feat_fd *fd, const void *bf,
>> size_t count, size_t count_aligned);
>>
>> +int is_cpu_online(unsigned int cpu);
>> /*
>> * arch specific callback
>> */
>> --
>> 2.35.1
Em Wed, Apr 06, 2022 at 11:21:13PM +0530, Athira Rajeev escreveu:
> Perf numa bench test fails with error:
>
> Testcase:
> ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
> --thp 1 --no-data_rand_walk
>
> Failure snippet:
> <<>>
> Running 'numa/mem' benchmark:
>
> # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
> -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
>
> perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
> <<>>
>
> The Testcases uses CPU’s 0 and 8. In function "parse_setup_cpu_list",
> There is check to see if cpu number is greater than max cpu’s possible
> in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus ||
> bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that system has
> say 48 CPU’s, but only number of online CPU’s is 0-7. Other CPU’s
> are offlined. Since "g->p.nr_cpus" is 48, so function will go ahead
> and set bit for CPU 8 also in cpumask ( td->bind_cpumask).
>
> bind_to_cpumask function is called to set affinity using
> sched_setaffinity and the cpumask. Since the CPU8 is not present,
> set affinity will fail here with EINVAL. Fix this issue by adding a
> check to make sure that, CPU’s provided in the input argument values
> are online before proceeding further and skip the test. For this,
> include new helper function "is_cpu_online" in
> "tools/perf/util/header.c".
>
> Since "BIT(x)" definition will get included from header.h, remove
> that from bench/numa.c
>
> Tested-by: Disha Goel <[email protected]>
> Signed-off-by: Athira Rajeev <[email protected]>
> Reported-by: Disha Goel <[email protected]>
> ---
> tools/perf/bench/numa.c | 8 ++++++--
> tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/header.h | 1 +
> 3 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 29e41e32bd88..7992d79b3e41 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -34,6 +34,7 @@
> #include <linux/numa.h>
> #include <linux/zalloc.h>
>
> +#include "../util/header.h"
> #include <numa.h>
> #include <numaif.h>
>
> @@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
> return -1;
> }
>
> + if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
> + printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
> + return -1;
> + }
> +
> BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
> BUG_ON(bind_cpu_0 > bind_cpu_1);
>
> @@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
> return parse_node_list(arg);
> }
>
> -#define BIT(x) (1ul << x)
> -
> static inline uint32_t lfsr_32(uint32_t lfsr)
> {
> const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 6da12e522edc..3f5fcf5d4b3f 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
> return do_write(ff, &data->dir.version, sizeof(data->dir.version));
> }
>
> +#define SYSFS "/sys/devices/system/cpu/"
Please use
int sysfs__read_str(const char *entry, char **buf, size_t *sizep)
See how to use it in the smt_on() function at tools/perf/util/smt.c, for
example.
Now looking at the first patches in the series.
- Arnaldo
> +/*
> + * Check whether a CPU is online
> + *
> + * Returns:
> + * 1 -> if CPU is online
> + * 0 -> if CPU is offline
> + * -1 -> error case
> + */
> +int is_cpu_online(unsigned int cpu)
> +{
> + char sysfs_cpu[255];
> + char buf[255];
> + struct stat statbuf;
> + size_t len;
> + int fd;
> +
> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
> +
> + if (stat(sysfs_cpu, &statbuf) != 0)
> + return 0;
> +
> + /*
> + * Check if /sys/devices/system/cpu/cpux/online file
> + * exists. In kernels without CONFIG_HOTPLUG_CPU, this
> + * file won't exist.
> + */
> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
> + if (stat(sysfs_cpu, &statbuf) != 0)
> + return 1;
> +
> + fd = open(sysfs_cpu, O_RDONLY);
> + if (fd == -1)
> + return -1;
> +
> + len = read(fd, buf, sizeof(buf) - 1);
> + buf[len] = '\0';
> + close(fd);
> +
> + return strtoul(buf, NULL, 16);
> +}
> +
> #ifdef HAVE_LIBBPF_SUPPORT
> static int write_bpf_prog_info(struct feat_fd *ff,
> struct evlist *evlist __maybe_unused)
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index c9e3265832d9..0eb4bc29a5a4 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
> int write_padded(struct feat_fd *fd, const void *bf,
> size_t count, size_t count_aligned);
>
> +int is_cpu_online(unsigned int cpu);
> /*
> * arch specific callback
> */
> --
> 2.35.1
--
- Arnaldo
Em Wed, Apr 06, 2022 at 11:21:09PM +0530, Athira Rajeev escreveu:
> The perf benchmark for collections: numa, futex and epoll
> hits failure in system configuration with CPU's more than 1024.
> These benchmarks uses "sched_getaffinity" and "sched_setaffinity"
> in the code to work with affinity.
Applied 1-3, 4 needs some reworking and can wait for v5.19, the first 3
are fixes, so can go now.
- Arnaldo
> On 09-Apr-2022, at 8:50 PM, Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Em Wed, Apr 06, 2022 at 11:21:13PM +0530, Athira Rajeev escreveu:
>> Perf numa bench test fails with error:
>>
>> Testcase:
>> ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
>> --thp 1 --no-data_rand_walk
>>
>> Failure snippet:
>> <<>>
>> Running 'numa/mem' benchmark:
>>
>> # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
>> -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
>>
>> perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
>> <<>>
>>
>> The Testcases uses CPU’s 0 and 8. In function "parse_setup_cpu_list",
>> There is check to see if cpu number is greater than max cpu’s possible
>> in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus ||
>> bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that system has
>> say 48 CPU’s, but only number of online CPU’s is 0-7. Other CPU’s
>> are offlined. Since "g->p.nr_cpus" is 48, so function will go ahead
>> and set bit for CPU 8 also in cpumask ( td->bind_cpumask).
>>
>> bind_to_cpumask function is called to set affinity using
>> sched_setaffinity and the cpumask. Since the CPU8 is not present,
>> set affinity will fail here with EINVAL. Fix this issue by adding a
>> check to make sure that, CPU’s provided in the input argument values
>> are online before proceeding further and skip the test. For this,
>> include new helper function "is_cpu_online" in
>> "tools/perf/util/header.c".
>>
>> Since "BIT(x)" definition will get included from header.h, remove
>> that from bench/numa.c
>>
>> Tested-by: Disha Goel <[email protected]>
>> Signed-off-by: Athira Rajeev <[email protected]>
>> Reported-by: Disha Goel <[email protected]>
>> ---
>> tools/perf/bench/numa.c | 8 ++++++--
>> tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
>> tools/perf/util/header.h | 1 +
>> 3 files changed, 50 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
>> index 29e41e32bd88..7992d79b3e41 100644
>> --- a/tools/perf/bench/numa.c
>> +++ b/tools/perf/bench/numa.c
>> @@ -34,6 +34,7 @@
>> #include <linux/numa.h>
>> #include <linux/zalloc.h>
>>
>> +#include "../util/header.h"
>> #include <numa.h>
>> #include <numaif.h>
>>
>> @@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
>> return -1;
>> }
>>
>> + if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
>> + printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
>> + return -1;
>> + }
>> +
>> BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
>> BUG_ON(bind_cpu_0 > bind_cpu_1);
>>
>> @@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
>> return parse_node_list(arg);
>> }
>>
>> -#define BIT(x) (1ul << x)
>> -
>> static inline uint32_t lfsr_32(uint32_t lfsr)
>> {
>> const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
>> index 6da12e522edc..3f5fcf5d4b3f 100644
>> --- a/tools/perf/util/header.c
>> +++ b/tools/perf/util/header.c
>> @@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
>> return do_write(ff, &data->dir.version, sizeof(data->dir.version));
>> }
>>
>> +#define SYSFS "/sys/devices/system/cpu/"
>
> Please use
>
> int sysfs__read_str(const char *entry, char **buf, size_t *sizep)
Hi Arnaldo,
Sure, I will send a V3 for this separately which uses “sysfs__read_str”
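Something along these lines, as a rough untested sketch of what I have
in mind for V3 (using the helper declared in tools/lib/api/fs/fs.h; the
handling for kernels without CONFIG_HOTPLUG_CPU, where the "online" file
does not exist, still needs to be kept):

	int is_cpu_online(unsigned int cpu)
	{
		char entry[64];
		char *str = NULL;
		size_t len;
		int online = -1;

		snprintf(entry, sizeof(entry), "devices/system/cpu/cpu%u/online", cpu);

		/* sysfs__read_str() allocates the buffer, the caller frees it */
		if (sysfs__read_str(entry, &str, &len) < 0)
			return online;

		online = atoi(str);
		free(str);
		return online;
	}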
Thanks for the review
Athira
>
> See how to use it in the smt_on() function at tools/perf/util/smt.c, for
> example.
>
> Now looking at the first patches in the series.
>
> - Arnaldo
>
>> +/*
>> + * Check whether a CPU is online
>> + *
>> + * Returns:
>> + * 1 -> if CPU is online
>> + * 0 -> if CPU is offline
>> + * -1 -> error case
>> + */
>> +int is_cpu_online(unsigned int cpu)
>> +{
>> + char sysfs_cpu[255];
>> + char buf[255];
>> + struct stat statbuf;
>> + size_t len;
>> + int fd;
>> +
>> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
>> +
>> + if (stat(sysfs_cpu, &statbuf) != 0)
>> + return 0;
>> +
>> + /*
>> + * Check if /sys/devices/system/cpu/cpux/online file
>> + * exists. In kernels without CONFIG_HOTPLUG_CPU, this
>> + * file won't exist.
>> + */
>> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
>> + if (stat(sysfs_cpu, &statbuf) != 0)
>> + return 1;
>> +
>> + fd = open(sysfs_cpu, O_RDONLY);
>> + if (fd == -1)
>> + return -1;
>> +
>> + len = read(fd, buf, sizeof(buf) - 1);
>> + buf[len] = '\0';
>> + close(fd);
>> +
>> + return strtoul(buf, NULL, 16);
>> +}
>> +
>> #ifdef HAVE_LIBBPF_SUPPORT
>> static int write_bpf_prog_info(struct feat_fd *ff,
>> struct evlist *evlist __maybe_unused)
>> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
>> index c9e3265832d9..0eb4bc29a5a4 100644
>> --- a/tools/perf/util/header.h
>> +++ b/tools/perf/util/header.h
>> @@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
>> int write_padded(struct feat_fd *fd, const void *bf,
>> size_t count, size_t count_aligned);
>>
>> +int is_cpu_online(unsigned int cpu);
>> /*
>> * arch specific callback
>> */
>> --
>> 2.35.1
>
> --
>
> - Arnaldo
> On 09-Apr-2022, at 10:48 PM, Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Em Sat, Apr 09, 2022 at 12:28:01PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Wed, Apr 06, 2022 at 11:21:09PM +0530, Athira Rajeev escreveu:
>>> The perf benchmark for collections: numa, futex and epoll
>>> hits failure in system configuration with CPU's more than 1024.
>>> These benchmarks uses "sched_getaffinity" and "sched_setaffinity"
>>> in the code to work with affinity.
>>
>> Applied 1-3, 4 needs some reworking and can wait for v5.19, the first 3
>> are fixes, so can go now.
>
> Now trying to fix this:
>
> 26 7.89 debian:9 : FAIL gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
> bench/numa.c: In function 'alloc_data':
> bench/numa.c:359:6: error: 'orig_mask' may be used uninitialized in this function [-Werror=maybe-uninitialized]
> ret = sched_setaffinity(0, size, mask);
> ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> bench/numa.c:409:13: note: 'orig_mask' was declared here
> cpu_set_t *orig_mask;
> ^~~~~~~~~
> cc1: all warnings being treated as errors
> /git/perf-5.18.0-rc1/tools/build/Makefile.build:139: recipe for target 'bench' failed
> make[3]: *** [bench] Error 2
>
>
> Happened in several distros.
Hi Arnaldo
Thanks for pointing it out. I was able to recreate this compile error on Debian.
The reason for this issue is the variable orig_mask, which is used and initialised in the "alloc_data"
function only within the if condition for "init_cpu0". We can fix this issue by initialising it to NULL
since it is accessed conditionally. I also made some changes to CPU_FREE the mask in other error paths.
I will post a V3 with these changes.
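The change in alloc_data() would look roughly like this (sketch):

	static u8 *alloc_data(ssize_t bytes0, int map_flags,
			      int init_zero, int init_cpu0, int thp, int init_random)
	{
		/* NULL-initialise: orig_mask is only set when init_cpu0 is true */
		cpu_set_t *orig_mask = NULL;
		...

		/* Restore affinity: */
		if (init_cpu0) {
			bind_to_cpumask(orig_mask);
			CPU_FREE(orig_mask);
			mempol_restore();
		}
		...
	}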
Athira
>
> - Arnaldo