2019-10-20 17:54:21

by Andi Kleen

[permalink] [raw]
Subject: Optimize perf stat for large number of events/cpus v2

[The earlier v1 version had a lot of conflicts against some
recent libperf changes in tip/perf/core. Resolve that and
also fix some minor issues.]

This patch kit optimizes perf stat for a large number of events
on systems with many CPUs and PMUs.

Some profiling shows that the most overhead is doing IPIs to
all the target CPUs. We can optimize this by using sched_setaffinity
to set the affinity to a target CPU once and then doing
the perf operation for all events on that CPU. This requires
some restructuring, but cuts the set up time quite a bit.

In theory we could go further by parallelizing these setups
too, but that would be much more complicated and for now just batching it
per CPU seems to be sufficient. At some point with many more cores
parallelization or a better bulk perf setup API might be needed though.

In addition perf does a lot of redundant /sys accesses with
many PMUs, which can be also expensve. This is also optimized.

On a large test case (>700 events with many weak groups) on a 94 CPU
system I go from

real 0m8.607s
user 0m0.550s
sys 0m8.041s

to

real 0m3.269s
user 0m0.760s
sys 0m1.694s

so shaving ~6 seconds of system time, at slightly more cost
in perf stat itself. On a 4 socket system with the savings
are more dramatic:

real 0m15.641s
user 0m0.873s
sys 0m14.729s

to

real 0m4.493s
user 0m1.578s
sys 0m2.444s

so 11s difference in the user visible set up time.

Also available in

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/stat-scale-4

v1: Initial post.
v2: Rebase. Fix some minor issues.

-Andi


2019-10-20 17:54:42

by Andi Kleen

[permalink] [raw]
Subject: [PATCH v2 9/9] perf stat: Use affinity for enabling/disabling events

From: Andi Kleen <[email protected]>

Restructure event enabling/disabling to use affinity, which
minimizes the number of IPIs needed.

Before on a large test case with 94 CPUs:

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
54.65 1.899986 22 84812 660 ioctl

after:

39.21 0.930451 10 84796 644 ioctl

Signed-off-by: Andi Kleen <[email protected]>
---
tools/perf/lib/evsel.c | 49 ++++++++++++++++++++--------
tools/perf/lib/include/perf/evsel.h | 2 ++
tools/perf/util/evlist.c | 50 ++++++++++++++++++++++++++---
tools/perf/util/evsel.c | 13 ++++++++
tools/perf/util/evsel.h | 2 ++
5 files changed, 98 insertions(+), 18 deletions(-)

diff --git a/tools/perf/lib/evsel.c b/tools/perf/lib/evsel.c
index ea775dacbd2d..89ddfade0b96 100644
--- a/tools/perf/lib/evsel.c
+++ b/tools/perf/lib/evsel.c
@@ -198,38 +198,61 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
}

static int perf_evsel__run_ioctl(struct perf_evsel *evsel,
- int ioc, void *arg)
+ int ioc, void *arg,
+ int cpu)
{
- int cpu, thread;
+ int thread;

- for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) {
- for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) {
- int fd = FD(evsel, cpu, thread),
- err = ioctl(fd, ioc, arg);
+ for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) {
+ int fd = FD(evsel, cpu, thread),
+ err = ioctl(fd, ioc, arg);

- if (err)
- return err;
- }
+ if (err)
+ return err;
}

return 0;
}

+int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu)
+{
+ return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_ENABLE, 0, cpu);
+}
+
int perf_evsel__enable(struct perf_evsel *evsel)
{
- return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_ENABLE, 0);
+ int i;
+ int err = 0;
+
+ for (i = 0; i < evsel->cpus->nr && !err; i++)
+ err = perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_ENABLE, 0, i);
+ return err;
+}
+
+int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu)
+{
+ return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_DISABLE, 0, cpu);
}

int perf_evsel__disable(struct perf_evsel *evsel)
{
- return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_DISABLE, 0);
+ int i;
+ int err = 0;
+
+ for (i = 0; i < evsel->cpus->nr && !err; i++)
+ err = perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_DISABLE, 0, i);
+ return err;
}

int perf_evsel__apply_filter(struct perf_evsel *evsel, const char *filter)
{
- return perf_evsel__run_ioctl(evsel,
+ int err = 0, i;
+
+ for (i = 0; i < evsel->cpus->nr && !err; i++)
+ err = perf_evsel__run_ioctl(evsel,
PERF_EVENT_IOC_SET_FILTER,
- (void *)filter);
+ (void *)filter, i);
+ return err;
}

struct perf_cpu_map *perf_evsel__cpus(struct perf_evsel *evsel)
diff --git a/tools/perf/lib/include/perf/evsel.h b/tools/perf/lib/include/perf/evsel.h
index ed10a914cd3f..db31e512a120 100644
--- a/tools/perf/lib/include/perf/evsel.h
+++ b/tools/perf/lib/include/perf/evsel.h
@@ -32,7 +32,9 @@ LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
struct perf_counts_values *count);
LIBPERF_API int perf_evsel__enable(struct perf_evsel *evsel);
+LIBPERF_API int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu);
LIBPERF_API int perf_evsel__disable(struct perf_evsel *evsel);
+LIBPERF_API int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu);
LIBPERF_API struct perf_cpu_map *perf_evsel__cpus(struct perf_evsel *evsel);
LIBPERF_API struct perf_thread_map *perf_evsel__threads(struct perf_evsel *evsel);
LIBPERF_API struct perf_event_attr *perf_evsel__attr(struct perf_evsel *evsel);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index bcb8a3670f3f..55f38a71ad30 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -378,26 +378,66 @@ void evlist__cpu_iter_next(struct evsel *ev)
void evlist__disable(struct evlist *evlist)
{
struct evsel *pos;
+ struct affinity affinity;
+ struct perf_cpu_map *cpus;
+ int i;

+ if (affinity__setup(&affinity) < 0)
+ return;
+
+ cpus = evlist__cpu_iter_start(evlist);
+ for (i = 0; i < cpus->nr; i++) {
+ int cpu = cpus->map[i];
+ affinity__set(&affinity, cpu);
+
+ evlist__for_each_entry(evlist, pos) {
+ if (evlist__cpu_iter_skip(pos, cpu))
+ continue;
+ if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
+ continue;
+ evsel__disable_cpu(pos, pos->cpu_index);
+ evlist__cpu_iter_next(pos);
+ }
+ }
+ affinity__cleanup(&affinity);
evlist__for_each_entry(evlist, pos) {
- if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
+ if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
continue;
- evsel__disable(pos);
+ pos->disabled = true;
}
-
evlist->enabled = false;
}

void evlist__enable(struct evlist *evlist)
{
struct evsel *pos;
+ struct affinity affinity;
+ struct perf_cpu_map *cpus;
+ int i;
+
+ if (affinity__setup(&affinity) < 0)
+ return;
+
+ cpus = evlist__cpu_iter_start(evlist);
+ for (i = 0; i < cpus->nr; i++) {
+ int cpu = cpus->map[i];
+ affinity__set(&affinity, cpu);

+ evlist__for_each_entry(evlist, pos) {
+ if (evlist__cpu_iter_skip(pos, cpu))
+ continue;
+ if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
+ continue;
+ evsel__enable_cpu(pos, pos->cpu_index);
+ evlist__cpu_iter_next(pos);
+ }
+ }
+ affinity__cleanup(&affinity);
evlist__for_each_entry(evlist, pos) {
if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
continue;
- evsel__enable(pos);
+ pos->disabled = false;
}
-
evlist->enabled = true;
}

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 7106f9a067df..79050a6f4991 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1205,13 +1205,26 @@ int perf_evsel__append_addr_filter(struct evsel *evsel, const char *filter)
return perf_evsel__append_filter(evsel, "%s,%s", filter);
}

+/* Caller has to clear disabled after going through all CPUs. */
+int evsel__enable_cpu(struct evsel *evsel, int cpu)
+{
+ int err = perf_evsel__enable_cpu(&evsel->core, cpu);
+ return err;
+}
+
int evsel__enable(struct evsel *evsel)
{
int err = perf_evsel__enable(&evsel->core);

if (!err)
evsel->disabled = false;
+ return err;
+}

+/* Caller has to set disabled after going through all CPUs. */
+int evsel__disable_cpu(struct evsel *evsel, int cpu)
+{
+ int err = perf_evsel__disable_cpu(&evsel->core, cpu);
return err;
}

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 9fc9f6698aa4..15977bbe7b63 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -222,8 +222,10 @@ int perf_evsel__set_filter(struct evsel *evsel, const char *filter);
int perf_evsel__append_tp_filter(struct evsel *evsel, const char *filter);
int perf_evsel__append_addr_filter(struct evsel *evsel,
const char *filter);
+int evsel__enable_cpu(struct evsel *evsel, int cpu);
int evsel__enable(struct evsel *evsel);
int evsel__disable(struct evsel *evsel);
+int evsel__disable_cpu(struct evsel *evsel, int cpu);

int perf_evsel__open_per_cpu(struct evsel *evsel,
struct perf_cpu_map *cpus,
--
2.21.0

2019-10-20 17:55:44

by Andi Kleen

[permalink] [raw]
Subject: [PATCH v2 2/9] perf evsel: Avoid close(-1)

From: Andi Kleen <[email protected]>

In some weak fallback cases close can be called a lot with -1. Check
for this case and avoid calling close then.

This is mainly to shut up valgrind which complains about this case.

Signed-off-by: Andi Kleen <[email protected]>
---
tools/perf/lib/evsel.c | 3 ++-
tools/perf/util/evsel.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/lib/evsel.c b/tools/perf/lib/evsel.c
index a8cb582e2721..5a89857b0381 100644
--- a/tools/perf/lib/evsel.c
+++ b/tools/perf/lib/evsel.c
@@ -120,7 +120,8 @@ void perf_evsel__close_fd(struct perf_evsel *evsel)

for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++)
for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
- close(FD(evsel, cpu, thread));
+ if (FD(evsel, cpu, thread) >= 0)
+ close(FD(evsel, cpu, thread));
FD(evsel, cpu, thread) = -1;
}
}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d831038b55f2..d4451846af93 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1815,7 +1815,8 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
old_errno = errno;
do {
while (--thread >= 0) {
- close(FD(evsel, cpu, thread));
+ if (FD(evsel, cpu, thread) >= 0)
+ close(FD(evsel, cpu, thread));
FD(evsel, cpu, thread) = -1;
}
thread = nthreads;
--
2.21.0

2019-10-20 17:57:19

by Andi Kleen

[permalink] [raw]
Subject: [PATCH v2 6/9] perf stat: Use affinity for closing file descriptors

From: Andi Kleen <[email protected]>

Closing a perf fd can also trigger an IPI to the target CPU.
Use the same affinity technique as we use for reading/enabling events
to closing to optimize the CPU transitions.

Before on a large test case with 94 CPUs:

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
32.56 3.085463 50 61483 close

After:

10.54 0.735704 11 61485 close

Signed-off-by: Andi Kleen <[email protected]>
---
tools/perf/lib/evsel.c | 27 +++++++++++++++++++------
tools/perf/lib/include/perf/evsel.h | 1 +
tools/perf/util/evlist.c | 31 +++++++++++++++++++++++++++--
tools/perf/util/evsel.h | 1 +
4 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/tools/perf/lib/evsel.c b/tools/perf/lib/evsel.c
index 5a89857b0381..ea775dacbd2d 100644
--- a/tools/perf/lib/evsel.c
+++ b/tools/perf/lib/evsel.c
@@ -114,16 +114,23 @@ int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus,
return err;
}

+static void perf_evsel__close_fd_cpu(struct perf_evsel *evsel, int cpu)
+{
+ int thread;
+
+ for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
+ if (FD(evsel, cpu, thread) >= 0)
+ close(FD(evsel, cpu, thread));
+ FD(evsel, cpu, thread) = -1;
+ }
+}
+
void perf_evsel__close_fd(struct perf_evsel *evsel)
{
- int cpu, thread;
+ int cpu;

for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++)
- for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
- if (FD(evsel, cpu, thread) >= 0)
- close(FD(evsel, cpu, thread));
- FD(evsel, cpu, thread) = -1;
- }
+ perf_evsel__close_fd_cpu(evsel, cpu);
}

void perf_evsel__free_fd(struct perf_evsel *evsel)
@@ -141,6 +148,14 @@ void perf_evsel__close(struct perf_evsel *evsel)
perf_evsel__free_fd(evsel);
}

+void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu)
+{
+ if (evsel->fd == NULL)
+ return;
+
+ perf_evsel__close_fd_cpu(evsel, cpu);
+}
+
int perf_evsel__read_size(struct perf_evsel *evsel)
{
u64 read_format = evsel->attr.read_format;
diff --git a/tools/perf/lib/include/perf/evsel.h b/tools/perf/lib/include/perf/evsel.h
index 4388667f265c..ed10a914cd3f 100644
--- a/tools/perf/lib/include/perf/evsel.h
+++ b/tools/perf/lib/include/perf/evsel.h
@@ -28,6 +28,7 @@ LIBPERF_API void perf_evsel__delete(struct perf_evsel *evsel);
LIBPERF_API int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus,
struct perf_thread_map *threads);
LIBPERF_API void perf_evsel__close(struct perf_evsel *evsel);
+LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
struct perf_counts_values *count);
LIBPERF_API int perf_evsel__enable(struct perf_evsel *evsel);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 27b4b958eddd..b1b29d473a9f 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -18,6 +18,7 @@
#include "debug.h"
#include "units.h"
#include <internal/lib.h> // page_size
+#include "affinity.h"
#include "../perf.h"
#include "asm/bug.h"
#include "bpf-event.h"
@@ -1174,9 +1175,35 @@ void perf_evlist__set_selected(struct evlist *evlist,
void evlist__close(struct evlist *evlist)
{
struct evsel *evsel;
+ struct affinity affinity;
+ struct perf_cpu_map *cpus;
+ int i;
+
+ /* So far record doesn't set this up */
+ if (!evlist->core.cpus) {
+ evlist__for_each_entry_reverse(evlist, evsel)
+ evsel__close(evsel);
+ return;
+ }

- evlist__for_each_entry_reverse(evlist, evsel)
- evsel__close(evsel);
+ if (affinity__setup(&affinity) < 0)
+ return;
+ cpus = evlist__cpu_iter_start(evlist);
+ for (i = 0; i < cpus->nr; i++) {
+ int cpu = cpus->map[i];
+ affinity__set(&affinity, cpu);
+
+ evlist__for_each_entry_reverse(evlist, evsel) {
+ if (evlist__cpu_iter_skip(evsel, cpu))
+ continue;
+ perf_evsel__close_cpu(&evsel->core, evsel->cpu_index);
+ evlist__cpu_iter_next(evsel);
+ }
+ }
+ evlist__for_each_entry_reverse(evlist, evsel) {
+ perf_evsel__free_fd(&evsel->core);
+ perf_evsel__free_id(&evsel->core);
+ }
}

static int perf_evlist__create_syswide_maps(struct evlist *evlist)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index cf90019ae744..2e3b011ed09e 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -391,4 +391,5 @@ static inline bool evsel__has_callchain(const struct evsel *evsel)
struct perf_env *perf_evsel__env(struct evsel *evsel);

int perf_evsel__store_ids(struct evsel *evsel, struct evlist *evlist);
+
#endif /* __PERF_EVSEL_H */
--
2.21.0

2019-10-20 17:57:24

by Andi Kleen

[permalink] [raw]
Subject: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

From: Andi Kleen <[email protected]>

The kernel perf subsystem has to IPI to the target CPU for many
operations. On systems with many CPUs and when managing many events the
overhead can be dominated by lots of IPIs.

An alternative is to set up CPU affinity in the perf tool, then set up
all the events for that CPU, and then move on to the next CPU.

Add some affinity management infrastructure to enable such a model.
Used in followon patches.

Signed-off-by: Andi Kleen <[email protected]>
---
tools/perf/util/Build | 1 +
tools/perf/util/affinity.c | 71 ++++++++++++++++++++++++++++++++++++++
tools/perf/util/affinity.h | 15 ++++++++
3 files changed, 87 insertions(+)
create mode 100644 tools/perf/util/affinity.c
create mode 100644 tools/perf/util/affinity.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 2c1504fe924c..c7d4eab017e5 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -76,6 +76,7 @@ perf-y += sort.o
perf-y += hist.o
perf-y += util.o
perf-y += cpumap.o
+perf-y += affinity.o
perf-y += cputopo.o
perf-y += cgroup.o
perf-y += target.o
diff --git a/tools/perf/util/affinity.c b/tools/perf/util/affinity.c
new file mode 100644
index 000000000000..c42a6b9d63f0
--- /dev/null
+++ b/tools/perf/util/affinity.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Manage affinity to optimize IPIs inside the kernel perf API. */
+#define _GNU_SOURCE 1
+#include <sched.h>
+#include <stdlib.h>
+#include <linux/zalloc.h>
+#include "perf.h"
+#include "cpumap.h"
+#include "affinity.h"
+
+static int get_cpu_set_size(void)
+{
+ int sz = (cpu__max_cpu() + 64 - 1) / 64;
+ /*
+ * sched_getaffinity doesn't like masks smaller than the kernel.
+ * Hopefully that's big enough.
+ */
+ if (sz < 4096/8)
+ sz = 4096/8;
+ return sz;
+}
+
+int affinity__setup(struct affinity *a)
+{
+ int cpu_set_size = get_cpu_set_size();
+
+ a->orig_cpus = malloc(cpu_set_size);
+ if (!a->orig_cpus)
+ return -1;
+ sched_getaffinity(0, cpu_set_size, (cpu_set_t *)a->orig_cpus);
+ a->sched_cpus = zalloc(cpu_set_size);
+ if (!a->sched_cpus) {
+ free(a->orig_cpus);
+ return -1;
+ }
+ a->changed = false;
+ return 0;
+}
+
+/*
+ * perf_event_open does an IPI internally to the target CPU.
+ * It is more efficient to change perf's affinity to the target
+ * CPU and then set up all events on that CPU, so we amortize
+ * CPU communication.
+ */
+void affinity__set(struct affinity *a, int cpu)
+{
+ int cpu_set_size = get_cpu_set_size();
+
+ if (cpu == -1)
+ return;
+ a->changed = true;
+ a->sched_cpus[cpu / 8] |= 1 << (cpu % 8);
+ /*
+ * We ignore errors because affinity is just an optimization.
+ * This could happen for example with isolated CPUs or cpusets.
+ * In this case the IPIs inside the kernel's perf API still work.
+ */
+ sched_setaffinity(0, cpu_set_size, (cpu_set_t *)a->sched_cpus);
+ a->sched_cpus[cpu / 8] ^= 1 << (cpu % 8);
+}
+
+void affinity__cleanup(struct affinity *a)
+{
+ int cpu_set_size = get_cpu_set_size();
+
+ if (a->changed)
+ sched_setaffinity(0, cpu_set_size, (cpu_set_t *)a->orig_cpus);
+ free(a->sched_cpus);
+ free(a->orig_cpus);
+}
diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
new file mode 100644
index 000000000000..e56148607e33
--- /dev/null
+++ b/tools/perf/util/affinity.h
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef AFFINITY_H
+#define AFFINITY_H 1
+
+struct affinity {
+ unsigned char *orig_cpus;
+ unsigned char *sched_cpus;
+ bool changed;
+};
+
+void affinity__cleanup(struct affinity *a);
+void affinity__set(struct affinity *a, int cpu);
+int affinity__setup(struct affinity *a);
+
+#endif
--
2.21.0

2019-10-22 08:04:16

by Jiri Olsa

[permalink] [raw]
Subject: Re: Optimize perf stat for large number of events/cpus v2

On Sun, Oct 20, 2019 at 10:51:53AM -0700, Andi Kleen wrote:
> [The earlier v1 version had a lot of conflicts against some
> recent libperf changes in tip/perf/core. Resolve that and
> also fix some minor issues.]
>
> This patch kit optimizes perf stat for a large number of events
> on systems with many CPUs and PMUs.
>
> Some profiling shows that the most overhead is doing IPIs to
> all the target CPUs. We can optimize this by using sched_setaffinity
> to set the affinity to a target CPU once and then doing
> the perf operation for all events on that CPU. This requires
> some restructuring, but cuts the set up time quite a bit.
>
> In theory we could go further by parallelizing these setups
> too, but that would be much more complicated and for now just batching it
> per CPU seems to be sufficient. At some point with many more cores
> parallelization or a better bulk perf setup API might be needed though.
>
> In addition perf does a lot of redundant /sys accesses with
> many PMUs, which can be also expensve. This is also optimized.
>
> On a large test case (>700 events with many weak groups) on a 94 CPU
> system I go from
>
> real 0m8.607s
> user 0m0.550s
> sys 0m8.041s
>
> to
>
> real 0m3.269s
> user 0m0.760s
> sys 0m1.694s
>
> so shaving ~6 seconds of system time, at slightly more cost
> in perf stat itself. On a 4 socket system with the savings
> are more dramatic:
>
> real 0m15.641s
> user 0m0.873s
> sys 0m14.729s
>
> to
>
> real 0m4.493s
> user 0m1.578s
> sys 0m2.444s
>
> so 11s difference in the user visible set up time.
>
> Also available in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/stat-scale-4
>
> v1: Initial post.
> v2: Rebase. Fix some minor issues.

looks really helpful, I ack-ed 1st 2 patches,
I'll need more time for the rest

thanks,
jirka

2019-10-22 11:21:06

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v2 2/9] perf evsel: Avoid close(-1)

On Sun, Oct 20, 2019 at 10:51:55AM -0700, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> In some weak fallback cases close can be called a lot with -1. Check
> for this case and avoid calling close then.
>
> This is mainly to shut up valgrind which complains about this case.
>
> Signed-off-by: Andi Kleen <[email protected]>

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka

> ---
> tools/perf/lib/evsel.c | 3 ++-
> tools/perf/util/evsel.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/lib/evsel.c b/tools/perf/lib/evsel.c
> index a8cb582e2721..5a89857b0381 100644
> --- a/tools/perf/lib/evsel.c
> +++ b/tools/perf/lib/evsel.c
> @@ -120,7 +120,8 @@ void perf_evsel__close_fd(struct perf_evsel *evsel)
>
> for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++)
> for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
> - close(FD(evsel, cpu, thread));
> + if (FD(evsel, cpu, thread) >= 0)
> + close(FD(evsel, cpu, thread));
> FD(evsel, cpu, thread) = -1;
> }
> }
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index d831038b55f2..d4451846af93 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1815,7 +1815,8 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
> old_errno = errno;
> do {
> while (--thread >= 0) {
> - close(FD(evsel, cpu, thread));
> + if (FD(evsel, cpu, thread) >= 0)
> + close(FD(evsel, cpu, thread));
> FD(evsel, cpu, thread) = -1;
> }
> thread = nthreads;
> --
> 2.21.0
>

2019-10-22 14:20:29

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: Optimize perf stat for large number of events/cpus v2

Em Tue, Oct 22, 2019 at 10:02:23AM +0200, Jiri Olsa escreveu:
> On Sun, Oct 20, 2019 at 10:51:53AM -0700, Andi Kleen wrote:
> > v1: Initial post.
> > v2: Rebase. Fix some minor issues.

> looks really helpful, I ack-ed 1st 2 patches,
> I'll need more time for the rest

Thanks, applied the first two, will go thru the others.

- Arnaldo

2019-10-23 10:04:09

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:

SNIP

> +}
> diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
> new file mode 100644
> index 000000000000..e56148607e33
> --- /dev/null
> +++ b/tools/perf/util/affinity.h
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#ifndef AFFINITY_H
> +#define AFFINITY_H 1
> +
> +struct affinity {
> + unsigned char *orig_cpus;
> + unsigned char *sched_cpus;

why not use cpu_set_t directly?

jirka

2019-10-23 12:53:55

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v2 9/9] perf stat: Use affinity for enabling/disabling events

On Sun, Oct 20, 2019 at 10:52:02AM -0700, Andi Kleen wrote:

SNIP

>
> void evlist__enable(struct evlist *evlist)
> {
> struct evsel *pos;
> + struct affinity affinity;
> + struct perf_cpu_map *cpus;
> + int i;
> +
> + if (affinity__setup(&affinity) < 0)
> + return;
> +
> + cpus = evlist__cpu_iter_start(evlist);
> + for (i = 0; i < cpus->nr; i++) {
> + int cpu = cpus->map[i];
> + affinity__set(&affinity, cpu);
>
> + evlist__for_each_entry(evlist, pos) {
> + if (evlist__cpu_iter_skip(pos, cpu))
> + continue;
> + if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
> + continue;

all the previous patches and this one have this code in common,
could we make this a single function, that would call a callback
that would have affinity set.. sort of like what we do in
cpu_function_call in the kernel

thanks,
jirka

> + evsel__enable_cpu(pos, pos->cpu_index);
> + evlist__cpu_iter_next(pos);
> + }
> + }
> + affinity__cleanup(&affinity);

SNIP

2019-10-23 20:04:20

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

On Wed, Oct 23, 2019 at 11:59:11AM +0200, Jiri Olsa wrote:
> On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:
>
> SNIP
>
> > +}
> > diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
> > new file mode 100644
> > index 000000000000..e56148607e33
> > --- /dev/null
> > +++ b/tools/perf/util/affinity.h
> > @@ -0,0 +1,15 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#ifndef AFFINITY_H
> > +#define AFFINITY_H 1
> > +
> > +struct affinity {
> > + unsigned char *orig_cpus;
> > + unsigned char *sched_cpus;
>
> why not use cpu_set_t directly?

Because it's too small in glibc (only 1024 CPUs) and perf already
supports more.

-andi

2019-10-23 20:04:31

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 9/9] perf stat: Use affinity for enabling/disabling events

On Wed, Oct 23, 2019 at 12:30:48PM +0200, Jiri Olsa wrote:
> On Sun, Oct 20, 2019 at 10:52:02AM -0700, Andi Kleen wrote:
>
> SNIP
>
> >
> > void evlist__enable(struct evlist *evlist)
> > {
> > struct evsel *pos;
> > + struct affinity affinity;
> > + struct perf_cpu_map *cpus;
> > + int i;
> > +
> > + if (affinity__setup(&affinity) < 0)
> > + return;
> > +
> > + cpus = evlist__cpu_iter_start(evlist);
> > + for (i = 0; i < cpus->nr; i++) {
> > + int cpu = cpus->map[i];
> > + affinity__set(&affinity, cpu);
> >
> > + evlist__for_each_entry(evlist, pos) {
> > + if (evlist__cpu_iter_skip(pos, cpu))
> > + continue;
> > + if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
> > + continue;
>
> all the previous patches and this one have this code in common,
> could we make this a single function, that would call a callback
> that would have affinity set.. sort of like what we do in
> cpu_function_call in the kernel

I'm personally not a big friend of call backs. They usually make
the code harder to read and reason about.

Prefer to use callable libraries of common code.

Also the event open code has some more complex variants of this pattern
which would need multiple call backs.

I already factored the common code into the iterator.

I guess the for loop could be a macro, and affinity_set() could
perhaps accept the map and return the cpu. I'll add that in
the next version. This will reduce the common code by a few lines
more.

-Andi

2019-10-24 00:47:07

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

On Wed, Oct 23, 2019 at 06:02:35AM -0700, Andi Kleen wrote:
> On Wed, Oct 23, 2019 at 11:59:11AM +0200, Jiri Olsa wrote:
> > On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:
> >
> > SNIP
> >
> > > +}
> > > diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
> > > new file mode 100644
> > > index 000000000000..e56148607e33
> > > --- /dev/null
> > > +++ b/tools/perf/util/affinity.h
> > > @@ -0,0 +1,15 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +#ifndef AFFINITY_H
> > > +#define AFFINITY_H 1
> > > +
> > > +struct affinity {
> > > + unsigned char *orig_cpus;
> > > + unsigned char *sched_cpus;
> >
> > why not use cpu_set_t directly?
>
> Because it's too small in glibc (only 1024 CPUs) and perf already
> supports more.

nice, we're using it all over the place.. how about using bitmap_alloc?

jirka

2019-10-24 01:21:58

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

On Wed, Oct 23, 2019 at 04:30:49PM +0200, Jiri Olsa wrote:
> On Wed, Oct 23, 2019 at 06:02:35AM -0700, Andi Kleen wrote:
> > On Wed, Oct 23, 2019 at 11:59:11AM +0200, Jiri Olsa wrote:
> > > On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:
> > >
> > > SNIP
> > >
> > > > +}
> > > > diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
> > > > new file mode 100644
> > > > index 000000000000..e56148607e33
> > > > --- /dev/null
> > > > +++ b/tools/perf/util/affinity.h
> > > > @@ -0,0 +1,15 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +#ifndef AFFINITY_H
> > > > +#define AFFINITY_H 1
> > > > +
> > > > +struct affinity {
> > > > + unsigned char *orig_cpus;
> > > > + unsigned char *sched_cpus;
> > >
> > > why not use cpu_set_t directly?
> >
> > Because it's too small in glibc (only 1024 CPUs) and perf already
> > supports more.
>
> nice, we're using it all over the place.. how about using bitmap_alloc?

Okay.

The other places is mainly perf record from Alexey's recent affinity changes.
These probably need to be fixed.

+Alexey

And some stuff in bench/*. That's more nice to have.

-Andi

2019-10-24 07:16:54

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity


On 23.10.2019 17:52, Andi Kleen wrote:
> On Wed, Oct 23, 2019 at 04:30:49PM +0200, Jiri Olsa wrote:
>> On Wed, Oct 23, 2019 at 06:02:35AM -0700, Andi Kleen wrote:
>>> On Wed, Oct 23, 2019 at 11:59:11AM +0200, Jiri Olsa wrote:
>>>> On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:
>>>>
>>>> SNIP
>>>>
>>>>> +}
>>>>> diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
>>>>> new file mode 100644
>>>>> index 000000000000..e56148607e33
>>>>> --- /dev/null
>>>>> +++ b/tools/perf/util/affinity.h
>>>>> @@ -0,0 +1,15 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>> +#ifndef AFFINITY_H
>>>>> +#define AFFINITY_H 1
>>>>> +
>>>>> +struct affinity {
>>>>> + unsigned char *orig_cpus;
>>>>> + unsigned char *sched_cpus;
>>>>
>>>> why not use cpu_set_t directly?
>>>
>>> Because it's too small in glibc (only 1024 CPUs) and perf already
>>> supports more.
>>
>> nice, we're using it all over the place.. how about using bitmap_alloc?
>
> Okay.
>
> The other places is mainly perf record from Alexey's recent affinity changes.
> These probably need to be fixed.
>
> +Alexey

Despite the issue indeed looks generic for stat and record modes,
have you already observed record startup overhead somewhere in your setups?
I would, first, prefer to reproduce the overhead, to have stable use case
for evaluation and then, possibly, improvement.

~Alexey

2019-10-24 08:33:55

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

On Wed, Oct 23, 2019 at 07:16:13PM +0300, Alexey Budankov wrote:
>
> On 23.10.2019 17:52, Andi Kleen wrote:
> > On Wed, Oct 23, 2019 at 04:30:49PM +0200, Jiri Olsa wrote:
> >> On Wed, Oct 23, 2019 at 06:02:35AM -0700, Andi Kleen wrote:
> >>> On Wed, Oct 23, 2019 at 11:59:11AM +0200, Jiri Olsa wrote:
> >>>> On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:
> >>>>
> >>>> SNIP
> >>>>
> >>>>> +}
> >>>>> diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
> >>>>> new file mode 100644
> >>>>> index 000000000000..e56148607e33
> >>>>> --- /dev/null
> >>>>> +++ b/tools/perf/util/affinity.h
> >>>>> @@ -0,0 +1,15 @@
> >>>>> +// SPDX-License-Identifier: GPL-2.0
> >>>>> +#ifndef AFFINITY_H
> >>>>> +#define AFFINITY_H 1
> >>>>> +
> >>>>> +struct affinity {
> >>>>> + unsigned char *orig_cpus;
> >>>>> + unsigned char *sched_cpus;
> >>>>
> >>>> why not use cpu_set_t directly?
> >>>
> >>> Because it's too small in glibc (only 1024 CPUs) and perf already
> >>> supports more.
> >>
> >> nice, we're using it all over the place.. how about using bitmap_alloc?
> >
> > Okay.
> >
> > The other places is mainly perf record from Alexey's recent affinity changes.
> > These probably need to be fixed.
> >
> > +Alexey
>
> Despite the issue indeed looks generic for stat and record modes,
> have you already observed record startup overhead somewhere in your setups?
> I would, first, prefer to reproduce the overhead, to have stable use case
> for evaluation and then, possibly, improvement.

What I meant the cpu_set usages you added in

commit 9d2ed64587c045304efe8872b0258c30803d370c
Author: Alexey Budankov <[email protected]>
Date: Tue Jan 22 20:47:43 2019 +0300

perf record: Allocate affinity masks

need to be fixed to allocate dynamically, or at least use MAX_NR_CPUs to
support systems with >1024CPUs. That's an independent functionality
problem.

I haven't seen any large enough perf record usage to run
into the IPI problems for record.

-Andi

2019-10-24 13:07:37

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

On 23.10.2019 20:19, Andi Kleen wrote:
> On Wed, Oct 23, 2019 at 07:16:13PM +0300, Alexey Budankov wrote:
>>
>> On 23.10.2019 17:52, Andi Kleen wrote:
>>> On Wed, Oct 23, 2019 at 04:30:49PM +0200, Jiri Olsa wrote:
>>>> On Wed, Oct 23, 2019 at 06:02:35AM -0700, Andi Kleen wrote:
>>>>> On Wed, Oct 23, 2019 at 11:59:11AM +0200, Jiri Olsa wrote:
>>>>>> On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:
>>>>>>
>>>>>> SNIP
>>>>>>
>>>>>>> +}
>>>>>>> diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..e56148607e33
>>>>>>> --- /dev/null
>>>>>>> +++ b/tools/perf/util/affinity.h
>>>>>>> @@ -0,0 +1,15 @@
>>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>>> +#ifndef AFFINITY_H
>>>>>>> +#define AFFINITY_H 1
>>>>>>> +
>>>>>>> +struct affinity {
>>>>>>> + unsigned char *orig_cpus;
>>>>>>> + unsigned char *sched_cpus;
>>>>>>
>>>>>> why not use cpu_set_t directly?
>>>>>
>>>>> Because it's too small in glibc (only 1024 CPUs) and perf already
>>>>> supports more.
>>>>
>>>> nice, we're using it all over the place.. how about using bitmap_alloc?
>>>
>>> Okay.
>>>
>>> The other places is mainly perf record from Alexey's recent affinity changes.
>>> These probably need to be fixed.
>>>
>>> +Alexey
>>
>> Despite the issue indeed looks generic for stat and record modes,
>> have you already observed record startup overhead somewhere in your setups?
>> I would, first, prefer to reproduce the overhead, to have stable use case
>> for evaluation and then, possibly, improvement.
>
> What I meant the cpu_set usages you added in
>
> commit 9d2ed64587c045304efe8872b0258c30803d370c
> Author: Alexey Budankov <[email protected]>
> Date: Tue Jan 22 20:47:43 2019 +0300
>
> perf record: Allocate affinity masks
>
> need to be fixed to allocate dynamically, or at least use MAX_NR_CPUs to
> support systems with >1024CPUs. That's an independent functionality
> problem.

Oh, it is clear now. Thanks for pointing this out. For that to move from
cpu_mask_t to new custom struct affinity type its API requires extension
to provide mask operations similar to the ones that cpu_mask_t provides:
CPU_ZERO(), CPU_SET(), CPU_EQUAL(), CPU_OR().

For example it could be like: affinity__mask_zero(), affinity__mask_set(),
affinity__mask_equal(), affinity__mask_or() and then the collecting part
of record could also be moved to struct affinity type and overcome >1024CPUs
limitation.

~Alexey

>
> I haven't seen any large enough perf record usage to run
> into the IPI problems for record.
>
> -Andi
>

2019-10-24 16:14:29

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

On Wed, Oct 23, 2019 at 09:08:47PM +0300, Alexey Budankov wrote:
> On 23.10.2019 20:19, Andi Kleen wrote:
> > On Wed, Oct 23, 2019 at 07:16:13PM +0300, Alexey Budankov wrote:
> >>
> >> On 23.10.2019 17:52, Andi Kleen wrote:
> >>> On Wed, Oct 23, 2019 at 04:30:49PM +0200, Jiri Olsa wrote:
> >>>> On Wed, Oct 23, 2019 at 06:02:35AM -0700, Andi Kleen wrote:
> >>>>> On Wed, Oct 23, 2019 at 11:59:11AM +0200, Jiri Olsa wrote:
> >>>>>> On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:
> >>>>>>
> >>>>>> SNIP
> >>>>>>
> >>>>>>> +}
> >>>>>>> diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
> >>>>>>> new file mode 100644
> >>>>>>> index 000000000000..e56148607e33
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tools/perf/util/affinity.h
> >>>>>>> @@ -0,0 +1,15 @@
> >>>>>>> +// SPDX-License-Identifier: GPL-2.0
> >>>>>>> +#ifndef AFFINITY_H
> >>>>>>> +#define AFFINITY_H 1
> >>>>>>> +
> >>>>>>> +struct affinity {
> >>>>>>> + unsigned char *orig_cpus;
> >>>>>>> + unsigned char *sched_cpus;
> >>>>>>
> >>>>>> why not use cpu_set_t directly?
> >>>>>
> >>>>> Because it's too small in glibc (only 1024 CPUs) and perf already
> >>>>> supports more.
> >>>>
> >>>> nice, we're using it all over the place.. how about using bitmap_alloc?
> >>>
> >>> Okay.
> >>>
> >>> The other places is mainly perf record from Alexey's recent affinity changes.
> >>> These probably need to be fixed.
> >>>
> >>> +Alexey
> >>
> >> Despite the issue indeed looks generic for stat and record modes,
> >> have you already observed record startup overhead somewhere in your setups?
> >> I would, first, prefer to reproduce the overhead, to have stable use case
> >> for evaluation and then, possibly, improvement.
> >
> > What I meant the cpu_set usages you added in
> >
> > commit 9d2ed64587c045304efe8872b0258c30803d370c
> > Author: Alexey Budankov <[email protected]>
> > Date: Tue Jan 22 20:47:43 2019 +0300
> >
> > perf record: Allocate affinity masks
> >
> > need to be fixed to allocate dynamically, or at least use MAX_NR_CPUs to
> > support systems with >1024CPUs. That's an independent functionality
> > problem.
>
> Oh, it is clear now. Thanks for pointing this out. For that to move from
> cpu_mask_t to new custom struct affinity type its API requires extension
> to provide mask operations similar to the ones that cpu_mask_t provides:
> CPU_ZERO(), CPU_SET(), CPU_EQUAL(), CPU_OR().
>
> For example it could be like: affinity__mask_zero(), affinity__mask_set(),
> affinity__mask_equal(), affinity__mask_or() and then the collecting part
> of record could also be moved to struct affinity type and overcome >1024CPUs
> limitation.

Not sure you need to use my library, except perhaps the get_cpu_set_size()
function. It is somewhat specialized.

Everything else you can use normal Linux bitmap functions,
or call the sys call directly.

-Andi

2019-10-25 02:25:40

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

On 24.10.2019 1:37, Andi Kleen wrote:
> On Wed, Oct 23, 2019 at 09:08:47PM +0300, Alexey Budankov wrote:
>> On 23.10.2019 20:19, Andi Kleen wrote:
>>> On Wed, Oct 23, 2019 at 07:16:13PM +0300, Alexey Budankov wrote:
>>>>
>>>> On 23.10.2019 17:52, Andi Kleen wrote:
>>>>> On Wed, Oct 23, 2019 at 04:30:49PM +0200, Jiri Olsa wrote:
>>>>>> On Wed, Oct 23, 2019 at 06:02:35AM -0700, Andi Kleen wrote:
>>>>>>> On Wed, Oct 23, 2019 at 11:59:11AM +0200, Jiri Olsa wrote:
>>>>>>>> On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:
>>>>>>>>
>>>>>>>> SNIP
>>>>>>>>
>>>>>>>>> +}
>>>>>>>>> diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
>>>>>>>>> new file mode 100644
>>>>>>>>> index 000000000000..e56148607e33
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/tools/perf/util/affinity.h
>>>>>>>>> @@ -0,0 +1,15 @@
>>>>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>>>>> +#ifndef AFFINITY_H
>>>>>>>>> +#define AFFINITY_H 1
>>>>>>>>> +
>>>>>>>>> +struct affinity {
>>>>>>>>> + unsigned char *orig_cpus;
>>>>>>>>> + unsigned char *sched_cpus;
>>>>>>>>
>>>>>>>> why not use cpu_set_t directly?
>>>>>>>
>>>>>>> Because it's too small in glibc (only 1024 CPUs) and perf already
>>>>>>> supports more.
>>>>>>
>>>>>> nice, we're using it all over the place.. how about using bitmap_alloc?
>>>>>
>>>>> Okay.
>>>>>
>>>>> The other places is mainly perf record from Alexey's recent affinity changes.
>>>>> These probably need to be fixed.
>>>>>
>>>>> +Alexey
>>>>
>>>> Despite the issue indeed looks generic for stat and record modes,
>>>> have you already observed record startup overhead somewhere in your setups?
>>>> I would, first, prefer to reproduce the overhead, to have stable use case
>>>> for evaluation and then, possibly, improvement.
>>>
>>> What I meant the cpu_set usages you added in
>>>
>>> commit 9d2ed64587c045304efe8872b0258c30803d370c
>>> Author: Alexey Budankov <[email protected]>
>>> Date: Tue Jan 22 20:47:43 2019 +0300
>>>
>>> perf record: Allocate affinity masks
>>>
>>> need to be fixed to allocate dynamically, or at least use MAX_NR_CPUs to
>>> support systems with >1024CPUs. That's an independent functionality
>>> problem.
>>
>> Oh, it is clear now. Thanks for pointing this out. For that to move from
>> cpu_mask_t to new custom struct affinity type its API requires extension
>> to provide mask operations similar to the ones that cpu_mask_t provides:
>> CPU_ZERO(), CPU_SET(), CPU_EQUAL(), CPU_OR().
>>
>> For example it could be like: affinity__mask_zero(), affinity__mask_set(),
>> affinity__mask_equal(), affinity__mask_or() and then the collecting part
>> of record could also be moved to struct affinity type and overcome >1024CPUs
>> limitation.
>
> Not sure you need to use my library, except perhaps the get_cpu_set_size()
> function. It is somewhat specialized.

Ok, I see.

>
> Everything else you can use normal Linux bitmap functions,
> or call the sys call directly.

Thanks,
Alexey

>
> -Andi
>

Subject: [tip: perf/core] perf evsel: Avoid close(-1)

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 2ccfb8bc2143ca347609d1d4434176d73a78d805
Gitweb: https://git.kernel.org/tip/2ccfb8bc2143ca347609d1d4434176d73a78d805
Author: Andi Kleen <[email protected]>
AuthorDate: Sun, 20 Oct 2019 10:51:55 -07:00
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitterDate: Wed, 06 Nov 2019 15:43:05 -03:00

perf evsel: Avoid close(-1)

In some weak fallback cases close can be called a lot with -1. Check for
this case and avoid calling close then.

This is mainly to shut up valgrind which complains about this case.

Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/lib/evsel.c | 3 ++-
tools/perf/util/evsel.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/lib/evsel.c b/tools/perf/lib/evsel.c
index a8cb582..5a89857 100644
--- a/tools/perf/lib/evsel.c
+++ b/tools/perf/lib/evsel.c
@@ -120,7 +120,8 @@ void perf_evsel__close_fd(struct perf_evsel *evsel)

for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++)
for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
- close(FD(evsel, cpu, thread));
+ if (FD(evsel, cpu, thread) >= 0)
+ close(FD(evsel, cpu, thread));
FD(evsel, cpu, thread) = -1;
}
}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d831038..d445184 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1815,7 +1815,8 @@ out_close:
old_errno = errno;
do {
while (--thread >= 0) {
- close(FD(evsel, cpu, thread));
+ if (FD(evsel, cpu, thread) >= 0)
+ close(FD(evsel, cpu, thread));
FD(evsel, cpu, thread) = -1;
}
thread = nthreads;