2020-10-26 17:36:14

by Namhyung Kim

Subject: [RFC] perf evlist: Warn if event group has mixed sw/hw events

I found that the order of events in a group affects performance during
open. If a group has a software event as its leader and also contains
hardware events, the leader needs to be moved to a hardware context.
This involves an RCU synchronization, which takes about 20 msec on my
system. And that is just for a single group, so the total time increases
in proportion to the number of event groups and the number of CPUs.

On my 36-CPU system, opening 3 groups system-wide takes more than 2
seconds. You can see and compare it easily with the following:

$ time ./perf stat -a -e '{cs,cycles},{cs,cycles},{cs,cycles}' sleep 1
...
1.006333430 seconds time elapsed

real 0m3.969s
user 0m0.089s
sys 0m0.074s

$ time ./perf stat -a -e '{cycles,cs},{cycles,cs},{cycles,cs}' sleep 1
...
1.006755292 seconds time elapsed

real 0m1.144s
user 0m0.067s
sys 0m0.083s
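
The gap between the two runs is roughly what the per-group cost predicts. A back-of-the-envelope sketch, assuming each software-led group pays one RCU synchronization per CPU (the helper name is hypothetical and only illustrates the arithmetic):

```c
#include <assert.h>

/*
 * Rough estimate of the extra perf_event_open() time, in milliseconds.
 * Assumption: one RCU synchronization per software-led group per CPU,
 * each costing sync_msec (about 20 msec on the system described here).
 */
static long estimate_open_overhead_msec(int nr_groups, int nr_cpus,
					long sync_msec)
{
	assert(nr_groups >= 0 && nr_cpus >= 0 && sync_msec >= 0);
	return (long)nr_groups * nr_cpus * sync_msec;
}
```

With the numbers from this report, estimate_open_overhead_msec(3, 36, 20) gives 2160 msec, in line with "more than 2 seconds"; the measured gap between the two runs is 3.969 - 1.144 = 2.825 seconds.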

This patch just adds a warning before opening the events. I would really
like to fix this in the kernel if possible, but I don't have a good idea
how. Thoughts?

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-record.c | 2 +
tools/perf/builtin-stat.c | 2 +
tools/perf/builtin-top.c | 2 +
tools/perf/util/evlist.c | 78 +++++++++++++++++++++++++++++++++++++
tools/perf/util/evlist.h | 1 +
5 files changed, 85 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index adf311d15d3d..c0b08cacbae0 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -912,6 +912,8 @@ static int record__open(struct record *rec)

 	perf_evlist__config(evlist, opts, &callchain_param);

+	evlist__warn_mixed_group(evlist);
+
 	evlist__for_each_entry(evlist, pos) {
 try_again:
 		if (evsel__open(pos, pos->core.cpus, pos->core.threads) < 0) {
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index b01af171d94f..d5d4e02bda69 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -738,6 +738,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (affinity__setup(&affinity) < 0)
 		return -1;

+	evlist__warn_mixed_group(evsel_list);
+
 	evlist__for_each_cpu (evsel_list, i, cpu) {
 		affinity__set(&affinity, cpu);

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 7c64134472c7..9ad319cea948 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1027,6 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)

 	perf_evlist__config(evlist, opts, &callchain_param);

+	evlist__warn_mixed_group(evlist);
+
 	evlist__for_each_entry(evlist, counter) {
 try_again:
 		if (evsel__open(counter, top->evlist->core.cpus,
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 8bdf3d2c907c..02cff39e509e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -28,6 +28,7 @@
#include <unistd.h>
#include <sched.h>
#include <stdlib.h>
+#include <dirent.h>

#include "parse-events.h"
#include <subcmd/parse-options.h>
@@ -1980,3 +1981,80 @@ struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
 	}
 	return NULL;
 }
+
+static int *sw_types;
+static int nr_sw_types;
+
+static void collect_software_pmu_types(void)
+{
+	const char *known_sw_pmu[] = {
+		"software", "tracepoint", "breakpoint", "kprobe", "uprobe", "msr"
+	};
+	DIR *dir;
+	struct dirent *d;
+	char path[PATH_MAX];
+	int i;
+
+	if (sw_types != NULL)
+		return;
+
+	nr_sw_types = ARRAY_SIZE(known_sw_pmu);
+	sw_types = calloc(nr_sw_types, sizeof(int));
+	if (sw_types == NULL) {
+		pr_err("Memory allocation failed!\n");
+		return;
+	}
+
+	dir = opendir("/sys/bus/event_source/devices");
+	while ((d = readdir(dir)) != NULL) {
+		for (i = 0; i < nr_sw_types; i++) {
+			if (strcmp(d->d_name, known_sw_pmu[i]))
+				continue;
+
+			snprintf(path, sizeof(path), "%s/%s/type",
+				 "bus/event_source/devices", d->d_name);
+			sysfs__read_int(path, &sw_types[i]);
+		}
+	}
+	closedir(dir);
+}
+
+static bool is_software_event(struct evsel *evsel)
+{
+	int i;
+
+	for (i = 0; i < nr_sw_types; i++) {
+		if (evsel->core.attr.type == (unsigned)sw_types[i])
+			return true;
+	}
+	return false;
+}
+
+void evlist__warn_mixed_group(struct evlist *evlist)
+{
+	struct evsel *leader, *evsel;
+	bool warn = true;
+
+	collect_software_pmu_types();
+
+	/* Warn if an event group has a sw leader and hw siblings */
+	evlist__for_each_entry(evlist, leader) {
+		if (!evsel__is_group_event(leader))
+			continue;
+
+		if (!is_software_event(leader))
+			continue;
+
+		for_each_group_member(evsel, leader) {
+			if (is_software_event(evsel))
+				continue;
+			if (!warn)
+				continue;
+
+			pr_warning("WARNING: Event group has mixed hw/sw events.\n"
+				   "This will slow down the perf_event_open syscall.\n"
+				   "Consider putting a hw event as a leader.\n\n");
+			warn = false;
+		}
+	}
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index e1a450322bc5..a5b0a1d03a00 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -387,4 +387,5 @@ int evlist__ctlfd_ack(struct evlist *evlist);
#define EVLIST_DISABLED_MSG "Events disabled\n"

struct evsel *evlist__find_evsel(struct evlist *evlist, int idx);
+void evlist__warn_mixed_group(struct evlist *evlist);
#endif /* __PERF_EVLIST_H */
--
2.29.0.rc1.297.gfa9743e501-goog
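
One detail worth noting about collect_software_pmu_types(): the sw_types array comes from calloc(), so a known PMU that is missing from sysfs keeps type 0, which is PERF_TYPE_HARDWARE, and the opendir() result is used without a NULL check. A minimal standalone sketch of a more defensive lookup, assuming plain stdio instead of perf's sysfs__read_int() helper (read_pmu_type() and its base_dir parameter are hypothetical names, not part of the patch):

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Read the PMU type id from <base_dir>/<name>/type.
 * Returns the type (>= 0) on success, or -1 if the PMU is absent or
 * the file cannot be parsed, so that "not found" never aliases
 * PERF_TYPE_HARDWARE (0).
 */
static int read_pmu_type(const char *base_dir, const char *name)
{
	char path[PATH_MAX];
	FILE *fp;
	int type, len;

	assert(base_dir != NULL && name != NULL);

	len = snprintf(path, sizeof(path), "%s/%s/type", base_dir, name);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	fp = fopen(path, "r");
	if (fp == NULL)
		return -1;

	/* Treat unparsable or negative content as "unknown". */
	if (fscanf(fp, "%d", &type) != 1 || type < 0)
		type = -1;

	fclose(fp);
	return type;
}
```

A caller could fill its table with read_pmu_type("/sys/bus/event_source/devices", "software") and so on, and skip entries that come back as -1 when comparing against attr.type.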


2020-11-01 08:29:44

by Chen, Rong A

Subject: [perf evlist] 5f2344c367: perf-sanity-tests.'import_perf'_in_python.fail

Greetings,

FYI, we noticed the following commit (built with gcc-9):

commit: 5f2344c36783a8b48c66c6417423c4ae9542519b ("perf evlist: Warn if event group has mixed sw/hw events")
url: https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-evlist-Warn-if-event-group-has-mixed-sw-hw-events/20201026-222036


in testcase: perf-sanity-tests
version: perf-x86_64-c85fb28b6f99-1_20201008
with the following parameters:

perf_compiler: gcc
ucode: 0xdc



on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G memory

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):




If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>




2020-10-30 13:53:22 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 19
19: 'import perf' in python : FAILED!
2020-10-30 13:53:22 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 20
20: Breakpoint overflow signal handler : Ok
2020-10-30 13:53:22 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 21
21: Breakpoint overflow sampling : Ok
2020-10-30 13:53:22 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 22
22: Breakpoint accounting : Ok
2020-10-30 13:53:22 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 23
23: Watchpoint :
23.1: Read Only Watchpoint : Skip
23.2: Write Only Watchpoint : Ok
23.3: Read / Write Watchpoint : Ok
23.4: Modify Watchpoint : Ok
2020-10-30 13:53:22 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 24
24: Number of exit events of a simple workload : Ok
2020-10-30 13:53:22 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 25
25: Software clock events period values : Ok
2020-10-30 13:53:22 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 26
26: Object code reading : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 27
27: Sample parsing : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 28
28: Use a dummy software event to keep tracking : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 29
29: Parse with no sample_id_all bit set : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 30
30: Filter hist entries : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 31
31: Lookup mmap thread : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 32
32: Share thread maps : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 33
33: Sort output of hist entries : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 34
34: Cumulate child hist entries : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 35
35: Track with sched_switch : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 36
36: Filter fds with revents mask in a fdarray : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 37
37: Add fd to a fdarray, making it autogrow : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 38
38: kmod_path__parse : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 39
39: Thread map : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 41
41: Session topology : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 43
43: Synthesize thread map : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 44
44: Remove thread map : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 45
45: Synthesize cpu map : Ok
2020-10-30 13:53:24 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 46
46: Synthesize stat config : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 47
47: Synthesize stat : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 48
48: Synthesize stat round : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 49
49: Synthesize attr update : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 50
50: Event times : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 51
51: Read backward ring buffer : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 52
52: Print cpu map : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 53
53: Merge cpu map : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 54
54: Probe SDT events : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 55
55: is_printable_array : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 56
56: Print bitmap : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 57
57: perf hooks : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 59
59: unit_number__scnprintf : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 60
60: mem2node : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 61
61: time utils : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 62
62: Test jit_write_elf : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 63
63: Test libpfm4 support : Skip (not compiled in)
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 64
64: Test api io : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 65
65: maps__merge_in : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 66
66: Demangle Java : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 67
67: Parse and process metrics : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 68
68: PE file support : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 69
69: Event expansion for cgroups : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 70
70: x86 rdpmc : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 71
71: Convert perf time to TSC : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 72
72: DWARF unwind : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 73
73: x86 instruction decoder - new instructions : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 74
74: Intel PT packet decoder : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 75
75: x86 bp modify : Ok
2020-10-30 13:53:25 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 76
76: probe libc's inet_pton & backtrace it with ping : Ok
2020-10-30 13:53:27 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 77
77: Zstd perf.data compression/decompression : Ok
2020-10-30 13:53:28 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 78
78: build id cache operations : Ok
2020-10-30 13:53:30 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-5f2344c36783a8b48c66c6417423c4ae9542519b/tools/perf/perf test 79
79: Check Arm CoreSight trace data recording and synthesized samples: Skip



To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Rong Chen


Attachments:
(No filename) (12.90 kB)
config-5.10.0-rc1-00001-g5f2344c36783 (174.23 kB)
job-script (5.64 kB)
kmsg.xz (32.19 kB)
perf-sanity-tests (33.70 kB)
job.yaml (4.63 kB)
reproduce (8.66 kB)

2020-11-05 08:42:52

by Stephane Eranian

Subject: Re: [RFC] perf evlist: Warn if event group has mixed sw/hw events

On Mon, Oct 26, 2020 at 7:19 AM Namhyung Kim <[email protected]> wrote:
>
> I found that the order of events in a group affects performance during
> open. If a group has a software event as its leader and also contains
> hardware events, the leader needs to be moved to a hardware context.
> This involves an RCU synchronization, which takes about 20 msec on my
> system. And that is just for a single group, so the total time increases
> in proportion to the number of event groups and the number of CPUs.
>
> On my 36-CPU system, opening 3 groups system-wide takes more than 2
> seconds. You can see and compare it easily with the following:
>
> $ time ./perf stat -a -e '{cs,cycles},{cs,cycles},{cs,cycles}' sleep 1
> ...
> 1.006333430 seconds time elapsed
>
> real 0m3.969s
> user 0m0.089s
> sys 0m0.074s
>
> $ time ./perf stat -a -e '{cycles,cs},{cycles,cs},{cycles,cs}' sleep 1
> ...
> 1.006755292 seconds time elapsed
>
> real 0m1.144s
> user 0m0.067s
> sys 0m0.083s
>
> This patch just added a warning before running it. I'd really want to
> fix the kernel if possible but don't have a good idea. Thoughts?
>
This is a problem for us. It has caused trouble on our systems, with
perf commands taking much longer than expected and triggering timeouts.

The cost of perf_event_open() should not depend so heavily on the order
of the events in a group. The penalty incurred by synchronize_rcu()
is very large and likely does not scale well. Scalability may not
only be impacted by the number of CPUs in the machine. I am not an
expert at RCU, but it seems this exposes perf_event_open() to penalties
caused by operations in other subsystems. I am wondering if there would
be a different way of handling the change of group type that would avoid
the high cost of synchronize_rcu().
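
Until the kernel side is addressed, the reordering that the proposed warning recommends can also be done in user space before the events are opened. A minimal sketch on a toy data structure (struct toy_event and promote_hw_leader() are hypothetical and do not correspond to perf's struct evsel API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event: only the property we care about. */
struct toy_event {
	const char *name;
	bool is_software;
};

/*
 * If the group leader (index 0) is a software event and the group
 * contains a hardware event, swap the first hardware event into the
 * leader slot. Returns true if the group was reordered.
 */
static bool promote_hw_leader(struct toy_event *group, size_t nr)
{
	size_t i;

	assert(group != NULL || nr == 0);

	if (nr == 0 || !group[0].is_software)
		return false;

	for (i = 1; i < nr; i++) {
		if (!group[i].is_software) {
			struct toy_event tmp = group[0];

			group[0] = group[i];
			group[i] = tmp;
			return true;
		}
	}
	return false;
}
```

For a group like {cs,cycles} this yields {cycles,cs}, the fast ordering from the measurements. Changing the leader may change what gets sampled in some perf record setups, so this is only clearly harmless for plain counting as in perf stat.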

